I studied computational linguistics at Grinnell.
During that time, I worked on a project, mentored by Dr John D. Stone, to generate human-readable texts from machine-readable biographical databases. We made some good progress (you can view the code on GitHub), but parts of my plan were (putting it mildly) ludicrously ambitious. The plan was something like this:
1. Architect the project.
2. Find a freely available database of biographical data in RDF (hah, fat chance!), or failing that, create one from a freely available database in a different format.
3. Select only the subset of an RDF database relevant to a given person.
4. Create a schema (!) for describing ANY HUMAN LANGUAGE. (!!!)
5. Specify how to map (?) from RDF statements to grammatical structures and words (?!) in any language described in the schema created in step 4. (???)
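Of those steps, selecting the subset of the database relevant to one person is the easiest to make concrete. Here's a minimal sketch, not the project's actual code: it assumes the data has already been loaded as plain (subject, predicate, object) tuples rather than through a real RDF library, and the `ex:` identifiers, the one-hop cutoff, and the `select_subgraph` and `is_resource` functions are all hypothetical.

```python
from collections import deque

# Hypothetical toy data: (subject, predicate, object) triples using made-up
# "ex:" identifiers, standing in for a real RDF biographical database.
TRIPLES = [
    ("ex:AdaLovelace", "ex:bornIn", "ex:London"),
    ("ex:AdaLovelace", "ex:birthYear", "1815"),
    ("ex:AdaLovelace", "ex:collaboratedWith", "ex:CharlesBabbage"),
    ("ex:London", "ex:label", "London"),
    ("ex:CharlesBabbage", "ex:label", "Charles Babbage"),
    ("ex:CharlesBabbage", "ex:bornIn", "ex:London"),
    ("ex:Paris", "ex:label", "Paris"),
]


def is_resource(term):
    """Treat terms with the hypothetical 'ex:' prefix as resources; anything else is a literal."""
    return term.startswith("ex:")


def select_subgraph(triples, person, max_hops=1):
    """Collect the triples reachable from `person` by breadth-first search over
    subject -> object links, expanding at most `max_hops` steps past the person."""
    # Index triples by subject so each node's statements can be looked up directly.
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((s, p, o))

    selected = []
    seen = {person}
    queue = deque([(person, 0)])
    while queue:
        node, depth = queue.popleft()
        for triple in by_subject.get(node, []):
            selected.append(triple)
            obj = triple[2]
            # Only follow links to resources we haven't visited, within the hop limit.
            if is_resource(obj) and obj not in seen and depth < max_hops:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return selected


if __name__ == "__main__":
    for triple in select_subgraph(TRIPLES, "ex:AdaLovelace"):
        print(triple)
```

With one hop, the sketch keeps the person's own statements plus those about the places and people they point to (so London's label comes along), while unrelated resources like `ex:Paris` are dropped. Against a real triple store, a SPARQL `CONSTRUCT` or `DESCRIBE` query could do the same job; the explicit walk just makes "relevant to a given person" concrete.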
I wound up getting through about steps 1-3, and making some fruitless stabs at step 4. And then the semester ended and I found other ways to spend my late nights.
Well, recently I've been smacking my head against steps 4 and 5 again to see what I can come up with. The fruit of my labors is the Grammar Description Format (GDF). I don't pretend that this format can describe any human language; polysynthesis and complicated movement phenomena are almost certain to be expressed clumsily at best. But it does describe a fair subset of languages.
That document attempts to capture the formal semantics of a schema for describing human languages. Its syntax (although no BNF grammar is given) is heavily influenced by the Turtle format for serializing RDF data, but it is not a dialect, extension, or schema for RDF. GDF is also influenced by the fascinating work on "Link Grammar" done at Carnegie Mellon University, although it does not explicitly use that work's framework or terminology. GDF is its own entity and is, at the time of writing, solely my work.