Looking to get a feel for NLP, around December 2007 I joined Jonathan Berant, a PhD student at TAU, on an NLP project under the supervision of Prof. Eytan Ruppin.
We looked into populating given linguistic dependency structures using a probability function based on the part of speech, the head word, and whether the head was to the left or right. We learned the probability structure from an annotated corpus (sentences with part-of-speech annotations and dependency structures). Through this we hoped to test the relationship between the various parameters that we deal with. We started by conducting a feasibility test using a quick sketch of the algorithm implemented in Python. For this phase we used the ATIS and WSJ10 corpora, divided into training and test sets. In addition, we generated sentences and evaluated them manually.
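As a rough illustration of the idea above (not the original project code), here is a minimal Python sketch. It assumes that "populating" a structure means choosing a word for each position, and that the probability function is estimated by simple relative-frequency counts. The names `train`, `populate`, `direction`, `ROOT` and `UNKNOWN`, as well as the toy one-sentence corpus, are made up for this example.

```python
import random
from collections import Counter, defaultdict

ROOT = "<ROOT>"    # placeholder head word for the root of a sentence (assumption)
UNKNOWN = "<UNK>"  # emitted when a context was never seen in training (assumption)


def direction(i, head):
    """Say whether the head is to the left or right of position i."""
    if head < 0:
        return "ROOT"
    return "L" if head < i else "R"


def train(corpus):
    """Count words per (part of speech, head word, direction) context.

    Each sentence is a list of (word, pos, head_index); head_index is -1 for the root.
    """
    counts = defaultdict(Counter)
    for sentence in corpus:
        for i, (word, pos, head) in enumerate(sentence):
            head_word = ROOT if head < 0 else sentence[head][0]
            counts[(pos, head_word, direction(i, head))][word] += 1
    return counts


def populate(structure, counts, rng):
    """Fill a dependency structure, given as (pos, head_index) pairs, with words.

    Heads are filled before their dependents. The structure is assumed to be
    a valid tree (no cycles).
    """
    words = [None] * len(structure)

    def fill(i):
        if words[i] is None:
            pos, head = structure[i]
            head_word = ROOT if head < 0 else fill(head)
            options = counts.get((pos, head_word, direction(i, head)))
            if not options:
                words[i] = UNKNOWN
            else:
                # Sample a word in proportion to its training count.
                choices, weights = zip(*sorted(options.items()))
                words[i] = rng.choices(choices, weights=weights)[0]
        return words[i]

    for i in range(len(structure)):
        fill(i)
    return words


# Toy usage: train on one annotated sentence, then repopulate its structure.
corpus = [[("the", "DT", 1), ("dog", "NN", 2), ("barks", "VB", -1)]]
counts = train(corpus)
print(populate([("DT", 1), ("NN", 2), ("VB", -1)], counts, random.Random(0)))
```

Passing an explicit `random.Random` instance keeps generated sentences reproducible, which helps when they are evaluated by hand.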
Although the algorithm was fast, the results were lacking. The sentences we generated showed some recurring faults, and it seemed that the probability function lacked critical information needed to create accurate sentences. A possible step that could have improved the results was to condition the probability on more variables, such as the type of relation to the head word, the part of speech or word of the grandparent, or the children's parts of speech. However, we didn't get to test it.
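To make the untested extension concrete, here is a hedged sketch of what a wider conditioning context might look like. Since we never implemented this, everything in it is hypothetical: the `Node` type, the `rich_context` function, the relation labels, and the choice of which extra variables go into the key.

```python
from typing import NamedTuple

ROOT = "<ROOT>"  # placeholder used when there is no head or grandparent (assumption)


class Node(NamedTuple):
    pos: str        # part of speech
    head: int       # index of the head, -1 for the root
    relation: str   # type of relation to the head (labels here are illustrative)


def rich_context(nodes, words, i):
    """Build a wider conditioning context for position i.

    `words` holds the words chosen so far (None where not yet chosen).
    """
    node = nodes[i]
    head = node.head
    head_word = ROOT if head < 0 else words[head]
    if head < 0:
        side = "ROOT"
    else:
        side = "L" if head < i else "R"
    # Grandparent: the head of the head, if there is one.
    grand = nodes[head].head if head >= 0 else -1
    grand_pos = nodes[grand].pos if grand >= 0 else ROOT
    # Parts of speech of the children of position i, in sentence order.
    children_pos = tuple(n.pos for n in nodes if n.head == i)
    return (node.pos, head_word, side, node.relation, grand_pos, children_pos)


nodes = [Node("DT", 1, "det"), Node("NN", 2, "nsubj"), Node("VB", -1, "root")]
words = [None, "dog", "barks"]
print(rich_context(nodes, words, 0))
```

A key like this can index the same kind of count table as the simpler context. The trade-off is that every added variable makes each context rarer in the training data, so unseen contexts become more common.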
A few sources that we looked to while working on this:
Behavioral and computational aspects of language and its acquisition (Edelman, Waterfall)
Unsupervised learning of natural languages (Solan, Horn et al.)
From ConText to Grammar: a step towards practical probabilistic context free grammar inference (Sandbank, Edelman, Ruppin)
Inducing syntactic categories by context distribution clustering (Clark)
The Python code and results are available upon request.