In recent years, grammars for a variety of formalisms (CFGs, CCGs, TAGs, LFGs) have been extracted from the Penn Treebank, for statistical parsing and realization. Compared to grammars that have been engineered by hand, these extracted grammars typically have broader coverage, but are lacking in depth of linguistic analysis. A question that has been largely unexplored is the extent to which one can successfully improve such extracted grammars by further grammar engineering.
In this seminar, we will explore methods for corpus-based grammar engineering, through readings and individual or group projects. At the beginning of the quarter, project teams and tasks will be arranged. During the quarter, project teams will present their ongoing work, starting with their task definitions and aims, continuing through intermediate milestones, and finishing with their empirical results, which they will then write up in a final project report and present in a poster session. Each person will also be expected to lead the discussion of one or two papers.
Projects are expected to use one of the Penn Treebank (PTB), the English CCGbank (derived from the PTB), the German Tiger/Negra or Tüba-D/Z corpora, the Redwoods Treebank (for HPSG), or another treebank. Possible topics include: making a CCGbank-extracted grammar more precise; methods for transforming the CCGbank to reflect more precise analyses; improving lexical coverage through lexical rules; evaluating the impact of more precise grammars on parsing or realization; comparing different evaluation measures; extracting a CCG from the Redwoods Treebank; and so forth. Students are also welcome to propose their own projects, especially ones that would be synergistic with their ongoing research. Projects using the OpenCCG library for parsing or realization are particularly encouraged.
Prerequisites: the computational linguistics introductory courses (684.1 and 2) or permission of the instructor.
Week 2: Choose Project
Weeks 3-4: Present Project Plan
Week 5: Present Evaluation Plan
Week 8: Review Design / Code
Week 10: Present Results
Finals Week: Project Report due June 7
We'll use Carmen to schedule presentations and post advance questions on the readings. Carmen will also be used to provide local access to PDFs that are not readily available.
Note that the reading list is a starting point for the papers we will read during the quarter; the exact set of papers covered will depend on student interest. More CCG papers can be found on the CCG site.
