Penn Treebank (PTB) - Syntactically annotated corpus (phrase structure)
- Contains about 1 million words of Wall Street Journal text, with each sentence annotated for syntactic structure.
- PropBank
- PTB extended with predicate-argument annotation, so that some grammatical relations are made explicit
Unification - Mechanism needed to pass and check constraints.
- Constraints, syntactic and semantic:
- Subject-verb agreement
- S → NP VP
- the boy reads / the boys read / * the boys reads
- Subject-auxiliary inversion (yes-no questions)
- Selectional restrictions
- Need a mechanism to encode these constraints
- Refine the non-terminal set to encode these constraints.
- S → 3sgAux 3sgNP VP ; 3sgAux → does | has | …
- S → Non3sgAux Non3sgNP VP ; Non3sgAux → do | have | can
- The NP rules must likewise be split into 3sgNP and Non3sgNP variants.
- The grammar grows with every constraint encoded this way.
- Can we factor these constraints out of the structure of the rules?
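The refined-non-terminal approach above can be sketched as a toy recognizer. This is only an illustration under stated assumptions: the lexicon, the NP and VP expansions, and the names GRAMMAR, LEXICON, derives, matches and recognizes are hypothetical and not taken from the slides; only the two Aux rules and the example sentences come from them.

```python
# Toy CFG in which agreement is encoded by splitting non-terminals.
# Everything except the two Aux-initial S rules is an illustrative assumption.
GRAMMAR = {
    "S": [
        ["3sgNP", "3sgVP"],
        ["Non3sgNP", "Non3sgVP"],
        ["3sgAux", "3sgNP", "VP"],
        ["Non3sgAux", "Non3sgNP", "VP"],
    ],
    "3sgNP": [["Det", "SgNoun"]],
    "Non3sgNP": [["Det", "PlNoun"]],
    "3sgVP": [["3sgVerb"]],
    "Non3sgVP": [["Non3sgVerb"]],
    "VP": [["BaseVerb"]],
}

LEXICON = {
    "Det": {"the"},
    "SgNoun": {"boy"},
    "PlNoun": {"boys"},
    "3sgVerb": {"reads"},
    "Non3sgVerb": {"read"},
    "BaseVerb": {"read"},
    "3sgAux": {"does", "has"},
    "Non3sgAux": {"do", "have", "can"},
}


def derives(symbol, words):
    """True if `symbol` can derive exactly the word sequence `words`."""
    if symbol in LEXICON:
        return len(words) == 1 and words[0] in LEXICON[symbol]
    return any(matches(rhs, words) for rhs in GRAMMAR.get(symbol, []))


def matches(rhs, words):
    """True if the symbols in `rhs` can jointly derive `words`."""
    if not rhs:
        return not words
    head, rest = rhs[0], rhs[1:]
    # Try every split point; each remaining symbol needs at least one word.
    return any(
        derives(head, words[:i]) and matches(rest, words[i:])
        for i in range(1, len(words) - len(rest) + 1)
    )


def recognizes(sentence):
    """True if the toy grammar accepts the whitespace-tokenized sentence."""
    return derives("S", sentence.split())


print(recognizes("the boy reads"))      # True
print(recognizes("the boys read"))      # True
print(recognizes("the boys reads"))     # False
print(recognizes("does the boy read"))  # True
```

Note that a single agreement distinction already duplicates the S, NP, VP and Aux categories here; adding person, case or further features would multiply the rules again, which is the motivation for factoring the constraints out.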
Unification (contd.) - Agreement stated as feature equations instead of split rules:
- NP.number = VP.subj.agr.number
- NP.person = VP.subj.agr.person
- VP.number = V.subj.agr.number
- VP.person = V.subj.agr.person
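The equations above can be checked by unifying feature structures. The following is a minimal sketch, assuming feature structures are plain nested dictionaries without shared (reentrant) values; the function names unify and agrees and the lexical entries are hypothetical, chosen only to mirror the boy/boys example.

```python
def unify(a, b):
    """Unify two feature structures (nested dicts or atomic values).

    Returns the merged structure, or None if the two clash.
    Assumption: no reentrancy, so a simple recursive merge suffices.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not modified
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[key] = merged
            else:
                result[key] = b_val
        return result
    # Atomic values unify only when they are identical.
    return a if a == b else None


# Hypothetical lexical entries: agreement features of two nouns ...
boy = {"number": "sg", "person": 3}
boys = {"number": "pl", "person": 3}

# ... and the agreement each verb form demands of its subject.
# "read" is simplified here to require a plural subject only.
reads = {"subj": {"agr": {"number": "sg", "person": 3}}}
read = {"subj": {"agr": {"number": "pl"}}}


def agrees(np, vp):
    """Check NP.number = VP.subj.agr.number and NP.person = VP.subj.agr.person."""
    return unify(np, vp["subj"]["agr"]) is not None


print(agrees(boy, reads))   # True:  the boy reads
print(agrees(boys, read))   # True:  the boys read
print(agrees(boys, reads))  # False: *the boys reads
```

With this design a single rule S → NP VP covers all cases: the agreement check is carried by the feature structures, so no 3sg/Non3sg copies of the rules are needed.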