Probabilistic Parsing


Transcript of Probabilistic Parsing

Page 1: Probabilistic Parsing

Probabilistic Parsing

CSPP 56553, Artificial Intelligence, February 25, 2004

Page 2: Probabilistic Parsing

Roadmap

• Probabilistic CFGs
  – Handling ambiguity: choosing the more likely analyses
  – Adding probabilities
    • Grammar
    • Issues with probabilities
  – Resolving the issues
    • Lexicalized grammars
    • Independence assumptions

Page 3: Probabilistic Parsing

Representation: Context-free Grammars

• CFGs: a 4-tuple
  – A set of terminal symbols: Σ
  – A set of non-terminal symbols: N
  – A set of productions P of the form A → α, where A is a non-terminal and α ∈ (Σ ∪ N)*
  – A designated start symbol S

• L = {w | w ∈ Σ* and S ⇒* w}, where S ⇒* w means S derives w by some sequence of rule applications

Page 4: Probabilistic Parsing

Representation: Context-free Grammars

• Partial example
  – Σ: the, cat, dog, bit, bites, man
  – N: NP, VP, AdjP, Nominal
  – P: S → NP VP; NP → Det Nom; Nom → N Nom | N

Parse tree for "The dog bit the man" (the tree also uses VP → V NP):

(S (NP (Det The) (Nom (N dog)))
   (VP (V bit) (NP (Det the) (Nom (N man)))))
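As an illustration (not from the slides), this partial grammar can be encoded and parsed with NLTK, assuming NLTK is installed; the Det/N/V lexical rules and VP → V NP below are filled in from the example tree, not listed on the slide:

```python
# A minimal sketch using NLTK's CFG and chart parser.
import nltk

grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Nom
    VP -> V NP
    Nom -> N Nom | N
    Det -> 'The' | 'the'
    N -> 'cat' | 'dog' | 'man'
    V -> 'bit' | 'bites'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("The dog bit the man".split()):
    tree.pretty_print()
```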

Page 5: Probabilistic Parsing

Parsing Goals

• Accepting:
  – Is the string legal in the language?
    • Formally: rigid yes/no
    • Practically: degrees of acceptability

• Analysis:
  – What structure produced the string?
    • Produce one (or all) parse trees for the string

Page 6: Probabilistic Parsing

Parsing Search Strategies

• Top-down constraints:
  – All analyses must start with the start symbol S
  – Successively expand non-terminals with their RHSs
  – Must match the surface string

• Bottom-up constraints:
  – Analyses start from the surface string
  – Identify POS
  – Match a substring of the ply with an RHS, replacing it by the LHS
  – Must ultimately reach S

Page 7: Probabilistic Parsing

Integrating Strategies

• Left-corner parsing:
  – Top-down parsing with bottom-up constraints
  – Begin at the start symbol
  – Apply a depth-first search strategy
    • Expand the leftmost non-terminal
    • The parser cannot consider a rule unless the current input word can be the first word on the left edge of some derivation from it
    • Tabulate all left-corners for each non-terminal

Page 8: Probabilistic Parsing

Issues

• Left recursion
  – If the first non-terminal of an RHS is recursive →
    • infinite path to a terminal node
    • could rewrite the grammar to remove it

• Ambiguity: pervasive (costly)
  – Lexical (POS) & structural
    • Attachment, coordination, NP bracketing

• Repeated subtree parsing
  – The same subtrees are re-parsed when other parts of an analysis fail

Page 9: Probabilistic Parsing

Earley Parsing

• Avoid repeated work and the recursion problem
  – Dynamic programming
    • Store partial parses in a "chart"
      – Compactly encodes ambiguity
    • O(N³)

• Chart entries:
  – Subtree for a single grammar rule
  – Progress in completing the subtree
  – Position of the subtree w.r.t. the input

Page 10: Probabilistic Parsing

Earley Algorithm

• Uses dynamic programming to do parallel top-down search in (worst-case) O(N³) time

• First, a left-to-right pass fills out a chart with N+1 state sets
  – Think of chart entries as sitting between words in the input string, keeping track of the states of the parse at these positions
  – For each word position, the chart contains the set of states representing all partial parse trees generated to date. E.g., chart[0] contains all partial parse trees generated at the beginning of the sentence
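To make the chart concrete, here is a minimal Earley recognizer sketch over a hypothetical toy grammar (the lecture gives no code): each state is a dotted rule plus the input position where its subtree began, and the predictor/scanner/completer operations fill chart[0..N].

```python
# Illustrative Earley recognizer (no epsilon rules handled).
# A state is (lhs, rhs, dot, origin): a dotted rule plus the
# position where its subtree started.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
    "Det": [("the",)],
    "N": [("dog",), ("man",)],
    "V": [("bit",)],
}

def earley(words, start="S"):
    chart = [[] for _ in range(len(words) + 1)]   # N+1 state sets

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))

    for i in range(len(words) + 1):
        for lhs, rhs, dot, origin in chart[i]:    # list grows as we iterate
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                # predictor: expand non-terminal
                    for r in GRAMMAR[nxt]:
                        add(i, (nxt, r, 0, i))
                elif i < len(words) and words[i] == nxt:   # scanner: match terminal
                    add(i + 1, (lhs, rhs, dot + 1, origin))
            else:                                 # completer: finished constituent
                for l2, r2, d2, o2 in chart[origin]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(words)])

print(earley("the dog bit the man".split()))      # True
```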

Page 11: Probabilistic Parsing

Representation: Probabilistic Context-free Grammars

• PCFGs: a 5-tuple
  – A set of terminal symbols: Σ
  – A set of non-terminal symbols: N
  – A set of productions P of the form A → α, where A is a non-terminal and α ∈ (Σ ∪ N)*
  – A designated start symbol S
  – A function D assigning probabilities to rules

• L = {w | w ∈ Σ* and S ⇒* w}, where S ⇒* w means S derives w by some sequence of rule applications
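As a sketch (assuming NLTK; the toy rule probabilities below are made up for illustration), D can be written directly into the rules, and nltk.ViterbiParser returns the most probable tree:

```python
import nltk

# Toy PCFG: the bracketed numbers are D, the rule-probability function.
# Probabilities for each LHS must sum to 1.
pcfg = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.7] | N [0.3]
    VP -> V NP [1.0]
    Det -> 'the' [1.0]
    N -> 'dog' [0.5] | 'man' [0.5]
    V -> 'bit' [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
for tree in parser.parse("the dog bit the man".split()):
    print(tree.prob(), tree)
```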

Page 12: Probabilistic Parsing

Parse Ambiguity

• Two parse trees for "I saw the man with the duck" (PP attachment ambiguity):

T1, PP attached to the VP:

(S (NP (N I))
   (VP (V saw)
       (NP (Det the) (N man))
       (PP (P with) (NP (Det the) (N duck)))))

T2, PP attached to the NP:

(S (NP (N I))
   (VP (V saw)
       (NP (NP (Det the) (N man))
           (PP (P with) (NP (Det the) (N duck))))))

Page 13: Probabilistic Parsing

Small Probabilistic Grammar

NP → N          [0.20]
NP → Det N      [0.65]
NP → Det Adj N  [0.10]
NP → NP PP      [0.05]

VP → V          [0.45]
VP → V NP       [0.45]
VP → V NP PP    [0.10]

S → NP VP       [0.85]
S → S conj S    [0.15]

PP → P NP       [1.00]

Page 14: Probabilistic Parsing

Parse Probabilities

• Notation: T(ree), S(entence), n(ode), r(ule)

• P(T, S) = ∏_{n ∈ T} p(r(n))
  – T1 = 0.85 × 0.2 × 0.1 × 0.65 × 1 × 0.65 ≈ 0.0072
  – T2 = 0.85 × 0.2 × 0.45 × 0.05 × 0.65 × 1 × 0.65 ≈ 0.0016

• Select T1, the more probable parse
• Best systems achieve 92–93% accuracy
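A sketch of this computation in Python, using the Page 13 grammar (the tree encoding is my own; NP → N is taken as 0.20 so the NP rules sum to one):

```python
# Sketch: P(T, S) as a product of rule probabilities over the tree's nodes.
RULE_P = {
    ("S", ("NP", "VP")): 0.85,
    ("NP", ("N",)): 0.20,
    ("NP", ("Det", "N")): 0.65,
    ("NP", ("NP", "PP")): 0.05,
    ("VP", ("V", "NP")): 0.45,
    ("VP", ("V", "NP", "PP")): 0.10,
    ("PP", ("P", "NP")): 1.0,
}

def tree_prob(tree):
    """P(T, S) = product over nodes n of p(r(n)); lexical rules taken as 1."""
    label, children = tree
    if isinstance(children, str):       # preterminal over a word
        return 1.0
    p = RULE_P[(label, tuple(c[0] for c in children))]
    for child in children:
        p *= tree_prob(child)
    return p

# T1 from Page 12: the PP attaches to the VP
t1 = ("S", [("NP", [("N", "I")]),
            ("VP", [("V", "saw"),
                    ("NP", [("Det", "the"), ("N", "man")]),
                    ("PP", [("P", "with"),
                            ("NP", [("Det", "the"), ("N", "duck")])])])])
print(tree_prob(t1))   # 0.85 * 0.2 * 0.1 * 0.65 * 1 * 0.65 ≈ 0.0072
```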

Page 15: Probabilistic Parsing

Issues with PCFGs

• Non-local dependencies
  – Rules are context-free; language isn't
  – Example: subject vs. non-subject NPs
    • Subjects are ~90% pronouns (Switchboard)
    • NP → Pron vs. NP → Det Nom: the rule doesn't know whether the NP is a subject

• Lexical context:
  – Verb subcategorization: "send NP PP" vs. "saw NP PP"
  – One approach: lexicalization

Page 16: Probabilistic Parsing

Probabilistic Lexicalized CFGs

• Key notion: "head"
  – Each non-terminal is associated with a lexical head
    • E.g., a verb heads a verb phrase, a noun heads a noun phrase
  – Each rule must identify one RHS element as the head
    • Heads propagate up the tree
  – Conceptually like adding one rule per head value
    • VP(dumped) → VBD(dumped) NP(sacks) PP(into)
    • VP(dumped) → VBD(dumped) NP(cats) PP(into)
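A sketch of head propagation (the head rules and tree encoding here are hypothetical, just to make the idea concrete):

```python
# Illustrative sketch: a lexical head percolates from the head child
# to its parent, so VP(dumped) ultimately yields S(dumped).
HEAD_CHILD = {                 # LHS category -> candidate head-child categories
    "S": ["VP"],
    "VP": ["VBD"],
    "NP": ["NNS", "NN"],
    "PP": ["P"],
}

def annotate_heads(tree):
    """Return (label, head_word, children); leaves are (tag, word)."""
    label, children = tree
    if isinstance(children, str):      # preterminal: the word itself is the head
        return (label, children, children)
    kids = [annotate_heads(c) for c in children]
    head = next(k[1] for k in kids if k[0] in HEAD_CHILD[label])
    return (label, head, kids)

t = ("S", [("NP", [("NNS", "workers")]),
           ("VP", [("VBD", "dumped"),
                   ("NP", [("NNS", "sacks")]),
                   ("PP", [("P", "into"),
                           ("NP", [("DT", "a"), ("NN", "bin")])])])])
print(annotate_heads(t)[1])            # 'dumped' propagates up to S
```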

Page 17: Probabilistic Parsing

PLCFG with head features

Parse tree for "Workers dumped sacks into a bin", with heads in parentheses:

(S(dumped) (NP(workers) NNS(workers))
           (VP(dumped) VBD(dumped)
                       (NP(sacks) NNS(sacks))
                       (PP(into) P(into)
                                 (NP(bin) DT(a) NN(bin)))))

Page 18: Probabilistic Parsing

PLCFGs

• Issue: too many rules
  – No way to find a corpus with enough examples of each

• (Partial) solution: independence assumptions
  – Condition the rule on the category of the LHS and its head
  – Condition the head on the category of the LHS and the parent's head

P(T, S) = ∏_{n ∈ T} p(r(n) | n, h(n)) × p(h(n) | n, h(m(n)))

where h(n) is the head of node n and m(n) is n's parent.

Page 19: Probabilistic Parsing

Disambiguation Example

Two attachments for "Workers dumped sacks into a bin":

VP attachment:

(S(dumped) (NP(workers) NNS(workers))
           (VP(dumped) VBD(dumped)
                       (NP(sacks) NNS(sacks))
                       (PP(into) P(into) (NP(bin) DT(a) NN(bin)))))

NP attachment:

(S(dumped) (NP(workers) NNS(workers))
           (VP(dumped) VBD(dumped)
                       (NP(sacks) (NP(sacks) NNS(sacks))
                                  (PP(into) P(into) (NP(bin) DT(a) NN(bin))))))

Page 20: Probabilistic Parsing

Disambiguation Example II

p(VP → VBD NP PP | VP, dumped)
  = C(VP(dumped) → VBD NP PP) / C(VP(dumped))
  = 6/9 ≈ 0.67

p(VP → VBD NP | VP, dumped)
  = C(VP(dumped) → VBD NP) / C(VP(dumped))
  = 0/9 = 0

p(into | PP, dumped)
  = C(X(dumped) → … PP(into) …) / C(X(dumped) → … PP …)
  = 2/9 ≈ 0.22

p(into | PP, sacks)
  = C(X(sacks) → … PP(into) …) / C(X(sacks) → … PP …)
  = 0/0 (undefined)

The counts favor the VP attachment: VP(dumped) → VBD NP PP is well attested, while VP(dumped) → VBD NP and a PP headed by sacks have no support.
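These are maximum-likelihood estimates from treebank counts. A sketch (hypothetical helper names; the counts mirror the 6/9 and 0/9 cases above):

```python
from collections import Counter

# Count lexicalized rule expansions and their LHS occurrences,
# then estimate p(rhs | category, head) by relative frequency.
rule_counts = Counter()
lhs_counts = Counter()

def observe(category, head, rhs):
    rule_counts[(category, head, tuple(rhs))] += 1
    lhs_counts[(category, head)] += 1

# Mirror the counts above: 6 of 9 VP(dumped) expansions are VBD NP PP.
for _ in range(6):
    observe("VP", "dumped", ["VBD", "NP", "PP"])
for _ in range(3):
    observe("VP", "dumped", ["VBD", "PP"])

def p_rule(rhs, category, head):
    """p(rhs | category, head) = C(category(head) -> rhs) / C(category(head))."""
    total = lhs_counts[(category, head)]
    if total == 0:
        return None                     # the 0/0 case: no data for this head
    return rule_counts[(category, head, tuple(rhs))] / total

print(p_rule(["VBD", "NP", "PP"], "VP", "dumped"))   # 6/9 ≈ 0.67
print(p_rule(["VBD", "NP"], "VP", "dumped"))         # 0/9 = 0.0
print(p_rule(["VBD", "NP"], "VP", "sacks"))          # None (0/0, undefined)
```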

Page 21: Probabilistic Parsing

Evaluation

• Compare to a "Gold Standard" manual parse
  – A constituent is correct if it has the same start, end, and label

• Three measures:
  – Labeled Recall:
    # correct constituents in candidate parse of s
    ------------------------------------------------
    # total constituents in treebank parse of s
  – Labeled Precision:
    # correct constituents in candidate parse of s
    ------------------------------------------------
    # total constituents in candidate parse of s
  – Cross-brackets: (A (B C)) vs. ((A B) C)

• Standard performance: ~90% recall, ~90% precision, ~1% cross-brackets
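A sketch of computing labeled precision and recall over (start, end, label) spans (my own tree encoding; POS preterminals are not counted as constituents, as in standard PARSEVAL scoring):

```python
def spans(tree, start=0):
    """Collect (start, end, label) for each constituent; leaves are (tag, word)."""
    label, children = tree
    if isinstance(children, str):       # preterminal over one word: no span
        return start + 1, []
    end, out = start, []
    for child in children:
        end, sub = spans(child, end)
        out.extend(sub)
    out.append((start, end, label))
    return end, out

def labeled_pr(candidate, gold):
    _, cand = spans(candidate)
    _, gold_sp = spans(gold)
    correct = len(set(cand) & set(gold_sp))
    precision = correct / len(cand)     # correct / constituents in candidate
    recall = correct / len(gold_sp)     # correct / constituents in treebank parse
    return precision, recall

t = ("S", [("NP", [("N", "I")]),
           ("VP", [("V", "saw"), ("NP", [("Det", "the"), ("N", "man")])])])
print(labeled_pr(t, t))                 # (1.0, 1.0)
```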

Page 22: Probabilistic Parsing

Dependency Grammars

• Pure lexical dependency
  – Relate words based on binary syntactic relations
  – No constituency or phrase-structure rules

• Why?
  – Better for languages with flexible word order
  – Abstracts away from surface form/order

• Root node + word nodes
  – Linked by dependency relations drawn from a fixed set
    • E.g., subj, obj, dat, loc, attr, mode, comp, …

Page 23: Probabilistic Parsing

Dependency Grammar Example

Dependency structure for "Workers dumped sacks into a bin":

<Root>  –main→   dumped
dumped  –subj→   workers
dumped  –obj→    sacks
dumped  –loc→    into
into    –pcomp→  bin
bin     –det→    a
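A minimal way to represent this graph in code (illustrative encoding; the relations are the ones from the slide):

```python
# Each word points to its head with a relation; <Root> heads the main verb.
deps = [
    # (dependent, relation, head)
    ("dumped",  "main",  "<Root>"),
    ("workers", "subj",  "dumped"),
    ("sacks",   "obj",   "dumped"),
    ("into",    "loc",   "dumped"),
    ("bin",     "pcomp", "into"),
    ("a",       "det",   "bin"),
]

def dependents(head):
    return [(dep, rel) for dep, rel, h in deps if h == head]

print(dependents("dumped"))   # [('workers', 'subj'), ('sacks', 'obj'), ('into', 'loc')]
```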