Download - uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Transcript
Page 1: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Bitext parsingWilker Aziz

3/5/17

Page 2: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Context-Free GrammarsA CFG grammar G is denoted by

• a finite set of nonterminal symbols N

• a finite set of terminal symbols Σ with Σ ∩ N = ∅

• a finite set R of rules of the form X ➝ α where

• X ∈ N and α ∈ (Σ ∪ N)*

• S ∈ N a distinguished start symbol

Let ε denote an empty string

2

Page 3: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example CFGS ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

3

Page 4: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Generative DeviceLeft-most derivation

• sequence of strings s1 ... sn

• s1 = S

• sn ∈ Σ*

• si≥2 derived from si-1 by picking the left-most nonterminal X

• replacing it by some α such that X ➝ α ∈ R

4

Page 5: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Derivation

5

Substitutions1 = S S ➝ NP VPs2 = NP VP NP ➝ DT NNs3 = DT NN VP DT ➝ thes4 = the NN VP NN ➝ mans5 = the man VP VP ➝ Vis6 = the man Vi Vi ➝ sleepss7 = the man sleepss7 = S ⇒* the man sleeps

Page 6: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Generation

6

Page 7: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

Example of Generation

6

Page 8: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

Example of Generation

6

Page 9: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

DT NN

Example of Generation

6

Page 10: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

DT NN

the

Example of Generation

6

Page 11: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

DT NN

the man

Example of Generation

6

Page 12: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

DT NN

the man

Vi

Example of Generation

6

Page 13: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

S

NP VP

DT NN

the man

Vi

sleeps

Example of Generation

6

Page 14: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

7

Page 15: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

7

Page 16: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT

7

Page 17: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

7

Page 18: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

7

Page 19: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT

7

Page 20: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT NN

7

Page 21: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT NN

NP

7

Page 22: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT NN

NP

NP

7

Page 23: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT NN

NP

NP

VP

7

Page 24: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Example of Recognition

The man saw the dog

DT NN

Vt

DT NN

NP

NP

VP

S

7

Page 25: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Language

A string s = s1 ... sn is generated/accepted by G if

S ⇒* s

⇒* denotes a sequence of rule applications

Language of G L(G) = {s: S ⇒* s} ⊆ Σ*

8

Page 26: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Chomsky Normal Form

Every CFG is weakly equivalent to another such that

• X ➝ Y Z where X, Y, Z ∈ N

• X ➝ w where w ∈ Σ

• and possibly S ➝ ε

9

[Hopcroft and Ullman, 1979]

Page 27: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as Deduction

10

Page 28: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

10

Page 29: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

• focus on strategy rather than implementation

10

Page 30: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

• focus on strategy rather than implementation

• soundness/completeness easier to prove

10

Page 31: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

• focus on strategy rather than implementation

• soundness/completeness easier to prove

• complexity determined by inspection

10

Page 32: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

• focus on strategy rather than implementation

• soundness/completeness easier to prove

• complexity determined by inspection

• dynamic program follows directly

10

Page 33: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parsing as DeductionDeductive process to prove claims about grammaticality

[Shieber et al., 1995]

• focus on strategy rather than implementation

• soundness/completeness easier to prove

• complexity determined by inspection

• dynamic program follows directly

• generality

10

Page 34: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Deductive systemsItem: a statement / intermediate sound result

• formula or schemata expressed with variables

Inference rule: statement derived from existing items

• where Ai and B are items

• Ai are called antecedents

• B is called consequent

A1 . . . Am

B(condition)

11

Page 35: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Deductive programAxioms: trivial items

• do not depend on previous statements

Goal: states that a proof exists

Proof:

• start from axioms

• exhaustively deduce items

• never twice under the same premises

• accept if goal is proven

12

Page 36: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce ExampleInput: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 37: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement Queue

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 38: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 39: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 40: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 41: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 42: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 43: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 44: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 45: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 46: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 47: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 48: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 49: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 50: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 51: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 52: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 53: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 54: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8Reduce: [8] VP ➝ Vi 9 [NP VP •, 3] 9

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 55: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8Reduce: [8] VP ➝ Vi 9 [NP VP •, 3] 9

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 56: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8Reduce: [8] VP ➝ Vi 9 [NP VP •, 3] 9Reduce: [9] S ➝ NP VP 10 [S •, 3] 10

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 57: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-Reduce Example

Rule Condition Statement QueueAxiom 1 [•,0] 1Shift: [1] 2 [the•,1] 2Reduce: [2] DT ➝ the 3 [DT•,1] 3Shift: [3] 4 [DT man •, 2] 4Reduce: [4] NN ➝ man 5 [DT NN •, 2] 5Reduce: [5] NP ➝ DT NN 6 [NP •, 2] 6Shift: [6] 7 [NP sleeps •, 3] 7Reduce: [7] Vi ➝ sleeps 8 [NP Vi •, 3] 8Reduce: [8] VP ➝ Vi 9 [NP VP •, 3] 9Reduce: [9] S ➝ NP VP 10 [S •, 3] 10GOAL: [10] ∅

Input: the man sleeps

13

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Page 58: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Shift-ReduceInput: G and w1 ... wn Item form: [α•, j]

asserts that α ⇒* w1 ... wj or that α wj+1 ... wn ⇒* w1 ... wj

Axiom: [•,0]

Goal: [S•,n]

Scan (shift)asserts that α wj+1 ⇒* w1 ... wj wj+1

Complete (reduce)asserts that αB ⇒* w1 ... wj

Shift[↵•, j]

[↵wj+1, j + 1]

Reduce[↵�•, j][↵B•, j] B ! � 2 R

14

Page 59: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down recognitionInput: G and w1 ... wn Item form: [•β, j]

asserts that S ⇒* w1 ... wj β

Axiom: [•S,0]

Goal: [•,n]

Scanasserts that S ⇒* w1 ... wj wj+1 β

Predictasserts that S ⇒* w1 ... wj B β

Predict•B �, j

[•� �, j]B ! � 2 R

Scan[•wj+1�, j]

[•�, j + 1]

15

Page 60: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 61: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement Queue

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 62: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 63: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 64: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 65: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 66: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 67: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 68: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 69: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 70: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 71: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 72: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 73: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 74: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 75: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 76: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 77: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 78: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 79: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 80: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 81: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 82: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10 [9] 10

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 83: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10 [9] 10

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 84: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10 [9] 10Scan: [10] 11 [•, 3] 11

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 85: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Top-Down Example

Rule Condition Statement QueueAxiom 1 [• S, 0] 1Predict: [1] S ➝ NP VP 2 [• NP VP, 0] 2Predict: [2] NP ➝ DT NN 3 [• DT NN VP, 0] 3Predict: [3] DT ➝ the 4 [• the NN VP,0] 4Scan: [4] 5 [• NN VP,1] 5Predict: [5] NN ➝ man 6 [• man VP, 1] 6Scan: [6] 7 [• VP, 2] 7Predict: [7] VP ➝ Vi 8 [• Vi, 2] 8, 9

VP ➝ Vt NP 9 [• Vt NP, 2]Predict: [8] Vi ➝ sleeps 10 [• sleeps, 2] 9, 10 [9] 10Scan: [10] 11 [•, 3] 11GOAL: [11] ∅

16

S ➝ NP VPVP ➝ ViVP ➝ Vt NPVP ➝ VP PPNP ➝ DT NNNP ➝ NP PPPP ➝ IN NPVi ➝ sleepsVt ➝ sawNN ➝ manNN ➝ dogNN ➝ telescopeDT ➝ theIN ➝ with

Input: the man sleeps

Page 86: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY - CNF onlyInput: G and s = w1 ... wn Item form: [i, X, j]

asserts that X ⇒* wi+1 ... wj

Axioms: [i, X, i+1] X ➝ wi ∈ R

Goal: [0, S, n]

Merge: asserts that wi+1 ... wk wk+1 ... wj ⇒* wi+1 ... wj

[i, A, k] [k,B, j]

[i, C, j]C ! AB 2 R

17

Page 87: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY ExampleInput: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 88: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue Passive

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 89: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 90: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 91: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 92: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 93: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 94: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 95: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 96: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 3

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 97: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 98: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Merge: [4][5] NP ➝ DT NN 7 [3, NP, 5] 6, 7 5

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 99: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Merge: [4][5] NP ➝ DT NN 7 [3, NP, 5] 6, 7 57 6

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 100: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Merge: [4][5] NP ➝ DT NN 7 [3, NP, 5] 6, 7 57 6

Merge: [3] [7] VP ➝ Vt NP 8 [2, VP, 5] 8 7

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 101: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Merge: [4][5] NP ➝ DT NN 7 [3, NP, 5] 6, 7 57 6

Merge: [3] [7] VP ➝ Vt NP 8 [2, VP, 5] 8 7Merge: [6] [8] S ➝ NP VP 9 [0, S, 5] 9 8

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 102: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY Example

Rule Condition Statement Queue PassiveAxiom DT ➝ the 1 [0, DT, 1] 1

NN ➝ man 2 [1, NN, 2] 1, 2Vt ➝ saw 3 [2, Vt, 3] 1, 2, 3DT ➝ the 4 [3, DT, 4] 1, 2, 3, 4NN ➝ dog 5 [4, NN, 5] 1, 2, 3, 4, 5

2, 3, 4, 5 1Merge: [1][2] NP ➝ DT NN 6 [0, NP, 2] 3, 4, 5, 6 2

4, 5, 6 35, 6 4

Merge: [4][5] NP ➝ DT NN 7 [3, NP, 5] 6, 7 57 6

Merge: [3] [7] VP ➝ Vt NP 8 [2, VP, 5] 8 7Merge: [6] [8] S ➝ NP VP 9 [0, S, 5] 9 8GOAL: [9] ∅ 9

Input: the man saw the dog

18

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 103: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 104: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

the man sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 105: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0DT1

the man sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 106: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0DT1 1NN2

the man sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 107: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0DT1 1NN2

the man

2Vi3

sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 108: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0NP2

0DT1 1NN2

the man

2Vi3

sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 109: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0NP2 2VP3

0DT1 1NN2

the man

2Vi3

sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 110: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Rule Segmentation: "Split Points"

0S3

0NP2 2VP3

0DT1 1NN2

the man

2Vi3

sleeps0 1 2 3

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

19

Page 111: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"

20

Page 112: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

20

Page 113: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

20

Page 114: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

20

Page 115: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

• In general, we segment rules with respect to the input w1 ... wn

20

Page 116: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

• In general, we segment rules with respect to the input w1 ... wn

• The dot represents progress through the rule's right-hand side (RHS)

20

Page 117: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

• In general, we segment rules with respect to the input w1 ... wn

• The dot represents progress through the rule's right-hand side (RHS)

• The prefix α has already been parsed and we are waiting for β

20

Page 118: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

• In general, we segment rules with respect to the input w1 ... wn

• The dot represents progress through the rule's right-hand side (RHS)

• The prefix α has already been parsed and we are waiting for β

• The filled box represents a segmentation of [0 .. j] into |α| adjacent parts

20

Page 119: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

"Dotted items"Parsing a CNF grammar is easy because we know the shape of rules

When that's not the case, we have to scan rules symbol by symbol using a general mechanism:

Item form: [i, X ➝ α •β , j] where X ➝ α β ∈ R is a rule

• In general, we segment rules with respect to the input w1 ... wn

• The dot represents progress through the rule's right-hand side (RHS)

• The prefix α has already been parsed and we are waiting for β

• The filled box represents a segmentation of [0 .. j] into |α| adjacent parts

• The empty box has no actual role, it's just a reminder that the segmentation beyond j is unknown

20

Page 120: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+Input: G and s = w1 ... wn Item form: [i, X ➝ α •β , j]

asserts that X ⇒* wi+1 ... wj β

Axioms: [i, X ➝ wi • α , i+1] X ➝ wi α ∈ R [i, X ➝ ε •, i] X ➝ ε ∈ R

Goal: [0, S ➝ α •, n]

Scan Prefix

Complete

21

[i,X ! ↵⌅ • wj+1 �⇤, j][i,X ! ↵⌅wj+1 • �⇤, j + 1]

[i, Y ! ↵⌅•, j][i,X ! Yi,j • �⇤, j]

X ! Y � 2 R

[i,X ! ↵⌅ • Y �⇤, k] [k, Y ! �⌅•, j][i,X ! ↵⌅ Yk,j • �⇤, j]

Page 121: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ ExampleInput: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 122: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active Passive

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 123: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 124: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 125: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 126: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 1

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 127: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 128: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 129: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3Complete: [4] [2] 6 [0, NP ➝ DT0,1 NN1,2 •, 2] 5, 6 4

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 130: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3Complete: [4] [2] 6 [0, NP ➝ DT0,1 NN1,2 •, 2] 5, 6 4

6 5

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 131: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3Complete: [4] [2] 6 [0, NP ➝ DT0,1 NN1,2 •, 2] 5, 6 4

6 5Prefix: [6] S ➝ NP VP 7 [0, S ➝ NP0,2 • VP, 2] 7 6

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 132: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3Complete: [4] [2] 6 [0, NP ➝ DT0,1 NN1,2 •, 2] 5, 6 4

6 5Prefix: [6] S ➝ NP VP 7 [0, S ➝ NP0,2 • VP, 2] 7 6Complete: [7] [5] 8 [0, S ➝ NP0,2 VP2,3 •, 3] 8 7

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 133: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

CKY+ Example

Rule Condition Item Active PassiveAxiom DT ➝ the 1 [0, DT ➝ the •,1] 1

NN ➝ man 2 [1, NN ➝ man •, 2] 1, 2Vi ➝ sleeps 3 [2, Vi ➝ sleeps •, 3] 1, 2, 3

Prefix: [1] NP ➝ DT NN 4 [0, NP ➝ DT0,1 • NN, 2] 2, 3, 4 13, 4 2

Prefix: [3] VP ➝ Vi 5 [2, VP ➝ Vi2,3 •, 3] 4, 5 3Complete: [4] [2] 6 [0, NP ➝ DT0,1 NN1,2 •, 2] 5, 6 4

6 5Prefix: [6] S ➝ NP VP 7 [0, S ➝ NP0,2 • VP, 2] 7 6Complete: [7] [5] 8 [0, S ➝ NP0,2 VP2,3 •, 3] 8 7GOAL: [8] ∅

Input: the man sleeps

22

S ➝ NP VP Vi ➝ sleepsVP ➝ Vi Vt ➝ sawVP ➝ Vt NP NN ➝ manVP ➝ VP PP NN ➝ dogNP ➝ DT NN NN ➝ telescopeNP ➝ NP PP DT ➝ thePP ➝ IN NP IN ➝ with

Page 134: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Correctness of Parsing Strategy

Soundness: if a goal item is proven for s

• then s ∈ L(G)

Completeness: if s ∈ L(G)

• then a goal item can be proven for s

23

Page 135: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse Forest

Efficient representation of the whole space TG(s)

• each and every possible tree yielding s

We must be able to represent partial derivations

• including alternative ones

24

Page 136: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

AmbiguitySome strings may have more than one derivation in G

25

Page 137: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Dealing with Ambiguity

Statistical model: weight steps in a derivation

• induces a partial ordering over derivations

• can be used to make a decision

• e.g. best tree under the model

26

Page 138: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Probabilistic CFGCFG extended with parameters 0 ≤ θr ≤ 1

• where r ∈ R and

X

↵:X!↵2R

✓X!↵ = 1

27

Page 139: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

and strings

Probabilistic CFGDistribution over trees

28

P (S = s) =X

t2TG(s)

P (T = t, S = s)

P (T = t, S = yield(t)) = P (T = hr1 . . . rni, S = s)

=nY

i=1

✓ri =nY

i=1

✓Xi!↵i =nY

r2t

✓rn(r,t)

Page 140: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

EstimationLet us assume the parametric form of θ is a multinomial

• one categorical distribution per X ∈ N

Suppose we can observe a treebank, then by MLE

✓X!↵ =n(X ! ↵)

n(X)

=n(X ! ↵)P↵0 n(X ! ↵0)

29

Page 141: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Weighted CKY+Input: G and s = w1 ... wn Item form: [i, X ➝ α •β , j]

asserts that X ⇒* wi+1 ... wj β

Axioms: [i, X ➝ wi • α , i+1]:θr r = X ➝ wi α ∈ R [i, X ➝ ε •, i]:θr r = X ➝ ε ∈ R

Goal: [0, S ➝ α •, n]

Scan Prefix

Complete

30

[i,X ! ↵⌅ • wj+1 �⇤, j] : ✓1[i,X ! ↵⌅wj+1 • �⇤, j + 1] : ✓1

[i, Y ! ↵⌅•, j] : ✓1[i,X ! Yi,j • �⇤, j] : ✓r

r = X ! Y � 2 R

[i,X ! ↵⌅ • Y �⇤, k] : ✓1 [k, Y ! �⌅•, j] : ✓2[i,X ! ↵⌅ Yk,j • �⇤, j] : ✓1

Page 142: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 143: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

the man sleeps

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 144: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0DT1

the man sleeps

θDT➝the

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 145: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0DT1 1NN2

the man sleeps

θDT➝the θNN➝man

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 146: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0DT1 1NN2 2Vi3

the man sleeps

θDT➝the θNN➝man θVi➝sleeps

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 147: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0NP2

0DT1 1NN2 2Vi3

the man sleeps

θNP➝DT NN

θDT➝the θNN➝man θVi➝sleeps

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 148: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution

0NP2 2VP3

0DT1 1NN2 2Vi3

the man sleeps

θNP➝DT NN θVP➝Vi

θDT➝the θNN➝man θVi➝sleeps

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 149: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Joint Distribution0S3

0NP2 2VP3

0DT1 1NN2 2Vi3

θS➝NP VP

the man sleeps

θNP➝DT NN θVP➝Vi

θDT➝the θNN➝man θVi➝sleeps

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

31

Page 150: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

32

Page 151: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

the man

saw

the dog

the telescopewith32

Page 152: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

the man

saw

0DT1

θDT➝the

the dog

the telescopewith32

Page 153: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

the dog

the telescopewith32

Page 154: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

the telescopewith32

Page 155: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

the telescopewith32

Page 156: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

4NN5

θNN➝dog

the telescopewith32

Page 157: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

4NN5

θNN➝dog

the telescope

6IN7

withθIN➝with

32

Page 158: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

4NN5

θNN➝dog

7DT8

the telescope

6IN7

withθIN➝with

32

Page 159: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

4NN5

θNN➝dog

7DT8 8NN9

the telescopeθNN➝telescope

6IN7

withθIN➝with

32

Page 160: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

the dog

3DT4

θDT➝the

4NN5

θNN➝dog

7DT8 8NN9

the telescopeθNN➝telescope

6IN7

withθIN➝with

32

Page 161: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7DT8 8NN9

the telescopeθNN➝telescope

6IN7

withθIN➝with

32

Page 162: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

32

Page 163: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θDT➝the

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

32

Page 164: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Ambiguity

θVP➝Vt NP

θDT➝the

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

2VP5

32

Page 165: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

AmbiguityθVP➝VP PP

θVP➝Vt NP

θDT➝the

2VP9

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

2VP5

32

Page 166: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

AmbiguityθVP➝VP PP

θVP➝Vt NP θNP➝NP PP

θDT➝the

2VP9

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

5NP92VP5

32

Page 167: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

AmbiguityθVP➝VP PP

θVP➝Vt NP θNP➝NP PP

θVP➝Vt NP

θDT➝the

2VP9

2Vt3the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

5NP92VP5

32

Page 168: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

AmbiguityθVP➝VP PP

θVP➝Vt NP θNP➝NP PP

θVP➝Vt NP

θDT➝the

0S9

2VP9

2Vt3

θS➝NP VP

the man

saw

0NP2

θNP➝DT NN

0DT1

θDT➝the

1NN2

θNN➝man

θVt➝saw

3NP5

the dog

θNP➝DT NN

3DT4

θDT➝the

4NN5

θNN➝dog

7NP9

7DT8 8NN9

the telescope

θNP➝DT NN

θNN➝telescope

6IN7

withθIN➝with

6PP9

θPP➝IN NP

5NP92VP5

32

Page 169: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

33

Page 170: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

Item form: [i, X ➝ α •β , j]

33

Page 171: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

Item form: [i, X ➝ α •β , j]

• Each rule segments the input w1 .. wn

33

Page 172: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

Item form: [i, X ➝ α •β , j]

• Each rule segments the input w1 .. wn

Every CFG can be written in CNF (max arity = 2)

33

Page 173: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

Item form: [i, X ➝ α •β , j]

• Each rule segments the input w1 .. wn

Every CFG can be written in CNF (max arity = 2)

• In total we get up to 3 indices ranging from 0 .. n

33

Page 174: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity

Item form: [i, X ➝ α •β , j]

• Each rule segments the input w1 .. wn

Every CFG can be written in CNF (max arity = 2)

• In total we get up to 3 indices ranging from 0 .. n

• O(n3) annotated rules

33

Page 175: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Bitext ParsingImagine we have two streams of text

the man sleeps ⇔ dort l' homme

We want to parse both strings simultaneously such that their trees are isomorphic

• same structure up to

• relabelling and permutation of siblings

34

Page 176: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Isomorphic treesS

C E

A B D

e1 e2 e3

S

C'E'

B' A'D'

f2 f1f3

S'

C' E'

A' B' D'

f1 f2 f3

35

Page 177: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

36

Page 178: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

English French

36

Page 179: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

English French

X ➝ A A copy

36

Page 180: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

English French

X ➝ A A copy

X ➝ B C B C copy

36

Page 181: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

English French

X ➝ A A copy

X ➝ B C B C copy

C B invert

36

Page 182: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Synchronous Grammar

A CFG paired with another

• RHS symbols map one-to-one

English French

X ➝ A A copy

X ➝ B C B C copy

C B invert

X ➝ e f transduce36

Page 183: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse EParse with the English side of the grammar

0S3 ➝ 0NP2 2VP3

0NP2 ➝ 0DT1 1NN2

2VP3 ➝ 2Vi30DT1 ➝ the1NN2 ➝ man2Vi3 ➝ sleeps

37

Page 184: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Projection

38

Page 185: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

38

Page 186: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

38

Page 187: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

38

Page 188: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

38

Page 189: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

38

Page 190: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi3

38

Page 191: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi30DT1 ➝ the le

38

Page 192: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi30DT1 ➝ the le

la

38

Page 193: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi30DT1 ➝ the le

lal'

38

Page 194: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi30DT1 ➝ the le

lal'

1NN2 ➝ man homme

38

Page 195: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

ProjectionEnglish French

0S3 ➝ 0NP2 2VP3 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi3 2Vi30DT1 ➝ the le

lal'

1NN2 ➝ man homme2Vi3 ➝ sleeps dort

38

Page 196: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

French GrammarFrench

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

39

Page 197: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse FFrench

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 198: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

l' homme1 2 3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 199: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

l' homme1 2 3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 200: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

0DT1

l' homme1 2 3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 201: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

0DT1 1NN2

l' homme1 2 3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 202: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

0DT1 1NN2

l' homme1 2 3

2VP3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 203: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F

0NP2

0DT1 1NN2

l' homme1 2 3

2VP3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 204: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Parse F0S3

0NP2

0DT1 1NN2

l' homme1 2 3

2VP3

2Vi3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ homme2Vi3 ➝ dort

40

Page 205: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

41

Page 206: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

CFG parsing can be seen as intersecting a CFG and an FSA [Bar-Hillel, 1961; Billot and Lang, 1989]

41

Page 207: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

CFG parsing can be seen as intersecting a CFG and an FSA [Bar-Hillel, 1961; Billot and Lang, 1989]

CFGs are closed under intersection [Hopcroft and Ullman, 1979]

41

Page 208: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

CFG parsing can be seen as intersecting a CFG and an FSA [Bar-Hillel, 1961; Billot and Lang, 1989]

CFGs are closed under intersection [Hopcroft and Ullman, 1979]

• L(CFG) ∩ L(FSA) is a context-free language

41

Page 209: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

CFG parsing can be seen as intersecting a CFG and an FSA [Bar-Hillel, 1961; Billot and Lang, 1989]

CFGs are closed under intersection [Hopcroft and Ullman, 1979]

• L(CFG) ∩ L(FSA) is a context-free language

This neat property makes cascading intersection operations (parsers) appealing [Dyer, 2010]

41

Page 210: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Cascade of Monolingual Parsers

CFG parsing can be seen as intersecting a CFG and an FSA [Bar-Hillel, 1961; Billot and Lang, 1989]

CFGs are closed under intersection [Hopcroft and Ullman, 1979]

• L(CFG) ∩ L(FSA) is a context-free language

This neat property makes cascading intersection operations (parsers) appealing [Dyer, 2010]

• e.g. bitext parsing

41

Page 211: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignmentsFrench

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 212: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

l' homme1 2 3

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 213: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

l' homme1 2 3

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 214: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

1,0DT1,2

l' homme1 2 3

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 215: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

1,0DT1,2 2,1NN2,3

l' homme1 2 3

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 216: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

1,0DT1,2 2,1NN2,3

l' homme1 2 3

0,2VP3,1

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 217: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments

1,0NP2,3

1,0DT1,2 2,1NN2,3

l' homme1 2 3

0,2VP3,1

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 218: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Biproduct: alignments0,0S3,3

1,0NP2,3

1,0DT1,2 2,1NN2,3

l' homme1 2 3

0,2VP3,1

0,2Vi3,1

dort0

French

0S3 ➝ 0NP2 2VP3

2VP3 0NP2

0NP2 ➝ 0DT1 1NN2

1NN2 0DT1

2VP3 ➝ 2Vi30DT1 ➝ le

lal'

1NN2 ➝ 1NN2 ➝ homme2Vi3 ➝ 2Vi3 ➝ dort

42

Page 219: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Complexity• O(l3 x m3)

• where l is the length of the English string

• and m is the length of the French string

• Joint parsing or cascade of parsers has the same theoretical complexity

• Can cascading be more efficient on average? Why?

Page 220: uva-slpl.github.iouva-slpl.github.io/nlp2/resources/slides/bitext-parsing.pdf · "Dotted items" Parsing a CNF grammar is easy because we know the shape of rules When that's not the

Bibliography• Hopcroft, John E. and Ullman, Jeffrey D. 1979. Introduction To Automata

Theory, Languages, And Computation.

• Shieber, S. and Schabes, Y. and Pereira, F. 1995. Principles and implementation of deductive parsing. In Journal of Logic Programming

• Bar-Hillel, Y. and Perles, M. and Shamir, E. 1961. On formal properties of simple phrase structure grammars.

• Billot, S. and Lang, B. 1989. The Structure of Shared Forests in Ambiguous ParsingThe Structure of Shared Forests in Ambiguous Parsing. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics.

• Dyer, C. 2010. Two monolingual parses are better than one (synchronous parse). In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.