Everything You Always Wanted to Know About Parsing Part ...


Dependency Trees | Transition-Based Parsing

Everything You Always Wanted to Know About Parsing

Part VII: Projective Dependency Parsing

Giorgio Satta
University of Padua, Italy

ESSLLI, August 2013

Giorgio Satta Everything About Parsing

Dependency Trees

Let Σ be a finite set of lexical symbols

Assume an input string w = a1 a2 · · · an such that

ai ∈ Σ for i ∈ [2, n]

a1 is the special symbol root

Dependency Trees

A dependency is denoted i → j , where

the node i is the head

the node j is the dependent

Dependencies may be annotated with a label indicating some grammatical relation

Dependency Trees

A dependency tree for w is a directed tree t = (Vw ,A), where

Vw = [1, n] is the set of nodes

A ⊆ Vw × Vw is the set of arcs

node 1 is the root

Each node in Vw encodes a token from w

Each arc in A encodes a dependency relation between two tokens
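
The definition above can be checked mechanically. The following is a minimal sketch, assuming arcs are stored as (head, dependent) pairs over nodes 1..n; the function name is_dependency_tree is hypothetical and not part of the slides.

```python
def is_dependency_tree(n, arcs):
    """Check that arcs (a set of (head, dependent) pairs) form a
    directed tree over nodes 1..n rooted at node 1."""
    heads = {}
    for h, d in arcs:
        in_range = 1 <= h <= n and 1 <= d <= n
        # The root has no head, and no node may have two heads.
        if not in_range or d == 1 or d in heads:
            return False
        heads[d] = h
    # Every node other than the root needs exactly one head.
    if len(heads) != n - 1:
        return False
    # Following heads upward from any node must reach the root (no cycles).
    for d in range(2, n + 1):
        seen, x = set(), d
        while x != 1:
            if x in seen:
                return False
            seen.add(x)
            x = heads[x]
    return True
```

For instance, the arcs 1 → 3, 3 → 2, 3 → 4 over four nodes pass the check, while a set containing the cycle 3 → 4, 4 → 3 does not.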

Dependency Trees

Example :

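Tokens: 1 root, 2 Mr., 3 Tomash, 4 will, 5 remain, 6 as, 7 a, 8 director, 9 emeritus, 10 .

[Figure: dependency tree over these tokens, with arcs labeled root, sbj, nmod, vc, pp, nmod, nmod, np, punc]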

Projectivity

The support of a node i is the set of nodes reachable from i, including i itself

Example: The support of node 6 is [6, 9]

[Figure: the dependency tree for "root Mr. Tomash will remain as a director emeritus ." (nodes 1 to 10)]

Projectivity

A node in a dependency tree is projective if its support identifies a contiguous substring of w

A dependency tree is projective if each of its nodes is projective

A dependency tree is non-projective if it is not a projective tree
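
The support and projectivity definitions translate directly into code. This is a minimal sketch, assuming arcs are (head, dependent) pairs over nodes 1..n; the function names and the two small example trees are hypothetical illustrations, not taken from the slides' figures.

```python
def support(node, arcs):
    """Nodes reachable from `node` along arcs, including `node` itself."""
    seen, todo = {node}, [node]
    while todo:
        h = todo.pop()
        for head, dep in arcs:
            if head == h and dep not in seen:
                seen.add(dep)
                todo.append(dep)
    return seen


def is_projective(n, arcs):
    """A tree is projective if every node's support is a contiguous interval."""
    for i in range(1, n + 1):
        s = support(i, arcs)
        if max(s) - min(s) + 1 != len(s):
            return False
    return True


# Hypothetical 4-node examples.
projective_tree = {(1, 3), (3, 2), (3, 4)}
crossing_tree = {(1, 2), (1, 3), (2, 4)}  # support of node 2 is {2, 4}
```

In the second tree the support of node 2 skips node 3, so the tree is non-projective.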

Projectivity

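Example: The support of node 3 is [2, 3] ∪ [6, 8]; thus the displayed tree is non-projective

Tokens: 1 root, 2 A, 3 hearing, 4 is, 5 scheduled, 6 on, 7 the, 8 issue, 9 today, 10 .

[Figure: non-projective dependency tree over these tokens, with arcs labeled root, sbj, nmod, vc, tmp, pp, np, nmod, punc]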

General Model | Arc-Standard | Tabulation | Complete Computations

Transition-Based Model

Transition-based models for parsing of dependency representations were introduced in independent work by Yuji Matsumoto and by Joakim Nivre in the early 2000s

Conceptually, a transition-based parser is very similar to a push-down automaton, with the following differences

there is no internal state

transitions are defined directly on configurations

each transition might be associated with an action constructing some dependency

Transition-Based Model

Several families of projective transition-based parsers have been defined in the literature. The most popular are:

arc-standard [Nivre, 2004]

arc-eager [Nivre, 2003]

arc-hybrid [Kuhlmann et al., 2011]

Transition-based parsers differ from each other only with respect to their sets of transitions, and are identical in all other aspects

Transition-Based Model

A transition-based parser is a tuple

S = (C, T, I, Ct)

where:

C is a set of configurations

T is a finite set of transitions, specific to the parser family; each transition is a partial function of type t : C ⇀ C

I is a total initialization function mapping each input string to a unique initial configuration

Ct ⊆ C is a set of terminal configurations

Configuration

Each configuration in C is defined relative to some input string w = a1 a2 · · · an

A configuration is a triple

c = (σ, β,A)

where

σ is a list of nodes from Vw , called stack

β is a list of nodes from Vw , called buffer

A ⊆ Vw × Vw is a set of arcs

Configuration

Some conventions :

we write the stack as

σ = [σd , . . . , σ1]

with the topmost element placed at the right

we write the buffer as (i ∈ [1, n])

β = [i , i + 1, . . . , n]

with the first element placed at the left; observe that the buffer represents a suffix of w

Configuration

Some conventions :

we use a vertical bar to indicate concatenation in the stack and in the buffer

Example:

node i is at the top of stack σ|i
node j is the first in the buffer j|β

Configuration

The initialization function provides an initial configuration for each string w:

I(w) = ([], [1, . . . , |w|], ∅)

The terminal configurations in Ct have the form :

([1], [],A)

where A is some set of arcs
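
One possible encoding of configurations, the initialization function and the terminal test is sketched below. The names Config, initial and is_terminal are hypothetical; the tuple layout (topmost stack element last, first buffer element first) simply mirrors the conventions above.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Config(NamedTuple):
    stack: Tuple[int, ...]             # topmost element is the last entry
    buffer: Tuple[int, ...]            # first element is the first entry
    arcs: FrozenSet[Tuple[int, int]]   # (head, dependent) pairs


def initial(w):
    """I(w) = ([], [1, ..., |w|], {}) for a token list w."""
    return Config((), tuple(range(1, len(w) + 1)), frozenset())


def is_terminal(c):
    """Terminal configurations have the form ([1], [], A)."""
    return c.stack == (1,) and c.buffer == ()
```

Immutable tuples and frozensets are used here so that a configuration can be shared between alternative computations without copying; this is a design choice of the sketch, not something the slides prescribe.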

Computation

A computation of S is a sequence of m + 1 configurations, with m ≥ 0,

c0, . . . , cm

such that, for each i ∈ [1,m], we have

ci = ti (ci−1)

for some transition ti ∈ T

The computation is complete whenever c0 = I (w) and cm ∈ Ct

Arc-Standard Parser

The arc-standard parser is a transition-based parser

SAS = (C, TAS, I, Ct)

where TAS = {shift, left-arc, right-arc}

Arc-Standard Parser

Shift (sh for short): removes the first node in the buffer and pushes it onto the stack

(σ, i|β, A) ▷ (σ|i, β, A)

Arc-Standard Parser

Example : sh transition

Trans   Stack              Buffer
—       · · · σ2 σ1        i  i+1 · · ·
sh      · · · σ2 σ1 i      i+1  i+2 · · ·

Arc-Standard Parser

Left-arc (la for short): creates an arc with the topmost node on the stack as the head and the second-topmost node as the dependent, and removes the second-topmost node from the stack

(σ|i|j, β, A) ▷ (σ|j, β, A ∪ {j → i})

Arc-Standard Parser

Example : la transition

Trans   Stack           Buffer           Arc added
—       · · · σ2 σ1     i  i+1 · · ·     (none)
la      · · · σ1        i  i+1 · · ·     σ1 → σ2

Arc-Standard Parser

Right-arc (ra for short): creates an arc with the second-topmost node as the head and the topmost node as the dependent, and removes the topmost node

(σ|i|j, β, A) ▷ (σ|i, β, A ∪ {i → j})
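
The three arc-standard transitions can be written as partial functions on configurations. The sketch below assumes a configuration is a plain triple (stack, buffer, arcs) of tuples and a frozenset, and returns None where a transition is undefined; these representation choices and the function names are assumptions made for illustration.

```python
def shift(c):
    """(σ, i|β, A) ▷ (σ|i, β, A); undefined on an empty buffer."""
    stack, buffer, arcs = c
    if not buffer:
        return None
    return (stack + (buffer[0],), buffer[1:], arcs)


def left_arc(c):
    """(σ|i|j, β, A) ▷ (σ|j, β, A ∪ {j → i}); needs two stack nodes."""
    stack, buffer, arcs = c
    if len(stack) < 2:
        return None
    i, j = stack[-2], stack[-1]
    return (stack[:-2] + (j,), buffer, arcs | {(j, i)})


def right_arc(c):
    """(σ|i|j, β, A) ▷ (σ|i, β, A ∪ {i → j}); needs two stack nodes."""
    stack, buffer, arcs = c
    if len(stack) < 2:
        return None
    i, j = stack[-2], stack[-1]
    return (stack[:-1], buffer, arcs | {(i, j)})
```

Arcs are stored as (head, dependent) pairs, so left_arc records (j, i) and right_arc records (i, j).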

Arc-Standard Parser

Example : ra transition

Trans   Stack           Buffer           Arc added
—       · · · σ2 σ1     i  i+1 · · ·     (none)
ra      · · · σ2        i  i+1 · · ·     σ2 → σ1

Arc-Standard Parser

Example (cont’d) :

Trans   Stack               Buffer
—       (empty)             root Mr. · · ·
sh      root                Mr. Tomash · · ·
sh      root Mr.            Tomash will · · ·
sh      root Mr. Tomash     will remain · · ·

Arc-Standard Parser

Example (cont’d) :

Trans   Stack               Buffer               Arc added
la      root Tomash         will remain · · ·    Tomash → Mr.
sh      root Tomash will    remain as · · ·      (none)

Arc-Standard Parser

Example (cont’d) :

Trans   Stack               Buffer               Arc added
la      root will           remain as · · ·      will → Tomash
sh      root will remain    as a · · ·           (none)

Arc-Standard Parser

Example (cont’d) :

Trans   Stack               Buffer               Arc added
ra      root will           as a · · ·           will → remain
sh      root will as        a director · · ·     (none)

Formal Properties

SAS implements a purely bottom-up strategy: each arc h → d is constructed only after all dependencies of the form d → j have been constructed

The above property is implemented by removing node d as soon as arc h → d is constructed

Formal Properties

SAS is strictly nondeterministic, that is, several transitions can be applied to a given configuration

As a consequence, there are several complete computations of SAS for input string w

Proposition: Each complete computation of SAS on w consists of exactly 2|w| − 1 transitions
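
The count in the proposition can be recovered by a short counting argument over the transition definitions. The symbols n_sh and n_arc below are introduced only for this sketch and do not appear in the slides.

```latex
\begin{align*}
n_{\mathrm{sh}} &= |w|
  && \text{each of the $|w|$ buffer nodes is shifted exactly once} \\
n_{\mathrm{arc}} &= |w| - 1
  && \text{each la/ra removes one stack node; the terminal stack is } [1] \\
n_{\mathrm{sh}} + n_{\mathrm{arc}} &= 2|w| - 1
\end{align*}
```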

Formal Properties

Proposition [Soundness]: Each complete computation of SAS on w constructs a projective dependency tree T with root 1 and |w| nodes

Proposition [Completeness]: For each projective dependency tree T with root 1 and |w| nodes, there is a complete computation of SAS over w which constructs T
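
Completeness can be illustrated constructively: given a projective tree, one complete computation can be read off with a simple oracle. The sketch below uses the usual static-oracle idea for arc-standard parsing, which is an assumption of this example and not taken from the slides; the function name oracle and the head-map input format are hypothetical.

```python
def oracle(n, head):
    """Return one complete computation building the tree given by `head`,
    a dict mapping each node 2..n to its head (node 1 is the root)."""
    # pending[h] counts dependents of h that are not attached yet.
    pending = [0] * (n + 1)
    for d in range(2, n + 1):
        pending[head[d]] += 1
    stack, nxt, actions = [], 1, []
    while not (stack == [1] and nxt > n):
        two = len(stack) >= 2
        if two and head.get(stack[-2]) == stack[-1] and pending[stack[-2]] == 0:
            dep = stack.pop(-2)                    # la: topmost is the head
            actions.append(("la", stack[-1], dep))
            pending[stack[-1]] -= 1
        elif two and head.get(stack[-1]) == stack[-2] and pending[stack[-1]] == 0:
            dep = stack.pop()                      # ra: second-topmost is the head
            actions.append(("ra", stack[-1], dep))
            pending[stack[-1]] -= 1
        elif nxt <= n:
            stack.append(nxt)                      # sh: read the next node
            actions.append(("sh", nxt))
            nxt += 1
        else:
            raise ValueError("no transition applies: tree is not projective")
    return actions
```

A dependent is attached only once all of its own dependents are attached, which matches the bottom-up behaviour of SAS described earlier.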

Formal Properties

The SAS parser has spurious ambiguity: different complete computations of SAS on w can construct the same dependency tree

Example: The following complete computations over w = a1a2a3a4 construct the same dependency tree

sh(1) sh(2) sh(3) la(3 → 2) sh(4) ra(3 → 4) ra(1 → 3)
sh(1) sh(2) sh(3) sh(4) ra(3 → 4) la(3 → 2) ra(1 → 3)

[Figure: dependency tree over nodes 1 2 3 4 with arcs 1 → 3, 3 → 2, 3 → 4]
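
The two computations of the example can be replayed to confirm that they build the same arc set. This is a small sketch with a hypothetical helper run that takes only the transition names, since the arcs follow from the stack contents.

```python
def run(n, actions):
    """Replay a sequence of 'sh' / 'la' / 'ra' from the initial configuration.
    Returns the arc set and whether the final configuration is terminal."""
    stack, buffer, arcs = [], list(range(1, n + 1)), set()
    for a in actions:
        if a == "sh":
            stack.append(buffer.pop(0))
        elif a == "la":
            j = stack.pop()          # topmost node: the head
            i = stack.pop()          # second-topmost node: the dependent
            arcs.add((j, i))
            stack.append(j)
        elif a == "ra":
            j = stack.pop()          # topmost node: the dependent
            arcs.add((stack[-1], j))
        else:
            raise ValueError("unknown transition: " + a)
    return arcs, stack == [1] and not buffer


first = "sh sh sh la sh ra ra".split()
second = "sh sh sh sh ra la ra".split()
```

Both sequences have 2|w| − 1 = 7 transitions and end in a terminal configuration with arcs 1 → 3, 3 → 2, 3 → 4.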

Remarks

Parsing with SAS is an instance of a general approach called grammarless parsing

In this approach every projective dependency tree is allowed, that is, no hard restriction is imposed by some grammar

The approach is also called fully data-driven parsing, since discrimination among structures is only realized on a statistical basis, not on a grammatical basis

Remarks

In many natural language processing applications, multi-label classifiers are trained to choose a locally optimal transition for the current SAS configuration

The result is a greedy version of SAS, running in linear time in |w|

We follow here the alternative approach of simulating all computations of SAS using our tabulation technique. From the resulting parse forest we can select the globally optimal complete computation for w
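
For contrast with tabulation, the greedy variant can be sketched as a single loop. Here choose is a hypothetical stand-in for a trained classifier, and the rule that la is never offered when the root would become a dependent is an assumption of this sketch, added so that every run ends in a terminal configuration.

```python
from collections import deque


def greedy_parse(n, choose):
    """Greedy arc-standard parsing; one classifier call per transition."""
    stack, buffer, arcs, steps = [], deque(range(1, n + 1)), set(), 0
    while not (stack == [1] and not buffer):
        legal = []
        if buffer:
            legal.append("sh")
        if len(stack) >= 2:
            legal.append("ra")
            if stack[-2] != 1:       # assumption: keep node 1 as the root
                legal.append("la")
        action = choose(stack, buffer, legal)
        if action not in legal:
            raise ValueError("illegal transition: " + str(action))
        if action == "sh":
            stack.append(buffer.popleft())
        elif action == "la":
            dep = stack.pop(-2)
            arcs.add((stack[-1], dep))
        else:
            dep = stack.pop()
            arcs.add((stack[-1], dep))
        steps += 1
    return arcs, steps


def shift_first(stack, buffer, legal):
    """Toy policy standing in for a classifier: shift while possible."""
    return "sh" if "sh" in legal else "ra"
```

With the toy policy and n = 4, the parser shifts all nodes and then builds the chain 1 → 2 → 3 → 4 in 2n − 1 = 7 steps.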

Items

An arc-standard tabular parser, called the AS algorithm, can be obtained through tabulation of the SAS automaton

Items in the AS algorithm have the form

[h1, j, h2, i]

meaning that there is a computation γ of SAS such that

γ starts at index j with some stack σ|h1
γ ends at index i with some stack σ|h1|h2
portion σ|h1 of the stack is never rewritten in γ

Items

Proposition: In an [h1, j, h2, i] computation, token h1 is never read

[Figure: the stack is σ|h1 at index j, keeps h1 with further nodes on top at intermediate steps, and is σ|h1|h2 at index i; no intermediate stack falls below the initial length]

Items

This means that we can drop the first component of an item, and use simplified items of the form

[j, h, i]

This property was already observed for the tabulation of the CKY automaton

Deduction Rules

We start with an overview of the tabular AS algorithm using deduction rules

We will later derive the pseudocode for the complete algorithm

The algorithm is closely related to a standard parsing algorithm for so-called 2-lexical context-free grammars [Collins, 1999]

Deduction Rules

The sh transition is implemented as

[i − 1, i, i]

[Figure: a span from position i − 1 to position i covering token ai, with head i]

Deduction Rules

The la transition is implemented as

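[k, h1, j]   [j, h2, i]
───────────────────────  (h2 → h1)
      [k, h2, i]

[Figure: adjacent spans (k, j) with head h1 and (j, i) with head h2 combine into span (k, i) with head h2]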

Deduction Rules

The ra transition is implemented as

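[k, h1, j]   [j, h2, i]
───────────────────────  (h1 → h2)
      [k, h1, i]

[Figure: adjacent spans (k, j) with head h1 and (j, i) with head h2 combine into span (k, i) with head h1]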

AS Pseudocode

We can now present the pseudocode for the tabular AS algorithm, obtained as a specialization of the algorithm for the tabulation of general PDAs

The tabular algorithm can be further simplified by omitting the use of the agenda D

We also omit the acceptance condition, since we do not have grammatical restrictions, as already observed

AS Pseudocode

Algorithm 6 Arc-Standard Algorithm (from Tabulation of SAS)

T ← ∅
for i ← 1, . . . , n do
    T ← T ∪ {[i − 1, i, i]}                          ▷ sh transition
    for each [k, h1, j] and [j, h2, i] ∈ T do
        T ← T ∪ {[k, h2, i]}                         ▷ la transition
        T ← T ∪ {[k, h1, i]}                         ▷ ra transition
    end for
end for
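
A runnable reading of Algorithm 6 is sketched below. It assumes items [j, h, i] are stored as the set of heads h for each span (j, i), and it visits split points j in decreasing order so that every item ending at i is complete before it is combined further; both choices, and the name tabulate, are assumptions of this sketch. Adding whole head sets at once yields the same table as enumerating each pair of items.

```python
from collections import defaultdict


def tabulate(n):
    """Return the table T of items (j, h, i) for an input of length n."""
    heads = defaultdict(set)                 # (j, i) -> {h : [j, h, i] in T}
    for i in range(1, n + 1):
        heads[(i - 1, i)].add(i)             # sh transition
        for j in range(i - 1, 0, -1):        # split point, right to left
            right = heads.get((j, i))        # items [j, h2, i]
            if not right:
                continue
            for k in range(j):
                left = heads.get((k, j))     # items [k, h1, j]
                if not left:
                    continue
                heads[(k, i)] |= right       # la transition keeps h2
                heads[(k, i)] |= left        # ra transition keeps h1
    return {(j, h, i) for (j, i), hs in heads.items() for h in hs}
```

For n = 3 the table holds ten items, among them (0, 1, 3), the item that covers the whole input with node 1 as head.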

Computational Complexity

Computational results for the tabular AS algorithm can be obtained from the general computational complexity analysis of the tabulation algorithm

time complexity is O(|w|⁵)

space complexity is O(|w|³)

Observe there is no dependence on grammar size, since the approach is grammarless
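
One way to see where these bounds come from is to count the free indices in the la and ra deduction rules and in the items; the following is a sketch of that count, not a replacement for the general analysis.

```latex
\[
\underbrace{k < j < i}_{\text{3 positions}}
\quad
\underbrace{h_1,\; h_2}_{\text{2 heads}}
\;\Longrightarrow\; O(|w|^5) \text{ rule instantiations},
\qquad
[j, h, i] \;\Longrightarrow\; O(|w|^3) \text{ items}
\]
```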

Complete Computations

We use a context-free grammar Gw to generate the set of all complete computations of SAS on w, viewed as strings

Gw is constructed from the table T produced in a run of Algorithm 6

The methodology is the same as the one we used for shared and packed parse forests, but here we are interested in strings, not trees

Complete Computations

The nonterminals of Gw are the items [j, h, i] used by the tabular algorithm

The terminals of Gw are of three types

sh(i), la(i → j), ra(i → j)

The productions of Gw can be built simultaneously with the run of Algorithm 6, through the three steps reported in what follows

Complete Computations

For each la deduction rule

[k, h1, j]   [j, h2, i]
───────────────────────
      [k, h2, i]

applied in the construction of T, add to Gw the production

[k, h2, i] → [k, h1, j] [j, h2, i] la(h2 → h1)

Complete Computations

For each ra deduction rule

[k, h1, j]   [j, h2, i]
───────────────────────
      [k, h1, i]

applied in the construction of T, add to Gw the production

[k, h1, i] → [k, h1, j] [j, h2, i] ra(h1 → h2)

Complete Computations

For each sh deduction rule

[i − 1, i , i ]

applied in the construction of T , add to Gw the production

[i − 1, i , i ] → sh(i)
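
The three steps can be combined with the tabulation loop to build Gw and enumerate its strings. The sketch below is one possible realization under stated assumptions: items are triples (j, h, i), productions are stored in a dictionary from item to right-hand sides, and the names build_grammar and strings are hypothetical. Enumeration is exponential in general and is only meant for very short inputs.

```python
from collections import defaultdict
from functools import lru_cache


def build_grammar(n):
    """Productions of G_w: item -> list of right-hand sides."""
    table, prods = set(), defaultdict(list)
    for i in range(1, n + 1):
        item = (i - 1, i, i)
        table.add(item)
        prods[item].append(("sh(%d)" % i,))           # sh step
        for j in range(i - 1, 0, -1):
            rights = [t for t in table if t[0] == j and t[2] == i]
            lefts = [t for t in table if t[2] == j]
            for left in lefts:
                for right in rights:
                    k, h1, h2 = left[0], left[1], right[1]
                    la_item, ra_item = (k, h2, i), (k, h1, i)
                    table.update((la_item, ra_item))
                    prods[la_item].append((left, right, "la(%d->%d)" % (h2, h1)))
                    prods[ra_item].append((left, right, "ra(%d->%d)" % (h1, h2)))
    return prods


def strings(prods, item):
    """All terminal strings derived from `item`, as tuples of symbols."""
    @lru_cache(maxsize=None)
    def expand(it):
        out = []
        for rhs in prods[it]:
            if len(rhs) == 1:                          # sh production
                out.append(rhs)
            else:
                left, right, arc = rhs
                out.extend(l + r + (arc,)
                           for l in expand(left) for r in expand(right))
        return tuple(out)
    return expand(item)
```

Calling strings(build_grammar(4), (0, 1, 4)) enumerates the complete computations for a four-token input, taking the item that spans the whole input with head 1 as the start symbol.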

Complete Computations

Usual implementation: for each item I added to T, store the list L(I) consisting of all pairs of items that have been used in the deduction of I itself

Computational Complexity

Grammar Gw can be built in time and space O(|w|⁵)

Grammar Gw is already reduced

M. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.

Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682, Portland, OR, USA.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the Eighth International Workshop on Parsing Technologies (IWPT), pages 149–160, Nancy, France.

Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Workshop on Incremental Parsing: Bringing Engineering and Cognition Together, pages 50–57, Barcelona, Spain.

Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553.
