Everything You Always Wanted to Know About Parsing Part ...
Dependency Trees · Transition-Based Parsing
Everything You Always Wanted to Know About Parsing
Part VII : Projective Dependency Parsing
Giorgio Satta, University of Padua, Italy
ESSLLI, August 2013
Giorgio Satta Everything About Parsing
Dependency Trees
Let $\Sigma$ be a finite set of lexical symbols
Assume an input string $w = a_1 a_2 \cdots a_n$ such that
$a_i \in \Sigma$ for $i \in [2, n]$
$a_1$ is the special symbol root
Dependency Trees
A dependency is denoted i → j , where
the node i is the head
the node j is the dependent
Dependencies may be annotated with a label indicating some grammatical relation
Dependency Trees
A dependency tree for $w$ is a directed tree $t = (V_w, A)$, where
$V_w = [1, n]$ is the set of nodes
$A \subseteq V_w \times V_w$ is the set of arcs
node 1 is the root
Each node in $V_w$ encodes a token from $w$
Each arc in A encodes a dependency relation between two tokens
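The definition above can be checked mechanically. The following is a minimal sketch, assuming arcs are stored as (head, dependent) pairs over nodes 1..n; the function name `is_dependency_tree` is illustrative and not part of the slides.

```python
from collections import deque

def is_dependency_tree(n: int, arcs: set[tuple[int, int]]) -> bool:
    """True if (V_w, A) with V_w = {1, ..., n} is a directed tree rooted at node 1."""
    head_of = {}
    children = {i: [] for i in range(1, n + 1)}
    for h, d in arcs:
        in_range = 1 <= h <= n and 1 <= d <= n
        if not in_range or d in head_of or d == 1:
            return False  # bad node, second head for d, or an arc into the root
        head_of[d] = h
        children[h].append(d)
    if len(head_of) != n - 1:
        return False  # some non-root node has no head
    # Every node must be reachable from the root (this also rules out cycles).
    seen, queue = {1}, deque([1])
    while queue:
        for d in children[queue.popleft()]:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return len(seen) == n

print(is_dependency_tree(4, {(1, 3), (3, 2), (3, 4)}))
```

The reachability pass is what separates a tree from a set of arcs that merely gives each non-root node one head.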
Dependency Trees
Example :
1 root   2 Mr.   3 Tomash   4 will   5 remain   6 as   7 a   8 director   9 emeritus   10 .
[Figure: dependency tree over the tokens above; arc labels shown: root, sbj, nmod, vc, pp, nmod, nmod, np, punc]
Projectivity
The support of a node $i$ is the set of nodes reachable from $i$, including $i$ itself
Example : The support of node 6 is [6, 9]
[Figure: the dependency tree for "root Mr. Tomash will remain as a director emeritus ." from the previous example]
Projectivity
A node in a dependency tree is projective if its support identifies a contiguous substring of $w$
A dependency tree is projective if each of its nodes is projective
A dependency tree is non-projective if it is not a projective tree
Projectivity
Example : The support of node 3 is [2, 3] ∪ [6, 8]; thus the displayed tree is non-projective
1 root   2 A   3 hearing   4 is   5 scheduled   6 on   7 the   8 issue   9 today   10 .
[Figure: dependency tree over the tokens above; arc labels shown: root, sbj, nmod, vc, tmp, pp, np, nmod, punc]
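The support and projectivity definitions translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the arc set is an assumed reading of the figure for "A hearing is scheduled on the issue today .", chosen to agree with the stated support of node 3.

```python
def support(node, arcs):
    """Nodes reachable from `node` along (head, dependent) arcs, itself included."""
    reached, todo = {node}, [node]
    while todo:
        h = todo.pop()
        for head, dep in arcs:
            if head == h and dep not in reached:
                reached.add(dep)
                todo.append(dep)
    return reached

def is_projective_node(node, arcs):
    """A node is projective if its support is a contiguous interval of positions."""
    s = support(node, arcs)
    return max(s) - min(s) + 1 == len(s)

def is_projective(n, arcs):
    """A tree over nodes 1..n is projective if each of its nodes is projective."""
    return all(is_projective_node(i, arcs) for i in range(1, n + 1))

# Assumed arcs (head, dependent) for: 1 root, 2 A, 3 hearing, 4 is, 5 scheduled,
# 6 on, 7 the, 8 issue, 9 today, 10 .
arcs = {(1, 4), (4, 3), (3, 2), (4, 5), (5, 9), (3, 6), (6, 8), (8, 7), (4, 10)}
print(sorted(support(3, arcs)))  # [2, 3, 6, 7, 8]
print(is_projective(10, arcs))   # False
```

Node 3 fails the contiguity test because positions 4 and 5 lie inside the span of its support without belonging to it.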
General Model · Arc-Standard · Tabulation · Complete Computations
Transition-Based Model
Transition-based models for parsing of dependency representations were introduced in independent work by Yuji Matsumoto and by Joakim Nivre in the early 2000s
Conceptually, a transition-based parser is very similar to a push-down automaton, with the following differences
there is no internal state
transitions are defined directly on configurations
each transition might be associated with an action constructing some dependency
Transition-Based Model
Several families of projective transition-based parsers have been defined in the literature. Most popular are :
arc-standard [Nivre, 2004]
arc-eager [Nivre, 2003]
arc-hybrid [Kuhlmann et al., 2011]
Transition-based parsers differ from each other only with respect to their sets of transitions, and are identical in all other aspects
Transition-Based Model
A transition-based parser is a tuple
$S = (C, T, I, C_t)$
where :
$C$ is a set of configurations
$T$ is a finite set of transitions, specific to the parser family; each transition is a partial function of type $t : C \rightharpoonup C$
$I$ is a total initialization function mapping each input string to a unique initial configuration
$C_t \subseteq C$ is a set of terminal configurations
Configuration
Each configuration in $C$ is defined relative to some input string $w = a_1 a_2 \cdots a_n$
A configuration is a triple
$c = (\sigma, \beta, A)$
where
$\sigma$ is a list of nodes from $V_w$, called stack
$\beta$ is a list of nodes from $V_w$, called buffer
$A \subseteq V_w \times V_w$ is a set of arcs
Configuration
Some conventions :
we write the stack as
$\sigma = [\sigma_d, \ldots, \sigma_1]$
with the topmost element placed at the right
we write the buffer as
$\beta = [i, i+1, \ldots, n]$
where $i \in [1, n]$ and the first element is placed at the left; observe that the buffer represents a suffix of $w$
Configuration
Some conventions :
we use a vertical bar to indicate concatenation in the stack and in the buffer
Example :
node $i$ is at the top of the stack: $\sigma|i$
node $j$ is the first in the buffer: $j|\beta$
Configuration
The initialization function provides an initial configuration for each string $w$ :
$I(w) = ([], [1, \ldots, |w|], \emptyset)$
The terminal configurations in $C_t$ have the form :
$([1], [], A)$
where $A$ is some set of arcs
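As an illustration, a configuration and the two definitions above might be encoded as follows. This is a sketch under assumed names (`Configuration`, `initial`, `is_terminal`), with the stack stored topmost-last and the buffer first-element-first, matching the conventions of the slides.

```python
from typing import NamedTuple

class Configuration(NamedTuple):
    stack: tuple[int, ...]            # sigma, topmost element last
    buffer: tuple[int, ...]           # beta, next input node first
    arcs: frozenset[tuple[int, int]]  # A, as (head, dependent) pairs

def initial(n: int) -> Configuration:
    """I(w) for |w| = n: empty stack, buffer [1, ..., n], no arcs."""
    return Configuration((), tuple(range(1, n + 1)), frozenset())

def is_terminal(c: Configuration) -> bool:
    """Terminal configurations have the form ([1], [], A) for any arc set A."""
    return c.stack == (1,) and c.buffer == ()

c0 = initial(3)
print(c0.stack, c0.buffer, is_terminal(c0))
```

Immutable tuples and frozensets are used so that a configuration can be shared safely when several transitions are explored from it.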
Computation
A computation of $S$ is a sequence of configurations
$c_0, \ldots, c_m$, with $m \ge 0$,
such that, for each $i \in [1, m]$, we have
$c_i = t_i(c_{i-1})$
for some transition $t_i \in T$
The computation is complete whenever $c_0 = I(w)$ and $c_m \in C_t$
Arc-Standard Parser
The arc-standard parser is a transition-based parser
$S_{AS} = (C, T_{AS}, I, C_t)$
where $T_{AS} = \{\text{shift}, \text{left-arc}, \text{right-arc}\}$
Arc-Standard Parser
Shift (sh for short) : removes the first node in the buffer and pushes it onto the stack
$(\sigma, i|\beta, A) \vdash (\sigma|i, \beta, A)$
Arc-Standard Parser
Example : sh transition
Trans   Stack             Buffer
-       ··· σ_2 σ_1       i  i+1 ···
sh      ··· σ_2 σ_1 i     i+1  i+2 ···
Arc-Standard Parser
Left-arc (la for short) : creates an arc with the topmost node on the stack as the head and the second-topmost node as the dependent, and removes the second-topmost node from the stack
$(\sigma|i|j, \beta, A) \vdash (\sigma|j, \beta, A \cup \{j \to i\})$
Arc-Standard Parser
Example : la transition
Trans   Stack           Buffer         New arc
-       ··· σ_2 σ_1     i  i+1 ···
la      ··· σ_1         i  i+1 ···     σ_1 → σ_2
Arc-Standard Parser
Right-arc (ra for short) : creates an arc with the second-topmost node as the head and the topmost node as the dependent, and removes the topmost node
$(\sigma|i|j, \beta, A) \vdash (\sigma|i, \beta, A \cup \{i \to j\})$
Arc-Standard Parser
Example : ra transition
Trans   Stack           Buffer         New arc
-       ··· σ_2 σ_1     i  i+1 ···
ra      ··· σ_2         i  i+1 ···     σ_2 → σ_1
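The three transitions can be written as partial functions on configurations. The sketch below is one possible encoding, not the author's code: a configuration is a plain triple (stack, buffer, arcs) with the stack topmost-last, and a transition returns None where it is undefined.

```python
from typing import Optional

# (stack, buffer, arcs); arcs are (head, dependent) pairs
Config = tuple[tuple[int, ...], tuple[int, ...], frozenset[tuple[int, int]]]

def shift(c: Config) -> Optional[Config]:
    """(sigma, i|beta, A) |- (sigma|i, beta, A)"""
    stack, buffer, arcs = c
    if not buffer:
        return None
    return stack + (buffer[0],), buffer[1:], arcs

def left_arc(c: Config) -> Optional[Config]:
    """(sigma|i|j, beta, A) |- (sigma|j, beta, A + {j -> i})"""
    stack, buffer, arcs = c
    if len(stack) < 2:
        return None
    i, j = stack[-2], stack[-1]
    return stack[:-2] + (j,), buffer, arcs | {(j, i)}

def right_arc(c: Config) -> Optional[Config]:
    """(sigma|i|j, beta, A) |- (sigma|i, beta, A + {i -> j})"""
    stack, buffer, arcs = c
    if len(stack) < 2:
        return None
    i, j = stack[-2], stack[-1]
    return stack[:-1], buffer, arcs | {(i, j)}

c = ((), (1, 2, 3), frozenset())
c = left_arc(shift(shift(c)))
print(c)  # ((2,), (3,), frozenset({(2, 1)}))
```

Returning None mirrors the fact that each transition is only a partial function on the set of configurations.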
Arc-Standard Parser
Example :
Trans   Stack                Buffer               New arc
-       (empty)              root Mr. ···
sh      root                 Mr. Tomash ···
sh      root Mr.             Tomash will ···
sh      root Mr. Tomash      will remain ···
la      root Tomash          will remain ···      Tomash → Mr.
sh      root Tomash will     remain as ···
la      root will            remain as ···        will → Tomash
sh      root will remain     as a ···
ra      root will            as a ···             will → remain
sh      root will as         a director ···
Formal Properties
$S_{AS}$ implements a purely bottom-up strategy : each arc $h \to d$ is constructed only after all dependencies of the form $d \to j$ have been constructed
The above property is implemented by removing node $d$ as soon as arc $h \to d$ is constructed
Formal Properties
$S_{AS}$ is strictly nondeterministic, that is, more than one transition may be applicable to a given configuration
As a consequence, there are in general several complete computations of $S_{AS}$ for an input string $w$
Proposition : Each complete computation of $S_{AS}$ on $w$ consists of exactly $2|w| - 1$ transitions
Formal Properties
Proposition [Soundness] : Each complete computation of $S_{AS}$ on $w$ constructs a projective dependency tree $T$ with root 1 and $|w|$ nodes
Proposition [Completeness] : For each projective dependency tree $T$ with root 1 and $|w|$ nodes, there is a complete computation of $S_{AS}$ over $w$ which constructs $T$
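For very short inputs, the length, soundness and completeness propositions can be checked by brute force. The following sketch enumerates every complete computation and compares the resulting arc sets with all projective trees rooted at node 1; all function names are illustrative, and the check is a sanity test on small n, not a proof.

```python
from itertools import product

def support(node, arcs):
    """Nodes reachable from `node` along (head, dependent) arcs, itself included."""
    reached, todo = {node}, [node]
    while todo:
        h = todo.pop()
        for head, dep in arcs:
            if head == h and dep not in reached:
                reached.add(dep)
                todo.append(dep)
    return reached

def is_projective_tree(n, arcs):
    """Tree rooted at 1 over nodes 1..n in which every support is contiguous."""
    dependents = sorted(d for _, d in arcs)
    if dependents != list(range(2, n + 1)) or len(support(1, arcs)) != n:
        return False
    spans = (support(i, arcs) for i in range(1, n + 1))
    return all(max(s) - min(s) + 1 == len(s) for s in spans)

def complete_computations(n):
    """Yield (transition names, arcs) for each complete computation on length n."""
    def step(stack, buffer, arcs, trace):
        if stack == (1,) and not buffer:      # terminal configuration ([1], [], A)
            yield trace, arcs
            return
        if buffer:                             # sh
            yield from step(stack + (buffer[0],), buffer[1:], arcs, trace + ("sh",))
        if len(stack) >= 2:
            i, j = stack[-2], stack[-1]
            yield from step(stack[:-2] + (j,), buffer, arcs | {(j, i)}, trace + ("la",))
            yield from step(stack[:-1], buffer, arcs | {(i, j)}, trace + ("ra",))
    yield from step((), tuple(range(1, n + 1)), frozenset(), ())

def all_projective_trees(n):
    """All projective trees rooted at 1, found by trying every head assignment."""
    trees = set()
    for heads in product(range(1, n + 1), repeat=n - 1):
        arcs = frozenset((h, d) for d, h in enumerate(heads, start=2))
        if is_projective_tree(n, arcs):
            trees.add(arcs)
    return trees

def check(n):
    runs = list(complete_computations(n))
    assert all(len(trace) == 2 * n - 1 for trace, _ in runs)      # length
    assert all(is_projective_tree(n, arcs) for _, arcs in runs)   # soundness
    assert {arcs for _, arcs in runs} == all_projective_trees(n)  # completeness
    return len(runs)

for n in range(1, 5):
    check(n)
```

The enumeration is exponential in n, so it is only meant for inputs of a handful of tokens.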
Formal Properties
The $S_{AS}$ parser has spurious ambiguity : different complete computations of $S_{AS}$ on $w$ construct the same dependency tree
Example : The following complete computations over $w = a_1 a_2 a_3 a_4$ construct the same dependency tree
sh(1) sh(2) sh(3) la(3 → 2) sh(4) ra(3 → 4) ra(1 → 3)
sh(1) sh(2) sh(3) sh(4) ra(3 → 4) la(3 → 2) ra(1 → 3)
[Figure: dependency tree over nodes 1 2 3 4 with arcs 1 → 3, 3 → 2, 3 → 4]
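The two computations in the example can be replayed to confirm that they end in the same configuration. This is a small sketch with a hypothetical helper `run`; transitions are given by name only, since the arc each one creates is determined by the stack.

```python
def run(n, transitions):
    """Apply a sequence of arc-standard transitions starting from I(w), |w| = n."""
    stack, buffer, arcs = [], list(range(1, n + 1)), set()
    for t in transitions:
        if t == "sh":
            stack.append(buffer.pop(0))
        elif t == "la":                 # topmost node j becomes head of second-topmost i
            j = stack.pop()
            i = stack.pop()
            arcs.add((j, i))
            stack.append(j)
        elif t == "ra":                 # second-topmost node becomes head of topmost j
            j = stack.pop()
            arcs.add((stack[-1], j))
        else:
            raise ValueError(f"unknown transition: {t}")
    return stack, buffer, arcs

first = ["sh", "sh", "sh", "la", "sh", "ra", "ra"]
second = ["sh", "sh", "sh", "sh", "ra", "la", "ra"]
print(run(4, first) == run(4, second))  # True
```

Both sequences have 2|w| - 1 = 7 transitions and differ only in when the arc 3 → 2 is built relative to 3 → 4.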
Remarks
Parsing with $S_{AS}$ is an instance of a general approach called grammarless parsing
In this approach every projective dependency tree is allowed, that is, no hard restriction is imposed by some grammar
The approach is also called fully data-driven parsing, since discrimination among structures is only realized on a statistical basis, not on a grammatical basis
Remarks
In many natural language processing applications, multi-label classifiers are trained to choose a locally optimal transition for the current $S_{AS}$ configuration
The result is a greedy version of $S_{AS}$, running in linear time in $|w|$
We follow here the alternative approach of simulating all computations of $S_{AS}$ using our tabulation technique. From the resulting parse forest we can select the globally optimal complete computation for $w$
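For contrast with the tabular approach, the greedy variant can be sketched as a single loop. Everything here is an assumption made for illustration: `choose` stands in for a trained classifier, and left-arc is disallowed when it would remove node 1, so that the loop always reaches a terminal configuration.

```python
from collections import deque

def greedy_parse(n, choose):
    """Run one computation, letting `choose` pick among the legal transitions."""
    stack, buffer, arcs, steps = [], deque(range(1, n + 1)), set(), 0
    while not (stack == [1] and not buffer):
        legal = ["sh"] if buffer else []
        if len(stack) >= 2:
            legal.append("ra")
            if stack[-2] != 1:          # assumption: never make the root a dependent
                legal.append("la")
        t = choose(stack, buffer, legal)
        if t not in legal:
            raise ValueError(f"illegal transition: {t}")
        if t == "sh":
            stack.append(buffer.popleft())
        elif t == "ra":
            dep = stack.pop()
            arcs.add((stack[-1], dep))
        else:                           # la
            head = stack.pop()
            dep = stack.pop()
            arcs.add((head, dep))
            stack.append(head)
        steps += 1
    return arcs, steps

# Stand-in policy: shift while possible, then attach everything to the right.
arcs, steps = greedy_parse(4, lambda stack, buffer, legal: legal[0])
print(sorted(arcs), steps)
```

Each iteration applies exactly one transition, so the loop runs 2|w| - 1 times, which is where the linear running time comes from (given a constant-time policy).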
Items
An arc-standard tabular parser, called the AS algorithm, can be obtained through tabulation of the $S_{AS}$ automaton
Items in the AS algorithm have the form
$[h_1, j, h_2, i]$
meaning that there is a computation $\gamma$ of $S_{AS}$ such that
$\gamma$ starts at index $j$ with some stack $\sigma|h_1$
$\gamma$ ends at index $i$ with some stack $\sigma|h_1|h_2$
portion $\sigma|h_1$ of the stack is never rewritten in $\gamma$
Items
Proposition : In an $[h_1, j, h_2, i]$ computation, token $h_1$ is never read
[Figure: the stack evolves from $\cdots h_1$ at input position $j$ to $\cdots h_1\, h_2$ at position $i$; no intermediate stack falls below this length]
Items
This means that we can drop the first component of an item, and use simplified items of the form
$[j, h, i]$
This property was already observed for the tabulation of the CKY automaton
Deduction Rules
We start with an overview of the tabular AS algorithm using deduction rules
We will later derive the pseudocode for the complete algorithm
The algorithm is closely related to a standard parsing algorithm for so-called 2-lexical context-free grammars [Collins, 1999]
Deduction Rules
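The sh transition is implemented as the axiom
$$\frac{}{[i-1, i, i]}$$
[Figure: token $a_i$, lying between input positions $i-1$ and $i$, is read and node $i$ is placed on the stack]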
Deduction Rules
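The la transition is implemented as
$$\frac{[k, h_1, j] \quad [j, h_2, i]}{[k, h_2, i]} \quad (h_2 \to h_1)$$
[Figure: adjacent spans from $k$ to $j$ (head $h_1$) and from $j$ to $i$ (head $h_2$) combine into a span from $k$ to $i$ with head $h_2$]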
Deduction Rules
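The ra transition is implemented as
$$\frac{[k, h_1, j] \quad [j, h_2, i]}{[k, h_1, i]} \quad (h_1 \to h_2)$$
[Figure: adjacent spans from $k$ to $j$ (head $h_1$) and from $j$ to $i$ (head $h_2$) combine into a span from $k$ to $i$ with head $h_1$]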
AS Pseudocode
We can now present the pseudocode for the tabular AS algorithm, obtained as a specialization of the algorithm for the tabulation of general PDAs
The tabular algorithm can be further simplified by omitting the use of the agenda $D$
We also omit the acceptance condition, since we do not have grammatical restrictions, as already observed
AS Pseudocode
Algorithm 6 Arc-Standard Algorithm (from Tabulation of S_AS)
T ← ∅
for i ← 1, . . . , n do
    T ← T ∪ {[i − 1, i, i]}                          ▷ sh transition
    for each [k, h_1, j] and [j, h_2, i] ∈ T do
        T ← T ∪ {[k, h_2, i]}                        ▷ la transition
        T ← T ∪ {[k, h_1, i]}                        ▷ ra transition
    end for
end for
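A runnable reading of Algorithm 6 is sketched below. It is one possible implementation, not the author's: items are tuples (j, h, i), and a small local worklist per position i is an implementation choice used to make the inner loop reach its fixpoint, since items newly derived at position i can themselves serve as right premises.

```python
from collections import defaultdict

def arc_standard_table(n):
    """Return the set T of items (j, h, i) derivable for an input of length n."""
    table = set()
    ending_at = defaultdict(set)           # i -> items whose right index is i
    for i in range(1, n + 1):
        agenda = [(i - 1, i, i)]           # sh transition
        while agenda:
            item = agenda.pop()
            if item in table:
                continue
            table.add(item)
            ending_at[i].add(item)
            j, h2, _ = item
            # Left premises end at j < i, so their set is already complete.
            for (k, h1, _) in ending_at[j]:
                agenda.append((k, h2, i))  # la transition, arc h2 -> h1
                agenda.append((k, h1, i))  # ra transition, arc h1 -> h2
    return table

print(sorted(arc_standard_table(2)))
# [(0, 1, 1), (0, 1, 2), (0, 2, 2), (1, 2, 2)]
```

An item (0, 1, n) in the table corresponds to computations that consume the whole input and leave only node 1 on the stack.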
Computational Complexity
Computational results for the tabular AS algorithm can be obtained from the general computational complexity analysis of the tabulation algorithm
time complexity is $O(|w|^5)$
space complexity is $O(|w|^3)$
Observe there is no dependence on grammar size, since the approach is grammarless
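The same bounds can also be seen by a direct count, given here as an informal check rather than as the general analysis the slides refer to. Write $n = |w|$. An item $[j, h, i]$ is determined by three indices, and an instance of the la or ra rule is determined by the five indices $k, h_1, j, h_2, i$, so

```latex
\[
\#\{\text{items } [j, h, i]\} \le (n+1)^3 = O(n^3),
\qquad
\#\{\text{rule instances}\} \le 2\,(n+1)^5 = O(n^5).
\]
```

Assuming each rule instance is processed in constant time (for example with a hash table holding $T$), the first bound gives the space and the second the time.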
Complete Computations
We use a context-free grammar $G_w$ to generate the set of all complete computations of $S_{AS}$ on $w$, viewed as strings
$G_w$ is constructed from the table $T$ produced in a run of Algorithm 6
The methodology is the same one we used for shared and packed parse forests, but here we are interested in strings, not trees
Complete Computations
The nonterminals of $G_w$ are the items $[j, h, i]$ used by the tabular algorithm
The terminals of $G_w$ are of three types
sh(i), la(i → j), ra(i → j)
The productions of $G_w$ can be built simultaneously with the run of Algorithm 6, through the three steps that follow
Complete Computations
For each la deduction rule
$$\frac{[k, h_1, j] \quad [j, h_2, i]}{[k, h_2, i]}$$
applied in the construction of $T$, add to $G_w$ the production
$[k, h_2, i] \to [k, h_1, j] \; [j, h_2, i] \; \text{la}(h_2 \to h_1)$
Complete Computations
For each ra deduction rule
$$\frac{[k, h_1, j] \quad [j, h_2, i]}{[k, h_1, i]}$$
applied in the construction of $T$, add to $G_w$ the production
$[k, h_1, i] \to [k, h_1, j] \; [j, h_2, i] \; \text{ra}(h_1 \to h_2)$
Complete Computations
For each sh deduction rule
$$\frac{}{[i-1, i, i]}$$
applied in the construction of $T$, add to $G_w$ the production
$[i-1, i, i] \to \text{sh}(i)$
Complete Computations
Usual implementation : for each item $I$ added to $T$, store the list $L(I)$ consisting of all pairs of items that have been used in the deduction of $I$ itself
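The three construction steps and the stored pairs can be combined in one pass, as in the sketch below. It is an illustration under assumed names (`build_grammar`, `strings`): productions are kept per item, terminals are plain strings such as "la(3->2)", and the item (0, 1, n) is taken as the start symbol because complete computations end with only node 1 on the stack.

```python
from collections import defaultdict

def build_grammar(n):
    """Productions of G_w: item -> set of right-hand sides (tuples of symbols)."""
    prods = defaultdict(set)
    ending_at = defaultdict(set)           # i -> processed items with right index i
    for i in range(1, n + 1):
        prods[(i - 1, i, i)].add((f"sh({i})",))
        agenda = [(i - 1, i, i)]
        while agenda:
            item = agenda.pop()
            if item in ending_at[i]:
                continue
            ending_at[i].add(item)
            j, h2, _ = item
            for left in ending_at[j]:      # left premise (k, h1, j), with j < i
                k, h1, _ = left
                prods[(k, h2, i)].add((left, item, f"la({h2}->{h1})"))
                prods[(k, h1, i)].add((left, item, f"ra({h1}->{h2})"))
                agenda.append((k, h2, i))
                agenda.append((k, h1, i))
    return prods

def strings(symbol, prods):
    """Yield every terminal string (as a tuple) derivable from an item."""
    for rhs in prods.get(symbol, ()):
        if len(rhs) == 1:                  # sh production
            yield rhs
        else:
            left, right, arc = rhs
            for a in strings(left, prods):
                for b in strings(right, prods):
                    yield a + b + (arc,)

prods = build_grammar(4)
computations = set(strings((0, 1, 4), prods))
first = ("sh(1)", "sh(2)", "sh(3)", "la(3->2)", "sh(4)", "ra(3->4)", "ra(1->3)")
second = ("sh(1)", "sh(2)", "sh(3)", "sh(4)", "ra(3->4)", "la(3->2)", "ra(1->3)")
print(first in computations, second in computations)  # True True
```

Enumerating all strings is exponential in general; in practice one would instead run a dynamic program over the productions to pick the best-scoring computation.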
Computational Complexity
Grammar $G_w$ can be built in time and space $O(|w|^5)$
Grammar Gw is already reduced
M. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA.
Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682, Portland, OR, USA.
Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the Eighth International Workshop on Parsing Technologies (IWPT), pages 149–160, Nancy, France.
Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Workshop on Incremental Parsing: Bringing Engineering and Cognition Together, pages 50–57, Barcelona, Spain.
Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553.