Grammar Engineering: Parsing with HPSG Grammars


Page 1: Grammar Engineering: Parsing with HPSG Grammars

Grammar Engineering:

Parsing with HPSG Grammars

Miguel Hormazábal

Page 2: Grammar Engineering: Parsing with HPSG Grammars

Overview

The Parsing Problem

Parsing with constraint-based grammars

Advantages and drawbacks

Three different approaches

Page 3: Grammar Engineering: Parsing with HPSG Grammars

The Parsing Problem

Given a Grammar and a Sentence,

Can the grammar < S, Θ > generate the input string, or does it rule it out?

A candidate sentence must satisfy all the principles of the Grammar

Coreferences as main explanatory mechanism in HPSG

Page 4: Grammar Engineering: Parsing with HPSG Grammars

Parsing with Constraint-based Grammars

Object-based formalism:

Complex specifications on signs

Structure sharing imposed by the theory

Feature Structures:

Sort-resolved and well-typed

Multiple information levels (PHON, SYNSEM)

Universal / Language specific principles to be met

Page 5: Grammar Engineering: Parsing with HPSG Grammars

Advantages and Drawbacks

Pros:

A common formalism for all levels of linguistic Information

All information simultaneously available

Cons:

Hard to modularize

Computational overhead for parser

Page 6: Grammar Engineering: Parsing with HPSG Grammars

1st Approach: Distributed Parsing

Two kinds of constraints:

Genuine: syntactic; they work as filters on the input

Spurious: semantic; they build representational structures

The parser cannot distinguish between analytical and structure-building constraints

VERBMOBIL implementation:

Input: word lattices of speech recognition hypotheses

The parser identifies those paths that form acceptable utterances

Lattices can contain hundreds of hypotheses, most of them ungrammatical

Goal: Distribute the labour of evaluating the constraints in the grammar over several processes

Page 7: Grammar Engineering: Parsing with HPSG Grammars

Distributed Parsing

Analysis strategy: two parser units

SYN-Parser:

Works directly with word lattices

Acts as a filter for the SEM-Parser

SEM-Parser:

Works only with successful analysis results

Runs under the control of the SYN-Parser

Page 8: Grammar Engineering: Parsing with HPSG Grammars

Distributed Parsing

Processing requirements:

Incrementality: The SYN-Parser must not hold back its results until it has a complete analysis, which would force the SEM-Parser to wait

Interactivity: The SEM-Parser must report back to the SYN-Parser when one of its hypotheses fails

Efficient communication system between the parsers, based on the common grammar

Page 9: Grammar Engineering: Parsing with HPSG Grammars

Distributed Parsing

[Figure: Centralized Parsing compared with Distributed Parsing]

Page 10: Grammar Engineering: Parsing with HPSG Grammars

Distributed Parsing

Bottom-Up Hypotheses: emitted by the SYN-Parser and sent to the SEM-Parser for semantic verification

Top-Down Hypotheses: emitted by the SEM-Parser; failures are reported back to the SYN-Parser

Completion History:

C-hist(NP-DET-N) := ((DET t0 t1) (N t'1 t2))

C-hist(DET) := (("the" t0 t1))

C-hist(N) := (("example" t'1 t2))
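The completion histories above can be read as a table from each completed constituent to the daughters that completed it. The following is a minimal sketch under stated assumptions: the time points t0, t1, t2 are modelled as the integers 0, 1, 2 (the primed variant is not distinguished), and the names `C_HIST` and `yield_of` are hypothetical, not taken from the VERBMOBIL system.

```python
# Completion histories as a mapping from a completed edge to its daughters.
# Assumption: time points t0, t1, t2 are represented as 0, 1, 2.
C_HIST = {
    ("NP-DET-N", 0, 2): [("DET", 0, 1), ("N", 1, 2)],
    ("DET", 0, 1): [("the", 0, 1)],
    ("N", 1, 2): [("example", 1, 2)],
}


def yield_of(edge):
    """Follow completion histories down to the words an edge covers."""
    daughters = C_HIST.get(edge)
    if daughters is None:
        # No recorded history: the edge is a word from the lattice.
        return [edge[0]]
    words = []
    for daughter in daughters:
        words.extend(yield_of(daughter))
    return words


print(yield_of(("NP-DET-N", 0, 2)))
```

Because a history names only categories and time points, a parser receiving such a hypothesis can replay the derivation against its own subgrammar without being sent the full feature structures.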

Page 11: Grammar Engineering: Parsing with HPSG Grammars

Distributed Parsing

Compilation of Subgrammars

From common source Grammar,

Straightforward option: split up the Grammar into syntax and semantics strata

Manipulating grammar rules and lexical entries to obtain: Gsyn and Gsem
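One way to picture the split into Gsyn and Gsem is as a projection of each source entry onto two disjoint sets of features. The sketch below is illustrative only: it assumes feature structures are nested dictionaries, that syntactic information sits under a `CAT` feature and semantic information under `CONT`, and the function name `split_entry` is hypothetical. The actual compilation step is likely more involved.

```python
import copy


def split_entry(entry, syn_feats=("CAT",), sem_feats=("CONT",)):
    """Derive a Gsyn entry and a Gsem entry from one source entry.

    Assumption: features under SYNSEM can be assigned cleanly to
    either the syntactic or the semantic stratum.
    """
    def project(keep):
        out = copy.deepcopy(entry)  # leave the source grammar untouched
        out["SYNSEM"] = {f: v for f, v in out["SYNSEM"].items() if f in keep}
        return out

    return project(syn_feats), project(sem_feats)


entry = {
    "PHON": "example",
    "SYNSEM": {"CAT": {"HEAD": "noun"}, "CONT": {"RELN": "example"}},
}
g_syn, g_sem = split_entry(entry)
print(g_syn)
print(g_sem)
```

Both derived entries keep `PHON`, so results from the two parsers can still be matched on the same input words.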

Page 12: Grammar Engineering: Parsing with HPSG Grammars

2nd Approach: Data-Oriented Parsing

Main goal: domain adaptation to improve the efficiency of HPSG parsing

Assumption: exploiting the frequency and plausibility of linguistic structures within a given domain yields better results

DOP processes new input by combining structure fragments from a treebank

DOP allows probabilities to be assigned to arbitrarily large syntactic constructions

Page 13: Grammar Engineering: Parsing with HPSG Grammars

Data-Oriented Parsing

Procedure:

Parse all sentences from a training corpus using HPSG Grammar and Parser

Automatic acquisition of a stochastic lexicalized tree grammar (SLTG)

Each parse tree is decomposed into a set of subtrees.

Assignment of probabilities to each subtree

Page 14: Grammar Engineering: Parsing with HPSG Grammars

Data-Oriented Parsing

Implementation using the LKB, a unification-based grammar, parsing and generation platform

First parse each sentence of the training corpus

The resulting Feature Structure contains the parse tree

Each non-terminal node contains the label of the HPSG-rule schema applied

Each terminal node contains the lexical type of the corresponding feature structure

After this, each parse tree is further processed

Page 15: Grammar Engineering: Parsing with HPSG Grammars

Data-Oriented Parsing

1. Decomposition, two operations:

Root creates ‘passive’ (closed, complete) fragments by extracting substructures

Frontier creates ‘active’ (open, incomplete) fragments by deleting pieces of substructure

Each non-head subtree is cut off, and the cutting point is marked for substitution.
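The last point, cutting off each non-head subtree and marking the cutting point for substitution, can be sketched as a recursive walk along the head spine. This is a simplified illustration, not the implementation from the cited work: the `Node` class, the `head` index, and the `*` marker for substitution sites are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)
    head: int = 0        # index of the head daughter (assumed to be known)
    subst: bool = False  # True marks a substitution site


def decompose(tree):
    """Return the head spine of `tree` with non-head subtrees cut off,
    followed by the fragments of every cut-off subtree."""
    fragments = []

    def spine(node):
        if not node.children:
            return Node(node.label)
        kids = []
        for i, child in enumerate(node.children):
            if i == node.head:
                kids.append(spine(child))
            else:
                # Cutting point: keep only the label, marked for substitution.
                kids.append(Node(child.label, subst=True))
                fragments.extend(decompose(child))
        return Node(node.label, kids, node.head)

    fragments.insert(0, spine(tree))
    return fragments


def show(node):
    """Render a fragment as a bracketed string; '*' marks substitution sites."""
    if node.subst:
        return node.label + "*"
    if not node.children:
        return node.label
    return "(" + node.label + " " + " ".join(show(c) for c in node.children) + ")"


tree = Node("S", [
    Node("NP", [Node("DET", [Node("the")]), Node("N", [Node("dog")])], head=1),
    Node("VP", [Node("V", [Node("barks")])]),
], head=1)

fragments = decompose(tree)
for fragment in fragments:
    print(show(fragment))
```

Each resulting fragment keeps exactly one lexical anchor at the bottom of its head spine, which is what makes the derived tree grammar lexicalized.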

Page 16: Grammar Engineering: Parsing with HPSG Grammars

Data-Oriented Parsing

2. Specialization

Rule labels of the root node and substitution nodes are replaced with a corresponding category label.

Example: signs with a local.cat.head value of type noun and the empty list as the value of the local.cat.val.subj feature are classified as NPs.

3. Probability

Count the total number n of all trees with the same root label α

Divide the frequency m of a tree t with root α by n: $p(t) = m / n$

The probabilities of all trees $t_i$ with root α sum to 1:

$\sum_{t_i : \mathrm{root}(t_i) = \alpha} p(t_i) = 1$
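The relative-frequency estimate can be checked on a toy sample. The sketch below assumes each extracted tree is identified by a (root label, tree identifier) pair; the identifiers and the function name `tree_probabilities` are made up for illustration. Exact fractions are used so that the sum-to-one property is visible without rounding.

```python
from collections import Counter
from fractions import Fraction


def tree_probabilities(trees):
    """trees: one (root_label, tree_id) pair per observed tree occurrence."""
    freq = Counter(trees)                              # m for each distinct tree
    root_total = Counter(root for root, _ in trees)    # n for each root label
    return {t: Fraction(m, root_total[t[0]]) for t, m in freq.items()}


# Hypothetical observations: three NP-rooted trees and one S-rooted tree.
observed = [
    ("NP", "np-the-N"),
    ("NP", "np-the-N"),
    ("NP", "np-a-N"),
    ("S", "s-NP-VP"),
]
p = tree_probabilities(observed)
for tree, prob in p.items():
    print(tree, prob)
```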

Page 17: Grammar Engineering: Parsing with HPSG Grammars

Data-Oriented Parsing

This implementation for the VerbMobil project uses a chart-based, agenda-driven bottom-up parser

Step 1: Selection of a set of SLTG-trees associated with the lexical items in the input sentence

Step 2: Parsing of the sentence with respect to this set.

Step 3: Each SLTG-parse tree is “expanded” by unifying the feature constraints into the parse trees

If successful, the result is a complete, valid feature structure

Otherwise, the next most likely tree is expanded
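Step 3 amounts to trying candidate trees in order of decreasing probability until one survives unification. A minimal sketch, assuming candidates arrive as (probability, tree) pairs and that an `expand` function returns a feature structure or `None` on unification failure; `first_valid_parse` and the toy agreement check `toy_expand` are hypothetical stand-ins for the real expansion step.

```python
def first_valid_parse(candidates, expand):
    """Expand candidate trees from most to least probable.

    Returns (tree, feature_structure) for the first tree whose expansion
    succeeds, or None if every candidate fails.
    """
    for _prob, tree in sorted(candidates, key=lambda c: c[0], reverse=True):
        fs = expand(tree)
        if fs is not None:
            return tree, fs
    return None


def toy_expand(tree):
    """Stand-in for unification: succeeds only if subject and verb agree."""
    if tree["subj"] == tree["verb"]:
        return {"AGR": tree["subj"]}
    return None


candidates = [
    (0.6, {"id": "t1", "subj": "pl", "verb": "sg"}),  # most likely, fails
    (0.4, {"id": "t2", "subj": "pl", "verb": "pl"}),  # next, succeeds
]
result = first_valid_parse(candidates, toy_expand)
print(result)
```

The costly feature constraints are applied only to trees that the cheaper stochastic tree grammar already ranks highly, which is where the efficiency gain is expected to come from.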

Page 18: Grammar Engineering: Parsing with HPSG Grammars

3rd Approach: Probabilistic CFG Parsing

Main goal: obtain the Viterbi parse (the parse with the highest probability) given an HPSG and a probabilistic model

One way:

Parse the input without using probabilities

Then select the most probable parse by examining every result

Cost: exponential search space

This approach:

Define an equivalence class function (feature structure reduction)

Integrate semantic and syntactic preferences into figures of merit (FOMs)

Page 19: Grammar Engineering: Parsing with HPSG Grammars

Probabilistic CFG Parsing

Probabilistic Model:

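HPSG Grammar: $G = \langle L, R \rangle$, where

$L = \{\, l = \langle w, F \rangle \mid w \in W,\ F \in \mathcal{F} \,\}$ is a set of lexical entries

R is a set of grammar rules, i.e., each $r \in R$ is a partial function:

$\mathcal{F} \times \mathcal{F} \to \mathcal{F}$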
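The definition of a rule as a partial function can be made concrete by letting a rule return `None` where it is undefined. This is a toy sketch: feature structures are reduced to flat dictionaries with a single `CAT` feature, and the names `LEXICON`, `det_noun_rule`, and `apply_rules` are assumptions for illustration rather than part of the cited model.

```python
from typing import Optional

FS = dict  # assumption: feature structures as plain dictionaries

# L: lexical entries as (word, feature structure) pairs.
LEXICON = [("the", {"CAT": "det"}), ("example", {"CAT": "noun"})]


def det_noun_rule(left: FS, right: FS) -> Optional[FS]:
    """A rule r: F x F -> F, defined only for a determiner followed by a noun."""
    if left.get("CAT") == "det" and right.get("CAT") == "noun":
        return {"CAT": "np", "DTRS": [left, right]}
    return None  # the partial function is undefined for this pair


# R: the set of grammar rules.
RULES = [det_noun_rule]


def apply_rules(left: FS, right: FS) -> list:
    """Collect the results of every rule that is defined on (left, right)."""
    results = []
    for rule in RULES:
        out = rule(left, right)
        if out is not None:
            results.append(out)
    return results


det_fs, noun_fs = LEXICON[0][1], LEXICON[1][1]
print(apply_rules(det_fs, noun_fs))
print(apply_rules(noun_fs, det_fs))
```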

Page 20: Grammar Engineering: Parsing with HPSG Grammars

Probabilistic CFG Parsing

Probabilistic HPSG:

The probability p(F | w) of a feature structure F assigned to a given sentence w is defined in terms of:

λi, a model parameter,

si, a fragment of a feature structure, and

σ(si, F), a function giving the number of appearances of the fragment si in F

Probabilities represent syntactic/semantic preferences expressed in a Feature Structure
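The defining formula itself did not survive in this transcript. Given that the λi are model parameters weighting the fragment counts σ(si, F), a log-linear form is the natural reading. The following is a reconstruction under that assumption, not a quotation from the slides; the normalization constant $Z_w$ is introduced here, with $F'$ ranging over the feature structures the grammar assigns to w.

```latex
p(F \mid w) = \frac{1}{Z_w} \exp\Big( \sum_i \lambda_i \, \sigma(s_i, F) \Big),
\qquad
Z_w = \sum_{F'} \exp\Big( \sum_i \lambda_i \, \sigma(s_i, F') \Big)
```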

Page 21: Grammar Engineering: Parsing with HPSG Grammars

Probabilistic CFG Parsing

Implementation:

Iterative CYK parsing algorithm

Pruning of edges during parsing

The best N parses are tracked

Reduced feature structures through equivalence classes:

The reduction must neither overgenerate nor undergenerate

FOMs computed with the reduced feature structures are equivalent to the original ones

The parser calculates the Viterbi parse by taking the maximum of the probabilities of the same nonterminal symbol at each point
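The Viterbi step, keeping only the maximum probability per nonterminal in each chart cell, can be shown with a plain CYK parser over a PCFG in Chomsky normal form. This sketch deliberately swaps in an ordinary PCFG for the reduced feature structures and omits pruning and N-best tracking; the toy grammar and the function name `viterbi_cyk` are assumptions made for the example.

```python
from collections import defaultdict


def viterbi_cyk(words, lexical, binary, start="S"):
    """lexical: {(A, word): prob}; binary: {(A, B, C): prob} for A -> B C.

    Returns (probability, tree) of the most probable parse, or None.
    """
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (prob, backpointer)
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > best[(i, i + 1)].get(a, (0.0, None))[0]:
                best[(i, i + 1)][a] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        prob = p * best[(i, k)][b][0] * best[(k, j)][c][0]
                        # Viterbi step: keep only the maximum for symbol a.
                        if prob > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (prob, (k, b, c))

    def build(i, j, a):
        _, back = best[(i, j)][a]
        if isinstance(back, str):  # lexical cell: backpointer is the word
            return (a, back)
        k, b, c = back
        return (a, build(i, k, b), build(k, j, c))

    if start not in best[(0, n)]:
        return None
    return best[(0, n)][start][0], build(0, n, start)


# Hypothetical toy grammar.
lexical = {("DET", "the"): 1.0, ("N", "dog"): 0.5, ("VP", "barks"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("NP", "DET", "N"): 1.0}
parse = viterbi_cyk(["the", "dog", "barks"], lexical, binary)
print(parse)
```

Because each cell stores one entry per symbol, the chart stays polynomial in sentence length even when the number of full parses is exponential.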

Page 22: Grammar Engineering: Parsing with HPSG Grammars

Assessment

The three approaches attempt to make the parsing process more efficient

Distributed Parsing:

Unification and copying are faster

Soundness of the grammar is affected: L(G) ⊂ L(Gsyn) ∩ L(Gsem)

DO Parsing:

Fragments at the right level of generality

Straightforward probability computation

PCFG Parsing:

Highly efficient CYK parsing implementation through reduced feature structures and edge pruning

Page 23: Grammar Engineering: Parsing with HPSG Grammars

References

Pollard, C. and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.

Richter, F. (2004b). A Web-based Course in Grammar Formalisms and Parsing. Textbook, MiLCA project A4, SfS, Universität Tübingen. http://milca.sfs.uni-tuebingen.de/A4/Course/PDF/gramandpars.pdf

Levine, R. and Meurers, D. (2006). Head-Driven Phrase Structure Grammar: Linguistic Approach, Formal Foundations, and Computational Realization. In Keith Brown (Ed.), Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier.

Diagne, A. K., Kasper, W., and Krieger, H.-U. (1995). Distributed Parsing With HPSG Grammars. In Proceedings of the 4th International Workshop on Parsing Technologies, IWPT-95, pages 79–86.

Neumann, G. (2002). HPSG-DOP: Data-Oriented Parsing with HPSG. Unpublished manuscript, presented at the 9th International Conference on HPSG, HPSG-2002, Seoul, South Korea.

Tsuruoka, Y., Miyao, Y., and Tsujii, J. (2003). Towards efficient probabilistic HPSG parsing: integrating semantic and syntactic preference to guide the parsing. In Proceedings of the IJCNLP-04 Workshop: Beyond shallow analyses - Formalisms and statistical modeling for deep analyses.