Grammar Engineering: Parsing with HPSG Grammars

Miguel Hormazábal

  • Grammar Engineering: Parsing with HPSG Grammars

    Miguel Hormazábal

  • Overview

    The Parsing Problem

    Parsing with constraint-based grammars

    Advantages and drawbacks

    Three different approaches

  • The Parsing Problem

    Given a Grammar and a Sentence,

    Can the grammar generate, or rule out, the input string?

    A candidate sentence must satisfy all the principles of the Grammar

    Coreferences as main explanatory mechanism in HPSG

  • Parsing with Constraint-based Grammars

    Object-based formalism

    Complex specifications on signs

    Structure sharing imposed by the theory

    Feature structures: sort-resolved and well-typed, with multiple information levels (PHON, SYNSEM)

    Universal / Language specific principles to be met

  • Advantages and Drawbacks

    Pros:

    A common formalism for all levels of linguistic information

    All information simultaneously available

    Cons:

    Hard to modularize

    Computational overhead for the parser

  • 1st Approach: Distributed Parsing

    Two kinds of constraints:

    Genuine: syntactic; they work as filters on the input

    Spurious: semantic; they build representational structures

    Parser cannot distinguish between analytical and structure-building constraints

    VERBMOBIL implementation:

    Input: word lattices of speech-recognition hypotheses

    The parser identifies the paths that form acceptable utterances

    Lattices can contain hundreds of hypotheses, most of them ungrammatical

    Goal: distribute the labour of evaluating the grammar's constraints across several processes

  • Distributed Parsing

    Analysis strategy: two parser units

    SYN-Parser:

    Works directly with word lattices

    Acts as a filter for the SEM-Parser

    SEM-Parser:

    Works only with successful analysis results

    Operates under the control of the SYN-Parser

  • Distributed Parsing

    Processing requirements:

    Incrementality: the SYN-Parser must not wait for a complete analysis before sending results, which would force the SEM-Parser to wait

    Interactivity: failed hypotheses must be reported back to the SYN-Parser

    Efficient communication system between the parsers, based on the common grammar

  • Distributed Parsing

    [Diagram comparing centralized parsing with distributed parsing]

  • Distributed Parsing

    Bottom-up hypotheses: emitted by the SYN-Parser and sent to the SEM-Parser for semantic verification

    Top-down hypotheses: emitted by the SEM-Parser; failures are reported back to the SYN-Parser

    Completion history:

    C-hist(NP-DET-N) := ((DET t0 t1) (N t1 t2))
    C-hist(det) := ((the t0 t1))
    C-hist(N) := ((example t1 t2))
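
    A minimal sketch of the completion history as a synchronization structure, with feature structures deliberately kept out of the messages. All names and types are illustrative, not taken from the VERBMOBIL code:

      # Completion histories map a constituent (or rule instance) to the
      # timed daughter spans that completed it; the parsers synchronize on
      # these instead of exchanging full feature structures.
      from typing import Dict, List, Tuple

      Span = Tuple[str, int, int]            # (label, start time, end time)
      CompletionHistory = Dict[str, List[Span]]

      c_hist: CompletionHistory = {
          "NP-DET-N": [("DET", 0, 1), ("N", 1, 2)],
          "det": [("the", 0, 1)],
          "N": [("example", 1, 2)],
      }

      def covers(entry: List[Span], start: int, end: int) -> bool:
          """Check that the daughter spans are adjacent and cover [start, end)."""
          pos = start
          for _label, s, e in entry:
              if s != pos:
                  return False
              pos = e
          return pos == end

      assert covers(c_hist["NP-DET-N"], 0, 2)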

  • Distributed Parsing

    Compilation of Subgrammars

    From a common source grammar:

    Straightforward option: split up the Grammar into syntax and semantics strata

    Manipulate grammar rules and lexical entries to obtain Gsyn and Gsem (a sketch follows below)
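
    A minimal sketch of one naive way to compile the two subgrammars, assuming feature structures are modeled as nested dicts and that semantic information lives under an illustrative CONT feature; the stratification in the actual paper is more refined:

      import copy

      def strip_features(fs: dict, banned: tuple) -> dict:
          """Copy a feature structure, dropping the named features at every level."""
          out = {}
          for feat, val in fs.items():
              if feat in banned:
                  continue                   # constraint removed, hence unchecked
              out[feat] = strip_features(val, banned) if isinstance(val, dict) else val
          return out

      rule = {
          "PHON": "...",
          "SYNSEM": {
              "CAT": {"HEAD": "noun", "SUBJ": []},    # syntactic constraints
              "CONT": {"RELN": "example-rel"},        # semantic structure building
          },
      }

      g_syn_rule = strip_features(rule, banned=("CONT",))  # filter-oriented Gsyn
      g_sem_rule = copy.deepcopy(rule)                     # structure-building Gsem

    Dropping constraints means each subgrammar accepts at least as much as the full grammar, which is exactly the soundness issue discussed in the Assessment slide.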

  • 2nd Approach: Data-Oriented Parsing

    Main goal: achieve domain adaptation to improve the efficiency of HPSG parsing

    Assumption: exploiting the frequency and plausibility of linguistic structures within a certain domain will render better results

    DOP processes new input by combining structure fragments from a treebank

    DOP allows probabilities to be assigned to arbitrarily large syntactic constructions

  • Data-Oriented Parsing

    Procedure:

    Parse all sentences from a training corpus using HPSG Grammar and Parser

    Automatic acquisition of a stochastic lexicalized tree grammar (SLTG)

    Each parse tree is decomposed into a set of subtrees.

    Assignment of probabilities to each subtree

  • Data-Oriented Parsing

    Implementation on the LKB, a unification-based grammar development, parsing, and generation platform

    First, parse each sentence of the training corpus

    The resulting feature structure contains the parse tree

    Each non-terminal node carries the label of the HPSG rule schema that was applied

    Each terminal node carries the lexical type of the corresponding feature structure

    After this, each parse tree is further processed

  • Data-Oriented Parsing

    1. Decomposition, two operations:

    Root creates passive (closed, complete) fragments by extracting substructures

    Frontier creates active (open, incomplete) fragments by deleting pieces of substructure

    Each non-head subtree is cut off, and the cutting point is marked for substitution.
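
    A minimal sketch of the head-driven cutting step, with an illustrative tree encoding; it conflates the Root and Frontier operations into a single pass and ignores the feature-structure pointers of the real system:

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class Node:
          label: str                     # rule-schema or lexical-type label
          children: List["Node"] = field(default_factory=list)
          is_head: bool = False          # set by the head feature principle
          subst: bool = False            # marks a cutting point

      def decompose(node: Node, fragments: List[Node]) -> Node:
          """Cut off every non-head subtree; collect the cut pieces as fragments."""
          new_children = []
          for child in node.children:
              if child.is_head or not child.children:  # keep heads and lexical anchors
                  new_children.append(decompose(child, fragments))
              else:
                  fragments.append(decompose(child, fragments))
                  new_children.append(Node(child.label, subst=True))  # substitution site
          return Node(node.label, new_children, node.is_head)

      # Example: the DET subtree of an NP is cut off and marked for substitution.
      np = Node("NP", [Node("DET", [Node("the")]),
                       Node("N", [Node("example")], is_head=True)])
      fragments: List[Node] = []
      root_fragment = decompose(np, fragments)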

  • Data-Oriented Parsing

    2. Specialization: rule labels of the root node and substitution nodes are replaced with a corresponding category label. Example: signs whose local.cat.head value is of type noun and whose local.cat.val.subj feature is the empty list are classified as NPs.

    3. Probability: count the total number n of all trees with the same root label a; divide the frequency m of a tree t with root a by n to obtain p(t) = m / n. The probabilities of all trees t_i with the same root label sum to one: Σ_{t_i : root(t_i) = a} p(t_i) = 1.
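
    A small sketch of the relative-frequency estimate p(t) = m / n, grouping extracted trees by root label (the tree identities are illustrative strings):

      from collections import Counter

      extracted = [
          ("NP", "NP(DET,N)"), ("NP", "NP(DET,N)"), ("NP", "NP(N)"),
          ("VP", "VP(V,NP)"),
      ]

      freq = Counter(extracted)                            # m per tree
      per_root = Counter(root for root, _ in extracted)    # n per root label
      p = {tree: m / per_root[tree[0]] for tree, m in freq.items()}

      # Trees with the same root label form a probability distribution:
      assert abs(sum(v for (root, _), v in p.items() if root == "NP") - 1.0) < 1e-9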

  • Data-Oriented Parsing

    This implementation, built for the VerbMobil project, uses a chart-based, agenda-driven, bottom-up parser

    Step 1: selection of the set of SLTG trees associated with the lexical items in the input sentence

    Step 2: parsing of the sentence with respect to this set

    Step 3: each SLTG parse tree is expanded by unifying the feature constraints into the parse tree

    If unification succeeds, the result is a complete valid feature structure; otherwise the next most likely tree is expanded
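
    A minimal sketch of this fallback loop; unify_constraints is a hypothetical stand-in for the real expansion step of the system:

      from typing import Optional

      def unify_constraints(tree) -> Optional[dict]:
          """Placeholder: unify the HPSG source grammar's feature constraints
          into an SLTG parse tree; None signals unification failure."""
          ...

      def expand_best(trees_with_probs) -> Optional[dict]:
          # Try candidate SLTG parse trees in order of decreasing probability.
          for tree, _prob in sorted(trees_with_probs, key=lambda tp: -tp[1]):
              fs = unify_constraints(tree)
              if fs is not None:
                  return fs          # first successfully expanded tree wins
          return None                # all candidates rejected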

  • 3rd Approach: Probabilistic CFG Parsing

    Main goal: obtain the Viterbi parse (the highest-probability parse) given an HPSG and a probabilistic model

    One way: parse the input without using probabilities, then select the most probable parse by inspecting every result. Cost: an exponential search space

    This approach:

    Define an equivalence-class function (feature-structure reduction)

    Integrate semantic and syntactic preferences into figures of merit (FOMs)

  • Probabilistic CFG Parsing

    Probabilistic Model:

    HPSG grammar: G = ⟨L, R⟩, where L = { l = ⟨w, F⟩ | w ∈ W, F ∈ ℱ } is the set of lexical entries (W the set of words, ℱ the set of feature structures)

    R is a set of grammar rules, i.e., each r ∈ R is a partial function ℱ × ℱ → ℱ
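
    A minimal sketch of a rule as a partial function over feature structures: it yields a mother sign for some argument pairs and is simply undefined (None below) for all others. The entries are invented for illustration, not taken from any actual grammar:

      from typing import Optional

      FS = dict   # feature structures as plain dicts, for illustration

      def head_complement_rule(head: FS, comp: FS) -> Optional[FS]:
          """Defined only when the head's COMPS requirement matches the
          complement's category; otherwise the rule does not apply."""
          if head.get("COMPS") != comp.get("CAT"):
              return None                           # outside the rule's domain
          return {"CAT": head["CAT"], "COMPS": []}  # saturated mother sign

      has = {"CAT": "V", "COMPS": "VP"}
      come = {"CAT": "VP", "COMPS": []}

      assert head_complement_rule(has, come) == {"CAT": "V", "COMPS": []}
      assert head_complement_rule(come, has) is None   # undefined for this pair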

  • Probabilistic CFG Parsing

    Probabilistic HPSG:

    Probability p(F | w) of assigning feature structure F to a given sentence w (a log-linear model):

    p(F | w) = (1 / Z_w) exp( Σ_i λ_i σ(s_i, F) )

    where λ_i is a model parameter, s_i is a fragment of a feature structure, σ(s_i, F) is the number of appearances of fragment s_i in F, and Z_w is a normalizing constant

    Probabilities represent syntactic/semantic preferences expressed in a Feature Structure
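
    A small sketch of how such a log-linear model scores candidate feature structures for one sentence; the fragment names and weights are invented for illustration:

      import math
      from collections import Counter

      weights = {"head_comp": 0.9, "subj_head": 0.4, "rare_extraction": -1.2}

      def log_score(fragment_counts: Counter) -> float:
          """Sum of lambda_i * sigma(s_i, F) over the fragments found in F."""
          return sum(weights.get(s, 0.0) * n for s, n in fragment_counts.items())

      # Two candidate feature structures, each reduced to fragment counts.
      candidates = [
          Counter({"head_comp": 2, "subj_head": 1}),
          Counter({"head_comp": 1, "rare_extraction": 1}),
      ]

      scores = [log_score(c) for c in candidates]
      z = sum(math.exp(s) for s in scores)           # normalizing constant Z_w
      probs = [math.exp(s) / z for s in scores]      # p(F | w) per candidate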

  • Probabilistic CFG Parsing

    Implementation: iterative CYK parsing algorithm

    Edges are pruned during parsing

    The best N parses are tracked

    Feature structures are reduced through equivalence classes

    The reduction must neither overgenerate nor undergenerate

    FOMs computed with the reduced feature structures are equivalent to those of the originals

    The parser computes the Viterbi parse by taking the maximum of the probabilities of edges with the same non-terminal symbol at each point
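
    A compact sketch of Viterbi CYK with per-cell pruning: each chart cell keeps only the best probability per (equivalence-class) symbol, which is what keeps the search tractable. Grammar, lexicon, and probabilities are invented:

      from collections import defaultdict

      def viterbi_cyk(words, lexicon, rules):
          """lexicon: word -> [(A, p)]; rules: (B, C) -> [(A, p)]."""
          n = len(words)
          chart = defaultdict(dict)      # (i, j) -> {symbol: best probability}
          for i, w in enumerate(words):
              for a, p in lexicon.get(w, []):
                  chart[(i, i + 1)][a] = max(chart[(i, i + 1)].get(a, 0.0), p)
          for span in range(2, n + 1):
              for i in range(n - span + 1):
                  j = i + span
                  for k in range(i + 1, j):
                      for b, pb in chart[(i, k)].items():
                          for c, pc in chart[(k, j)].items():
                              for a, pr in rules.get((b, c), []):
                                  p = pr * pb * pc
                                  if p > chart[(i, j)].get(a, 0.0):
                                      chart[(i, j)][a] = p   # Viterbi maximum
          return chart[(0, n)]

      lexicon = {"spring": [("NP", 1.0)], "has": [("AUX", 1.0)], "come": [("VP", 1.0)]}
      rules = {("AUX", "VP"): [("VP", 0.8)], ("NP", "VP"): [("S", 0.9)]}
      print(viterbi_cyk(["spring", "has", "come"], lexicon, rules))  # {'S': 0.72}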

  • Assessment

    The three approaches all attempt to make the parsing process more efficient

    Distributed Parsing: unification and copying become faster, but the soundness of the grammar is affected: L(G) ⊆ L(Gsyn) ∩ L(Gsem), and the inclusion may be proper

    Data-Oriented Parsing: fragments at the right level of generality; straightforward probability computation

    PCFG Parsing: a highly efficient CYK parsing implementation through reduced feature structures and edge pruning

  • References

    Pollard, C. and Sag, I. A. (1994). Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.

    Richter, F. (2004b). A Web-based Course in Grammar Formalisms and Parsing. Textbook, MiLCA project A4, SfS, Universität Tübingen. http://milca.sfs.uni-tuebingen.de/A4/Course/PDF/gramandpars.pdf

    Levine, R. and Meurers, D. (2006). Head-Driven Phrase Structure Grammar: Linguistic Approach, Formal Foundations, and Computational Realization. In Keith Brown (Ed.), Encyclopedia of Language and Linguistics, Second Edition. Oxford: Elsevier.

    Diagne, A. K., Kasper, W., and Krieger, H.-U. (1995). Distributed Parsing With HPSG Grammars. In Proceedings of the 4th International Workshop on Parsing Technologies, IWPT-95, pages 79-86.

    Neumann, G. (2002). HPSG-DOP: Data-Oriented Parsing with HPSG. Unpublished manuscript, presented at the 9th International Conference on HPSG, HPSG-2002, Seoul, South Korea.

    Tsuruoka, Y., Miyao, Y., and Tsujii, J. (2004). Towards Efficient Probabilistic HPSG Parsing: Integrating Semantic and Syntactic Preference to Guide the Parsing. In Proceedings of the IJCNLP-04 Workshop: Beyond Shallow Analyses - Formalisms and Statistical Modeling for Deep Analyses.

  • Speaker Notes

    In this presentation I am going to talk about parsing in the context of unification-based, constraint-based grammars.

    The parsing problem can be summarized as follows: given a grammar and a string, does the grammar predict the grammaticality or ungrammaticality of the string?

    As we already know, HPSG relies heavily on information stored in complex objects structured in terms of attribute-value pairs, which must satisfy the constraints of the grammar.

    Advantages and disadvantages of unification-based theories:

    They support a common formalism and data structures at all levels of linguistic description; the information encoded in all linguistic domains is simultaneously available.

    They are hard to fragment into modules, and the computational cost is high when a parser uses the complete descriptions.

    The basic premise of the first approach is to develop a more flexible way of using the HPSG formalism for parsing, instead of a parsing process with full informational power.

    Two kinds of constraints: genuine constraints, related to the well-formedness of the input, work as filters on the input; they are typically the syntactic constraints. Spurious constraints, related to the output for other components in the system, build representational structures; they are typically the semantic constraints.

    A parser cannot distinguish between analytical and structure-building constraints and the cost of these operations increases exponentially with the size of the structures analysed.

    In the context of the VERBMOBIL project, the parser input consists of word lattices of hypotheses from speech recognition. The parser has to identify the paths that form acceptable utterances. Lattices can contain several hundred hypotheses, most of them ungrammatical.

    The main point of this strategy is that one of the analysis processes works as a filter on the word lattices in order to reduce the search space. In this way the second component works only with successful analysis results. This means that the parsers do not actually work in parallel; the first is in control of the second, which is not directly exposed to the input lattices.

    Processing requirements: incrementality and interactivity. These imply efficient message exchange between the parsers. The basic idea is not to lose the gains from distributed parsing, which means that a communication system based on sharing analysis results as feature structures must be discarded (the only way to encode/decode them is through large strings).

    Bottom-up hypotheses are emitted by the SYN-Parser and sent to the SEM-Parser for verification at the semantic level. A bottom-up hypothesis describes a complete subtree built by the SYN-Parser. Whether a hypothesis is sent also depends on its score, because the SYN-Parser is a probabilistic chart parser (Viterbi? Earley? CYK?) that uses a statistical language model as an additional knowledge source.

    Top-down hypotheses result from the activity of the SEM-Parser; failures are reported back to the SYN-Parser by sending a hypothesis identifier.

    Completion history: the central data structure by which synchronization and communication between the parsers is achieved is the completion history.

    The main idea of the second approach is to achieve domain adaptation to improve the efficiency of HPSG processing.

    The assumption is that focusing on the frequency and plausibility of linguistic structures with regard to a certain domain will render better results.

    Data-Oriented Parsing (DOP) is based on the idea of processing new input by combining fragments (associated with probabilities) that are extracted from a treebank. In the simplest case these fragments are subparts of simple phrase-structure trees.

    DOP allows probabilities to be associated with arbitrarily large syntactic constructions.

    The basic idea of HPSG-DOP is to parse all sentences of a representative training corpus using an HPSG grammar and parser

    in order to automatically acquire from the parsing results a stochastic lexicalized tree grammar (SLTG) such that each resulting parse tree is recursively decomposed into a set of subtrees.

    The decomposition operation is guided by the head feature principle of HPSG. Each extracted tree is automatically lexically anchored, and each node label of the extracted tree compactly represents a set of relevant features by means of a simple symbol. For each extracted tree a frequency counter is maintained, which is used to estimate the tree's probability after all parse trees have been processed.

    The approach has been implemented on top of the LKB (Linguistic Knowledge Builder) system, a unification-based grammar development, parsing, and generation platform developed at CSLI as open-source software.

    Learning of an SLTG starts by parsing each sentence s_i of the training corpus with the source HPSG system. The resulting feature structure fs_i of each example also contains the parse tree pt_i, where each non-terminal node carries the label of the HPSG rule schema (e.g., the head-complement rule) that was applied during the corresponding derivation step, as well as a pointer to the feature structure of the corresponding sign.

    The label of each terminal node consists of the lexical type of the corresponding feature structure. Each parse tree pt_i is now processed by the following interleaved steps.

    The first step in processing the parse trees is decomposition, with two operations: Root creates passive (closed, complete) fragments by extracting substructures; Frontier produces active (open, incomplete) fragments by deleting pieces of substructure.

    Each parse tree is recursively decomposed into a set of subtrees such that each non-head subtree is cut off, and the cutting point is marked for substitution.

    Specialization: the root node as well as all substitution nodes of an extracted tree are further processed by replacing the rule label with a corresponding category label. The possible set of category labels is defined in the type hierarchy of the HPSG source grammar; they express equivalence classes for different phrasal signs.

    For example, phrasal signs whose value of the local.cat.head feature is of type noun, and whose value of the local.cat.val.subj feature is the empty list, are classified as NPs.

    Probability: for each extracted tree a frequency counter is maintained. After all parse trees of the training set have been decomposed and specialized, we estimate a tree's probability as follows: first we count the total number n of all trees which have the same root label a; then we divide the frequency m of a tree t with root a by n. This gives the probability p(t) = m / n.

    In doing so, the sum of the probabilities of all trees t_i with root a is 1 (i.e., Σ_{t_i : root(t_i) = a} p(t_i) = 1).

    Since we have only the substitution operation, the probability of a derivation is the product of the probabilities of the trees involved in that derivation, and the probability of a parse tree is the sum of the probabilities of all its derivations, as sketched below.
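
    A tiny numeric sketch of these two rules, with invented probabilities:

      from math import prod

      # Two derivations of the same parse tree, each listed as the
      # probabilities of the SLTG trees it substitutes together.
      derivations = [
          [0.5, 0.4],    # derivation 1: two fragments
          [0.2],         # derivation 2: one larger fragment
      ]

      p_derivations = [prod(d) for d in derivations]   # 0.2 and 0.2
      p_parse_tree = sum(p_derivations)                # 0.4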

    The two major steps of parsing are: tree selection, the selection of a set of SLTG trees associated with the lexical items in the input sentence; and tree combination, the parsing of the sentence with respect to this set.

    After parsing, each SLTG parse tree is expanded by unifying the feature constraints of the HPSG source grammar into the parse tree. If successful, this determines a complete valid feature structure with respect to the HPSG grammar (e.g., all agreement and semantic information). If unification fails, the SLTG parse tree is rejected, and the next most likely tree is expanded.

    The third approach presents a unified framework of parsing to obtain the Viterbi parse given an HPSG and its probabilistic model.

    It defines an equivalence-class function to reduce multiple feature structures to a single feature structure that yields the same resulting figure of merit (FOM).

    With this function, the parser can integrate semantic and syntactic preferences into FOMs during parsing, and reduce the search space by using the integrated FOMs.

    In mathematics, a partial function is a binary relation that associates each element of a set, sometimes called its domain, with at most one element of another (possibly the same) set, called its codomain. However, not every element of the domain has to be associated with an element of the codomain.

    In HPSG, a small number of schemata explain general grammatical constraints, while a large number of lexical entries express word-specific characteristics. Both schemata and lexical entries are represented by typed feature structures, and constraints represented by feature structures are checked with unification (for details, see Pollard and Sag, 1994).

    Figure 1 of that paper shows an example of HPSG parsing of the sentence "Spring has come". First, each of the lexical entries for "has" and "come" is unified with a daughter feature structure of the Head-Complement Schema. Unification provides the phrasal sign of the mother. The sign of the larger constituent is obtained by repeatedly applying schemata to lexical/phrasal signs. Finally, the parse result is output as a phrasal sign that dominates the entire sentence. A unification sketch follows below.
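
    A minimal sketch of constraint checking by unification, with nested dicts standing in for typed feature structures; the lexical entries are heavily simplified stand-ins, not the paper's actual signs:

      from typing import Optional

      def unify(a, b) -> Optional[object]:
          """Recursive unification over nested dicts; None signals a clash."""
          if not isinstance(a, dict) or not isinstance(b, dict):
              return a if a == b else None
          out = dict(a)
          for feat, val in b.items():
              if feat in out:
                  sub = unify(out[feat], val)
                  if sub is None:
                      return None        # feature clash: unification fails
                  out[feat] = sub
              else:
                  out[feat] = val
          return out

      # Head-Complement Schema, simplified: the head daughter's COMPS value
      # must unify with the complement daughter's sign.
      has = {"HEAD": "verb", "COMPS": {"HEAD": "verb", "VFORM": "base"}}
      come = {"HEAD": "verb", "VFORM": "base"}

      assert unify(has["COMPS"], come) is not None      # "has come" is licensed
      assert unify(has["COMPS"], {"HEAD": "noun"}) is None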

    Comparison of the three approaches, their possibilities and pros and cons, and how each of them relates to the HPSG of our course:

    Distributed Parsing: although unification and copying become faster (due to the smaller structures), the main drawback of this approach is that it no longer guarantees soundness; the subgrammars can accept input that is ruled out by the full grammar, because some constraints are neglected in the subgrammars.

    The intersection of the languages accepted by Gsem and Gsyn doesn't yield the language accepted by G.

    Data-Oriented Parsing has a number of attractions: (i) it is able to produce fragments at the right level of generality, so as to avoid both under- and over-generation, and to avoid the combinatory explosion of fragments which accompanies use of a Discard operation; (ii) the probability model (relative frequency estimation) is very simple and easy to understand; (iii) the model is probabilistically well behaved, in the sense of defining a genuine probability distribution over fragments and representations. Notice that this means it may in principle provide a general foundation for probabilistic HPSG.