An Extended GHKM Algorithm for Inducing λ-SCFG
Peng [email protected] University
Semantic Parsing
• Mapping a natural language (NL) sentence to its computable meaning representation (MR)
NL: Every boy likes a star
MR: [logical form not preserved; the slide labels its variables and predicates]
Motivation
• Common way: inducing a probabilistic grammar
– PCFG: Probabilistic Context-Free Grammar
– CCG: Combinatory Categorial Grammar
– SCFG: Synchronous Context-Free Grammar
• State of the art: SCFG + λ-calculus (λ-SCFG)
• Major challenge: grammar induction
– It is much harder to find the correspondence between an NL sentence and its MR than between two NL sentences
• SCFG rule extraction is well studied in machine translation (MT)
• GHKM is the most widely used algorithm
• We want to adapt GHKM to semantic parsing
• Experimental results show that we achieve state-of-the-art performance
Background
• State of the art: SCFG + λ-calculus (λ-SCFG)
• λ-calculus
– λ-expression: [example not preserved]
– β-conversion: bound variable substitution
– α-conversion: bound variable renaming
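The two conversions can be illustrated with a small sketch. This is not the representation used in the talk: the `Var`, `Lam`, and `App` classes and the generated fresh names (such as `y_0`) are assumptions made for illustration, and predicates like `boy` are simply modelled as variables applied to arguments.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical term representation (an assumption, not the talk's data structure).
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    param: str
    body: object

@dataclass(frozen=True)
class App:
    func: object
    arg: object

_fresh = count()  # source of new variable names; assumed not to clash with user names

def free_vars(t):
    """Names that occur free in term t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.param}
    return free_vars(t.func) | free_vars(t.arg)

def alpha(lam, new_name):
    """α-conversion: rename the bound variable of `lam` (new_name assumed fresh)."""
    return Lam(new_name, subst(lam.body, lam.param, Var(new_name)))

def subst(t, name, value):
    """Replace free occurrences of `name` in t by `value`, avoiding capture."""
    if isinstance(t, Var):
        return value if t.name == name else t
    if isinstance(t, App):
        return App(subst(t.func, name, value), subst(t.arg, name, value))
    if t.param == name:
        return t  # `name` is rebound here, so nothing below is free
    if t.param in free_vars(value):
        # Rename the binder first so it cannot capture a free variable of `value`.
        t = alpha(t, f"{t.param}_{next(_fresh)}")
    return Lam(t.param, subst(t.body, name, value))

def beta(t):
    """One β-conversion step at the root: (λx.body) arg becomes body[x := arg]."""
    if isinstance(t, App) and isinstance(t.func, Lam):
        return subst(t.func.body, t.func.param, t.arg)
    return t

# (λx. boy(x)) john  reduces to  boy(john)
boy = Lam("x", App(Var("boy"), Var("x")))
reduced = beta(App(boy, Var("john")))

# λx. boy(x)  renamed to  λy. boy(y)
renamed = alpha(boy, "y")

# (λx. λy. like(x, y)) y : the inner binder is renamed before substitution
like = Lam("x", Lam("y", App(App(Var("like"), Var("x")), Var("y"))))
safe = beta(App(like, Var("y")))
```

The last example shows why the two operations go together: β-conversion may need an α-conversion first so that a free variable in the argument is not accidentally bound.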
λ-SCFG Rule Extraction
• Outline
1. Building training examples
   1. Transforming logical forms to trees
   2. Aligning trees with sentences
2. Identifying frontier nodes
3. Extracting minimal rules
4. Extracting composed rules
Building Training Examples
NL: Every boy likes a star
MR: [logical form not preserved]
[Figures not preserved: the logical form as a tree, then aligned with the sentence. Surviving labels: boy, human, pop, like (tree) and "Every boy likes a star" (sentence)]
Identifying Frontier Nodes
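The slides for this step are figures only. As a rough illustration, the sketch below follows the frontier-node test of the standard GHKM algorithm from machine translation: a node is a frontier node when the contiguous closure of its span does not overlap its complement span (the words aligned to nodes that are neither its ancestors nor its descendants). The extension presented in the talk may differ. The function name, the dictionary-based tree encoding, the toy tree, and the choice to skip nodes with an empty span are all assumptions.

```python
def frontier_nodes(children, root, aligned):
    """Return the set of frontier nodes of a tree.

    children: dict mapping a node to the list of its child nodes
    aligned:  dict mapping a node to the set of word positions aligned to it
    """
    span = {}

    def fill_span(n):
        # Span of n: word positions aligned to n or to any descendant of n.
        s = set(aligned.get(n, ()))
        for c in children.get(n, ()):
            s |= fill_span(c)
        span[n] = s
        return s

    fill_span(root)

    # Complement span of a child: the parent's complement span plus the
    # spans of the child's siblings. The root has an empty complement span.
    comp = {root: set()}
    stack = [root]
    while stack:
        n = stack.pop()
        kids = children.get(n, [])
        for c in kids:
            comp[c] = set(comp[n])
            for sib in kids:
                if sib != c:
                    comp[c] |= span[sib]
            stack.append(c)

    frontier = set()
    for n, s in span.items():
        if not s:
            continue  # assumption: nodes with no aligned words are skipped
        closure = set(range(min(s), max(s) + 1))
        if not (closure & comp[n]):
            frontier.add(n)
    return frontier

# Toy tree with abstract node names (not the example from the slides).
children = {"R": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
aligned = {"a1": {0}, "a2": {2}, "b1": {1}}
frontier = frontier_nodes(children, "R", aligned)
```

In this toy tree, node `A` covers words 0 and 2, but word 1 in between belongs to `B`, so `A` is not a frontier node while every other node is.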
Identifying Minimal Frontier Tree
Minimal Rule Extraction
[Figures not preserved; surviving label: X]
Composed Rule Extraction
Modeling
• Log-linear model + MERT training
• Target
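The target formula on this slide is not preserved. As a generic illustration of a log-linear model, the sketch below scores each candidate derivation by a weighted sum of feature values and picks the highest-scoring one. The feature names (`rule_logprob`, `rule_count`), the weights, and the candidate records are hypothetical; in the setup described here the weights would be tuned by MERT, which is not shown.

```python
import math

def score(features, weights):
    """Log-linear score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())

def best_derivation(candidates, weights):
    """Candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: score(c["features"], weights))

def posteriors(candidates, weights):
    """Normalized probabilities proportional to exp(score)."""
    scores = [score(c["features"], weights) for c in candidates]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights and candidates, for illustration only.
weights = {"rule_logprob": 1.0, "rule_count": -0.5}
candidates = [
    {"mr": "A", "features": {"rule_logprob": -2.0, "rule_count": 3}},
    {"mr": "B", "features": {"rule_logprob": -4.0, "rule_count": 2}},
]
best = best_derivation(candidates, weights)
probs = posteriors(candidates, weights)
```

Normalization does not change which candidate wins, so decoding only needs `score`; `posteriors` is included to show the probabilistic reading of the model.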
Experiments
• Dataset: GEOQUERY
– 880 English questions with corresponding Prolog logical forms
• Metric
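The metric definition on this slide is not preserved; a later slide reports F-measure. The sketch below assumes the convention commonly used on GEOQUERY: precision is the share of produced MRs that are correct, recall is the share of all test sentences that receive a correct MR, and F-measure is their harmonic mean. The function name and the counts in the example are made up for illustration and are not results from the talk.

```python
def precision_recall_f(num_correct, num_parsed, num_total):
    """Precision, recall, and F-measure from raw counts.

    num_correct: sentences whose output MR is correct
    num_parsed:  sentences for which the parser produced any MR
    num_total:   all sentences in the test set
    """
    precision = num_correct / num_parsed if num_parsed else 0.0
    recall = num_correct / num_total if num_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Illustrative counts only.
p, r, f = precision_recall_f(num_correct=8, num_parsed=9, num_total=10)
```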
[Results table not preserved; surviving labels: SCFG, PCFG, CCG]
• F-measure for different languages
* en - English, ge - German, el - Greek, th - Thai