A Word-Class Approach to Labeling PSCFG Rules for Machine Translation (ACL 2011) Andreas Zollmann and Stephan Vogel Presented by Yun Huang 01/07/2011


A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

(ACL 2011)

Andreas Zollmann and Stephan Vogel

Presented by Yun Huang, 01/07/2011


Background

• PSCFG (Chiang 2005, 2007)
  – Rules: X => (γ / α / w)
    • X => (和 沙龙 举行 会谈 / held talk with Sharon)
    • X => (和 X1 举行 会谈 / held talk with X1)
    • X => (和 X1 举行 X2 / held X2 with X1)
  – Glue rules:
    • S => (X / X)
    • S => (S X / S X)
  – Decoding: cube pruning, etc.
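
To make the rule notation concrete, here is a minimal Python sketch of how a hierarchical rule's indexed gaps (X1, X2) are filled on both sides at once. The function name `apply_rule` and the `fillers` dictionary are hypothetical helpers for illustration only; the filler pairs (沙龙 / Sharon, 会谈 / talk) are read off by comparing the three example rules above. This is not how a decoder is implemented, only what a synchronous substitution means.

```python
import re

# Matches gap symbols such as "X1" or "X2" (label X plus a co-indexing number).
NONTERMINAL = re.compile(r"^X(\d+)$")


def apply_rule(source, target, fillers):
    """Fill the co-indexed gaps of a synchronous rule on both sides.

    source, target: token lists that may contain X1, X2, ...
    fillers: maps a gap index to a (source_phrase, target_phrase) pair.
    """
    def expand(tokens, side):
        out = []
        for tok in tokens:
            match = NONTERMINAL.match(tok)
            if match:
                # Same index on both sides means the same phrase pair is used.
                out.extend(fillers[int(match.group(1))][side].split())
            else:
                out.append(tok)
        return " ".join(out)

    return expand(source, 0), expand(target, 1)


rule_src = "和 X1 举行 X2".split()
rule_tgt = "held X2 with X1".split()
fillers = {1: ("沙龙", "Sharon"), 2: ("会谈", "talk")}

src, tgt = apply_rule(rule_src, rule_tgt, fillers)
print(src, "/", tgt)
```

Note that the gaps are reordered on the target side (X2 before X1), which is exactly the kind of reordering the hierarchical rule encodes.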


Motivation

• Hierarchical PSCFG uses only the labels S and X, so it cannot distinguish different rule categories. Example:
  – X => (和 X1 举行 X2 / held X2 with X1)
  – No difference between X1 and X2
• Maybe we want …
  – VP => (和 PRP 举行 NP / held NP with PRP)
• Idea: multi-label PSCFG.
• How should hierarchical phrases be labeled?


Labeling from word classes (1/4)

• Simple: boundary (POS) tags, i.e. the tags of the first and last word of the target phrase
  – I[PRP] saw[VBD] him[PRP]
• Extracted rules:
  – PRP-PRP => (Ich / I)
  – PRP-PRP => (ihn / him)
  – VBD-VBD => (gesehen / saw)
  – VBD-PRP => (habe ihn gesehen / saw him)
  – PRP-PRP => (Ich habe ihn gesehen / I saw him)
  – VBD-PRP => (habe PRP-PRP gesehen / saw PRP-PRP)


Labeling from word classes (2/4)

• Accounting for phrase size
  – 1-word
    • PRP => (Ich / I)
    • PRP => (ihn / him)
  – 2-word
    • VBD-PRP => (habe ihn gesehen / saw him)
    • VBD-PRP => (habe PRP gesehen / saw PRP)
  – multiple-word
    • PRP..PRP => (Ich habe ihn gesehen / I saw him)
    • PRP..PRP => (Ich habe PRP gesehen / I saw PRP)
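
The size-aware labeling can be sketched in a few lines of Python. The function names `phrase_label` and `label_of` are hypothetical, and the sketch assumes the label is built from the tags of the first and last word of the target phrase, with the separator ("-" or "..") encoding the phrase length as in the examples on this slide.

```python
def phrase_label(tags):
    """Build a rule label from the word-class tags of a phrase.

    1 word   -> the tag itself           (PRP)
    2 words  -> first and last with "-"  (VBD-PRP)
    3+ words -> first and last with ".." (PRP..PRP)
    """
    if not tags:
        raise ValueError("a phrase must contain at least one word")
    if len(tags) == 1:
        return tags[0]
    separator = "-" if len(tags) == 2 else ".."
    return tags[0] + separator + tags[-1]


# Tags of the tagged target sentence used in these slides.
TARGET_TAGS = {"I": "PRP", "saw": "VBD", "him": "PRP"}


def label_of(target_phrase):
    """Label a target phrase given as a space-separated string."""
    return phrase_label([TARGET_TAGS[word] for word in target_phrase.split()])


for phrase in ("him", "saw him", "I saw him"):
    print(label_of(phrase), "<=", phrase)
```

Only the boundary tags and the length bucket enter the label, so interior words of a long phrase do not affect it.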


Labeling from word classes (3/4)

• Bilingually tagged corpus
  – Ich[PRP] habe[AUX] ihn[PRP] gesehen[VBN]
  – I[PRP] saw[VBD] him[PRP]
• Extracted rules ("src label+tgt label"):
  – PRP+PRP => (Ich / I)
  – PRP+PRP => (ihn / him)
  – VBN+VBD => (gesehen / saw)
  – AUX..VBN+VBD-PRP => (habe ihn gesehen / saw him)
  – PRP..VBN+PRP..PRP => (Ich habe ihn gesehen / I saw him)
  – AUX..VBN+VBD-PRP => (habe PRP+PRP gesehen / saw PRP+PRP)
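
A bilingual label is the source-side label and the target-side label joined by "+". The sketch below reproduces the labels of the fully lexical rules on this slide; `side_label`, `bilingual_label`, and `span_tags` are hypothetical names, and the phrase spans are given by hand here rather than derived from a word alignment.

```python
def side_label(tags):
    """Label one side of a phrase pair from its boundary tags and length."""
    if len(tags) == 1:
        return tags[0]
    separator = "-" if len(tags) == 2 else ".."
    return tags[0] + separator + tags[-1]


def bilingual_label(src_tags, tgt_tags):
    """Join the source and target labels as "src label+tgt label"."""
    return side_label(src_tags) + "+" + side_label(tgt_tags)


SRC = [("Ich", "PRP"), ("habe", "AUX"), ("ihn", "PRP"), ("gesehen", "VBN")]
TGT = [("I", "PRP"), ("saw", "VBD"), ("him", "PRP")]


def span_tags(tagged, start, end):
    """Tags of the words in the half-open span [start, end)."""
    return [tag for _, tag in tagged[start:end]]


# (source span, target span) pairs, chosen by hand to match the slide.
single = bilingual_label(span_tags(SRC, 3, 4), span_tags(TGT, 1, 2))
partial = bilingual_label(span_tags(SRC, 1, 4), span_tags(TGT, 1, 3))
whole = bilingual_label(span_tags(SRC, 0, 4), span_tags(TGT, 0, 3))
print(single, partial, whole)
```

Because each side contributes its own boundary tags, the number of distinct labels grows roughly with the product of the source and target label inventories.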


Labeling from word classes (4/4)

• Unsupervised word class clustering
  – MKCLS
  – Morphological information
• Problems of word classes:
  – Huge grammar size
  – Data sparseness
  – Solution: cluster the rules directly


Clustering phrase pairs

• Directly clustering phrase pairs

• K-means clustering algorithm
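
As a rough illustration of the idea, the sketch below runs plain K-means over phrase pairs and uses the cluster id as the rule label. The slides do not say which features describe a phrase pair, so the two-dimensional feature vectors here are invented for the example, and the deterministic initialization (first k points as centroids) is a simplification chosen to keep the output reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain K-means; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Phrase pairs with made-up feature vectors (illustrative values only).
phrase_pairs = ["Ich / I", "gesehen / saw", "ihn / him", "habe ihn gesehen / saw him"]
features = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

assignment, centroids = kmeans(features, k=2)
labels = {pair: f"C{cluster}" for pair, cluster in zip(phrase_pairs, assignment)}
print(labels)
```

The number of clusters k directly sets the number of rule labels, which gives one handle on grammar size.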


Experiments

[Results table not recovered. Systems compared: baseline, PTB POS tags, word-class clustering, phrase clustering]



Related Work

• JHU workshop 2010
  – http://www.clsp.jhu.edu/workshops/ws10/groups/msgismt/
• Other approaches
  – Phrase clustering
  – Syntax-augmented MT
• Source code:
  – SAMT system


Problems

• Too simple, sometimes naïve.
  – Simple features
  – Simple clustering method
  – How to control model complexity?
• Future work
  – Other learning methods instead of clustering
  – Combining the hierarchical phrase-based model with syntactic trees