Model Combination for Machine...
Transcript of Model Combination for Machine...
![Page 1: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/1.jpg)
research
Model Combinationfor Machine Translation
John DeNero, Shankar Kumar, Ciprian Chelba, and Franz Josef Och
Monday, June 7, 2010
![Page 2: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/2.jpg)
Motivation
A statistical machine translation model scores derivations
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 3: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/3.jpg)
Motivation
A statistical machine translation model scores derivations
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 4: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/4.jpg)
Motivation
A statistical machine translation model scores derivations
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 5: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/5.jpg)
Motivation
A statistical machine translation model scores derivations
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 6: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/6.jpg)
Motivation
A statistical machine translation model scores derivations
We can improve over max-derivation decoding
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 7: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/7.jpg)
Motivation
A statistical machine translation model scores derivations
• Consensus decoding (e.g., minimum Bayes risk)
We can improve over max-derivation decoding
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 8: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/8.jpg)
Motivation
A statistical machine translation model scores derivations
• Consensus decoding (e.g., minimum Bayes risk)�
We can improve over max-derivation decoding
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 9: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/9.jpg)
Motivation
A statistical machine translation model scores derivations
• Consensus decoding (e.g., minimum Bayes risk)
• System combination (e.g., confusion networks)
�We can improve over max-derivation decoding
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 10: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/10.jpg)
Motivation
A statistical machine translation model scores derivations
• Consensus decoding (e.g., minimum Bayes risk)
• System combination (e.g., confusion networks)
�We can improve over max-derivation decoding
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 11: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/11.jpg)
Motivation
A statistical machine translation model scores derivations
• Consensus decoding (e.g., minimum Bayes risk)
• System combination (e.g., confusion networks)
�We can improve over max-derivation decoding
In this work, we develop a technique that integrates both
θ · φ(d) = θ ·
�
w∈n-grams(d)
φlm(w) +�
r∈rules(d)
φtm(r)
(log) model score sums language model and translation model
Monday, June 7, 2010
![Page 12: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/12.jpg)
Consensus Decoding
Derivation scores can be interpreted as probabilities
P(d|f) =exp (θ · φ(d))�d� exp (θ · φ(d�))
Monday, June 7, 2010
![Page 13: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/13.jpg)
Consensus Decoding
Derivation scores can be interpreted as probabilities
We can query this distribution for:
P(d|f) =exp (θ · φ(d))�d� exp (θ · φ(d�))
Monday, June 7, 2010
![Page 14: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/14.jpg)
Consensus Decoding
Derivation scores can be interpreted as probabilities
We can query this distribution for:
P(d|f) =exp (θ · φ(d))�d� exp (θ · φ(d�))
Whole translations [Blunsom et al., ’08]:
emax-trans = arg maxe
�
d:σe(d)=e
P(d|f)
Monday, June 7, 2010
![Page 15: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/15.jpg)
Consensus Decoding
Derivation scores can be interpreted as probabilities
We can query this distribution for:
P(d|f) =exp (θ · φ(d))�d� exp (θ · φ(d�))
Whole translations [Blunsom et al., ’08]:
emax-trans = arg maxe
�
d:σe(d)=e
P(d|f)
N-gram overlap [Kumar and Byrne, ’04]:
embr = arg maxe
E [bleu(e;σe(d))]
Monday, June 7, 2010
![Page 16: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/16.jpg)
Consensus Decoding
Derivation scores can be interpreted as probabilities
We can query this distribution for:
P(d|f) =exp (θ · φ(d))�d� exp (θ · φ(d�))
Free BLEU
Points
Whole translations [Blunsom et al., ’08]:
emax-trans = arg maxe
�
d:σe(d)=e
P(d|f)
N-gram overlap [Kumar and Byrne, ’04]:
embr = arg maxe
E [bleu(e;σe(d))]
Monday, June 7, 2010
![Page 17: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/17.jpg)
System Combination
We often have multiple translation systems
Monday, June 7, 2010
![Page 18: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/18.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
Monday, June 7, 2010
![Page 19: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/19.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
1
2
3
Monday, June 7, 2010
![Page 20: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/20.jpg)
System Combination
We often have multiple translation systems
the dog bit my work
the dog ate homework
dog ate homeworkmy
el perro comí mi tarea
1
2
3
Monday, June 7, 2010
![Page 21: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/21.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
the dog bit my work
the dog ate homework
dog ate homeworkmy
1
2
3
Monday, June 7, 2010
![Page 22: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/22.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
the dog bit my work
the dog ate homework
dog ate homeworkmy
1
2
3
Monday, June 7, 2010
![Page 23: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/23.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
the dog bit my work
the dog ate homework
dog ate homeworkmy
Combiners assume little about systems
1
2
3
Monday, June 7, 2010
![Page 24: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/24.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
the dog bit my work
the dog ate homework
dog ate homeworkmy
Combiners assume little about systems
Objectives similar to consensus decoding
1
2
3
Monday, June 7, 2010
![Page 25: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/25.jpg)
System Combination
We often have multiple translation systems
el perro comí mi tarea
the dog bit my work
the dog ate homework
dog ate homeworkmy
Combiners assume little about systems
Objectives similar to consensus decoding
1
2
3
More BLEU
Points
Monday, June 7, 2010
![Page 26: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/26.jpg)
Model Combination
Monday, June 7, 2010
![Page 27: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/27.jpg)
Model Combination
f
Monday, June 7, 2010
![Page 28: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/28.jpg)
Model Combination
1
2
3
f
Monday, June 7, 2010
![Page 29: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/29.jpg)
Model Combination
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Monday, June 7, 2010
![Page 30: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/30.jpg)
Model Combination
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
Monday, June 7, 2010
![Page 31: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/31.jpg)
Model Combination
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
System combo
e
Monday, June 7, 2010
![Page 32: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/32.jpg)
Model Combination
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
System combo
e
Monday, June 7, 2010
![Page 33: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/33.jpg)
Model Combination
Consensus decoding with multiple models
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
System combo
e
Monday, June 7, 2010
![Page 34: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/34.jpg)
Model Combination
Consensus decoding with multiple models
Distribution-driven approach to system combination
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
System combo
e
Monday, June 7, 2010
![Page 35: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/35.jpg)
Model Combination
Consensus decoding with multiple models
Distribution-driven approach to system combination
Unifies consensus and combination objectives
1
2
3
P1(d|f)
P2(d|f)
P3(d|f)
f
Consensus
Consensus
Consensus
e∗1
e∗2
e∗3
System combo
e
Monday, June 7, 2010
![Page 36: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/36.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
Monday, June 7, 2010
![Page 37: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/37.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
With new
algorithms
and results
Monday, June 7, 2010
![Page 38: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/38.jpg)
Forest-Based Consensus DecodingReview
Monday, June 7, 2010
![Page 39: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/39.jpg)
Forest-Based Consensus Decoding
Build a forest that encodes the model posterior1
P(d|f) =
Review
Monday, June 7, 2010
![Page 40: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/40.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
Build a forest that encodes the model posterior1
P(d|f) =
Review
Monday, June 7, 2010
![Page 41: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/41.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
“I saw the man with {a,the} telescope”
Build a forest that encodes the model posterior1
P(d|f) =
Review
Monday, June 7, 2010
![Page 42: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/42.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
Build a forest that encodes the model posterior1
P(d|f) =
Review
Monday, June 7, 2010
![Page 43: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/43.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
Build a forest that encodes the model posterior1
Compute n-gram statistics from the posterior 2
P(d|f) =
Review
Monday, June 7, 2010
![Page 44: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/44.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
Build a forest that encodes the model posterior1
Compute n-gram statistics from the posterior 2
P(d|f) =
Review
Monday, June 7, 2010
![Page 45: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/45.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
P(edge|f)
Build a forest that encodes the model posterior1
Compute n-gram statistics from the posterior 2
P(d|f) =
Review
Monday, June 7, 2010
![Page 46: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/46.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
“telescope the”
P(edge|f)
Build a forest that encodes the model posterior1
Compute n-gram statistics from the posterior 2
P(d|f) =
Review
Monday, June 7, 2010
![Page 47: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/47.jpg)
Forest-Based Consensus Decoding
Yo vi al hombre con el telescopio
I saw the man with {a,the} telescope
I ... telescope
I ... telescope
“I saw the man with {a,the} telescope”
I ... man
“I saw with {a,the} telescope the man”
“telescope the”
P(edge|f)
Build a forest that encodes the model posterior1
Compute n-gram statistics from the posterior 2
Optimize a consensus objective using these statistics3
P(d|f) =
Review
Monday, June 7, 2010
![Page 48: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/48.jpg)
Types of Efficient Consensus Techniques
[Tromble et al., ’08]
Lattice Minimum Bayes-Risk Decoding
Review
Monday, June 7, 2010
![Page 49: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/49.jpg)
Types of Efficient Consensus Techniques
[Tromble et al., ’08]
Lattice Minimum Bayes-Risk Decoding
Posteriors Expected counts
N-gram Statistics
Review
Monday, June 7, 2010
![Page 50: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/50.jpg)
Types of Efficient Consensus Techniques
[Tromble et al., ’08]
Lattice Minimum Bayes-Risk Decoding
Lear
ned
Fixe
d
Con
sens
us O
bjec
tive
Posteriors Expected counts
N-gram Statistics
Review
Monday, June 7, 2010
![Page 51: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/51.jpg)
Types of Efficient Consensus Techniques
[Tromble et al., ’08]
Lattice Minimum Bayes-Risk Decoding
[DeNero et al., ’09]
Fast Consensus Decoding
[Kumar et al., ’09]
Minimum Bayes-Risk Decoding for Hypergraphs
[Li et al., ’09]
Variational Decoding for Machine Translation
Lear
ned
Fixe
d
Con
sens
us O
bjec
tive
Posteriors Expected counts
N-gram Statistics
Review
Monday, June 7, 2010
![Page 52: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/52.jpg)
Learned Objectives are Better than FixedReview
Monday, June 7, 2010
![Page 53: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/53.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Review
Monday, June 7, 2010
![Page 54: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/54.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
N-gram statistics
Review
Monday, June 7, 2010
![Page 55: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/55.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
N-gram statistics Length
Review
Monday, June 7, 2010
![Page 56: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/56.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
N-gram statistics Length Base model
Review
Monday, June 7, 2010
![Page 57: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/57.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Review
Monday, June 7, 2010
![Page 58: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/58.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Review
Monday, June 7, 2010
![Page 59: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/59.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Choose to maximizewLearned: bleu
��arg max
dCw(d)
�; e
�
[Kumar et al., ’09]
Review
Monday, June 7, 2010
![Page 60: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/60.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Choose to maximizewLearned: bleu
��arg max
dCw(d)
�; e
�References
[Kumar et al., ’09]
Review
Monday, June 7, 2010
![Page 61: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/61.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Choose to maximizewLearned: bleu
��arg max
dCw(d)
�; e
�References
Increased test set BLEU by ≥0.2Decreased test set BLEU by ≥0.2
Consensus performance versus max-derivation decoding (39 pairs)
[Kumar et al., ’09]
Review
Monday, June 7, 2010
![Page 62: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/62.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Choose to maximizewLearned: bleu
��arg max
dCw(d)
�; e
�References
Increased test set BLEU by ≥0.2Decreased test set BLEU by ≥0.2
Consensus performance versus max-derivation decoding (39 pairs)
[Kumar et al., ’09]
1218
Fixed
Review
Monday, June 7, 2010
![Page 63: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/63.jpg)
Learned Objectives are Better than Fixed
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
Objective parameters
N-gram statistics Length Base model
Choose such thatwFixed: Cw(d) ≈ E [bleu(d)]
Choose to maximizewLearned: bleu
��arg max
dCw(d)
�; e
�References
Increased test set BLEU by ≥0.2Decreased test set BLEU by ≥0.2
Consensus performance versus max-derivation decoding (39 pairs)
[Kumar et al., ’09]
1218
5
26
Fixed Learned
Review
Monday, June 7, 2010
![Page 64: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/64.jpg)
N-gram Posteriors can also be Computed Quickly
Monday, June 7, 2010
![Page 65: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/65.jpg)
N-gram Posteriors can also be Computed Quickly
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 66: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/66.jpg)
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 67: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/67.jpg)
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 68: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/68.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 69: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/69.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”
Derivations that don’t contain “with the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 70: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/70.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”Inside: 0.5
“with a” : 0.3“with the” : 0.2
...Derivations that don’t
contain “with the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 71: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/71.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”Inside: 0.5
“with a” : 0.3“with the” : 0.2
...
Inside: 0.01“saw the” : 0
“I saw” : 0“with the” : 0.004
...
Derivations that don’t contain “with the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 72: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/72.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”Inside: 0.5
“with a” : 0.3“with the” : 0.2
...
Inside: 0.01“saw the” : 0
“I saw” : 0“with the” : 0.004
...
Derivations that don’t contain “with the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 73: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/73.jpg)
Inside: 0.1
...
“I saw” : 0“with the” : 0.1
N-gram Posteriors can also be Computed Quickly
Audience challenge: What semiring computes n-gram posteriors?
I ... telescope
I saw
the man with {a,the} telescope
“I saw the man with {a,the} telescope”
eθ·φ(edge) = 0.2
“saw the”Inside: 0.5
“with a” : 0.3“with the” : 0.2
...
Inside: 0.01“saw the” : 0
“I saw” : 0“with the” : 0.004
...
Derivations that don’t contain “with the”
P(g|f) = 1.0− P(g|f)
Monday, June 7, 2010
![Page 74: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/74.jpg)
Results for Learned Consensus Decoding
Constrained data track of the 2008 NIST MT task
max-derivation posterior probabilities expected counts
Learned consensus decoding techniquesBaseline
Monday, June 7, 2010
![Page 75: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/75.jpg)
Results for Learned Consensus Decoding
Constrained data track of the 2008 NIST MT task
max-derivation posterior probabilities expected counts
Learned consensus decoding techniquesBaseline
Phrase-Based
Hiero
SAMT
40 42 44 46
44.4
43.8
44.6
44.5
43.8
44.6
43.8
43.3
43.9
Arabic-to-English
Monday, June 7, 2010
![Page 76: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/76.jpg)
Results for Learned Consensus Decoding
Constrained data track of the 2008 NIST MT task
max-derivation posterior probabilities expected counts
Learned consensus decoding techniquesBaseline
Phrase-Based
Hiero
SAMT
40 42 44 46
44.4
43.8
44.6
44.5
43.8
44.6
43.8
43.3
43.9
Arabic-to-English
24 26 28 30
28.7
28.2
27.2
28.8
27.8
27.3
28.4
27.2
25.4
Chinese-to-English
Monday, June 7, 2010
![Page 77: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/77.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
Monday, June 7, 2010
![Page 78: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/78.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
All in
One Slide!
Monday, June 7, 2010
![Page 79: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/79.jpg)
Extending Consensus Decoding to Multiple Models
1. Build posterior forests
Monday, June 7, 2010
![Page 80: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/80.jpg)
Extending Consensus Decoding to Multiple Models
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
Monday, June 7, 2010
![Page 81: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/81.jpg)
Extending Consensus Decoding to Multiple Models
C(d) =4�
n=1
w(n)�
g∈n-grams
c(g, d) · P(g|f) + w(�)|σe(d)| + w(b)θ · φ(d)
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
Monday, June 7, 2010
![Page 82: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/82.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
C(d) = +w(�)|σe(d)| + w(b)b(d)4�
n=1
w(n)v(n)(d)
1. Build posterior forests
Monday, June 7, 2010
![Page 83: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/83.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d)
Sum over models
Monday, June 7, 2010
![Page 84: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/84.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d)
Sum over models
Monday, June 7, 2010
![Page 85: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/85.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d)
Sum over models
Monday, June 7, 2010
![Page 86: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/86.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d)
Sum over models
Monday, June 7, 2010
![Page 87: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/87.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Monday, June 7, 2010
![Page 88: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/88.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Monday, June 7, 2010
![Page 89: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/89.jpg)
Extending Consensus Decoding to Multiple Models
N-gram statistics Length Base model
Phrase-based
fHiero- style
P1(d|f) :
P2(d|f) :
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Monday, June 7, 2010
![Page 90: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/90.jpg)
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Extending Consensus Decoding to Multiple Models
Monday, June 7, 2010
![Page 91: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/91.jpg)
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Original features
Extending Consensus Decoding to Multiple Models
Monday, June 7, 2010
![Page 92: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/92.jpg)
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Original features
Extending Consensus Decoding to Multiple Models
Monday, June 7, 2010
![Page 93: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/93.jpg)
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Original features
Extending Consensus Decoding to Multiple Models
Monday, June 7, 2010
![Page 94: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/94.jpg)
1. Build posterior forests
2. Compute n-gram statistics from forests
3. Union all forests
4. Optimize multi-model consensus objective
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
dIndicator feature for the system that originally generatedModel choice:
Base model: Model score under the system that generated d
Original features
Extending Consensus Decoding to Multiple Models
Monday, June 7, 2010
![Page 95: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/95.jpg)
Properties of Model Combination
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Monday, June 7, 2010
![Page 96: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/96.jpg)
Properties of Model Combination
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Reduces to consensus decoding when we have only one model
Monday, June 7, 2010
![Page 97: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/97.jpg)
Properties of Model Combination
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Reduces to consensus decoding when we have only one model
A linear model: can be tuned to maximize output performancew
Monday, June 7, 2010
![Page 98: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/98.jpg)
Properties of Model Combination
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Reduces to consensus decoding when we have only one model
A linear model: can be tuned to maximize output performancew
No concept of a primary system
Monday, June 7, 2010
![Page 99: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/99.jpg)
Properties of Model Combination
N-gram statistics Length Base model
C(d) = +w(�)|σe(d)| + w(b)b(d)I�
i=1
4�
n=1
w(n)i v(n)
i (d) +w(α)i αi(d)
Model choiceSum over models
Reduces to consensus decoding when we have only one model
A linear model: can be tuned to maximize output performancew
No concept of a primary system
Every possible output was a derivation under some original model
Monday, June 7, 2010
![Page 100: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/100.jpg)
Model Combination Experimental Results
Compared three in-house Google systems
Constrained data track of 2008 NIST task
Parameters tuned on NIST 2004 eval set
Max-derivationConsensus
Monday, June 7, 2010
![Page 101: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/101.jpg)
42
43
44
45
46
Phrase Hiero SAMT Combo
45.3
44.5
43.8
44.6
43.8
43.3
43.9
Arabic-to-English BLEU
Model Combination Experimental Results
Compared three in-house Google systems
Constrained data track of 2008 NIST task
Parameters tuned on NIST 2004 eval set
Max-derivationConsensus
Monday, June 7, 2010
![Page 102: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/102.jpg)
42
43
44
45
46
Phrase Hiero SAMT Combo
45.3
44.5
43.8
44.6
43.8
43.3
43.9
Arabic-to-English BLEU
Model Combination Experimental Results
Compared three in-house Google systems
Constrained data track of 2008 NIST task
Parameters tuned on NIST 2004 eval set
+1.4
Max-derivationConsensus
Monday, June 7, 2010
![Page 103: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/103.jpg)
42
43
44
45
46
Phrase Hiero SAMT Combo
45.3
44.5
43.8
44.6
43.8
43.3
43.9
Arabic-to-English BLEU
Model Combination Experimental Results
Compared three in-house Google systems
Constrained data track of 2008 NIST task
Parameters tuned on NIST 2004 eval set
+1.4
24
25
26
27
28
29
30
Phrase Hiero SAMT Combo
29.028.8
27.827.3
28.4
27.2
25.4
Chinese-to-English BLEU
+0.6
Max-derivationConsensus
Monday, June 7, 2010
![Page 104: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/104.jpg)
Sources of Improvement: Arabic-to-English
Monday, June 7, 2010
![Page 105: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/105.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
Best max-derivation system
Best single-system with consensus decoding
Monday, June 7, 2010
![Page 106: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/106.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
Best max-derivation system
Do we need model choice features?
Best single-system with consensus decoding
Monday, June 7, 2010
![Page 107: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/107.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Best single-system with consensus decoding
Monday, June 7, 2010
![Page 108: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/108.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
Monday, June 7, 2010
![Page 109: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/109.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
“Union”
Monday, June 7, 2010
![Page 110: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/110.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
44.5
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
Union + consensus decoding
“Union”
Monday, June 7, 2010
![Page 111: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/111.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
44.5
44.9
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
Union + consensus decoding
Union + consensus + model choice features
“Union”
Monday, June 7, 2010
![Page 112: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/112.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
44.5
44.9
45.1
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
Union + consensus decoding
Union + consensus + model choice features
Union with model-specific n-gram statistics
“Union”
Monday, June 7, 2010
![Page 113: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/113.jpg)
Sources of Improvement: Arabic-to-English
43 44 45 46
43.9
44.5
44.5
44.9
45.1
45.3
Best max-derivation system
Do we need model choice features?
Should n-gram statistics be model specific?
Phrase-based
fHiero- style
Best single-system with consensus decoding
Union + consensus decoding
Union + consensus + model choice features
Union with model-specific n-gram statistics
“Union”
Model-specific & union n-gram statistics
Monday, June 7, 2010
![Page 114: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/114.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
Monday, June 7, 2010
![Page 115: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/115.jpg)
Outline
Consensus decoding review
Our model combination technique
Comparison to system combination
The Final
Showdown
Monday, June 7, 2010
![Page 116: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/116.jpg)
System Combination Baselines
Two established system combination methods [Macherey & Och, ’07]
Monday, June 7, 2010
![Page 117: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/117.jpg)
System Combination Baselines
Two established system combination methods [Macherey & Och, ’07]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
Monday, June 7, 2010
![Page 118: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/118.jpg)
System Combination Baselines
Sentence-level Choose among system outputs using an MBR objective
Two established system combination methods [Macherey & Och, ’07]
e = arg maxe∈{e∗1 ,...,e∗k}
E [bleu(e)]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
Monday, June 7, 2010
![Page 119: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/119.jpg)
System Combination Baselines
Sentence-level Choose among system outputs using an MBR objective
Word-level Confusion network approach
Two established system combination methods [Macherey & Och, ’07]
e = arg maxe∈{e∗1 ,...,e∗k}
E [bleu(e)]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
Monday, June 7, 2010
![Page 120: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/120.jpg)
System Combination Baselines
Sentence-level Choose among system outputs using an MBR objective
Word-level Confusion network approach
Two established system combination methods [Macherey & Och, ’07]
e = arg maxe∈{e∗1 ,...,e∗k}
E [bleu(e)]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
All are aligned to a backbone e∗i eb ∈ {e∗1, . . . , e∗k}
Monday, June 7, 2010
![Page 121: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/121.jpg)
System Combination Baselines
Sentence-level Choose among system outputs using an MBR objective
Word-level Confusion network approach
Sentences + alignments form a confusion network
Two established system combination methods [Macherey & Och, ’07]
e = arg maxe∈{e∗1 ,...,e∗k}
E [bleu(e)]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
All are aligned to a backbone e∗i eb ∈ {e∗1, . . . , e∗k}
Monday, June 7, 2010
![Page 122: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/122.jpg)
System Combination Baselines
Sentence-level Choose among system outputs using an MBR objective
Word-level Confusion network approach
Sentences + alignments form a confusion network
Output maximizes a consensus objective
Two established system combination methods [Macherey & Och, ’07]
e = arg maxe∈{e∗1 ,...,e∗k}
E [bleu(e)]
P1(d|f)
P2(d|f)
P3(d|f)
f
e∗1
e∗2
e∗3
Consensus
Consensus
Consensus
System combo
e
1
2
3
All are aligned to a backbone e∗i eb ∈ {e∗1, . . . , e∗k}
Monday, June 7, 2010
![Page 123: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/123.jpg)
Model versus System Combination
Qualitative
Quantitative
Monday, June 7, 2010
![Page 124: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/124.jpg)
Model versus System Combination
Single-system n-gram statistics are required in both methods
Qualitative
Quantitative
Monday, June 7, 2010
![Page 125: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/125.jpg)
Model versus System Combination
Single-system n-gram statistics are required in both methods
Model combination searches only once under a consensus objective
Qualitative
Quantitative
Monday, June 7, 2010
![Page 126: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/126.jpg)
Model versus System Combination
Inter-hypothesis alignment problem exists only in system combination
Single-system n-gram statistics are required in both methods
Model combination searches only once under a consensus objective
Qualitative
Quantitative
Monday, June 7, 2010
![Page 127: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/127.jpg)
Model versus System Combination
Best single-system consensus
Sentence-level system combination
Word-level system combination
Model combination
43 44 45 46
45.3
44.7
44.6
44.5
Arabic-to-English
Inter-hypothesis alignment problem exists only in system combination
Single-system n-gram statistics are required in both methods
Model combination searches only once under a consensus objective
Qualitative
Quantitative
Monday, June 7, 2010
![Page 128: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/128.jpg)
Model versus System Combination
Best single-system consensus
Sentence-level system combination
Word-level system combination
Model combination
43 44 45 46
45.3
44.7
44.6
44.5
Arabic-to-English
27 28 29 30
29.0
28.8
28.8
28.8
Chinese-to-English
Inter-hypothesis alignment problem exists only in system combination
Single-system n-gram statistics are required in both methods
Model combination searches only once under a consensus objective
Qualitative
Quantitative
Monday, June 7, 2010
![Page 129: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/129.jpg)
Conclusion
Monday, June 7, 2010
![Page 130: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/130.jpg)
Conclusion
Consensus decoding with a learned objective extends naturally to multiple models
Monday, June 7, 2010
![Page 131: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/131.jpg)
Conclusion
Consensus decoding with a learned objective extends naturally to multiple models
Model combination provides consensus and combination effects in a unified, distribution-driven objective
Monday, June 7, 2010
![Page 132: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/132.jpg)
Conclusion
It outperforms a pipeline of consensus decoding followed by system combination, using less total computation
Consensus decoding with a learned objective extends naturally to multiple models
Model combination provides consensus and combination effects in a unified, distribution-driven objective
Monday, June 7, 2010
![Page 133: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/133.jpg)
Conclusion
It outperforms a pipeline of consensus decoding followed by system combination, using less total computation
Consensus decoding with a learned objective extends naturally to multiple models
Model combination provides consensus and combination effects in a unified, distribution-driven objective
It’s easy, it’s clean, and it works
Monday, June 7, 2010
![Page 134: Model Combination for Machine Translationdenero.org/content/pubs/naacl10_denero_combination_slides.pdf · dog ate my homework Combiners assume little about systems Objectives similar](https://reader034.fdocument.org/reader034/viewer/2022042605/5f442a263888c6153236ab65/html5/thumbnails/134.jpg)
Conclusion
It outperforms a pipeline of consensus decoding followed by system combination, using less total computation
Consensus decoding with a learned objective extends naturally to multiple models
Model combination provides consensus and combination effects in a unified, distribution-driven objective
It’s easy, it’s clean, and it works Thanks!
Monday, June 7, 2010