EM algorithm LING 572 Fei Xia 03/02/06

Page 1: EM algorithm LING 572 Fei Xia 03/02/06. Outline The EM algorithm EM for PM models Three special cases –Inside-outside algorithm –Forward-backward algorithm.

EM algorithm

LING 572

Fei Xia

03/02/06


Outline

• The EM algorithm

• EM for PM models

• Three special cases
  – Inside-outside algorithm
  – Forward-backward algorithm
  – IBM models for MT


The EM algorithm


Basic setting in EM

• X is a set of data points: the observed data.
• θ is a parameter vector.
• EM is a method to find θ_ML, where

$$\theta_{ML} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log P(X \mid \theta)$$

• Calculating P(X | θ) directly is hard.
• Calculating P(X, Y | θ) is much simpler, where Y is “hidden” data (or “missing” data).

The basic EM strategy

• Z = (X, Y)
  – Z: complete data (“augmented data”)
  – X: observed data (“incomplete” data)
  – Y: hidden data (“missing” data)

• Given a fixed x, there could be many possible y’s.
  – Ex: given a sentence x, there could be many state sequences in an HMM that generate x.

Examples of EM

                HMM                 PCFG              MT                            Coin toss
X (observed)    sentences           sentences         Parallel data                 Head-tail sequences
Y (hidden)      State sequences     Parse trees       Word alignment                Coin id sequences
θ               a_ij, b_ijk         P(A → BC)         t(f|e), d(a_j | j, l, m), …   p1, p2, λ
Algorithm       Forward-backward    Inside-outside    IBM Models                    N/A

The log-likelihood function

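• L is a function of θ, while holding X constant:

$$L(\theta \mid X) = L(\theta) = P(X \mid \theta)$$

$$
\begin{aligned}
l(\theta) &= \log L(\theta) = \log P(X \mid \theta) \\
&= \log \prod_{i=1}^{n} P(x_i \mid \theta) \\
&= \sum_{i=1}^{n} \log P(x_i \mid \theta) \\
&= \sum_{i=1}^{n} \log \sum_{y} P(x_i, y \mid \theta)
\end{aligned}
$$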
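As a concrete reading of the last line of the log-likelihood, the following minimal Python sketch evaluates l(θ) for the coin-toss setting of the examples table. It assumes each data point is a head-tail string produced by one of two coins, with coin 1 chosen with probability λ; the function names and the toy data are illustrative assumptions, not part of the original slides.

```python
import math

def joint(x, y, lam, p1, p2):
    """P(x, y | theta): choose coin y (coin 1 with prob. lam), then toss it len(x) times."""
    prior = lam if y == 1 else 1.0 - lam
    p_head = p1 if y == 1 else p2
    heads = x.count("H")
    tails = len(x) - heads
    return prior * p_head ** heads * (1.0 - p_head) ** tails

def log_likelihood(data, lam, p1, p2):
    """l(theta) = sum_i log sum_y P(x_i, y | theta), summing over the hidden coin id y."""
    return sum(
        math.log(joint(x, 1, lam, p1, p2) + joint(x, 2, lam, p1, p2))
        for x in data
    )

data = ["HHT", "TTT", "HHH"]  # toy observed data (assumption)
print(log_likelihood(data, lam=0.5, p1=0.8, p2=0.3))
```

The inner sum has only two terms here; for HMMs and PCFGs the same sum ranges over exponentially many state sequences or parse trees, which is why direct maximization is hard.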


The iterative approach for MLE

$$\theta_{ML} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} l(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log \sum_{y} P(x_i, y \mid \theta)$$

In many cases, we cannot find the solution directly.

An alternative is to find a sequence $\theta^0, \theta^1, \ldots, \theta^t, \ldots$ s.t.

$$l(\theta^0) \le l(\theta^1) \le \ldots \le l(\theta^t) \le \ldots$$

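$$
\begin{aligned}
l(\theta) - l(\theta^t) &= \log P(X \mid \theta) - \log P(X \mid \theta^t) \\
&= \sum_{i=1}^{n} \log \sum_{y} P(x_i, y \mid \theta) - \sum_{i=1}^{n} \log \sum_{y'} P(x_i, y' \mid \theta^t) \\
&= \sum_{i=1}^{n} \log \frac{\sum_{y} P(x_i, y \mid \theta)}{\sum_{y'} P(x_i, y' \mid \theta^t)} \\
&= \sum_{i=1}^{n} \log \sum_{y} \frac{P(x_i, y \mid \theta)}{\sum_{y'} P(x_i, y' \mid \theta^t)} \\
&= \sum_{i=1}^{n} \log \sum_{y} \frac{P(x_i, y \mid \theta^t)}{\sum_{y'} P(x_i, y' \mid \theta^t)} \cdot \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)} \\
&= \sum_{i=1}^{n} \log \sum_{y} P(y \mid x_i, \theta^t)\, \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)} \\
&= \sum_{i=1}^{n} \log E_{P(y \mid x_i, \theta^t)}\left[\frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right] \\
&\ge \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right] \qquad \text{(Jensen’s inequality)}
\end{aligned}
$$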

Jensen’s inequality

If f is convex, then

$$E[f(g(x))] \ge f(E[g(x)])$$

If f is concave, then

$$E[f(g(x))] \le f(E[g(x)])$$

log is a concave function:

$$E[\log(p(x))] \le \log(E[p(x)])$$
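A quick numerical check of the concave case can make the direction of the inequality easier to remember. The sketch below uses an arbitrary three-point distribution; the values and probabilities are made-up illustrations, not data from the slides.

```python
import math

def expect(values, probs):
    """Expectation of a discrete random variable."""
    return sum(p * v for v, p in zip(values, probs))

values = [0.1, 0.4, 0.9]   # possible values of p(x) (assumption)
probs = [0.2, 0.5, 0.3]    # their probabilities, summing to 1

lhs = expect([math.log(v) for v in values], probs)  # E[log p(x)]
rhs = math.log(expect(values, probs))               # log E[p(x)]
print(lhs <= rhs)
```

When the distribution puts all its mass on a single value, both sides coincide, which is the situation that makes the EM lower bound tight at the current estimate.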


Maximizing the lower bound

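$$
\begin{aligned}
\theta^{t+1} &= \arg\max_{\theta} \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right] \\
&= \arg\max_{\theta} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)} \\
&= \arg\max_{\theta} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta) \\
&= \arg\max_{\theta} \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log P(x_i, y \mid \theta)\right]
\end{aligned}
$$

The expression being maximized in the last line is the Q function.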

The Q-function

• Define the Q-function (a function of θ):

$$
\begin{aligned}
Q(\theta, \theta^t) &= E[\log P(X, Y \mid \theta) \mid X, \theta^t] = E_{P(Y \mid X, \theta^t)}[\log P(X, Y \mid \theta)] \\
&= \sum_{Y} P(Y \mid X, \theta^t) \log P(X, Y \mid \theta) = \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}[\log P(x_i, y \mid \theta)] \\
&= \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)
\end{aligned}
$$

  – Y is a random vector.
  – X = (x1, x2, …, xn) is a constant (vector).
  – θ^t is the current parameter estimate and is a constant (vector).
  – θ is the normal variable (vector) that we wish to adjust.

• The Q-function is the expected value of the complete-data log-likelihood log P(X, Y | θ) with respect to Y, given X and θ^t.

The inner loop of the EM algorithm

• E-step: calculate

$$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)$$

• M-step: find

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t)$$

L(θ) is non-decreasing at each iteration

• The EM algorithm will produce a sequence $\theta^0, \theta^1, \ldots, \theta^t, \ldots$

• It can be proved that

$$l(\theta^0) \le l(\theta^1) \le \ldots \le l(\theta^t) \le \ldots$$

The inner loop of the Generalized EM algorithm (GEM)

• E-step: calculate

$$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)$$

• M-step: instead of the full maximization $\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t)$, find any $\theta^{t+1}$ such that

$$Q(\theta^{t+1}, \theta^t) \ge Q(\theta^t, \theta^t)$$

Recap of the EM algorithm


Idea #1: find θ that maximizes the likelihood of training data

$$\theta_{ML} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log P(X \mid \theta)$$

Idea #2: find the θ^t sequence

No analytical solution → iterative approach: find $\theta^0, \theta^1, \ldots, \theta^t, \ldots$ s.t.

$$l(\theta^0) \le l(\theta^1) \le \ldots \le l(\theta^t) \le \ldots$$

Idea #3: find θ^{t+1} that maximizes a tight lower bound of l(θ) − l(θ^t)

$$l(\theta) - l(\theta^t) \ge \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right]$$

The right-hand side is a tight lower bound: both sides equal 0 at θ = θ^t.

Idea #4: find θ^{t+1} that maximizes the Q function

$$
\begin{aligned}
\theta^{t+1} &= \arg\max_{\theta} \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right] && \text{(lower bound of } l(\theta) - l(\theta^t)\text{)} \\
&= \arg\max_{\theta} \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log P(x_i, y \mid \theta)\right] && \text{(the Q function)}
\end{aligned}
$$

The EM algorithm

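• Start with initial estimate, θ^0

• Repeat until convergence
  – E-step: calculate

$$Q(\theta, \theta^t) = \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta)$$

  – M-step: find

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t)$$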
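The E-step/M-step loop can be sketched on the coin-toss example from the examples table (hidden coin id, parameters p1, p2, λ). This is a minimal illustration under stated assumptions: each data point is a head-tail string tossed with a single coin, the M-step uses the closed-form ratio-of-expected-counts solution, and the data, starting values, and function names are made up for the example.

```python
import math

def components(x, lam, p1, p2):
    """Joint probabilities P(x, coin 1 | theta) and P(x, coin 2 | theta)."""
    h, t = x.count("H"), x.count("T")
    return (lam * p1 ** h * (1 - p1) ** t,
            (1 - lam) * p2 ** h * (1 - p2) ** t)

def log_likelihood(data, lam, p1, p2):
    return sum(math.log(sum(components(x, lam, p1, p2))) for x in data)

def em(data, lam, p1, p2, iterations=20):
    history = [log_likelihood(data, lam, p1, p2)]
    for _ in range(iterations):
        # E-step: posterior P(y = coin 1 | x_i, theta^t) for every sequence.
        post = []
        for x in data:
            a, b = components(x, lam, p1, p2)
            post.append(a / (a + b))
        # M-step: maximize Q in closed form using expected counts.
        h1 = sum(w * x.count("H") for w, x in zip(post, data))
        t1 = sum(w * x.count("T") for w, x in zip(post, data))
        h2 = sum((1 - w) * x.count("H") for w, x in zip(post, data))
        t2 = sum((1 - w) * x.count("T") for w, x in zip(post, data))
        lam = sum(post) / len(data)
        p1, p2 = h1 / (h1 + t1), h2 / (h2 + t2)
        history.append(log_likelihood(data, lam, p1, p2))
    return lam, p1, p2, history

data = ["HHHHT", "HHHTH", "TTTHT", "TTHTT", "HHHHH", "TTTTT"]  # toy data
lam, p1, p2, history = em(data, lam=0.5, p1=0.6, p2=0.4)
print(lam, p1, p2)
```

The `history` list records l(θ^t) after every iteration, so it can be inspected to confirm that the log-likelihood never decreases. Starting from p1 = p2 would leave the two coins indistinguishable, which is one reason the initial estimate matters.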


Important classes of EM problems

• Products of multinomial (PM) models

• Exponential families

• Gaussian mixture

• …


The EM algorithm for PM models


PM models

$$p(x, y \mid \theta) = \prod_{p_i \in \Theta_1} p_i^{c(i, x, y)} \cdots \prod_{p_i \in \Theta_m} p_i^{c(i, x, y)}$$

where $\theta = (\Theta_1, \ldots, \Theta_m)$ is a partition of all the parameters, and for any j

$$\sum_{p_i \in \Theta_j} p_i = 1$$

HMM is a PM

$$p(x, y \mid \theta) = \prod_{i,j} p(s_i \to s_j)^{c(s_i \to s_j,\, x,\, y)} \times \prod_{i,j,k} p(s_i \xrightarrow{w_k} s_j)^{c(s_i \xrightarrow{w_k} s_j,\, x,\, y)}$$

$$\sum_{j} a_{ij} = 1 \qquad \sum_{k} b_{ijk} = 1$$

PCFG

• PCFG: each sample point (x, y):
  – x is a sentence
  – y is a possible parse tree for that sentence.

$$P(x, y \mid \theta) = \prod_{i=1}^{n} P(A_i \to \alpha_i \mid A_i)$$

For example:

$$P(x, y \mid \theta) = P(S \to NP\ VP \mid S) \times P(NP \to Jim \mid NP) \times P(VP \to sleeps \mid VP)$$

PCFG is a PM

$$p(x, y \mid \theta) = \prod_{A \to \alpha} p(A \to \alpha)^{c(A \to \alpha,\, x,\, y)}$$

$$\sum_{\alpha} p(A \to \alpha) = 1$$

Q-function for PM

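$$
\begin{aligned}
Q(\theta, \theta^t) &= \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log P(x_i, y \mid \theta) \\
&= \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log \Big( \prod_{k=1}^{m} \prod_{p_j \in \Theta_k} p_j^{C(j, x_i, y)} \Big) \\
&= \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log \prod_{p_j \in \Theta_1} p_j^{C(j, x_i, y)} + \ldots + \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t) \log \prod_{p_j \in \Theta_m} p_j^{C(j, x_i, y)} \\
&= \underbrace{\sum_{p_j \in \Theta_1} \Big( \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j \Big)}_{Q_1(\theta, \theta^t)} + \ldots + \sum_{p_j \in \Theta_m} \Big( \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j \Big)
\end{aligned}
$$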

Maximizing the Q function

Maximize

$$Q_1(\theta, \theta^t) = \sum_{p_j \in \Theta_1} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j$$

subject to the constraint

$$\sum_{p_j \in \Theta_1} p_j = 1$$

Use Lagrange multipliers:

$$\hat{Q}(\theta, \theta^t) = \sum_{p_j \in \Theta_1} \Big( \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j \Big) - \lambda \Big( \sum_{p_j \in \Theta_1} p_j - 1 \Big)$$

Optimal solution

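$$\hat{Q}(\theta, \theta^t) = \sum_{p_j \in \Theta_1} \Big( \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y) \log p_j \Big) - \lambda \Big( \sum_{p_j \in \Theta_1} p_j - 1 \Big)$$

$$\frac{\partial \hat{Q}(\theta, \theta^t)}{\partial p_j} = \sum_{i=1}^{n} \sum_{y} \frac{P(y \mid x_i, \theta^t)\, C(j, x_i, y)}{p_j} - \lambda = 0$$

$$p_j = \frac{1}{\lambda} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y)$$

Here λ is the normalization factor, and the double sum is the expected count.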
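The normalization factor can be made explicit by substituting the solution back into the constraint. The worked step below is a sketch that uses only the symbols of the derivation above, with $j'$ as an extra summation index introduced for illustration.

$$
\sum_{p_j \in \Theta_1} p_j = 1
\;\Rightarrow\;
\lambda = \sum_{p_{j'} \in \Theta_1} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j', x_i, y)
$$

$$
p_j = \frac{\sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j, x_i, y)}{\sum_{p_{j'} \in \Theta_1} \sum_{i=1}^{n} \sum_{y} P(y \mid x_i, \theta^t)\, C(j', x_i, y)}
$$

Each parameter is therefore its expected count divided by the total expected count of its multinomial. For instance, if a multinomial has two parameters with expected counts 3 and 1, the updated values are 0.75 and 0.25.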


PM Models

θ_r is the r-th parameter in the model. Each parameter is a member of some multinomial distribution.

Count(x, y, r) is the number of times that θ_r is seen in the expression for P(x, y | θ).

The EM algorithm for PM Models

• Calculate expected counts

• Update parameters


PCFG example

• Calculate expected counts

• Update parameters


The EM algorithm for PM models

// for each iteration

// for each training example xi

// for each possible y

// for each parameter

// for each parameter
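The loop structure above can be sketched in Python for any PM model whose hidden values can be enumerated. This is a brute-force illustration, not the slide's original pseudocode: `em_pm`, `hidden_values`, `count_fn`, `groups`, and the tuple-shaped parameter names are all hypothetical, and the coin-toss model from the examples table serves as the test case.

```python
from collections import defaultdict

def joint_prob(theta, counts):
    """P(x, y | theta) = product over r of theta_r ** Count(x, y, r)."""
    p = 1.0
    for r, c in counts.items():
        p *= theta[r] ** c
    return p

def em_pm(data, hidden_values, count_fn, groups, theta, iterations=10):
    for _ in range(iterations):                       # for each iteration
        expected = defaultdict(float)
        for x in data:                                # for each training example x_i
            joints = {y: joint_prob(theta, count_fn(x, y)) for y in hidden_values(x)}
            z = sum(joints.values())                  # P(x_i | theta^t)
            for y, p in joints.items():               # for each possible y
                for r, c in count_fn(x, y).items():   # for each parameter: add expected count
                    expected[r] += (p / z) * c
        theta = dict(theta)
        for group in groups:                          # for each parameter: normalize
            total = sum(expected[r] for r in group)
            for r in group:
                theta[r] = expected[r] / total
    return theta

def coin_counts(x, y):
    """Count(x, y, r) for the coin model: one coin pick plus one toss per symbol."""
    counts = {("pick", y): 1}
    for side in x:
        counts[("toss", y, side)] = counts.get(("toss", y, side), 0) + 1
    return counts

groups = [[("pick", 1), ("pick", 2)],
          [("toss", 1, "H"), ("toss", 1, "T")],
          [("toss", 2, "H"), ("toss", 2, "T")]]
theta0 = {("pick", 1): 0.5, ("pick", 2): 0.5,
          ("toss", 1, "H"): 0.6, ("toss", 1, "T"): 0.4,
          ("toss", 2, "H"): 0.4, ("toss", 2, "T"): 0.6}
data = ["HHHT", "TTTH", "HHHH", "TTHT"]
theta = em_pm(data, lambda x: [1, 2], coin_counts, groups, theta0)
print(theta)
```

Enumerating every y is only feasible for tiny models. The inside-outside and forward-backward algorithms compute the same expected counts with dynamic programming instead.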


Inside-outside algorithm


Inner loop of the Inside-outside algorithm

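Given an input sequence $w_{1m}$ and the current rule probabilities:

1. Calculate inside probability:
   • Base case:

$$\beta_j(k, k) = P(N^j \to w_k)$$

   • Recursive case:

$$\beta_j(p, q) = \sum_{r,s} \sum_{d=p}^{q-1} P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)$$

2. Calculate outside probability:
   • Base case:

$$\alpha_j(1, m) = \begin{cases} 1 & \text{if } j = 1 \\ 0 & \text{otherwise} \end{cases}$$

   • Recursive case:

$$\alpha_j(p, q) = \sum_{f,g} \sum_{e=q+1}^{m} \alpha_f(p, e)\, P(N^f \to N^j N^g)\, \beta_g(q+1, e) + \sum_{f,g} \sum_{e=1}^{p-1} \alpha_f(e, q)\, P(N^f \to N^g N^j)\, \beta_g(e, p-1)$$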
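Step 1 can be sketched as a CKY-style table fill. The Python below covers only the inside pass, assuming a grammar in Chomsky normal form stored in two dictionaries; the dictionary layout, the string-valued nonterminal names, and the toy grammar are illustrative assumptions rather than anything specified on the slides.

```python
from collections import defaultdict

def inside(words, binary, lexical):
    """Return beta[(j, p, q)] for 1-based word spans p..q.

    binary:  {(j, r, s): P(N^j -> N^r N^s)}
    lexical: {(j, w):    P(N^j -> w)}
    """
    m = len(words)
    beta = defaultdict(float)
    for k in range(1, m + 1):                      # base case: beta_j(k, k)
        for (j, w), prob in lexical.items():
            if w == words[k - 1]:
                beta[(j, k, k)] += prob
    for span in range(2, m + 1):                   # recursive case, shorter spans first
        for p in range(1, m - span + 2):
            q = p + span - 1
            for (j, r, s), prob in binary.items():
                for d in range(p, q):              # split point: d = p .. q-1
                    beta[(j, p, q)] += prob * beta[(r, p, d)] * beta[(s, d + 1, q)]
    return beta

# Toy grammar (assumption); rule probabilities sum to 1 for each left-hand side.
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7}
lexical = {("NP", "Jim"): 0.6, ("NP", "fish"): 0.4,
           ("VP", "sleeps"): 0.3, ("V", "eats"): 1.0}

beta = inside(["Jim", "eats", "fish"], binary, lexical)
print(beta[("S", 1, 3)])   # sentence probability P(w_1m) when S is the start symbol
```

The outside pass fills a similar table in the opposite order (longest spans first), reusing the inside values computed here.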


Inside-outside algorithm (cont)

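3. Collect the counts

$$P(N^j \to N^r N^s, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{p=1}^{m} \sum_{q=1}^{m} \sum_{d=p}^{q-1} \alpha_j(p, q)\, P(N^j \to N^r N^s)\, \beta_r(p, d)\, \beta_s(d+1, q)}{P(w_{1m})}$$

$$P(N^j \to w^k, N^j \text{ is used} \mid w_{1m}) = \frac{\sum_{h=1}^{m} \alpha_j(h, h)\, \beta_j(h, h)\, \delta(w_h, w^k)}{P(w_{1m})}$$

4. Normalize and update the parameters

$$P(N^j \to N^r N^s) = \frac{Cnt(N^j \to N^r N^s)}{\sum_{r'} \sum_{s'} Cnt(N^j \to N^{r'} N^{s'})} = \frac{P(N^j \to N^r N^s, N^j \text{ is used} \mid w_{1m})}{\sum_{r'} \sum_{s'} P(N^j \to N^{r'} N^{s'}, N^j \text{ is used} \mid w_{1m})}$$

$$P(N^j \to w^k) = \frac{Cnt(N^j \to w^k)}{\sum_{k'} Cnt(N^j \to w^{k'})} = \frac{P(N^j \to w^k, N^j \text{ is used} \mid w_{1m})}{\sum_{k'} P(N^j \to w^{k'}, N^j \text{ is used} \mid w_{1m})}$$

These denominators assume that N^j rewrites either only to nonterminal pairs or only to words. If N^j has rules of both kinds, the denominator must sum the counts of all rules with left-hand side N^j.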

Expected counts for PCFG rules

$$
\begin{aligned}
count(N^j \to N^r N^s) &= \sum_{Y} P(Y \mid X, \theta) * count(X, Y, N^j \to N^r N^s) \\
&= \sum_{Tr} P(Tr \mid w_{1m}, \theta) * count(w_{1m}, Tr, N^j \to N^r N^s) \\
&= \sum_{p=1}^{m} \sum_{q=1}^{m} P(N^j_{pq}, N^j \to N^r N^s \mid w_{1m}, \theta) \\
&= P(N^j \to N^r N^s, N^j \text{ is used} \mid w_{1m}, \theta)
\end{aligned}
$$

This is the formula if we have only one sentence. Add an outside sum if X contains multiple sentences.

Expected counts (cont)

$$
\begin{aligned}
count(N^j \to w^k) &= \sum_{Y} P(Y \mid X, \theta) * count(X, Y, N^j \to w^k) \\
&= \sum_{Tr} P(Tr \mid w_{1m}, \theta) * count(w_{1m}, Tr, N^j \to w^k) \\
&= \sum_{h=1}^{m} P(N^j \to w^k, N^j \text{ is used at position } h \mid w_{1m}, \theta)
\end{aligned}
$$

Relation to EM

• PCFG is a PM Model

• Inside-outside algorithm is a special case of the EM algorithm for PM Models.

• X (observed data): each data point is a sentence $w_{1m}$.

• Y (hidden data): parse tree Tr.

• Θ (parameters): $P(N^j \to N^r N^s)$ and $P(N^j \to w^k)$

Forward-backward algorithm


The inner loop for forward-backward algorithm

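Given an input sequence and (S, K, Π, A, B):

1. Calculate forward probability:
   • Base case: $\alpha_i(1) = \pi_i$
   • Recursive case: $\alpha_j(t+1) = \sum_{i} \alpha_i(t)\, a_{ij}\, b_{ij o_t}$

2. Calculate backward probability:
   • Base case: $\beta_i(T+1) = 1$
   • Recursive case: $\beta_i(t) = \sum_{j} a_{ij}\, b_{ij o_t}\, \beta_j(t+1)$

3. Calculate expected counts:

$$\xi_{ij}(t) = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t)\, \beta_m(t)}$$

4. Update the parameters:

$$a_{ij} = \frac{\sum_{t=1}^{T} \xi_{ij}(t)}{\sum_{j'=1}^{N} \sum_{t=1}^{T} \xi_{ij'}(t)} \qquad b_{ijk} = \frac{\sum_{t=1}^{T} \delta(o_t, w_k)\, \xi_{ij}(t)}{\sum_{t=1}^{T} \xi_{ij}(t)}$$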
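The four steps can be sketched in plain Python for an arc-emission HMM, where the symbol is emitted on the transition from state i to state j. The function names, the 0-based indexing, the name `xi` for the expected counts, and the toy parameters are assumptions made for this illustration. The normalizer is computed once as the sum of the final forward values, which equals the sum over m of α_m(t) β_m(t) for every t.

```python
def forward_backward(pi, a, b, obs):
    """E-step: return (xi, likelihood), where xi[t][i][j] is the posterior
    probability of taking arc i -> j at time t (0-based) given obs."""
    n, T = len(pi), len(obs)
    alpha = [[0.0] * n for _ in range(T + 1)]
    beta = [[0.0] * n for _ in range(T + 1)]
    alpha[0] = list(pi)                                    # alpha_i(1) = pi_i
    for t in range(T):
        for j in range(n):
            alpha[t + 1][j] = sum(alpha[t][i] * a[i][j] * b[i][j][obs[t]]
                                  for i in range(n))
    beta[T] = [1.0] * n                                    # beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        for i in range(n):
            beta[t][i] = sum(a[i][j] * b[i][j][obs[t]] * beta[t + 1][j]
                             for j in range(n))
    likelihood = sum(alpha[T])                             # P(O | theta)
    xi = [[[alpha[t][i] * a[i][j] * b[i][j][obs[t]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T)]
    return xi, likelihood

def reestimate(xi, obs, n, num_symbols):
    """M-step: normalize expected arc counts into new a and b."""
    a = [[0.0] * n for _ in range(n)]
    b = [[[0.0] * num_symbols for _ in range(n)] for _ in range(n)]
    for i in range(n):
        out_i = sum(xi[t][i][j] for t in range(len(obs)) for j in range(n))
        for j in range(n):
            arc = sum(xi[t][i][j] for t in range(len(obs)))
            a[i][j] = arc / out_i
            for t, o in enumerate(obs):
                b[i][j][o] += xi[t][i][j] / arc
    return a, b

# Toy model (assumption): 2 states, 2 output symbols.
pi = [0.6, 0.4]
a = [[0.7, 0.3], [0.4, 0.6]]
b = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
obs = [0, 1, 1, 0]

xi, lik = forward_backward(pi, a, b, obs)
new_a, new_b = reestimate(xi, obs, n=2, num_symbols=2)
_, new_lik = forward_backward(pi, new_a, new_b, obs)
print(lik, new_lik)
```

As in step 4 above, the sketch leaves the initial probabilities unchanged and handles a single observation sequence; with several sequences the expected counts would be summed over sequences before normalizing.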


Expected counts

$$
\begin{aligned}
count(s_i \to s_j) &= \sum_{Y} P(Y \mid X, \theta) * count(X, Y, s_i \to s_j) \\
&= \sum_{X_{1T}} P(X_{1T} \mid O_{1T}, \theta) * count(O_{1T}, X_{1T}, s_i \to s_j) \\
&= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1T}, \theta) \\
&= \sum_{t=1}^{T} \xi_{ij}(t)
\end{aligned}
$$

Expected counts (cont)

$$
\begin{aligned}
count(s_i \xrightarrow{w_k} s_j) &= \sum_{Y} P(Y \mid X, \theta) * count(X, Y, s_i \xrightarrow{w_k} s_j) \\
&= \sum_{X_{1T}} P(X_{1T} \mid O_{1T}, \theta) * count(O_{1T}, X_{1T}, s_i \xrightarrow{w_k} s_j) \\
&= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1T}, \theta) * \delta(O_t, w_k) \\
&= \sum_{t=1}^{T} \xi_{ij}(t)\, \delta(O_t, w_k)
\end{aligned}
$$

Relation to EM

• HMM is a PM Model

• Forward-backward algorithm is a special case of the EM algorithm for PM Models.

• X (observed data): each data point is an $O_{1T}$.

• Y (hidden data): state sequence $X_{1T}$.

• Θ (parameters): $a_{ij}$, $b_{ijk}$, $\pi_i$.

IBM models for MT


Expected counts for (f, e) pairs

• Let Ct(f, e) be the fractional count of (f, e) pair in the training data.

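$$Ct(f, e) = \sum_{(E, F)} \sum_{a} \Big( P(a \mid E, F) * \sum_{j=1}^{|F|} \delta(f, f_j)\, \delta(e, e_{a_j}) \Big)$$

Here P(a | E, F) is the alignment probability, and the inner sum is the actual count of times e and f are linked in (E, F) by alignment a.

$$t(f \mid e) = \frac{Ct(f, e)}{\sum_{x \in V_F} Ct(x, e)}$$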
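For the simplest case, the fractional counts can be collected without enumerating alignments. The Python sketch below assumes IBM Model 1 without a NULL word, where the sum over alignments factorizes so that each foreign word f_j distributes one unit of count over the English words in proportion to t(f_j | e). The function name, the uniform initialization, and the three-sentence corpus are illustrative assumptions.

```python
from collections import defaultdict

def model1_em(pairs, iterations=5):
    """pairs: list of (E, F) sentence pairs, each a list of words."""
    f_vocab = {f for _, F in pairs for f in F}
    t = defaultdict(lambda: 1.0 / len(f_vocab))     # uniform initial t(f | e)
    for _ in range(iterations):
        count = defaultdict(float)                  # Ct(f, e)
        total = defaultdict(float)                  # sum over x of Ct(x, e)
        for E, F in pairs:
            for f in F:
                z = sum(t[(f, e)] for e in E)
                for e in E:
                    c = t[(f, e)] / z               # posterior that f is linked to e
                    count[(f, e)] += c
                    total[e] += c
        # Normalize: t(f | e) = Ct(f, e) / sum over x of Ct(x, e)
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

pairs = [(["the", "house"], ["das", "Haus"]),
         (["the", "book"], ["das", "Buch"]),
         (["a", "book"], ["ein", "Buch"])]
t = model1_em(pairs)
print(t[("das", "the")], t[("Haus", "the")])
```

Higher IBM models add distortion and fertility parameters, so their alignment posteriors no longer factorize this way and the counts must be collected over (a sample of) alignments.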


Relation to EM

• IBM models are PM Models.

• The EM algorithm used in IBM models is a special case of the EM algorithm for PM Models.

• X (observed data): each data point is a sentence pair (F, E).

• Y (hidden data): word alignment a.

• Θ (parameters): t(f|e), d(i | j, m, n), etc.

Summary

• The EM algorithm
  – An iterative approach
  – L(θ) is non-decreasing at each iteration
  – Optimal solution in M-step exists for many classes of problems.

• The EM algorithm for PM models
  – Simpler formulae
  – Three special cases
    • Inside-outside algorithm
    • Forward-backward algorithm
    • IBM Models for MT

Relations among the algorithms

The generalized EM
  The EM algorithm
    PM
      Inside-Outside
      Forward-backward
      IBM models
    Gaussian Mix

Strengths of EM

• Numerical stability: no iteration of the EM algorithm decreases the likelihood of the observed data.

• EM handles parameter constraints gracefully.

Problems with EM

• Convergence can be very slow on some problems and is intimately related to the amount of missing information.

• It guarantees to improve the probability of the training corpus, which is different from reducing the errors directly.

• It is not guaranteed to reach a global maximum (it could get stuck at local maxima, saddle points, etc.).

→ The initial estimate is important.

Additional slides


Lower bound lemma

If

$$f(x) \le g(x) \text{ for all } x, \qquad f(x_0) = g(x_0), \qquad \hat{x} = \arg\max_{x} f(x)$$

Then

$$g(\hat{x}) \ge g(x_0)$$

Proof:

$$g(\hat{x}) \ge f(\hat{x}) \ge f(x_0) = g(x_0)$$

L(θ) is non-decreasing

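$$l(\theta) - l(\theta^t) \ge \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right]$$

Let

$$g(\theta) = l(\theta) - l(\theta^t)$$

$$f(\theta) = \sum_{i=1}^{n} E_{P(y \mid x_i, \theta^t)}\left[\log \frac{P(x_i, y \mid \theta)}{P(x_i, y \mid \theta^t)}\right]$$

We have

$$g(\theta) \ge f(\theta) \quad \text{and} \quad g(\theta^t) = f(\theta^t) = 0$$

$$\theta^{t+1} = \arg\max_{\theta} f(\theta)$$

(By lower bound lemma)

$$g(\theta^{t+1}) \ge g(\theta^t) = 0$$

$$l(\theta^{t+1}) \ge l(\theta^t)$$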