Forward-backward algorithm LING 572 Fei Xia 02/23/06.
HMM
• An HMM is a tuple $(S, \Sigma, \pi, A, B)$:
  – A set of states $S = \{s_1, s_2, \dots, s_N\}$.
  – A set of output symbols $\Sigma = \{w_1, \dots, w_M\}$.
  – Initial state probabilities $\pi = \{\pi_i\}$.
  – State transition probabilities $A = \{a_{ij}\}$.
  – Symbol emission probabilities $B = \{b_{ijk}\}$ (arc emission: $w_k$ is emitted on the transition from $s_i$ to $s_j$).
• State sequence: $X_1 \dots X_{T+1}$
• Output sequence: $o_1 \dots o_T$
Decoding
• Given the observation $O_{1,T} = o_1 \dots o_T$, find the state sequence $X_{1,T+1} = X_1 \dots X_{T+1}$ that maximizes $P(X_{1,T+1} \mid O_{1,T})$.
Viterbi algorithm
[Trellis figure: states $X_1, X_2, \dots, X_T, X_{T+1}$, with outputs $o_1, o_2, \dots, o_T$]
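The decoding step can be sketched with a minimal Viterbi pass over an arc-emission HMM. The toy parameters below are invented for illustration and are not from the slides; `B[i, j, k]` holds the arc-emission probability $b_{ijk}$.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence X_1 .. X_{T+1} for an arc-emission HMM.

    pi: (N,) initial probs; A: (N, N) transition probs a_ij;
    B: (N, N, M) arc-emission probs b_ijk; obs: symbol indices o_1 .. o_T.
    delta[t, j] is the best-path probability of being in state j at time t+1.
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)   # backpointers
    delta[0] = pi
    for t in range(T):
        for j in range(N):
            scores = [delta[t, i] * A[i, j] * B[i, j, obs[t]] for i in range(N)]
            back[t + 1, j] = int(np.argmax(scores))
            delta[t + 1, j] = scores[back[t + 1, j]]
    # Trace the backpointers from the best final state.
    path = [int(np.argmax(delta[T]))]
    for t in range(T, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state model that deterministically alternates states (illustrative only).
pi = np.array([1.0, 0.0])
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.full((2, 2, 2), 0.5)
print(viterbi(pi, A, B, [0, 1]))   # -> [0, 1, 0]
```

With a deterministic transition matrix, the decoded path simply alternates states, which makes the expected answer easy to check by hand.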
Notation
• A sentence: $O_{1,T} = o_1 \dots o_T$
• $T$ is the sentence length
• The state sequence $X_{1,T+1} = X_1 \dots X_{T+1}$
• $t$: time $t$, ranging from 1 to $T+1$
• $X_t$: the state at time $t$
• $i, j$: states $s_i$, $s_j$
• $k$: word $w_k$ in the vocabulary
Forward probability
The probability of producing $O_{1,t-1}$ while ending up in state $s_i$:
$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$
Calculating forward probability
Initialization: $\alpha_i(1) = \pi_i$

Induction:
$$\alpha_j(t+1) \stackrel{\text{def}}{=} P(O_{1,t}, X_{t+1} = j)
 = \sum_i P(O_{1,t-1}, X_t = i, o_t, X_{t+1} = j)$$
$$= \sum_i P(O_{1,t-1}, X_t = i) \, P(o_t, X_{t+1} = j \mid O_{1,t-1}, X_t = i)
 = \sum_i P(O_{1,t-1}, X_t = i) \, P(o_t, X_{t+1} = j \mid X_t = i)$$
$$= \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}$$
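The initialization and induction steps above can be sketched in code. The toy parameters are invented for illustration; since every distribution is normalized, $P(O)$ summed over all observation sequences of a fixed length must equal 1, which gives a handy sanity check.

```python
import numpy as np
from itertools import product

def forward(pi, A, B, obs):
    """Forward pass for an arc-emission HMM.

    Code index t runs 0..T, so alpha[t] here is alpha(t+1) on the slides.
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                        # initialization: alpha_i(1) = pi_i
    for t in range(T):                   # induction, left to right
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

# Invented toy parameters; every distribution sums to 1.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])

# Sanity check: P(O) = sum_i alpha_i(T+1), summed over all length-2
# observation sequences, must equal 1.
total = sum(forward(pi, A, B, list(obs))[-1].sum()
            for obs in product(range(2), repeat=2))
print(round(total, 10))   # -> 1.0
```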
Backward probability
• The probability of producing the sequence $O_{t,T}$, given that at time $t$ we are in state $s_i$:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$
Calculating backward probability
Initialization: $\beta_i(T+1) = 1$

Induction:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)
 = \sum_j P(o_t, O_{t+1,T}, X_{t+1} = j \mid X_t = i)$$
$$= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \, P(O_{t+1,T} \mid X_{t+1} = j, X_t = i, o_t)
 = \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \, P(O_{t+1,T} \mid X_{t+1} = j)$$
$$= \sum_j a_{ij} \, b_{ij o_t} \, \beta_j(t+1)$$
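A matching sketch of the backward pass, again over invented toy arc-emission parameters:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward pass: beta[t] here is beta(t+1) on the slides."""
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                        # initialization: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):       # induction, right to left
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

# Invented toy parameters.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
beta = backward(pi, A, B, [0, 1, 0])
print(beta.shape)   # -> (4, 2)
```

Note that $\beta_i(1)$, weighted by the initial probabilities $\pi_i$, already yields $P(O)$, which the next slide states explicitly.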
Calculating the prob of the observation
$$P(O) = \sum_{i=1}^{N} \pi_i \, \beta_i(1)$$
$$P(O) = \sum_{i=1}^{N} \alpha_i(T+1)$$
$$P(O) = \sum_{i=1}^{N} P(O, X_t = i) = \sum_{i=1}^{N} \alpha_i(t) \, \beta_i(t) \quad \text{for any } t$$
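The three expressions for $P(O)$ can be checked numerically. This sketch repeats minimal forward and backward passes so it is self-contained; all parameters are invented toys.

```python
import numpy as np

def forward(pi, A, B, obs):
    # alpha[t] here is alpha(t+1) on the slides
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    # beta[t] here is beta(t+1) on the slides
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

# Invented toy parameters and observation sequence.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
obs = [0, 1, 0]
alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)

po_backward = float(pi @ beta[0])          # P(O) = sum_i pi_i beta_i(1)
po_forward = float(alpha[-1].sum())        # P(O) = sum_i alpha_i(T+1)
po_any_t = [float(alpha[t] @ beta[t])      # P(O) = sum_i alpha_i(t) beta_i(t)
            for t in range(len(obs) + 1)]
print(np.allclose([po_backward, po_forward] + po_any_t, po_backward))  # -> True
```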
Estimating parameters
• The probability of traversing a certain arc at time $t$, given $O$ (denoted $p_t(i,j)$ in M&S):
$$p_t(i,j) \stackrel{\text{def}}{=} P(X_t = i, X_{t+1} = j \mid O)
 = \frac{P(X_t = i, X_{t+1} = j, O)}{P(O)}
 = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$
Expected counts
Sum over the time index:
• Expected # of transitions from state $i$ to $j$ in $O$:
$$\sum_{t=1}^{T} p_t(i,j)$$
• Expected # of transitions from state $i$ in $O$:
$$\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)$$
Update parameters
$$\hat{\pi}_i = \text{expected frequency in state } i \text{ at time } 1 = \gamma_i(1)$$

$$\hat{a}_{ij} = \frac{\text{expected \# of transitions from state } i \text{ to } j}{\text{expected \# of transitions from state } i}
 = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)}$$

$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } w_k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j}
 = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
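One re-estimation step built from these expected counts might be sketched as follows. The forward and backward passes are repeated inline so the sketch is self-contained, and all parameters are invented toys.

```python
import numpy as np

def forward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

def reestimate(pi, A, B, obs):
    """One update of pi, A, B from the expected counts p_t(i, j)."""
    N, M, T = len(pi), B.shape[2], len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    po = alpha[-1].sum()
    p = np.zeros((T, N, N))              # p[t, i, j] = p_t(i, j)
    for t in range(T):
        for i in range(N):
            for j in range(N):
                p[t, i, j] = (alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                              * beta[t + 1, j] / po)
    pi_new = p[0].sum(axis=1)            # gamma_i(1): expected freq at time 1
    A_new = p.sum(axis=0) / p.sum(axis=(0, 2))[:, None]
    B_new = np.zeros((N, N, M))
    for k in range(M):
        mask = np.array([o == k for o in obs])
        B_new[:, :, k] = p[mask].sum(axis=0) / p.sum(axis=0)
    return pi_new, A_new, B_new

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
pi_new, A_new, B_new = reestimate(pi, A, B, [0, 1, 0, 0, 1])
print(round(pi_new.sum(), 10))   # -> 1.0
```

Because each update is a ratio of expected counts, every re-estimated distribution remains properly normalized.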
Final formulae
$$p_t(i,j) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)}$$

$$\hat{b}_{ijk} = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
Emission probabilities
Arc-emission HMM:
$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } w_k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j}
 = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
The inner loop for forward-backward algorithm
Given an input sequence and $(S, \Sigma, \pi, A, B)$:
1. Calculate forward probability:
   • Base case: $\alpha_i(1) = \pi_i$
   • Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}$
2. Calculate backward probability:
   • Base case: $\beta_i(T+1) = 1$
   • Recursive case: $\beta_i(t) = \sum_j a_{ij} \, b_{ij o_t} \, \beta_j(t+1)$
3. Calculate expected counts:
$$p_t(i,j) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$
4. Update the parameters:
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)} \qquad
  \hat{b}_{ijk} = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
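Putting steps 1 through 4 together, a small end-to-end loop can also check numerically that each iteration does not decrease the data likelihood (the EM guarantee stated later in the slides). All parameters below are invented toys, not from the slides.

```python
import numpy as np

def forward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

def em_step(pi, A, B, obs):
    """Steps 1-4 of the inner loop: returns updated params and P(O)."""
    N, M, T = len(pi), B.shape[2], len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    po = alpha[-1].sum()
    p = np.zeros((T, N, N))              # expected counts p_t(i, j)
    for t in range(T):
        for i in range(N):
            for j in range(N):
                p[t, i, j] = (alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                              * beta[t + 1, j] / po)
    pi_new = p[0].sum(axis=1)
    A_new = p.sum(axis=0) / p.sum(axis=(0, 2))[:, None]
    B_new = np.stack([p[[o == k for o in obs]].sum(axis=0) for k in range(M)],
                     axis=2) / p.sum(axis=0)[:, :, None]
    return pi_new, A_new, B_new, float(po)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
obs = [0, 1, 0, 0, 1]
likelihoods = []
for _ in range(5):
    pi, A, B, po = em_step(pi, A, B, obs)
    likelihoods.append(po)
print(all(b >= a - 1e-12 for a, b in zip(likelihoods, likelihoods[1:])))  # -> True
```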
Relation to EM
• An HMM is a PM (Product of Multinomials) model.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
• $X$ (observed data): each data point is an $O_{1,T}$.
• $Y$ (hidden data): the state sequence $X_{1,T+1}$.
• $\Theta$ (parameters): $a_{ij}$, $b_{ijk}$, $\pi_i$.
Relation to EM (cont)
$$\text{count}(a_{ij}) = \sum_Y P(Y \mid X, \Theta) \, \text{count}(X, Y, a_{ij})
 = \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}) \, \text{count}(O_{1,T}, X_{1,T+1}, a_{ij})$$
$$= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T})
 = \sum_{t=1}^{T} p_t(i,j)$$

$$\text{count}(b_{ijk}) = \sum_Y P(Y \mid X, \Theta) \, \text{count}(X, Y, b_{ijk})
 = \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}) \, \text{count}(O_{1,T}, X_{1,T+1}, b_{ijk})$$
$$= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T}) \, \delta(o_t, w_k)
 = \sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)$$
Iterations
• Each iteration provides values for all the parameters.
• The new model always improves the likelihood of the training data:
$$P(O \mid \hat{\Theta}) \geq P(O \mid \Theta)$$
• The algorithm is not guaranteed to reach the global maximum.
Summary
• A way of estimating parameters for HMMs
  – Define forward and backward probabilities, which can be calculated efficiently (DP).
  – Given an initial parameter setting, we re-estimate the parameters at each iteration.
  – The forward-backward algorithm is a special case of the EM algorithm for PM models.
Definitions so far
• The probability of producing $O_{1,t-1}$ and ending at state $s_i$ at time $t$:
$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$
• The probability of producing the sequence $O_{t,T}$, given that at time $t$ we are at state $s_i$:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$
• The probability of being at state $i$ at time $t$, given $O$:
$$\gamma_i(t) \stackrel{\text{def}}{=} P(X_t = i \mid O) = \frac{P(X_t = i, O)}{P(O)}
 = \frac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)}$$
• The probability of traversing the arc from $i$ to $j$ at time $t$, given $O$:
$$p_t(i,j) \stackrel{\text{def}}{=} P(X_t = i, X_{t+1} = j \mid O)
 = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$