Forward-backward algorithm LING 572 Fei Xia 02/23/06.
HMM
• An HMM is a tuple $(S, \Sigma, \pi, A, B)$:
  – A set of states $S = \{s_1, s_2, \dots, s_N\}$.
  – A set of output symbols $\Sigma = \{w_1, \dots, w_M\}$.
  – Initial state probabilities $\pi = \{\pi_i\}$.
  – State transition probabilities $A = \{a_{ij}\}$.
  – Symbol emission probabilities $B = \{b_{ijk}\}$ (arc emission: $w_k$ is emitted on the transition from $s_i$ to $s_j$).
• State sequence: $X_1 \dots X_{T+1}$
• Output sequence: $o_1 \dots o_T$
Decoding
• Given the observation $O_{1,T} = o_1 \dots o_T$, find the state sequence $X_{1,T+1} = X_1 \dots X_{T+1}$ that maximizes $P(X_{1,T+1} \mid O_{1,T})$.
Viterbi algorithm
[Trellis figure: states $X_1, X_2, \dots, X_T, X_{T+1}$, with outputs $o_1, o_2, \dots, o_T$]
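The decoding step can be sketched with a minimal Viterbi pass over an arc-emission HMM. The toy parameters below are invented for illustration and are not from the slides; `B[i, j, k]` holds the arc-emission probability $b_{ijk}$.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence X_1 .. X_{T+1} for an arc-emission HMM.

    pi: (N,) initial probs; A: (N, N) transition probs a_ij;
    B: (N, N, M) arc-emission probs b_ijk; obs: symbol indices o_1 .. o_T.
    delta[t, j] is the best-path probability of being in state j at time t+1.
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T + 1, N))
    back = np.zeros((T + 1, N), dtype=int)   # backpointers
    delta[0] = pi
    for t in range(T):
        for j in range(N):
            scores = [delta[t, i] * A[i, j] * B[i, j, obs[t]] for i in range(N)]
            back[t + 1, j] = int(np.argmax(scores))
            delta[t + 1, j] = scores[back[t + 1, j]]
    # Trace the backpointers from the best final state.
    path = [int(np.argmax(delta[T]))]
    for t in range(T, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy 2-state model that deterministically alternates states (illustrative only).
pi = np.array([1.0, 0.0])
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.full((2, 2, 2), 0.5)
print(viterbi(pi, A, B, [0, 1]))   # -> [0, 1, 0]
```

With a deterministic transition matrix, the decoded path simply alternates states, which makes the expected answer easy to check by hand.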
Notation
• A sentence: $O_{1,T} = o_1 \dots o_T$
• $T$ is the sentence length
• The state sequence $X_{1,T+1} = X_1 \dots X_{T+1}$
• $t$: time $t$, ranging from 1 to $T+1$
• $X_t$: the state at time $t$
• $i, j$: states $s_i$, $s_j$
• $k$: word $w_k$ in the vocabulary
Forward probability
The probability of producing $O_{1,t-1}$ while ending up in state $s_i$:
$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$
Calculating forward probability
Initialization: $\alpha_i(1) = \pi_i$

Induction:
$$\alpha_j(t+1) \stackrel{\text{def}}{=} P(O_{1,t}, X_{t+1} = j)
 = \sum_i P(O_{1,t-1}, X_t = i, o_t, X_{t+1} = j)$$
$$= \sum_i P(O_{1,t-1}, X_t = i) \, P(o_t, X_{t+1} = j \mid O_{1,t-1}, X_t = i)
 = \sum_i P(O_{1,t-1}, X_t = i) \, P(o_t, X_{t+1} = j \mid X_t = i)$$
$$= \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}$$
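The initialization and induction steps above can be sketched in code. The toy parameters are invented for illustration; since every distribution is normalized, $P(O)$ summed over all observation sequences of a fixed length must equal 1, which gives a handy sanity check.

```python
import numpy as np
from itertools import product

def forward(pi, A, B, obs):
    """Forward pass for an arc-emission HMM.

    Code index t runs 0..T, so alpha[t] here is alpha(t+1) on the slides.
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                        # initialization: alpha_i(1) = pi_i
    for t in range(T):                   # induction, left to right
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

# Invented toy parameters; every distribution sums to 1.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])

# Sanity check: P(O) = sum_i alpha_i(T+1), summed over all length-2
# observation sequences, must equal 1.
total = sum(forward(pi, A, B, list(obs))[-1].sum()
            for obs in product(range(2), repeat=2))
print(round(total, 10))   # -> 1.0
```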
Backward probability
• The probability of producing the sequence $O_{t,T}$, given that at time $t$ we are in state $s_i$:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$
Calculating backward probability
Initialization: $\beta_i(T+1) = 1$

Induction:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)
 = \sum_j P(o_t, O_{t+1,T}, X_{t+1} = j \mid X_t = i)$$
$$= \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \, P(O_{t+1,T} \mid X_{t+1} = j, X_t = i, o_t)
 = \sum_j P(o_t, X_{t+1} = j \mid X_t = i) \, P(O_{t+1,T} \mid X_{t+1} = j)$$
$$= \sum_j a_{ij} \, b_{ij o_t} \, \beta_j(t+1)$$
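A matching sketch of the backward pass, again over invented toy arc-emission parameters:

```python
import numpy as np

def backward(pi, A, B, obs):
    """Backward pass: beta[t] here is beta(t+1) on the slides."""
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                        # initialization: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):       # induction, right to left
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

# Invented toy parameters.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
beta = backward(pi, A, B, [0, 1, 0])
print(beta.shape)   # -> (4, 2)
```

Note that $\beta_i(1)$, weighted by the initial probabilities $\pi_i$, already yields $P(O)$, which the next slide states explicitly.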
Calculating the prob of the observation
$$P(O) = \sum_{i=1}^{N} \pi_i \, \beta_i(1)$$
$$P(O) = \sum_{i=1}^{N} \alpha_i(T+1)$$
$$P(O) = \sum_{i=1}^{N} P(O, X_t = i) = \sum_{i=1}^{N} \alpha_i(t) \, \beta_i(t) \quad \text{for any } t$$
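The three expressions for $P(O)$ can be checked numerically. This sketch repeats minimal forward and backward passes so it is self-contained; all parameters are invented toys.

```python
import numpy as np

def forward(pi, A, B, obs):
    # alpha[t] here is alpha(t+1) on the slides
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    # beta[t] here is beta(t+1) on the slides
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

# Invented toy parameters and observation sequence.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
obs = [0, 1, 0]
alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)

po_backward = float(pi @ beta[0])          # P(O) = sum_i pi_i beta_i(1)
po_forward = float(alpha[-1].sum())        # P(O) = sum_i alpha_i(T+1)
po_any_t = [float(alpha[t] @ beta[t])      # P(O) = sum_i alpha_i(t) beta_i(t)
            for t in range(len(obs) + 1)]
print(np.allclose([po_backward, po_forward] + po_any_t, po_backward))  # -> True
```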
Estimating parameters
• The probability of traversing a certain arc at time $t$, given $O$ (denoted $p_t(i,j)$ in M&S):
$$p_t(i,j) \stackrel{\text{def}}{=} P(X_t = i, X_{t+1} = j \mid O)
 = \frac{P(X_t = i, X_{t+1} = j, O)}{P(O)}
 = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$
Expected counts
Sum over the time index:
• Expected # of transitions from state $i$ to $j$ in $O$:
$$\sum_{t=1}^{T} p_t(i,j)$$
• Expected # of transitions from state $i$ in $O$:
$$\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)$$
Update parameters
$$\hat{\pi}_i = \text{expected frequency in state } i \text{ at time } 1 = \gamma_i(1)$$

$$\hat{a}_{ij} = \frac{\text{expected \# of transitions from state } i \text{ to } j}{\text{expected \# of transitions from state } i}
 = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)}$$

$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } w_k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j}
 = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
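One re-estimation step built from these expected counts might be sketched as follows. The forward and backward passes are repeated inline so the sketch is self-contained, and all parameters are invented toys.

```python
import numpy as np

def forward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

def reestimate(pi, A, B, obs):
    """One update of pi, A, B from the expected counts p_t(i, j)."""
    N, M, T = len(pi), B.shape[2], len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    po = alpha[-1].sum()
    p = np.zeros((T, N, N))              # p[t, i, j] = p_t(i, j)
    for t in range(T):
        for i in range(N):
            for j in range(N):
                p[t, i, j] = (alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                              * beta[t + 1, j] / po)
    pi_new = p[0].sum(axis=1)            # gamma_i(1): expected freq at time 1
    A_new = p.sum(axis=0) / p.sum(axis=(0, 2))[:, None]
    B_new = np.zeros((N, N, M))
    for k in range(M):
        mask = np.array([o == k for o in obs])
        B_new[:, :, k] = p[mask].sum(axis=0) / p.sum(axis=0)
    return pi_new, A_new, B_new

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
pi_new, A_new, B_new = reestimate(pi, A, B, [0, 1, 0, 0, 1])
print(round(pi_new.sum(), 10))   # -> 1.0
```

Because each update is a ratio of expected counts, every re-estimated distribution remains properly normalized.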
Final formulae
$$p_t(i,j) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$

$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)}$$

$$\hat{b}_{ijk} = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
Emission probabilities
Arc-emission HMM:
$$\hat{b}_{ijk} = \frac{\text{expected \# of transitions from state } i \text{ to } j \text{ with } w_k \text{ observed}}{\text{expected \# of transitions from state } i \text{ to } j}
 = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
The inner loop for forward-backward algorithm
Given an input sequence and $(S, \Sigma, \pi, A, B)$:
1. Calculate forward probability:
   • Base case: $\alpha_i(1) = \pi_i$
   • Recursive case: $\alpha_j(t+1) = \sum_i \alpha_i(t) \, a_{ij} \, b_{ij o_t}$
2. Calculate backward probability:
   • Base case: $\beta_i(T+1) = 1$
   • Recursive case: $\beta_i(t) = \sum_j a_{ij} \, b_{ij o_t} \, \beta_j(t+1)$
3. Calculate expected counts:
$$p_t(i,j) = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$
4. Update the parameters:
$$\hat{a}_{ij} = \frac{\sum_{t=1}^{T} p_t(i,j)}{\sum_{t=1}^{T} \sum_{j=1}^{N} p_t(i,j)} \qquad
  \hat{b}_{ijk} = \frac{\sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)}{\sum_{t=1}^{T} p_t(i,j)}$$
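Putting steps 1 through 4 together, a small end-to-end loop can also check numerically that each iteration does not decrease the data likelihood (the EM guarantee stated later in the slides). All parameters below are invented toys, not from the slides.

```python
import numpy as np

def forward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi
    for t in range(T):
        for j in range(N):
            alpha[t + 1, j] = sum(alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                                  for i in range(N))
    return alpha

def backward(pi, A, B, obs):
    N, T = len(pi), len(obs)
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for i in range(N):
            beta[t, i] = sum(A[i, j] * B[i, j, obs[t]] * beta[t + 1, j]
                             for j in range(N))
    return beta

def em_step(pi, A, B, obs):
    """Steps 1-4 of the inner loop: returns updated params and P(O)."""
    N, M, T = len(pi), B.shape[2], len(obs)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    po = alpha[-1].sum()
    p = np.zeros((T, N, N))              # expected counts p_t(i, j)
    for t in range(T):
        for i in range(N):
            for j in range(N):
                p[t, i, j] = (alpha[t, i] * A[i, j] * B[i, j, obs[t]]
                              * beta[t + 1, j] / po)
    pi_new = p[0].sum(axis=1)
    A_new = p.sum(axis=0) / p.sum(axis=(0, 2))[:, None]
    B_new = np.stack([p[[o == k for o in obs]].sum(axis=0) for k in range(M)],
                     axis=2) / p.sum(axis=0)[:, :, None]
    return pi_new, A_new, B_new, float(po)

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[[0.5, 0.5], [0.9, 0.1]],
              [[0.2, 0.8], [0.3, 0.7]]])
obs = [0, 1, 0, 0, 1]
likelihoods = []
for _ in range(5):
    pi, A, B, po = em_step(pi, A, B, obs)
    likelihoods.append(po)
print(all(b >= a - 1e-12 for a, b in zip(likelihoods, likelihoods[1:])))  # -> True
```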
Relation to EM
• An HMM is a PM (Product of Multinomials) model.
• The forward-backward algorithm is a special case of the EM algorithm for PM models.
• $X$ (observed data): each data point is an $O_{1,T}$.
• $Y$ (hidden data): the state sequence $X_{1,T+1}$.
• $\Theta$ (parameters): $a_{ij}$, $b_{ijk}$, $\pi_i$.
Relation to EM (cont)
$$\text{count}(a_{ij}) = \sum_Y P(Y \mid X, \Theta) \, \text{count}(X, Y, a_{ij})
 = \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}) \, \text{count}(O_{1,T}, X_{1,T+1}, a_{ij})$$
$$= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T})
 = \sum_{t=1}^{T} p_t(i,j)$$

$$\text{count}(b_{ijk}) = \sum_Y P(Y \mid X, \Theta) \, \text{count}(X, Y, b_{ijk})
 = \sum_{X_{1,T+1}} P(X_{1,T+1} \mid O_{1,T}) \, \text{count}(O_{1,T}, X_{1,T+1}, b_{ijk})$$
$$= \sum_{t=1}^{T} P(X_t = i, X_{t+1} = j \mid O_{1,T}) \, \delta(o_t, w_k)
 = \sum_{t=1}^{T} p_t(i,j) \, \delta(o_t, w_k)$$
Iterations
• Each iteration provides values for all the parameters.
• The new model always improves the likelihood of the training data:
$$P(O \mid \hat{\Theta}) \geq P(O \mid \Theta)$$
• The algorithm is not guaranteed to reach the global maximum.
Summary
• A way of estimating parameters for HMMs
  – Define forward and backward probabilities, which can be calculated efficiently (DP).
  – Given an initial parameter setting, we re-estimate the parameters at each iteration.
  – The forward-backward algorithm is a special case of the EM algorithm for PM models.
Definitions so far
• The probability of producing $O_{1,t-1}$ and ending at state $s_i$ at time $t$:
$$\alpha_i(t) \stackrel{\text{def}}{=} P(O_{1,t-1}, X_t = i)$$
• The probability of producing the sequence $O_{t,T}$, given that at time $t$ we are at state $s_i$:
$$\beta_i(t) \stackrel{\text{def}}{=} P(O_{t,T} \mid X_t = i)$$
• The probability of being at state $i$ at time $t$, given $O$:
$$\gamma_i(t) \stackrel{\text{def}}{=} P(X_t = i \mid O) = \frac{P(X_t = i, O)}{P(O)}
 = \frac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t) \, \beta_j(t)}$$
• The probability of traversing the arc from $i$ to $j$ at time $t$, given $O$:
$$p_t(i,j) \stackrel{\text{def}}{=} P(X_t = i, X_{t+1} = j \mid O)
 = \frac{\alpha_i(t) \, a_{ij} \, b_{ij o_t} \, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t) \, \beta_m(t)}$$