Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.


Transcript of Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Page 1: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Hidden Markov Models

BIOL337/STAT337/437 Spring Semester 2014

Page 2: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

•Theory of hidden Markov models (HMMs)
•Probabilistic interpretation of sequence alignments using HMMs
•Applications of HMMs to biological sequence modeling and discovery of features such as genes, etc.

An HMM

[Figure: HMM trellis with columns of hidden states 1, 2, ..., K at each position; the hidden state sequence π1, π2, π3, ..., πn emits the outputs x1, x2, x3, ..., xn]

Page 3: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Example of an HMM

Do you want to play?

The Dishonest Casino

Page 4: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

The situation...

•Casino has two dice, one fair (F) and one loaded (L)

•Probabilities for the fair die: P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

•Probabilities for the loaded die: P(1) = P(2) = P(3) = P(4) = P(5) = 1/10; P(6) = ½

•Before each roll, the casino player switches from the fair die to the loaded die (or vice versa) with probability 1/20

The game...

•You bet $1

•You roll (always with the fair die)

•Casino player rolls (maybe with the fair die, maybe with the loaded die)

•Player who rolls the highest number wins $2

Page 5: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Dishonest Casino HMM

[State diagram: two states, FAIR and LOADED]

Transition probabilities: P(FAIR → LOADED) = 0.05, P(LOADED → FAIR) = 0.05, P(FAIR → FAIR) = 0.95, P(LOADED → LOADED) = 0.95

Emission probabilities:
P(1 | F) = P(2 | F) = P(3 | F) = P(4 | F) = P(5 | F) = P(6 | F) = 1/6
P(1 | L) = P(2 | L) = P(3 | L) = P(4 | L) = P(5 | L) = 1/10, P(6 | L) = 1/2
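As a concrete aside (not part of the original slides), the model above is small enough to write down directly. The sketch below, in Python, stores the transition and emission probabilities in plain dictionaries and simulates rolls the way the slide describes; the names STATES, START, TRANS, EMIT and simulate are illustrative, and the uniform starting probabilities are the ones stated later on the Page 18 slide.

```python
import random

STATES = ["F", "L"]                      # fair die, loaded die
START = {"F": 0.5, "L": 0.5}             # starting probabilities a0 (uniform, as on Page 18)
TRANS = {"F": {"F": 0.95, "L": 0.05},    # switch dice with probability 1/20 before each roll
         "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}

def simulate(n, rng=random):
    """Generate n rolls (x) and the hidden dice that produced them (pi)."""
    state = rng.choices(STATES, weights=[START[s] for s in STATES])[0]
    xs, pis = [], []
    for _ in range(n):
        faces = list(EMIT[state])
        xs.append(rng.choices(faces, weights=[EMIT[state][f] for f in faces])[0])
        pis.append(state)
        state = rng.choices(STATES, weights=[TRANS[state][s] for s in STATES])[0]
    return xs, pis

if __name__ == "__main__":
    rolls, dice = simulate(20)
    print("".join(map(str, rolls)))
    print("".join(dice))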

Page 6: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Three Fundamental Questions About Any HMM

•Evaluation: What is the probability of a sequence of outputs of an HMM?

•Decoding: Given a sequence of outputs of an HMM, what is the most probable sequence of states that the HMM went through to produce the output?

•Learning: Given a sequence of outputs of an HMM, how do we estimate the parameters of the model?

Page 7: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Evaluation Question

Suppose the casino player rolls the following sequence...

1666316316416412646255421
6515616361663616636166466
26532164151151443

How likely is this sequence given our model of how the casino operates?

Probability ≈ 1.3 × 10^-35. (Note that (1/6)^67 = 7.309139054 × 10^-53.)

Page 8: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Decoding Question

1666316316416412646255421
6515616361663616636166466
26532164151151443

What portion of the sequence was generated with the fair die, and what portion with the loaded die?

[Figure: the roll sequence annotated with FAIR and LOADED regions]

Page 9: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Learning Question

1666316316416412646255421
6515616361663616636166466
26532164151151443

How “loaded” is the loaded die? How “fair” is the fair die? How often does the casino player change from fair to loaded, and back? That is, what are the parameters of the model?

[Figure: the roll sequence annotated with FAIR and LOADED regions; for the LOADED die, P(6) = 0.64]

Page 10: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Ingredients of an HMM

An HMM M has the following parts...

•Alphabet Σ = {b1, b2, ..., bm} //these symbols are output by the model, for example, Σ = {1,2,3,4,5,6} in the dishonest casino model

•Set of states Q = {1, 2, ..., K} //for example, ‘FAIR’ and ‘LOADED’

•Transition probabilities between states ai,j = probability of making a transition from state i to state j

•Starting probabilities a0j = probability of the model starting in state j

Σj=1..K aij = 1  (for each state i)

Σj=1..K a0j = 1

Page 11: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

•Emission probabilities within state k, ek(b) = probability of seeing (emitting) the symbol b output while in state k, that is,

ek(b) = P(xi = b | πi = k)

[Figure: state k, with outgoing transition probabilities ak1, ak2, ..., ak,K (including the self-transition ak,k) to states 1, 2, ..., K, and emission probabilities
ek(b1) = P(xi = b1 | πi = k)
ek(b2) = P(xi = b2 | πi = k)
...
ek(bm) = P(xi = bm | πi = k)]

Page 12: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Some Notation

•π = (π1,π2,...,πt), where πi = state occupied after i steps

•x1,x2,...,xt,π1,π2,...,πt, where πi = state occupied after i steps and xi = symbol emitted in state πi

Page 13: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: HMM trellis with columns of hidden states 1, 2, ..., K at each of the n positions, emitting x1, x2, x3, ..., xn]

x1,x2,...,xn,π1,π2,...,πn

Page 14: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

•‘Forgetfulness’ property: if the current state is πt, the next state πt+1 depends only on πt and nothing else

P(πt+1 = k | ‘whatever happened so far’)

= P(πt+1 = k | x1,x2,…,xt,π1,π2,…,πt)

= P(πt+1 = k | πt)

The ‘forgetfulness’ property is part of the definition of an HMM!

Page 15: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

What is the probability of x1,x2,...,xn,π1,π2,...,πn?

Example: Take n = 2.

P(x1,x2,π1,π2) = P(x1,π1,x2,π2)

= P(x2,π2 | x1,π1)·P(x1,π1) (conditional probability)

= P(x2 | π2)·P(π2 | x1,π1)·P(x1,π1) (conditional probability)

= P(x2 | π2)·P(π2 | π1)·P(x1,π1) (‘forgetfulness’)

= P(x2 | π2)·P(π2 | π1)·P(x1 | π1)·P(π1) (conditional probability)

= eπ2(x2)·eπ1(x1)·aπ1π2·a0π1

Page 16: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

In general, for the sequence x1,x2,...,xn,π1,π2,...,πn, we have that

P(x1,...,xn,π1,...,πn) = Πk=1..n eπk(xk)·aπk-1πk   (with π0 = 0 the start state, so aπ0π1 = a0π1)

[Figure: an example path 2 → 1 → K → … → 2 through the states, with transition probabilities a21, a1K, aK·, a·2 and emission probabilities e2(x1), e1(x2), eK(x3), e2(xn)]

Page 17: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Do you want to play?

The Dishonest Casino (cont.)

x = 4 2 5 1 2 6 5 1 2 1

π = F F F F F F F F F F

Page 18: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Suppose the starting probabilities are as follows: a0F = ½, a0L = ½.

P(x,π) = a0F·eF(1)·aFF·eF(2)·aFF·eF(1)·aFF·eF(5)·aFF·eF(6)·aFF·eF(2)·aFF·eF(1)·aFF·eF(5)·aFF·eF(2)·aFF·eF(4)

= ½·(1/6)^10·(0.95)^9 = 0.00000000521158647211 ≈ 5.21 × 10^-9

Now suppose that

π = L L L L L L L L L L

P(x,π) = a0L·eL(1)·aLL·eL(2)·aLL·eL(1)·aLL·eL(5)·aLL·eL(6)·aLL·eL(2)·aLL·eL(1)·aLL·eL(5)·aLL·eL(2)·aLL·eL(4)

= ½·(1/2)^1·(1/10)^9·(0.95)^9 = 0.00000000015756235243 ≈ 0.16 × 10^-9
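A quick sanity check (a sketch, not from the slides): the joint probability P(x,π) = a0π1·eπ1(x1)·Πk>1 aπk-1πk·eπk(xk) can be evaluated directly in a few lines of Python, and on the roll sequence above it reproduces the two values just computed. The dictionary names are the same illustrative ones used earlier.

```python
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}

def joint_prob(x, pi):
    """P(x, pi) for the dishonest-casino HMM."""
    p = START[pi[0]] * EMIT[pi[0]][x[0]]
    for k in range(1, len(x)):
        p *= TRANS[pi[k - 1]][pi[k]] * EMIT[pi[k]][x[k]]
    return p

x = [4, 2, 5, 1, 2, 6, 5, 1, 2, 1]
print(joint_prob(x, ["F"] * 10))   # ~5.21e-09
print(joint_prob(x, ["L"] * 10))   # ~1.58e-10, i.e. ~0.16e-09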

Page 19: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

x = 6 3 6 6 2 6 5 6 6 1

π = F F F F F F F F F F

P(x,π) = a0F·eF(1)·aFF·eF(6)·aFF·eF(6)·aFF·eF(5)·aFF·eF(6)·aFF·eF(2)·aFF·eF(6)·aFF·eF(6)·aFF·eF(3)·aFF·eF(6)

= ½·(1/6)^10·(0.95)^9 = 0.00000000521158647211 ≈ 5.21 × 10^-9

Now suppose that

π = L L L L L L L L L L

P(x,π) = a0L·eL(1)·aLL·eL(6)·aLL·eL(6)·aLL·eL(5)·aLL·eL(6)·aLL·eL(2)·aLL·eL(6)·aLL·eL(6)·aLL·eL(3)·aLL·eL(6)

= ½·(1/2)^6·(1/10)^4·(0.95)^9 = 0.00000049238235134735 ≈ 492 × 10^-9

≈100 times more likely!

Page 20: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Three Fundamental Problems

• Evaluation Problem (solved by Forward-Backward algorithm)

GIVEN: an HMM M and a sequence x
FIND: P(x | M)

• Decoding Problem (solved by Viterbi algorithm)

GIVEN: an HMM M and a sequence x
FIND: sequence of states π that maximizes P(x, π | M)

• Learning Problem (solved by Baum-Welch algorithm)

GIVEN: an HMM M with unspecified transition/emission probabilities and a sequence x
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

Page 21: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Let’s not get confused by notation... (lots of different ones!)

P(x | M) : probability that x is generated by the model M (the model M consists of the transition and emission probabilities and the architecture of the HMM, that is, the underlying directed graph)

P(x | θ) : probability that x is generated by the model M where θ is the vector of parameter values, that is, the transition and emission probabilities (note that P(x | M) is equivalent to P(x | θ))

P(x) : same as P(x | M) and P(x | θ)

P(x, π | M), P(x, π | θ) and P(x, π) : probability that x is generated by the model and π is the sequence of states that produced x

Page 22: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Decoding Problem for HMMs

GIVEN: Sequence x = x1x2… xn generated by the model M

FIND: Path π = π1π2…πn that maximizes P(x,π)

[Figure: HMM trellis with columns of hidden states 1, 2, ..., K at each position, emitting x1, x2, x3, ..., xn along the hidden path π = π1, π2, …, πn]

Page 23: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Formally, the Decoding Problem for HMMs is to find the following given a sequence x = x1x2…xn generated by the model M:

π* = argmaxπ {P(x,π | M)}

P* = maxπ {P(x,π | M)} = P(x,π* | M)

where P(x1,...,xn,π1,...,πn) = Πk=1..n eπk(xk)·aπk-1πk

Page 24: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Let Vk(i) denote the probability of the maximum probability path from stage 1 to stage i ending in state k generating xi in state k. Can we write an equation (hopefully recursive) for Vk(i)?

Vk(i) = maxπ1,...,πi-1 {P(x1,...,xi-1,π1,...,πi-1,xi,πi = k | M)}

= ek(xi)·maxj {ajk·Vj(i-1)}

… (proof using properties of conditional probabilities...)

Recursive equation... so...dynamic programming!

Page 25: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: stage i, state k. The states 1, 2, ..., K at stage (i – 1), with values V1(i-1), V2(i-1), ..., VK(i-1), feed into state k via the transition probabilities a1k, a2k, ..., aKk; state k emits xi with probability ek(xi) and has outgoing transitions ak1, ak2, ..., akK]

Page 26: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

How do we start the algorithm?

An initialization step with a ‘dummy’ 0th stage is needed to start the algorithm: set V0(0) = 1 and V1(0) = V2(0) = … = VK(0) = 0; the dummy state feeds Stage 1 through the starting probabilities a01, a02, ..., a0K.

[Figure: trellis with the dummy 0th stage followed by Stage 1, Stage 2, Stage 3, …, Stage n, each with states 1, 2, ..., K, emitting x1, x2, x3, ..., xn]

Page 27: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Viterbi Algorithm for Decoding an HMM

1. Initialization Step. //initialize matrix
V0(0) = 1; Vj(0) = 0 for all j > 0

2. Main Iteration. //fill in table
for each i = 1 to n
  for each k = 1 to K
    Vk(i) = ek(xi)·maxj {ajk·Vj(i-1)}
    Ptrk(i) = argmaxj {ajk·Vj(i-1)}

3. Termination. //recover optimal probability and path
P* = maxπ {P(x,π | M)} = maxj {Vj(n)}
π*n = argmaxj {Vj(n)}
for each i = n – 1 downto 1: π*i = Ptrπ*i+1(i+1)

Time: O(K²n)  Space: O(Kn)
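The pseudocode translates almost line for line into Python. The following sketch (illustrative names, dictionaries instead of matrices) is one way to write it; applied to x = 2 6 6 with the dishonest-casino parameters it reproduces the worked example a few slides ahead: P* = 0.01128125 and path LLL.

```python
def viterbi(x, states, start, trans, emit):
    # stage 1 of the table, plus empty back-pointer storage
    V = [{k: start[k] * emit[k][x[0]] for k in states}]
    ptr = [{}]
    for i in range(1, len(x)):
        V.append({}); ptr.append({})
        for k in states:
            best_j = max(states, key=lambda j: trans[j][k] * V[i - 1][j])
            V[i][k] = emit[k][x[i]] * trans[best_j][k] * V[i - 1][best_j]
            ptr[i][k] = best_j
    # termination: best final state, then follow the back-pointers
    last = max(states, key=lambda j: V[-1][j])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return V[-1][last], path[::-1]

START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}
print(viterbi([2, 6, 6], "FL", START, TRANS, EMIT))   # (0.01128125, ['L', 'L', 'L'])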

Page 28: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: the K × n Viterbi table, columns labeled x1, x2, x3, ..., xn and rows labeled by the states 1, 2, ..., K]

V1(1) = e1(x1)·a01,  V2(1) = e2(x1)·a02,  ...,  VK(1) = eK(x1)·a0K

Vk(2) = ek(x2)·maxj {ajk·Vj(1)}

Page 29: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Computational Example of the Viterbi Algorithm

Recall the dishonest casino HMM: states FAIR and LOADED, with P(FAIR → LOADED) = P(LOADED → FAIR) = 0.05 and P(FAIR → FAIR) = P(LOADED → LOADED) = 0.95; emission probabilities P(1 | F) = … = P(6 | F) = 1/6 and P(1 | L) = … = P(5 | L) = 1/10, P(6 | L) = 1/2.

x = 2 6 6

Page 30: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

x = 2 6 6

F:  VF(1) = 1/2·1/6 = 1/12
    VF(2) = 1/6·max{19/20·1/12, 1/20·1/20} = 19/1440 ≈ 0.0131944
    VF(3) = 1/6·max{19/20·19/1440, 1/20·19/800} = 361/172800 ≈ 0.0020891

L:  VL(1) = 1/2·1/10 = 1/20
    VL(2) = 1/2·max{1/20·1/12, 19/20·1/20} = 19/800 = 0.02375
    VL(3) = 1/2·max{1/20·19/1440, 19/20·19/800} = 361/32000 = 0.01128125

P* = 0.01128125,  π* = LLL

Check: P(266, LLL) = (1/2)^3·(1/10)·(95/100)^2 = 361/32000 = 0.01128125

Page 31: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

How well does Viterbi perform?

300 rolls by the casino

Viterbi is correct 91% of the time!

Page 32: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Problem of Underflow in the Viterbi Algorithm

Vk(i) = maxπ1,...,πi-1 {P(x1,...,xi-1,π1,...,πi-1,xi,πi = k | M)}

= ek(xi)·maxj {ajk·Vj(i-1)}

The numbers become very small, since probabilities are being multiplied together!

Vk(i) = log(ek(xi)) + maxj {log(ajk) + Vj(i-1)}

Compute with the logarithms of the probabilities to reduce the occurrence of underflow!
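In code, the log-space fix is essentially a one-line change to the inner update. A minimal sketch (illustrative names, assuming all transition and emission probabilities are strictly positive):

```python
import math

def log_viterbi_update(V_prev, k, x_i, states, trans, emit):
    """One cell of the log-space Viterbi table: V_k(i), with V_prev holding log-values."""
    best = max(math.log(trans[j][k]) + V_prev[j] for j in states)
    return math.log(emit[k][x_i]) + best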

Page 33: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Three Fundamental Problems

• Evaluation Problem (solved by Forward-Backward algorithm)

GIVEN: an HMM M and a sequence x
FIND: P(x | M)

• Decoding Problem (solved by Viterbi algorithm): Done!

GIVEN: an HMM M and a sequence x
FIND: sequence of states π that maximizes P(x, π | M)

• Learning Problem (solved by Baum-Welch algorithm)

GIVEN: an HMM M with unspecified transition/emission probabilities and a sequence x
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

Page 34: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Evaluation Problem for HMMs

GIVEN: Sequence x = x1x2… xn generated by the model M

FIND: P(x), the probability of x given the model M

[Figure: HMM trellis with columns of hidden states 1, 2, ..., K at each position, emitting x1, x2, x3, ..., xn along the hidden path π = π1, π2, …, πn]

Page 35: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Formally, the Evaluation Problem for HMMs is to find the following given a sequence x = x1x2…xn generated by the model M:

P(x) = Σπ P(x,π | M) = Σπ P(x | π)·P(π)

Exponential number of paths π!

Since there are an exponential number of paths, specifically K^n, the probability P(x) cannot be computed directly. So... dynamic programming again!

Page 36: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Let fk(i) denote the probability of the subsequence x1,x2,...,xi of x such that πi = k. The quantity fk(i) is called the forward probability. Can we write an equation (hopefully recursive) for fk(i)?

fk(i) = Σπ1,...,πi-1 P(x1,...,xi-1,π1,...,πi-1,xi,πi = k | M)

= ek(xi)·Σj {ajk·fj(i-1)}

… (proof using properties of conditional probabilities...)

Recursive equation suitable for dynamic programming

Page 37: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: stage i, state k. The states 1, 2, ..., K at stage (i – 1), with forward values f1(i-1), f2(i-1), ..., fK(i-1), feed into state k via the transition probabilities a1k, a2k, ..., aKk; state k emits xi with probability ek(xi) and has outgoing transitions ak1, ak2, ..., akK]

Page 38: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

How do we start the algorithm?

An initialization step with a ‘dummy’ 0th stage is needed to start the algorithm: set f0(0) = 1 and f1(0) = f2(0) = … = fK(0) = 0; the dummy state feeds Stage 1 through the starting probabilities a01, a02, ..., a0K.

[Figure: trellis with the dummy 0th stage followed by Stage 1, Stage 2, Stage 3, …, Stage n, each with states 1, 2, ..., K, emitting x1, x2, x3, ..., xn]

Page 39: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward Algorithm for Evaluation

1. Initialization Step. //initialize matrix
f0(0) = 1; fj(0) = 0 for all j > 0

2. Main Iteration. //fill in table
for each i = 1 to n
  for each k = 1 to K
    fk(i) = ek(xi)·Σj ajk·fj(i-1)

3. Termination. //recover probability of x
P(x) = Σj fj(n)

Time: O(K²n)  Space: O(Kn)
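As with Viterbi, the forward pseudocode maps directly onto a few lines of Python. The sketch below (illustrative names) gives P(266) = 809/54000 ≈ 0.01498 for the dishonest-casino model, matching the worked example on the next slides.

```python
def forward(x, states, start, trans, emit):
    f = [{k: start[k] * emit[k][x[0]] for k in states}]          # stage 1
    for i in range(1, len(x)):
        f.append({k: emit[k][x[i]] * sum(trans[j][k] * f[i - 1][j] for j in states)
                  for k in states})
    return sum(f[-1][j] for j in states), f                      # P(x) and the full table

START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}
print(forward([2, 6, 6], "FL", START, TRANS, EMIT)[0])   # ~0.0149815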

Page 40: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: the K × n forward table, columns labeled x1, x2, x3, ..., xn and rows labeled by the states 1, 2, ..., K]

f1(1) = e1(x1)·a01,  f2(1) = e2(x1)·a02,  ...,  fK(1) = eK(x1)·a0K

fk(2) = ek(x2)·Σj {ajk·fj(1)}

Page 41: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Viterbi vs. Forward

Viterbi:
1. Initialization Step. //initialize matrix
V0(0) = 1; Vj(0) = 0 for all j > 0
2. Main Iteration. //fill in table
for each i = 1 to n
  for each k = 1 to K
    Vk(i) = ek(xi)·maxj {ajk·Vj(i-1)}
    Ptrk(i) = argmaxj {ajk·Vj(i-1)}
3. Termination. //recover optimal probability and path
P* = maxπ {P(x,π | M)} = maxj {Vj(n)}
π*n = argmaxj {Vj(n)}
for each i = n – 1 downto 1: π*i = Ptrπ*i+1(i+1)

Forward:
1. Initialization Step. //initialize matrix
f0(0) = 1; fj(0) = 0 for all j > 0
2. Main Iteration. //fill in table
for each i = 1 to n
  for each k = 1 to K
    fk(i) = ek(xi)·Σj ajk·fj(i-1)
3. Termination. //recover probability
P(x) = Σj fj(n)

Page 42: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

x = 2 6 6

F:  fF(1) = 1/2·1/6 = 1/12
    fF(2) = 1/6·[19/20·1/12 + 1/20·1/20] = 49/3600 ≈ 0.0136111
    fF(3) = 1/6·[19/20·49/3600 + 1/20·31/1200] = 8/3375 ≈ 0.0023703

L:  fL(1) = 1/2·1/10 = 1/20
    fL(2) = 1/2·[1/20·1/12 + 19/20·1/20] = 31/1200 ≈ 0.0258333
    fL(3) = 1/2·[1/20·49/3600 + 19/20·31/1200] = 227/18000 ≈ 0.0126111

P(x) = P(266) = 8/3375 + 227/18000 = 809/54000 = 0.01498148148

Page 43: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Checking, we see that for x = 266,

P(x) = Σπ P(x,π | M) =
(1/2·1/10 · (95/100)·(1/2) · (95/100)·(1/2)) + (1/2·1/10 · (95/100)·(1/2) · (1/20)·(1/6)) +
(1/2·1/10 · (1/20)·(1/6) · (1/20)·(1/2)) + (1/2·1/10 · (1/20)·(1/6) · (95/100)·(1/6)) +
(1/2·1/6 · (1/20)·(1/2) · (95/100)·(1/2)) + (1/2·1/6 · (1/20)·(1/2) · (1/20)·(1/6)) +
(1/2·1/6 · (95/100)·(1/6) · (1/20)·(1/2)) + (1/2·1/6 · (95/100)·(1/6) · (95/100)·(1/6))

= 0.01498148148

Page 44: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Backward Algorithm: Motivation

Suppose we wish to compute the probability that the ith state is k given the observed sequence of outputs x. (Notice that we would then know the density of the random variable πi.) That is, we must compute

P(πi = k | x) = P(πi = k, x) / P(x)

We start by computing

P(πi = k, x) = P(x1,...,xi,πi = k,xi+1,...,xn)

= P(x1,...,xi,πi = k) · P(xi+1,...,xn | x1,...,xi,πi = k)

= fk(i) · bk(i)   (the second factor, bk(i), is new!)

Page 45: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

The quantity bk(i) is called the backward probability and is defined by

bk(i) = P(xi+1,...,xn | πi = k)

So then, we have the following equation:

P(πi = k | x) = P(πi = k, x) / P(x) = fk(i)·bk(i) / P(x)

Page 46: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Can we write an equation (hopefully recursive) for bk(i)?

bk(i) = P(xi+1,...,xn | πi = k)

= Σj ej(xi+1)·akj·bj(i+1)

… (proof using properties of conditional probabilities...)

Recursive equation suitable for dynamic programming

Page 47: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Backward Algorithm for Evaluation

1. Initialization Step. //initialize matrix
bj(n) = 1 for all j > 0

2. Main Iteration. //fill in table
for each i = n - 1 downto 1
  for each k = 1 to K
    bk(i) = Σj ej(xi+1)·akj·bj(i+1)

3. Termination. //recover probability of x
P(x) = Σj ej(x1)·bj(1)·a0j

Time: O(K²n)  Space: O(Kn)
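A matching sketch of the backward algorithm (illustrative names); on x = 2 6 6 it reproduces the table on Page 49 and the same P(x) ≈ 0.01498 as the forward algorithm.

```python
def backward(x, states, start, trans, emit):
    n = len(x)
    b = [dict() for _ in range(n)]
    b[n - 1] = {k: 1.0 for k in states}                     # initialization: b_j(n) = 1
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(emit[j][x[i + 1]] * trans[k][j] * b[i + 1][j] for j in states)
                for k in states}
    px = sum(emit[j][x[0]] * b[0][j] * start[j] for j in states)   # termination
    return px, b

START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}
print(backward([2, 6, 6], "FL", START, TRANS, EMIT)[0])   # ~0.0149815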

Page 48: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: the K × n backward table, columns labeled x1, x2, x3, ..., xn and rows labeled by the states 1, 2, ..., K]

b1(n) = 1,  b2(n) = 1,  ...,  bK(n) = 1

bk(n - 1) = Σj ej(xn)·akj·bj(n)

Page 49: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

x = 2 6 6

F:  bF(3) = 1
    bF(2) = 1/6·19/20·1 + 1/2·1/20·1 = 11/60 ≈ 0.183333
    bF(1) = 1/6·19/20·11/60 + 1/2·1/20·29/60 = 37/900 ≈ 0.041111

L:  bL(3) = 1
    bL(2) = 1/6·1/20·1 + 1/2·19/20·1 = 29/60 ≈ 0.483333
    bL(1) = 1/6·1/20·11/60 + 1/2·19/20·29/60 = 52/225 ≈ 0.231111

P(x) = P(266) = 1/6·37/900·1/2 + 1/10·52/225·1/2 = 809/54000 = 0.01498148148

Page 50: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Posterior Decoding

P(πi = k | x) = P(πi = k, x) / P(x) = fk(i)·bk(i) / P(x)

Now we can ask what is the most probable state at stage i. Let π̂i denote this state. Clearly, we have

π̂i = argmaxk {P(πi = k | x)}

Therefore, (π̂1, π̂2,..., π̂n) is the sequence of the most probable states. Notice that the above sequence is not (necessarily) the most probable path that the HMM went through to produce x and may not even be a valid path!

There are two types of decoding for an HMM: Viterbi decoding and posterior decoding.
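Putting the two tables together gives posterior decoding in a few lines. The sketch below (illustrative names) recomputes the forward and backward tables for the dishonest-casino model and takes the argmax of fk(i)·bk(i)/P(x) at each position; on x = 2 6 6 it returns (L, L, L), with L-posteriors ≈ 0.77, 0.83, 0.84 as on the following slides.

```python
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
EMIT = {"F": {k: 1/6 for k in range(1, 7)},
        "L": {**{k: 1/10 for k in range(1, 6)}, 6: 1/2}}
STATES = "FL"

def posterior_decode(x):
    n = len(x)
    # forward table
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for i in range(1, n):
        f.append({k: EMIT[k][x[i]] * sum(TRANS[j][k] * f[i - 1][j] for j in STATES)
                  for k in STATES})
    # backward table
    b = [dict() for _ in range(n)]
    b[n - 1] = {k: 1.0 for k in STATES}
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(EMIT[j][x[i + 1]] * TRANS[k][j] * b[i + 1][j] for j in STATES)
                for k in STATES}
    px = sum(f[n - 1][j] for j in STATES)
    post = [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(n)]
    return [max(STATES, key=lambda k: post[i][k]) for i in range(n)], post

print(posterior_decode([2, 6, 6])[0])   # ['L', 'L', 'L']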

Page 51: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

π̂1 = argmax{fF(1)·bF(1)/P(x), fL(1)·bL(1)/P(x)}
= argmax{1/12·37/900/(809/54000), 1/20·52/225/(809/54000)}
= argmax{185/809, 624/809}
= argmax{0.2286773795, 0.7713226205}
= L

π̂2 = argmax{fF(2)·bF(2)/P(x), fL(2)·bL(2)/P(x)}
= argmax{49/3600·11/60/(809/54000), 31/1200·29/60/(809/54000)}
= argmax{539/3236, 2697/3236}
= argmax{0.1665636588, 0.8334363412}
= L

Page 52: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

π̂3 = argmax{fF(3)·bF(3)/P(x), fL(3)·bL(3)/P(x)}
= argmax{8/3375·1/(809/54000), 227/18000·1/(809/54000)}
= argmax{128/809, 681/809}
= argmax{0.1582200247, 0.8417799753}
= L

The sequence of most probable states given x = 266 is

(π̂1, π̂2, π̂3) = (L, L, L)

Page 53: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

fπi(k | x) = P(πi = k | x)

P(πi = k | x) is the (conditional) density function of the random variable πi.

Page 54: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Viterbi vs. Forward vs. Backward

Viterbi:
1. Initialization Step.
V0(0) = 1; Vj(0) = 0 for all j > 0
2. Main Iteration.
for each i = 1 to n
  for each k = 1 to K
    Vk(i) = ek(xi)·maxj {ajk·Vj(i-1)}
    Ptrk(i) = argmaxj {ajk·Vj(i-1)}
3. Termination.
P* = maxπ {P(x,π | M)} = maxj {Vj(n)}
π*n = argmaxj {Vj(n)}
for each i = n – 1 downto 1: π*i = Ptrπ*i+1(i+1)

Forward:
1. Initialization Step.
f0(0) = 1; fj(0) = 0 for all j > 0
2. Main Iteration.
for each i = 1 to n
  for each k = 1 to K
    fk(i) = ek(xi)·Σj ajk·fj(i-1)
3. Termination.
P(x) = Σj fj(n)

Backward:
1. Initialization Step.
bj(n) = 1 for all j > 0
2. Main Iteration.
for each i = n - 1 downto 1
  for each k = 1 to K
    bk(i) = Σj ej(xi+1)·akj·bj(i+1)
3. Termination.
P(x) = Σj ej(x1)·bj(1)·a0j

Page 55: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Three Fundamental Problems

• Evaluation Problem (solved by Forward-Backward algorithm): Done!

GIVEN: an HMM M and a sequence x
FIND: P(x | M)

• Decoding Problem (solved by Viterbi algorithm): Done!

GIVEN: an HMM M and a sequence x
FIND: sequence of states π that maximizes P(x, π | M)

• Learning Problem (solved by Baum-Welch algorithm)

GIVEN: an HMM M with unspecified transition/emission probabilities and a sequence x
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

Page 56: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Two Learning Scenarios

• Learning when states are known

GIVEN: an HMM M with unspecified transition/emission probabilities, a sequence x, and a sequence π1,...,πn
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

For example, the Dishonest Casino dealer allows an observer to view him changing dice while he produces a large number of rolls.

• Learning when states are unknown

GIVEN: an HMM M with unspecified transition/emission probabilities and a sequence x
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

The Dishonest Casino dealer does not allow an observer to view him changing dice while he produces a large number of rolls.

Page 57: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Learning When States are Known

GIVEN: an HMM M with unspecified transition/emission probabilities, a sequence x, and a sequence π1,...,πn
FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

Ajk = # times there is a j → k transition in π1,...,πn

Ek(b) = # times state k in π1,...,πn emits b

The following can be shown to be the maximum likelihood estimators for the parameters in θ (that is, those parameter values that maximize P(x | θ)):

âjk = Ajk / Σi Aji

êk(b) = Ek(b) / Σc Ek(c)
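When the state path is known, these estimators are just normalized counts. A small sketch (illustrative names; the short labeled sequence in the demo is hypothetical, not the 300-roll data on the next slide):

```python
from collections import defaultdict

def estimate_params(x, pi):
    A = defaultdict(lambda: defaultdict(int))   # A[j][k] = # of j -> k transitions
    E = defaultdict(lambda: defaultdict(int))   # E[k][b] = # of times state k emits b
    for i in range(len(x)):
        E[pi[i]][x[i]] += 1
        if i + 1 < len(x):
            A[pi[i]][pi[i + 1]] += 1
    # normalize counts into probabilities
    a = {j: {k: A[j][k] / sum(A[j].values()) for k in A[j]} for j in A}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in E[k]} for k in E}
    return a, e

# hypothetical short labeled sequence
a_hat, e_hat = estimate_params([2, 6, 6, 1, 5], list("LLLFF"))
print(a_hat)
print(e_hat)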

Page 58: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

300 rolls by the casino

âFF = AFF / (AFF + AFL) = 262/(262 + 6) = 0.9776

âFL = AFL / (AFF + AFL) = 6/(262 + 6) = 0.0224

êL(6) = EL(6) / Σc EL(c) = 54/95 = 0.5684

êL(1) = EL(1) / Σc EL(c) = 8/95 = 0.0842

Page 59: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Problem of ‘Overfitting’

x = 2 1 5 6 1 2 3 6 2 3

π = F F F F F F F F F F

âFF = 1, âFL = 0, âLL = undefined, âLF = undefined

êF(1) = êF(3) = 0.2, êF(2) = 0.3, êF(4) = 0, êF(5) = 0.1, êF(6) = 0.2

P(x | θ̂) is maximized, but θ̂ is unreasonable! More data is needed to derive sensible parameter values or, as an alternative, pseudocounts can be used.

Page 60: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Learning When States are Unknown

GIVEN: an HMM M with unspecified transition/emission probabilities and a sequence x (the values of Ajk and Ek(b) cannot be computed since π1,...,πn are unknown)

FIND: parameter vector θ = (ei(·), aij) that maximizes P(x | θ)

•STEP 1: Estimate our ‘best guess’ on what Ajk and Ek(b) should be

•STEP 2: Update the parameters of the model based on our guess

•Repeat STEPS 1 and 2 until convergence of P(x | θ)

Page 61: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

How do we update the current parameters of the model?...

Assume that θcurr represents the current estimates of the HMM parameters. We will derive the new estimate of Ajk (as an example).

First, at each state j, find the probability that j → k transition is used. Assume that θcurr appears in the appropriate places in the formulas below.

P(πi = j, πi+1 = k | x) = P(πi = j, πi+1 = k, x1,...,xn) / P(x)

= P(x1,...,xi, πi = j, πi+1 = k, xi+1,...,xn) / P(x)

= P(πi+1 = k, xi+1,...,xn | πi = j)·P(x1,...,xi, πi = j) / P(x)

= P(πi+1 = k, xi+1,xi+2,...,xn | πi = j)·fj(i) / P(x)

= P(xi+2,...,xn | πi+1 = k)·P(xi+1 | πi+1 = k)·P(πi+1 = k | πi = j)·fj(i) / P(x)

= bk(i+1)·ek(xi+1)·ajk·fj(i) / P(x)

Page 62: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: the j → k transition between stages i and i+1. fj(i) accounts for x1,...,xi ending in state j; the transition contributes ajk and the emission of xi+1 in state k contributes ek(xi+1); bk(i+1) accounts for xi+2,...,xn; the whole product is divided by P(x)]

Page 63: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

So, we have derived a formula for the probability of a j → k transition from stage i to stage i+1 given the output x and the current values of the parameters:

P(πi = j, πi+1 = k | x, θcurr) = bk(i+1)·ek(xi+1)·ajk·fj(i) / P(x | θcurr)

So, the new value of Ajk (expected number of j → k transitions) can be found as

Ajk = Σi P(πi = j, πi+1 = k | x, θcurr) = Σi bk(i+1)·ek(xi+1)·ajk·fj(i) / P(x | θcurr)

Page 64: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

In a similar way,

Ek(b) = Σi: xi = b fk(i)·bk(i) / P(x | θcurr)

To obtain new (updated) values for the parameters of the HMM, we normalize as before. Recall that

âjk = Ajk / Σi Aji

êk(b) = Ek(b) / Σc Ek(c)

Page 65: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Training Sequences for the Baum-Welch Algorithm

The Baum-Welch algorithm is normally applied to an entire group of sequences that are assumed to have been generated independently by the model. Typically, training sequences are collected over a period of time. Let x^1, x^2, ..., x^r be r training sequences of lengths n1, n2, ..., nr:

x^1 = x^1_1 x^1_2 ... x^1_n1
x^2 = x^2_1 x^2_2 ... x^2_n2
...
x^r = x^r_1 x^r_2 ... x^r_nr

Page 66: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Baum-Welch Algorithm

1. Initialization Step. //initialize parameters
Pick the initial guess θcurr for the model parameters (or arbitrary values)

2. Main Iteration. //refine model parameters by iteration
repeat
  for each training sequence
    perform the Forward Algorithm
    perform the Backward Algorithm
  calculate the Ajk and Ek(b) given θcurr and using all the training sequences x^1, x^2, ..., x^r
  calculate the new model parameters θnew: ajk and ek(b)
  calculate P(x^1, x^2, ..., x^r | θnew) //theory guarantees that this value will increase; note that
    //P(x^1, x^2, ..., x^r | θnew) = P(x^1 | θnew)·P(x^2 | θnew)· ... ·P(x^r | θnew) by independence
until P(x^1, x^2, ..., x^r | θnew) does not change much

3. Termination.
return θnew as the parameter values

Time: O(K²n) per iteration  Space: O(Kn)
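A simplified sketch of the whole loop follows. It applies the update formulas from the previous slides but, to stay short, keeps the starting probabilities fixed, has no explicit begin/end state, and runs a fixed number of iterations instead of testing convergence, so its numbers will differ from the worked example that follows (which uses a model with begin and end states). All names, the toy training sequences, and the initial guess are illustrative.

```python
def forward(x, states, start, a, e):
    f = [{k: start[k] * e[k][x[0]] for k in states}]
    for i in range(1, len(x)):
        f.append({k: e[k][x[i]] * sum(a[j][k] * f[i - 1][j] for j in states)
                  for k in states})
    return f

def backward(x, states, a, e):
    b = [{k: 1.0 for k in states}]
    for i in range(len(x) - 2, -1, -1):
        b.insert(0, {k: sum(e[j][x[i + 1]] * a[k][j] * b[0][j] for j in states)
                     for k in states})
    return b

def baum_welch(seqs, states, symbols, start, a, e, n_iter=10):
    for _ in range(n_iter):
        A = {j: {k: 0.0 for k in states} for j in states}   # expected transition counts
        E = {k: {s: 0.0 for s in symbols} for k in states}  # expected emission counts
        for x in seqs:
            f, b = forward(x, states, start, a, e), backward(x, states, a, e)
            px = sum(f[-1][j] for j in states)               # P(x | current parameters)
            for i in range(len(x)):                          # E_k(b) contributions
                for k in states:
                    E[k][x[i]] += f[i][k] * b[i][k] / px
            for i in range(len(x) - 1):                      # A_jk contributions
                for j in states:
                    for k in states:
                        A[j][k] += f[i][j] * a[j][k] * e[k][x[i + 1]] * b[i + 1][k] / px
        # normalize expected counts into new parameters (assumes every state is visited)
        a = {j: {k: A[j][k] / sum(A[j].values()) for k in states} for j in states}
        e = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
    return a, e

# Toy demo with made-up short roll sequences and a rough initial guess:
start = {"F": 0.5, "L": 0.5}
a0 = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
e0 = {"F": {s: 1/6 for s in range(1, 7)},
      "L": {**{s: 0.12 for s in range(1, 6)}, 6: 0.4}}
seqs = [[1, 2, 6, 6, 6, 3], [6, 6, 1, 5, 2, 6]]
a_new, e_new = baum_welch(seqs, "FL", range(1, 7), start, a0, e0)
print(a_new)
print(e_new)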

Page 67: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Baum-Welch Algorithm Example

[Figure: a four-state HMM with begin state B, states 1 and 2, and end state E]

Initial transition probabilities: aB1 = 1, aB2 = 0, a11 = 1/2, a12 = 1/2, a1E = 0, a21 = 0, a22 = 0, a2E = 1

Initial emission probabilities: e1(A) = 1/4, e1(B) = 3/4, e2(A) = 1/2, e2(B) = 1/2

Training sequences: x1 = ABA, x2 = ABB, x3 = AB

Page 68: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Iteration #1

The Forward and Backward probability tables must be computed for each of the training sequences!

Forward Probabilities for x1 = ABA:
state 1:  f1(1) = 1/4,  f1(2) = 3/32,  f1(3) = 3/256
state 2:  f2(1) = 0,    f2(2) = 1/16,  f2(3) = 3/128

P(x1) = (0)(3/256) + (1)(3/128) = 3/128

Page 69: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Backward Probabilities for x1 = ABA:
state 1:  b1(1) = 3/32,  b1(2) = 1/4,  b1(3) = 0
state 2:  b2(1) = 0,     b2(2) = 0,    b2(3) = 1

P(x1) = (1/4)(3/32)(1) + (1/2)(0)(0) = 3/128

Page 70: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward Probabilities for x2 = ABB:
state 1:  f1(1) = 1/4,  f1(2) = 3/32,  f1(3) = 9/256
state 2:  f2(1) = 0,    f2(2) = 1/16,  f2(3) = 3/128

P(x2) = (3/128)(1) + (9/256)(0) = 3/128

Page 71: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Backward Probabilities for x2 = ABB:
state 1:  b1(1) = 3/32,  b1(2) = 1/4,  b1(3) = 0
state 2:  b2(1) = 0,     b2(2) = 0,    b2(3) = 1

P(x2) = (1/4)(3/32)(1) + (1/2)(0)(0) = 3/128

Page 72: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward Probabilities for x3 = AB:
state 1:  f1(1) = 1/4,  f1(2) = 3/32
state 2:  f2(1) = 0,    f2(2) = 1/16

P(x3) = (1/16)(1) + (3/32)(0) = 1/16

Page 73: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Backward Probabilities for x3 = AB:
state 1:  b1(1) = 1/4,  b1(2) = 0
state 2:  b2(1) = 0,    b2(2) = 1

P(x3) = (1/4)(1/4)(1) + (1/2)(0)(0) = 1/16

Page 74: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

All the expected number of transition values must be computed.

A12 = [f1(1)·a12·e2(B)·b2(2) + f1(2)·a12·e2(A)·b2(3)] / P(x1)
    + [f1(1)·a12·e2(B)·b2(2) + f1(2)·a12·e2(B)·b2(3)] / P(x2)
    + [f1(1)·a12·e2(B)·b2(2)] / P(x3)

    = [(1/4)·(1/2)·(1/2)·(0) + (3/32)·(1/2)·(1/2)·(1)] / (3/128)
    + [(1/4)·(1/2)·(1/2)·(0) + (3/32)·(1/2)·(1/2)·(1)] / (3/128)
    + [(1/4)·(1/2)·(1/2)·(1)] / (1/16)

    = 3

Likewise, A11 = 2, A21 = 0 = A22.

Page 75: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Computations for the states B and E must be adjusted accordingly.

AB1 = [aB1·e1(A)·b1(1)] / P(x1) + [aB1·e1(A)·b1(1)] / P(x2) + [aB1·e1(A)·b1(1)] / P(x3)

    = [(1)·(1/4)·(3/32)] / (3/128) + [(1)·(1/4)·(3/32)] / (3/128) + [(1)·(1/4)·(1/4)] / (1/16)

    = 3

A2E = [f2(3)·a2E] / P(x1) + [f2(3)·a2E] / P(x2) + [f2(2)·a2E] / P(x3)

    = [(3/128)·(1)] / (3/128) + [(3/128)·(1)] / (3/128) + [(1/16)·(1)] / (1/16)

    = 3

Likewise, A1E = 0 = AB2.

Page 76: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

All the expected number of emission values must be computed.

E1(A) = [f1(1)·b1(1) + f1(3)·b1(3)] / P(x1) + [f1(1)·b1(1)] / P(x2) + [f1(1)·b1(1)] / P(x3)

      = [(1/4)·(3/32) + (3/256)·(0)] / (3/128) + [(1/4)·(3/32)] / (3/128) + [(1/4)·(1/4)] / (1/16)

      = 3

Likewise, E1(B) = 2, E2(A) = 1, E2(B) = 2.

Page 77: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Finally, all the new parameter values must be computed.

â12 = A12 / (A11 + A12 + A1E) = 3 / (2 + 3 + 0) = 3/5

Similar computations yield the following new transition probabilities:

â11 = 2/5, â12 = 3/5, â1E = 0, â21 = 0, â22 = 0, â2E = 1, âB1 = 1, âB2 = 0

Page 78: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

ê1(A) = E1(A) / (E1(A) + E1(B)) = 3 / (3 + 2) = 3/5

Similar computations yield the following new emission probabilities:

ê1(A) = 3/5, ê1(B) = 2/5, ê2(A) = 1/3, ê2(B) = 2/3

Page 79: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

[Figure: the updated HMM with begin state B, states 1 and 2, and end state E]

New transition probabilities: aB1 = 1, aB2 = 0, a11 = 2/5, a12 = 3/5, a1E = 0, a21 = 0, a22 = 0, a2E = 1

New emission probabilities: e1(A) = 3/5, e1(B) = 2/5, e2(A) = 1/3, e2(B) = 2/3

Training sequences: x1 = ABA, x2 = ABB, x3 = AB

Page 80: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Iteration #2

The Forward and Backward probability tables must be computed for each of the training sequences! Only the Forward probabilities will be computed this time.

Forward Probabilities for x1 = ABA:
state 1:  f1(1) = 3/5,  f1(2) = 12/125,  f1(3) = 72/3125
state 2:  f2(1) = 0,    f2(2) = 6/25,    f2(3) = 12/625

P(x1) = (0)(72/3125) + (1)(12/625) = 12/625

Page 81: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward Probabilities for x2 = ABB:
state 1:  f1(1) = 3/5,  f1(2) = 12/125,  f1(3) = 48/3125
state 2:  f2(1) = 0,    f2(2) = 6/25,    f2(3) = 24/625

P(x2) = (1)(24/625) + (0)(48/3125) = 24/625

Page 82: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward Probabilities for x3 = AB:
state 1:  f1(1) = 3/5,  f1(2) = 12/125
state 2:  f2(1) = 0,    f2(2) = 6/25

P(x3) = (1)(6/25) + (0)(12/125) = 6/25

Page 83: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Iteration #1: P(x1)·P(x2)·P(x3) = (3/128)·(3/128)·(1/16) = 9/262144 ≈ 0.0000343

Iteration #2: P(x1)·P(x2)·P(x3) = (12/625)·(24/625)·(6/25) = 1728/9765625 ≈ 0.0001769

Probability has increased!

Page 84: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

A Modeling Example: CpG Islands in DNA Sequence

A+ C+ G+ T+

A- C- G- T-

Page 85: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

What are CpG islands and why are they important?

•The frequency of the four nucleotides A, T, C, and G are fairly stable across the human genome: A ≈ 29.5%, C ≈ 20.4%, T ≈ 20.5%, and G ≈ 29.6%.

•Frequencies of dinucleotides (that is, nucleotide pairs) vary widely across the human genome.

•CG pairs are typically underrepresented, because CG pairs often mutate to TG, so the frequency of CG pairs is less than 1/16. In fact, CG is the least frequent dinucleotide: the C in a CG pair is easily methylated (a methyl group, CH3, “joins” the cytosine), and the methyl-C then tends to mutate to a T over the course of evolution by a process called deamination. Thus CG pairs tend to mutate to TG pairs.

•Methylation is suppressed around genes in a genome, and so CpG dinucleotides occur at greater frequencies in and around genes. These high-frequency stretches of DNA are called CpG (or simply CG) islands. (The ‘p’ stands for the phosphodiester bond between the C and G nucleotide to emphasize that the C and G are on the same strand of DNA and are not a base pair.)

Page 86: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

CpG Islands & Genes

[Figure: a gene with CpG islands at the 5' end (promoter), CpG islands in the gene body, and CpG islands at the 3' end]

Finding CpG islands is an important problem!

Page 87: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Architecture

A+ C+ G+ T+

A- C- G- T-

+ : in a CpG island

- : not in a CpG island

Page 88: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Transitions

The tables below were established from many known (experimentally verified) CpG islands and known non-CpG islands (called training sequences).

Transition probabilities within CpG islands (each row sums to 1); the emission probabilities are 1/0, e.g., eA+(A) = 1 and eA+(C) = eA+(G) = eA+(T) = 0:

+     A     C     G     T
A   .180  .274  .426  .120
C   .171  .368  .274  .188
G   .161  .339  .375  .125
T   .079  .355  .384  .182

Transition probabilities within non-CpG islands (each row sums to 1); the emission probabilities are 1/0, e.g., eA-(A) = 1 and eA-(C) = eA-(G) = eA-(T) = 0:

-     A     C     G     T
A   .300  .205  .285  .210
C   .322  .298  .078  .302
G   .248  .246  .298  .208
T   .177  .239  .292  .292

Note: The transitions out of each state add up to one. There is no room for transitions between the (+) and (-) states! What do we do?...

Page 89: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Transitions (cont.)

What about the transitions between the + and – states? Certainly there is a probability (say) p of staying in a CpG island and a probability (say) q of staying in a non-CpG island.

[Figure: the block of + states and the block of – states; stay within the + block with probability p and leave it with probability 1 – p; stay within the – block with probability q and leave it with probability 1 – q]

Page 90: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Transitions (cont.)

To estimate the remaining probabilities, use the following steps.

Step 1: Adjust all probabilities by a factor of p or q. For example,

aA+C+ ← aA+C+ · p, aA-C- ← aA-C- · q, etc.

Step 2: Calculate all the probabilities between the + states and the – states.

Step 2.1: Let fA-, fC-, fG-, and fT- be the frequency of A, C, G, and T among the non-CpG nucleotides in the training sequence.

Step 2.2: Let aA+A- ← fA- · (1 – p), aA+C- ← fC- · (1 – p), etc. Do the same for the – to + states.

Step 3: Estimate the probabilities p and q. But how?...

Page 91: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Geometric Distribution

A random variable X is said to be geometrically distributed if it has a density given by

fX(x) = p·(1 – p)x-1, x = 1,2,...

p is the probability of a success in a series of Bernoulli trials. The random variable X counts the number of trials up to and including the first success.

The expected value and variance of X are easy to remember!

E(X) = 1/p

Var(X) = (1 – p)/p2

Page 92: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Transitions (cont.)

Let L+ denote the length in nucleotides of a CpG island. L+ is a random variable, and one approach is to model L+ as a geometric random variable (controversial since CpG islands may not have an exponential-length distribution in the genome under study).

P(L+ = 1) = 1 – p (leaving is considered a success!) P(L+ = 2) = p(1 – p) P(L+ = 3) = p2(1 – p)

P(L+ = k) = pk-1(1 – p)

The expected value of L+ is E(L+) = 1/(1 – p). Similarly, E(L-) = 1/(1 – q) where L- is the length in nucleotides of a non-CpG island.

From the training data, compute the average length of a CpG island, and then set that number equal to E(L+) = 1/(1 – p), and solve for p. Do the same for non-CpG islands. For example, if the average length of a CpG island is 300, then p = 299/300.
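In code this estimate is a one-liner; a tiny sketch (the function name is illustrative):

```python
def stay_prob_from_mean_length(mean_len):
    """Solve E(L) = 1/(1 - p) for p, given the observed mean segment length."""
    return 1 - 1 / mean_len

print(stay_prob_from_mean_length(300))   # 0.99666... = 299/300, matching the example above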

Page 93: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Duration Modeling in Hidden Markov Models

The length distributions of introns and exons show considerable variation.

Page 94: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Negative Binomial Distribution

[Figure: n copies of state k, labeled k(1), k(2), ..., k(n-1), k(n), chained together; each copy has a self-loop with probability p and moves on with probability 1 – p]

The shortest sequence through the states that can be modeled has length n. Let D denote the duration of state k. (Clearly, D is at least n.) Note that D has a negative binomial distribution with parameters 1 – p (probability of a success) and n (number of successes needed):

P(D = L) = C(L-1, n-1)·(1-p)^n·p^(L-n)
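A small sketch of this duration distribution (illustrative names; the parameter values in the demo are made up):

```python
from math import comb

def duration_pmf(L, n, p):
    """P(D = L) = C(L-1, n-1) * (1-p)**n * p**(L-n), for L >= n."""
    return comb(L - 1, n - 1) * (1 - p) ** n * p ** (L - n)

# e.g. with n = 3 chained copies and p = 0.9, durations concentrate around n/(1-p) = 30
print(sum(duration_pmf(L, 3, 0.9) for L in range(3, 500)))   # ~1.0 (sanity check)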

Page 95: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.
Page 96: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Model of CpG Islands: Applications

A+ C+ G+ T+

A- C- G- T-

So what is it good for?

Page 97: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

For Example...

Viterbi Decoding: Given a long strand of DNA, we can decode it using the model!

ATCGTTAGCTACCGACC...

↓ (Viterbi decoding)

A-T-C-G-T-T-A-G+C+T+A+C+C+G+A+C+C+...

The stretch of + states marks a CpG island.

Page 98: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Posterior Decoding: Given a long strand of DNA, we can derive the probability distribution of a given position.

ATCGTTAGCTACCGACC...

posterior decoding

ith position

fπi(k) = P(πi = k | x)

Page 99: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

What if a new genome comes along...?

Porcupine

Page 100: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

We just sequenced the porcupine genome

We know CpG islands play the same role in this genome. That is, they signal the occurrence of a gene.

However, we have no known CpG islands for porcupines.

We suspect the frequency and characteristics of CpG islands are quite different in porcupines and humans.

What do we do...?

Page 101: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Alignment Penalties Revisited: Affine Gap Penalties

If, for example, n ≥ m, then the time needed to run the algorithm is O(n²m). A compromise between general convex gap penalties and linear gap penalties is the affine gap penalty. In the following discussion, we assume that a deletion will not be followed directly by an insertion. That is, Ix and Iy cannot jump between each other.

γ(n) = d + (n – 1)e

d is the gap opening penalty and e is the gap extension penalty. For affine gap penalties, there is an implementation of the N-W algorithm that runs in O(nm) time like the original algorithm.

[Figure: the gap penalty γ(n) as a function of the gap length n]

Page 102: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

1. xi is aligned to yj

x1...xi-1 xi

y1...yj-1 yj

2. xi is aligned to a gap

x1...xi-2 xi-1 xi

y1...yj - -

3. yj is aligned to a gap

x1...xi - - -

y1...yj-3 yj-2 yj-1 yj

Updating the Score (cont.)

Updating the score is complicated by the fact that gaps are not all assessed the same penalty. Opening a gap is penalized more (typically a lot more!) than extending a group of gaps. Keeping one value, F(i,j), does not suffice!

M(i,j) = optimal score aligning x1x2...xi to y1y2…yj given xi is aligned to yj

Ix(i,j) = optimal score aligning x1x2...xi to y1y2…yj given xi is aligned to a gap

Iy(i,j) = optimal score aligning x1x2...xi to y1y2…yj given yj is aligned to a gap

Page 103: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Updating the Score (cont.)

1. xi is aligned to yj

x1...xi-1 xi
y1...yj-1 yj

2. xi is aligned to a gap

x1...xi-2 xi-1 xi
y1...yj    -   -

3. yj is aligned to a gap

x1...xi    -    -    -
y1...yj-3 yj-2 yj-1 yj

M(i,j) = max of:
  M(i - 1,j - 1) + m, if xi = yj;   M(i - 1,j - 1) - s, if xi ≠ yj
  Ix(i - 1,j - 1) + m, if xi = yj;  Ix(i - 1,j - 1) - s, if xi ≠ yj
  Iy(i - 1,j - 1) + m, if xi = yj;  Iy(i - 1,j - 1) - s, if xi ≠ yj

Ix(i,j) = max of:
  M(i - 1,j) – d  (opening)
  Ix(i - 1,j) – e  (extending)

Iy(i,j) = max of:
  M(i,j - 1) – d  (opening)
  Iy(i,j - 1) – e  (extending)

Now we assume that Ix and Iy can not jump between each other.
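A score-only sketch of this three-matrix recurrence is below. The slides do not spell out the initialization, so the boundary handling here (minus infinity for impossible cells, affine gap costs γ(n) = d + (n – 1)e along the first row and column) and the example scoring parameters are assumptions.

```python
NEG = float("-inf")

def affine_nw_score(x, y, match=1, mismatch=1, d=5, e=1):
    m, n = len(x), len(y)
    M  = [[NEG] * (n + 1) for _ in range(m + 1)]   # x_i aligned to y_j
    Ix = [[NEG] * (n + 1) for _ in range(m + 1)]   # x_i aligned to a gap
    Iy = [[NEG] * (n + 1) for _ in range(m + 1)]   # y_j aligned to a gap
    M[0][0] = 0
    for i in range(1, m + 1):                      # leading gaps in y
        Ix[i][0] = -d - (i - 1) * e
    for j in range(1, n + 1):                      # leading gaps in x
        Iy[0][j] = -d - (j - 1) * e
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = match if x[i - 1] == y[j - 1] else -mismatch
            M[i][j]  = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] - d, Ix[i - 1][j] - e)   # open vs. extend
            Iy[i][j] = max(M[i][j - 1] - d, Iy[i][j - 1] - e)
    return max(M[m][n], Ix[m][n], Iy[m][n])

print(affine_nw_score("VLSPADK", "HLAESK"))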

Page 104: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

HMMs for Sequence Alignment

[Figure: finite-state automaton with three states: M with Δ = (+1,+1), Ix with Δ = (+1,0), and Iy with Δ = (0,+1); transitions into Ix and Iy are scored –d (opening) or –e (extending), and transitions into M are scored s(xi, yj)]

Δ(i,j) = change in indices when the state is entered

s(xi, yj) = m if xi = yj, and s(xi, yj) = –s if xi ≠ yj

Page 105: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Needleman-Wunsch Algorithm With Affine Gap Penalties (FSA Representation)

•The recursive equations of dynamic programming have an elegant representation as a finite-state automaton (FSA).

•A finite state machine (FSM) or finite state automaton (plural: automata), or simply a state machine, is a model of behavior composed of a finite number of states, transitions between those states, and actions.

•The new value of the state variable at indices (i,j) is the maximum of the scores corresponding to the transitions coming into the state.

•Each transition score is given by the values of the source state at the offsets specified by the Δ(i,j) of the target state, plus the specified score increment.

•This type of representation corresponds to a finite-state automaton (FSA) common in computer science.

Page 106: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

•An alignment corresponds to a path through the states, with symbols from the underlying pair of sequences being transferred to the alignment according to the Δ(i,j) values in the states.

Consider the alignment of

x = V L S P A D K
y = H L A E S K

given by

x = V L S P A D - K
y = H L - - A E S K

which corresponds to the state path M M Ix Ix M M Iy M.

Page 107: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Pair HMMs

• We would like to transform the FSA for the Needleman-Wunsch affine gap penalty algorithm into an HMM.

• Why?

• The HMM methods allow us to use the resulting probabilistic model to explore questions about the reliability of the alignment obtained by dynamic programming, and to explore alternative (suboptimal) alignments.

• By weighting all alternatives probabilistically, we will be able to score the similarity of two sequences independent of any specific alignment.

• We can also build more specialized probabilistic models out of simple pieces, to model more complex versions of sequence alignment.

• How? Two issues need to be resolved.

• Emission probabilities and transition probabilities must be established.

We will keep the parameters arbitrary. The model must be fitted to training data to estimate the parameter values.

Page 108: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Probabilistic Model

[Figure: pair HMM with match state M, which emits aligned pairs with probabilities pxiyj, and insert states Ix and Iy, which emit single symbols with probabilities pxi and pyj; transitions M → Ix and M → Iy have probability δ, Ix → Ix and Iy → Iy have probability ε, Ix → M and Iy → M have probability 1 – ε, and M → M has probability 1 – 2δ]

Something is missing... a begin and end state!

Page 109: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Pair HMM With Begin and End States

[Figure: the same pair HMM extended with a begin state B and an end state E. From B and from M: go to Ix or Iy with probability δ each, to M with probability 1 – 2δ – τ, and to E with probability τ. From Ix and Iy: stay with probability ε, go to M with probability 1 – ε – τ, and go to E with probability τ]

Page 110: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Probability of an Alignment

Consider again the alignment

x = V L S P A D - K
y = H L - - A E S K

of x = V L S P A D K and y = H L A E S K, with aligned columns (V,H), (L,L), (S,–), (P,–), (A,A), (D,E), (–,S), (K,K) and state path M M Ix Ix M M Iy M.

P = (1 – 2δ – τ)·pVH·(1 – 2δ – τ)·pLL·δ·pS·ε·pP·(1 – ε – τ)·pAA·(1 – 2δ – τ)·pDE·δ·pS·(1 – ε – τ)·pKK·τ

Page 111: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

What is the most probable alignment?... Use Viterbi!

•All the algorithms we have seen for HMMs apply, for example, the Viterbi algorithm, forward-backward, etc.

•There is an extra dimension in the search space because of the extra emitted sequence

•Instead of using Vk(i), we will use Vk(i,j), because an observation of xi does not necessarily mean an observation of yj

•Imagine we have two clocks, one for the sequence x and one for the sequence y that work differently in different time zones

•Vk(i,j) can only advance in certain ways:
1. In time zone M, both i and j advance
2. In time zone Ix, only i advances
3. In time zone Iy, only j advances

Page 112: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Viterbi Algorithm for Decoding a Pair HMM

1. Initialization Step. //initialize three matrices
VM(0,0) = 1, VM(i,0) = VM(0,j) = 0 for i, j > 0
VX(0,0) = 0, VX(i,0) = VX(0,j) = 0 for i, j > 0
VY(0,0) = 0, VY(i,0) = VY(0,j) = 0 for i, j > 0

2. Main Iteration. //fill in three tables
for each i = 1 to m
  for each j = 1 to n
    VM(i,j) = pxiyj·max {(1 – 2δ – τ)·VM(i-1,j-1), (1 – ε – τ)·VX(i-1,j-1), (1 – ε – τ)·VY(i-1,j-1)}
    VX(i,j) = pxi·max {δ·VM(i-1,j), ε·VX(i-1,j)}
    VY(i,j) = pyj·max {δ·VM(i,j-1), ε·VY(i,j-1)}
    //keep pointers so that the most probable alignment can be reconstructed

Time: O(mn)  Space: O(mn)
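A direct transcription of this recursion into Python is sketched below, using the example parameters from Slides 115-117 (δ = 0.2, ε = 0.1, τ = 0.1, the match emission probabilities, and insert emission probability 0.25). It fills the three matrices only, with no traceback; on x = TAG, y = TTACG it reproduces the V values on Slide 117 and the final probability τ·VM(3,5) = 10^-5.

```python
DELTA, EPS, TAU = 0.2, 0.1, 0.1          # transition parameters from Slide 115
P_INS = 0.25                              # insert-state emission probability
P_MATCH = {}                              # match-state emission probabilities p_ab
for a, b, p in [("T", "T", 0.5), ("C", "C", 0.5), ("A", "A", 0.5), ("G", "G", 0.5),
                ("C", "T", 0.05), ("A", "G", 0.05), ("A", "T", 0.3), ("G", "C", 0.3),
                ("G", "T", 0.15), ("A", "C", 0.15)]:
    P_MATCH[(a, b)] = P_MATCH[(b, a)] = p

def pair_viterbi(x, y):
    m, n = len(x), len(y)
    VM = [[0.0] * (n + 1) for _ in range(m + 1)]
    VX = [[0.0] * (n + 1) for _ in range(m + 1)]
    VY = [[0.0] * (n + 1) for _ in range(m + 1)]
    VM[0][0] = 1.0                        # initialization as on Slide 112
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            VM[i][j] = P_MATCH[(x[i - 1], y[j - 1])] * max(
                (1 - 2 * DELTA - TAU) * VM[i - 1][j - 1],
                (1 - EPS - TAU) * VX[i - 1][j - 1],
                (1 - EPS - TAU) * VY[i - 1][j - 1])
            VX[i][j] = P_INS * max(DELTA * VM[i - 1][j], EPS * VX[i - 1][j])
            VY[i][j] = P_INS * max(DELTA * VM[i][j - 1], EPS * VY[i][j - 1])
    return TAU * max(VM[m][n], VX[m][n], VY[m][n])

print(pair_viterbi("TAG", "TTACG"))       # ~1e-05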

Page 113: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

3. Termination. //recover optimal probability and path P* = τ·max {VM(m,n), VX (m,n), VY (m,n)}

//use pointers to reconstruct the most probable alignment

Remark

With the initialization conditions of the Viterbi algorithm for the pair HMM as suggested above (Durbin et al., 1998, p 84), the resulting alignment of two sequences will always start with a matched pair x1, y1 for any two sequences x and y. Hence the alignment generated by a pair HMM with such a restriction on the initialization step may not be the optimal one.

Question

How to change the initialization condition to allow for alignments starting with a gap aligned to a letter in x or y?

Page 114: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Optimal log-odds alignment

• In log-odds terms, we can compute in terms of an additive model with log-odds emission scores and log-odds transition scores.
• In practice, this is normally the most practical way to implement a pair HMM.
• It is possible to merge the emission scores with the transition scores to produce substitution scores s(a, b) and gap penalties d and e that correspond to the standard terms used in sequence alignment by dynamic programming.

Page 115: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Example: Pair HMM

The pair HMM shown on Slide 109 generates two aligned DNA sequences x and y. State M emits aligned pairs of nucleotides with emission probabilities pxiyj defined as follows: PTT = PCC = PAA = PGG = 0.5, PCT = PAG = 0.05, PAT = PGC = 0.3, PGT = PAC = 0.15.

The insert states X and Y emit (aligned with gaps) symbols from sequences x and y, respectively. The emission probabilities are the same for both insert states: pA = pC = pG = pT = 0.25.

No symbols are emitted by the begin and end states. The values of the other parameters are as follows: δ = 0.2, ε = 0.1, τ = 0.1.

Page 116: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Pair HMM: Viterbi

Now consider the Viterbi algorithm to find the optimal alignment of DNA sequences x = TAG and y = TTACG.

Answer:
VM(0,0) = 1, VM(i,0) = VM(0,j) = 0 for i, j > 0
VX(0,0) = 0, VX(i,0) = VX(0,j) = 0 for i, j > 0
VY(0,0) = 0, VY(i,0) = VY(0,j) = 0 for i, j > 0

We start calculations as follows:

VM(1,1) = pTT·(1 – 2δ – τ)·VM(0,0) = 0.25,  VX(1,1) = 0,  VY(1,1) = 0

and continue by using the equations on Slide 112, filling the computed probability values into the cells of the V matrix (next slide). At the termination step we have

P(x, y, π*) = τ·max{VM(3,5), VX(3,5), VY(3,5)} = 10^-5

The traceback through the V matrix determines the optimal path π*:

VM(1,1) → VY(1,2) → VM(2,3) → VY(2,4) → VM(3,5)

which corresponds to the alignment

T T A C G
T - A - G

Page 117: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

The matrix of probability values V(i,j) determined by the Viterbi algorithm. Each cell (i,j) contains three values, VM(i,j), VX(i,j) and VY(i,j), listed on the VM, VX and VY rows. Entries on the optimal path are marked with *.

V(i,j)        j=0   j=1        j=2        j=3        j=4        j=5
                -     T          T          A          C          G
i=0  -   VM:    1     0          0          0          0          0
         VX:    0     0          0          0          0          0
         VY:    0     0          0          0          0          0
i=1  T   VM:    0     0.25*      0          0          0          0
         VX:    0     0          0          0          0          0
         VY:    0     0          0.0125*    0.0003125  7.813e-06  1.953e-07
i=2  A   VM:    0     0          0.0375     0.005*     3.750e-05  3.125e-07
         VX:    0     0.0125     0          0          0          0
         VY:    0     0          0          0.001875   2.500e-04* 6.250e-06
i=3  G   VM:    0     0          0.0015     0.0009375  7.500e-04  1.000e-04*
         VX:    0     0.0003125  0.001875   2.500e-04  1.875e-06  1.563e-08
         VY:    0     0          0          7.500e-05  4.688e-05  3.750e-05

Page 118: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Pair HMM: Forward

Now find P(x,y) for DNA sequences x = TAG and y = TTACG using the forward algorithm.

Answer: (the forward variables are on the next slide)

Initial values: fM(0,0) = 1, fX(0,0) = fY(0,0) = 0; for any i, j and any of the M, X or Y matrices, f(i,-1) = f(-1,j) = 0.

Main iteration:

fM(i,j) = pxiyj·[(1 – 2δ – τ)·fM(i-1,j-1) + (1 – ε – τ)·fX(i-1,j-1) + (1 – ε – τ)·fY(i-1,j-1)]

fX(i,j) = pxi·[δ·fM(i-1,j) + ε·fX(i-1,j)]

fY(i,j) = pyj·[δ·fM(i,j-1) + ε·fY(i,j-1)]

At the termination step we have:

P(x,y) = τ·(fM(3,5) + fX(3,5) + fY(3,5)) = 3.632 × 10^-5

Page 119: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Forward variables f(i,j) determined by the forward algorithm. Each cell (i,j) contains three values, fM(i,j), fX(i,j) and fY(i,j), listed on the fM, fX and fY rows.

f(i,j)        j=0        j=1        j=2       j=3        j=4        j=5
                -          T          T         A          C          G
i=0  -   fM: 1.000e+00  0          0         0          0          0
         fX: 0          0          0         0          0          0
         fY: 0          5.000e-02  1.25e-03  3.125e-05  7.813e-07  1.953e-08
i=1  T   fM: 0          2.500e-01  2.00e-02  3.000e-04  1.250e-06  9.375e-08
         fX: 5.000e-02  0          0         0          0          0
         fY: 0          0          1.25e-02  1.313e-03  4.781e-05  1.258e-06
i=2  A   fM: 0          1.200e-02  3.75e-02  1.000e-02  1.800e-04  1.944e-06
         fX: 1.250e-03  1.250e-02  1.00e-03  1.500e-05  6.250e-08  4.688e-09
         fY: 0          0          6.00e-04  1.890e-03  5.473e-04  2.268e-05
i=3  G   fM: 0          1.500e-04  2.40e-03  1.002e-03  1.957e-03  2.639e-04
         fX: 3.125e-05  9.125e-04  1.90e-03  5.004e-04  9.002e-06  9.730e-08
         fY: 0          0          7.50e-06  1.202e-04  5.308e-05  9.919e-05

Page 120: Hidden Markov Models BIOL337/STAT337/437 Spring Semester 2014.

Pair HMM: An Example Question

Question: For sequences x = TAG and y = TTACG, find the posterior probability of the optimal alignment π* obtained by the Viterbi algorithm for the pair HMM as described above.

Answer: The posterior probability of the path π* is given by

P(π* | x, y) = P(x, y, π*) / P(x, y)

From the previous calculations, we have P(x, y, π*) = 10^-5 and P(x, y) = 3.632 × 10^-5, therefore

P(π* | x, y) = 10^-5 / (3.632 × 10^-5) = 0.2753