CISC667, F05, Lec11, Liao

CISC 667 Intro to Bioinformatics (Fall 2005)

Hidden Markov Models (II)

• The model likelihood: Forward algorithm, backward algorithm

• Posterior decoding

The probability that sequence x is emitted by a state path π is:

$P(x, \pi) = \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

i: 1 2 3 4 5 6 7 8 9
x: T G C G C G T A C
π: - - + + + + - - -

P(x, π) = 0.338 × 0.70 × 0.112 × 0.30 × 0.368 × 0.65 × 0.274 × 0.65 × 0.368 × 0.65 × 0.274 × 0.35 × 0.338 × 0.70 × 0.372 × 0.70 × 0.198.
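The product above can be reproduced with a few lines of Python. This is a minimal sketch: the dictionary names `EMIT` and `TRANS` and the function `joint_probability` are illustrative choices, not part of the lecture, and the function assumes (as the worked product does) that no begin or end transition factor is included.

```python
# Emission and transition probabilities as printed on the slide.
# (Copied verbatim; the "-" emission row sums to 1.02 as printed.)
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}


def joint_probability(x, path):
    """Multiply e_{pi_i}(x_i) by a_{pi_i pi_{i+1}} along the path.

    Assumption: like the slide's worked product, no begin or end
    transition factor is included.
    """
    p = 1.0
    for i, (symbol, state) in enumerate(zip(x, path)):
        p *= EMIT[state][symbol]
        # A transition factor exists only between consecutive positions.
        if i + 1 < len(path):
            p *= TRANS[state][path[i + 1]]
    return p


x = "TGCGCGTAC"
path = "--++++---"
print(joint_probability(x, path))
```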

Then the probability of observing sequence x in the model is $P(x) = \sum_{\pi} P(x, \pi)$, which is also called the likelihood of the model.
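Taken literally, the sum over paths can be evaluated by enumerating every state path. The sketch below does so for the example sequence, mainly to show that the number of paths grows as N^L (here 2^9 = 512). The begin probabilities `START` are an assumption (uniform 0.5), since the slides do not give them, and end transitions are taken as 1; all names are illustrative.

```python
from itertools import product

STATES = ["+", "-"]
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}
# ASSUMPTION: the slides do not give begin probabilities a_0k;
# a uniform 0.5 is used here, and end transitions a_k0 are taken as 1.
START = {"+": 0.5, "-": 0.5}


def joint_probability(x, path):
    """P(x, path), including the assumed begin factor for the first state."""
    p = START[path[0]]
    for i, (symbol, state) in enumerate(zip(x, path)):
        p *= EMIT[state][symbol]
        if i + 1 < len(path):
            p *= TRANS[state][path[i + 1]]
    return p


def likelihood_by_enumeration(x):
    """Sum P(x, path) over all len(STATES) ** len(x) state paths."""
    return sum(joint_probability(x, path)
               for path in product(STATES, repeat=len(x)))


x = "TGCGCGTAC"
print(len(STATES) ** len(x), "paths")
print(likelihood_by_enumeration(x))
```

Enumeration is only practical for very short sequences, because the work doubles with every added position in a two-state model.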

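[Figure: two-state HMM with states + and −.
Emission probabilities, state +: A 0.170, C 0.368, G 0.274, T 0.188.
Emission probabilities, state −: A 0.372, C 0.198, G 0.112, T 0.338.
Transition probabilities: + → + 0.65, + → − 0.35, − → + 0.30, − → − 0.70.]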

How do we calculate the probability of observing sequence x in the model?

$P(x) = \sum_{\pi} P(x, \pi)$

Let $f_k(i)$ be the probability contributed by all paths from the beginning up to and including position $i$, with the state at position $i$ being $k$.

Then the following recurrence holds:

$f_k(i) = \left[ \sum_j f_j(i-1)\, a_{jk} \right] e_k(x_i)$

Graphically,

[Figure: the recurrence drawn over sequence positions x1, x2, x3, …, xL.]

Again, a silent state 0 is introduced for better presentation.
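The recurrence for $f_k(i)$ can be justified in one step. The sketch below reads $f_k(i)$ as the joint probability $P(x_1, \dots, x_i, \pi_i = k)$, an interpretation consistent with the definition above although the slide does not write it this way, and uses the fact that the state at position $i$ depends only on the state at position $i-1$:

```latex
\begin{align*}
f_k(i) &= P(x_1, \dots, x_i,\ \pi_i = k) \\
       &= \sum_j P(x_1, \dots, x_{i-1},\ \pi_{i-1} = j)\,
          P(\pi_i = k \mid \pi_{i-1} = j)\, P(x_i \mid \pi_i = k) \\
       &= e_k(x_i) \sum_j f_j(i-1)\, a_{jk}
\end{align*}
```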

Forward algorithm

Initialization: $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$.

Recursion ($i = 1, \dots, L$): $f_k(i) = e_k(x_i) \sum_j f_j(i-1)\, a_{jk}$.

Termination: $P(x) = \sum_k f_k(L)\, a_{k0}$.

Time complexity: $O(N^2 L)$, where N is the number of states and L is the sequence length.
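The three steps of the forward algorithm can be sketched in Python for the two-state +/− model. The begin probabilities `START` (uniform 0.5) and end probabilities `END` (all 1, meaning the end state is not modeled) are assumptions, because the slides do not specify them; the function and variable names are illustrative.

```python
STATES = ["+", "-"]
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}
# ASSUMPTIONS (not on the slides): uniform begin transitions a_0k,
# and end transitions a_k0 = 1.
START = {"+": 0.5, "-": 0.5}
END = {"+": 1.0, "-": 1.0}


def forward(x):
    """Return (f, P(x)); f[i][k] holds f_k(i + 1) in the slide's 1-based notation."""
    # Position 1: the recursion applied to f_0(0) = 1 gives a_0k * e_k(x_1).
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    # Positions 2..L: f_k(i) = e_k(x_i) * sum_j f_j(i-1) * a_jk.
    for symbol in x[1:]:
        prev = f[-1]
        f.append({
            k: EMIT[k][symbol] * sum(prev[j] * TRANS[j][k] for j in STATES)
            for k in STATES
        })
    # Termination: P(x) = sum_k f_k(L) * a_k0.
    likelihood = sum(f[-1][k] * END[k] for k in STATES)
    return f, likelihood


f, px = forward("TGCGCGTAC")
print(px)
```

Each position costs N × N multiplications, which is where the O(N²L) bound comes from.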

Backward algorithm

Let $b_k(i)$ be the probability of the rest of the sequence after position $i$, summed over all paths, given that the state at position $i$ is $k$:

$b_k(i) = P(x_{i+1}, \dots, x_L \mid \pi_i = k)$

Initialization: $b_k(L) = a_{k0}$ for all $k$.

Recursion ($i = L-1, \dots, 1$): $b_k(i) = \sum_j a_{kj}\, e_j(x_{i+1})\, b_j(i+1)$.

Termination: $P(x) = \sum_k a_{0k}\, e_k(x_1)\, b_k(1)$.

Time complexity: $O(N^2 L)$, where N is the number of states and L is the sequence length.
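A matching sketch of the backward algorithm, using the recursion $b_k(i) = \sum_j a_{kj}\, e_j(x_{i+1})\, b_j(i+1)$, is given below. As before, `START` (uniform 0.5) and `END` (all 1) are assumptions that the slides do not specify, and the names are illustrative.

```python
STATES = ["+", "-"]
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}
# ASSUMPTIONS (not on the slides): uniform begin transitions a_0k,
# and end transitions a_k0 = 1.
START = {"+": 0.5, "-": 0.5}
END = {"+": 1.0, "-": 1.0}


def backward(x):
    """Return (b, P(x)); b[i][k] holds b_k(i + 1) in the slide's 1-based notation."""
    length = len(x)
    b = [None] * length
    # Initialization: b_k(L) = a_k0.
    b[length - 1] = {k: END[k] for k in STATES}
    # Recursion, filling the table from right to left.
    for i in range(length - 2, -1, -1):
        b[i] = {
            k: sum(TRANS[k][j] * EMIT[j][x[i + 1]] * b[i + 1][j] for j in STATES)
            for k in STATES
        }
    # Termination: P(x) = sum_k a_0k * e_k(x_1) * b_k(1).
    likelihood = sum(START[k] * EMIT[k][x[0]] * b[0][k] for k in STATES)
    return b, likelihood


b, px = backward("TGCGCGTAC")
print(px)
```

The value returned here should agree with the forward algorithm's P(x) up to floating-point rounding, which makes a convenient consistency check.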

Posterior decoding

$P(\pi_i = k \mid x) = P(x, \pi_i = k) / P(x) = f_k(i)\, b_k(i) / P(x)$
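The factorization of the numerator into a forward and a backward term follows from the Markov property: given the state at position $i$, the rest of the sequence is independent of what was emitted before. A short derivation, reading $f_k(i)$ as $P(x_1, \dots, x_i, \pi_i = k)$ and $b_k(i)$ as $P(x_{i+1}, \dots, x_L \mid \pi_i = k)$:

```latex
\begin{align*}
P(x, \pi_i = k)
  &= P(x_1, \dots, x_i,\ \pi_i = k)\;
     P(x_{i+1}, \dots, x_L \mid x_1, \dots, x_i,\ \pi_i = k) \\
  &= P(x_1, \dots, x_i,\ \pi_i = k)\;
     P(x_{i+1}, \dots, x_L \mid \pi_i = k) \\
  &= f_k(i)\, b_k(i)
\end{align*}
```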

Algorithm:

for i = 1 to L

do $\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$

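Notes: 1. Posterior decoding may be useful when several paths are almost as probable as the most probable one, or when a function is defined on the states.

2. The state path identified by posterior decoding may not be the most probable path overall, and may not even be a viable path, because the state at each position is chosen independently and a transition between two consecutive choices may have probability zero.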
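Posterior decoding combines the forward and backward tables position by position. The sketch below is self-contained and illustrative: `START` (uniform 0.5) and `END` (all 1) are assumptions not given on the slides, and ties between states are broken by the order of `STATES`.

```python
STATES = ["+", "-"]
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}
# ASSUMPTIONS (not on the slides): uniform begin transitions, a_k0 = 1.
START = {"+": 0.5, "-": 0.5}
END = {"+": 1.0, "-": 1.0}


def forward(x):
    """Forward table f and likelihood P(x)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({k: EMIT[k][symbol] * sum(prev[j] * TRANS[j][k] for j in STATES)
                  for k in STATES})
    return f, sum(f[-1][k] * END[k] for k in STATES)


def backward(x):
    """Backward table b, filled from the last position to the first."""
    b = [None] * len(x)
    b[-1] = {k: END[k] for k in STATES}
    for i in range(len(x) - 2, -1, -1):
        b[i] = {k: sum(TRANS[k][j] * EMIT[j][x[i + 1]] * b[i + 1][j] for j in STATES)
                for k in STATES}
    return b


def posterior_decode(x):
    """Return (path, posteriors) with posteriors[i][k] = P(pi_i = k | x)."""
    f, px = forward(x)
    b = backward(x)
    posteriors = [{k: f[i][k] * b[i][k] / px for k in STATES}
                  for i in range(len(x))]
    # Pick the most probable state independently at each position.
    path = "".join(max(STATES, key=post.get) for post in posteriors)
    return path, posteriors


path, posteriors = posterior_decode("TGCGCGTAC")
print(path)
```

Because each position is decided on its own, the returned string is a sequence of per-position best states rather than a single jointly optimal path.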

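• The log transformation for the Viterbi algorithm

$v_k(i) = e_k(x_i) \max_j \left( v_j(i-1)\, a_{jk} \right)$;

$\tilde{a}_{jk} = \log a_{jk}$;

$\tilde{e}_k(x_i) = \log e_k(x_i)$;

$\tilde{v}_k(i) = \log v_k(i)$;

$\tilde{v}_k(i) = \tilde{e}_k(x_i) + \max_j \left( \tilde{v}_j(i-1) + \tilde{a}_{jk} \right)$;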
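Working with logarithms turns the products in the Viterbi recursion into sums, which avoids floating-point underflow on long sequences. The sketch below applies the log-space recursion to the two-state +/− model. The begin probabilities `START` (uniform 0.5) are an assumption, end transitions are taken as 1 so they add log 1 = 0, and all names are illustrative. It also assumes every probability is strictly positive, since `math.log(0)` raises an error.

```python
import math

STATES = ["+", "-"]
EMIT = {
    "+": {"A": 0.170, "C": 0.368, "G": 0.274, "T": 0.188},
    "-": {"A": 0.372, "C": 0.198, "G": 0.112, "T": 0.338},
}
TRANS = {
    "+": {"+": 0.65, "-": 0.35},
    "-": {"+": 0.30, "-": 0.70},
}
# ASSUMPTION (not on the slides): uniform begin transitions a_0k.
START = {"+": 0.5, "-": 0.5}

# Log-transformed parameters.
LOG_EMIT = {k: {s: math.log(p) for s, p in row.items()} for k, row in EMIT.items()}
LOG_TRANS = {j: {k: math.log(p) for k, p in row.items()} for j, row in TRANS.items()}
LOG_START = {k: math.log(p) for k, p in START.items()}


def viterbi_log(x):
    """Return (best_path, log P(x, best_path)) using sums of logs."""
    v = {k: LOG_START[k] + LOG_EMIT[k][x[0]] for k in STATES}
    pointers = []  # pointers[i][k]: best predecessor of state k at step i + 1
    for symbol in x[1:]:
        new_v, ptr = {}, {}
        for k in STATES:
            best_prev = max(STATES, key=lambda j: v[j] + LOG_TRANS[j][k])
            ptr[k] = best_prev
            new_v[k] = LOG_EMIT[k][symbol] + v[best_prev] + LOG_TRANS[best_prev][k]
        v = new_v
        pointers.append(ptr)
    # Trace back from the best final state.
    last = max(STATES, key=v.get)
    path = [last]
    for ptr in reversed(pointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return "".join(path), v[last]


best_path, best_log_prob = viterbi_log("TGCGCGTAC")
print(best_path, best_log_prob)
```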
