EM algorithm and applications Lecture #9


Background Readings: Chapters 11.2, 11.6 in the textbook, Biological Sequence Analysis, Durbin et al., 2001.

This lecture plan: presentation and correctness proof of the EM algorithm; examples of implementations.

The EM algorithm

A model with parameters is a probabilistic space M, in which each simple event y is determined by values of random variables (dice). The parameters are the probabilities associated with the random variables. (In an HMM of length L, the simple events are HMM-sequences of length L, and the parameters are the transition probabilities m_kl and the emission probabilities e_k(b).)

An observed datum is a non-empty subset x ⊆ M. (In an HMM, it is usually the set of all simple events which fit a given output sequence.)

Given observed data x, the ML method seeks parameters θ* which maximize the likelihood of the data, p(x|θ) = Σ_y p(x,y|θ). (In an HMM, x can be the emitted letters and y the hidden states.) Finding such θ* is easy when the observed data is a simple event, but hard in general.

Model, Parameters, ML

Assume a model with parameters θ as in the previous slide. Given observed data x, the likelihood of x under model parameters θ is given by p(x|θ) = Σ_y p(x,y|θ). (The pairs (x,y) are the simple events which comprise x. Informally, y denotes the possible values of the hidden data.)

The EM algorithm receives x and parameters θ, and returns new parameters λ s.t. p(x|λ) ≥ p(x|θ), with equality only if λ = θ; i.e., the new parameters increase the likelihood of the observed data.

The EM algorithm

EM uses the current parameters θ to construct a simpler ML problem, with objective function L_θ:

log L_θ(λ) = E[log P(x,y|λ)], where the expectation is taken over p(y|x,θ).

Guarantee: if L_θ(λ) > L_θ(θ), then P(x|λ) > P(x|θ).

[Figure: the graphs are the logarithms of the two likelihood functions, log P(x|λ) and log L_θ(λ), as functions of λ.]

Let x be the observed data. Let {(x,y_1), ..., (x,y_k)} be the set of (simple) events which comprise x. Our goal is to find parameters θ* which maximize the sum

p(x|θ*) = Σ_{i=1..k} p(x,y_i|θ*).

As this is hard, we start with some parameters θ, and only find λ s.t. if λ ≠ θ then p(x|λ) > p(x|θ):

Derivation of the EM Algorithm

Finding λ is obtained via virtual sampling, defined next.

For given parameters θ, let p_i = p(y_i|x,θ) (note that p_1 + ... + p_k = 1). We use the p_i's to define virtual sampling, in which y_1 occurs p_1 times, y_2 occurs p_2 times, ..., y_k occurs p_k times.

In each iteration the EM algorithm does the following.

(E step): Given θ, compute the function L_θ(λ) = Π_i p(x,y_i|λ)^{p_i}, the likelihood of λ under the virtual sample.

(M step): Find λ* which maximizes L_θ(λ). (The next iteration sets θ ← λ* and repeats.)

The EM algorithm

Comment: At the M step we only need that L_θ(λ*) > L_θ(θ). This change yields the so-called Generalized EM algorithm. It is used when it is hard to find the optimal λ*.

Usually, the computations use the function log L_θ(λ) = Σ_i p(y_i|x,θ) log p(x,y_i|λ).
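For concreteness, here is a minimal Python sketch of one such iteration, for the case where the completions (x,y_1), ..., (x,y_k) can be enumerated explicitly; the function names (em_iteration, p_xy, m_step) are invented for the illustration and are not from the lecture:

```python
def em_iteration(ys, p_xy, m_step, theta):
    """One EM iteration for observed data x with hidden completions ys = [y_1, ..., y_k].

    p_xy(y, params)     -> p(x, y | params), the complete-data probability
    m_step(ys, weights) -> parameters maximizing sum_i weights[i] * log p(x, y_i | .)
    theta               : the current parameters
    """
    # E step ("virtual sampling"): weight each completion by p(y_i | x, theta).
    joint = [p_xy(y, theta) for y in ys]
    px = sum(joint)                    # p(x | theta)
    weights = [j / px for j in joint]  # p_1, ..., p_k, summing to 1
    # M step: maximize the virtual-sample log-likelihood, log L_theta(lambda).
    return m_step(ys, weights)
```

The coin-toss and ABO examples later in the lecture are instances of this loop, with the M step amounting to normalizing expected counts.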

Correctness Theorem for the EM Algorithm

Theorem: if L_θ(λ) > L_θ(θ), then P(x|λ) > P(x|θ).

Correctness proof of EM


Correctness proof of EM (end)

Example: Baum-Welch = EM for HMM

The Baum-Welch algorithm is the EM algorithm for HMMs. E step for HMM: compute

log L_θ(λ) = Σ_{k,l} M_kl log m_kl + Σ_k Σ_b E_k(b) log e_k(b),

where λ = {m_kl, e_k(b)} are the new parameters, M_kl is the expected number of k→l transitions and E_k(b) is the expected number of emissions of symbol b from state k, given x and θ. M step for HMM: look for the λ which maximizes L_θ(λ).

Baum-Welch = EM for HMM (cont.)

The expected counts M_kl and E_k(b) are computed with the forward-backward algorithm.
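As a rough illustration (not the book's pseudocode), a numpy sketch of one Baum-Welch iteration for a single observed sequence follows; it assumes a uniform initial state distribution, no scaling and no pseudocounts, and the function and variable names are made up for the sketch:

```python
import numpy as np

def baum_welch_iteration(x, m, e):
    """One Baum-Welch (EM) iteration for a discrete-emission HMM on one sequence.

    x : observed symbol indices, length L
    m : K x K transition matrix (m[k, l] = m_kl)
    e : K x B emission matrix   (e[k, b] = e_k(b))
    """
    m = np.asarray(m, dtype=float)
    e = np.asarray(e, dtype=float)
    K, L = m.shape[0], len(x)
    pi = np.full(K, 1.0 / K)              # assumed uniform initial distribution

    # Forward and backward variables.
    f = np.zeros((L, K))
    f[0] = pi * e[:, x[0]]
    for t in range(1, L):
        f[t] = (f[t - 1] @ m) * e[:, x[t]]
    b = np.zeros((L, K))
    b[-1] = 1.0
    for t in range(L - 2, -1, -1):
        b[t] = m @ (e[:, x[t + 1]] * b[t + 1])
    px = f[-1].sum()                      # P(x | theta)

    # E step: expected transition counts M_kl and emission counts E_k(b).
    M = np.zeros((K, K))
    for t in range(L - 1):
        M += np.outer(f[t], e[:, x[t + 1]] * b[t + 1]) * m / px
    E = np.zeros_like(e)
    for t in range(L):
        E[:, x[t]] += f[t] * b[t] / px

    # M step: the new parameters are the normalized expected counts.
    return M / M.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)
```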

A simple example: EM for 2 coin tosses

Consider the following experiment. Given a coin with two possible outcomes, H (head) and T (tail), with probabilities q_H and q_T = 1 - q_H, the coin is tossed twice, but only the 1st outcome, T, is seen. So the data is x = (T,*). We wish to apply the EM algorithm to get parameters that increase the likelihood of the data. Let the initial parameters be θ = (q_H, q_T) = (1/4, 3/4).

EM for 2 coin tosses (cont.)

The hidden data which produce x are the sequences y_1 = (T,H) and y_2 = (T,T). Hence the likelihood of x with parameters (q_H, q_T) is

p(x|θ) = P(x,y_1|θ) + P(x,y_2|θ) = q_H q_T + q_T².

For the initial parameters θ = (1/4, 3/4), we have p(x|θ) = 3/16 + 9/16 = 3/4.

Note that in this case P(x,y_i|θ) = P(y_i|θ) for i = 1,2: we can always define y so that (x,y) ≡ y (otherwise we set y' ≡ (x,y) and use the y' 's in place of the y's).

EM for 2 coin tosses - E step

Calculate L_θ(λ) = L_θ(λ_H, λ_T). Recall: λ_H, λ_T are the new parameters, which we need to optimize.

p(y_1|x,θ) = p(y_1,x|θ)/p(x|θ) = (3/16)/(3/4) = 1/4
p(y_2|x,θ) = p(y_2,x|θ)/p(x|θ) = (9/16)/(3/4) = 3/4

Thus, under the virtual sampling defined by θ, y_1 occurs 1/4 of the time and y_2 occurs 3/4 of the time.

EM for 2 coin tosses - E step

For a sequence y of coin tosses, let N_H(y) be the number of H's in y, and N_T(y) be the number of T's in y. Then

log P(x,y|λ) = N_H(y) log λ_H + N_T(y) log λ_T.

In our example, y_1 = (T,H) and y_2 = (T,T), hence:
N_H(y_1) = N_T(y_1) = 1, N_H(y_2) = 0, N_T(y_2) = 2.


Example: 2 coin tosses - E step

Thus

log L_θ(λ) = Σ_i p(y_i|x,θ) [N_H(y_i) log λ_H + N_T(y_i) log λ_T] = N_H log λ_H + N_T log λ_T,

where N_H = Σ_i p(y_i|x,θ) N_H(y_i) and N_T = Σ_i p(y_i|x,θ) N_T(y_i) are the expected counts. In our example, N_H = 1/4 and N_T = 7/4.

And in general, log L_θ(λ) = N_H log λ_H + N_T log λ_T.

EM for 2 coin tosses - M step

Find λ* which maximizes L_θ(λ). As we already saw, it is maximized when the parameters are the normalized expected counts:

λ_H = N_H/(N_H + N_T) = (1/4)/2 = 1/8,  λ_T = N_T/(N_H + N_T) = (7/4)/2 = 7/8.

[The optimal parameters (q_H, q_T) = (0,1) will never be reached by the EM algorithm!]
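A short Python sketch (the function name and the printout are illustrative) that runs the E and M steps above, starting from θ = (1/4, 3/4):

```python
def em_two_coin_tosses(q_h, iterations=5):
    """EM for x = (T,*): two tosses, only the first outcome (a T) is observed.

    Hidden completions: y1 = (T,H), y2 = (T,T).
    """
    for _ in range(iterations):
        q_t = 1.0 - q_h
        # E step: posterior weights of the two completions (virtual sampling).
        p_y1 = q_t * q_h            # P(x, y1 | theta)
        p_y2 = q_t * q_t            # P(x, y2 | theta)
        px = p_y1 + p_y2            # P(x | theta)
        w1, w2 = p_y1 / px, p_y2 / px
        # Expected counts under the virtual sample.
        n_h = w1 * 1 + w2 * 0       # N_H
        n_t = w1 * 1 + w2 * 2       # N_T
        # M step: normalize the expected counts.
        q_h = n_h / (n_h + n_t)
        print(f"N_H={n_h:.4f}  N_T={n_t:.4f}  new q_H={q_h:.4f}")
    return q_h

# Starting from theta = (1/4, 3/4), the first iteration gives
# N_H = 1/4, N_T = 7/4 and new q_H = 1/8, as on the slides.
em_two_coin_tosses(0.25)
```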

EM for single random variable (dice)

Now the probability of each y (≡ (x,y)) is given by a sequence of dice tosses. The dice has m outcomes, with probabilities λ_1, ..., λ_m. Let N_k(y) = #(occurrences of outcome k in y). Then

log P(x,y|λ) = Σ_{k=1..m} N_k(y) log λ_k.

Let N_k be the expected value of N_k(y), given x and θ:

N_k = E(N_k|x,θ) = Σ_y p(y|x,θ) N_k(y).

Then we have:

log L_θ(λ) = Σ_{k=1..m} N_k log λ_k.

L_θ(λ) for one dice

L_θ(λ) is maximized when λ_k = N_k / (N_1 + ... + N_m), i.e. when each parameter is the normalized expected count N_k.

EM algorithm for n independent observations x_1, ..., x_n

Expectation step: It can be shown that, if the x_j are independent, then L_θ(λ) decomposes over the observations,

log L_θ(λ) = Σ_{j=1..n} Σ_{y_j} p(y_j | x_j, θ) log p(x_j, y_j | λ),

so the expected counts N_k are obtained by summing E(N_k | x_j, θ) over the n observations.
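A sketch of one such iteration in Python (the names are invented; it assumes each observation's consistent toss sequences can be listed explicitly), accumulating the expected counts N_k over the n observations and then normalizing:

```python
from math import prod

def dice_em_iteration(completions, theta):
    """completions : for each observation x_j, the list of dice-toss sequences y
                     (tuples of outcome indices 0..m-1) consistent with x_j
       theta       : current outcome probabilities (length m)"""
    m = len(theta)
    N = [0.0] * m                                        # expected counts N_k
    for ys in completions:                               # independent observations add up
        joint = [prod(theta[k] for k in y) for y in ys]  # p(x_j, y | theta)
        px = sum(joint)                                  # p(x_j | theta)
        for y, j in zip(ys, joint):
            w = j / px                                   # p(y | x_j, theta)
            for k in y:
                N[k] += w                                # N_k(y), weighted by the posterior
    total = sum(N)
    return [Nk / total for Nk in N]                      # M step: normalize the counts
```

The ABO example below is exactly this setting, with m = 3 alleles and each y an ordered pair of alleles.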

Example: The ABO locus

A locus is a particular place on the chromosome. Each locus state (called a genotype) consists of two alleles, one paternal and one maternal. Some loci (plural of locus) determine distinguished features. The ABO locus, for example, determines blood type.

The ABO locus has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}. The first two genotypes determine blood type A, the next two determine blood type B, then blood type AB, and finally blood type O. We wish to estimate the proportions of the 6 genotypes in a population.

Suppose we randomly sampled N individuals and found that N_{a/a} have genotype a/a, N_{a/b} have genotype a/b, etc. Then the MLE is given by the observed frequencies: θ_{a/a} = N_{a/a}/N, θ_{a/b} = N_{a/b}/N, and so on for the other genotypes.

The ABO locus (Cont.)

However, testing individuals for their genotype is very expensive. Can we estimate the proportions of the genotypes using the common, cheap blood test, whose outcome is one of the four blood types (A, B, AB, O)? The problem is that among individuals measured to have blood type A, we don't know how many have genotype a/a and how many have genotype a/o. So what can we do?

The ABO locus (Cont.)

The Hardy-Weinberg equilibrium rule states that in equilibrium the frequencies of the three alleles q_a, q_b, q_o in the population determine the frequencies of the genotypes as follows: q_{a/b} = 2 q_a q_b, q_{a/o} = 2 q_a q_o, q_{b/o} = 2 q_b q_o, q_{a/a} = (q_a)², q_{b/b} = (q_b)², q_{o/o} = (q_o)².
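A tiny illustration of the rule in Python (the function name and the allele frequencies in the example call are made up):

```python
def genotype_frequencies(qa, qb, qo):
    """Hardy-Weinberg: genotype frequencies from allele frequencies (qa + qb + qo = 1)."""
    return {
        "a/a": qa * qa, "b/b": qb * qb, "o/o": qo * qo,
        "a/b": 2 * qa * qb, "a/o": 2 * qa * qo, "b/o": 2 * qb * qo,
    }

# With hypothetical allele frequencies qa = 0.2, qb = 0.1, qo = 0.7,
# the six genotype frequencies sum to 1, since (qa + qb + qo)^2 = 1.
print(genotype_frequencies(0.2, 0.1, 0.7))
```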

In fact, the Hardy-Weinberg equilibrium rule follows from modeling this problem as observed data x with hidden data y:

The ABO locus (Cont.)

The dice outcomes are the three possible alleles a, b and o. The observed data are the blood types A, B, AB or O. Each blood type is determined by two successive random samplings of alleles, giving an ordered pair of alleles; this is the hidden data:
A = {(a,a), (a,o), (o,a)}; B = {(b,b), (b,o), (o,b)}; AB = {(a,b), (b,a)}; O = {(o,o)}.

So we have three parameters of one dice, q_a, q_b, q_o, that we need to estimate. We start with parameters θ = (q_a, q_b, q_o), and then use EM to improve them.

EM setting for the ABO locus

The observed data x = (x_1, ..., x_n) is a sequence of elements (blood types) from the set {A, B, AB, O}, e.g. (B, A, B, B, O, A, B, A, O, B, AB) are the observations x_1, ..., x_11.

The hidden data (i.e. the y's) for each x_j is the set of ordered pairs of alleles that can generate it. For instance, for A it is the set {aa, ao, oa}.

The parameters θ = {q_a, q_b, q_o} are the (current) probabilities of the alleles.

The complete implementation of the EM algorithm for this problem will be given in the tutorial.
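As a preview of that tutorial, here is a self-contained sketch of the EM update for q_a, q_b, q_o (the function name, iteration count, and uniform starting frequencies are illustrative assumptions), using the observation sequence from the earlier slide:

```python
def abo_em(observed, q, iterations=20):
    """EM for the ABO allele frequencies q = {'a': qa, 'b': qb, 'o': qo}.

    observed : list of blood types, e.g. ['B', 'A', 'AB', 'O', ...]
    The hidden data for each blood type is the ordered pair of alleles producing it.
    """
    hidden = {
        "A":  [("a", "a"), ("a", "o"), ("o", "a")],
        "B":  [("b", "b"), ("b", "o"), ("o", "b")],
        "AB": [("a", "b"), ("b", "a")],
        "O":  [("o", "o")],
    }
    for _ in range(iterations):
        counts = {"a": 0.0, "b": 0.0, "o": 0.0}           # expected allele counts N_a, N_b, N_o
        for x in observed:
            pairs = hidden[x]
            joint = [q[u] * q[v] for (u, v) in pairs]     # p(x, y | q), Hardy-Weinberg sampling
            px = sum(joint)                               # p(x | q)
            for (u, v), j in zip(pairs, joint):
                w = j / px                                # p(y | x, q): the virtual sampling
                counts[u] += w
                counts[v] += w
        total = sum(counts.values())                      # = 2 * number of observations
        q = {k: counts[k] / total for k in counts}        # M step: normalize expected counts
    return q

# Example with the 11 observations from the slide and uniform starting frequencies.
obs = ["B", "A", "B", "B", "O", "A", "B", "A", "O", "B", "AB"]
print(abo_em(obs, {"a": 1/3, "b": 1/3, "o": 1/3}))
```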
