
Expectation-Maximization (EM)

Chapter 3 (Duda et al.) – Section 3.9

CS479/679 Pattern Recognition
Dr. George Bebis

Expectation-Maximization (EM)

• EM is an iterative method to perform ML estimation:

– Starts with an initial estimate for θ.

– Refines the current estimate iteratively to increase the likelihood of the observed data:

$p(D \mid \theta)$

Expectation-Maximization (EM)

• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete).

– Some creativity is required to recognize where the EM algorithm can be used.

– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).

Incomplete Data

• Often, ML estimation cannot be applied directly because certain features cannot be measured.

• The EM algorithm is ideal for problems with unobserved (missing) data.

Example (Moon, 1996)

• Let $x = (x_1, x_2, x_3)$ with $x_1 + x_2 + x_3 = k$.

• Assume a trinomial distribution:

$p(x_1, x_2, x_3 \mid \theta) = \dfrac{k!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}$

where the cell probabilities $p_1, p_2, p_3$ depend on the unknown parameter $\theta$.
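To make the pmf concrete, here is a minimal numeric check in Python. The counts and cell probabilities below are placeholder values (the slide's parameterization of $p_1, p_2, p_3$ in terms of $\theta$ is not shown), and the library call is compared against the formula evaluated directly.

```python
from math import factorial
from scipy.stats import multinomial

# Trinomial pmf: p(x1,x2,x3 | theta) = k!/(x1! x2! x3!) p1^x1 p2^x2 p3^x3,
# with x1 + x2 + x3 = k. The values below are placeholders; in the example
# the cell probabilities would be functions of the unknown parameter theta.
k = 10
p = [0.2, 0.3, 0.5]   # assumed cell probabilities, summing to 1
x = [2, 3, 5]         # counts, summing to k

# Library evaluation and direct evaluation of the formula agree:
lib = multinomial.pmf(x, n=k, p=p)
coef = factorial(k) // (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
direct = coef * p[0]**x[0] * p[1]**x[1] * p[2]**x[2]
print(lib, direct)
```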

Example (Moon, 1996) (cont’d)

EM: Main Idea

• If x were available, we could use ML to estimate θ:

$\hat{\theta} = \arg\max_{\theta} \ln p(D_x \mid \theta)$

• Since x is not available, maximize instead the expectation of $\ln p(D_x \mid \theta)$ with respect to the unknown variables, given $D_y$ and an estimate of θ:

$Q(\theta; \theta^t) = E_{\text{unobserved}}\big[\ln p(D_x \mid \theta) \mid D_y, \theta^t\big]$

EM Steps

(1) Initialization
(2) Expectation
(3) Maximization
(4) Test for convergence

EM Steps (cont’d)

(1) Initialization Step: initialize the algorithm with a guess $\theta^0$.

(2) Expectation Step: performed with respect to the unobserved variables, using the current estimate of the parameters and conditioned upon the observations:

$Q(\theta; \theta^t) = E_{\text{unobserved}}\big[\ln p(D_x \mid \theta) \mid D_y, \theta^t\big]$

– When $\ln p(D_x \mid \theta)$ is a linear function of the unobserved variables, the expectation step is equivalent to computing $E_{\text{unobserved}}[x \mid D_y, \theta^t]$ and substituting it into $\ln p(D_x \mid \theta)$.

EM Steps (cont’d)

(3) Maximization Step: provides a new estimate of the parameters:

$\theta^{t+1} = \arg\max_{\theta} Q(\theta; \theta^t)$

(4) Test for Convergence: if $|\theta^{t+1} - \theta^t| < \epsilon$, stop; otherwise, go to Step 2.
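As a sketch only, the four steps map onto a generic loop like the following; `e_step`, `m_step`, and `theta0` are model-specific placeholders, not part of the slides.

```python
import numpy as np

def em(y, e_step, m_step, theta0, eps=1e-6, max_iter=200):
    """Generic EM loop mirroring the four steps above.

    e_step(y, theta) -> expected values of the unobserved quantities
    m_step(y, stats) -> new theta maximizing Q(theta; theta_t)
    Both callables (and theta0) are model-specific placeholders.
    """
    theta = np.asarray(theta0, dtype=float)          # (1) Initialization
    for _ in range(max_iter):
        stats = e_step(y, theta)                     # (2) Expectation
        theta_new = np.asarray(m_step(y, stats))     # (3) Maximization
        if np.max(np.abs(theta_new - theta)) < eps:  # (4) Convergence test
            return theta_new
        theta = theta_new
    return theta
```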

Example (Moon, 1996) (cont’d)

• Suppose some of the counts $x_i$ cannot be observed directly, while the complete-data likelihood is the trinomial pmf:

$p(x_1, x_2, x_3 \mid \theta) = \dfrac{k!}{x_1!\, x_2!\, x_3!}\, p_1^{x_1} p_2^{x_2} p_3^{x_3}$

Example (Moon, 1996) (cont’d)

Let’s look at the M-step for a minute before completing the E-step …

• Take the expected value of the complete-data log-likelihood $\ln p(x_1, x_2, x_3 \mid \theta)$ (the trinomial pmf above).

Example (Moon, 1996) (cont’d)

Let’s go back and complete the E-step now …

• We only need to estimate the unobserved counts: since the log-likelihood is linear in them, it suffices to compute their conditional expectations given the observed data and $\theta^t$.

Example (Moon, 1996) (cont’d)

(see Moon’s paper, page 53, for a proof)

Example (Moon, 1996) (cont’d)

• Initialization: choose $\theta^0$.

• Expectation Step: compute the conditional expectations of the unobserved counts given the data and $\theta^t$.

• Maximization Step: update the estimate $\theta^{t+1}$.

• Convergence Test: stop when $|\theta^{t+1} - \theta^t| < \epsilon$.

Example (Moon, 1996) (cont’d)

[Plot: the estimate $\theta^t$ over iterations $t$.]

Convergence properties of EM

• The solution depends on the initial estimate $\theta^0$.

• At each iteration, a value of θ is computed so that the likelihood function does not decrease.

• There is no guarantee that the algorithm will converge to a global maximum.

• The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.


Mixture of 2D Gaussians - Example

Mixture Model

$p(x \mid \Theta) = \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k)$, where $\pi_k \ge 0$ and $\sum_{k=1}^{K} \pi_k = 1$

[Diagram: K component densities combined with mixing weights $\pi_1, \pi_2, \pi_3, \ldots, \pi_K$.]

Mixture of 1D Gaussians - Example

[Figure: a mixture of three 1D Gaussians with mixing weights $\pi_1 = 0.3$, $\pi_2 = 0.2$, $\pi_3 = 0.5$.]
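A small sketch of sampling from such a mixture: the weights come from the slide, while the means and standard deviations are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixing weights from the slide; the means and std. deviations below are
# assumed purely for illustration (the original slide shows only a figure).
pi    = np.array([0.3, 0.2, 0.5])
mu    = np.array([-4.0, 0.0, 3.0])   # assumed component means
sigma = np.array([1.0, 0.5, 1.5])    # assumed component std. deviations

# Ancestral sampling: pick a component with probability pi_k, then draw
# from that component's Gaussian.
n = 1000
z = rng.choice(3, size=n, p=pi)
x = rng.normal(mu[z], sigma[z])

print(np.bincount(z) / n)   # empirical weights, close to pi
```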

Mixture Parameters

Fitting a Mixture Model to a set of observations Dx

• Two fundamental problems:

(1) Estimate the number of mixture components K

(2) Estimate mixture parameters (πk , θk), k=1,2,…,K

Mixtures of Gaussians (see Chapter 10)

$p(x \mid \Theta) = \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k)$

where each component is a Gaussian:

$p(x \mid \theta_k) = N(x; \mu_k, \Sigma_k) = \dfrac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\!\left(-\tfrac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$

• The parameters $\theta_k$ are $(\mu_k, \Sigma_k)$.
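As a quick sanity check, the component density can be evaluated numerically; the values of $\mu_k$ and $\Sigma_k$ below are assumed for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed illustrative parameters for one component (d = 2).
mu_k    = np.array([0.0, 0.0])
Sigma_k = np.array([[1.0, 0.5],
                    [0.5, 2.0]])
x = np.array([0.5, -1.0])

# Library evaluation of N(x; mu_k, Sigma_k) ...
lib = multivariate_normal.pdf(x, mean=mu_k, cov=Sigma_k)

# ... and direct evaluation of the density formula above.
d = len(x)
diff = x - mu_k
coef = 1.0 / ((2 * np.pi) ** (d / 2) * np.linalg.det(Sigma_k) ** 0.5)
direct = coef * np.exp(-0.5 * diff @ np.linalg.inv(Sigma_k) @ diff)
print(lib, direct)   # the two values agree
```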

Mixtures of Gaussians (cont’d)

[Diagram: the K Gaussian components weighted by $\pi_1, \pi_2, \pi_3, \ldots, \pi_K$.]

Estimating Mixture Parameters Using ML – not easy!

• Directly maximizing $\ln p(D_x \mid \Theta) = \sum_i \ln \sum_k \pi_k\, p(x_i \mid \theta_k)$ is hard: the sum inside the logarithm couples the parameters and yields nonlinear equations.

Estimating Mixture Parameters Using EM: Case of Unknown Means

• Assumptions: the number of components K, the priors $\pi_k$, and the covariances $\Sigma_k$ are known; only the means $\mu_k$ are unknown.

• Observation: if we knew which component had generated each sample $x_i$, estimating the means would be an ordinary ML problem … but we don't!

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Introduce hidden or unobserved variables $z_i$ indicating which mixture component generated each sample $x_i$.

• Main steps using EM

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step: $E(z_{ik})$ is just the probability that $x_i$ was generated by the k-th component:

$E(z_{ik}) = \dfrac{\pi_k\, p(x_i \mid \mu_k^t)}{\sum_{j=1}^{K} \pi_j\, p(x_i \mid \mu_j^t)}$

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Maximization Step: re-estimate each mean as the responsibility-weighted average of the samples:

$\mu_k^{t+1} = \dfrac{\sum_{i=1}^{n} E(z_{ik})\, x_i}{\sum_{i=1}^{n} E(z_{ik})}$
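A minimal sketch of this unknown-means case in Python (1D for brevity), assuming known, equal priors and a known common standard deviation; the synthetic data and initial guesses are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def em_unknown_means(x, sigma, pi, mu0, eps=1e-6, max_iter=500):
    """EM for a 1D Gaussian mixture in which only the means are unknown.

    Assumes the priors pi and the common std. deviation sigma are known,
    matching the 'unknown means' case above. x: (n,) data; mu0: (K,) guess.
    """
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iter):
        # E-step: E(z_ik) = pi_k N(x_i; mu_k) / sum_j pi_j N(x_i; mu_j)
        resp = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)   # shape (n, K)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: mu_k = sum_i E(z_ik) x_i / sum_i E(z_ik)
        mu_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
        done = np.max(np.abs(mu_new - mu)) < eps
        mu = mu_new
        if done:
            break
    return mu

# Synthetic demo: two components with true means -2 and 3.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])
print(em_unknown_means(x, sigma=1.0, pi=np.array([0.5, 0.5]), mu0=[0.0, 1.0]))
```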

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Summary: alternate the E-step (compute $E(z_{ik})$) and the M-step (update $\mu_k$) until the estimates stop changing.

Estimating Mixture Parameters Using EM: General Case

• Need to review Lagrange Optimization first …

Lagrange Optimization

• Maximize $f(x)$ subject to the constraint $g(x) = 0$.

• Form the Lagrangian $L(x, \lambda) = f(x) + \lambda\, g(x)$, set $\nabla_x L = 0$ together with $g(x) = 0$, and solve for $x$ and $\lambda$: n+1 equations / n+1 unknowns.

Lagrange Optimization (cont’d)

• Example

Maximize $f(x_1, x_2) = x_1 x_2$ subject to the constraint $g(x_1, x_2) = x_1 + x_2 - 1 = 0$.

$L(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda\, g(x_1, x_2) = x_1 x_2 + \lambda (x_1 + x_2 - 1)$

$\dfrac{\partial L}{\partial x_1} = x_2 + \lambda = 0$

$\dfrac{\partial L}{\partial x_2} = x_1 + \lambda = 0$

$x_1 + x_2 - 1 = 0$

3 equations / 3 unknowns, giving $x_1 = x_2 = 1/2$ and $\lambda = -1/2$.
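The example can be checked symbolically; this is a sketch using SymPy, not part of the original slides.

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')

# Lagrangian for: maximize f = x1*x2 subject to g = x1 + x2 - 1 = 0
L = x1 * x2 + lam * (x1 + x2 - 1)

# Three equations (dL/dx1 = 0, dL/dx2 = 0, g = 0) in three unknowns.
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), x1 + x2 - 1], [x1, x2, lam])
print(sol)   # {x1: 1/2, x2: 1/2, lam: -1/2}
```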

Estimating Mixture Parameters Using EM: General Case

• Introduce hidden or unobserved variables $z_i$ indicating which mixture component generated each sample $x_i$.

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step: compute the responsibilities using the current estimates of all parameters:

$E(z_{ik}) = \dfrac{\pi_k^t\, p(x_i \mid \theta_k^t)}{\sum_{j=1}^{K} \pi_j^t\, p(x_i \mid \theta_j^t)}$

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Maximization Step: use Lagrange optimization to enforce the constraint $\sum_k \pi_k = 1$, which gives the updates:

$\pi_k^{t+1} = \dfrac{1}{n} \sum_{i=1}^{n} E(z_{ik})$

$\mu_k^{t+1} = \dfrac{\sum_i E(z_{ik})\, x_i}{\sum_i E(z_{ik})}$

$\Sigma_k^{t+1} = \dfrac{\sum_i E(z_{ik})\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_i E(z_{ik})}$
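A minimal 1D sketch of the general case, combining the E-step responsibilities with the three updates above; the data, initialization, and fixed iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, pi, mu, sigma, n_iter=100):
    """General-case EM for a 1D Gaussian mixture: the priors, means, and
    variances are all re-estimated, starting from the given initial values."""
    pi = np.asarray(pi, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = len(x)
    for _ in range(n_iter):
        # E-step: responsibilities E(z_ik)
        resp = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)   # (n, K)
        resp /= resp.sum(axis=1, keepdims=True)
        Nk = resp.sum(axis=0)                                   # effective counts
        # M-step (the Lagrange-constrained updates above):
        pi = Nk / n
        mu = (resp * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return pi, mu, sigma

# Synthetic demo: recover weights, means, and spreads of two components.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(-3, 0.8, 400), rng.normal(2, 1.5, 600)])
print(em_gmm_1d(x, pi=[0.5, 0.5], mu=[-1.0, 1.0], sigma=[1.0, 1.0]))
```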

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Summary: iterate the E-step and M-step updates above until the parameter estimates converge.

Estimating the Number of Components K