Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9



Transcript of Expectation-Maximization (EM) Chapter 3 (Duda et al.) – Section 3.9

Page 1

Expectation-Maximization (EM)

Chapter 3 (Duda et al.) – Section 3.9

CS479/679 Pattern Recognition, Dr. George Bebis

Page 2

Expectation-Maximization (EM)

• EM is an iterative maximum-likelihood (ML) estimation method:

– Starts with an initial estimate for θ.

– Refines the current estimate iteratively to increase the likelihood of the observed data:

$p(D \mid \theta)$

Page 3

Expectation-Maximization (cont’d)

• EM represents a general framework; it works best in situations where the data is incomplete (or can be thought of as being incomplete).

– Some creativity is required to recognize where the EM algorithm can be used.

– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).

Page 4

Incomplete Data

• Often it is impossible to apply ML estimation directly because certain features cannot be measured.

• The EM algorithm is ideal for problems with unobserved (missing) data.

Page 5

Example (Moon, 1996)

Assume a trinomial distribution over counts $(x_1, x_2, x_3)$ with $x_1 + x_2 + x_3 = k$:

$$p(x_1, x_2, x_3 \mid \theta) = \frac{k!}{x_1!\, x_2!\, x_3!}\; p_1(\theta)^{x_1}\, p_2(\theta)^{x_2}\, p_3(\theta)^{x_3}$$

where the cell probabilities $p_j(\theta)$ depend on the single unknown parameter $\theta$.
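A small Python sketch of this pmf, for concreteness; the exact dependence of the cell probabilities on $\theta$ in Moon's example is not reproduced in this transcript, so they are passed in directly:

```python
from math import factorial

def trinomial_pmf(x, p):
    """P(x1, x2, x3) for counts x = (x1, x2, x3) summing to k,
    with cell probabilities p = (p1, p2, p3) summing to 1."""
    k = sum(x)
    coef = factorial(k) // (factorial(x[0]) * factorial(x[1]) * factorial(x[2]))
    return coef * p[0] ** x[0] * p[1] ** x[1] * p[2] ** x[2]

print(trinomial_pmf((2, 1, 1), (0.5, 0.25, 0.25)))  # example call: 0.1875
```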

Page 6

Example (Moon, 1996) (cont’d)

[Figure: the observed (incomplete) data $y_1, y_2$ are obtained from the complete data $x = (x_1, x_2, x_3)$ by a many-to-one mapping.]

Page 7

EM: Main Idea

• If the complete data $D_x$ were available, we could estimate θ using ML:

$$\hat{\theta} = \arg\max_{\theta} \ln p(D_x \mid \theta)$$

• Given that only $D_y$ is available, estimate θ by maximizing the expectation of $\ln p(D_x \mid \theta)$ (taken with respect to the unknown variables) given $D_y$ and a current estimate $\theta^t$:

$$Q(\theta; \theta^t) = E_{\text{unobserved}}\!\left[\ln p(D_x \mid \theta) \mid D_y, \theta^t\right]$$

Page 8

EM Steps

(1) Initialization
(2) E-Step: Expectation
(3) M-Step: Maximization
(4) Test for convergence

Page 9

EM Steps (cont’d)

(1) Initialization Step: initialize the algorithm with a guess θ0

(2) Expectation Step: performed with respect to the unobserved variables, using the current parameter estimate and conditioned on the observations:

$$Q(\theta; \theta^t) = E_{\text{unobserved}}\!\left[\ln p(D_x \mid \theta) \mid D_y, \theta^t\right]$$

– Note: if $\ln p(D_x \mid \theta)$ is a linear function of the unobserved variables, the expectation step is equivalent to computing

$$E_{\text{unobserved}}\!\left[x \mid D_y, \theta^t\right]$$

Page 10

EM Steps (cont’d)

(3) Maximization Step: provides a new estimate of the parameters:

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta; \theta^t)$$

(4) Test for Convergence: if $|\theta^{t+1} - \theta^t| < \epsilon$, stop; otherwise, go to Step 2.
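For concreteness, a minimal Python sketch of this four-step loop; `e_step` and `m_step` are hypothetical user-supplied functions (not defined in the slides) that build and maximize Q:

```python
import numpy as np

def em(D_y, theta0, e_step, m_step, eps=1e-6, max_iter=200):
    """Generic EM loop: (1) initialize, then repeat (2) E-step,
    (3) M-step, (4) convergence test."""
    theta = np.asarray(theta0, dtype=float)             # (1) initialization
    for _ in range(max_iter):
        Q = e_step(D_y, theta)                          # (2) E-step: build Q(.; theta_t)
        theta_new = np.asarray(m_step(Q), dtype=float)  # (3) M-step: argmax_theta Q
        if np.linalg.norm(theta_new - theta) < eps:     # (4) |theta_{t+1} - theta_t| < eps
            return theta_new
        theta = theta_new
    return theta
```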

Page 11

Example (Moon, 1996) (cont’d)

Writing the complete-data likelihood for the whole dataset as a product of trinomials:

$$p(D_x \mid \theta) = \prod_{i} \frac{k!}{x_{i1}!\, x_{i2}!\, x_{i3}!}\; p_1(\theta)^{x_{i1}}\, p_2(\theta)^{x_{i2}}\, p_3(\theta)^{x_{i3}}$$

where $x_i = (x_{i1}, x_{i2}, x_{i3})$.

Page 12

Example (Moon, 1996) (cont’d)

Let’s look at the M-step before completing the E-step …

• Take the expected value of the complete-data log-likelihood with respect to the unobserved counts.
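To see why only expected counts are needed, take the log of one trinomial term:

$$\ln p(x_i \mid \theta) = \ln k! - \sum_{j=1}^{3} \ln x_{ij}! + \sum_{j=1}^{3} x_{ij} \ln p_j(\theta)$$

Only the last sum depends on $\theta$, and it is linear in the counts $x_{ij}$; by the note on Page 9, the E-step therefore reduces to computing $E[x_{ij} \mid y_i, \theta^t]$.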

Page 13

Example (Moon, 1996) (cont’d)

Let’s complete the E-step now …

• We only need to estimate the unobserved counts, i.e., their conditional expectations $E[x_{ij} \mid y_i, \theta^t]$.

Page 14

Example (Moon, 1996) (cont’d)

(see Moon’s paper, page 53)

Page 15

Example (Moon, 1996) (cont’d)

• Initialization: $\theta^0$

• Expectation Step: compute the expected unobserved counts $E[x_{ij} \mid y_i, \theta^t]$

• Maximization Step: $\theta^{t+1} = \arg\max_{\theta} Q(\theta; \theta^t)$

• Convergence Step: stop when $|\theta^{t+1} - \theta^t| < \epsilon$

Page 16

Example (Moon, 1996) (cont’d)

[Plot: the estimate $\theta^t$ versus iteration number $t$.]

Page 17

Convergence properties of EM

• The solution depends on the initial estimate θ0

• At each iteration, a value of θ is computed so that the likelihood function does not decrease.

• The algorithm is guaranteed to be stable (i.e., it does not oscillate), as captured by the inequality below.

• There is no guarantee that it will converge to a global maximum.
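Stability here is the standard EM monotonicity property: each iteration can only increase, or leave unchanged, the likelihood of the observed data,

$$\ln p(D_y \mid \theta^{t+1}) \;\ge\; \ln p(D_y \mid \theta^{t}).$$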

Page 18

Mixture Models

• EM is the standard method for estimating the parameters of “mixture models”.

Example: a mixture of 2D Gaussians.

Page 19

Mixture Model (cont’d)

[Diagram: a mixture model whose components are weighted by priors $\pi_1, \pi_2, \pi_3, \dots, \pi_K$.]

Page 20

Mixture of 1D Gaussians - Example

[Figure: a mixture of three 1D Gaussians with weights $\pi_1 = 0.3$, $\pi_2 = 0.2$, $\pi_3 = 0.5$.]
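A short Python sketch of this example; the weights come from the slide, while the means and standard deviations below are assumed for illustration:

```python
import numpy as np

def gmm_pdf(x, pis, mus, sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2) for a 1D Gaussian mixture."""
    x = np.asarray(x, dtype=float)[..., None]            # broadcast over components
    comp = np.exp(-0.5 * ((x - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return (pis * comp).sum(axis=-1)

pis = np.array([0.3, 0.2, 0.5])       # weights from the slide
mus = np.array([-2.0, 0.0, 3.0])      # assumed for illustration
sigmas = np.array([1.0, 0.5, 1.5])    # assumed for illustration
print(gmm_pdf([-2.0, 0.0, 3.0], pis, mus, sigmas))
```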

Page 21

Mixture Model (cont’d)
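In general, a mixture model combines $K$ component densities using mixing weights (priors) $\pi_k$:

$$p(x \mid \theta) = \sum_{k=1}^{K} \pi_k\, p(x \mid \theta_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$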

Page 22

Estimating the parameters of a Mixture Model

• Two fundamental problems:

(1) Estimate the number of mixture components K

(2) Estimate mixture parameters (πk, θk), k=1,2,…,K

Page 23

Mixtures of Gaussians (Chapter 10)

where each component is a Gaussian:

$$p(x \mid \theta_k) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_k|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) \right)$$

• In this case, $\theta_k = (\mu_k, \Sigma_k)$.

Page 24

Data Generation Process Using Mixtures of Gaussians

[Diagram: to generate a sample, select component $k$ with probability $\pi_k$, then draw $x$ from $p(x \mid \theta_k)$.]

Page 25

Estimating Mixture Parameters Using ML – not easy!

• ML works by maximizing the likelihood of the data:

$$p(D \mid \theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$$

• The density function is a mixture:

$$p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k)$$

• Using ML, we must therefore maximize a log of sums,

$$\ln p(D \mid \theta) = \sum_{i=1}^{n} \ln \sum_{k=1}^{K} \pi_k\, p(x_i \mid \theta_k),$$

which has no closed-form solution; this is why EM is used.

Page 26

Estimating Mixture Parameters Using EM: Case of Unknown Means

• Assumptions: the number of components $K$, the priors $\pi_k$, and the covariances $\Sigma_k$ are known; only the means $\mu_k$ are unknown.

• Observation: if we knew which component generated each sample, estimating the means would be easy … but we don't!

Page 27

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Introduce hidden or unobserved variables $z_i = (z_{i1}, \dots, z_{iK})$, where $z_{ik} = 1$ if sample $x_i$ was generated by the $k$-th component and $0$ otherwise.
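With these indicators, the complete-data density of a single sample takes the standard product form (stated here for reference):

$$p(x_i, z_i \mid \theta) = \prod_{k=1}^{K} \left[ \pi_k\, p(x_i \mid \mu_k) \right]^{z_{ik}}$$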

Page 28

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Main steps using EM

Page 29

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step


Page 30

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

Page 31

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

Page 32

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

$E(z_{ik})$ is just the probability that $x_i$ was generated by the $k$-th component:

$$E[z_{ik}] = \frac{\pi_k\, p(x_i \mid \mu_k^t)}{\sum_{j=1}^{K} \pi_j\, p(x_i \mid \mu_j^t)}$$

Page 33

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Maximization Step
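For the unknown-means case, maximizing $Q$ gives the standard closed-form update (stated here for reference): the new mean is a responsibility-weighted average of the samples,

$$\mu_k^{t+1} = \frac{\sum_{i=1}^{n} E[z_{ik}]\, x_i}{\sum_{i=1}^{n} E[z_{ik}]}$$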

Page 34

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

Page 35

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
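Putting the E- and M-steps together, a minimal Python sketch of the unknown-means case in 1D; names are illustrative, and the priors and variances are taken as known, per the assumptions above:

```python
import numpy as np

def em_unknown_means(X, pis, sigma2s, mu0, eps=1e-6, max_iter=200):
    """EM for a 1D Gaussian mixture where only the means are unknown.
    X: samples, shape (n,); pis, sigma2s, mu0: shape (K,)."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iter):
        # E-step: responsibilities E[z_ik] proportional to pi_k * N(x_i | mu_k, sigma2_k)
        d2 = (X[:, None] - mu[None, :]) ** 2                          # (n, K)
        lik = pis * np.exp(-0.5 * d2 / sigma2s) / np.sqrt(2 * np.pi * sigma2s)
        r = lik / lik.sum(axis=1, keepdims=True)                      # rows sum to 1
        # M-step: responsibility-weighted means
        mu_new = (r * X[:, None]).sum(axis=0) / r.sum(axis=0)
        if np.max(np.abs(mu_new - mu)) < eps:                         # convergence test
            return mu_new
        mu = mu_new
    return mu
```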

Page 36

Estimating Mixture Parameters Using EM: General Case

• Need to review Lagrange Optimization first …

Page 37

Lagrange Optimization

• To maximize $f(x)$ subject to the constraint $g(x) = 0$, form the Lagrangian $L(x, \lambda) = f(x) + \lambda\, g(x)$, set its partial derivatives to zero, and solve for $x$ and $\lambda$:

n+1 equations / n+1 unknowns

Page 38

Lagrange Optimization (cont’d)

• Example

Maximize f(x1,x2)=x1x2 subject to the constraint

g(x1,x2)=x1+x2-1=0

$$L(x_1, x_2, \lambda) = f(x_1, x_2) + \lambda\, g(x_1, x_2) = x_1 x_2 + \lambda (x_1 + x_2 - 1)$$

Setting the partial derivatives to zero:

$$\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0, \qquad \frac{\partial L}{\partial x_2} = x_1 + \lambda = 0, \qquad x_1 + x_2 - 1 = 0$$

3 equations / 3 unknowns; solving gives $x_1 = x_2 = 1/2$ (and $\lambda = -1/2$).

Page 39

Estimating Mixture Parameters Using EM: General Case

• Introduce hidden or unobserved variables $z_i$, defined as in the unknown-means case ($z_{ik} = 1$ iff $x_i$ was generated by component $k$).

Page 40

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step


Page 41

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step (cont’d)

Page 42

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step (cont’d)

Page 43

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Maximization Step: use Lagrange optimization, since the priors must satisfy the constraint

$$g(\pi) = \sum_{k=1}^{K} \pi_k - 1 = 0$$
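Carrying this constrained maximization through yields the standard general-case updates for a Gaussian mixture (stated here for reference):

$$\pi_k^{t+1} = \frac{1}{n} \sum_{i=1}^{n} E[z_{ik}], \qquad \mu_k^{t+1} = \frac{\sum_i E[z_{ik}]\, x_i}{\sum_i E[z_{ik}]}, \qquad \Sigma_k^{t+1} = \frac{\sum_i E[z_{ik}]\, (x_i - \mu_k^{t+1})(x_i - \mu_k^{t+1})^T}{\sum_i E[z_{ik}]}$$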

Page 44

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Maximization Step (cont’d)

Page 45

Estimating Mixture Parameters Using EM: General Case (cont’d)

Page 46

Estimating Mixture Parameters Using EM: General Case (cont’d)

Page 47

Estimating the Number of Components K