Transcript of: Expectation-Maximization (EM) – Chapter 3 (Duda et al.), Section 3.9 – CS479/679 Pattern Recognition, Dr. George Bebis

Page 1:

Expectation-Maximization (EM)

Chapter 3 (Duda et al.) – Section 3.9

CS479/679 Pattern Recognition – Dr. George Bebis

Page 2:

Expectation-Maximization (EM)

• EM is an iterative method to perform ML estimation:

– Starts with an initial estimate for θ.

– Refines the current estimate iteratively to increase the likelihood of the observed data:

p(D/θ)

Page 3:

Expectation-Maximization (EM)

• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete).

– Some creativity is required to recognize where the EM algorithm can be used.

– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).

Page 4:

Incomplete Data

• In many problems, ML estimation cannot be applied directly because certain features cannot be measured.

• The EM algorithm is ideal for problems with unobserved (missing) data.

Page 5:

Example (Moon, 1996)

• Assume a trinomial distribution over the counts (x1, x2, x3), with x1 + x2 + x3 = k:

p(x1, x2, x3 / θ) = [k! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3

where the cell probabilities p1, p2, p3 depend on the parameter θ.
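As a concrete reference for the likelihood above, here is a short Python sketch of the trinomial log-likelihood; the counts and cell probabilities passed in are placeholder values, and in the example p1, p2, p3 would be functions of θ.

```python
from math import lgamma, log

def trinomial_loglik(x, p):
    """ln p(x1, x2, x3 / theta): log of the trinomial pmf above."""
    k = sum(x)
    # ln k! - ln(x1! x2! x3!)  -- log of the multinomial coefficient
    coef = lgamma(k + 1) - sum(lgamma(xi + 1) for xi in x)
    return coef + sum(xi * log(pi) for xi, pi in zip(x, p))

# Placeholder counts and probabilities, for illustration only:
print(trinomial_loglik((3, 2, 5), (0.25, 0.25, 0.5)))
```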

Page 6:

Example (Moon, 1996) (cont’d)

Page 7:

EM: Main Idea

• If the complete data Dx were available, we could use ML to estimate θ, i.e.,

θ̂ = arg max_θ ln p(Dx/θ)

• Since the complete data is not available, maximize instead the expectation of ln p(Dx/θ) with respect to the unknown variables, given the observed data Dy and an estimate of θ:

Q(θ; θ^t) = E_unobserved[ln p(Dx/θ) / Dy, θ^t]

Page 8:

EM Steps

(1) Initialization
(2) Expectation
(3) Maximization
(4) Test for convergence

Page 9:

EM Steps (cont’d)

(1) Initialization Step: initialize the algorithm with a guess θ^0.

(2) Expectation Step: it is performed with respect to the unobserved variables, using the current estimate of the parameters and conditioned upon the observations:

Q(θ; θ^t) = E_unobserved[ln p(Dx/θ) / Dy, θ^t]

– When ln p(Dx/θ) is a linear function of the unobserved variables, the expectation step is equivalent to computing

E_unobserved[x / Dy, θ^t]

i.e., the expected values of the unobserved variables are computed and substituted into ln p(Dx/θ).

Page 10:

EM Steps (cont’d)

(3) Maximization Step: provides a new estimate of the parameters:

θ^{t+1} = arg max_θ Q(θ; θ^t)

(4) Test for Convergence: if |θ^{t+1} − θ^t| < ε, stop; otherwise, go to Step 2.
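The four steps translate directly into a generic loop. Below is a minimal Python sketch; e_step and m_step are hypothetical placeholders (they are not defined on the slides) that a specific application supplies.

```python
def em(theta0, e_step, m_step, D_y, eps=1e-8, max_iter=1000):
    """Generic EM skeleton (a sketch, not tied to any specific model)."""
    theta = theta0                              # (1) Initialization
    for _ in range(max_iter):
        expectations = e_step(D_y, theta)       # (2) Expectation
        theta_new = m_step(D_y, expectations)   # (3) Maximization
        if abs(theta_new - theta) < eps:        # (4) Test for convergence
            return theta_new
        theta = theta_new
    return theta
```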

Page 11:

Example (Moon, 1996) (cont’d)

• Suppose:

p(x1, x2, x3 / θ) = [k! / (x1! x2! x3!)] p1^x1 p2^x2 p3^x3

Page 12:

Example (Moon, 1996) (cont’d)

Let’s look at the M-step for a minute before completing the E-step …

• Take the expected value of the complete-data log-likelihood:

E[ln p(Dx/θ)], where ln p(Dx/θ) = ln k! − ln(x1! x2! x3!) + x1 ln p1 + x2 ln p2 + x3 ln p3

– Since this is linear in the counts x1, x2, x3, taking the expectation simply replaces each unobserved count by its expected value.

Page 13:

Example (Moon, 1996) (cont’d)

Let’s go back and complete the E-step now …

• We only need to estimate the expected values of the unobserved counts, given the observed data and the current estimate θ^t.

Page 14:

Example (Moon, 1996) (cont’d)

(see Moon’s paper, page 53, for a proof)

Page 15:

Example (Moon, 1996) (cont’d)

• Initialization: choose an initial guess θ^0.

• Expectation Step: compute the expected values of the unobserved counts given Dy and θ^t.

• Maximization Step: maximize the expected log-likelihood to obtain θ^{t+1}.

• Convergence Step: stop when |θ^{t+1} − θ^t| < ε.
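Here is a runnable sketch of the whole procedure for a trinomial of this kind. The slides do not reproduce Moon's exact cell probabilities or observation model, so the code assumes, hypothetically, p1 = 1/2, p2 = θ/2, p3 = (1 − θ)/2, with observed data y1 = x1 + x2 and y2 = x3 (the split of y1 into x1 and x2 is the unobserved part). Conditioned on their sum, x2 is binomial, which gives the E-step in closed form.

```python
def e_step(y1, theta):
    # Given x1 + x2 = y1, x2 ~ Binomial(y1, p2/(p1 + p2)), so
    # E[x2 / y, theta] = y1*(theta/2)/(1/2 + theta/2) = y1*theta/(1 + theta)
    return y1 * theta / (1.0 + theta)

def m_step(ex2, y2):
    # Maximize Q = E[x2]*ln(theta/2) + y2*ln((1-theta)/2) + const:
    # dQ/dtheta = E[x2]/theta - y2/(1 - theta) = 0
    return ex2 / (ex2 + y2)

def em_trinomial(y1, y2, theta0=0.5, eps=1e-10, max_iter=100):
    theta = theta0                        # Initialization
    for _ in range(max_iter):
        ex2 = e_step(y1, theta)           # Expectation
        theta_new = m_step(ex2, y2)       # Maximization
        if abs(theta_new - theta) < eps:  # Convergence test
            break
        theta = theta_new
    return theta

# Sanity check: the observed-data ML estimate is (y1 - y2)/(y1 + y2);
# for y1 = 125, y2 = 35 that is 90/160 = 0.5625, and EM converges to it.
print(em_trinomial(125, 35))
```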

Page 16:

Example (Moon, 1996) (cont’d)

(Figure: the estimate θ^t plotted against the iteration number t.)

Page 17:

Convergence properties of EM

• The solution depends on the initial estimate θ^0.

• At each iteration, a new value of θ is computed so that the likelihood function does not decrease.

• There is no guarantee that it will converge to a global maximum.

• The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.

Page 18:

Expectation-Maximization (EM)

• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as being incomplete).

– Some creativity is required to recognize where the EM algorithm can be used.

– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).

Page 19:

Mixture of 2D Gaussians - Example

Page 20:

Mixture Model

(Figure: a mixture model combining K components with mixing weights π1, π2, π3, …, πK.)
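In standard form (matching the notation used on the slides that follow), the mixture density shown in the figure combines K component densities with mixing weights πk:

p(x/Θ) = Σ_{k=1..K} πk p(x/θk),   where πk ≥ 0 and Σ_{k=1..K} πk = 1

Each πk can be interpreted as the prior probability that a sample is generated by the k-th component.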

Page 21:

Mixture of 1D Gaussians - Example

π1=0.3

π2=0.2

π3=0.5
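A small Python sketch of this 1D mixture: the weights are those on the slide, while the means and standard deviations are made-up values for illustration.

```python
import numpy as np

pis    = np.array([0.3, 0.2, 0.5])    # weights from the slide
mus    = np.array([-2.0, 0.0, 3.0])   # hypothetical means
sigmas = np.array([0.5, 1.0, 0.8])    # hypothetical std deviations

def mixture_pdf(x):
    # p(x) = sum_k pi_k * N(x; mu_k, sigma_k^2)
    comps = np.exp(-0.5 * ((x - mus) / sigmas) ** 2) / (sigmas * np.sqrt(2 * np.pi))
    return float(np.sum(pis * comps))

# Sampling: pick a component with probability pi_k, then draw from it.
rng = np.random.default_rng(0)
ks = rng.choice(3, size=1000, p=pis)
samples = rng.normal(mus[ks], sigmas[ks])
print(mixture_pdf(0.0), samples.mean())
```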

Page 22:

Mixture Parameters

Page 23:

Fitting a Mixture Model to a Set of Observations Dx

• Two fundamental problems:

(1) Estimate the number of mixture components K

(2) Estimate the mixture parameters (πk, θk), k = 1, 2, …, K

Page 24:

Mixtures of Gaussians (see Chapter 10)

where each component density is a d-dimensional Gaussian:

p(x/θk) = [1 / ((2π)^{d/2} |Σk|^{1/2})] exp(−(1/2)(x − μk)^T Σk^{−1} (x − μk))

• The parameters θk are (μk, Σk)

Page 25:

Mixtures of Gaussians (cont’d)

(Figure: a mixture of K Gaussian components with mixing weights π1, π2, π3, …, πK.)

Page 26:

Estimating Mixture Parameters Using ML – not easy!

• The log-likelihood involves the logarithm of a sum over components, so setting its derivatives to zero does not yield closed-form solutions.

Page 27:

Estimating Mixture Parameters Using EM: Case of Unknown Means

• Assumptions: only the component means are unknown; the remaining mixture parameters are assumed known.

• Observation: if we knew which component had generated each sample, estimating the means would be easy … but we don’t!

Page 28:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Introduce hidden (unobserved) variables zi, where zik indicates whether sample xi was generated by the k-th component.

Page 29:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Main steps using EM

Page 30:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

Page 31:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

Page 32:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

Page 33:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Expectation Step

E(zik) is just the probability that xi was generated by the k-th component (by Bayes’ rule):

E(zik) = πk p(xi/μk^t) / Σ_{j=1..K} πj p(xi/μj^t)

Page 34:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Maximization Step: each mean is re-estimated as the average of the samples, weighted by the expected memberships:

μk^{t+1} = Σ_{i=1..n} E(zik) xi / Σ_{i=1..n} E(zik)

Page 35:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Summary

Page 36:

Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)

• Summary
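Putting the E- and M-steps together, here is a minimal runnable Python sketch of this case in 1D. Consistent with the "unknown means" assumptions, the mixing weights and a common variance are treated as known, and only the means are re-estimated; the data, the known values, and the initialization are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known (assumed) quantities: equal weights and a common unit variance.
pis, sigma = np.array([0.5, 0.5]), 1.0

# Synthetic 1D data from illustrative true means -2 and +2.
true_mu = np.array([-2.0, 2.0])
z = rng.choice(2, size=500, p=pis)
x = rng.normal(true_mu[z], sigma)

mu = np.array([-0.5, 0.5])                      # initialization mu^0
for _ in range(200):
    # E-step: E[z_ik] proportional to pi_k * exp(-(x_i - mu_k)^2 / (2 sigma^2)),
    # normalized over k (the common Gaussian constant cancels).
    w = pis * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: each mean = weighted average of the samples.
    mu_new = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    if np.max(np.abs(mu_new - mu)) < 1e-9:      # convergence test
        break
    mu = mu_new

print(mu)   # close to the true means (-2, +2)
```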

Page 37:

Estimating Mixture Parameters Using EM: General Case

• Need to review Lagrange Optimization first …

Page 38:

Lagrange Optimization

• Maximize f(x) subject to the constraint g(x) = 0: form the Lagrangian

L(x, λ) = f(x) + λ g(x)

set its partial derivatives to zero, and solve for x and λ:

n+1 equations / n+1 unknowns

Page 39:

Lagrange Optimization (cont’d)

• Example

Maximize f(x1,x2)=x1x2 subject to the constraint g(x1,x2)=x1+x2-1=0

L(x1, x2, λ) = f(x1, x2) + λ g(x1, x2) = x1 x2 + λ(x1 + x2 − 1)

∂L(x1, x2, λ)/∂x1 = x2 + λ = 0
∂L(x1, x2, λ)/∂x2 = x1 + λ = 0
∂L(x1, x2, λ)/∂λ = x1 + x2 − 1 = 0

3 equations / 3 unknowns

Solving gives x1 = x2 = 1/2 (with λ = −1/2), so the constrained maximum is f = 1/4.
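A quick check of this example using sympy (assuming it is installed):

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
L = x1 * x2 + lam * (x1 + x2 - 1)        # the Lagrangian
sols = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam])
print(sols)   # {x1: 1/2, x2: 1/2, lam: -1/2}
```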

Page 40:

Estimating Mixture Parameters Using EM: General Case

• Introduce hidden or unobserved variables zi

Page 41:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step

Page 42:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step (cont’d)

Page 43:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Expectation Step (cont’d)

Page 44:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Maximization Step: use Lagrange optimization to handle the constraint Σ_{k=1..K} πk = 1.

Page 45:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Maximization Step (cont’d)

Page 46:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Summary

Page 47:

Estimating Mixture Parameters Using EM: General Case (cont’d)

• Summary
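For reference, the updates this derivation arrives at are the standard EM equations for a Gaussian mixture. With E(zik) denoting the expected membership of sample xi in component k (computed from the current parameters in the E-step), the M-step re-estimates:

πk^{t+1} = (1/n) Σ_{i=1..n} E(zik)

μk^{t+1} = Σ_{i=1..n} E(zik) xi / Σ_{i=1..n} E(zik)

Σk^{t+1} = Σ_{i=1..n} E(zik) (xi − μk^{t+1})(xi − μk^{t+1})^T / Σ_{i=1..n} E(zik)

The Lagrange multiplier on the previous slides enforces Σk πk = 1, which yields the first update.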

Page 48:

Estimating the Number of Components K