Expectation-Maximization (EM)
Chapter 3 (Duda et al.) – Section 3.9
CS479/679 Pattern Recognition – Dr. George Bebis
Expectation-Maximization (EM)
• EM is an iterative method to perform ML estimation:
– Starts with an initial estimate for θ.
– Refines the current estimate iteratively to increase the likelihood of the observed data, $p(D/\theta)$.
Expectation-Maximization (EM)
• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as incomplete)
– Some creativity is required to recognize where the EM algorithm can be used.
– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Incomplete Data
• Many times, it is impossible to apply ML estimation because certain features cannot be measured directly.
• The EM algorithm is ideal for problems with unobserved (missing) data.
Example (Moon, 1996)
Assume a trinomial distribution over counts $(x_1, x_2, x_3)$ with $x_1 + x_2 + x_3 = k$:

$$p(x_1, x_2, x_3/\theta) = \frac{k!}{x_1!\,x_2!\,x_3!}\;p_1^{x_1}\,p_2^{x_2}\,p_3^{x_3}$$

where the cell probabilities $p_i$ depend on the parameter $\theta$.
Example (Moon, 1996) (cont’d)
EM: Main Idea
• If x were available, we could use ML to estimate θ:

$$\hat{\theta} = \arg\max_{\theta} \ln p(D_x/\theta)$$

• Since x is not available, maximize the expectation of $\ln p(D_x/\theta)$ with respect to the unknown variables, given $D_y$ and an estimate of θ:

$$Q(\theta;\theta^t) = E_{\text{unobserved}}\big[\ln p(D_x/\theta)\,/\,D_y, \theta^t\big]$$
EM Steps
(1) Initialization
(2) Expectation
(3) Maximization
(4) Test for convergence
EM Steps (cont’d)
(1) Initialization Step: initialize the algorithm with a guess θ0
(2) Expectation Step: it is performed with respect to the unobserved variables, using the current estimate of the parameters and conditioned upon the observations:

$$Q(\theta;\theta^t) = E_{\text{unobserved}}\big[\ln p(D_x/\theta)\,/\,D_y, \theta^t\big]$$

– When $\ln p(D_x/\theta)$ is a linear function of the unobserved variables, the expectation step is equivalent to computing

$$E_{\text{unobserved}}\big[x\,/\,D_y, \theta^t\big]$$

and substituting it for the unobserved variables.
EM Steps (cont’d)
(3) Maximization Step: provides a new estimate of the parameters:

$$\theta^{t+1} = \arg\max_{\theta} Q(\theta;\theta^t)$$

(4) Test for Convergence: if $|\theta^{t+1} - \theta^t| < \epsilon$, stop; otherwise, go to Step 2.
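The four steps can be sketched on a toy problem (not from the slides): estimating the mean of a 1-D Gaussian with known variance when some samples are missing entirely. Because the log-likelihood is linear in the missing values, the E-step simply replaces each of them by its expectation under the current estimate, exactly the simplification noted above. The data and starting value below are illustrative.

```python
# Minimal EM sketch: mean of a 1-D Gaussian (known variance) with
# missing samples. Since ln p is linear in the missing x's, the E-step
# replaces each missing value by E[x / theta^t] = theta^t.

observed = [2.0, 4.0, 6.0, 8.0]   # D_y: the observed samples
n_missing = 2                     # number of unobserved samples
theta = 0.0                       # step (1): initial guess theta^0

for _ in range(100):
    # (2) E-step: fill in each missing value with its expected value
    filled = observed + [theta] * n_missing
    # (3) M-step: ML estimate of the mean given the completed data
    theta_new = sum(filled) / len(filled)
    # (4) convergence test
    if abs(theta_new - theta) < 1e-9:
        theta = theta_new
        break
    theta = theta_new

print(theta)  # converges toward the mean of the observed data (5.0)
```

Each iteration pulls the estimate a third of the way toward the observed-data mean, so the likelihood never decreases and the fixed point is reached after a few dozen iterations.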
Example (Moon, 1996) (cont’d)
• Suppose the counts follow the trinomial likelihood $p(x_1,x_2,x_3/\theta) = \frac{k!}{x_1!\,x_2!\,x_3!}\,p_1^{x_1} p_2^{x_2} p_3^{x_3}$.
Example (Moon, 1996) (cont’d)
Let’s look at the M-step for a minute before completing the E-step …
• Take the expected value of the log-likelihood under the trinomial model.
Example (Moon, 1996) (cont’d)
Let’s go back and complete the E-step now …
• We only need to estimate the sums $\sum_i$ of the unobserved counts.
Example (Moon, 1996) (cont’d)
(see Moon’s paper, page 53, for a proof)
Example (Moon, 1996) (cont’d)
• Initialization: choose $\theta^0$
• Expectation Step
• Maximization Step
• Convergence Step: stop when $|\theta^{t+1} - \theta^t| < \epsilon$
Example (Moon, 1996) (cont’d)
(plot of the estimate $\theta^t$ versus iteration $t$)
Convergence properties of EM
• The solution depends on the initial estimate θ0
• At each iteration, a value of θ is computed so that the likelihood function does not decrease.
• There is no guarantee that it will converge to a global maximum.
• The algorithm is guaranteed to be stable, i.e., there is no chance of "overshooting" or diverging from the maximum.
Expectation-Maximization (EM)
• EM represents a general framework – it works best in situations where the data is incomplete (or can be thought of as incomplete)
– Some creativity is required to recognize where the EM algorithm can be used.
– Standard method for estimating the parameters of Mixtures of Gaussians (MoG).
Mixture of 2D Gaussians - Example
Mixture Model

$$p(x) = \sum_{k=1}^{K} \pi_k\, p(x/\theta_k), \qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1$$
Mixture of 1D Gaussians - Example
π1=0.3
π2=0.2
π3=0.5
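The three priors above fully specify the mixing weights; with component means and standard deviations filled in (the values below are hypothetical, chosen only for illustration), the 1-D mixture density can be evaluated directly:

```python
import math

# Evaluate the 1-D mixture density p(x) = sum_k pi_k N(x; mu_k, sigma_k^2)
# using the priors from the example. Means and sigmas are illustrative.
pis    = [0.3, 0.2, 0.5]
mus    = [-2.0, 0.0, 3.0]   # assumed component means
sigmas = [1.0, 0.5, 1.5]    # assumed component std. deviations

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x):
    # Convex combination of the component densities
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(pis, mus, sigmas))

print(mixture_pdf(0.0))
```

Because the $\pi_k$ sum to 1 and each component density integrates to 1, the mixture density also integrates to 1.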
Mixture Parameters
Fitting a Mixture Model to a set of observations $D_x$
• Two fundamental problems:
(1) Estimate the number of mixture components K
(2) Estimate mixture parameters (πk , θk), k=1,2,…,K
Mixtures of Gaussians (see Chapter 10)
where each $p(x/\theta_k) = N(x;\mu_k,\Sigma_k) = \frac{1}{(2\pi)^{d/2}|\Sigma_k|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1} (x-\mu_k)\right)$
• The parameters $\theta_k$ are $(\mu_k, \Sigma_k)$
Mixtures of Gaussians (cont’d)
Estimating Mixture Parameters Using ML – not easy!
Estimating Mixture Parameters Using EM: Case of Unknown Means
• Assumptions
Observation
… but we don’t!
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Introduce hidden or unobserved variables zi
• Main steps using EM
Estimating Mixture Parameters Using EM: Case of Unknown Means
(cont’d)
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont'd)
• Expectation Step
$E(z_{ik})$ is just the probability that $x_i$ was generated by the k-th component:

$$E(z_{ik}) = \frac{p(x_i/\mu_k^t)\,P(\omega_k)}{\sum_{j=1}^{K} p(x_i/\mu_j^t)\,P(\omega_j)}$$
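This posterior computation can be sketched for 1-D data (a minimal version assuming a known, common variance and known priors; all numbers are illustrative):

```python
import math

# E-step sketch for the unknown-means case: E(z_ik) is the posterior
# probability that sample x_i came from component k, given the current
# mean estimates. The variance sigma and the priors are assumed known.
def responsibilities(x_i, mus, priors, sigma=1.0):
    weights = [p * math.exp(-0.5 * ((x_i - m) / sigma) ** 2)
               for m, p in zip(mus, priors)]
    total = sum(weights)
    return [w / total for w in weights]  # sums to 1 over the components

r = responsibilities(0.9, mus=[0.0, 2.0], priors=[0.5, 0.5])
print(r)  # the sample at 0.9 is slightly more likely under the component at 0.0
```

Note that the shared normalizing constant of the Gaussians cancels in the ratio, so only the exponentials are needed.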
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Maximization Step: the new mean estimates are the responsibility-weighted sample means,

$$\mu_k^{t+1} = \frac{\sum_{i=1}^{n} E(z_{ik})\,x_i}{\sum_{i=1}^{n} E(z_{ik})}$$
Estimating Mixture Parameters Using EM: Case of Unknown Means (cont’d)
• Summary
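The unknown-means procedure can be summarized end to end in code (a minimal 1-D sketch with the variances and priors held fixed; the data and starting values are illustrative):

```python
import math, random

# EM for the means of a 1-D two-component Gaussian mixture, with the
# variances and priors held fixed (the "unknown means" case).
random.seed(0)
data = [random.gauss(-2.0, 1.0) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]
mus = [-1.0, 1.0]              # initial mean estimates
priors, sigma = [0.5, 0.5], 1.0

for _ in range(50):
    # E-step: responsibilities E(z_ik) for every sample
    resp = []
    for x in data:
        w = [p * math.exp(-0.5 * ((x - m) / sigma) ** 2)
             for m, p in zip(mus, priors)]
        s = sum(w)
        resp.append([wi / s for wi in w])
    # M-step: responsibility-weighted means
    new_mus = [sum(r[k] * x for r, x in zip(resp, data)) /
               sum(r[k] for r in resp) for k in range(2)]
    # convergence test on the parameter change
    if max(abs(a - b) for a, b in zip(new_mus, mus)) < 1e-6:
        mus = new_mus
        break
    mus = new_mus

print(mus)  # close to the generating means (-2, 3)
```

With well-separated components the responsibilities quickly become nearly hard assignments, and the update then behaves like computing per-cluster sample means.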
Estimating Mixture Parameters Using EM: General Case
• Need to review Lagrange Optimization first …
Lagrange Optimization
Maximize $f(x)$ subject to $g(x) = 0$: form $L(x,\lambda) = f(x) + \lambda\,g(x)$, set $\nabla_x L = 0$ and $g(x) = 0$, and solve for x and λ: n+1 equations / n+1 unknowns.
Lagrange Optimization (cont’d)
• Example
Maximize $f(x_1,x_2) = x_1 x_2$ subject to the constraint $g(x_1,x_2) = x_1 + x_2 - 1 = 0$:

$$L(x_1,x_2,\lambda) = f(x_1,x_2) + \lambda\,g(x_1,x_2) = x_1 x_2 + \lambda(x_1 + x_2 - 1)$$

$$\frac{\partial L}{\partial x_1} = x_2 + \lambda = 0, \qquad \frac{\partial L}{\partial x_2} = x_1 + \lambda = 0, \qquad x_1 + x_2 - 1 = 0$$

3 equations / 3 unknowns; solving gives $x_1 = x_2 = 1/2$ (with $\lambda = -1/2$).
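The stationary point can be confirmed with a quick brute-force check, substituting the constraint directly:

```python
# Brute-force check of the Lagrange example: maximize f(x1, x2) = x1 * x2
# subject to x1 + x2 = 1, by substituting x2 = 1 - x1 and scanning x1.
best_x1, best_f = None, float("-inf")
for i in range(100001):
    x1 = i / 100000.0
    f = x1 * (1.0 - x1)
    if f > best_f:
        best_x1, best_f = x1, f

print(best_x1, best_f)  # maximum at x1 = 0.5, f = 0.25
```

The scan agrees with the analytic solution $x_1 = x_2 = 1/2$.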
Estimating Mixture Parameters Using EM: General Case
• Introduce hidden or unobserved variables zi
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Expectation Step
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Expectation Step (cont’d)
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Maximization Step: use Lagrange optimization
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Maximization Step (cont’d)
Estimating Mixture Parameters Using EM: General Case (cont’d)
• Summary
Estimating the Number of Components K