Special Topics in Machine Learning Brown University CSCI ...

26
Applied Bayesian Nonparametrics Special Topics in Machine Learning Brown University CSCI 2950-P, Fall 2011 October 11: Variational Methods

Transcript of Special Topics in Machine Learning Brown University CSCI ...

Page 1: Special Topics in Machine Learning Brown University CSCI ...

Applied Bayesian Nonparametrics

Special Topics in Machine Learning Brown University CSCI 2950-P, Fall 2011

October 11: Variational Methods

Page 2: Special Topics in Machine Learning Brown University CSCI ...

Convexity & Jensen s Inequality

f(x)

chord

a x! b

Page 3: Special Topics in Machine Learning Brown University CSCI ...

Lower Bounds on Marginal Likelihood

ln p(X|θ)L(q,θ)

KL(q||p)

C. Bishop, Pattern Recognition & Machine Learning

Page 4: Special Topics in Machine Learning Brown University CSCI ...

Expectation Maximization Algorithm

C. Bishop, Pattern Recognition & Machine Learning

ln p(X|θold)L(q,θold)

KL(q||p) = 0

ln p(X|θnew)L(q,θnew)

KL(q||p)

E Step: Optimize distribution on hidden variables given parameters

M Step: Optimize parameters given distribution on hidden variables

Page 5: Special Topics in Machine Learning Brown University CSCI ...

EM: A Sequence of Lower Bounds

C. Bishop, Pattern Recognition & Machine Learning

θold θnew

L (q, θ)

ln p(X|θ)

Page 6: Special Topics in Machine Learning Brown University CSCI ...

Fitting Gaussian Mixtures

C. Bishop, Pattern Recognition & Machine Learning

(a)

0 0.5 1

0

0.5

1 (b)

0 0.5 1

0

0.5

1

Complete Data Labeled by True Cluster Assignments

Incomplete Data: Points to be Clustered

Page 7: Special Topics in Machine Learning Brown University CSCI ...

Posterior Assignment Probabilities

C. Bishop, Pattern Recognition & Machine Learning

(b)

0 0.5 1

0

0.5

1

Posterior Probabilities of Assignment to Each Cluster

Incomplete Data: Points to be Clustered

(c)

0 0.5 1

0

0.5

1

Page 8: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(a)!2 0 2

!2

0

2

Page 9: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(b)!2 0 2

!2

0

2

Page 10: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(c)

L = 1

!2 0 2

!2

0

2

Page 11: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(d)

L = 2

!2 0 2

!2

0

2

Page 12: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(e)

L = 5

!2 0 2

!2

0

2

Page 13: Special Topics in Machine Learning Brown University CSCI ...

EM Algorithm

C. Bishop, Pattern Recognition & Machine Learning

(f)

L = 20

!2 0 2

!2

0

2

Page 14: Special Topics in Machine Learning Brown University CSCI ...

Pairwise Markov Random Fields

•! Product of arbitrary positive clique potential functions

•! Guaranteed Markov with respect to corresponding graph

set of nodes

set of edges connecting nodes normalization constant (partition function)

Page 15: Special Topics in Machine Learning Brown University CSCI ...

Markov Chain Factorizations

Page 16: Special Topics in Machine Learning Brown University CSCI ...

Energy Functions

Interpretation and terminology from statistical physics

Page 17: Special Topics in Machine Learning Brown University CSCI ...

Approximate Inference Framework

•!Choose a family of approximating distributions which is tractable. The simplest example:

•!Define a distance to measure the quality of different approximations. Two possibilities:

•!Find the approximation minimizing this distance

Page 18: Special Topics in Machine Learning Brown University CSCI ...

Fully Factored Approximations

•!Trivially minimized by setting

•!Doesn’t provide a computational method!

Marginal Entropies

Joint Entropy

Page 19: Special Topics in Machine Learning Brown University CSCI ...

Variational Approximations

(Multiply by one)

(Jensen’s inequality)

•!Minimizing KL divergence maximizes a lower bound on the data likelihood

Page 20: Special Topics in Machine Learning Brown University CSCI ...

Free Energies

Negative Entropy

Average Energy

Gibbs Free Energy

Normalization

•!Free energies equivalent to KL divergence, up to a normalization constant

Page 21: Special Topics in Machine Learning Brown University CSCI ...

Mean Field Free Energy

Page 22: Special Topics in Machine Learning Brown University CSCI ...

Mean Field Equations

•!Add Lagrange multipliers to enforce

•!Taking derivatives and simplifying, we find a set of fixed point equations:

•!Updating one marginal at a time gives convergent coordinate descent

Page 23: Special Topics in Machine Learning Brown University CSCI ...

Mean Field Message Passing

Want products of messages to be simple

Want expectations of log potential functions to be simple

Page 24: Special Topics in Machine Learning Brown University CSCI ...

Exponential Families •! Natural or canonical parameters determine log-linear

combination of sufficient statistics:

•! Log partition function normalizes to produce valid probability distribution:

Page 25: Special Topics in Machine Learning Brown University CSCI ...

Directed Mean Field

•! Can derive updates using exponential family form of the conditional distribution of each variable, given its parents

•! Can also just take derivatives, collect terms, simplify!

Variational Message Passing, Winn &

Bishop, JMLR 2005

Page 26: Special Topics in Machine Learning Brown University CSCI ...

Structured Mean Field

Original Graph Naïve Mean Field Structured Mean Field

•!Any subgraph for which inference is tractable leads to a mean field style approximation for which the update equations are tractable