Probabilistic Inference Lecture 7


Page 1: Probabilistic Inference Lecture 7

Probabilistic Inference, Lecture 7

M. Pawan Kumar
[email protected]

Slides available online: http://cvc.centrale-ponts.fr/personnel/pawan/

Page 2: Probabilistic Inference Lecture 7

Recap

Page 3: Probabilistic Inference Lecture 7

Loopy Belief Propagation

Initialize all messages to 1

In some order of edges, update messages

Mab;k = Σi ψa(li) ψab(li,lk) Πn≠b Mna;i

Repeat until convergence: rate of change of messages < threshold

Convergence Not Guaranteed !!
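The flattened notation above is easy to misread, so here is a minimal runnable sketch of the message updates on a 3-node cycle, the smallest loopy graph. The potentials, the sweep schedule, and the message rescaling are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels = 2
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]  # 3-cycle: the smallest loopy graph

# Unary potentials psi_a(l_i) and pairwise potentials psi_ab(l_i, l_k)
psi = {a: rng.random(n_labels) + 0.5 for a in nodes}
psi2 = {e: rng.random((n_labels, n_labels)) + 0.5 for e in edges}

def pair(a, b):
    """psi_ab oriented so rows index a's label and columns index b's."""
    return psi2[(a, b)] if (a, b) in psi2 else psi2[(b, a)].T

neighbors = {a: [b for e in edges for b in e if a in e and b != a] for a in nodes}

# Initialize all messages M_ab;k to 1
M = {(a, b): np.ones(n_labels) for a in nodes for b in neighbors[a]}

for sweep in range(100):
    delta = 0.0
    for (a, b) in list(M):  # in some order of edges, update messages
        # M_ab;k = sum_i psi_a(l_i) psi_ab(l_i,l_k) prod_{n != b} M_na;i
        incoming = np.prod([M[(n, a)] for n in neighbors[a] if n != b], axis=0)
        new = pair(a, b).T @ (psi[a] * incoming)
        new /= new.sum()  # rescaling (an assumption here) keeps values stable
        delta = max(delta, np.abs(new - M[(a, b)]).max())
        M[(a, b)] = new
    if delta < 1e-8:  # rate of change of messages < threshold
        print(f"converged after {sweep + 1} sweeps")
        break
else:
    print("no convergence; not guaranteed on loopy graphs")
```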

Page 4: Probabilistic Inference Lecture 7

Loopy Belief Propagation

B’a(i) = ψa(li) Πn Mna;i

B’ab(i,j) = ψa(li)ψb(lj)ψab(li,lj) Πn≠b Mna;i Πn≠a Mnb;j

Normalize to compute beliefs Ba(i), Bab(i,j)

At convergence, Σj Bab(i,j) = Ba(i)
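A minimal sketch of the belief computation, on a two-node model where BP converges after a single pass, so the convergence condition Σj Bab(i,j) = Ba(i) can be checked directly. The potentials are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3
psi_a = rng.random(K) + 0.5        # unary potential of V_a
psi_b = rng.random(K) + 0.5        # unary potential of V_b
psi_ab = rng.random((K, K)) + 0.5  # rows: a's label i, cols: b's label j

# On a two-node tree a single pass computes both messages exactly
M_ab = psi_ab.T @ psi_a            # M_ab;k = sum_i psi_a(l_i) psi_ab(l_i,l_k)
M_ba = psi_ab @ psi_b              # M_ba;i = sum_k psi_b(l_k) psi_ab(l_i,l_k)

# B'_a(i) = psi_a(l_i) prod_n M_na;i, then normalize
B_a = psi_a * M_ba
B_a /= B_a.sum()

# B'_ab(i,j) = psi_a(l_i) psi_b(l_j) psi_ab(l_i,l_j) (no other neighbors here)
B_ab = psi_a[:, None] * psi_b[None, :] * psi_ab
B_ab /= B_ab.sum()

# At convergence: sum_j B_ab(i,j) = B_a(i)
assert np.allclose(B_ab.sum(axis=1), B_a)
print("marginalization consistency holds")
```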

Page 5: Probabilistic Inference Lecture 7

Outline

• Free Energy

• Mean-Field Approximation

• Bethe Approximation

• Kikuchi Approximation

Yedidia, Freeman and Weiss, 2000

Page 6: Probabilistic Inference Lecture 7

Exponential Family

P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}

A(θ) : log Z

Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z

ψa(li) : exp(-θa(i))    ψab(li,lk) : exp(-θab(i,k))

Page 7: Probabilistic Inference Lecture 7

Exponential Family

P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}

A(θ) : log Z

Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z

ψa(li) : exp(-θa(i))    ψab(li,lk) : exp(-θab(i,k))

Energy Q(v) = Σa θa(va) + Σa,b θab(va,vb)

P(v) = exp(-Q(v)) / Z
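A small numeric check of this correspondence may help: with θ = -log ψ, the product-of-potentials form and the energy form define the same distribution, and A(θ) = log Z. The 3-node chain and its potentials are made up for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
K = 2
nodes = range(3)
edges = [(0, 1), (1, 2)]
psi = [rng.random(K) + 0.5 for _ in nodes]           # psi_a(l_i)
psi2 = {e: rng.random((K, K)) + 0.5 for e in edges}  # psi_ab(l_i,l_k)

theta = [-np.log(p) for p in psi]                    # theta_a(i) = -log psi_a(l_i)
theta2 = {e: -np.log(p) for e, p in psi2.items()}    # theta_ab(i,k) = -log psi_ab

def unnorm(v):  # prod_a psi_a(v_a) * prod_(a,b) psi_ab(v_a,v_b)
    p = np.prod([psi[a][v[a]] for a in nodes])
    return p * np.prod([psi2[e][v[e[0]], v[e[1]]] for e in edges])

def Q(v):       # Q(v) = sum_a theta_a(v_a) + sum_(a,b) theta_ab(v_a,v_b)
    return (sum(theta[a][v[a]] for a in nodes)
            + sum(theta2[e][v[e[0]], v[e[1]]] for e in edges))

states = list(itertools.product(range(K), repeat=3))
Z = sum(unnorm(v) for v in states)
for v in states:
    assert np.isclose(unnorm(v) / Z, np.exp(-Q(v)) / Z)
print("A(theta) = log Z =", np.log(Z))
```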

Page 8: Probabilistic Inference Lecture 7

Exponential Family

Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z

P(v) = exp(-Q(v)) / Z

Approximate probability distribution B(v)

Minimize KL divergence between B(v) and P(v)

B(v) has a simpler form than P(v)

Page 9: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

D = Σv B(v) log (B(v) / P(v))

Page 10: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

D = Σv B(v) log B(v) - Σv B(v) log P(v)

Page 11: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

D = Σv B(v) log B(v) + Σv B(v) Q(v) - (-log Z)

-log Z : Helmholtz free energy (constant with respect to B)

Page 12: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

Σv B(v) log B(v) + Σv B(v) Q(v)

Σv B(v) log B(v) : Negative Entropy -S(B)

Page 13: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

Σv B(v) log B(v) + Σv B(v) Q(v)

Σv B(v) Q(v) : Average Energy U(B)

Page 14: Probabilistic Inference Lecture 7

Kullback-Leibler Divergence

Σv B(v) log B(v) + Σv B(v) Q(v)

Gibbs free energy G(B) = U(B) - S(B)
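A small numeric check, on a made-up model with the joint states enumerated flatly, that the decomposition holds: the KL divergence equals the Gibbs free energy plus log Z, so minimizing the Gibbs free energy over B minimizes the KL divergence.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states = 6                            # joint assignments v, enumerated flatly
Q = rng.normal(size=n_states)           # energies Q(v)
P = np.exp(-Q); Z = P.sum(); P /= Z     # P(v) = exp(-Q(v)) / Z

B = rng.random(n_states); B /= B.sum()  # an arbitrary distribution B(v)

D = np.sum(B * np.log(B / P))           # KL divergence D(B || P)
U = np.sum(B * Q)                       # average energy U(B)
S = -np.sum(B * np.log(B))              # entropy S(B)

# D = U(B) - S(B) + log Z: Gibbs free energy plus log Z
# (-log Z is the Helmholtz free energy, constant with respect to B)
assert np.isclose(D, (U - S) + np.log(Z))
print(f"D = {D:.6f} = G(B) + log Z = {(U - S) + np.log(Z):.6f}")
```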

Page 15: Probabilistic Inference Lecture 7

Outline

• Free Energy

• Mean-Field Approximation

• Bethe Approximation

• Kikuchi Approximation

Page 16: Probabilistic Inference Lecture 7

Simpler Distribution

One-node marginals Ba(i)

Joint probability B(v) = Πa Ba(va)

Page 17: Probabilistic Inference Lecture 7

Average Energy

Σv B(v) Q(v)

Page 18: Probabilistic Inference Lecture 7

Average Energy

Σv B(v) (Σa θa(va) + Σa,b θab(va,vb)) *

* = Simplify on board !!!

Page 19: Probabilistic Inference Lecture 7

Average Energy

Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k)

Page 20: Probabilistic Inference Lecture 7

Negative Entropy

Σv B(v) log (B(v)) *

Page 21: Probabilistic Inference Lecture 7

Negative Entropy

Σa Σi Ba(i)log(Ba(i))

Page 22: Probabilistic Inference Lecture 7

Mean-Field Free Energy

Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k)

+ Σa Σi Ba(i)log(Ba(i))
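A sketch verifying, on a tiny made-up model, that plugging the fully factorized B(v) = Πa Ba(va) into the Gibbs free energy gives exactly the mean-field expression above.

```python
import itertools
import numpy as np

rng = np.random.default_rng(7)
K = 2
nodes = range(3)
edges = [(0, 1), (1, 2)]
theta1 = [rng.normal(size=K) for _ in nodes]
theta2 = {e: rng.normal(size=(K, K)) for e in edges}

Ba = [rng.random(K) for _ in nodes]
Ba = [b / b.sum() for b in Ba]        # one-node marginals B_a

def Q(v):
    return (sum(theta1[a][v[a]] for a in nodes)
            + sum(theta2[e][v[e[0]], v[e[1]]] for e in edges))

def B(v):                             # B(v) = prod_a B_a(v_a)
    return np.prod([Ba[a][v[a]] for a in nodes])

# Exact Gibbs free energy: sum_v B(v) Q(v) + sum_v B(v) log B(v)
states = itertools.product(range(K), repeat=3)
exact = sum(B(v) * (Q(v) + np.log(B(v))) for v in states)

# Mean-field expression in terms of the B_a alone
mean_field = (sum(Ba[a] @ theta1[a] for a in nodes)
              + sum(Ba[a] @ theta2[(a, b)] @ Ba[b] for (a, b) in edges)
              + sum(Ba[a] @ np.log(Ba[a]) for a in nodes))

assert np.isclose(exact, mean_field)
print("mean-field free energy =", mean_field)
```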

Page 23: Probabilistic Inference Lecture 7

Optimization Problem

minB  Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k) + Σa Σi Ba(i)log(Ba(i))

s.t.  Σi Ba(i) = 1 *

Page 24: Probabilistic Inference Lecture 7

KKT Condition

log(Ba(i)) = -θa(i) - Σb Σk Bb(k)θab(i,k) + λa - 1

Ba(i) = exp(-θa(i) - Σb Σk Bb(k)θab(i,k)) / Za

Page 25: Probabilistic Inference Lecture 7

Optimization

Initialize Ba (random, uniform, domain knowledge)

Set all random variables to unprocessed

Pick an unprocessed random variable Va

Ba(i) = exp(-θa(i) - Σb Σk Bb(k)θab(i,k)) / Za

If Ba changes, set neighbors to unprocessed

Repeat until convergence. Convergence Guaranteed !!

Tutorial: Jaakkola, 2000 (one of several)
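A minimal sketch of these coordinate updates on a small pairwise MRF. The graph, the θ values, the uniform initialization, and the convergence tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 3
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (2, 0)]                # 3-cycle MRF
theta1 = {a: rng.normal(size=K) for a in nodes}
theta2 = {e: rng.normal(size=(K, K)) for e in edges}

def pair_theta(a, b):
    """theta_ab oriented so rows index a's label and columns index b's."""
    return theta2[(a, b)] if (a, b) in theta2 else theta2[(b, a)].T

neighbors = {a: [b for e in edges for b in e if a in e and b != a] for a in nodes}

B = {a: np.full(K, 1.0 / K) for a in nodes}     # initialize (uniform here)
unprocessed = set(nodes)                        # all variables unprocessed
for _ in range(10000):
    if not unprocessed:                         # convergence (guaranteed)
        break
    a = unprocessed.pop()                       # pick an unprocessed V_a
    # B_a(i) = exp(-theta_a(i) - sum_b sum_k B_b(k) theta_ab(i,k)) / Z_a
    field = sum(pair_theta(a, b) @ B[b] for b in neighbors[a])
    new = np.exp(-theta1[a] - field)
    new /= new.sum()                            # Z_a normalizes B_a
    if not np.allclose(new, B[a], atol=1e-9):   # if B_a changed,
        unprocessed.update(neighbors[a])        #   set neighbors to unprocessed
    B[a] = new

print({a: np.round(B[a], 3) for a in nodes})
```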

Page 26: Probabilistic Inference Lecture 7

Outline

• Free Energy

• Mean-Field Approximation

• Bethe Approximation

• Kikuchi Approximation

Page 27: Probabilistic Inference Lecture 7

Simpler Distribution

One-node marginals Ba(i)

Two-node marginals Bab(i,k)

Joint probability hard to write down

But not for trees

Page 28: Probabilistic Inference Lecture 7

Simpler Distribution

One-node marginals Ba(i)

Two-node marginals Bab(i,k)

B(v) = Πa,b Bab(va,vb) / Πa Ba(va)^(n(a)-1)

n(a) = number of neighbors of Va

Pearl, 1988
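A numeric check of Pearl's tree factorization on a 3-node chain with made-up potentials: the exact one- and two-node marginals reassemble the joint exactly.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
K = 2
psi = [rng.random(K) + 0.5 for _ in range(3)]   # chain V1 - V2 - V3
psi2 = {(0, 1): rng.random((K, K)) + 0.5,
        (1, 2): rng.random((K, K)) + 0.5}

def joint(v):
    p = np.prod([psi[a][v[a]] for a in range(3)])
    return p * np.prod([psi2[e][v[e[0]], v[e[1]]] for e in psi2])

states = list(itertools.product(range(K), repeat=3))
Z = sum(joint(v) for v in states)
P = {v: joint(v) / Z for v in states}

# Exact marginals by brute-force summation
Bnode = [np.zeros(K) for _ in range(3)]
Bedge = {e: np.zeros((K, K)) for e in psi2}
for v, p in P.items():
    for a in range(3):
        Bnode[a][v[a]] += p
    for e in psi2:
        Bedge[e][v[e[0]], v[e[1]]] += p

n = [1, 2, 1]  # n(a) = number of neighbors of V_a on the chain
for v in states:
    num = np.prod([Bedge[e][v[e[0]], v[e[1]]] for e in psi2])
    den = np.prod([Bnode[a][v[a]] ** (n[a] - 1) for a in range(3)])
    assert np.isclose(P[v], num / den)          # B(v) formula is exact on trees
print("joint = prod Bab / prod Ba^(n(a)-1) on the tree")
```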

Page 29: Probabilistic Inference Lecture 7

Average Energy

Σv B(v) Q(v)

Page 30: Probabilistic Inference Lecture 7

Average Energy

Σv B(v) (Σa θa(va) + Σa,b θab(va,vb)) *

Page 31: Probabilistic Inference Lecture 7

Average Energy

Σa Σi Ba(i)θa(i) + Σa,b Σi,k Bab(i,k)θab(i,k) *

Page 32: Probabilistic Inference Lecture 7

Average Energy

-Σa (n(a)-1)Σi Ba(i)θa(i)

+ Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k))

n(a) = number of neighbors of Va

Page 33: Probabilistic Inference Lecture 7

Negative Entropy

Σv B(v) log (B(v)) *

Page 34: Probabilistic Inference Lecture 7

Negative Entropy

-Σa (n(a)-1)Σi Ba(i)log(Ba(i))

+ Σa,b Σi,k Bab(i,k)log(Bab(i,k))

Exact for trees

Approximate for general MRFs

Page 35: Probabilistic Inference Lecture 7

Bethe Free Energy

-Σa (n(a)-1)Σi Ba(i)(θa(i)+log(Ba(i)))

+ Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k)+log(Bab(i,k)))

Exact for trees

Approximate for general MRFs

Page 36: Probabilistic Inference Lecture 7

Optimization Problem

minB  -Σa (n(a)-1)Σi Ba(i)(θa(i)+log(Ba(i)))
      + Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k)+log(Bab(i,k))) *

s.t.  Σi Ba(i) = 1
      Σi,k Bab(i,k) = 1
      Σk Bab(i,k) = Ba(i)

Page 37: Probabilistic Inference Lecture 7

KKT Condition

log(Bab(i,k)) = -(θa(i)+θb(k)+θab(i,k)) + λab(k) + λba(i) + μab - 1

λab(k) = log(Mab;k)

Page 38: Probabilistic Inference Lecture 7

Optimization

BP tries to optimize Bethe free energy

But it may not converge

Convergent alternatives exist

Yuille and Rangarajan, 2003

Page 39: Probabilistic Inference Lecture 7

Outline

• Free Energy

• Mean-Field Approximation

• Bethe Approximation

• Kikuchi Approximation

Page 40: Probabilistic Inference Lecture 7

Local Free Energy

[Figure: 2x2 grid MRF over V1, V2 (top row) and V3, V4 (bottom row); c denotes a cluster of variables]

Gc = Σvc Bc(vc)(log(Bc(vc)) + Σd⊆c θd(vd))

G12 = Σv1,v2 B12(v1,v2)(log(B12(v1,v2)) + θ1(v1) + θ2(v2) + θ12(v1,v2))

Page 41: Probabilistic Inference Lecture 7

Local Free Energy

[Figure: 2x2 grid MRF; c denotes a cluster of variables]

Gc = Σvc Bc(vc)(log(Bc(vc)) + Σd⊆c θd(vd))

G1 = Σv1 B1(v1)(log(B1(v1)) + θ1(v1))

Page 42: Probabilistic Inference Lecture 7

Local Free Energy

[Figure: 2x2 grid MRF; c denotes a cluster of variables]

Gc = Σvc Bc(vc)(log(Bc(vc)) + Σd⊆c θd(vd))

G1234 = Σv1,v2,v3,v4 B1234(v1,v2,v3,v4)(log(B1234(v1,v2,v3,v4)) +

θ1(v1) + θ2(v2) + θ3(v3) + θ4(v4) + θ12(v1,v2) + θ13(v1,v3) + θ24(v2,v4) + θ34(v3,v4))

Page 43: Probabilistic Inference Lecture 7

Sum of Local Free Energies

[Figure: 2x2 grid MRF]

Sum of free energies of all pairwise clusters:

G12 + G13 + G24 + G34

Overcounts G1, G2, G3, G4 once !!!

Page 44: Probabilistic Inference Lecture 7

Sum of Local Free Energies

[Figure: 2x2 grid MRF]

Sum of free energies of all pairwise clusters:

G12 + G13 + G24 + G34

- G1 - G2 - G3 - G4

Page 45: Probabilistic Inference Lecture 7

Sum of Local Free Energies

[Figure: 2x2 grid MRF]

Sum of free energies of all pairwise clusters, minus the overcounted single-node terms:

G12 + G13 + G24 + G34 - G1 - G2 - G3 - G4

Bethe Approximation !!!
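A numeric check that this bookkeeping is exact: on the 2x2 grid, summing the pairwise local free energies and subtracting each overcounted single-node term reproduces the Bethe free energy of Page 35, for any (not necessarily consistent) beliefs. All numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(6)
K = 2
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]  # 2x2 grid: V1 V2 / V3 V4
theta1 = [rng.normal(size=K) for _ in range(4)]
theta2 = {e: rng.normal(size=(K, K)) for e in edges}

# Arbitrary (not necessarily consistent) beliefs
Ba = [rng.random(K) for _ in range(4)]
Ba = [b / b.sum() for b in Ba]
Bab = {e: rng.random((K, K)) for e in edges}
Bab = {e: b / b.sum() for e, b in Bab.items()}

def G_node(a):     # G_a = sum_i B_a(i) (log B_a(i) + theta_a(i))
    return np.sum(Ba[a] * (np.log(Ba[a]) + theta1[a]))

def G_edge(a, b):  # G_ab = sum_ik B_ab (log B_ab + theta_a + theta_b + theta_ab)
    t = theta1[a][:, None] + theta1[b][None, :] + theta2[(a, b)]
    return np.sum(Bab[(a, b)] * (np.log(Bab[(a, b)]) + t))

n = [2, 2, 2, 2]   # every node in the 2x2 grid has 2 neighbors

local = (sum(G_edge(a, b) for (a, b) in edges)
         - sum((n[a] - 1) * G_node(a) for a in range(4)))

bethe = (-sum((n[a] - 1) * np.sum(Ba[a] * (theta1[a] + np.log(Ba[a]))) for a in range(4))
         + sum(np.sum(Bab[e] * (theta1[e[0]][:, None] + theta1[e[1]][None, :]
                                + theta2[e] + np.log(Bab[e]))) for e in edges))

assert np.isclose(local, bethe)
print("sum of local free energies minus overcounts = Bethe free energy")
```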

Page 46: Probabilistic Inference Lecture 7

Kikuchi Approximations

[Figure: 2x2 grid MRF]

Use bigger clusters

G1234

Page 47: Probabilistic Inference Lecture 7

Kikuchi Approximations

[Figure: 2x3 grid MRF over V1, V2, V3 (top row) and V4, V5, V6 (bottom row)]

Use bigger clusters

G1245 + G2356 - G25

The overlap {V2, V5} appears in both big clusters, so its free energy is subtracted once

Derive message passing using KKT conditions!

Page 48: Probabilistic Inference Lecture 7

Generalized Belief Propagation

[Figure: 2x3 grid MRF over V1, V2, V3 (top row) and V4, V5, V6 (bottom row)]

Use bigger clusters

G1245 + G2356 - G25

Derive message passing using KKT conditions!