Probabilistic Inference Lecture 7
M. Pawan Kumar ([email protected])
Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/
Recap
Loopy Belief Propagation
Initialize all messages to 1.
Update messages in some order over the edges:
Mab;k = Σi ψa(li) ψab(li,lk) Πn≠b Mna;i
Repeat until convergence: rate of change of messages < threshold.
Convergence is not guaranteed!!
Loopy Belief Propagation
Normalize to compute beliefs Ba(i), Bab(i,j):
B’a(i) = ψa(li) Πn Mna;i
B’ab(i,j) = ψa(li) ψb(lj) ψab(li,lj) Πn≠b Mna;i Πn≠a Mnb;j
At convergence, Σj Bab(i,j) = Ba(i)
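As a concrete illustration, the recap above can be sketched in a few lines of NumPy. This is a minimal sketch, not a reference implementation from the lecture: the dictionary-of-tables representation (`psi1` for the unary potentials ψa, `psi2` for the pairwise potentials ψab) and the function name `loopy_bp` are illustrative choices.

```python
import numpy as np

def loopy_bp(psi1, psi2, n_iters=100, tol=1e-6):
    """Loopy BP sketch: psi1[a] is a length-L unary table,
    psi2[(a, b)] an L x L pairwise table (one entry per edge)."""
    nodes = list(psi1)
    nbrs = {a: [] for a in nodes}
    for (a, b) in psi2:
        nbrs[a].append(b); nbrs[b].append(a)
    # one message per directed edge, initialized to 1
    M = {(a, b): np.ones(len(psi1[b])) for (a, b) in psi2}
    M.update({(b, a): np.ones(len(psi1[a])) for (a, b) in psi2})
    for _ in range(n_iters):
        delta = 0.0
        for (a, b) in list(M):
            pair = psi2[(a, b)] if (a, b) in psi2 else psi2[(b, a)].T
            # Mab;k = Σi ψa(li) ψab(li,lk) Π_{n≠b} Mna;i
            incoming = np.prod([M[(n, a)] for n in nbrs[a] if n != b], axis=0) \
                       if len(nbrs[a]) > 1 else np.ones(len(psi1[a]))
            new = (psi1[a] * incoming) @ pair
            new /= new.sum()            # normalize for numerical stability
            delta = max(delta, np.abs(new - M[(a, b)]).max())
            M[(a, b)] = new
        if delta < tol:                 # rate of change below threshold
            break
    # one-node beliefs: Ba(i) ∝ ψa(li) Πn Mna;i
    B = {a: psi1[a] * np.prod([M[(n, a)] for n in nbrs[a]], axis=0) for a in nodes}
    return {a: b / b.sum() for a, b in B.items()}
```

On a tree the returned beliefs are the exact marginals; on a loopy graph they are only approximations, and the loop may fail to converge.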
Outline
• Free Energy
• Mean-Field Approximation
• Bethe Approximation
• Kikuchi Approximation
Yedidia, Freeman and Weiss, 2000
Exponential Family
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
A(θ) = log Z
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z
ψa(li) = exp(-θa(i)), ψab(li,lk) = exp(-θab(i,k))
Energy Q(v) = Σa θa(va) + Σa,b θab(va,vb)
P(v) = exp(-Q(v)) / Z
Exponential Family
Probability P(v) = Πa ψa(va) Π(a,b) ψab(va,vb) / Z = exp(-Q(v)) / Z
Approximate probability distribution B(v)
Minimize KL divergence between B(v) and P(v)
B(v) has a simpler form than P(v)
Kullback-Leibler Divergence
D = Σv B(v) log(B(v)/P(v))
= Σv B(v) log(B(v)) - Σv B(v) log(P(v))
= Σv B(v) log(B(v)) + Σv B(v) Q(v) - (-log Z)
-log Z: Helmholtz free energy, constant with respect to B
Σv B(v) log(B(v)): negative entropy, -S(B)
Σv B(v) Q(v): average energy, U(B)
Σv B(v) log(B(v)) + Σv B(v) Q(v): Gibbs free energy, G(B) = U(B) - S(B)
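The decomposition above is easy to verify numerically on a toy model. The tables `theta1`, `theta2` and the approximating distribution `B` below are arbitrary illustrative values; the check confirms that D(B‖P) equals the Gibbs free energy minus the Helmholtz free energy -log Z.

```python
import numpy as np

# Toy 2-variable model: Q(v) = θ1(v1) + θ2(v2) + θ12(v1,v2)
theta1 = np.array([0.2, 1.0])
theta2 = np.array([0.5, 0.3])
theta12 = np.array([[0.0, 1.2], [0.7, 0.1]])

Q = theta1[:, None] + theta2[None, :] + theta12   # energy of each joint state
Z = np.exp(-Q).sum()
P = np.exp(-Q) / Z                                # P(v) = exp(-Q(v))/Z

B = np.array([[0.4, 0.1], [0.2, 0.3]])            # any approximating distribution

kl = (B * np.log(B / P)).sum()                    # D(B || P)
U = (B * Q).sum()                                 # average energy U(B)
negS = (B * np.log(B)).sum()                      # negative entropy -S(B)
gibbs = U + negS                                  # Gibbs free energy G(B)

# D(B || P) = G(B) - (-log Z): KL = Gibbs minus Helmholtz free energy
assert abs(kl - (gibbs + np.log(Z))) < 1e-10
```

Since the Helmholtz term is constant in B, minimizing the KL divergence over B is exactly minimizing the Gibbs free energy.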
Outline
• Free Energy
• Mean-Field Approximation
• Bethe Approximation
• Kikuchi Approximation
Simpler Distribution
One-node marginals Ba(i)
Joint probability B(v) = Πa Ba(va)
Average Energy
Σv B(v) Q(v)
= Σv B(v) (Σa θa(va) + Σa,b θab(va,vb)) *
= Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k) *
(* = simplify on board!!)
Negative Entropy
Σv B(v) log(B(v)) *
= Σa Σi Ba(i)log(Ba(i))
Mean-Field Free Energy
Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k)
+ Σa Σi Ba(i)log(Ba(i))
Optimization Problem
minB Σa Σi Ba(i)θa(i) + Σa,b Σi,k Ba(i)Bb(k)θab(i,k) + Σa Σi Ba(i)log(Ba(i))
s.t. Σi Ba(i) = 1 *
KKT Condition
log(Ba(i)) = -θa(i) - Σb Σk Bb(k)θab(i,k) + λa - 1
Ba(i) = exp(-θa(i) - Σb Σk Bb(k)θab(i,k))/Za
Optimization
Initialize Ba (random, uniform, or domain knowledge); set all random variables to unprocessed.
Repeat: pick an unprocessed random variable Va and update
Ba(i) = exp(-θa(i) - Σb Σk Bb(k)θab(i,k))/Za
If Ba changes, set its neighbors to unprocessed.
Until convergence (guaranteed!!)
Tutorial: Jaakkola, 2000 (one of several)
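The coordinate updates above can be sketched as follows, assuming cost tables `theta1` (unary θa) and `theta2` (pairwise θab); for brevity the sketch uses round-robin sweeps rather than the unprocessed-variable queue on the slide, which does not change the fixed points.

```python
import numpy as np

def mean_field(theta1, theta2, n_iters=200, tol=1e-8):
    """Mean-field sketch: theta1[a] is a length-L unary cost table,
    theta2[(a, b)] an L x L pairwise cost table (one per edge)."""
    nodes = list(theta1)
    nbrs = {a: [] for a in nodes}
    for (a, b) in theta2:
        nbrs[a].append(b); nbrs[b].append(a)
    # uniform initialization (random or domain knowledge also work)
    B = {a: np.ones(len(theta1[a])) / len(theta1[a]) for a in nodes}
    for _ in range(n_iters):
        delta = 0.0
        for a in nodes:
            # Ba(i) ∝ exp(-θa(i) - Σb Σk Bb(k) θab(i,k))
            field = theta1[a].copy()
            for b in nbrs[a]:
                t = theta2[(a, b)] if (a, b) in theta2 else theta2[(b, a)].T
                field += t @ B[b]
            new = np.exp(-field)
            new /= new.sum()
            delta = max(delta, np.abs(new - B[a]).max())
            B[a] = new
        if delta < tol:
            break   # each update decreases the mean-field free energy
    return B
```

Because every coordinate update can only decrease the (bounded-below) mean-field free energy, the iteration always converges, to a local minimum rather than a global one.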
Outline
• Free Energy
• Mean-Field Approximation
• Bethe Approximation
• Kikuchi Approximation
Simpler Distribution
One-node marginals Ba(i)
Two-node marginals Bab(i,k)
Joint probability is hard to write down in general, but not for trees
B(v) = Πa,b Bab(va,vb) / Πa Ba(va)^(n(a)-1)
Pearl, 1988
n(a) = number of neighbors of Va
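The tree case can be checked numerically. On a 3-node chain V1 - V2 - V3, n(2) = 2 and n(1) = n(3) = 1, so the formula reduces to P(v) = P12(v1,v2) P23(v2,v3) / P2(v2). A small sketch with arbitrary potentials (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# joint distribution over a 3-node chain V1 - V2 - V3 (binary labels),
# built from arbitrary pairwise potentials
psi12 = rng.uniform(0.5, 2.0, (2, 2))
psi23 = rng.uniform(0.5, 2.0, (2, 2))
P = psi12[:, :, None] * psi23[None, :, :]
P /= P.sum()

# exact one- and two-node marginals
P12 = P.sum(axis=2)          # marginalize out v3
P23 = P.sum(axis=0)          # marginalize out v1
P2 = P12.sum(axis=0)         # marginal of the interior node V2

# n(2) = 2, n(1) = n(3) = 1: B(v) = P12 P23 / P2^(n(2)-1)
recon = P12[:, :, None] * P23[None, :, :] / P2[None, :, None]
assert np.allclose(recon, P)   # the factorization is exact on a tree
```

On a loopy graph the same expression is no longer a valid joint distribution, which is why the Bethe construction works with the marginals directly instead.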
Average Energy
Σv B(v) Q(v)
= Σv B(v) (Σa θa(va) + Σa,b θab(va,vb)) *
= Σa Σi Ba(i)θa(i) + Σa,b Σi,k Bab(i,k)θab(i,k) *
= -Σa (n(a)-1) Σi Ba(i)θa(i) + Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k))
n(a) = number of neighbors of Va
Negative Entropy
Σv B(v) log(B(v)) *
= -Σa (n(a)-1) Σi Ba(i)log(Ba(i)) + Σa,b Σi,k Bab(i,k)log(Bab(i,k))
Exact for trees, approximate for general MRFs
Bethe Free Energy
-Σa (n(a)-1) Σi Ba(i)(θa(i)+log(Ba(i)))
+ Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k)+log(Bab(i,k)))
Exact for trees, approximate for general MRFs
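Given beliefs, the Bethe free energy above is straightforward to evaluate. A minimal sketch, using the same illustrative dictionary-of-tables convention as before (`B1`, `B2` for beliefs, `theta1`, `theta2` for costs):

```python
import numpy as np

def bethe_free_energy(B1, B2, theta1, theta2):
    """Bethe free energy sketch: B1[a]/theta1[a] are length-L tables,
    B2[(a,b)]/theta2[(a,b)] are L x L tables, one per edge."""
    n = {a: 0 for a in B1}
    for (a, b) in B2:
        n[a] += 1; n[b] += 1          # n(a) = number of neighbors of Va
    G = 0.0
    # -Σa (n(a)-1) Σi Ba(i)(θa(i) + log Ba(i))
    for a in B1:
        G -= (n[a] - 1) * np.sum(B1[a] * (theta1[a] + np.log(B1[a])))
    # +Σa,b Σi,k Bab(i,k)(θa(i) + θb(k) + θab(i,k) + log Bab(i,k))
    for (a, b), Bab in B2.items():
        t = theta1[a][:, None] + theta1[b][None, :] + theta2[(a, b)]
        G += np.sum(Bab * (t + np.log(Bab)))
    return G
```

On a tree, plugging in the exact marginals gives exactly -log Z (the Helmholtz free energy), which is one way to see that the approximation is exact there.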
Optimization Problem
minB -Σa (n(a)-1) Σi Ba(i)(θa(i)+log(Ba(i)))
+ Σa,b Σi,k Bab(i,k)(θa(i)+θb(k)+θab(i,k)+log(Bab(i,k)))
s.t. Σk Bab(i,k) = Ba(i)
Σi,k Bab(i,k) = 1
Σi Ba(i) = 1 *
KKT Condition
log(Bab(i,k)) = -(θa(i)+θb(k)+θab(i,k)) + λab(k) + λba(i) + μab - 1
λab(k) = log(Mab;k): the Lagrange multipliers are exactly the log-messages of BP
Optimization
BP tries to optimize Bethe free energy
But it may not converge
Convergent alternatives exist
Yuille and Rangarajan, 2003
Outline
• Free Energy
• Mean-Field Approximation
• Bethe Approximation
• Kikuchi Approximation
Local Free Energy
[Figure: 2×2 grid of variables V1, V2, V3, V4 with edges (1,2), (1,3), (2,4), (3,4); c denotes a cluster of variables]
Gc = Σvc Bc(vc)(log(Bc(vc)) + Σ{d ⊆ c} θd(vd))
G12 = Σv1,v2 B12(v1,v2)(log(B12(v1,v2)) + θ1(v1) + θ2(v2) + θ12(v1,v2))
G1 = Σv1 B1(v1)(log(B1(v1)) + θ1(v1))
G1234 = Σv1,v2,v3,v4 B1234(v1,v2,v3,v4)(log(B1234(v1,v2,v3,v4)) + θ1(v1) + θ2(v2) + θ3(v3) + θ4(v4) + θ12(v1,v2) + θ13(v1,v3) + θ24(v2,v4) + θ34(v3,v4))
Sum of Local Free Energies
[Figure: the same 2×2 grid of variables V1, V2, V3, V4]
Sum of the free energies of all pairwise clusters: G12 + G13 + G24 + G34
This overcounts G1, G2, G3, G4 once each!!
Correcting for the overcounting: G12 + G13 + G24 + G34 - G1 - G2 - G3 - G4
Bethe Approximation !!!
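The counting argument above can be confirmed numerically on the 2×2 grid: for arbitrary (not necessarily consistent) beliefs, G12 + G13 + G24 + G34 - G1 - G2 - G3 - G4 coincides with the Bethe free energy, since every node has n(a) = 2 neighbors. A sketch with randomly chosen illustrative tables:

```python
import numpy as np

rng = np.random.default_rng(1)
L = 3                                      # labels per variable
edges = [(1, 2), (1, 3), (2, 4), (3, 4)]   # 2x2 grid over V1..V4
theta1 = {a: rng.uniform(0, 1, L) for a in (1, 2, 3, 4)}
theta2 = {e: rng.uniform(0, 1, (L, L)) for e in edges}

def rand_dist(shape):                      # arbitrary normalized beliefs
    p = rng.uniform(0.1, 1.0, shape)
    return p / p.sum()

B1 = {a: rand_dist(L) for a in (1, 2, 3, 4)}
B2 = {e: rand_dist((L, L)) for e in edges}

def G_pair(a, b):   # Gab = Σ Bab (log Bab + θa + θb + θab)
    t = theta1[a][:, None] + theta1[b][None, :] + theta2[(a, b)]
    return np.sum(B2[(a, b)] * (np.log(B2[(a, b)]) + t))

def G_single(a):    # Ga = Σ Ba (log Ba + θa)
    return np.sum(B1[a] * (np.log(B1[a]) + theta1[a]))

local_sum = sum(G_pair(a, b) for (a, b) in edges) \
            - sum(G_single(a) for a in (1, 2, 3, 4))

# Bethe free energy: every node in this grid has n(a) = 2 neighbors
bethe = -sum((2 - 1) * np.sum(B1[a] * (theta1[a] + np.log(B1[a])))
             for a in (1, 2, 3, 4))
bethe += sum(G_pair(a, b) for (a, b) in edges)

assert abs(local_sum - bethe) < 1e-10
```

The same bookkeeping, applied to bigger clusters and their overlaps, is what defines the Kikuchi approximations below.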
Kikuchi Approximations
[Figure: the same 2×2 grid of variables V1, V2, V3, V4]
Use bigger clusters: a single cluster of all four variables gives G1234.
Kikuchi Approximations
[Figure: 2×3 grid of variables V1, V2, V3 (top) and V4, V5, V6 (bottom)]
Use bigger clusters: G1245 + G2356 - G25 (the overlap cluster {V2, V5} is counted twice, so subtract G25 once)
Derive message passing using KKT conditions!
Generalized Belief Propagation
The message passing scheme obtained from the KKT conditions of the Kikuchi free energy is known as generalized belief propagation.