Lab 13: Pseudo-likelihood
Yacine Jernite
December 11, 2014

Pseudo-likelihood: definition

Original log-likelihood objective:

\ell(x; \theta) = \log \prod_i p(x_i \mid x_{1, \dots, i-1}; \theta)

Definition of the pseudo-likelihood:

\ell_{PL}(x; \theta) = \log \prod_i p(x_i \mid x_{N(i)}; \theta) \approx \ell(x; \theta)

Justification: the Markov blanket N(i) makes x_i conditionally independent of the rest of the graph.
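
As a concrete reading of the two objectives, here is a minimal Python sketch; chain_prob and cond_prob are hypothetical placeholders for model-specific conditionals:

    import numpy as np

    def log_likelihood(x, chain_prob):
        # l(x; theta) = sum_i log p(x_i | x_1, ..., x_{i-1}; theta), chain rule
        return sum(np.log(chain_prob(i, x)) for i in range(len(x)))

    def pseudo_log_likelihood(x, cond_prob):
        # l_PL(x; theta) = sum_i log p(x_i | x_N(i); theta), Markov blanket
        return sum(np.log(cond_prob(i, x)) for i in range(len(x)))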

Pseudo-likelihood: definition

[Five figure-only slides omitted]

Pseudo-likelihood: definition
We could have started with a green node instead.
Why not just sum the conditional log-probabilities over all nodes? That is exactly what the pseudo-likelihood objective does.

Pseudo-likelihood: definition
In a pairwise Markov random field:

p(x_i \mid x_{N(i)}; \theta) = \frac{\exp\big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\big)}{\sum_u \exp\big(\sum_{j \in N(i)} \theta_{i,j}(u, x_j)\big)} = \frac{1}{Z_i(x_{N(i)})} \exp\Big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\Big)

We now only compute a local partition function Z_i: the cost per node is linear in |X_i|, the number of states of x_i, rather than exponential in the number of nodes. A sketch of this computation follows below.
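
A minimal sketch, assuming the potentials are stored as numeric tables theta[i][j][a, b] indexed by node pair and value pair (a hypothetical layout):

    import numpy as np

    def local_conditional(i, x, theta, neighbors, n_states):
        # Unnormalized log-score of each candidate value u for node i.
        scores = np.array([sum(theta[i][j][u, x[j]] for j in neighbors[i])
                           for u in range(n_states)])
        # The local partition function Z_i(x_N(i)) sums over the n_states
        # values of x_i only, not over whole configurations.
        probs = np.exp(scores - scores.max())  # max-subtraction for stability
        probs /= probs.sum()
        return probs[x[i]]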

Pseudo-likelihood: definition
Pseudo-likelihood learning is consistent! In the infinite-data limit:

\arg\max_\theta \ell(D; \theta) = \arg\max_\theta \ell_{PL}(D; \theta)

Pseudo-likelihood of an Ising model
Figure: Ising model

\theta_{i,j}(x_i, x_j) = W_{i,j}\, x_i x_j
\theta_i(x_i) = U_i\, x_i
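
As a sketch, these potentials can be written directly in code; W (an N x N weight matrix) and U (a length-N field vector) are placeholder names:

    import numpy as np

    def ising_score(x, W, U):
        # sum_{i<j} W_{i,j} x_i x_j + sum_i U_i x_i for x in {-1, +1}^N;
        # the strict upper triangle of W counts each edge once.
        x = np.asarray(x, dtype=float)
        return float(x @ np.triu(W, 1) @ x + U @ x)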

Pseudo-likelihood of an Ising model
Figure: Ising model, two data points

Pseudo-likelihood of an Ising model
Pseudo-likelihood objective:

D data points: x = (x^1, \dots, x^D)
N nodes each: x^d = (x^d_1, \dots, x^d_N)

Then:

\ell_{PL}(x) = \sum_{d=1}^{D} \sum_{i=1}^{N} \log p(x^d_i \mid x^d_{N(i)})
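
The double sum translates directly; a sketch reusing the hypothetical cond_prob callable from the first sketch:

    import numpy as np

    def dataset_pseudo_ll(X, cond_prob):
        # Sum over data points d and nodes i of log p(x^d_i | x^d_N(i)).
        return sum(np.log(cond_prob(i, x)) for x in X for i in range(len(x)))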

Pseudo-likelihood of an Ising model
[Figure: Ising model, two data points; each graph shows a center node X_P with neighbors X_U, X_R, X_D, X_L]

\log p(x^1_p \mid x^1_u, x^1_r, x^1_d, x^1_l) = ?
\log p(x^2_p \mid x^2_u, x^2_r, x^2_d, x^2_l) = ?

Pseudo-likelihood of an Ising model
\log p(x_p \mid x_u, x_r, x_d, x_l) = x_p (U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l)
\quad - \log\big(\exp(U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l) + \exp(-(U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l))\big)

Or, absorbing a factor of 2 into the parameters (since x_p \in \{-1, +1\}):

p(x_p \mid x_u, x_r, x_d, x_l) = \sigma\big(x_p (U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l)\big)

Hence, for the two data points:

p(x^1_p \mid x^1_u, x^1_r, x^1_d, x^1_l) = \sigma(U_p + W_{p,u} - W_{p,r} - W_{p,d} + W_{p,l})
p(x^2_p \mid x^2_u, x^2_r, x^2_d, x^2_l) = \sigma(-U_p + W_{p,u} + W_{p,r} - W_{p,d} + W_{p,l})
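
A quick numerical reading of the sigmoid form; the weight dictionary and the example parameter values below are arbitrary illustrations:

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def center_conditional(xp, xu, xr, xd, xl, Up, W):
        # p(x_p | x_u, x_r, x_d, x_l) = sigma(x_p (U_p + sum_k W_{p,k} x_k))
        field = Up + W['u'] * xu + W['r'] * xr + W['d'] * xd + W['l'] * xl
        return sigmoid(xp * field)

    # First data point: x_p = +1, x_u = +1, x_r = -1, x_d = -1, x_l = +1
    # reproduces sigma(U_p + W_pu - W_pr - W_pd + W_pl).
    p1 = center_conditional(+1, +1, -1, -1, +1, 0.5,
                            {'u': 0.1, 'r': 0.2, 'd': 0.3, 'l': 0.4})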

Pseudo-likelihood of an Ising model
Figure: Ising model, shared parameters

W_{i,j} = W
U_i = U

Pseudo-likelihood of an Ising model
Then:

p(x_i \mid x_{N(i)}) = \sigma\big(x_i (U + W \sum_{j \in N(i)} x_j)\big)

And we have:

\frac{\partial \log p(x_i \mid x_{N(i)})}{\partial W} = \big(1 - \sigma(x_i (U + W \sum_{j \in N(i)} x_j))\big)\, x_i \sum_{j \in N(i)} x_j

\frac{\partial \log p(x_i \mid x_{N(i)})}{\partial U} = \big(1 - \sigma(x_i (U + W \sum_{j \in N(i)} x_j))\big)\, x_i
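
In code, the two per-node derivatives (a minimal sketch):

    import numpy as np

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def grad_node(xi, nbr_sum, U, W):
        # Shared residual (1 - sigma(x_i (U + W sum_j x_j))).
        r = 1.0 - sigmoid(xi * (U + W * nbr_sum))
        return r * xi * nbr_sum, r * xi   # (d/dW, d/dU) of log p(x_i | x_N(i))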

Pseudo-likelihood of an Ising model
Which gives us:

\frac{\partial \ell_{PL}}{\partial W} = N \times D \times \sum_{u \in \{-1,1\},\, v} \hat{p}\big(x = u, \mathrm{sum}(N(x)) = v\big) \big(1 - \sigma(u(U + Wv))\big)\, u v

\frac{\partial \ell_{PL}}{\partial U} = N \times D \times \sum_{u \in \{-1,1\},\, v} \hat{p}\big(x = u, \mathrm{sum}(N(x)) = v\big) \big(1 - \sigma(u(U + Wv))\big)\, u

Pseudo-likelihood of an Ising model
There is no closed-form solution, but the gradient is easy to compute, so we can fit U and W by gradient ascent, as in the sketch below.
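
Putting the pieces together, a gradient-ascent sketch for the shared-parameter Ising pseudo-likelihood; it precomputes the empirical table p-hat(x = u, sum(N(x)) = v) from the gradient formulas above, and the learning rate and step count are arbitrary choices:

    import numpy as np
    from collections import Counter

    def fit_ising_pl(X, neighbors, lr=0.5, steps=500):
        # Empirical distribution over (node value u, neighbor sum v) pairs.
        counts = Counter()
        for x in X:
            for i, nbrs in enumerate(neighbors):
                counts[(x[i], sum(x[j] for j in nbrs))] += 1
        total = sum(counts.values())  # = N * D

        U = W = 0.0
        for _ in range(steps):
            dU = dW = 0.0
            for (u, v), c in counts.items():
                # (1 - sigma(u (U + W v))), weighted by p-hat(u, v).
                r = (c / total) * (1.0 - 1.0 / (1.0 + np.exp(-u * (U + W * v))))
                dU += r * u
                dW += r * u * v
            U += lr * dU
            W += lr * dW
        return U, W

Since log sigma is concave and its argument is linear in (U, W), the objective is concave, so plain gradient ascent with a small enough step size approaches the optimum.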

Pseudo-likelihood: proof of consistency

Recall:

\ell_{PL}(x; \theta) = \log \prod_i p(x_i \mid x_{N(i)}; \theta)

And:

p(x_i \mid x_{N(i)}; \theta) = \frac{1}{Z_i(x_{N(i)})} \exp\Big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\Big)

Pseudo-likelihood: proof of consistency

\ell_{PL}(x; \theta) = \sum_i \Big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j) - \log Z_i(x_{N(i)})\Big)

Hence, averaging over a dataset D_M of M samples:

\ell_{PL}(D_M; \theta) = \frac{1}{M} \sum_{m=1}^{M} \ell_{PL}(x^m; \theta)

= \sum_i \Big(\sum_{j \in N(i)} \frac{1}{M} \sum_{m=1}^{M} \theta_{i,j}(x^m_i, x^m_j) - \frac{1}{M} \sum_{m=1}^{M} \log Z_i(x^m_{N(i)})\Big)

= \sum_i \Big(\sum_{j \in N(i)} \sum_{x_i, x_j} \hat{p}_M(x_i, x_j)\, \theta_{i,j}(x_i, x_j) - \sum_{x_{N(i)}} \hat{p}_M(x_{N(i)}) \log Z_i(x_{N(i)})\Big)

Pseudo-likelihood: proof of consistency

The objective decomposes over nodes i and is concave in \theta.

Using \frac{\partial \log Z_i(x_{N(i)}; \theta)}{\partial \theta_{i,j}(x_i, x_j)} = p(x_i \mid x_{N(i)}; \theta) leads to (see note, and the stationarity sketch below):

p(x_i \mid x_{N(i)}; \theta^*_{PL}) = \lim_{m \to \infty} \hat{p}_m(x_i \mid x_{N(i)})

We have seen this property before! In the unique-optimum case, this gives consistency.
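
A sketch of the stationarity step behind this claim: setting the derivative of the i-th term of the empirical objective with respect to the table entry \theta_{i,j}(a, b) to zero gives

    \hat{p}_M(x_i = a, x_j = b) = \sum_{x_{N(i)}:\, x_j = b} \hat{p}_M(x_{N(i)})\, p(x_i = a \mid x_{N(i)}; \theta)

which is satisfied when p(x_i \mid x_{N(i)}; \theta) = \hat{p}_M(x_i \mid x_{N(i)}); as M \to \infty, these empirical conditionals converge to the true ones.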

Exercises
M 24.1: Gibbs sampling from a 2D Gaussian (see the sketch below)
M 11.7: manual calculation of the M step for a GMM (EM algorithm)
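
For exercise M 24.1, a minimal Gibbs-sampler sketch for a bivariate Gaussian, using its exact one-dimensional conditionals (function name and defaults are placeholders):

    import numpy as np

    def gibbs_2d_gaussian(mu, Sigma, n_samples=1000, seed=0):
        # x1 | x2 ~ N(mu1 + rho (s1/s2)(x2 - mu2), s1^2 (1 - rho^2)),
        # and symmetrically for x2 | x1.
        rng = np.random.default_rng(seed)
        s1, s2 = np.sqrt(Sigma[0][0]), np.sqrt(Sigma[1][1])
        rho = Sigma[0][1] / (s1 * s2)
        x = np.array(mu, dtype=float)
        samples = np.empty((n_samples, 2))
        for t in range(n_samples):
            x[0] = rng.normal(mu[0] + rho * (s1 / s2) * (x[1] - mu[1]),
                              s1 * np.sqrt(1 - rho ** 2))
            x[1] = rng.normal(mu[1] + rho * (s2 / s1) * (x[0] - mu[0]),
                              s2 * np.sqrt(1 - rho ** 2))
            samples[t] = x
        return samples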