Lab 13: Pseudo-likelihood
Yacine Jernite
December 11, 2014



Pseudo-likelihood: definition

Original log-likelihood objective:

$$ l(x;\theta) = \log\Big(\prod_i p(x_i \mid x_{1,\dots,i-1};\theta)\Big) $$

Definition of the pseudo-likelihood:

$$ l_{PL}(x;\theta) = \log\Big(\prod_i p(x_i \mid x_{N(i)};\theta)\Big) \approx l(x;\theta) $$

Justification: the Markov blanket — conditioned on its neighbors $x_{N(i)}$, node $x_i$ is independent of the rest of the graph.


We could equally have started with a green node.

Why not just sum over all nodes?


In a pairwise Markov Random Field:

$$ p(x_i \mid x_{N(i)};\theta) = \frac{\exp\big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\big)}{\sum_u \exp\big(\sum_{j \in N(i)} \theta_{i,j}(u, x_j)\big)} = \frac{1}{Z_i(x_{N(i)})} \exp\Big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\Big) $$

We now only compute a local partition function $Z_i(x_{N(i)})$: the cost is linear in $|x_i|$, the number of states of node $i$, rather than exponential in the size of the graph.
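As a minimal sketch of this computation (the function names and the toy three-node chain below are illustrative assumptions, not the lab's code), the local conditional only normalizes over the states of a single node:

```python
import numpy as np

def local_conditional(i, x, neighbors, theta, n_states):
    """p(x_i = u | x_N(i)) for every state u, normalizing over u only.

    theta[(i, j)] is an (n_states, n_states) table of pairwise potentials
    theta_{i,j}(x_i, x_j); neighbors[i] lists the Markov blanket of i.
    (Illustrative sketch: all names and the toy graph are assumptions.)
    """
    scores = np.zeros(n_states)
    for j in neighbors[i]:
        scores += theta[(i, j)][:, x[j]]   # theta_{i,j}(u, x_j) for all u
    scores -= scores.max()                 # numerical stability
    p = np.exp(scores)
    return p / p.sum()                     # divide by local Z_i(x_N(i))

# Toy 3-node chain 0 - 1 - 2 with binary states
rng = np.random.default_rng(0)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
theta = {(i, j): rng.normal(size=(2, 2)) for i in neighbors for j in neighbors[i]}
x = np.array([0, 1, 0])
p = local_conditional(1, x, neighbors, theta, n_states=2)
print(p)  # a length-2 distribution summing to 1
```

The normalizer is a sum over the two states of node 1 only, never over all $2^3$ joint configurations.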


Pseudo-likelihood learning is consistent! For infinite data:

$$ \arg\max_\theta\, l(D;\theta) = \arg\max_\theta\, l_{PL}(D;\theta) $$

Pseudo-likelihood of an Ising model

Figure: Ising model

$$ \theta_{i,j}(x_i, x_j) = W_{i,j}\, x_i x_j \qquad \theta_i(x_i) = U_i\, x_i $$

Figure: Ising model: two data points

Pseudo-likelihood objective. With $D$ data points $x = (x^1, \dots, x^D)$ and $N$ nodes per point, $x^d = (x^d_1, \dots, x^d_N)$:

$$ l_{PL}(x) = \sum_{d=1}^{D} \sum_{i=1}^{N} \log p(x^d_i \mid x^d_{N(i)}) $$
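The double sum above can be written directly. The sketch below (assumed names; it uses the conditional $p(x_i \mid x_{N(i)}) = \sigma(2 x_i (U_i + \sum_j W_{i,j} x_j))$ for $\pm 1$ spins) evaluates the objective on a tiny chain:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ising_pseudo_ll(X, W, U, neighbors):
    """l_PL = sum_d sum_i log p(x_i^d | x_N(i)^d) for an Ising model.

    X: (D, N) array of +/-1 spins; W: (N, N) symmetric couplings;
    U: (N,) biases; neighbors[i]: list of neighbors of node i.
    (Sketch with assumed names, not the lab's reference code.)
    """
    total = 0.0
    for x in X:                                  # loop over data points
        for i, nbrs in neighbors.items():        # loop over nodes
            a = U[i] + sum(W[i, j] * x[j] for j in nbrs)
            total += np.log(sigmoid(2 * x[i] * a))
    # each term normalizes over x_i only: no global partition function
    return total

# Tiny 3-node chain, two data points
neighbors = {0: [1], 1: [0, 2], 2: [1]}
W = np.zeros((3, 3)); W[0, 1] = W[1, 0] = 0.5; W[1, 2] = W[2, 1] = -0.3
U = np.array([0.1, 0.0, -0.2])
X = np.array([[1, 1, -1], [-1, 1, 1]])
print(ising_pseudo_ll(X, W, U, neighbors))  # a sum of log-probabilities, so negative
```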


Figure: Ising model: two data points (center node $X_P$ with neighbors $X_U$, $X_R$, $X_D$, $X_L$)

$$ \log p(x^1_p \mid x^1_u, x^1_r, x^1_d, x^1_l) = \;? \qquad \log p(x^2_p \mid x^2_u, x^2_r, x^2_d, x^2_l) = \;? $$


$$ \log p(x_p \mid x_u, x_r, x_d, x_l) = x_p(U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l) - \log\big(\exp(U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l) + \exp(-(U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l))\big) $$

Or, since $x_p \in \{-1, 1\}$ and $e^{a}/(e^{a} + e^{-a}) = \sigma(2a)$:

$$ p(x_p \mid x_u, x_r, x_d, x_l) = \sigma\big(2\, x_p (U_p + W_{p,u} x_u + W_{p,r} x_r + W_{p,d} x_d + W_{p,l} x_l)\big) $$

Hence:

$$ p(x^1_p \mid x^1_u, x^1_r, x^1_d, x^1_l) = \sigma\big(2(U_p + W_{p,u} - W_{p,r} - W_{p,d} + W_{p,l})\big) $$

$$ p(x^2_p \mid x^2_u, x^2_r, x^2_d, x^2_l) = \sigma\big(2(-U_p + W_{p,u} + W_{p,r} - W_{p,d} + W_{p,l})\big) $$
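A quick numerical check (a minimal sketch; `a` stands for the pre-activation $U_p + \sum_j W_{p,j} x_j$, and the values tried are arbitrary) confirms that the explicitly normalized exponential equals the logistic form $\sigma(2 x_p a)$ for $\pm 1$ spins:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cond_logsumexp(xp, a):
    """log p(x_p | neighbors) via explicit normalization over x_p in {-1, 1}."""
    return xp * a - np.log(np.exp(a) + np.exp(-a))

# Compare the two forms for several pre-activations and both spin values
for a in [-2.0, -0.3, 0.0, 1.7]:
    for xp in [-1, 1]:
        assert np.isclose(np.exp(cond_logsumexp(xp, a)), sigmoid(2 * xp * a))
print("sigmoid form matches the normalized exponential")
```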


Figure: Ising model: shared parameters

$$ W_{i,j} = W \qquad U_i = U $$


Then:

$$ p(x_i \mid x_{N_i}) = \sigma\Big(2\, x_i \big(U + W \sum_{j \in N_i} x_j\big)\Big) $$

And we have:

$$ \frac{\partial \log p(x_i \mid x_{N_i})}{\partial W} = 2\Big(1 - \sigma\big(2 x_i (U + W \textstyle\sum_{j \in N_i} x_j)\big)\Big)\, x_i \sum_{j \in N_i} x_j $$

$$ \frac{\partial \log p(x_i \mid x_{N_i})}{\partial U} = 2\Big(1 - \sigma\big(2 x_i (U + W \textstyle\sum_{j \in N_i} x_j)\big)\Big)\, x_i $$


Which gives us:

$$ \frac{\partial l_{PL}}{\partial W} = N \times D \times \sum_{u \in \{-1,1\},\, v} \hat{p}\big(x = u, \mathrm{sum}(N(x)) = v\big)\; 2\big(1 - \sigma(2u(U + Wv))\big)\, u\, v $$

$$ \frac{\partial l_{PL}}{\partial U} = N \times D \times \sum_{u \in \{-1,1\},\, v} \hat{p}\big(x = u, \mathrm{sum}(N(x)) = v\big)\; 2\big(1 - \sigma(2u(U + Wv))\big)\, u $$


There is no closed-form solution, so the parameters are found numerically, e.g. by gradient ascent.
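A minimal gradient-ascent sketch for the shared-parameter case (assumed names and a made-up toy dataset, not the lab's reference code; it uses the exact gradient of $\log \sigma(2 x_i (U + W \sum_j x_j))$ for $\pm 1$ spins):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_shared_ising(X, neighbors, lr=0.1, steps=200):
    """Gradient ascent on the pseudo-likelihood of a shared-parameter Ising model.

    X: (D, N) array of +/-1 spins; neighbors[i]: neighbors of node i.
    (Illustrative sketch under the assumptions stated in the text.)
    """
    W, U = 0.0, 0.0
    for _ in range(steps):
        gW = gU = 0.0
        for x in X:
            for i, nbrs in neighbors.items():
                s = sum(x[j] for j in nbrs)              # sum of neighbor spins
                r = 1.0 - sigmoid(2 * x[i] * (U + W * s))
                gW += 2 * r * x[i] * s                   # d log p / dW
                gU += 2 * r * x[i]                       # d log p / dU
        W += lr * gW / len(X)                            # average over data points
        U += lr * gU / len(X)
    return W, U

# Data from aligned configurations should push the coupling W upward
neighbors = {0: [1], 1: [0, 2], 2: [1]}
X = np.array([[1, 1, 1], [-1, -1, -1], [1, 1, 1]])
W, U = fit_shared_ising(X, neighbors)
print(W, U)
```

With perfectly aligned data the gradient in $W$ stays positive, so the learned coupling is positive, as expected for a ferromagnetic sample.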

Pseudo-likelihood: proof of consistency

Recall:

$$ l_{PL}(x;\theta) = \log\Big(\prod_i p(x_i \mid x_{N(i)};\theta)\Big) $$

And:

$$ p(x_i \mid x_{N(i)};\theta) = \frac{1}{Z_i(x_{N(i)})} \exp\Big(\sum_{j \in N(i)} \theta_{i,j}(x_i, x_j)\Big) $$


$$ l_{PL}(x;\theta) = \sum_i \Big( \sum_{j \in N(i)} \theta_{i,j}(x_i, x_j) - \log Z_i(x_{N(i)}) \Big) $$

Hence, for a dataset $D_M$ of $M$ points:

$$ l_{PL}(D_M;\theta) = \frac{1}{M} \sum_{m=1}^{M} l_{PL}(x^m;\theta) $$

$$ = \sum_i \Big( \sum_{j \in N(i)} \frac{1}{M} \sum_{m=1}^{M} \theta_{i,j}(x^m_i, x^m_j) \;-\; \frac{1}{M} \sum_{m=1}^{M} \log Z_i(x^m_{N(i)}) \Big) $$

$$ = \sum_i \Big( \sum_{j \in N(i)} \sum_{x_i, x_j} \hat{p}_M(x_i, x_j)\, \theta_{i,j}(x_i, x_j) \;-\; \sum_{x_{N(i)}} \hat{p}_M(x_{N(i)}) \log Z_i(x_{N(i)}) \Big) $$


The objective decomposes over $i$ and is concave in $\theta$.

Since $\frac{\partial \log Z_i(x_{N(i)};\theta)}{\partial \theta_{i,j}(x_i, x_j)} = p(x_i \mid x_{N(i)};\theta)$, this leads to (see note):

$$ p(x_i \mid x_{N(i)};\theta^*_{PL}) = \lim_{M \to \infty} \hat{p}_M(x_i \mid x_{N(i)}) $$

We have seen this property before!

Unique optimum case.

Exercises

M 24.1: Gibbs sampling from a 2D Gaussian

M 11.7: manual calculation of the M step for a GMM (EM algorithm)
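For the first exercise, one possible sketch of Gibbs sampling from a bivariate Gaussian (the means, standard deviations, and correlation below are made-up illustration values) alternates draws from the two full conditionals $x_1 \mid x_2$ and $x_2 \mid x_1$:

```python
import numpy as np

def gibbs_2d_gaussian(mu, sigma, rho, n_samples, rng):
    """Gibbs sampler for a bivariate Gaussian, using its full conditionals:
    x1 | x2 ~ N(mu1 + rho*(s1/s2)*(x2 - mu2), (1 - rho^2) * s1^2), and symmetrically.
    (Illustrative sketch; parameter values are arbitrary.)"""
    s1, s2 = sigma
    x1, x2 = mu                                  # start the chain at the mean
    cstd1 = np.sqrt((1 - rho**2) * s1**2)        # conditional std deviations
    cstd2 = np.sqrt((1 - rho**2) * s2**2)
    samples = np.empty((n_samples, 2))
    for t in range(n_samples):
        x1 = rng.normal(mu[0] + rho * s1 / s2 * (x2 - mu[1]), cstd1)
        x2 = rng.normal(mu[1] + rho * s2 / s1 * (x1 - mu[0]), cstd2)
        samples[t] = (x1, x2)
    return samples

rng = np.random.default_rng(0)
S = gibbs_2d_gaussian(mu=(1.0, -1.0), sigma=(1.0, 2.0), rho=0.8,
                      n_samples=20000, rng=rng)
print(S.mean(axis=0))          # roughly (1, -1)
print(np.corrcoef(S.T)[0, 1])  # roughly 0.8
```

Note that each conditional is a one-dimensional Gaussian, so every step is an exact draw; with high correlation the chain mixes more slowly.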
