# Beyond Junction Trees: Loopy Graphical Models

Tony Jebara

April 16, 2013

Graphical Models

Figure: a graphical model over $x_1, \dots, x_6$ with
$p(X) = \frac{1}{Z}\,\psi_{1,2}(x_1,x_2)\,\psi_{2,3}(x_2,x_3)\,\psi_{3,4,5}(x_3,x_4,x_5)\,\psi_{4,5,6}(x_4,x_5,x_6)$.

Key for efficient inference, learning, structured prediction

A compact way of representing p(X ) = p(x1, . . . , xk)

We want $p(x_i) = \sum_{X \setminus x_i} p(X)$, or $x_i^*$ where $p(X^*) \geq p(X)$ for all $X$

Factor graph: a more precise way to specify an undirected graphical model

Factor Graphs

Figure: a factor graph over $x_1, \dots, x_6$ with
$p(X) = \frac{1}{Z}\,\psi_{1,2}(x_1,x_2)\,\psi_{2,3}(x_2,x_3)\,\psi_{3,4,5}(x_3,x_4,x_5)\,\psi_{4,5,6}(x_4,x_5,x_6)$.

A bipartite graph $G$ with variable vertices $V = \{1, \dots, k\}$, factor vertices $W = \{1, \dots, l\}$, and edges $E$ between $V$ and $W$

With a universe of discrete random variables $X = \{x_1, \dots, x_k\}$, each associated with an element of $V$

With a set of strictly positive potential functions $\Psi = \{\psi_1, \dots, \psi_l\}$, each associated with an element of $W$

The factor graph represents $p(x_1, \dots, x_k) = \frac{1}{Z} \prod_{c \in W} \psi_c(X_c)$, where $X_c$ are the variables associated with the neighbors of $c$
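As a concrete sketch of this factorization, a factor graph over binary variables can be stored as a map from variable scopes to potential functions, with $Z$ computed by brute-force enumeration. The potential values below are arbitrary illustrative numbers, not from the lecture:

```python
import itertools

# Hypothetical binary factor graph matching the figure: factors over
# (x1,x2), (x2,x3), (x3,x4,x5), (x4,x5,x6), with variables indexed 0..5.
# The potential values are arbitrary positive numbers for illustration.
factors = {
    (0, 1): lambda x: 1.0 + x[0] * x[1],
    (1, 2): lambda x: 2.0 if x[0] == x[1] else 1.0,
    (2, 3, 4): lambda x: 1.0 + x[0] + x[1] * x[2],
    (3, 4, 5): lambda x: 1.5 if sum(x) % 2 == 0 else 0.5,
}

def unnormalized(assignment):
    """Product of all potential values at one joint assignment."""
    prod = 1.0
    for scope, psi in factors.items():
        prod *= psi(tuple(assignment[i] for i in scope))
    return prod

# Partition function Z: brute-force sum over all 2^6 joint assignments.
Z = sum(unnormalized(a) for a in itertools.product((0, 1), repeat=6))

def p(assignment):
    """Normalized probability p(X) = (1/Z) * prod_c psi_c(X_c)."""
    return unnormalized(assignment) / Z
```

The exponential cost of this enumeration is exactly what the message-passing algorithms below avoid.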

Junction Tree Algorithm

Given a graphical model representing p(X ) = p(x1, . . . , xk)

The Junction Tree Algorithm exactly finds $p(x_i)$ or $x_i^*$ via:

1. Moralize
2. Triangulate
3. Connect cliques into a junction tree (Kruskal)
4. Send messages via Collect and Distribute

Caveat: messages are exponential in clique size

Problem: triangulation can create giant cliques in many graphical models and structured prediction problems!

Belief Propagation Messages

The messages in the Junction Tree Algorithm basically are:

A message from a variable node $v$ to a factor node $u$ is the product of the messages from all other neighbouring factor nodes (except the recipient):

$$\mu_{v \to u}(x_v) = \prod_{u^* \in \mathrm{Ne}(v) \setminus \{u\}} \mu_{u^* \to v}(x_v)$$

A message from a factor node $u$ to a variable node $v$ is the product of the factor with messages from all other nodes, marginalised over all variables except $x_v$:

$$\mu_{u \to v}(x_v) = \sum_{X_u \setminus x_v} \psi_u(X_u) \prod_{v^* \in \mathrm{Ne}(u) \setminus \{v\}} \mu_{v^* \to u}(x_{v^*})$$

Upon convergence, $p(X_u) \propto \psi_u(X_u) \prod_{v \in \mathrm{Ne}(u)} \mu_{v \to u}(x_v)$
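To make the two message equations concrete, here is a minimal sum-product sketch on a tiny tree-shaped factor graph (two pairwise factors with made-up tables), checked against the brute-force marginal:

```python
# Tiny tree-shaped factor graph: x1 -- f12 -- x2 -- f23 -- x3, binary vars.
# Factor tables are arbitrary illustrative positive numbers.
f12 = [[1.0, 2.0], [3.0, 1.0]]   # psi(x1, x2), indexed [x1][x2]
f23 = [[2.0, 1.0], [1.0, 4.0]]   # psi(x2, x3), indexed [x2][x3]

# Leaves x1 and x3 have no other factor neighbors, so their
# variable-to-factor messages are identically 1.  The factor-to-variable
# messages into x2 then marginalize out the other variable:
m_f12_x2 = [sum(f12[a][b] for a in (0, 1)) for b in (0, 1)]   # sum over x1
m_f23_x2 = [sum(f23[b][c] for c in (0, 1)) for b in (0, 1)]   # sum over x3

# Belief at x2 is the product of incoming messages, then normalize.
belief = [m_f12_x2[b] * m_f23_x2[b] for b in (0, 1)]
total = sum(belief)
p_x2 = [v / total for v in belief]

# Brute-force check of the same marginal from the full joint.
joint = {(a, b, c): f12[a][b] * f23[b][c]
         for a in (0, 1) for b in (0, 1) for c in (0, 1)}
Zj = sum(joint.values())
p_x2_brute = [sum(v for (a, b, c), v in joint.items() if b == xb) / Zj
              for xb in (0, 1)]
```

On a tree like this, one inward pass of messages already yields the exact marginal.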

Loopy Belief Propagation

These messages lead to exact inference for trees!

To get the argmax from the junction tree, replace sum with max

What happens if we skip triangulation and don't have a tree?

We can still run and hope for the best!

This is called loopy belief propagation

Used in turbo coding, image processing, protein folding, etc.

It works remarkably well in many experiments!

We can also prove it will work... on PERFECT GRAPHS!

This includes trees, matchings, associative graphical models

Loopy Belief Propagation Experiments

Alarm Network and Results

Loopy Belief Propagation Experiments

Pyramid Network and Results

Loopy Belief Propagation Experiments

QMR-DT Network and Results

Canonical Problems for Graphical Models

Given a factorized distribution

$$p(x_1, \dots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c)$$

Inference: recover various marginals like p(xi) or p(xi , xj)

$$p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} p(x_1, \dots, x_n)$$

Estimation: find most likely configurations $x_i^*$ or $(x_1^*, \dots, x_n^*)$

$$x_i^* = \arg\max_{x_i} \max_{x_1} \cdots \max_{x_{i-1}} \max_{x_{i+1}} \cdots \max_{x_n} p(x_1, \dots, x_n)$$

In general both are NP-hard, but for chains and trees they are in P

Example for Chain

Given a chain-factorized distribution

$$p(x_1, \dots, x_5) = \frac{1}{Z}\,\psi(x_1, x_2)\,\psi(x_2, x_3)\,\psi(x_3, x_4)\,\psi(x_4, x_5)$$

Inference: recover various marginals like p(xi) or p(xi , xj)

$$p(x_5) \propto \sum_{x_1} \sum_{x_2} \psi(x_1, x_2) \sum_{x_3} \psi(x_2, x_3) \sum_{x_4} \psi(x_3, x_4)\,\psi(x_4, x_5)$$

Estimation: find most likely configurations $x_i^*$ or $(x_1^*, \dots, x_n^*)$

$$x_5^* = \arg\max_{x_5} \max_{x_1} \max_{x_2} \psi(x_1, x_2) \max_{x_3} \psi(x_2, x_3) \max_{x_4} \psi(x_3, x_4)\,\psi(x_4, x_5)$$

The work distributes and becomes efficient!
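A sketch of this distributed computation for binary variables, reusing one arbitrary pairwise table along the chain; the same loop with max in place of sum gives the estimation answer:

```python
# One arbitrary pairwise potential, reused as all four chain factors.
psi = [[5.0, 1.0], [0.5, 3.0]]   # psi(x_i, x_{i+1}), indexed [x_i][x_{i+1}]
chain = [psi] * 4                # the four factors linking x1..x5

# Inference: fold the sums in from the left; each step is O(|x|^2),
# so the chain costs O(n |x|^2) instead of O(|x|^n).
msg = [1.0, 1.0]
for f in chain:
    msg = [sum(msg[a] * f[a][b] for a in (0, 1)) for b in (0, 1)]
p_x5 = [v / sum(msg) for v in msg]

# Estimation: the identical recursion with sum replaced by max.
mmsg = [1.0, 1.0]
for f in chain:
    mmsg = [max(mmsg[a] * f[a][b] for a in (0, 1)) for b in (0, 1)]
x5_star = max((0, 1), key=lambda b: mmsg[b])
```

Each pass is exactly one sum-product or max-product update folded along the chain.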

Canonical Problems on Trees

The idea of distributed computation extends nicely to trees

On trees (which subsume chains) do a collect/distribute step

Alternatively, can perform distributed updates asynchronously

Each step is a sum-product or a max-product update


MAP Estimation

Let's focus on finding most probable configurations efficiently

$$X^* = \arg\max p(x_1, \dots, x_n)$$

Useful for image processing, protein folding, coding, etc.

Brute force requires $\prod_{i=1}^{n} |x_i|$ evaluations

Efficient for trees and singly connected graphs (Pearl 1988)

NP-hard for general graphs (Shimony 1994)

Loopy Max Product Message Passing

1. For each $x_i$ to each $X_c$: $m_{i \to c}^{t+1}(x_i) = \prod_{d \in \mathrm{Ne}(i) \setminus c} m_{d \to i}^{t}(x_i)$

2. For each $X_c$ to each $x_i$: $m_{c \to i}^{t+1}(x_i) = \max_{X_c \setminus x_i} \psi_c(X_c) \prod_{j \in c \setminus i} m_{j \to c}^{t}(x_j)$

3. Set $t = t + 1$ and go to 1 until convergence

4. Output $x_i^* = \arg\max_{x_i} \prod_{d \in \mathrm{Ne}(i)} m_{d \to i}^{t}(x_i)$
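A minimal sketch of these four steps, restricted to pairwise factors for brevity; `loopy_max_product` and its message dictionaries are illustrative names, not code from the lecture:

```python
def loopy_max_product(factors, n, card=2, iters=30):
    """Sketch of the loopy max-product updates above, for pairwise
    factors only.  factors maps a scope pair (i, j) to a table[x_i][x_j]."""
    # Messages m[(i, c)] variable->factor and m[(c, i)] factor->variable.
    m_ic = {(i, c): [1.0] * card for c in factors for i in c}
    m_ci = {(c, i): [1.0] * card for c in factors for i in c}
    for _ in range(iters):
        # Step 1: variable-to-factor = product of the other factors' messages.
        for (i, c) in m_ic:
            msg = [1.0] * card
            for d in factors:
                if i in d and d != c:
                    msg = [msg[x] * m_ci[(d, i)][x] for x in range(card)]
            s = sum(msg)
            m_ic[(i, c)] = [v / s for v in msg]
        # Step 2: factor-to-variable = max out the other variable of the pair.
        for (c, i) in m_ci:
            j = c[1] if c[0] == i else c[0]
            msg = []
            for xi in range(card):
                vals = []
                for xj in range(card):
                    t = factors[c][xi][xj] if c[0] == i else factors[c][xj][xi]
                    vals.append(t * m_ic[(j, c)][xj])
                msg.append(max(vals))
            s = sum(msg)
            m_ci[(c, i)] = [v / s for v in msg]
    # Step 4: decode x_i* from the product of incoming messages.
    out = []
    for i in range(n):
        b = [1.0] * card
        for c in factors:
            if i in c:
                b = [b[x] * m_ci[(c, i)][x] for x in range(card)]
        out.append(max(range(card), key=lambda x: b[x]))
    return out

# On a tree (here a 3-variable chain) the fixed point is exact.
chain = {(0, 1): [[5.0, 1.0], [0.5, 3.0]],
         (1, 2): [[5.0, 1.0], [0.5, 3.0]]}
x_star = loopy_max_product(chain, n=3)
```

Messages are normalized each round purely for numerical stability; on loopy graphs the loop runs a fixed number of iterations and convergence is not guaranteed.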

Simple and fast algorithm for MAP

Performs well in practice for images, turbo codes, etc.

Exact for trees (Pearl 1988)

Exact for matching (Bayati et al. 2005)

Exact for generalized b-matching (Huang and J 2007)

Bipartite Matching

|        | Motorola | Apple | IBM |
|--------|----------|-------|-----|
| laptop | 0$       | 2$    | 2$  |
| server | 0$       | 2$    | 3$  |
| phone  | 2$       | 3$    | 0$  |

$$C = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$$

Given $W$, solve $\max_{C \in \mathbb{B}^{n \times n}} \sum_{ij} W_{ij} C_{ij}$ such that $\sum_i C_{ij} = \sum_j C_{ij} = 1$

Classical Hungarian marriage problem: $O(n^3)$

Creates a very loopy graphical model

Max product takes $O(n^3)$ for exact MAP (Bayati et al. 2005)
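For the 3×3 example above, the optimum can be checked by brute force over all permutations (the Hungarian algorithm solves the same LP in $O(n^3)$ for large $n$); the matrix values are taken from the slide:

```python
import itertools

# Weight matrix from the slide: rows = {laptop, server, phone},
# columns = {Motorola, Apple, IBM}.
W = [[0, 2, 2],
     [0, 2, 3],
     [2, 3, 0]]
n = len(W)

def weight(perm):
    """Total weight of assigning row i to column perm[i]."""
    return sum(W[i][perm[i]] for i in range(n))

# Brute force over all n! permutations -- fine at n = 3.
best = max(itertools.permutations(range(n)), key=weight)

# Recover the permutation matrix C shown on the slide.
C = [[1 if best[i] == j else 0 for j in range(n)] for i in range(n)]
```

The optimum matches the slide's $C$: laptop→Apple, server→IBM, phone→Motorola.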

Bipartite Generalized Matching

|        | Motorola | Apple | IBM |
|--------|----------|-------|-----|
| laptop | 0$       | 2$    | 2$  |
| server | 0$       | 2$    | 3$  |
| phone  | 2$       | 3$    | 0$  |

$$C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

Given $W$, solve $\max_{C \in \mathbb{B}^{n \times n}} \sum_{ij} W_{ij} C_{ij}$ such that $\sum_i C_{ij} = \sum_j C_{ij} = b$

Combinatorial b-matching problem: $O(bn^3)$ (Google Adwords)

Creates a very loopy graphical model

Max product takes $O(bn^3)$ for exact MAP (Huang & J 2007)

Bipartite Generalized Matching

[Figure: bipartite graph with nodes u1, . . . , u4 and v1, . . . , v4]

Graph $G = (U, V, E)$ with $U = \{u_1, \dots, u_n\}$ and $V = \{v_1, \dots, v_n\}$, and $M(\cdot)$, the set of matched neighbors of node $u_i$ or $v_j$

Define $x_i \in \mathcal{X}$ and $y_j \in \mathcal{Y}$ where $x_i = M(u_i)$ and $y_j = M(v_j)$

Then
$$p(X, Y) = \frac{1}{Z} \prod_i \prod_j \psi(x_i, y_j) \prod_k \phi(x_k)\,\phi(y_k)$$
where $\phi(y_j) = \exp\big(\sum_{u_i \in y_j} W_{ij}\big)$ and $\psi(x_i, y_j) = \neg\,(v_j \in x_i \oplus u_i \in y_j)$

Bipartite Generalized Matching

Theorem (Huang & J 2007)

Max product on $G$ converges in $O(bn^3)$ time.

Proof.

Form the unwrapped tree $T$ of depth $\Omega(n)$; maximizing the belief at the root of $T$ is equivalent to maximizing the belief at the corresponding node in $G$.

[Figure: unwrapped tree rooted at u1 with children v1, . . . , v4, each expanding into copies of u2, u3, u4]

Theorem (Salez & Shah 2009)

Under mild assumptions, max product for 1-matching takes $O(n^2)$ time.

Bipartite Generalized Matching

Code at http://www.cs.columbia.edu/~jebara/code

Generalized Matching

[Figures: median running times of BP and the GOBLIN solver as functions of $n$ and $b$, with panels for $b = 5$ and $b = n/2$]

Applications:

- unipartite matching
- clustering (J & S 2006)
- classification (H & J 2007)
- collaborative filtering (H & J 2008)
- semisupervised learning (J et al. 2009)
- visualization (S & J 2009)

Max product is $O(n^2)$ on dense graphs (Salez & Shah 2009)

Much faster than other solvers

Unipartite Generalized Matching

Above is k-nearest neighbors with k = 2

Unipartite Generalized Matching

Above is unipartite b-matching with b = 2

Unipartite Generalized Matching

Left is k-nearest neighbors, right is unipartite b-matching.

Unipartite Generalized Matching

|    | p1 | p2 | p3 | p4 |
|----|----|----|----|----|
| p1 | 0  | 2  | 1  | 2  |
| p2 | 2  | 0  | 2  | 1  |
| p3 | 1  | 2  | 0  | 2  |
| p4 | 2  | 1  | 2  | 0  |

$$C = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$$

$$\max_{C \in \mathbb{B}^{n \times n},\, C_{ii} = 0} \sum_{ij} W_{ij} C_{ij} \quad \text{such that} \quad \sum_i C_{ij} = b,\; C_{ij} = C_{ji}$$

Combinatorial unipartite matching is efficient (Edmonds 1965)

Makes an LP with exponentially many blossom inequalities

Max product exact if LP is integral (Sanghavi et al. 2008)

$$p(X) = \prod_{i \in V} \delta\Big(\sum_{j \in \mathrm{Ne}(i)} x_{ij} \leq 1\Big) \prod_{ij \in E} \exp(W_{ij} x_{ij})$$
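A brute-force sketch of the unipartite b-matching objective on the 4-node example above (Edmonds' blossom algorithm solves this in polynomial time; enumeration is only feasible at this toy size):

```python
import itertools

# Symmetric weight matrix from the slide (p1..p4); C_ii = 0 forbids self-loops.
W = [[0, 2, 1, 2],
     [2, 0, 2, 1],
     [1, 2, 0, 2],
     [2, 1, 2, 0]]
n, b = 4, 2
edges = [(i, j) for i in range(n) for j in range(i + 1, n)]

# Brute force over edge subsets: keep those where every node has degree b.
best, best_w = None, -1
for r in range(len(edges) + 1):
    for subset in itertools.combinations(edges, r):
        deg = [0] * n
        for i, j in subset:
            deg[i] += 1
            deg[j] += 1
        if all(d == b for d in deg):
            w = sum(W[i][j] for i, j in subset)
            if w > best_w:
                best, best_w = subset, w
```

The winner is the 4-cycle p1-p2-p3-p4 using all the weight-2 edges, matching the $C$ shown on the slide.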

From Integral LP to Perfect Graphs

Max product and exact MAP depend on the LP's integrality

Matchings have special integral LPs (Edmonds 1965)

How to generalize beyond matchings?

Perfect graphs guarantee LP integrality!

What's a Perfect Graph?

In 1960, Claude Berge introduced perfect graphs and two conjectures

Perfect iff every induced subgraph has clique number = chromatic number

Weak conjecture: $G$ is perfect iff its complement is perfect

Strong conjecture: a graph is perfect iff it is a Berge graph

Weak perfect graph theorem (Lovász 1972)

Link between perfection and integral LPs (Lovász 1972)

Strong perfect graph theorem (SPGT) open for 4+ decades

What's a Perfect Graph?

SPGT Proof (Chudnovsky, Robertson, Seymour, Thomas 2003)

Berge passes away shortly after hearing of the proof

Many NP-hard problems are polynomially solvable for perfect graphs:

- Graph coloring
- Maximum clique
- Maximum independent set

What's a Perfect Graph?

[Figure: two example graphs on x1, . . . , x5]
