
  • Beyond Junction Trees: Loopy Graphical Models

    Tony Jebara

    April 16, 2013

  • Graphical Models

    Figure: a graphical model over x1, . . . , x6 representing
    $p(X) = \frac{1}{Z}\,\psi_{1,2}(x_1,x_2)\,\psi_{2,3}(x_2,x_3)\,\psi_{3,4,5}(x_3,x_4,x_5)\,\psi_{4,5,6}(x_4,x_5,x_6)$.

    Key for efficient inference, learning, structured prediction

    A compact way of representing $p(X) = p(x_1, \ldots, x_k)$

    We want the marginals $p(x_i) = \sum_{X \setminus x_i} p(X)$ or the maximizers $x_i^*$ where $p(X^*) \geq p(X)$

    Factor graph: a more precise way to specify an undirected graphical model

  • Factor Graphs

    Figure: the factor graph for
    $p(X) = \frac{1}{Z}\,\psi_{1,2}(x_1,x_2)\,\psi_{2,3}(x_2,x_3)\,\psi_{3,4,5}(x_3,x_4,x_5)\,\psi_{4,5,6}(x_4,x_5,x_6)$.

    A bipartite graph G with variable vertices V = {1, . . . , k}, factor vertices W = {1, . . . , l}, and edges E between V and W

    With a universe of discrete random variables X = {x_1, . . . , x_k}, each associated with an element of V

    With a set of strictly positive potential functions $\Psi = \{\psi_1, \ldots, \psi_l\}$, each associated with an element of W

    The factor graph represents $p(x_1, \ldots, x_k) = \frac{1}{Z} \prod_{c \in W} \psi_c(X_c)$ where $X_c$ are the variables associated with the neighbors of c
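    As a concrete illustration (a minimal sketch, not from the slides): this definition maps directly onto a dictionary from factor scopes to potential tables. The scopes and random tables below are illustrative assumptions for the 6-variable example; the brute-force sum over all configurations is exactly the exponential cost that message passing avoids.

```python
import itertools
import numpy as np

k = 6                                  # variables x1..x6, each binary
rng = np.random.default_rng(0)
# Each factor scope (a tuple of variable indices) maps to a strictly
# positive potential table, matching the definition above.
factors = {
    (0, 1):    rng.random((2, 2)) + 0.1,
    (1, 2):    rng.random((2, 2)) + 0.1,
    (2, 3, 4): rng.random((2, 2, 2)) + 0.1,
    (3, 4, 5): rng.random((2, 2, 2)) + 0.1,
}

def unnormalized_p(x):
    """Product of psi_c(X_c) over all factors for one configuration x."""
    p = 1.0
    for scope, psi in factors.items():
        p *= psi[tuple(x[i] for i in scope)]
    return p

# The partition function Z sums over all 2^k configurations: exponential
# in k, which is why we want message passing instead.
Z = sum(unnormalized_p(x) for x in itertools.product([0, 1], repeat=k))
print(Z)
```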

  • Junction Tree Algorithm

    Given a graphical model representing p(X ) = p(x1, . . . , xk)

    Junction Tree Algorithm exactly finds p(xi) or xi via:

    1. Moralize
    2. Triangulate
    3. Connect cliques into a junction tree (Kruskal)
    4. Send messages via Collect and Distribute

    Caveat: messages are exponential in clique size

    Problem: triangulation creates giant cliques in many graphical models and structured prediction problems!

  • Belief Propagation Messages

    The messages in the Junction Tree Algorithm basically are:

    A message from a variable node v to a factor node u is the product of the messages from all other neighbouring factor nodes (except the recipient):

    $\mu_{v \to u}(x_v) = \prod_{u^* \in \mathrm{Ne}(v) \setminus \{u\}} \mu_{u^* \to v}(x_v)$

    A message from a factor node u to a variable node v is the product of the factor with messages from all other nodes, marginalised over all variables except $x_v$:

    $\mu_{u \to v}(x_v) = \sum_{X_u : x'_v = x_v} \psi_u(X_u) \prod_{v^* \in \mathrm{Ne}(u) \setminus \{v\}} \mu_{v^* \to u}(x_{v^*})$

    Upon convergence, $p(X_u) \propto \psi_u(X_u) \prod_{v \in \mathrm{Ne}(u)} \mu_{v \to u}(x_v)$
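    A minimal sketch of these two updates for pairwise factors over binary variables (the names and dict-based message layout are my own, not from the notes): messages are keyed by (sender, receiver), and each update is a product or a matrix-vector marginalization.

```python
import numpy as np

def var_to_factor(v, u, var_neighbors, msgs_f2v, n_states=2):
    """mu_{v->u}(x_v): product of messages into v from all factors except u."""
    m = np.ones(n_states)
    for f in var_neighbors[v]:
        if f != u:
            m *= msgs_f2v[(f, v)]
    return m / m.sum()                    # normalize for numerical stability

def factor_to_var(u, v, scope, psi, msgs_v2f):
    """mu_{u->v}(x_v): marginalize psi times incoming messages (pairwise case)."""
    a, b = scope                          # factor u couples variables a and b
    if v == a:                            # sum out x_b
        return psi @ msgs_v2f[(b, u)]
    return psi.T @ msgs_v2f[(a, u)]       # sum out x_a

# Tiny demo: one factor psi(x1, x2) with uniform incoming messages.
psi = np.array([[1.0, 2.0], [3.0, 4.0]])
msgs_v2f = {(1, 'u'): np.array([0.5, 0.5]), (2, 'u'): np.array([0.5, 0.5])}
print(factor_to_var('u', 1, (1, 2), psi, msgs_v2f))   # proportional to [1.5, 3.5]
```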

  • Loopy Belief Propagation

    These messages lead to exact inference for trees!

    To get the argmax from the junction tree, replace sum with max (max-product)

    What happens if we skip triangulation and don't have a tree?

    We can still run message passing and hope for the best!

    This is called loopy belief propagation

    Used in turbo coding, image processing, protein folding, etc.

    It works remarkably well in many experiments!

    We can also prove it will work... on PERFECT GRAPHS!

    This includes trees, matchings, and associative graphical models

  • Loopy Belief Propagation Experiments

    Alarm Network and Results

  • Loopy Belief Propagation Experiments

    Pyramid Network and Results

  • Loopy Belief Propagation Experiments

    QMR-DT Network and Results

  • Canonical Problems for Graphical Models

    Given a factorized distribution

    $p(x_1, \ldots, x_n) = \frac{1}{Z} \prod_{c \in C} \psi_c(X_c)$

    Inference: recover various marginals like $p(x_i)$ or $p(x_i, x_j)$

    $p(x_i) = \sum_{x_1} \cdots \sum_{x_{i-1}} \sum_{x_{i+1}} \cdots \sum_{x_n} p(x_1, \ldots, x_n)$

    Estimation: find most likely configurations $x_i^*$ or $(x_1^*, \ldots, x_n^*)$

    $x_i^* = \arg\max_{x_i} \max_{x_1} \cdots \max_{x_{i-1}} \max_{x_{i+1}} \cdots \max_{x_n} p(x_1, \ldots, x_n)$

    In general both problems are NP-hard, but for chains and trees they are solvable in polynomial time
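    Both problems can be written down in a few lines by brute force, which makes the exponential cost explicit. A minimal sketch assuming binary variables and any function `unnorm` returning the unnormalized joint (as in the factor-graph sketch earlier):

```python
import itertools

def marginal(i, k, unnorm):
    """p(x_i) by summing the unnormalized joint over all other variables."""
    p = [0.0, 0.0]
    for x in itertools.product([0, 1], repeat=k):
        p[x[i]] += unnorm(x)
    total = sum(p)
    return [v / total for v in p]

def map_config(k, unnorm):
    """Most likely configuration by scanning all 2^k settings (NP-hard in general)."""
    return max(itertools.product([0, 1], repeat=k), key=unnorm)
```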

  • Example for Chain

    Given a chain-factorized distribution

    $p(x_1, \ldots, x_5) = \frac{1}{Z}\,\psi(x_1,x_2)\,\psi(x_2,x_3)\,\psi(x_3,x_4)\,\psi(x_4,x_5)$

    Inference: recover various marginals like $p(x_i)$ or $p(x_i, x_j)$

    $p(x_5) \propto \sum_{x_1} \sum_{x_2} \psi(x_1,x_2) \sum_{x_3} \psi(x_2,x_3) \sum_{x_4} \psi(x_3,x_4)\,\psi(x_4,x_5)$

    Estimation: find most likely configurations $x_i^*$ or $(x_1^*, \ldots, x_n^*)$

    $x_5^* = \arg\max_{x_5} \max_{x_1} \max_{x_2} \psi(x_1,x_2) \max_{x_3} \psi(x_2,x_3) \max_{x_4} \psi(x_3,x_4)\,\psi(x_4,x_5)$

    The work distributes and becomes efficient!
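    To make the distributed computation concrete, here is a minimal sketch of both passes for this 5-variable chain with binary states and made-up random potentials (psi[t] plays the role of $\psi(x_{t+1}, x_{t+2})$); the cost is $O(n |x|^2)$ rather than $O(|x|^n)$.

```python
import numpy as np

rng = np.random.default_rng(1)
psi = [rng.random((2, 2)) + 0.1 for _ in range(4)]   # psi(x1,x2) ... psi(x4,x5)

# Inference: p(x5) by pushing each sum inward (sum-product).
m = np.ones(2)
for table in psi:
    m = m @ table              # m[x_next] = sum_x m[x] * psi[x, x_next]
p_x5 = m / m.sum()

# Estimation: x5* by replacing every sum with a max (max-product).
m = np.ones(2)
for table in psi:
    m = (m[:, None] * table).max(axis=0)
x5_star = int(m.argmax())
print(p_x5, x5_star)
```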

  • Canonical Problems on Trees

    The idea of distributed computation extends nicely to trees

    On trees (which subsume chains) do a collect/distribute step

    Alternatively, can perform distributed updates asynchronously

    Each step is a sum-product or a max-product update


  • MAP Estimation

    Let's focus on finding the most probable configuration efficiently

    $X^* = \arg\max_{x_1, \ldots, x_n} p(x_1, \ldots, x_n)$

    Useful for image processing, protein folding, coding, etc.

    Brute force requires $\prod_{i=1}^{n} |x_i|$ evaluations

    Efficient for trees and singly connected graphs (Pearl 1988)

    NP-hard for general graphs (Shimony 1994)

  • Loopy Max Product Message Passing

    1. For each $x_i$ to each $X_c$: $m^{t+1}_{i \to c} = \prod_{d \in \mathrm{Ne}(i) \setminus c} m^t_{d \to i}$

    2. For each $X_c$ to each $x_i$: $m^{t+1}_{c \to i} = \max_{X_c \setminus x_i} \psi_c(X_c) \prod_{j \in c \setminus i} m^t_{j \to c}$

    3. Set t = t + 1 and go to 1 until convergence

    4. Output $x_i^* = \arg\max_{x_i} \prod_{d \in \mathrm{Ne}(i)} m^t_{d \to i}$

    Simple and fast algorithm for MAP

    Performs well in practice for images, turbo codes, etc.

    Exact for trees (Pearl 1988)

    Exact for matching (Bayati et al. 2005)

    Exact for generalized b-matching (Huang & J 2007)
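    A minimal sketch of steps 1-4 for pairwise factors over binary variables (the dict-based layout and normalization are my choices, not prescribed by the slides); on loopy graphs the fixed iteration count is a heuristic stopping rule, since convergence is only hoped for.

```python
import numpy as np

def loopy_max_product(n_vars, factors, iters=50):
    """factors: dict mapping a pair (i, j) to a table psi[x_i, x_j]."""
    m_vf = {(i, c): np.ones(2) for c in factors for i in c}   # step 1 messages
    m_fv = {(c, i): np.ones(2) for c in factors for i in c}   # step 2 messages
    nbrs = {i: [c for c in factors if i in c] for i in range(n_vars)}
    for _ in range(iters):
        for (i, c) in m_vf:                  # step 1: variable -> factor
            m = np.ones(2)
            for d in nbrs[i]:
                if d != c:
                    m *= m_fv[(d, i)]
            m_vf[(i, c)] = m / m.sum()
        for c, psi in factors.items():       # step 2: factor -> variable, with max
            a, b = c
            m_fv[(c, a)] = (psi * m_vf[(b, c)][None, :]).max(axis=1)
            m_fv[(c, b)] = (psi * m_vf[(a, c)][:, None]).max(axis=0)
    # step 4: decode each variable from the product of its incoming messages
    return [int(np.prod([m_fv[(d, i)] for d in nbrs[i]], axis=0).argmax())
            for i in range(n_vars)]

# On a tree this is exact: the chain below decodes to [0, 1, 0].
factors = {(0, 1): np.array([[1., 3.], [2., 1.]]),
           (1, 2): np.array([[1., 2.], [4., 1.]])}
print(loopy_max_product(3, factors))
```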

  • Bipartite Matching

    Weights W (dollar values):

                 Motorola  Apple  IBM
        laptop      $0      $2    $2
        server      $0      $2    $3
        phone       $2      $3    $0

    $C = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$

    Given W, solve $\max_{C \in \mathbb{B}^{n \times n}} \sum_{ij} W_{ij} C_{ij}$ such that $\sum_i C_{ij} = \sum_j C_{ij} = 1$

    Classical Hungarian marriage problem: $O(n^3)$

    Creates a very loopy graphical model

    Max product takes $O(n^3)$ for exact MAP (Bayati et al. 2005)
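    The 3x3 example above can be checked with an off-the-shelf Hungarian solver; scipy's `linear_sum_assignment` recovers the same C shown on the slide, with total value 7.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

W = np.array([[0, 2, 2],     # rows: laptop, server, phone
              [0, 2, 3],     # cols: Motorola, Apple, IBM
              [2, 3, 0]])
rows, cols = linear_sum_assignment(W, maximize=True)
print(list(zip(rows, cols)), W[rows, cols].sum())
# laptop->Apple, server->IBM, phone->Motorola: total value 7
```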

  • Bipartite Generalized Matching

    Weights W (dollar values):

                 Motorola  Apple  IBM
        laptop      $0      $2    $2
        server      $0      $2    $3
        phone       $2      $3    $0

    $C = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$

    Given W, solve $\max_{C \in \mathbb{B}^{n \times n}} \sum_{ij} W_{ij} C_{ij}$ such that $\sum_i C_{ij} = \sum_j C_{ij} = b$

    Combinatorial b-matching problem: $O(bn^3)$ (e.g. Google AdWords)

    Creates a very loopy graphical model

    Max product takes $O(bn^3)$ for exact MAP (Huang & J 2007)
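    Because the bipartite constraint matrix is totally unimodular, the LP relaxation of this problem is integral, so even a generic LP solver recovers a 0/1 optimum. A minimal sketch on the 3x3 example with b = 2 (scipy's `linprog` minimizes, so the weights are negated):

```python
import numpy as np
from scipy.optimize import linprog

W = np.array([[0, 2, 2], [0, 2, 3], [2, 3, 0]], dtype=float)
n, b = 3, 2
A_eq = np.zeros((2 * n, n * n))          # C is flattened row-major
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1       # row constraint: sum_j C_ij = b
    A_eq[n + i, i::n] = 1                # column constraint: sum_i C_ij = b
res = linprog(-W.ravel(), A_eq=A_eq, b_eq=np.full(2 * n, b),
              bounds=[(0, 1)] * (n * n))
print(res.x.reshape(n, n).round())       # an optimal 0/1 b-matching (value 12)
```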

  • Bipartite Generalized Matching

    Figure: bipartite graph with nodes u1, . . . , u4 on top and v1, . . . , v4 below.

    Graph G = (U, V, E) with U = {u_1, . . . , u_n} and V = {v_1, . . . , v_n}, and neighborhoods Ne(u_i) and Ne(v_j)

    Define $x_i \in X$ and $y_j \in Y$ where $x_i \subseteq \mathrm{Ne}(u_i)$ and $y_j \subseteq \mathrm{Ne}(v_j)$

    Then $p(X, Y) = \frac{1}{Z} \prod_i \prod_j \psi(x_i, y_j) \prod_k \phi(x_k)\,\phi(y_k)$ where $\phi(y_j) = \exp\left(\sum_{u_i \in y_j} W_{ij}\right)$ and $\psi(x_i, y_j) = \neg(v_j \in x_i \oplus u_i \in y_j)$

  • Bipartite Generalized Matching

    Theorem (Huang & J 2007)

    Max product on G converges in $O(bn^3)$ time.

    Proof.

    Form the unwrapped tree T of depth $\Omega(n)$; maximizing the belief at the root of T is equivalent to maximizing the belief at the corresponding node in G.

    Figure: unwrapped tree rooted at u1 with children v1, . . . , v4, each in turn with children u2, u3, u4, and so on.

    Theorem (Salez & Shah 2009)

    Under mild assumptions, max product 1-matching is $O(n^2)$.

  • Bipartite Generalized Matching

    Code at http://www.cs.columbia.edu/~jebara/code

  • Generalized Matching

    Figure: median running times of BP versus the GOBLIN solver as functions of n and b. Panels: "BP median running time", "GOBLIN median running time", "Median running time when b = 5" (plotting $t^{1/3}$ vs. n), and "Median running time when b = n/2" (plotting $t^{1/4}$ vs. n).

    Applications:
        unipartite matching
        clustering (J & S 2006)
        classification (H & J 2007)
        collaborative filtering (H & J 2008)
        semisupervised learning (J et al. 2009)
        visualization (S & J 2009)

    Max product is $O(n^2)$ on dense graphs (Salez & Shah 2009)

    Much faster than other solvers

  • Unipartite Generalized Matching

    Above is k-nearest neighbors with k = 2

  • Unipartite Generalized Matching

    Above is unipartite b-matching with b = 2

  • Unipartite Generalized Matching

    Left is k-nearest neighbors, right is unipartite b-matching.

  • Unipartite Generalized Matching

    Pairwise weights W between points p1, . . . , p4:

             p1  p2  p3  p4
        p1    0   2   1   2
        p2    2   0   2   1
        p3    1   2   0   2
        p4    2   1   2   0

    $C = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}$

    $\max_{C \in \mathbb{B}^{n \times n},\, C_{ii} = 0} \sum_{ij} W_{ij} C_{ij}$ such that $\sum_i C_{ij} = b$ and $C_{ij} = C_{ji}$

    Combinatorial unipartite matching is efficient (Edmonds 1965)

    Makes an LP with exponentially many blossom inequalities

    Max product exact if LP is integral (Sanghavi et al. 2008)

    $p(X) = \prod_{i \in V} \delta\!\left(\sum_{j \in \mathrm{Ne}(i)} x_{ij} \leq 1\right) \prod_{ij \in E} \exp(W_{ij} x_{ij})$
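    For the b = 1 case, Edmonds' blossom algorithm is available off the shelf; a minimal sketch on the 4-point example above using networkx (the slide's b = 2 case needs a dedicated b-matching solver such as the BP code linked earlier):

```python
import networkx as nx

W = [[0, 2, 1, 2], [2, 0, 2, 1], [1, 2, 0, 2], [2, 1, 2, 0]]
G = nx.Graph()
for i in range(4):
    for j in range(i + 1, 4):
        G.add_edge(i, j, weight=W[i][j])
# Edmonds 1965: max-weight matching in polynomial time despite the
# exponentially many blossom inequalities in the LP.
print(nx.max_weight_matching(G, maxcardinality=True))
# e.g. {(0, 1), (2, 3)} with total weight 4
```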

  • From Integral LP to Perfect Graphs

    Max product and exact MAP depend on the LP's integrality

    Matchings have special integral LPs (Edmonds 1965)

    How to generalize beyond matchings?

    Perfect graphs guarantee LP integrality!

  • What's a Perfect Graph?

    In 1960, Claude Berge introduces perfect graphs and two conjectures

    Perfect iff every induced subgraph has clique number = coloring number

    Weak conjecture: G is perfect iff its complement is perfect

    Strong conjecture: all perfect graphs are Berge graphs

    Weak perfect graph theorem (Lovász 1972)

    Link between perfection and integral LPs (Lovász 1972)

    Strong perfect graph theorem (SPGT) open for 4+ decades

  • What's a Perfect Graph?

    SPGT Proof (Chudnovsky, Robertson, Seymour, Thomas 2003)

    Berge passes away shortly after hearing of the proof

    Many NP-hard problems are polynomially solvable for perfect graphs:

        Graph coloring
        Maximum clique
        Maximum independent set

  • What's a Perfect Graph?

    Figure: two five-vertex graphs on x1, . . . , x5 (slide figure).