Approximate Inference in Graphical Models using LP Relaxations

David Sontag

Based on joint work with Tommi Jaakkola, Amir Globerson, Talya Meltzer, and Yair Weiss

Small scale validation cont'd • We can also directly examine changes due to OR1 knock-out

Preliminary validation • The λ-phage (bacteria-infecting virus)

[Figure: the λ-switch operator region with binding sites OR3, OR2, OR1 and the genes cI and cro; cI2 and RNA polymerase (RNAp) compete for binding (Arkin et al. 1998).]

• We can find the equilibrium of the game (binding frequencies) as a function of overall protein concentrations.

[Figure 3: Predicted protein binding to sites OR3, OR2, and mutated OR1 for increasing amounts of cI2. Panels (a) OR3, (b) OR2, (c) OR1 plot the time-averaged binding frequency of repressor and RNA polymerase against frepressor/fRNA.]

Only when the amount of cI2 became sufficiently high do we find cI2 at the mutated OR1 as well. Note, however, that cI2 inhibits transcription at OR3 prior to occupying OR1. Thus the binding at the mutated OR1 could not be observed without interventions.

7 Discussion

We believe the game theoretic approach provides a compelling causal abstraction of biological systems with resource constraints. The model is complete with provably convergent algorithms for finding equilibria on a genome-wide scale.

The results from the small scale application are encouraging. Our model successfully reproduces known behavior of the λ-switch on the basis of molecular-level competition and resource constraints, without the need to assume protein-protein interactions between cI2 dimers and between cI2 and RNA polymerase. Even in the context of this well-known sub-system, however, few quantitative experimental results are available about binding. Proper validation and use of our model therefore relies on estimating the game parameters from available protein-DNA binding data (in progress). Once the game parameters are known, the model provides valid predictions for a number of possible perturbations to the system, including changing nuclear concentrations and knock-outs.

Acknowledgments

This work was supported in part by NIH grant GM68762 and by NSF ITR grant 0428715. Luis Perez-Breva is a "Fundacion Rafael del Pino" Fellow.

References

[1] Adam Arkin, John Ross, and Harley H. McAdams. Stochastic kinetic analysis of developmental pathway bifurcation in phage λ-infected Escherichia coli cells. Genetics, 149:1633–1648, August 1998.

[2] Kenneth J. Arrow and Gerard Debreu. Existence of an equilibrium for a competitive economy. Econometrica, 22(3):265–290, July 1954.

[3] Z. Bar-Joseph, G. Gerber, T. Lee, N. Rinaldi, J. Yoo, F. Robert, B. Gordon, E. Fraenkel, T. Jaakkola, R. Young, and D. Gifford. Computational discovery of gene modules and regulatory networks. Nature Biotechnology, 21(11):1337–1342, 2003.

[4] Otto G. Berg, Robert B. Winter, and Peter H. von Hippel. Diffusion-driven mechanisms of protein translocation on nucleic acids. 1. Models and theory. Biochemistry, 20(24):6929–48, November 1981.

[5] Drew Fudenberg and Jean Tirole. Game Theory. The MIT Press, 1991.


• Predictions are again qualitatively correct


MAP in Undirected Graphical Models

[Figure: a pairwise MRF G=(V,E) with variables xi, xj, xk, xh and edge potentials θij(xi, xj), θjk(xj, xk), θih(xi, xh), illustrated on a protein backbone with its side-chains.]

Real-world problems:

Protein design

Stereo vision

Find most likely assignment:
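The objective (shown on the slide as an image) is the standard MAP objective for a pairwise MRF; node potentials θi are written alongside the edge potentials θij for generality:

\[
x^{\mathrm{MAP}} = \arg\max_{x} \ \sum_{i \in V} \theta_i(x_i) \;+\; \sum_{ij \in E} \theta_{ij}(x_i, x_j).
\]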

How to solve MAP?

  MAP is known to be NP-hard (e.g., MAP on binary MRFs is equivalent to Max-Cut)

  Real-world MAP problems are not necessarily as hard as theoretical worst case

How to solve MAP?

  New toolkit: Message-passing algorithms based on linear programming relaxations (Schlesinger '76, Kolmogorov & Wainwright '05, Vontobel & Koetter '06, Johnson et al. '07, Komodakis et al. '07, Globerson & Jaakkola '08, …)

  Solves exactly when the LP relaxation is tight: trees, binary submodular MRFs, and matchings

  In practice, we seldom have these structures

  By tightening the relaxation (problem-specific), we can solve hard real-world problems exactly

We can formulate the MAP problem as a linear program

MAP as a linear program

The marginal polytope constrains the variables μ to be marginals of some distribution:

Very many constraints!

[Figure: the marginal polytope with the objective direction; vertices correspond to assignments, and the optimum is attained at a vertex x^MAP.]

where the variables are defined over edges.
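Written out explicitly (a reconstruction of the slide's equation, with node and edge marginals μi, μij as the LP variables):

\[
\mathrm{MAP}(\theta) = \max_{\mu \in \mathcal{M}(G)} \ \sum_{i \in V} \sum_{x_i} \theta_i(x_i)\, \mu_i(x_i) \;+\; \sum_{ij \in E} \sum_{x_i, x_j} \theta_{ij}(x_i, x_j)\, \mu_{ij}(x_i, x_j),
\]

where the marginal polytope is

\[
\mathcal{M}(G) = \Big\{ \mu \;:\; \exists\, p(x) \text{ with } \mu_i(x_i) = \textstyle\sum_{x_{V \setminus i}} p(x), \ \ \mu_{ij}(x_i, x_j) = \textstyle\sum_{x_{V \setminus \{i,j\}}} p(x) \Big\}.
\]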

Relaxing the MAP LP

Simplest outer bound: keep the MAP objective, but maximize over a relaxation of the marginal polytope, requiring only that partial pairwise consistency constraints hold (spelled out below). The value of the relaxation upper-bounds the MAP objective.
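A reconstruction of this local (pairwise) consistency outer bound, in the notation above:

\[
\mathcal{M}_L(G) = \Big\{ \mu \ge 0 \;:\; \sum_{x_i} \mu_i(x_i) = 1 \;\; \forall i \in V, \qquad \sum_{x_j} \mu_{ij}(x_i, x_j) = \mu_i(x_i) \;\; \forall ij \in E,\ x_i \Big\},
\]

so that \(\max_{\mu \in \mathcal{M}_L(G)} \mu \cdot \theta \ \ge\ \mathrm{MAP}(\theta)\), with equality when the relaxation is tight.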

Tightening the LP

[Illustration: a 3×3 grid MRF with variables 1–9.]

We tighten the relaxation step by step, strengthening the "such that" constraints of the LP: from partial pairwise consistency, to pairwise consistency, to triplet consistency over clusters {i, j, k}, to quadruplet consistency over clusters {i, j, k, l}, and so on (the triplet constraints are spelled out below). At every step the value of the relaxation remains an upper bound on the MAP objective, and it gets tighter.
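A reconstruction of the triplet consistency constraints being added: for each chosen cluster c = {i, j, k}, a cluster variable μijk(xi, xj, xk) ≥ 0 is introduced and required to marginalize down to the edge variables,

\[
\sum_{x_k} \mu_{ijk}(x_i, x_j, x_k) = \mu_{ij}(x_i, x_j) \quad \forall\, x_i, x_j,
\]

and analogously for the pairs (i, k) and (j, k); quadruplet clusters add the corresponding four-variable constraints.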

Great! But…

  Can we efficiently solve the LP?
  What clusters to add?
  How do we avoid re-solving?

(Might be "lucky" and solve earlier.)

Our solution

  Can we efficiently solve the LP? We work in one of the dual LPs (Globerson & Jaakkola '07). The dual can be solved by an efficient message-passing algorithm, which corresponds to coordinate descent (one possible form of the update is given below).

  What cluster to add next? We propose a greedy bound minimization algorithm: add clusters with guaranteed improvement, so the upper bound gets tighter.

  How do we avoid re-solving? "Warm start" of the new messages using the old messages.
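For concreteness, one standard form of such a block coordinate-descent edge update (a reconstruction in the spirit of the MPLP algorithm of Globerson & Jaakkola, not copied from the slides), where λij→i denotes the message from edge ij into node i and N(i) the neighbors of i:

\[
\lambda_{ij \to i}(x_i) \ \leftarrow\ -\tfrac{1}{2}\, \lambda_i^{-j}(x_i) \;+\; \tfrac{1}{2} \max_{x_j} \Big[ \theta_{ij}(x_i, x_j) + \lambda_j^{-i}(x_j) \Big],
\qquad
\lambda_i^{-j}(x_i) = \theta_i(x_i) + \sum_{k \in N(i) \setminus j} \lambda_{ik \to i}(x_i).
\]

Each such update can only decrease the dual objective, which upper-bounds the MAP value.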

[Figure: the dual LP objective upper-bounds the primal MAP objective.]

Dual algorithm

[Plot: dual objective decreasing over iterations toward the MAP value.]

1. Run message-passing.
2. Decode an assignment from the messages.
   Is the gap (dual objective – assignment value) small? If yes: done (we have an integer solution with the same objective value as the dual). If no:
3. Choose a cluster to add to the relaxation.
4. Warm start: initialize the new cluster messages using the old messages, and repeat. (A code sketch of this loop is given below.)
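A minimal schematic sketch of this outer loop (not the authors' code; the callables run_message_passing, decode_assignment, choose_cluster, and warm_start are hypothetical stand-ins for the corresponding steps on the slide):

from typing import Callable, Dict, Tuple

Messages = Dict              # placeholder: messages keyed by (edge/cluster, node)
Assignment = Dict[int, int]  # variable index -> chosen state

def tighten_and_solve(
    run_message_passing: Callable[[Messages], Tuple[Messages, float]],  # step 1: (messages, dual objective)
    decode_assignment: Callable[[Messages], Tuple[Assignment, float]],  # step 2: (assignment, its value)
    choose_cluster: Callable[[Messages], object],                       # step 3: greedy bound criterion
    warm_start: Callable[[Messages, object], Messages],                 # step 4: init new cluster messages
    messages: Messages,
    gap_tol: float = 1e-6,
    max_rounds: int = 1000,
) -> Assignment:
    assignment: Assignment = {}
    for _ in range(max_rounds):
        # 1. Coordinate descent in the dual via message passing on the current relaxation.
        messages, dual_obj = run_message_passing(messages)
        # 2. Decode a candidate assignment from the current messages.
        assignment, primal_val = decode_assignment(messages)
        # If the gap (dual objective - assignment value) is small, the relaxation is
        # tight enough: the decoded assignment is certified (near-)optimal.
        if dual_obj - primal_val <= gap_tol:
            break
        # 3. Choose a cluster whose addition guarantees a decrease of the dual bound.
        cluster = choose_cluster(messages)
        # 4. Warm start: keep the old messages, initializing only the new cluster's.
        messages = warm_start(messages, cluster)
    return assignment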

What cluster to add next?

[Plot: dual objective vs. iteration, comparing the full decrease due to the cluster, one full iteration with the cluster, and one outgoing message from the cluster.]

[Figure: a cycle over variables 1, 2, 3 with pairwise potentials of the form "… if …, … otherwise".]

If the dual decreases, there was frustration (the scoring criterion is spelled out below).
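A reconstruction of the scoring criterion referred to here (in the spirit of Sontag et al. UAI '08; bij denotes the current reparameterized edge term, i.e. edge belief, implied by the messages): a candidate cluster c, such as a triplet of edges forming a cycle, is scored by the guaranteed dual decrease

\[
d(c) = \sum_{ij \in c} \max_{x_i, x_j} b_{ij}(x_i, x_j) \;-\; \max_{x_c} \sum_{ij \in c} b_{ij}(x_i, x_j) \ \ge\ 0,
\]

which is strictly positive exactly when the edges' individual maximizers cannot be made jointly consistent, i.e. when the cycle is frustrated.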

Coarsened cluster consistency

  Each new cluster requires adding a large number of LP variables and constraints
  Is it possible to use just a subset of these constraints?
  We give a new class of sparse cluster constraints, enforcing consistency on coarsened variables

(Sontag, Globerson, Jaakkola, NIPS '08)

[Figure: each variable xi in a cluster c is paired with a coarsened variable zi,c; consistency is enforced on the coarsened variables zi,c, zj,c, zk,c rather than on the full cluster variable µijk(xi, xj, xk).]

Experiments: Protein design

  Given a protein's 3D shape, choose the amino acids giving the most stable structure
  Each state corresponds to a choice of amino acid and side-chain angle
  MRFs have 41-180 variables, each variable with 95-158 states
  Hard to solve:
  Very large treewidth
  Many small cycles (20,000 triangles) and frustration

[Figure: a pairwise MRF G=(V,E) with variables xi, xj, xk, xh and edge potentials θik(xi, xk), θjk(xj, xk), θih(xi, xh), defined over the protein backbone and side-chains.]

(MRFs from Yanover, Meltzer, Weiss ‘06)

Primal LP, pairwise, is large

(Yanover, Meltzer, Weiss, JMLR ‘06)

CPLEX can only run on 3: must move to dual!

  Pairwise constraints solve only 2 of the 97 proteins

  Iteratively tightening relaxation with triplets, we exactly solve 96 of the 97 proteins (!!!)

  Using the coarsened clusters, average time to solve 15 largest proteins is 1.5 hours

  Bound criterion finds the right constraints: Only 5 to 735 triplets needed to be added per problem

Protein design results

Coarsening clusters really helps

[Plot: dual and primal (best decoding) objective vs. hours on a protein design problem, comparing "this paper" (sparse constraints, NIPS '08) with Sontag et al. UAI '08 (original formulation).]

[Plot: time per iteration (seconds) vs. iteration number, comparing the sparse constraints (NIPS '08) with the original (UAI '08) formulation.]

Related Work

  Similar ideas can be applied directly in the primal, using a selection criterion based on constraint violation instead of bound minimization (Sontag & Jaakkola '08)

  Can also be applied to marginals: guidance by a bound on the partition function rather than the MAP value; similar to the region-pursuit algorithm for generalized BP (Welling, UAI '04)

Conclusions & Future Work

  New toolkit of message-passing algorithms based on dual LP relaxations + iterative tightening of the LP relaxation = the ability to solve interesting real-world problems

  More generally, when can we expect these MAP inference techniques to be successful?

  How should we do learning with approximate inference – in particular, with LP relaxations?