
Supplementary Material
LPQP for MAP: Putting LP Solvers to Better Use

Patrick Pletscher [email protected]
Sharon Wulff [email protected]

Department of Computer Science, ETH Zurich, Switzerland

A. Convergence

Proposition 1. The convex-concave procedure in Algorithm 1 converges to a stationary point of the LPQP objective in (5) with ρ = ρ_final, the parameter value reached when the marginals do not change further.

Proof. It was shown in (Sriperumbudur & Lanckriet, 2009) that the CCCP with a convex constraint set converges to a stationary point of the objective. In the last DC iteration, a CCCP is solved with ρ = ρ_final.
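The stopping rule in Proposition 1 can be sketched as an outer annealing loop: run the inner CCCP at a fixed ρ, increase ρ, and stop once the marginals no longer change. This is only a schematic of Algorithm 1 from the main paper; the inner solver `solve_cccp`, the initial value, the growth factor, and the tolerance are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def dc_annealing(solve_cccp, mu0, rho0=0.1, growth=1.5, tol=1e-6, max_outer=100):
    """Schematic outer DC loop: solve the convex-concave sub-problem at a
    fixed rho, then increase rho; stop once the marginals mu stop changing,
    so the returned rho is the rho_final referred to in Proposition 1."""
    mu, rho = np.asarray(mu0, dtype=float), rho0
    for _ in range(max_outer):
        mu_new = solve_cccp(mu, rho)   # inner CCCP solve, supplied by the caller
        if np.max(np.abs(mu_new - mu)) < tol:
            return mu_new, rho         # marginals converged at rho_final
        mu, rho = mu_new, rho * growth
    return mu, rho
```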

B. Dual Decomposition

In the dual decomposition framework the global variables, in our case µ, are replaced with local copies of the variables for each sub-problem, denoted here by ν^a. To enforce the multiple copies of the local variables to assume the same value, a designated constraint is introduced. In the literature, the sub-problems are called slave problems whereas the sum over the slave problems, subject to the unifying constraint, is referred to as the master problem.

We consider the following master problem

\[
\begin{aligned}
\min_{\nu^a \in \mathcal{L}_{G_a}} \; & \sum_{a \in \mathcal{A}} s_a(\nu^a) \qquad\qquad (15)\\
\text{s.t.} \quad & \nu^a_i = \frac{1}{|\mathcal{A}(i)|} \sum_{a' \in \mathcal{A}(i)} \nu^{a'}_i \qquad \forall i,\; a \in \mathcal{A}(i)\\
& \nu^a_{ij} = \frac{1}{|\mathcal{A}(i,j)|} \sum_{a' \in \mathcal{A}(i,j)} \nu^{a'}_{ij} \qquad \forall (i,j),\; a \in \mathcal{A}(i,j).
\end{aligned}
\]
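The agreement constraints in (15) simply force every slave's copy of a marginal to equal the mean over all copies of that variable. A minimal sketch of checking this condition (the dict-of-arrays layout is an assumption made for illustration, not the paper's data structure):

```python
import numpy as np

def consensus_violation(copies):
    """Maximum deviation of any copy nu_i^a from the mean over a' in A(i);
    the agreement constraint in (15) requires this to be zero."""
    mean = np.mean(list(copies.values()), axis=0)
    return max(float(np.abs(nu - mean).max()) for nu in copies.values())
```

At intermediate dual iterates the slaves' minimizers need not satisfy this exactly; driving the violation to zero is precisely the role of the master problem's multipliers.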

Here we use the idea from Domke (2011), who formulates the constraint on the replicated marginal variables to agree with the mean. This is simpler than the constraint in (13). We can write the Lagrangian and rearrange to get

\[
L(\nu^1, \ldots, \nu^{|\mathcal{A}|}, \lambda) = \sum_{a \in \mathcal{A}} \min_{\nu^a \in \mathcal{L}_{G_a}} \Big( s_a(\nu^a) + \sum_{i \in V_a} \theta^a_i(\lambda)^T \nu^a_i + \sum_{(i,j) \in E_a} \theta^a_{ij}(\lambda)^T \nu^a_{ij} \Big),
\]

with

\[
\begin{aligned}
\theta^a_i(\lambda) &= \lambda^a_i - \frac{1}{|\mathcal{A}(i)|} \sum_{a' \in \mathcal{A}(i)} \lambda^{a'}_i \\
\theta^a_{ij}(\lambda) &= \lambda^a_{ij} - \frac{1}{|\mathcal{A}(i,j)|} \sum_{a' \in \mathcal{A}(i,j)} \lambda^{a'}_{ij}.
\end{aligned}
\]

The Lagrange multiplier vector λ has the same length as all the ν^a concatenated together; for variables that are replicated only once, the corresponding Lagrange multiplier can be dropped. We can think of the potentials as being a function of λ, and thus the dual problem of (15) is given by

\[
\max_{\lambda} \sum_{a \in \mathcal{A}} \min_{\nu^a \in \mathcal{L}_{G_a}} \bar{s}_a(\nu^a, \lambda).
\]

Here \(\bar{s}_a(\nu^a, \lambda)\) is defined by

\[
\bar{s}_a(\nu, \lambda) := \sum_{i \in V_a} \bar{\theta}^a_i(\lambda)^T \nu_i + \sum_{(i,j) \in E_a} \bar{\theta}^a_{ij}(\lambda)^T \nu_{ij} - \rho \eta_a H^a_{\mathrm{tree}}(\nu),
\]

and the modified potentials are given as:

\[
\bar{\theta}^a_i(\lambda) = \theta_i + \theta^a_i(\lambda), \qquad
\bar{\theta}^a_{ij}(\lambda) = \theta_{ij} + \theta^a_{ij}(\lambda).
\]
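Concretely, the modified potential is obtained by mean-centering the multipliers of each replicated variable and adding the original potential. A sketch for a single node i (the dict layout mapping slave index a ∈ A(i) to that slave's multiplier is an illustrative assumption):

```python
import numpy as np

def modified_node_potentials(theta_i, lam):
    """bar-theta_i^a(lambda) = theta_i + lam_i^a - mean_{a' in A(i)} lam_i^{a'},
    computed for every slave a in A(i)."""
    mean = np.mean(list(lam.values()), axis=0)
    return {a: theta_i + (lam_a - mean) for a, lam_a in lam.items()}
```

Note that the centered terms sum to zero over a ∈ A(i), so the decomposition only reparameterizes the potentials: summed over the slaves they contribute |A(i)| θ_i regardless of λ.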

All the slave computations can be carried out exactly using the sum-product algorithm. For the maximization w.r.t. λ we use FISTA (Beck & Teboulle, 2009), a modern variant of Nesterov's traditional fast gradient method (Nesterov, 1983). Our dual decomposition approach is very similar to (Savchynskyy et al., 2011),



with two key differences. First, the authors study only a specific choice of the decomposition for the 4-connected grid graph, in which each node marginal is replicated twice and each edge is considered only once. Second, we are also interested in settings of ρ ≫ 0, which is not meaningful in the context of the standard LP relaxation.

The algorithm we used for solving the master problem is given in Algorithm 2. It is based on FISTA descent as described in (El Ghaoui, 2012; Vandenberghe, 2012).

Algorithm 2 FISTA ascent for the master problem.

1: initialize λ^(0) = v^(0), k = 1.
2: repeat
3:   θ_k = 2/(k + 1).
4:   y = (1 − θ_k) λ^(k−1) + θ_k v^(k−1).
5:   u = y + t_k ∇f(y), perform line-search for t_k.
6:   ensure ascent for λ^(k):
       λ^(k) = u if f(u) ≥ f(λ^(k−1)), and λ^(k−1) otherwise.
7:   v^(k) = λ^(k−1) + (1/θ_k)(u − λ^(k−1)).
8:   k = k + 1.
9: until converged.
10: return λ^(k−1).
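A compact sketch of Algorithm 2 follows. For simplicity a fixed step size replaces the line search of step 5, and `f`, `grad`, and the iteration budget are caller-supplied assumptions; this illustrates the iteration pattern, not the implementation used in the experiments.

```python
import numpy as np

def fista_ascent(f, grad, lam0, step=0.4, iters=1000):
    """Monotone FISTA ascent following Algorithm 2; a fixed step size
    stands in for the line search on t_k."""
    lam = np.asarray(lam0, dtype=float)
    v = lam.copy()
    for k in range(1, iters + 1):
        theta_k = 2.0 / (k + 1)                    # step 3
        y = (1 - theta_k) * lam + theta_k * v      # step 4
        u = y + step * grad(y)                     # step 5, fixed step
        lam_new = u if f(u) >= f(lam) else lam     # step 6: ensure ascent
        v = lam + (u - lam) / theta_k              # step 7, uses lambda^(k-1)
        lam = lam_new
    return lam
```

The check in step 6 makes the objective values monotone without changing the momentum sequence v^(k), which is still driven by u.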

C. Additional Experiments

Figures 5 and 6 show the run time of the LPQP algorithm for different initial ρ0 for a grid graph of size 40 × 40 with K = 3. We can see that for smaller values of ρ0 one generally obtains better solutions. For

[Figure 5 plot: energy vs. initial ρ0 (log scale, 10^−2 to 10^3); curves for LPQP-T, LPQP-U, and TRWS.]

Figure 5. Energy of the solution found by the LPQP-U and LPQP-T algorithms as a function of the initial ρ0.

larger values of ρ0, the optimization problem becomes more and more similar to the standard QP relaxation, as violations in the pairwise marginals are strongly penalized. The run time of the algorithms, however, also increases substantially for smaller ρ0, especially for LPQP-T.

[Figure 6 plot: run time in seconds vs. initial ρ0 (log scale, 10^−2 to 10^3); curves for LPQP-T and LPQP-U.]

Figure 6. Run time of the two different LPQP solvers. For smaller ρ0 the run time of LPQP-T is affected much more than that of LPQP-U.

Finally, Figure 7 visualizes the run time of LPQP-U, MPLP and TRWS for the protein design dataset in a box plot. The mean speedup of LPQP-U over MPLP is slightly above two, but the variance is much higher. TRWS is substantially faster than LPQP and MPLP.

[Figure 7 box plot: run time in hours (0 to 100) for LPQP, MPLP, and TRWS.]

Figure 7. Run time of the three solvers for the protein design experiment. We excluded the instance where MPLP did not finish within 7 days (168 hours).

Supplement References

Beck, A. and Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 2009.

Domke, J. Dual decomposition for marginal inference.In AAAI, 2011.


El Ghaoui, L. Proximal gradient method, 2012. Lecture notes.

Nesterov, Y. A method of solving a convex programming problem with convergence rate O(1/k^2). Soviet Math. Dokl., 27:372–376, 1983.

Savchynskyy, B., Kappes, J. H., Schmidt, S., and Schnörr, C. A study of Nesterov's scheme for Lagrangian decomposition and MAP labeling. In CVPR, pp. 1817–1823, 2011.

Sriperumbudur, B. and Lanckriet, G. On the convergence of the concave-convex procedure. In NIPS, pp. 1759–1767, 2009.

Vandenberghe, L. Fast proximal gradient methods, 2012. Lecture notes.