
Page 1:

Efficient Linear Programming for Dense CRFs

Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, Philip H. S. Torr

M. Pawan Kumar

Oxford, December 2016

Page 2:

Dense CRF

▶ Fully connected CRF with Gaussian pairwise potentials:

E(x) = \sum_{a=1}^{n} \phi_a(x_a) + \sum_{a=1}^{n} \sum_{\substack{b=1 \\ b \neq a}}^{n} \psi_{ab}(x_a, x_b),

\psi_{ab}(x_a, x_b) = \underbrace{\mu(x_a, x_b)}_{\text{label compatibility}} \, \underbrace{\exp\!\left(-\frac{\|f_a - f_b\|^2}{2}\right)}_{\text{pixel compatibility}},

where x_a \in \mathcal{L} and f_a \in \mathbb{R}^d.

Why?

▶ Captures long-range interactions and provides fine-grained segmentations [Krahenbuhl-2011].
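To make the notation concrete, here is a minimal NumPy sketch that evaluates E(x) by brute force for a toy problem; the array names (unary, f, mu) and sizes are hypothetical, and the quadratic cost of the double sum is exactly what the filtering approximation on the next slides avoids.

```python
import numpy as np

def dense_crf_energy(unary, x, f, mu):
    """Naive O(n^2) evaluation of
    E(x) = sum_a phi_a(x_a) + sum_a sum_{b != a} mu(x_a, x_b) exp(-||f_a - f_b||^2 / 2).
    unary: (n, L) array of phi_a(i); x: (n,) integer labels;
    f: (n, d) pixel features; mu: (L, L) label-compatibility matrix."""
    n = len(x)
    unary_term = unary[np.arange(n), x].sum()
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq_dists)
    np.fill_diagonal(K, 0.0)                      # exclude the b == a terms
    pairwise_term = (mu[x[:, None], x[None, :]] * K).sum()
    return unary_term + pairwise_term

# toy usage with hypothetical sizes
rng = np.random.default_rng(0)
n, d, L = 200, 5, 3
unary = rng.random((n, L))
f = rng.normal(size=(n, d))
x = rng.integers(0, L, size=n)
mu = 1.0 - np.eye(L)                              # Potts label compatibility
print(dense_crf_energy(unary, x, f, mu))
```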


Page 4:

Dense CRF

E(x) = \sum_{a=1}^{n} \phi_a(x_a) + \sum_{a=1}^{n} \sum_{\substack{b=1 \\ b \neq a}}^{n} \psi_{ab}(x_a, x_b), \qquad \psi_{ab}(x_a, x_b) = \mu(x_a, x_b) \exp\!\left(-\frac{\|f_a - f_b\|^2}{2}\right).

Difficulty

▶ Requires O(n²) computations ⇒ Infeasible.

Idea

▶ Approximate using the filtering method [Adams-2010] ⇒ O(n) computations.
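The reason filtering helps is that, once the label compatibility is Potts-like, every pairwise computation reduces to Gaussian convolutions of the form v'_a = Σ_b exp(−‖f_a − f_b‖²/2) v_b. The sketch below evaluates the Potts pairwise energy with one such convolution per label; gaussian_filter_bruteforce is a hypothetical O(n²) stand-in for the O(n) permutohedral-lattice filter of [Adams-2010].

```python
import numpy as np

def gaussian_filter_bruteforce(f, values):
    """v'_a = sum_b exp(-||f_a - f_b||^2 / 2) * values_b as a dense matrix-vector
    product (O(n^2)); the permutohedral lattice approximates this in O(n)."""
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists) @ values

def potts_pairwise_energy(f, x, num_labels):
    """sum_a sum_{b != a} K_ab [x_a != x_b], via one filtering call per label."""
    n = len(x)
    total = gaussian_filter_bruteforce(f, np.ones(n)) - 1.0   # sum_{b != a} K_ab
    same = np.zeros(n)
    for i in range(num_labels):
        ind = (x == i).astype(float)
        filtered = gaussian_filter_bruteforce(f, ind)         # same-label mass seen by each pixel
        same += ind * (filtered - 1.0)                        # drop the b == a contribution
    return float(np.sum(total - same))

# toy usage with hypothetical sizes
rng = np.random.default_rng(0)
n, d, L = 300, 5, 3
f = rng.normal(size=(n, d))
x = rng.integers(0, L, size=n)
print(potts_pairwise_energy(f, x, L))
```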


Page 6:

Current algorithms for MAP inference in dense CRFs

▶ Rely on the efficient filtering method [Adams-2010].

Algorithm                        Time complexity per iteration   Theoretical bound
Mean Field (MF) [1]              O(n)                            No
Quadratic Programming (QP) [2]   O(n)                            Yes
Difference of Convex (DC) [2]    O(n)                            Yes
Linear Programming (LP) [2]      O(n log n)                      Yes (best)

Contribution

▶ LP in O(n) time per iteration ⇒ An order-of-magnitude speedup.

[1] [Krahenbuhl-2011]

[2] [Desmaison-2016]


Page 8:

LP minimization

Current method

▶ Projected subgradient descent ⇒ Too slow.
  ▶ Linearithmic time per iteration.
  ▶ Expensive line search.
  ▶ Requires a large number of iterations.

Our algorithm

▶ Proximal minimization using block coordinate descent.
  ▶ One block: significantly smaller subproblems.
  ▶ The other block: efficient conditional gradient descent.
    ▶ Linear-time conditional gradient computation.
    ▶ Optimal step size.

▶ Guarantees optimality and converges faster.


Page 10:

LP relaxation of a dense CRF

\min_{y} \; E(y) = \sum_{a} \sum_{i} \phi_{a:i}\, y_{a:i} \;+\; \sum_{a,\, b \neq a} \sum_{i} K_{ab}\, \frac{|y_{a:i} - y_{b:i}|}{2},

\text{s.t.} \quad y \in \mathcal{M} = \left\{ y \;\middle|\; \sum_{i} y_{a:i} = 1,\; a \in \{1 \dots n\}; \quad y_{a:i} \in [0, 1],\; a \in \{1 \dots n\},\; i \in \mathcal{L} \right\},

where K_{ab} = \exp\!\left(-\frac{\|f_a - f_b\|^2}{2}\right).

Assumption: Label compatibility is the Potts model.

Standard solvers would require O(n²) variables.
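As a sanity check of the relaxation, the sketch below evaluates E(y) for a fractional y ∈ M by brute force (O(n²·|L|)); the helper name lp_relaxed_energy and the sizes are hypothetical. This is the objective the proposed algorithm minimizes without ever forming the n × n kernel matrix.

```python
import numpy as np

def lp_relaxed_energy(unary, y, f):
    """E(y) = sum_{a,i} phi_{a:i} y_{a:i}
            + sum_{a != b} sum_i K_ab |y_{a:i} - y_{b:i}| / 2,
    with K_ab = exp(-||f_a - f_b||^2 / 2). unary, y: (n, L); f: (n, d)."""
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq_dists)
    np.fill_diagonal(K, 0.0)                                   # exclude b == a
    abs_diff = np.abs(y[:, None, :] - y[None, :, :])           # (n, n, L)
    pairwise = 0.5 * np.einsum('ab,abi->', K, abs_diff)
    return float((unary * y).sum() + pairwise)

# a feasible fractional point: each row of y lies on the probability simplex
rng = np.random.default_rng(0)
n, d, L = 100, 5, 3
unary = rng.random((n, L))
f = rng.normal(size=(n, d))
y = rng.random((n, L))
y /= y.sum(axis=1, keepdims=True)
print(lp_relaxed_energy(unary, y, f))
```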


Page 12:

Proximal minimization of LP

\min_{y} \; E(y) + \frac{1}{2\lambda} \left\| y - y^k \right\|^2, \quad \text{s.t.} \;\; y \in \mathcal{M},

where yk is the current estimate.

Why?

▶ Initialization using MF or DC.

▶ Smooth dual ⇒ Sophisticated optimization.

Approach

▶ Block coordinate descent tailored to this problem.
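Structurally, the method is an outer proximal loop wrapped around an inner solver for each strictly convex subproblem. The sketch below only fixes that outer structure; solve_prox_subproblem and lp_energy are hypothetical placeholders (the former standing in for the block coordinate descent on the dual described next), not the paper's implementation.

```python
import numpy as np

def proximal_lp(y0, lp_energy, solve_prox_subproblem, lam=0.1, num_outer=10):
    """Outer proximal loop for the LP relaxation:
        y^{k+1} = argmin_{y in M}  E(y) + (1 / (2 * lam)) * ||y - y^k||^2.
    solve_prox_subproblem(y_k, lam) is assumed to return a feasible (approximate)
    minimiser of the proximal subproblem; lp_energy(y) evaluates E(y) so the
    monotone decrease of the energy can be monitored."""
    y = np.array(y0, copy=True)
    energies = [lp_energy(y)]
    for _ in range(num_outer):
        y = solve_prox_subproblem(y, lam)     # strictly convex, smooth dual
        energies.append(lp_energy(y))
    return y, energies
```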


Page 15:

Dual of the proximal problem

\min_{\alpha, \beta, \gamma} \; g(\alpha, \beta, \gamma) = \frac{\lambda}{2} \left\| A\alpha + B\beta + \gamma - \phi \right\|^2 + \left\langle A\alpha + B\beta + \gamma - \phi,\; y^k \right\rangle - \langle \mathbf{1}, \beta \rangle,

\text{s.t.} \quad \gamma_{a:i} \ge 0 \quad \forall\, a \in \{1 \dots n\},\; \forall\, i \in \mathcal{L},

\alpha \in \mathcal{C} = \left\{ \alpha \;\middle|\; \alpha^1_{ab:i} + \alpha^2_{ab:i} = \tfrac{K_{ab}}{2}, \;\; \alpha^1_{ab:i},\, \alpha^2_{ab:i} \ge 0, \;\; a \neq b,\; i \in \mathcal{L} \right\}.

Block coordinate descent

▶ α ∈ C: Compact domain ⇒ Conditional gradient descent (see the sketch below).

▶ β: Unconstrained ⇒ Set derivative to zero.

▶ γ: Unbounded and separable ⇒ Small QP for each pixel.

Guarantees optimality since g is strictly convex and smooth.
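As a concrete illustration of the α-block update, here is a generic conditional gradient (Frank-Wolfe) sketch for a convex quadratic minimized over a product of two-variable segments {α¹ + α² = K_ab/2, α ≥ 0}, i.e. domains with the same structure as C; the optimal step size has a closed form because the objective is quadratic. This is a toy stand-in under those assumptions, not the paper's solver, which never forms the quadratic explicitly and obtains the conditional gradient via filtering.

```python
import numpy as np

def frank_wolfe_quadratic(Q, b, caps, num_iters=50):
    """Conditional gradient for  min_x 0.5 x^T Q x + b^T x  over the product of
    segments {x[2j] + x[2j+1] = caps[j], x >= 0}, mimicking the structure of the
    alpha-block constraints alpha^1_{ab:i} + alpha^2_{ab:i} = K_ab / 2."""
    m = len(caps)
    x = np.repeat(caps / 2.0, 2)                  # feasible start: split each cap evenly
    for _ in range(num_iters):
        grad = Q @ x + b
        # conditional gradient: per segment, put the whole cap on the coordinate
        # with the smaller gradient entry (this minimises the linearisation)
        s = np.zeros_like(x)
        pick = np.argmin(grad.reshape(m, 2), axis=1)
        s[2 * np.arange(m) + pick] = caps
        d = s - x
        # optimal step: exact line search of the quadratic over t in [0, 1]
        denom = d @ (Q @ d)
        t = 1.0 if denom <= 0 else float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        x = x + t * d
    return x

# toy usage with a random positive semi-definite Q (hypothetical sizes)
rng = np.random.default_rng(0)
m = 4
A = rng.normal(size=(2 * m, 2 * m))
Q = A.T @ A
b = rng.normal(size=2 * m)
caps = rng.random(m) + 0.5
x_star = frank_wolfe_quadratic(Q, b, caps)
```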


Page 18:

Conditional gradient computation

\forall\, a \in \{1 \dots n\}, \qquad v'_a = \sum_{b} K_{ab}\, v_b\, \mathbf{1}[y_a \ge y_b],

where y_a, y_b \in [0, 1].

Difficulty

▶ The permutohedral lattice based filtering method of [Adams-2010] cannot handle the ordering constraint.

Current method [Desmaison-2016]

▶ Repeated application of the original filtering method using a divide and conquer strategy ⇒ O(d²n log(n)) computations¹.

Our idea

▶ Discretize the interval [0, 1] into H levels and instantiate H permutohedral lattices ⇒ O(Hdn) computations² (see the sketch below).

¹ d: filter dimension, usually 2 or 5.
² H = 10 in our case.
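A minimal sketch of the discretization idea: bucket the y values into H levels and run one ordinary Gaussian filtering pass per bucket, so that a pixel accumulates contributions from every bucket at or below its own level. Here gaussian_filter_bruteforce is a hypothetical O(n²) stand-in for a single permutohedral-lattice call; with the real lattice each of the H passes costs O(dn), giving the stated O(Hdn). Whether the b = a self term is included is glossed over in this sketch.

```python
import numpy as np

def gaussian_filter_bruteforce(f, values):
    """sum_b exp(-||f_a - f_b||^2 / 2) * values_b for every a; O(n^2) stand-in
    for one O(n) permutohedral-lattice filtering call of [Adams-2010]."""
    sq_dists = np.sum((f[:, None, :] - f[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists) @ values

def conditional_gradient_filter(f, v, y, H=10):
    """Approximates v'_a = sum_b K_ab v_b 1[y_a >= y_b] by discretising y in [0, 1]
    into H levels and filtering one level (bucket) at a time."""
    levels = np.minimum((y * H).astype(int), H - 1)   # bucket index of each pixel
    out = np.zeros_like(v)
    acc = np.zeros_like(v)
    for h in range(H):
        in_bucket = levels == h
        if in_bucket.any():
            acc = acc + gaussian_filter_bruteforce(f, np.where(in_bucket, v, 0.0))
        # a pixel with level h receives contributions from all buckets <= h
        out[in_bucket] = acc[in_bucket]
    return out

# toy usage with hypothetical sizes
rng = np.random.default_rng(0)
n, d = 300, 5
f = rng.normal(size=(n, d))
v = rng.random(n)
y = rng.random(n)
approx = conditional_gradient_filter(f, v, y, H=10)
```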


Page 22:

Segmentation results

[Figure: integral (assignment) energy vs. time (s); curves for MF, DCneg, SG-LPℓ and PROX-LPacc.]

Assignment energy as a function of time for an image in (left) MSRC and (right) Pascal.

▶ Both LP minimization algorithms are initialized with DCneg.

Page 23:

Segmentation results

MSRC
  Algorithm      Ave. E (×10³)   Ave. T (s)   Acc.
  MF5            8078.0          0.2          79.33
  MF             8062.4          0.5          79.35
  DCneg          3539.6          1.3          83.01
  SG-LPℓ         3335.6          13.6         83.15
  PROX-LPacc     1340.0          3.7          84.16

Pascal
  Algorithm      Ave. E (×10³)   Ave. T (s)   Acc.
  MF5            1220.8          0.8          79.13
  MF             1220.8          0.7          79.13
  DCneg          629.5           3.7          80.43
  SG-LPℓ         617.1           84.4         80.49
  PROX-LPacc     507.7           14.7         80.58

Results on the MSRC and Pascal datasets.

Page 24:

Segmentation results

[Figure: qualitative comparison; columns: Image, MF, DCneg, SG-LPℓ, PROX-LPacc, Ground truth.]

Qualitative results on MSRC.

Page 25:

Modified filtering algorithm

[Figure: speedup vs. number of pixels and vs. number of labels, for the spatial kernel (d = 2) and the bilateral kernel (d = 5).]

Speedup of our modified filtering algorithm over the divide and conquer strategy on a Pascal image.

Speedup is around 45–65 on the standard image.


Page 27:

Conclusion

▶ We have introduced the first LP minimization algorithm for dense CRFs whose iterations are linear in the number of pixels and labels.

Arxiv: https://arxiv.org/abs/1611.09718

Page 28:

Thank you!

Page 29:

Permutohedral lattice

A 2-dimensional hyperplane tessellated by the permutohedral lattice.

Page 30:

Modified filtering algorithm

[Figure: the Splat, Blur and Slice steps of the filtering method.]

Top row: Original filtering method. Bottom row: Our modified filtering method. H = 3.