Duality
Nuno Vasconcelos, ECE Department, UCSD



Optimization
goal: find the maximum or minimum of a function
Definition: given functions f, gi, i = 1,...,k, and hi, i = 1,...,m, defined on some domain Ω ⊆ Rn,

   min f(w), w ∈ Ω
   subject to gi(w) ≤ 0, ∀i
              hi(w) = 0, ∀i

for compactness, we write g(w) ≤ 0 instead of gi(w) ≤ 0 ∀i, and similarly h(w) = 0
we derived necessary and sufficient conditions for (local) optimality in
• the absence of constraints
• equality constraints only
• equality and inequality constraints

Minima conditions (unconstrained)
let f(w) be continuously differentiable
w* is a local minimum of f(w) only if
• f has zero gradient at w*

   ∇f(w*) = 0

• and the Hessian of f at w* is positive semidefinite

   dᵀ ∇²f(w*) d ≥ 0, ∀d ∈ Rn

(a zero gradient plus a positive definite Hessian is sufficient)
• where

   ∇²f(x) = ⎡ ∂²f(x)/∂x0²       …  ∂²f(x)/∂x0∂xn−1 ⎤
            ⎢        ⋮          ⋱         ⋮        ⎥
            ⎣ ∂²f(x)/∂xn−1∂x0   …  ∂²f(x)/∂xn−1²   ⎦

Maxima conditions (unconstrained)
let f(w) be continuously differentiable
w* is a local maximum of f(w) only if
• f has zero gradient at w*

   ∇f(w*) = 0

• and the Hessian of f at w* is negative semidefinite

   dᵀ ∇²f(w*) d ≤ 0, ∀d ∈ Rn

(a zero gradient plus a negative definite Hessian is sufficient)
• where

   ∇²f(x) = ⎡ ∂²f(x)/∂x0²       …  ∂²f(x)/∂x0∂xn−1 ⎤
            ⎢        ⋮          ⋱         ⋮        ⎥
            ⎣ ∂²f(x)/∂xn−1∂x0   …  ∂²f(x)/∂xn−1²   ⎦
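The two unconstrained conditions are easy to check numerically. The sketch below (not from the slides; the function f and the point w* are illustrative choices) approximates the gradient and Hessian by central differences and tests the sign conditions.

```python
import numpy as np

def grad(f, w, eps=1e-5):
    # central-difference approximation of the gradient
    g = np.zeros(len(w))
    for i in range(len(w)):
        e = np.zeros(len(w)); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

def hessian(f, w, eps=1e-4):
    # central-difference approximation of the Hessian
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(w + ei + ej) - f(w + ei - ej)
                       - f(w - ei + ej) + f(w - ei - ej)) / (4 * eps ** 2)
    return H

# illustrative function: f(w) = (w0 - 1)^2 + 2 w1^2, minimum at w* = (1, 0)
f = lambda w: (w[0] - 1) ** 2 + 2 * w[1] ** 2
w_star = np.array([1.0, 0.0])

g = grad(f, w_star)                             # ~0 at the minimum
eigs = np.linalg.eigvalsh(hessian(f, w_star))   # all >= 0 at a minimum
```

For a maximum, the same check applies with the eigenvalue signs reversed.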

Constrained optimization with equality constraints only
Theorem: consider the problem

   x* = argmin_x f(x) subject to h(x) = 0

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

   i) ∇f(x*) + Σi λi ∇hi(x*) = 0   (sum over i = 1,...,m)
   ii) yᵀ [ ∇²f(x*) + Σi λi ∇²hi(x*) ] y ≥ 0, ∀y s.t. ∇h(x*)ᵀ y = 0

Alternative formulation
we can state the conditions through the Lagrangian

   L(x, λ) = f(x) + Σi λi hi(x)

the theorem can then be compactly written as

   i) ∇L(x*, λ*) = [ ∇x L(x*, λ*) ; ∇λ L(x*, λ*) ] = 0
   ii) yᵀ ∇²xx L(x*, λ*) y ≥ 0, ∀y s.t. ∇h(x*)ᵀ y = 0

the entries of λ are referred to as Lagrange multipliers
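For a quadratic f and linear h, the stationarity conditions ∇x L = 0 and ∇λ L = 0 reduce to a single linear system, so the solution and its multiplier can be computed directly. A minimal sketch with an illustrative problem (not from the slides):

```python
import numpy as np

# illustrative problem: minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0
# Lagrangian: L(x, lam) = x1^2 + x2^2 + lam * (x1 + x2 - 1)
# stationarity (grad_x L = 0, grad_lam L = 0) is linear here:
K = np.array([[2.0, 0.0, 1.0],   # dL/dx1 = 2 x1 + lam = 0
              [0.0, 2.0, 1.0],   # dL/dx2 = 2 x2 + lam = 0
              [1.0, 1.0, 0.0]])  # dL/dlam = x1 + x2 - 1 = 0
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, lam = np.linalg.solve(K, rhs)
# solution: x* = (1/2, 1/2), with Lagrange multiplier lam = -1
```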

Geometric view
consider the tangent space to the iso-contour h(x) = 0
since h grows in any direction along which ∇h(x) is not zero, ∇h(x) is ⊥ to the iso-contour
hence, the subspace of first-order feasible variations is

   V(x*) = { ∆x | ∇hi(x*)ᵀ ∆x = 0, ∀i }

the space of ∆x for which x* + ∆x satisfies the constraint up to a first-order approximation

[figure: the surface h(x) = 0, with ∇h(x*) normal to it at x* and V(x*) the plane of feasible variations]

Feasible variations
multiplying our first Lagrangian condition by ∆x,

   ∇f(x*)ᵀ ∆x + Σi λi ∇hi(x*)ᵀ ∆x = 0

it follows that ∇f(x*) must satisfy

   ∇f(x*)ᵀ ∆x = 0, ∀∆x ∈ V(x*)

i.e. ∇f(x*) ⊥ V(x*): the gradient is orthogonal to all feasible steps, and no growth is possible along the constraint
this is a generalization of ∇f(x*) = 0 in the unconstrained case
note:
• the Hessian condition is only defined for y in V(x*)
• this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)

Inequality constraints
with inequalities,

   x* = argmin_x f(x) subject to h(x) = 0, g(x) ≤ 0

the only constraints that matter are those which are active, and these are equalities

   A(x*) = { j | gj(x*) = 0 }

[figure: a constraint region g(x) ≤ 0; for an x* in the interior the constraint is inactive, for an x* on the boundary it is active]

Constrained optimization
hence, the problem

   x* = argmin_x f(x) subject to h(x) = 0, g(x) ≤ 0

is equivalent to

   x* = argmin_x f(x) subject to h(x) = 0, gi(x) = 0 ∀i ∈ A(x*)

this is a problem with equality constraints; there must be a λ* and µj* such that

   ∇f(x*) + Σi λi* ∇hi(x*) + Σj µj* ∇gj(x*) = 0

with µj* = 0, j ∉ A(x*)
finally, we need µj* ≥ 0, for all j, to guarantee this is a minimum

[figure: at the active boundary g(x) = 0, the gradients ∇f and ∇g point in opposite directions]

The KKT conditions
Theorem: for the problem

   x* = argmin_x f(x) subject to h(x) = 0, g(x) ≤ 0

x* is a local minimum if and only if there exist λ* and µ* such that

   i) ∇f(x*) + Σi λi* ∇hi(x*) + Σj µj* ∇gj(x*) = 0
   ii) µj* ≥ 0, ∀j
   iii) µj* = 0, ∀j ∉ A(x*)
   iv) h(x*) = 0
   v) yᵀ [ ∇²f(x*) + Σi λi* ∇²hi(x*) + Σj µj* ∇²gj(x*) ] y ≥ 0, ∀y ∈ V(x*)

where

   V(x*) = { y | ∇hi(x*)ᵀ y = 0 ∀i, and ∇gj(x*)ᵀ y = 0 ∀j ∈ A(x*) }
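A minimal numerical check of the first-order KKT conditions on an illustrative problem (not from the slides): the unconstrained minimum is infeasible, so the single inequality is active and its multiplier must be nonnegative.

```python
import numpy as np

# illustrative problem: minimize f(x) = (x1-2)^2 + (x2-1)^2
#                       subject to g(x) = x1 + x2 - 2 <= 0
# the unconstrained minimum (2, 1) is infeasible, so the constraint is
# active and x* is the projection of (2, 1) onto the line x1 + x2 = 2
x_star = np.array([1.5, 0.5])
mu_star = 1.0

grad_f = 2 * (x_star - np.array([2.0, 1.0]))  # = (-1, -1)
grad_g = np.array([1.0, 1.0])
g_val = x_star.sum() - 2.0

stationarity = grad_f + mu_star * grad_g      # KKT i): should be zero
comp_slack = mu_star * g_val                  # KKT iii): mu* g(x*) = 0
```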

Geometric interpretation
we consider the case without equality constraints

   x* = argmin_x f(x) subject to g(x) ≤ 0

from the KKT conditions, the solution satisfies

   i) ∇x L(x*, µ*) = 0, with µj* ≥ 0 ∀j, and µj* = 0 ∀j ∉ A(x*)

with

   L(x, µ*) = f(x) + Σj µj* gj(x)

which is equivalent to

   L* = min_x L(x, µ*) = min_x [ f(x) + µ*ᵀ g(x) ]

with µj* ≥ 0 ∀j, and µj* = 0 ∀j ∉ A(x*)

Geometric interpretation

   L* = min_x L(x, µ*) = min_x [ f(x) + µ*ᵀ g(x) ],  µj* ≥ 0 ∀j, µj* = 0 ∀j ∉ A(x*)

is equivalent to
• x = x* ⇒ w*ᵀ z − b = 0
• x ≠ x* ⇒ w*ᵀ z − b ≥ 0

with

   w* = [ 1 ; µ* ],  z = [ f(x) ; g(x) ],  b = L*

and can be visualized as a supporting hyperplane:

[figure: in (g, f) space, with g ∈ Rr horizontal and f ∈ R vertical, the plane with normal w* supports the set of points (g(x), f(x)) at (g*, f*) = (0, f*); feasible x have g(x*) ≤ 0]

Geometric interpretation
the same construction applies to an arbitrary µ ≥ 0: define

   q(µ) = min_x L(x, µ) = min_x [ f(x) + µᵀ g(x) ]

with

   w = [ 1 ; µ ],  z = [ f(x) ; g(x) ],  b = q(µ)

same picture as before, with L* replaced by q(µ) and µ* by µ

[figure: the same supporting-plane picture; the plane with normal w now crosses the f axis at (0, q(µ)), at or below (0, f*)]

Duality
note that
• q(µ) ≤ L* = f*
• if we keep increasing q(µ) we will get q(µ) = L*
• we cannot go beyond L* (x* would move to g(x*) > 0)
this is exactly the definition of the dual problem

   max q(µ) = max min_x [ f(x) + µᵀ g(x) ],  with µ ≥ 0

note:
• q(µ) may go to −∞ for some µ
• this is avoided by introducing the constraint µ ∈ Dq = { µ | q(µ) > −∞ }
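A small sketch of the dual function for an illustrative one-dimensional problem (not from the slides), showing weak duality q(µ) ≤ f* and the dual maximum touching f*:

```python
import numpy as np

# illustrative problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0
# primal optimum: x* = 1, f* = 1
# L(x, mu) = x^2 + mu (1 - x) is minimized at x = mu/2, so
# q(mu) = mu - mu^2 / 4, concave and maximized at mu* = 2
f_star = 1.0
mus = np.linspace(0.0, 6.0, 601)
q = mus - mus ** 2 / 4.0

q_star = q.max()           # equals f* here (no duality gap)
mu_best = mus[q.argmax()]  # ~2
```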

Equality constraints
so far we have disregarded them. What about

   x* = argmin_x f(x) subject to h(x) = 0, g(x) ≤ 0 ?

intuitively, nothing should change, since each equality is the same as two inequalities:

   h(x) = 0  ⇔  h(x) ≤ 0 and −h(x) ≤ 0

i.e.

   x* = argmin_x f(x) subject to h(x) ≤ 0, −h(x) ≤ 0, g(x) ≤ 0

this has Lagrangian

   L(x, µ, α⁺, α⁻) = f(x) + Σi µi gi(x) + Σi αi⁺ hi(x) − Σi αi⁻ hi(x)

Equality constraints
which is equivalent to

   L(x, µ, α⁺, α⁻) = f(x) + Σi µi gi(x) + Σi (αi⁺ − αi⁻) hi(x)
                   = f(x) + Σi µi gi(x) + Σi λi hi(x),  with λi = αi⁺ − αi⁻

i.e. basically the same Lagrangian, but the λi do not have to be ≥ 0
in summary, (µ*, λ*) is a Lagrange multiplier if µ* ≥ 0 and

   f* = min_x L(x, µ*, λ*)

the dual is

   max q(µ, λ),  with µ ≥ 0, λ ∈ Rm, (µ, λ) ∈ Dq

where

   q(µ, λ) = min_x L(x, µ, λ)  and  Dq = { (µ, λ) | q(µ, λ) > −∞ }

Duality
various nice properties
Theorem: Dq is a convex set and q is concave on Dq
• very appealing, because convex optimization problems are among the easiest to solve
• the dual is always concave, irrespective of the primal
Theorem: (weak duality) it is always true that q* ≤ f*
if q* = f* we say that there is no duality gap
Theorem:
• if there is no duality gap, the set of Lagrange multipliers is the set of optimal dual solutions
• if there is a duality gap, there are no Lagrange multipliers

Duality gap
the KKT theorem assures a local minimum only when there is a set of Lagrange multipliers that satisfies it
this is impossible if there is a duality gap
when is this the case?
• as far as I know this is still an open question
• there are various results which characterize the existence of solutions for certain classes of problems
• the bulk of the results are for the case of convex programming problems
recall: the problem is convex if the function f is convex, the inequality constraints g are convex, and the equality constraints h are affine

Duality gap
the following theorems are relevant (note: the proofs are hard and not particularly insightful, and are therefore omitted)
Theorem: (strong duality) Consider the problem

   x* = argmin_{x ∈ X} f(x) subject to g(x) ≤ 0

where X, f, and the gi are all convex, the optimal value f* is finite, and there is a vector x̄ such that

   gj(x̄) < 0, ∀j.   (*)

Then there is at least one Lagrange multiplier vector and there is no duality gap
i.e. convex problems have a dual as long as (*) holds!

Duality gap
(*) is needed to guarantee that there are Lagrange multipliers
consider for example

   min_{x ∈ R} f(x) = x subject to g(x) = x² ≤ 0

the only feasible point is x* = 0, so f* = 0; but

   q(µ) = min_x [ f(x) + µ g(x) ] = min_x [ x + µx² ] = −1/(4µ) < 0, ∀µ > 0

(and q(0) = −∞), so the dual supremum f* = 0 is approached as µ → ∞ but never attained: there is no Lagrange multiplier, even though x* = 0 is a solution
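The dual function of this example can be evaluated directly. The sketch below checks the closed form q(µ) = −1/(4µ) against a brute-force grid minimization and confirms that q stays strictly below f* = 0 for every µ, while getting arbitrarily close:

```python
import numpy as np

# the example: minimize f(x) = x subject to g(x) = x^2 <= 0 (f* = 0 at x* = 0)
# for mu > 0, L(x, mu) = x + mu x^2 is minimized at x = -1/(2 mu),
# giving q(mu) = -1/(4 mu): strictly below f* for every mu
mus = np.linspace(0.1, 1000.0, 10000)
q = -1.0 / (4.0 * mus)

# cross-check the closed form by grid minimization at mu = 0.5
xs = np.linspace(-10.0, 10.0, 200001)
q_grid = np.min(xs + 0.5 * xs ** 2)   # should be -1/(4 * 0.5) = -0.5
```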

Duality gap
geometrically, the set of points (g(x), f(x)) = (x², x) is the curve g = f², and the supporting plane at (0, f*) would need to be vertical
but this cannot happen, since the first coordinate of w* is 1
condition (*) guarantees that this never happens

[figure: left, the curve g = f² touching the f axis at the origin, where only a vertical supporting line exists; right, the usual supporting-plane picture, with w* supporting the set {(g(x), f(x))} at (g*, f*) = (0, f*)]

Duality gap
there is also a slightly more general result for the case where the constraints are linear
Theorem: (strong duality) Consider the problem

   x* = argmin_{x ∈ X} f(x) subject to ejᵀ x − dj ≤ 0, ∀j   (**)

where X and f are convex and the optimal value f* is finite. Then there is at least one Lagrange multiplier vector and there is no duality gap
Corollary: if, in addition to (**), f is linear and X is polyhedral, there is no duality gap
these problems are called linear programming problems

Linear programming
consider the problem

   x* = argmin_{x ≥ 0} cᵀ x subject to Ax − b ≤ 0

the dual function is

   q(µ) = min_{x ≥ 0} [ cᵀ x + µᵀ (Ax − b) ]
        = min_{x ≥ 0} [ (c + Aᵀµ)ᵀ x − µᵀ b ]
        = min_{x ≥ 0} { Σi (c + Aᵀµ)i xi − µᵀ b }

note: if, for any i, (c + Aᵀµ)i < 0, we can make q(µ) = −∞ by making xi arbitrarily large. So, to have a solution, we need

   (c + Aᵀµ)i ≥ 0, ∀i

Linear programming
since q(µ) = min_{x ≥ 0} { Σi (c + Aᵀµ)i xi − µᵀ b }, when this condition holds the minimum is at x = 0 and q(µ) = −µᵀ b
this leads to (switching to −b and −A)

   primal: min cᵀ x          dual: max µᵀ b
           s.t. Ax ≥ b             s.t. Aᵀµ ≤ c
                x ≥ 0                   µ ≥ 0

which is the standard form of duality for linear programming problems
the dual can be obtained with a simple recipe
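The standard-form pair can be checked with an off-the-shelf LP solver; the numbers below are illustrative, not from the slides. `scipy.optimize.linprog` minimizes subject to `A_ub x ≤ b_ub`, so the ≥ constraints are negated and the dual maximization is posed as minimizing −bᵀµ.

```python
import numpy as np
from scipy.optimize import linprog

# illustrative data (not from the slides)
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([4.0, 5.0])

# primal: min c^T x  s.t. A x >= b, x >= 0
# linprog uses A_ub x <= b_ub, so negate to express A x >= b
primal = linprog(c, A_ub=-A, b_ub=-b)   # x >= 0 is linprog's default bound

# dual: max mu^T b  s.t. A^T mu <= c, mu >= 0, posed as min of -b^T mu
dual = linprog(-b, A_ub=A.T, b_ub=c)

f_star = primal.fun
q_star = -dual.fun   # undo the sign flip
```

Both optima come out equal, as weak/strong duality for LPs predicts.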

Linear programming

   primal: min cᵀ x          dual: max µᵀ b
           s.t. Ax ≥ b             s.t. Aᵀµ ≤ c
                x ≥ 0                   µ ≥ 0

recipe for primal-to-dual conversion:
1. interchange x with µ, and b with c (cᵀx ↔ µᵀb)
2. reverse the constraint inequalities
3. maximize instead of minimizing
this can be applied to any problem, e.g. with equality constraints:

   min cᵀ x s.t. Ax = b, x ≥ 0  ⇔  min cᵀ x s.t. Ax ≥ b, −Ax ≥ −b, x ≥ 0

Linear programming

   min cᵀ x s.t. Ax = b, x ≥ 0  ⇔  min cᵀ x s.t. [ A ; −A ] x ≥ [ b ; −b ], x ≥ 0

applying the recipe (interchange x with µ and b with c, reverse the inequalities, maximize), the dual is

   max [ bᵀ  −bᵀ ] [ µ1 ; µ2 ]  s.t. [ Aᵀ  −Aᵀ ] [ µ1 ; µ2 ] ≤ c,  µ1 ≥ 0, µ2 ≥ 0

   ⇔  max bᵀ µ  s.t. Aᵀ µ ≤ c,  with µ = µ1 − µ2 free in sign

this has a nice geometric interpretation

Linear programming example

   primal: min cᵀ x          dual: max µᵀ b
           s.t. Ax = b             s.t. Aᵀµ ≤ c
                x ≥ 0

for the example

   min 12x1 + 12x2 + 2x3 + 4x4
   s.t. 3x1 + x2 − 2x3 − x4 = 2
        x1 + 3x2 + x4 = 2
        x ≥ 0

the dual is

   max 2µ1 + 2µ2
   s.t. 3µ1 + µ2 ≤ 12
        µ1 + 3µ2 ≤ 12
        −2µ1 ≤ 2
        −µ1 + µ2 ≤ 4

Primal

   min 12x1 + 12x2 + 2x3 + 4x4
   s.t. 3x1 + x2 − 2x3 − x4 = 2
        x1 + 3x2 + x4 = 2
        x ≥ 0

the solution is a linear combination of the columns of the constraint matrix,

   a1 = [ 3 ; 1 ],  a2 = [ 1 ; 3 ],  a3 = [ −2 ; 0 ],  a4 = [ −1 ; 1 ],  with b = [ 2 ; 2 ]

[figure: the vectors a1, ..., a4 and b drawn in the plane]

it is not obvious what the solution is. What about the dual?

Dual

   max 2µ1 + 2µ2
   s.t. 3µ1 + µ2 ≤ 12
        µ1 + 3µ2 ≤ 12
        −2µ1 ≤ 2
        −µ1 + µ2 ≤ 4

the vectors a1 = [ 3 ; 1 ], a2 = [ 1 ; 3 ], a3 = [ −2 ; 0 ], a4 = [ −1 ; 1 ] are normal to planes in (µ1, µ2) space
the bias of each plane is set by c = (12, 12, 2, 4)ᵀ and defines a half-space where the solution must be

[figure: the dual feasible region in the (µ1, µ2) plane, bounded by the four constraint lines; the optimal solution sits at the vertex where constraints 1 and 2 meet]

the solution can be obtained by inspection

Dual
noting that only constraints 1 and 2 are active,

   3µ1 + µ2 = 12, µ1 + 3µ2 = 12  ⇔  µ1 = 3, µ2 = 3

the basis vectors for the primal solution are

   a1 = [ 3 ; 1 ],  a2 = [ 1 ; 3 ]

which add to b = (2, 2)ᵀ when x1 = x2 = ½
hence, the optimal solution is

   x* = (1/2, 1/2, 0, 0)ᵀ
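The worked example can be verified with an LP solver. The constraint data below is a best-effort reading of the slide (in particular the signs of the a3 and a4 columns), so treat it as a reconstruction:

```python
import numpy as np
from scipy.optimize import linprog

# slide data as reconstructed: columns a1=(3,1), a2=(1,3), a3=(-2,0), a4=(-1,1)
c = np.array([12.0, 12.0, 2.0, 4.0])
A = np.array([[3.0, 1.0, -2.0, -1.0],
              [1.0, 3.0,  0.0,  1.0]])
b = np.array([2.0, 2.0])

# primal: min c^T x  s.t.  A x = b, x >= 0
primal = linprog(c, A_eq=A, b_eq=b)   # x >= 0 is the default bound

# dual: max mu^T b  s.t.  A^T mu <= c  (mu free, since the constraints are equalities)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)] * 2)
```

Both solves return the same optimal value, with the primal solution supported on a1 and a2 and the dual at the vertex where constraints 1 and 2 are active.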

Notes
by using the dual,
1. we were able to solve the problem with minimal (none?) computation
2. we quickly identified which constraints are active
property 2 is always true:
• at any given region of the space, only a few of the constraints are active
• by taking the remaining Lagrange multipliers to zero, the dual solution automatically identifies those
property 1:
• the dual is much simpler whenever # of constraints << # of variables

Notes
on linear programming problems:
1. the solution lies on one entire constraint (1 multiplier)
2. the solution is at the intersection of two constraints (2 multipliers)
3. more multipliers only if several constraints intersect at a single point

[figure: two copies of the dual feasible region in the (µ1, µ2) plane: in one, the optimal solution lies along a single constraint (1 multiplier); in the other, at the intersection of two constraints (2 multipliers)]

Quadratic programming
consider the problem

   min_x f(x) = min_x { ½ xᵀ Q x − bᵀ x }  subject to  Ax ≤ c

where Q is positive definite
this is a convex problem with linear constraints, and has no duality gap
the dual problem is

   q* = max_{α ≥ 0} min_x { ½ xᵀ Q x − bᵀ x + αᵀ (Ax − c) }

setting the gradient with respect to x to zero, we obtain

   Qx − b + Aᵀα = 0  ⇔  x = Q⁻¹ (b − Aᵀα)

and substituting back,

   q* = max_{α ≥ 0} { ½ (b − Aᵀα)ᵀ Q⁻¹ (b − Aᵀα) − bᵀ Q⁻¹ (b − Aᵀα) + αᵀ A Q⁻¹ (b − Aᵀα) − αᵀ c }
      = max_{α ≥ 0} { −½ (b − Aᵀα)ᵀ Q⁻¹ (b − Aᵀα) − αᵀ c }
      = max_{α ≥ 0} { −½ αᵀ A Q⁻¹ Aᵀ α + αᵀ (A Q⁻¹ b − c) − ½ bᵀ Q⁻¹ b }

Quadratic programming
hence, dropping the constant term −½ bᵀ Q⁻¹ b (which does not affect the maximization), the dual problem is of the form

   q* = max_{α ≥ 0} { −½ αᵀ P α + αᵀ d },  with P = A Q⁻¹ Aᵀ,  d = A Q⁻¹ b − c

note that, like the primal, this is a quadratic problem
the advantage is that the constraints are now much simpler
this is the optimization problem defined by the support vector machine
more on this next class
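A minimal numerical sketch of this primal/dual correspondence with illustrative data (identity Q and a single linear constraint, not from the slides): solve the dual in closed form, then recover the primal solution from x = Q⁻¹(b − Aᵀα).

```python
import numpy as np

# illustrative data: Q = I (positive definite), one linear constraint
# minimize (1/2) x^T Q x - b^T x  subject to  A x <= c
Q = np.eye(2)
b = np.array([2.0, 1.0])
A = np.array([[1.0, 1.0]])
c = np.array([2.0])

Qinv = np.linalg.inv(Q)
P = A @ Qinv @ A.T        # dual quadratic term, here the 1x1 matrix [[2]]
d = A @ Qinv @ b - c      # dual linear term, here [1]

# dual: max_{alpha >= 0} -(1/2) alpha^T P alpha + alpha^T d
# with a single constraint this is scalar, so alpha* = max(0, d / P)
alpha = np.maximum(0.0, d / P[0, 0])

# recover the primal solution from the stationarity condition
x = Qinv @ (b - A.T @ alpha)
```

Here the unconstrained minimum of the primal is x = b = (2, 1), which violates the constraint, so the dual variable is strictly positive and x lands on the constraint boundary.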
