Duality
Nuno Vasconcelos, ECE Department, UCSD
Optimization
goal: find the maximum or minimum of a function
Definition: given functions f, g_i, i = 1,...,k and h_i, i = 1,...,m defined on some domain Ω ⊆ R^n,

  min_{w ∈ Ω} f(w)  subject to  g_i(w) ≤ 0, ∀i  and  h_i(w) = 0, ∀i

for compactness we write g(w) ≤ 0 instead of g_i(w) ≤ 0, ∀i; similarly h(w) = 0
we derived necessary and sufficient conditions for (local) optimality in
• the absence of constraints
• equality constraints only
• equality and inequality constraints
Minima conditions (unconstrained)
let f(w) be continuously differentiable
w* is a local minimum of f(w) if and only if
• f has zero gradient at w*

  ∇f(w*) = 0

• and the Hessian of f at w* is positive semidefinite

  d^T ∇²f(w*) d ≥ 0, ∀d ∈ R^n

• where ∇²f(x) is the Hessian matrix, with entries

  [∇²f(x)]_{ij} = ∂²f(x) / ∂x_i ∂x_j,  i, j = 0, ..., n−1
Maxima conditions (unconstrained)
let f(w) be continuously differentiable
w* is a local maximum of f(w) if and only if
• f has zero gradient at w*

  ∇f(w*) = 0

• and the Hessian of f at w* is negative semidefinite

  d^T ∇²f(w*) d ≤ 0, ∀d ∈ R^n

• where ∇²f(x) is the Hessian matrix, with entries

  [∇²f(x)]_{ij} = ∂²f(x) / ∂x_i ∂x_j,  i, j = 0, ..., n−1
Constrained optimization with equality constraints only
Theorem: consider the problem

  x* = argmin_x f(x)  subject to  h(x) = 0

where the constraint gradients ∇h_i(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

  i)  ∇f(x*) + Σ_{i=1}^m λ_i ∇h_i(x*) = 0
  ii) y^T [∇²f(x*) + Σ_{i=1}^m λ_i ∇²h_i(x*)] y ≥ 0,  ∀y s.t. ∇h(x*)^T y = 0
Alternative formulation
state the conditions through the Lagrangian

  L(x, λ) = f(x) + Σ_{i=1}^m λ_i h_i(x)

the theorem can be compactly written as

  i)  ∇L(x*, λ*) = [∇_x L(x*, λ*) ; ∇_λ L(x*, λ*)] = 0
  ii) y^T ∇²_{xx} L(x*, λ*) y ≥ 0,  ∀y s.t. ∇h(x*)^T y = 0

the entries of λ are referred to as Lagrange multipliers
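as a small illustration (the problem instance is my own, not from the slides), the stationarity condition ∇L = 0 can be solved directly for a quadratic objective with one linear equality constraint, say min x² + y² subject to x + y = 1, since it reduces to a linear system in (x, y, λ):

```python
import numpy as np

# minimize f(x, y) = x^2 + y^2 subject to h(x, y) = x + y - 1 = 0
# stationarity of L(x, y, lam) = x^2 + y^2 + lam*(x + y - 1):
#   dL/dx   = 2x + lam     = 0
#   dL/dy   = 2y + lam     = 0
#   dL/dlam = x + y - 1    = 0
# a 3x3 linear system in (x, y, lam)
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x, y, lam = np.linalg.solve(A, rhs)
print(x, y, lam)  # x = y = 0.5, lam = -1
```

the solution x = y = 1/2 is the point of the line x + y = 1 closest to the origin, as expected.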
Geometric view
consider the tangent space to the isocontour h(x) = 0
since h grows in any direction along which ∇h(x) is not zero, ∇h(x) is ⊥ to the isocontour
hence, the subspace of first order feasible variations is

  V(x*) = {∆x | ∇h_i(x*)^T ∆x = 0, ∀i}

the space of ∆x for which x + ∆x satisfies the constraint up to first order approximation
[figure: the surface h(x) = 0, the normal ∇h(x*) at x*, and the plane V(x*) of feasible variations]
Feasible variations
multiplying our first Lagrangian condition by ∆x,

  ∇f(x*)^T ∆x + Σ_{i=1}^m λ_i ∇h_i(x*)^T ∆x = 0

it follows that ∇f(x*) must satisfy

  ∇f(x*)^T ∆x = 0,  ∀∆x ∈ V(x*)

i.e. ∇f(x*) ⊥ V(x*): the gradient is orthogonal to all feasible steps
no growth is possible along the constraint
this is a generalization of ∇f(x*) = 0 in the unconstrained case
note:
• the Hessian constraint is only defined for y in V(x*)
• this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)
Inequality constraints
with inequalities,

  x* = argmin_x f(x)  subject to  h(x) = 0, g(x) ≤ 0

the only constraints that matter are those which are active,

  A(x*) = {j | g_j(x*) = 0}

and these are equalities
[figure: a constraint that is inactive at x* vs. one that is active]
Constrained optimization
hence, the problem

  x* = argmin_x f(x)  subject to  h(x) = 0, g(x) ≤ 0

is equivalent to

  x* = argmin_x f(x)  subject to  h(x) = 0,  g_i(x) = 0 ∀i ∈ A(x*)

this is a problem with equality constraints; there must be a λ* and µ_j*, such that

  ∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) + Σ_{j=1}^r µ_j* ∇g_j(x*) = 0

with µ_j* = 0, j ∉ A(x*)
[figure: at a point x* on the boundary g(x) = 0, ∇f points opposite to ∇g]
finally, we need µ_j* ≥ 0, for all j, to guarantee this
The KKT conditions
Theorem: for the problem

  x* = argmin_x f(x)  subject to  h(x) = 0, g(x) ≤ 0

x* is a local minimum if and only if there exist λ* and µ* such that

  i)   ∇f(x*) + Σ_{i=1}^m λ_i* ∇h_i(x*) + Σ_{j=1}^r µ_j* ∇g_j(x*) = 0
  ii)  µ_j* ≥ 0, ∀j
  iii) µ_j* = 0, ∀j ∉ A(x*)
  iv)  h(x*) = 0
  v)   y^T [∇²f(x*) + Σ_{i=1}^m λ_i* ∇²h_i(x*) + Σ_{j=1}^r µ_j* ∇²g_j(x*)] y ≥ 0,  ∀y ∈ V(x*)

where

  V(x*) = {y | ∇h_i(x*)^T y = 0 ∀i, and ∇g_j(x*)^T y = 0 ∀j ∈ A(x*)}
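a minimal numeric sanity check of conditions i)–iii), on a one-dimensional problem of my own choosing (min (x − 2)² subject to x ≤ 1, where the constraint is active at the solution x* = 1):

```python
import numpy as np

# minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0
# candidate solution x* = 1 (constraint active)
x_star = 1.0
grad_f = 2 * (x_star - 2)      # f'(x*) = -2
grad_g = 1.0                   # g'(x*) = 1
mu_star = -grad_f / grad_g     # stationarity grad_f + mu*grad_g = 0 gives mu* = 2

# check the KKT conditions at (x*, mu*)
assert abs(grad_f + mu_star * grad_g) < 1e-12  # i)  stationarity
assert mu_star >= 0                            # ii) nonnegative multiplier
assert abs(x_star - 1.0) < 1e-12               # constraint active, so iii) holds
print(mu_star)  # 2.0
```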
Geometric interpretation
we consider the case without equality constraints,

  x* = argmin_x f(x)  subject to  g(x) ≤ 0

from the KKT conditions, the solution satisfies

  i)   ∇_x L(x*, µ*) = 0
  ii)  µ_j* ≥ 0, ∀j
  iii) µ_j* = 0, ∀j ∉ A(x*)

with

  L(x, µ*) = f(x) + Σ_{j=1}^r µ_j* g_j(x)

which is equivalent to

  L* = min_x L(x, µ*) = min_x [f(x) + (µ*)^T g(x)]

with µ_j* ≥ 0 ∀j, and µ_j* = 0 ∀j ∉ A(x*)
Geometric interpretation

  L* = min_x L(x, µ*) = min_x [f(x) + (µ*)^T g(x)],  with µ_j* ≥ 0 ∀j, and µ_j* = 0 ∀j ∉ A(x*)

is equivalent to
• x = x* ⇒ w*^T z − b = 0
• x ≠ x* ⇒ w*^T z − b ≥ 0
with

  b = L*,  w* = [1 ; µ*],  z = [f(x) ; g(x)]

and can be visualized as
[figure: in the (g, f) plane (g ∈ R^r, f ∈ R), the set of points (g(x), f(x)) for feasible x, the supporting hyperplane of normal w*, and the contact point (g*, f*) = (0, f*)]
Duality
we solve instead

  q(µ) = min_x L(x, µ) = min_x [f(x) + µ^T g(x)],  with µ ≥ 0

i.e.

  b = q(µ),  w = [1 ; µ],  z = [f(x) ; g(x)]

same picture, with L* replaced by q(µ) and µ* by µ
[figure: same (g, f) picture, but the hyperplane of normal w now intercepts the f axis at (0, q(µ)), below (0, f*)]
Duality
note that
• q(µ) ≤ L* = f*
• if we keep increasing q(µ) we will get q(µ) = L*
• we cannot go beyond L* (x* would move to g(x*) > 0)
this is exactly the definition of the dual problem

  max_{µ ≥ 0} q(µ),  q(µ) = min_x L(x, µ) = min_x [f(x) + µ^T g(x)]

note:
• q(µ) may go to −∞ for some µ
• this is avoided by introducing the constraint µ ∈ D_q, with

  D_q = {µ | q(µ) > −∞}
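a concrete dual function can be tabulated for a small example of my own: for min x² subject to 1 − x ≤ 0 (so x* = 1, f* = 1), minimizing L(x, µ) = x² + µ(1 − x) over x at x = µ/2 gives q(µ) = µ − µ²/4:

```python
import numpy as np

# primal: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0; solution x* = 1, f* = 1
# dual function: q(mu) = min_x [x^2 + mu*(1 - x)] = mu - mu^2 / 4
f_star = 1.0
mu = np.linspace(0.0, 5.0, 501)
q = mu - mu**2 / 4

assert np.all(q <= f_star + 1e-12)  # weak duality: q(mu) <= f* everywhere
print(mu[np.argmax(q)], q.max())    # maximum q* = 1 = f*, attained at mu* = 2
```

here the dual maximum equals the primal minimum (no duality gap) and the maximizer µ* = 2 is the Lagrange multiplier.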
Equality constraints
so far we have disregarded them. What about

  x* = argmin_x f(x)  subject to  h(x) = 0, g(x) ≤ 0 ?

intuitively, nothing should change, since each equality is the same as two inequalities,

  h(x) = 0  ⇔  h(x) ≤ 0 and −h(x) ≤ 0

i.e.

  x* = argmin_x f(x)  subject to  g(x) ≤ 0, h(x) ≤ 0, −h(x) ≤ 0

this has Lagrangian

  L(x, µ, α⁺, α⁻) = f(x) + Σ_{i=1}^r µ_i g_i(x) + Σ_{i=1}^m α_i⁺ h_i(x) − Σ_{i=1}^m α_i⁻ h_i(x)
Equality constraints
which is equivalent to

  L(x, µ, α⁺, α⁻) = f(x) + Σ_{i=1}^r µ_i g_i(x) + Σ_{i=1}^m (α_i⁺ − α_i⁻) h_i(x)
                  = f(x) + Σ_{i=1}^r µ_i g_i(x) + Σ_{i=1}^m λ_i h_i(x),  with λ_i = α_i⁺ − α_i⁻

i.e. basically the same, but the λ_i do not have to be ≥ 0
in summary, (µ*, λ*) is a Lagrange multiplier if µ* ≥ 0 and

  f* = min_x L(x, µ*, λ*)

the dual is

  max_{µ ≥ 0, λ ∈ R^m} q(µ, λ),  q(µ, λ) = min_x L(x, µ, λ)

with

  D_q = {(µ, λ) | q(µ, λ) > −∞}
Duality
various nice properties
Theorem: D_q is a convex set and q is concave on D_q
• very appealing, because
• convex optimization problems are among the easiest to solve
• the dual is always concave, irrespective of the primal
Theorem: (weak duality) it is always true that q* ≤ f*
if q* = f* we say that there is no duality gap
Theorem:
• if there is no duality gap, the set of Lagrange multipliers is the set of optimal dual solutions
• if there is a duality gap, there are no Lagrange multipliers
Duality gap
the KKT theorem assures a local minimum only when there is a set of Lagrange multipliers that satisfies it
this is impossible if there is a duality gap
when is this the case?
• as far as I know this is still an open question
• there are various results which characterize the existence of solutions for certain classes of problems
• the bulk of the results are for the case of convex programming problems
recall: the problem is convex if the function f is convex, the inequality constraints g_j are convex, and the equality constraints h_i are affine
Duality gap
the following theorems are relevant (note: proofs are hard, not particularly insightful, and therefore omitted)
Theorem: (strong duality) Consider the problem

  x* = argmin_{x ∈ X} f(x)  subject to  g(x) ≤ 0

where X, f, and the g_j are all convex, the optimal value f* is finite, and there is a vector x̄ s.t.

  g_j(x̄) < 0, ∀j.   (*)

Then there is at least one Lagrange multiplier vector and there is no duality gap
i.e. convex problems have a dual as long as (*) holds!
Duality gap
(*) is needed to guarantee that there are Lagrange multipliers
consider for example

  min_{x ∈ R} f(x) = x  subject to  g(x) = x² ≤ 0

the solution is x* = 0, with f* = 0, but

  q(µ) = min_x [f(x) + µ g(x)] = min_x [x + µx²] = −1/(4µ),  ∀µ > 0

(and q(0) = −∞), so q(µ) < 0 = f* for every µ, the supremum of q is not attained, and there is no Lagrange multiplier, even though x* = 0 is a solution
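this can be seen numerically by tabulating the dual function q(µ) = −1/(4µ) of the example: it stays strictly below f* = 0 for every µ, approaching 0 only in the limit µ → ∞:

```python
import numpy as np

# primal: min x subject to x^2 <= 0 (feasible set {0}, so x* = 0, f* = 0)
# dual function: q(mu) = min_x [x + mu*x^2] = -1/(4*mu) for mu > 0
mu = np.linspace(0.1, 1000.0, 10000)
q = -1.0 / (4.0 * mu)

assert np.all(q < 0)  # q(mu) is strictly below f* = 0 for every mu...
print(q.max())        # ...and only approaches 0 as mu grows: the sup is not attained
```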
Duality gap
geometrically, we would need the supporting plane to be vertical
but this cannot happen, since the first coordinate of w* is 1
condition (*) guarantees that this never happens
[figure: left, the degenerate case where the supporting line of the set {(g(x), f(x))} at f = g would have to be vertical; right, the regular picture with the supporting hyperplane of normal w* touching the set at (g*, f*) = (0, f*)]
Duality gap
there is also a slightly more general result when the constraints are linear
Theorem: (strong duality) Consider the problem

  x* = argmin_{x ∈ X} f(x)  subject to  e_j^T x − d_j ≤ 0   (**)

where X and f are convex, and the optimal value f* is finite. Then there is at least one Lagrange multiplier vector and there is no duality gap
Corollary: if, in addition to (**), f is linear and X polyhedral, there is no duality gap
these problems are called linear programming problems
Linear programming
consider the problem

  x* = argmin_{x ≥ 0} c^T x  subject to  Ax − b ≤ 0

the dual function is

  q(µ) = min_{x ≥ 0} [c^T x + µ^T (Ax − b)]
       = min_{x ≥ 0} [(c + A^T µ)^T x − µ^T b]
       = min_{x ≥ 0} {Σ_i (c + A^T µ)_i x_i − µ^T b}

note: if, for any i, (c + A^T µ)_i = c_i + (µ^T A)_i < 0, we can make q(µ) = −∞ by making x_i arbitrarily large. So, to have a solution we need

  (c + A^T µ)_i ≥ 0, ∀i
Linear programming
since q(µ) = min_{x ≥ 0} {Σ_i (c + A^T µ)_i x_i − µ^T b}, when this is the case the minimum is at x = 0 and q(µ) = −µ^T b
switching b to −b and A to −A (so the constraint becomes Ax ≥ b and q(µ) = µ^T b), this leads to

  primal: min_x c^T x  s.t. Ax ≥ b, x ≥ 0
  dual:   max_µ µ^T b  s.t. A^T µ ≤ c, µ ≥ 0

which is the standard form of duality for linear programming problems
the dual can be obtained with a simple recipe
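as a quick sanity check of this standard pair (on a tiny instance of my own choosing, using scipy's `linprog`), the primal and dual optimal values coincide:

```python
import numpy as np
from scipy.optimize import linprog

# a tiny instance of the standard pair:
#   primal: min c^T x   s.t. A x >= b,    x >= 0
#   dual:   max mu^T b  s.t. A^T mu <= c, mu >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# linprog uses <= constraints, so Ax >= b becomes -Ax <= -b
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=(0, None))
# the dual is itself an LP: max mu^T b = -min (-b)^T mu
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=(0, None))

print(primal.fun, -dual.fun)  # equal optimal values: no duality gap
```

for this instance the primal optimum is x = (1, 0) with value 1, and the dual optimum µ = 1 gives the same value.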
Linear programming

  primal: min_x c^T x  s.t. Ax ≥ b, x ≥ 0
  dual:   max_µ µ^T b  s.t. A^T µ ≤ c, µ ≥ 0

recipe for primal to dual conversion:
1. interchange x with µ^T and b with c^T
2. reverse the constraint inequalities
3. maximize instead of minimizing
can be applied to any problem, e.g. equality constraints:

  min_x c^T x s.t. Ax = b, x ≥ 0  ⇔  min_x c^T x s.t. Ax ≥ b, −Ax ≥ −b, x ≥ 0
Linear programming

  min_x c^T x s.t. Ax = b, x ≥ 0  ⇔  min_x c^T x s.t. [A ; −A] x ≥ [b ; −b], x ≥ 0

1. interchange x with µ^T and b with c^T
2. reverse the inequalities
3. maximize
and the dual is

  max_{µ₁ ≥ 0, µ₂ ≥ 0} (b^T, −b^T) [µ₁ ; µ₂]  s.t. [A^T, −A^T] [µ₁ ; µ₂] ≤ c
  ⇔  max_µ b^T µ  s.t. A^T µ ≤ c,  with µ = µ₁ − µ₂

this has a nice geometric interpretation
Linear programming example

  primal: min_x c^T x s.t. Ax = b, x ≥ 0
  dual:   max_µ µ^T b s.t. A^T µ ≤ c

for the example

  primal: min 12x₁ + 12x₂ + 2x₃ + 4x₄
          s.t. 3x₁ + x₂ − 2x₃ − x₄ = 2
               x₁ + 3x₂ + x₄ = 2
               x ≥ 0

the dual is

  max 2µ₁ + 2µ₂
  s.t. 3µ₁ + µ₂ ≤ 12,  µ₁ + 3µ₂ ≤ 12,  −2µ₁ ≤ 2,  −µ₁ + µ₂ ≤ 4
Primal

  min 12x₁ + 12x₂ + 2x₃ + 4x₄
  s.t. 3x₁ + x₂ − 2x₃ − x₄ = 2
       x₁ + 3x₂ + x₄ = 2
       x ≥ 0

[figure: the columns a₁, ..., a₄ and the vector b in R²]
the solution is a linear combination of

  a₁ = [3 ; 1],  a₂ = [1 ; 3],  a₃ = [−2 ; 0],  a₄ = [−1 ; 1]

that adds up to

  b = [2 ; 2]

it is not obvious what it is. What about the dual?
Dual

  max 2µ₁ + 2µ₂
  s.t. 3µ₁ + µ₂ ≤ 12,  µ₁ + 3µ₂ ≤ 12,  −2µ₁ ≤ 2,  −µ₁ + µ₂ ≤ 4

[figure: the feasible region in (µ₁, µ₂) space, bounded by the four constraint lines, with the optimal solution at a vertex]
the vectors

  a₁ = [3 ; 1],  a₂ = [1 ; 3],  a₃ = [−2 ; 0],  a₄ = [−1 ; 1]

are normal to planes in (µ₁, µ₂) space
the bias of each plane is set by c = (12, 12, 2, 4)^T and defines a half space where the solution must be
the solution can be obtained by inspection
Dual
noting that only constraints 1 and 2 are active,

  { 3µ₁ + µ₂ = 12, µ₁ + 3µ₂ = 12 }  ⇔  { µ₁ = 3, µ₂ = 3 }

[figure: the optimal dual solution at the intersection of constraints 1 and 2]
the basis vectors for the primal solution are

  a₁ = [3 ; 1],  a₂ = [1 ; 3]

and add to b = (2, 2)^T when x₁ = x₂ = ½
hence, the optimal solution is

  x* = (1/2, 1/2, 0, 0)^T
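the claimed pair (x*, µ*) can be certified without any solver: by weak duality, if x* is primal feasible, µ* is dual feasible, and their objectives are equal, both must be optimal. A numpy check (note: the sign of the third column a₃ is my reconstruction from the stated solutions, so treat it as an assumption):

```python
import numpy as np

# the example's data (a3 = [-2, 0] is an assumed reconstruction)
c = np.array([12.0, 12.0, 2.0, 4.0])
A = np.array([[3.0, 1.0, -2.0, -1.0],
              [1.0, 3.0,  0.0,  1.0]])
b = np.array([2.0, 2.0])

x_star = np.array([0.5, 0.5, 0.0, 0.0])  # claimed primal solution
mu_star = np.array([3.0, 3.0])            # claimed dual solution

assert np.allclose(A @ x_star, b) and np.all(x_star >= 0)  # primal feasible
assert np.all(A.T @ mu_star <= c + 1e-12)                   # dual feasible
assert np.isclose(c @ x_star, b @ mu_star)                  # equal objectives
print(c @ x_star)  # 12.0 -- by weak duality both must be optimal
```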
Notes
by using the dual
1. we were able to solve the problem with minimal (none?) computation
2. we quickly identified what constraints are active
[figure: the dual feasible region with the optimal solution marked]
property 2 is always true:
• at any given region of the space only a few of the constraints are active
• by taking the remaining Lagrange multipliers to zero, the dual solution automatically identifies those
property 1:
• the dual is much simpler whenever # of constraints << # of variables
Notes
on linear programming problems:
1. the solution is one entire constraint (1 multiplier)
2. the solution is at the intersection of two constraints (2 multipliers)
3. more multipliers only if several constraints intersect at a single point
[figure: two dual feasible regions, one where the optimal solution lies on a single constraint (1 multiplier), the other where it lies at the intersection of two constraints (2 multipliers)]
Quadratic programming
consider the problem

  min_x f(x) = min_x { ½ x^T Q x − b^T x }  subject to  Ax ≤ c

where Q is positive definite
this is a convex problem with linear constraints and has no duality gap
the dual problem is

  q* = max_{α ≥ 0} min_x { ½ x^T Q x − b^T x + α^T (Ax − c) }

setting the gradient w.r.t. x to zero we obtain

  Qx − b + A^T α = 0  ⇔  x = Q⁻¹(b − A^T α)
Quadratic programming
and, substituting back,

  q* = max_{α ≥ 0} { ½ (b − A^T α)^T Q⁻¹ Q Q⁻¹ (b − A^T α) − b^T Q⁻¹(b − A^T α) + α^T (A Q⁻¹(b − A^T α) − c) }
     = max_{α ≥ 0} { ½ (b − A^T α)^T Q⁻¹ (b − A^T α) − b^T Q⁻¹(b − A^T α) + α^T A Q⁻¹(b − A^T α) − α^T c }
     = max_{α ≥ 0} { ½ (b − A^T α)^T Q⁻¹ (b − A^T α) − (b − A^T α)^T Q⁻¹(b − A^T α) − α^T c }
     = max_{α ≥ 0} { −½ (b − A^T α)^T Q⁻¹ (b − A^T α) − α^T c }
     = max_{α ≥ 0} { −½ α^T A Q⁻¹ A^T α + α^T (A Q⁻¹ b − c) }

(dropping the additive constant −½ b^T Q⁻¹ b, which does not depend on α)
Quadratic programming
hence, the dual problem is of the form

  q* = max_{α ≥ 0} { −½ α^T P α + d^T α }  with  P = A Q⁻¹ A^T,  d = A Q⁻¹ b − c

note that, like the primal, this is a quadratic problem
the advantage is that the constraints are now much simpler
this is the optimization problem defined by the support vector machine
more on this in the next class
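the zero-gap claim can be checked on a small instance of my own where both sides have closed forms: take Q = I and A = I, i.e. min ½‖x‖² − b^T x subject to x ≤ c elementwise. Then x* = min(b, c) componentwise, the dual has P = I and d = b − c with maximizer α* = max(b − c, 0), and the two optimal values match (keeping the constant −½ b^T Q⁻¹ b that the dual derivation drops):

```python
import numpy as np

# Q = I, A = I: min 1/2 ||x||^2 - b^T x  subject to  x <= c (elementwise)
b = np.array([3.0, -1.0, 2.0])
c = np.array([1.0,  1.0, 5.0])

x_star = np.minimum(b, c)                 # clip the unconstrained minimizer x = b
f_star = 0.5 * x_star @ x_star - b @ x_star

# dual: P = I, d = b - c; q(a) = -1/2 ||a||^2 + a^T (b - c) - 1/2 ||b||^2,
# maximized over a >= 0 by a* = max(b - c, 0)
a_star = np.maximum(b - c, 0.0)
q_star = -0.5 * a_star @ a_star + a_star @ (b - c) - 0.5 * b @ b

print(f_star, q_star)  # -5.0 -5.0 : equal, so no duality gap
```

the multipliers α* are nonzero exactly on the coordinates where the bound x ≤ c is active, as the KKT conditions require.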