The Karush-Kuhn-Tucker conditi d d litditions and...
Transcript of The Karush-Kuhn-Tucker conditi d d litditions and...
-
The Karush-Kuhn-Tucker diti d d litconditions and duality
Nuno Vasconcelos ECE Department, UCSDp ,
-
Optimizationgoal: find maximum or minimum of a functionDefinition: given functions f g i=1 k and h i=1 mDefinition: given functions f, gi, i=1,...,k and hi, i=1,...mdefined on some domain Ω ∈ Rn
wwf Ω∈ ),( min
iwhiwg
f
i
i
∀=∀≤
,0)( ,0)( subject to
)(
for compactness we write g(w) ≤ 0 instead of gi(w) ≤ 0, ∀i. Similarly h(w) = 0we derived necessary and sufficient for (local) optimalityin• the absence of constraints
2
the absence of constraints• equality constraints only
-
Minima conditions (unconstrained)let f(w) be continuously differentiablew* is a local minimum of f(w) if and only ifw is a local minimum of f(w) if and only if• f has zero gradient at w*
0*)( =∇ wf• and the Hessian of f at w* is positive definite
0*)( =∇ wf
nt ddwfd ℜ∈∀≥∇ 0*)(2
• where
⎥⎤
⎢⎡ ∂∂ )()(
22
xfxf
ddwfd ℜ∈∀≥∇ ,0)(
⎥⎥⎥⎥
⎢⎢⎢⎢
∂∂
∂∂∂=∇
−
)()(
)(22
1020
2
ff
xxx
xx
xfn
M
L
3
⎥⎥
⎦⎢⎢
⎣ ∂∂
∂∂∂
−−
)()( 2101
xx
fxxx
f
nn
L
-
Maxima conditions (unconstrained)let f(w) be continuously differentiablew* is a local maximum of f(w) if and only ifw is a local maximum of f(w) if and only if• f has zero gradient at w*
0*)( =∇ wf• and the Hessian of f at w* is negative definite
0*)( =∇ wf
nt ddwfd ℜ∈∀≤∇ 0*)(2
• where
⎥⎤
⎢⎡ ∂∂ )()(
22
xfxf
ddwfd ℜ∈∀≤∇ ,0)(
⎥⎥⎥⎥
⎢⎢⎢⎢
∂∂
∂∂∂=∇
−
)()(
)(22
1020
2
ff
xxx
xx
xfn
M
L
4
⎥⎥
⎦⎢⎢
⎣ ∂∂
∂∂∂
−−
)()( 2101
xx
fxxx
f
nn
L
-
Constrained optimization with equality constraints onlyTheorem: consider the problemTheorem: consider the problem
0)()(minarg* == xhxfxx
tosubject
where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exits a unique vector λ, such that
0*)(*)() =∇+∇ ∑ xhxfi im
iλ
0*)(,0*)(*)()
0)()()
22
1
=∇∀≥⎥⎦
⎤⎢⎣
⎡∇+∇
∇+∇
∑
∑=
yxhyyxhxfyii
xhxfi
Ti
m
iT
ii
i
s.t.
λ
λ
5
1⎥⎦
⎢⎣
∑=i
-
Alternative formulationstate the conditions through the Lagrangian
m
th th b tl itt
)()(),(1
xhxfxL im
ii∑
=
+= λλ
the theorem can be compactly written as
)*,( * ⎤⎡∇ xL λ
0*)(0)*()
)*,(),(
)
*2
*
=∇∀≥∇
=⎥⎦
⎤⎢⎣
⎡
∇∇
=∇
yxhyyxLyii
xLxL
i
TT
x
s t
0*)L(x*,
λ
λλ
λλ
the entries of λ are referred to as Lagrange multipliers
0)(,0),() =∇∀≥∇ yxhyyxLyii xx s.t. λ
6
-
Geometric interpretationderivative of f along d is
thi th t
))(,cos(.)(.)()()(lim0
wfdwfdwfdwfdwf T ∇∇=∇=−+→ α
αα
this means that• greatest increase when d || ∇f• no increase when d ⊥ ∇f
∇fno increase
no increase when d ⊥ ∇f• since there is no increase when
d is tangent to iso-contour f(x) = kh di i di l• the gradient is perpendicular to
the tangent of the iso-contour
allows geometric interpretation of the Lagrangian
7
g p g gconditions
-
Lagrangian optimizationgeometric interpretation:• since h(x)=0 is a iso-contour of h(x) ∇h(x*) is perpendicular to• since h(x)=0 is a iso-contour of h(x), ∇h(x ) is perpendicular to
the iso-contour• i) says that ∇f (x*) ∈ span{∇hi(x*)}• i.e. ∇f ⊥ to tangent space of the constraint surface• intuitive
• direction of largest increase of
span{∇h(x*)}
direction of largest increase off is ⊥ to constraint surface
• the gradient is zero along theconstraint
tg plane
• no way to give an infinitesimalgradient step, without ending upviolating it
8
h(x)=0• it is impossible to increase f and
still satisfy the constraint
-
Exampleconsider the problem
min x1 + x2 subject to x12 + x22 = 21 2 j 1 2∇f ⊥ to the iso-contours of f (x1 + x2 = k)
⎥⎤
⎢⎡
=∇ 12
)(x
xh
h(x)=02
⎥⎦
⎢⎣
=∇22
)(x
xh
⎥⎤
⎢⎡
=∇1
)(xf
2
⎥⎦
⎢⎣
∇1
)(xf
x1 + x2 = 0
x1 + x2 = 1
9
x1 + x2 = -1
-
Exampleconsider the problem
min x1 + x2 subject to x12 + x22 = 21 2 j 1 2∇h ⊥ to the iso-contour of h (x12 + x22 - 2 = 0)
⎥⎤
⎢⎡
=∇ 12
)(x
xh
h(x)=02
⎥⎦
⎢⎣
=∇22
)(x
xh
⎥⎤
⎢⎡
=∇1
)(xf
2
⎥⎦
⎢⎣
∇1
)(xf
x1 + x2 = 0
x1 + x2 = 1
10
x1 + x2 = -1
-
Examplerecall that derivative along d is
)()( wfdwf +α
- moving along the tangent
))(,cos(.)(.)()(lim0
wfdwfdwfdwf ∇∇=−+→ α
αα
critical pointmoving along the tangentis descent as long as2
0),cos(
-
Alternative viewconsider the tangent space to the iso-contour h(x) = 0this is the subspace of first order feasible variationsthis is the subspace of first order feasible variations
{ }ixxhxxV Ti ∀=∆∇∆= ,0*)(|*)(space of ∆x for which x + ∆x satisfies the constraint up to first order approximation
V(x*) feasible variations
x* ∇h(x*)h(x)=0
12
-
Feasible variationsmultiplying our first Lagrangian condition by ∆x
*)(*)( ∑ hf Tm
T λ
it follows that
0*)(*)(1
=∆∇+∆∇ ∑=
xxhxxf Tii
iT λ
this is a generalization of ∇f(x*)=0 in unconstrained case
*)(,0*)( xVxxxf T ∈∆∀=∆∇
this is a generalization of ∇f(x ) 0 in unconstrained case implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) || ∇h(x*)note:note:• Hessian constraint only defined for y in V(x*)• makes sense: we cannot move anywhere else, does not really
13
matter what Hessian is outside V(x*)
-
Inequality constraintswhat happens when we introduce inequalities?
0)(0)()(minarg* ≤== xgxhxfx tosubject
we start by defining the set of active inequality constraints
0)(,0)()(minarg* ≤== xgxhxfxx
tosubject
for example f(x1 x2) = x12 + x2 -5 ≤ x1 ≤ 5 -5 ≤ x2 ≤ 5
{ }0)(|)( == xgjxA j for example f(x1,x2) x1 x2, 5 ≤ x1 ≤ 5, 5 ≤ x2 ≤ 5
14
-
Active inequality constraintswe have a minimum at x* = (0,-5)• x*1 – 5 < 0 -x*1 - 5 < 0 and x*2 – 5 < 0 are inactivex 1 5 < 0, x 1 5 < 0, and x 2 5 < 0 are inactive• -x*2 – 5 = 0 is active (x*2 = -5)
note that a local minimum for this problem would still be a local minimum if we removed the innactive constraintsminimum if we removed the innactive constraints• innactive constraints do not do anything• active constraints are equalities
innactiveinnactive
15
x*x* active
-
Constrained optimizationhence, the problem
0)(0)()(minarg* ≤== xgxhxfx tosubject
is equivalent to
0)(,0)()(minarg* ≤== xgxhxfxx
tosubject
this is a problem with equality constraints there must be
*)(,0)(,0)()(minarg* xAixgxhxfx ix
∈∀=== tosubject
this is a problem with equality constraints, there must be a λ* and µj*, j ∈ A(x*), such that
0*)(*)(*)( ** ∇∇∇ ∑∑ hfm
λ
which does not change if we assign a zero Lagrange
0*)(*)(*)(*)(
*
1
* =∇+∇+∇ ∑∑∈=
xgxhxf jxAj
jii
i µλ
16
g g g gmultiplier to the innactive constraints
-
Constrained optimizationletting µj* = 0, j ∉ A(x*),
rm
th i fi l t i t hi h i * ≥ 0 f ll j d t
0*)(*)(*)(1
*
1
* =∇+∇+∇ ∑∑==
xgxhxf jr
jji
m
ii µλ
there is one final constraint which is µj* ≥ 0, for all j due to the following picture
• ∇f has to point inward (otherwise we would have a g(x) ≤ 0
∇f
p (maximum of f)
• ∇g has to point outward (otherwise g would increase inward, i.e. g would be non-negative
∇g, g g
inside)
when we put together all these constraints we obtain the famed Karush Kuhn Tucker (KKT) conditions
17
famed Karush-Kuhn-Tucker (KKT) conditions
-
The KKT conditionsTheorem: for the problem
0)(,0)()(minarg* ≤== xgxhxfx tosubject
x* is a local minimum if and only if there exist λ* and µ* such that
x
0*)(*)(*)(
**
1
*
1
* xgxhxfi) jr
jji
m
ii =∇+∇+∇
==∑∑ µλ
0*)()*)(,0),0) **
xhivxAjiiijii
rm
jj
⎤⎡
=
∉∀=∀≥ µµ
( )
{ }*)(,0*)(,0*)(|*)(
*)(,0)()()*1
*
1
*
xAjyxgiyxhyxVwhere
xVyyxgxhxfyv
Tj
Ti
xxj
r
jji
m
ii
T
∈∀=∇∀=∇=
∈∀≥⎥⎦
⎤⎢⎣
⎡∇+∇+∇∇
===∑∑
and
µλ
18
{ })(,0)(,0)(|)( xAjyxgiyxhyxVwhere ji ∈∀∇∀∇ and
-
Geometric interpretationLet’s forget the equality constraints for nowlater we will see that they do not change muchlater we will see that they do not change muchconsider the problem
0)(tosubject)(minarg* ≤= xgxfx
from the KKT conditions, the solution satisfies
0)( tosubject )(minarg* ≤= xgxfxx
[ ]*)(,0) ,0)
0*)*,( ** xAjiiijii
xLi)
jj ∉∀=∀≥
=∇
µµ
µ
with
*)(*)(*)*,( * xgxfxL jr
j∑+= µµ
19
)()(),(1
gf jj
j∑=
µµ
-
Geometric interpretationwhich is equivalent to
[ ] ( )[ ])(*)(min*)(minL* xgxfxL T+== µµ
th h
[ ] ( )[ ]*)(,0 and ,0
)(*)(min*),(minL*** xAjjwith
xgxfxL
jj
xx
∉∀=∀≥
+==
µµ
µµ
we thus have• x = x* ⇒ f(x) + (µ*)Tg(x) – L* = 0• x ≠ x* ⇒ f(x) + (µ*)Tg(x) – L* ≥ 0
plane in z-spacenormal w bias bx ≠ x ⇒ f(x) + (µ ) g(x) L ≥ 0
or• x = x* ⇒ w*Tz - b = 0
normal w, bias b
L is in half-space • x ≠ x* ⇒ w*Tz - b ≥ 0
where⎥⎤
⎢⎡
⎥⎤
⎢⎡ )(1
**xf
Lb
s a spacepointed to by w*
20
⎥⎦
⎢⎣
=⎥⎦
⎢⎣
==)()(
,*
* *,xg
fzwLb
µ
-
Geometric Interpretationfrom
⎥⎦
⎤⎢⎣
⎡=⎥
⎦
⎤⎢⎣
⎡==
)()(
,*
1* *,
xgxf
zwLbµ
we have• since µ*i ≥ 0, w is always in the first quadrant
⎦⎣⎦⎣ )(* xgµ
• since first coordinate is 1, w* is never parallel tog(x) “axis”
this can be visualized in “z space” as f ∈ R
w*
this can be visualized in z-space asalso, two cases: 1) if g(x*) = 0• x = x* ⇒ w*Tz(x*) - b = 0
f ∈ R
(g* f*)=• x = x ⇒ w z(x ) - b = 0⇒ f(x*) = L*
• the y-intercept is (0,L*) = (0, f*)is the minimum of L
w*
admissibleplanes
(g ,f )=
(0,f*)
21
is the minimum of Lg∈ Rr-1
-
Geometric Interpretation
⎥⎦
⎤⎢⎣
⎡=⎥
⎦
⎤⎢⎣
⎡==
)()(
,*
1* *,
xgxf
zwLbµ
case ii) g(x*)
-
In summary[ ] ( )[ ]
*)(,0and,0
)(*)(min*),(minL*** xAjjwith
xgxfxL
jj
T
xx
∉∀=∀≥
+==
µµ
µµ
is equivalent to• x = x* ⇒ w*Tz - b = 0
)(,0 and ,0 xAjjwith jj ∉∀∀≥ µµ
⎥⎤
⎢⎡
⎥⎤
⎢⎡ )(1
**xf
Lb• x ≠ x* ⇒ w*Tz - b ≥ 0
can be visualized as
⎥⎦
⎢⎣
=⎥⎦
⎢⎣
==)(
,*
* *,xg
zwLbµ
f ∈ Rf ∈ R
w*
g(x*)=0 g(x*)
-
Duality
[ ] ( )[ ]*)(0d0
)(*)(min*),(minL*** Ajji h
xgxfxL Txx
∀∀≥
+== µµ
does not appear terribly difficult once I know µ*
*)(,0 and ,0 xAjjwith jj ∉∀=∀≥ µµ
how I do find its value? consider this function for any µ
[ ] [ ])()(min),(min)(q +== µµµ xgxfxL Txx
this is equivalent to
0 ≥µwithxx
• x = x* ⇒ wTz - b = 0• x ≠ x* ⇒ wTz - b ≥ 0 ⎥⎦
⎤⎢⎣
⎡=⎥
⎦
⎤⎢⎣
⎡==
)()(
,1
),(xgxf
zwqbµ
µ
24
the picture is the same with L* replaced by q(µ)
-
Dualitynoting that• hyperplane (w,b) still has to support feasible set• and we still have µ>0
this leads tow
f ∈ Rf ∈ R
w*
g(x*)=0 g(x*)
-
Dualitynote that • q(µ) ≤ L* = f*• if we keep increasing q(µ) we will get q(µ) = L*• we cannot go beyond L* (x* would move to g(x*) > 0)
this is exactly the definition of the dual problemthis is exactly the definition of the dual problem
[ ] [ ])()(min),(min)(q +== µµµ xgxfxL Txx)(q max0 µµ≥
note:• q(µ) may go to -∞ for some µ, which means that there is no
Lagrange multiplier (plane would be vertical)
0 ≥µwith
Lagrange multiplier (plane would be vertical). • this is avoided by introducing the constraint
{ }−∞>=∈ )(| µµµ qD
26
{ }−∞>=∈ )(| µµµ qDq
-
Dualitywe therefore have a two step recipe to find the optimal solution1 f l1.for any µ, solve
[ ] [ ])()(min),(min)(q xgxfxL Txx
µµµ +==
2.then solve
xx
)(qmax µ
one of the reasons why this is interesting is that the d bl t t t b it bl
)(q max,0
µµµ qD∈≥
second problem turns out to be quite manageable
27
-
DualityTheorem: Dq is a convex set and q is concave on DqProof:• for any x, µ, µ and α ∈ [0,1]
( ) ( )µααµµααµ )()1()()1(, xgxfxLT
T−++=−+
[ ] [ ]( ) ( )
µαµα
µααµ
)(
)()()1()()(
)()1()()(
xgxfxgxf
xgxgxfTT
TT
+−++=
−++=
• and taking min on both sides( ) ( )µαµα ,)1(, xLxL −+=
( ) ( ) ( )[ ]µαµαµααµ ,)1(,min )1(,min xLxLxL −+=−+
• we have
( ) ( ) ( )[ ]( ) ( )µαµα ,min)1(,min xLxL
xx
xx
−+≥
( ) ( )28
( ) ( ) ( )µαµαµααµ qqq )1()1( −+≥−+
-
Duality• we have
( ) ( ) ( )µαµαµααµ qqq )1()1( −+≥−+ (*)• from which two conclusions follow
• if µ ∈ Dq and µ ∈ Dq ⇒ q(µ)>-∞, q(µ)>-∞ ⇒ αµ + (1-α)µ ∈ DqHence Dq is convexe ce q s co e
• by definition of concavity (*) implies that q is concave over Dq
this is a very appealing result, since convex optimization bl th i t t lproblems are among the easiest to solve
note that the dual is always concave, irrespective of the primal problemprimal problemthe following result only proves what we already have inferred from the graphical interpretation
29
-
DualityTheorem: (weak duality) it is always true that
** fq ≤Proof:• for any µ ≥ 0, and x with g(x) ≤ 0,
** fq ≤
since µ g(x) ≤ 0 Hence
( ) )()()(),(min xfxgxfzLqj
jjz≤+≤= ∑µµµ
since µjgj(x) ≤ 0. Hence
)()(max*0
xfqq ≤=≥
µµ
and since this holds for any x
)(min*0)(,
xfqxgx ≤
≤
30
-
Duality gapif q* = f* we say that there is no duality gap. Otherwise there is a duality gap.y g pthe DG constrains the existence of Lagrange multipliersTheorem:• if there is no duality gap, the set of Lagrange multipliers is the set
of optimal dual solutionsif th i d lit th L lti li• if there is a duality gap, there are no Lagrange multipliers
Proof:• by definition µ* ≥ 0 is a Lagrange multiplier if and only if• by definition, µ ≥ 0 is a Lagrange multiplier if and only if
• which, from the previous theorem, holds if and only if q* = f*, i.e. if there is no duality gap
**)(* qqf ≤= µ
31
there is no duality gap
-
Duality gapnote that there are situations in which the dual problem has a solution, but for which there is no Lagrange , g gmultiplier.for example:
f ∈ R
• this is a valid dual problem• however the constraint
*g( *) 0
(0,f*)
µi*g(xi*) = 0
is not satisfied andq* ≠ f* g∈ Rr-1g(xi*)
-
33