The Karush-Kuhn-Tucker conditi d d litditions and...

The Karush-Kuhn-Tucker diti d d litconditions and duality

Nuno Vasconcelos ECE Department, UCSDp ,

Optimizationgoal: find maximum or minimum of a functionDefinition: given functions f g i=1 k and h i=1 mDefinition: given functions f, gi, i=1,...,k and hi, i=1,...mdefined on some domain Ω ∈ Rn

wwf Ω∈ ),( min

iwhiwg

f

i

i

∀=∀≤

,0)( ,0)( subject to

)(

for compactness we write g(w) ≤ 0 instead of gi(w) ≤ 0, ∀i. Similarly h(w) = 0we derived necessary and sufficient for (local) optimalityin• the absence of constraints

2

the absence of constraints• equality constraints only

Minima conditions (unconstrained)let f(w) be continuously differentiablew* is a local minimum of f(w) if and only ifw is a local minimum of f(w) if and only if• f has zero gradient at w*

0*)( =∇ wf• and the Hessian of f at w* is positive definite

0*)( =∇ wf

nt ddwfd ℜ∈∀≥∇ 0*)(2

• where

⎥⎤

⎢⎡ ∂∂ )()(

22

xfxf

ddwfd ℜ∈∀≥∇ ,0)(

⎥⎥⎥⎥

⎢⎢⎢⎢

∂∂

∂∂∂=∇

−

)()(

)(22

1020

2

ff

xxx

xx

xfn

M

L

3

⎥⎥

⎦⎢⎢

⎣ ∂∂

∂∂∂

−−

)()( 2101

xx

fxxx

f

nn

L

Maxima conditions (unconstrained)let f(w) be continuously differentiablew* is a local maximum of f(w) if and only ifw is a local maximum of f(w) if and only if• f has zero gradient at w*

0*)( =∇ wf• and the Hessian of f at w* is negative definite

0*)( =∇ wf

nt ddwfd ℜ∈∀≤∇ 0*)(2

• where

⎥⎤

⎢⎡ ∂∂ )()(

22

xfxf

ddwfd ℜ∈∀≤∇ ,0)(

⎥⎥⎥⎥

⎢⎢⎢⎢

∂∂

∂∂∂=∇

−

)()(

)(22

1020

2

ff

xxx

xx

xfn

M

L

4

⎥⎥

⎦⎢⎢

⎣ ∂∂

∂∂∂

−−

)()( 2101

xx

fxxx

f

nn

L

Constrained optimization with equality constraints onlyTheorem: consider the problemTheorem: consider the problem

0)()(minarg* == xhxfxx

tosubject

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exits a unique vector λ, such that

0*)(*)() =∇+∇ ∑ xhxfi im

iλ

0*)(,0*)(*)()

0)()()

22

1

=∇∀≥⎥⎦

⎤⎢⎣

⎡∇+∇

∇+∇

∑

∑=

yxhyyxhxfyii

xhxfi

Ti

m

iT

ii

i

s.t.

λ

λ

5

1⎥⎦

⎢⎣

∑=i

Alternative formulationstate the conditions through the Lagrangian

m

th th b tl itt

)()(),(1

xhxfxL im

ii∑

=

+= λλ

the theorem can be compactly written as

)*,( * ⎤⎡∇ xL λ

0*)(0)*()

)*,(),(

)

*2

*

=∇∀≥∇

=⎥⎦

⎤⎢⎣

⎡

∇∇

=∇

yxhyyxLyii

xLxL

i

TT

x

s t

0*)L(x*,

λ

λλ

λλ

the entries of λ are referred to as Lagrange multipliers

0)(,0),() =∇∀≥∇ yxhyyxLyii xx s.t. λ

6

Geometric interpretationderivative of f along d is

thi th t

))(,cos(.)(.)()()(lim0

wfdwfdwfdwfdwf T ∇∇=∇=−+→ α

αα

this means that• greatest increase when d || ∇f• no increase when d ⊥ ∇f

∇fno increase

no increase when d ⊥ ∇f• since there is no increase when

d is tangent to iso-contour f(x) = kh di i di l• the gradient is perpendicular to

the tangent of the iso-contour

allows geometric interpretation of the Lagrangian

7

g p g gconditions

Lagrangian optimizationgeometric interpretation:• since h(x)=0 is a iso-contour of h(x) ∇h(x*) is perpendicular to• since h(x)=0 is a iso-contour of h(x), ∇h(x ) is perpendicular to

the iso-contour• i) says that ∇f (x*) ∈ span{∇hi(x*)}• i.e. ∇f ⊥ to tangent space of the constraint surface• intuitive

• direction of largest increase of

span{∇h(x*)}

direction of largest increase off is ⊥ to constraint surface

• the gradient is zero along theconstraint

tg plane

• no way to give an infinitesimalgradient step, without ending upviolating it

8

h(x)=0• it is impossible to increase f and

still satisfy the constraint

Exampleconsider the problem

min x1 + x2 subject to x12 + x22 = 21 2 j 1 2∇f ⊥ to the iso-contours of f (x1 + x2 = k)

⎥⎤

⎢⎡

=∇ 12

)(x

xh

h(x)=02

⎥⎦

⎢⎣

=∇22

)(x

xh

⎥⎤

⎢⎡

=∇1

)(xf

2

⎥⎦

⎢⎣

∇1

)(xf

x1 + x2 = 0

x1 + x2 = 1

9

x1 + x2 = -1

Exampleconsider the problem

min x1 + x2 subject to x12 + x22 = 21 2 j 1 2∇h ⊥ to the iso-contour of h (x12 + x22 - 2 = 0)

⎥⎤

⎢⎡

=∇ 12

)(x

xh

h(x)=02

⎥⎦

⎢⎣

=∇22

)(x

xh

⎥⎤

⎢⎡

=∇1

)(xf

2

⎥⎦

⎢⎣

∇1

)(xf

x1 + x2 = 0

x1 + x2 = 1

10

x1 + x2 = -1

Examplerecall that derivative along d is

)()( wfdwf +α

- moving along the tangent

))(,cos(.)(.)()(lim0

wfdwfdwfdwf ∇∇=−+→ α

αα

critical pointmoving along the tangentis descent as long as2

0),cos(

Alternative viewconsider the tangent space to the iso-contour h(x) = 0this is the subspace of first order feasible variationsthis is the subspace of first order feasible variations

{ }ixxhxxV Ti ∀=∆∇∆= ,0*)(|*)(space of ∆x for which x + ∆x satisfies the constraint up to first order approximation

V(x*) feasible variations

x* ∇h(x*)h(x)=0

12

Feasible variationsmultiplying our first Lagrangian condition by ∆x

*)(*)( ∑ hf Tm

T λ

it follows that

0*)(*)(1

=∆∇+∆∇ ∑=

xxhxxf Tii

iT λ

this is a generalization of ∇f(x*)=0 in unconstrained case

*)(,0*)( xVxxxf T ∈∆∀=∆∇

this is a generalization of ∇f(x ) 0 in unconstrained case implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) || ∇h(x*)note:note:• Hessian constraint only defined for y in V(x*)• makes sense: we cannot move anywhere else, does not really

13

matter what Hessian is outside V(x*)

Inequality constraintswhat happens when we introduce inequalities?

0)(0)()(minarg* ≤== xgxhxfx tosubject

we start by defining the set of active inequality constraints

0)(,0)()(minarg* ≤== xgxhxfxx

tosubject

for example f(x1 x2) = x12 + x2 -5 ≤ x1 ≤ 5 -5 ≤ x2 ≤ 5

{ }0)(|)( == xgjxA j for example f(x1,x2) x1 x2, 5 ≤ x1 ≤ 5, 5 ≤ x2 ≤ 5

14

Active inequality constraintswe have a minimum at x* = (0,-5)• x*1 – 5 < 0 -x*1 - 5 < 0 and x*2 – 5 < 0 are inactivex 1 5 < 0, x 1 5 < 0, and x 2 5 < 0 are inactive• -x*2 – 5 = 0 is active (x*2 = -5)

note that a local minimum for this problem would still be a local minimum if we removed the innactive constraintsminimum if we removed the innactive constraints• innactive constraints do not do anything• active constraints are equalities

innactiveinnactive

15

x*x* active

Constrained optimizationhence, the problem

0)(0)()(minarg* ≤== xgxhxfx tosubject

is equivalent to

0)(,0)()(minarg* ≤== xgxhxfxx

tosubject

this is a problem with equality constraints there must be

*)(,0)(,0)()(minarg* xAixgxhxfx ix

∈∀=== tosubject

this is a problem with equality constraints, there must be a λ* and µj*, j ∈ A(x*), such that

0*)(*)(*)( ** ∇∇∇ ∑∑ hfm

λ

which does not change if we assign a zero Lagrange

0*)(*)(*)(*)(

*

1

* =∇+∇+∇ ∑∑∈=

xgxhxf jxAj

jii

i µλ

16

g g g gmultiplier to the innactive constraints

Constrained optimizationletting µj* = 0, j ∉ A(x*),

rm

th i fi l t i t hi h i * ≥ 0 f ll j d t

0*)(*)(*)(1

*

1

* =∇+∇+∇ ∑∑==

xgxhxf jr

jji

m

ii µλ

there is one final constraint which is µj* ≥ 0, for all j due to the following picture

• ∇f has to point inward (otherwise we would have a g(x) ≤ 0

∇f

p (maximum of f)

• ∇g has to point outward (otherwise g would increase inward, i.e. g would be non-negative

∇g, g g

inside)

when we put together all these constraints we obtain the famed Karush Kuhn Tucker (KKT) conditions

17

famed Karush-Kuhn-Tucker (KKT) conditions

The KKT conditionsTheorem: for the problem

0)(,0)()(minarg* ≤== xgxhxfx tosubject

x* is a local minimum if and only if there exist λ* and µ* such that

x

0*)(*)(*)(

**

1

*

1

* xgxhxfi) jr

jji

m

ii =∇+∇+∇

==∑∑ µλ

0*)()*)(,0),0) **

xhivxAjiiijii

rm

jj

⎤⎡

=

∉∀=∀≥ µµ

( )

{ }*)(,0*)(,0*)(|*)(

*)(,0)()()*1

*

1

*

xAjyxgiyxhyxVwhere

xVyyxgxhxfyv

Tj

Ti

xxj

r

jji

m

ii

T

∈∀=∇∀=∇=

∈∀≥⎥⎦

⎤⎢⎣

⎡∇+∇+∇∇

===∑∑

and

µλ

18

{ })(,0)(,0)(|)( xAjyxgiyxhyxVwhere ji ∈∀∇∀∇ and

Geometric interpretationLet’s forget the equality constraints for nowlater we will see that they do not change muchlater we will see that they do not change muchconsider the problem

0)(tosubject)(minarg* ≤= xgxfx

from the KKT conditions, the solution satisfies

0)( tosubject )(minarg* ≤= xgxfxx

[ ]*)(,0) ,0)

0*)*,( ** xAjiiijii

xLi)

jj ∉∀=∀≥

=∇

µµ

µ

with

*)(*)(*)*,( * xgxfxL jr

j∑+= µµ

19

)()(),(1

gf jj

j∑=

µµ

Geometric interpretationwhich is equivalent to

[ ] ( )[ ])(*)(min*)(minL* xgxfxL T+== µµ

th h

[ ] ( )[ ]*)(,0 and ,0

)(*)(min*),(minL*** xAjjwith

xgxfxL

jj

xx

∉∀=∀≥

+==

µµ

µµ

we thus have• x = x* ⇒ f(x) + (µ*)Tg(x) – L* = 0• x ≠ x* ⇒ f(x) + (µ*)Tg(x) – L* ≥ 0

plane in z-spacenormal w bias bx ≠ x ⇒ f(x) + (µ ) g(x) L ≥ 0

or• x = x* ⇒ w*Tz - b = 0

normal w, bias b

L is in half-space • x ≠ x* ⇒ w*Tz - b ≥ 0

where⎥⎤

⎢⎡

⎥⎤

⎢⎡ )(1

**xf

Lb

s a spacepointed to by w*

20

⎥⎦

⎢⎣

=⎥⎦

⎢⎣

==)()(

,*

* *,xg

fzwLb

µ

Geometric Interpretationfrom

⎥⎦

⎤⎢⎣

⎡=⎥

⎦

⎤⎢⎣

⎡==

)()(

,*

1* *,

xgxf

zwLbµ

we have• since µ*i ≥ 0, w is always in the first quadrant

⎦⎣⎦⎣ )(* xgµ

• since first coordinate is 1, w* is never parallel tog(x) “axis”

this can be visualized in “z space” as f ∈ R

w*

this can be visualized in z-space asalso, two cases: 1) if g(x*) = 0• x = x* ⇒ w*Tz(x*) - b = 0

f ∈ R

(g* f*)=• x = x ⇒ w z(x ) - b = 0⇒ f(x*) = L*

• the y-intercept is (0,L*) = (0, f*)is the minimum of L

w*

admissibleplanes

(g ,f )=

(0,f*)

21

is the minimum of Lg∈ Rr-1

Geometric Interpretation

⎥⎦

⎤⎢⎣

⎡=⎥

⎦

⎤⎢⎣

⎡==

)()(

,*

1* *,

xgxf

zwLbµ

case ii) g(x*)

In summary[ ] ( )[ ]

*)(,0and,0

)(*)(min*),(minL*** xAjjwith

xgxfxL

jj

T

xx

∉∀=∀≥

+==

µµ

µµ

is equivalent to• x = x* ⇒ w*Tz - b = 0

)(,0 and ,0 xAjjwith jj ∉∀∀≥ µµ

⎥⎤

⎢⎡

⎥⎤

⎢⎡ )(1

**xf

Lb• x ≠ x* ⇒ w*Tz - b ≥ 0

can be visualized as

⎥⎦

⎢⎣

=⎥⎦

⎢⎣

==)(

,*

* *,xg

zwLbµ

f ∈ Rf ∈ R

w*

g(x*)=0 g(x*)

Duality

[ ] ( )[ ]*)(0d0

)(*)(min*),(minL*** Ajji h

xgxfxL Txx

∀∀≥

+== µµ

does not appear terribly difficult once I know µ*

*)(,0 and ,0 xAjjwith jj ∉∀=∀≥ µµ

how I do find its value? consider this function for any µ

[ ] [ ])()(min),(min)(q +== µµµ xgxfxL Txx

this is equivalent to

0 ≥µwithxx

• x = x* ⇒ wTz - b = 0• x ≠ x* ⇒ wTz - b ≥ 0 ⎥⎦

⎤⎢⎣

⎡=⎥

⎦

⎤⎢⎣

⎡==

)()(

,1

),(xgxf

zwqbµ

µ

24

the picture is the same with L* replaced by q(µ)

Dualitynoting that• hyperplane (w,b) still has to support feasible set• and we still have µ>0

this leads tow

f ∈ Rf ∈ R

w*

g(x*)=0 g(x*)

Dualitynote that • q(µ) ≤ L* = f*• if we keep increasing q(µ) we will get q(µ) = L*• we cannot go beyond L* (x* would move to g(x*) > 0)

this is exactly the definition of the dual problemthis is exactly the definition of the dual problem

[ ] [ ])()(min),(min)(q +== µµµ xgxfxL Txx)(q max0 µµ≥

note:• q(µ) may go to -∞ for some µ, which means that there is no

Lagrange multiplier (plane would be vertical)

0 ≥µwith

Lagrange multiplier (plane would be vertical). • this is avoided by introducing the constraint

{ }−∞>=∈ )(| µµµ qD

26

{ }−∞>=∈ )(| µµµ qDq

Dualitywe therefore have a two step recipe to find the optimal solution1 f l1.for any µ, solve

[ ] [ ])()(min),(min)(q xgxfxL Txx

µµµ +==

2.then solve

xx

)(qmax µ

one of the reasons why this is interesting is that the d bl t t t b it bl

)(q max,0

µµµ qD∈≥

second problem turns out to be quite manageable

27

DualityTheorem: Dq is a convex set and q is concave on DqProof:• for any x, µ, µ and α ∈ [0,1]

( ) ( )µααµµααµ )()1()()1(, xgxfxLT

T−++=−+

[ ] [ ]( ) ( )

µαµα

µααµ

)(

)()()1()()(

)()1()()(

xgxfxgxf

xgxgxfTT

TT

+−++=

−++=

• and taking min on both sides( ) ( )µαµα ,)1(, xLxL −+=

( ) ( ) ( )[ ]µαµαµααµ ,)1(,min )1(,min xLxLxL −+=−+

• we have

( ) ( ) ( )[ ]( ) ( )µαµα ,min)1(,min xLxL

xx

xx

−+≥

( ) ( )28

( ) ( ) ( )µαµαµααµ qqq )1()1( −+≥−+

Duality• we have

( ) ( ) ( )µαµαµααµ qqq )1()1( −+≥−+ (*)• from which two conclusions follow

• if µ ∈ Dq and µ ∈ Dq ⇒ q(µ)>-∞, q(µ)>-∞ ⇒ αµ + (1-α)µ ∈ DqHence Dq is convexe ce q s co e

• by definition of concavity (*) implies that q is concave over Dq

this is a very appealing result, since convex optimization bl th i t t lproblems are among the easiest to solve

note that the dual is always concave, irrespective of the primal problemprimal problemthe following result only proves what we already have inferred from the graphical interpretation

29

DualityTheorem: (weak duality) it is always true that

** fq ≤Proof:• for any µ ≥ 0, and x with g(x) ≤ 0,

** fq ≤

since µ g(x) ≤ 0 Hence

( ) )()()(),(min xfxgxfzLqj

jjz≤+≤= ∑µµµ

since µjgj(x) ≤ 0. Hence

)()(max*0

xfqq ≤=≥

µµ

and since this holds for any x

)(min*0)(,

xfqxgx ≤

≤

30

Duality gapif q* = f* we say that there is no duality gap. Otherwise there is a duality gap.y g pthe DG constrains the existence of Lagrange multipliersTheorem:• if there is no duality gap, the set of Lagrange multipliers is the set

of optimal dual solutionsif th i d lit th L lti li• if there is a duality gap, there are no Lagrange multipliers

Proof:• by definition µ* ≥ 0 is a Lagrange multiplier if and only if• by definition, µ ≥ 0 is a Lagrange multiplier if and only if

• which, from the previous theorem, holds if and only if q* = f*, i.e. if there is no duality gap

**)(* qqf ≤= µ

31

there is no duality gap

Duality gapnote that there are situations in which the dual problem has a solution, but for which there is no Lagrange , g gmultiplier.for example:

f ∈ R

• this is a valid dual problem• however the constraint

*g( *) 0

(0,f*)

µi*g(xi*) = 0

is not satisfied andq* ≠ f* g∈ Rr-1g(xi*)

The Karush-Kuhn-Tucker conditi d d litditions and...

Documents

Transcript of The Karush-Kuhn-Tucker conditi d d litditions and...