The Karush-Kuhn-Tucker conditions and duality

The Karush-Kuhn-Tucker conditions and duality. Nuno Vasconcelos, ECE Department, UCSD

Transcript of The Karush-Kuhn-Tucker conditions and duality

  • The Karush-Kuhn-Tucker conditions and duality

    Nuno Vasconcelos, ECE Department, UCSD

  • Optimization

    Goal: find the maximum or minimum of a function. Definition: given functions f, g_i, i = 1, ..., k and h_i, i = 1, ..., m, defined on some domain Ω ⊆ ℝⁿ,

    $$\min_{w \in \Omega} f(w) \quad \text{subject to} \quad g_i(w) \leq 0, \ \forall i, \qquad h_i(w) = 0, \ \forall i$$

    For compactness we write g(w) ≤ 0 instead of g_i(w) ≤ 0, ∀i; similarly h(w) = 0. We derived necessary and sufficient conditions for (local) optimality in
    • the absence of constraints
    • equality constraints only

  • Minima conditions (unconstrained)

    Let f(w) be continuously differentiable. w* is a local minimum of f(w) if and only if
    • f has zero gradient at w*:

    $$\nabla f(w^*) = 0$$

    • and the Hessian of f at w* is positive semidefinite:

    $$d^T \nabla^2 f(w^*)\, d \geq 0, \quad \forall d \in \mathbb{R}^n$$

    where

    $$\nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_0^2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_0 \partial x_{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_{n-1} \partial x_0}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_{n-1}^2}(x) \end{bmatrix}$$
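
    A quick numerical illustration of these two conditions (an added sketch, not from the slides), for the strictly convex quadratic f(w) = w₁² + 2w₂², which has its minimum at w* = 0:

        import numpy as np

        # f(w) = w1^2 + 2*w2^2: gradient and (constant) Hessian
        grad_f = lambda w: np.array([2 * w[0], 4 * w[1]])
        hess_f = np.array([[2.0, 0.0], [0.0, 4.0]])

        w_star = np.zeros(2)
        print(np.allclose(grad_f(w_star), 0))           # True: zero gradient at w*
        print(np.all(np.linalg.eigvalsh(hess_f) > 0))   # True: d^T H d > 0 for all d != 0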

  • Maxima conditions (unconstrained)

    Let f(w) be continuously differentiable. w* is a local maximum of f(w) if and only if
    • f has zero gradient at w*:

    $$\nabla f(w^*) = 0$$

    • and the Hessian of f at w* is negative semidefinite:

    $$d^T \nabla^2 f(w^*)\, d \leq 0, \quad \forall d \in \mathbb{R}^n$$

    where

    $$\nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_0^2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_0 \partial x_{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_{n-1} \partial x_0}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_{n-1}^2}(x) \end{bmatrix}$$

  • Constrained optimization with equality constraints only

    Theorem: consider the problem

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad h(x) = 0$$

    where the constraint gradients ∇h_i(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ* such that

    i) $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0$

    ii) $y^T \left[ \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) \right] y \geq 0, \quad \forall y \ \text{s.t.} \ \nabla h_i(x^*)^T y = 0$

  • Alternative formulation

    State the conditions through the Lagrangian

    $$L(x, \lambda) = f(x) + \sum_{i=1}^{m} \lambda_i h_i(x)$$

    The theorem can then be compactly written as

    i) $\nabla L(x^*, \lambda^*) = \begin{bmatrix} \nabla_x L(x^*, \lambda^*) \\ \nabla_\lambda L(x^*, \lambda^*) \end{bmatrix} = 0$

    ii) $y^T \nabla_{xx}^2 L(x^*, \lambda^*)\, y \geq 0, \quad \forall y \ \text{s.t.} \ \nabla h_i(x^*)^T y = 0$

    The entries of λ are referred to as Lagrange multipliers.

  • Geometric interpretation

    The derivative of f along d is

    $$\lim_{\alpha \to 0} \frac{f(w + \alpha d) - f(w)}{\alpha} = d^T \nabla f(w) = \|d\| \, \|\nabla f(w)\| \cos(d, \nabla f(w))$$

    This means that
    • the increase is greatest when d ∥ ∇f
    • there is no increase when d ⊥ ∇f
    • since there is no increase when d is tangent to the iso-contour f(x) = k, the gradient is perpendicular to the tangent of the iso-contour.

    This allows a geometric interpretation of the Lagrangian conditions.

    (figure: iso-contours of f, with ∇f normal to them and no increase along the tangent directions)

  • Lagrangian optimization

    Geometric interpretation:
    • since h(x) = 0 is an iso-contour of h(x), ∇h_i(x*) is perpendicular to the iso-contour
    • condition i) says that ∇f(x*) ∈ span{∇h_i(x*)}
    • i.e. ∇f is ⊥ to the tangent space of the constraint surface
    • this is intuitive:
      • the direction of largest increase of f is ⊥ to the constraint surface
      • the gradient is zero along the constraint
      • there is no way to take an infinitesimal gradient step without ending up violating the constraint
      • it is impossible to improve f and still satisfy the constraint.

    (figure: constraint surface h(x) = 0, its tangent plane, and span{∇h(x*)})

  • Example

    Consider the problem

    $$\min \ x_1 + x_2 \quad \text{subject to} \quad x_1^2 + x_2^2 = 2$$

    ∇f is ⊥ to the iso-contours of f (x₁ + x₂ = k):

    $$\nabla h(x) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}, \qquad \nabla f(x) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

    (figure: the circle h(x) = 0 and the iso-contours x₁ + x₂ = -1, 0, 1)

  • Example

    Consider the problem

    $$\min \ x_1 + x_2 \quad \text{subject to} \quad x_1^2 + x_2^2 = 2$$

    ∇h is ⊥ to the iso-contour of h (x₁² + x₂² - 2 = 0):

    $$\nabla h(x) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}, \qquad \nabla f(x) = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

    (figure: the circle h(x) = 0 and the iso-contours x₁ + x₂ = -1, 0, 1)
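
    The Lagrange conditions for this example can be solved symbolically; a minimal sketch (added here, assuming sympy is available):

        import sympy as sp

        x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
        f = x1 + x2                       # objective
        h = x1**2 + x2**2 - 2             # equality constraint h(x) = 0
        L = f + lam * h                   # Lagrangian

        # stationarity: gradient of L w.r.t. (x1, x2, lambda) must vanish
        eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
        for sol in sp.solve(eqs, (x1, x2, lam), dict=True):
            print(sol, " f =", f.subs(sol))

    This recovers the two critical points: x = (-1, -1) with λ = 1/2 (the minimum, f = -2) and x = (1, 1) with λ = -1/2 (the maximum, f = 2).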

  • Example

    Recall that the derivative along d is

    $$\lim_{\alpha \to 0} \frac{f(w + \alpha d) - f(w)}{\alpha} = \|d\| \, \|\nabla f(w)\| \cos(d, \nabla f(w))$$

    Moving along the tangent is descent as long as cos(d, ∇f(w)) < 0; at the critical point every tangent direction has cos(d, ∇f(w)) = 0, and no further first-order descent is possible.

  • Alternative view

    Consider the tangent space to the iso-contour h(x) = 0. This is the subspace of first-order feasible variations

    $$V(x^*) = \left\{ \Delta x \;\middle|\; \nabla h_i(x^*)^T \Delta x = 0, \ \forall i \right\}$$

    i.e. the space of Δx for which x + Δx satisfies the constraint up to a first-order approximation.

    (figure: x*, ∇h(x*), and the feasible variations V(x*) along h(x) = 0)

  • Feasible variations

    Multiplying our first Lagrangian condition by Δx,

    $$\nabla f(x^*)^T \Delta x + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*)^T \Delta x = 0$$

    it follows that

    $$\nabla f(x^*)^T \Delta x = 0, \quad \forall \Delta x \in V(x^*)$$

    This is a generalization of ∇f(x*) = 0 in the unconstrained case; it implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) ∥ ∇h(x*). Note:
    • the Hessian constraint is only defined for y in V(x*)
    • this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*).
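
    As a quick check on the circle example above (an added worked instance): at x* = (-1, -1),

    $$\nabla h(x^*) = \begin{bmatrix} -2 \\ -2 \end{bmatrix}, \qquad V(x^*) = \{ \Delta x \;|\; \Delta x_1 + \Delta x_2 = 0 \},$$

    and ∇f(x*) = (1, 1)ᵀ is indeed perpendicular to V(x*), with ∇f(x*) = -(1/2)∇h(x*) = -λ*∇h(x*).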

  • Inequality constraints

    What happens when we introduce inequalities?

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad h(x) = 0, \ g(x) \leq 0$$

    We start by defining the set of active inequality constraints

    $$A(x) = \{ j \;|\; g_j(x) = 0 \}$$

    For example: f(x₁, x₂) = x₁² + x₂, with -5 ≤ x₁ ≤ 5 and -5 ≤ x₂ ≤ 5.

  • Active inequality constraints

    We have a minimum at x* = (0, -5):
    • x₁* - 5 < 0, -x₁* - 5 < 0, and x₂* - 5 < 0 are inactive
    • -x₂* - 5 = 0 is active (x₂* = -5).

    Note that a local minimum for this problem would still be a local minimum if we removed the inactive constraints:
    • inactive constraints do not do anything
    • active constraints are equalities.

    (figure: the feasible box, with the inactive constraints and the active constraint at x*)

  • Constrained optimization

    Hence, the problem

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad h(x) = 0, \ g(x) \leq 0$$

    is equivalent to

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad h(x) = 0, \ g_i(x) = 0, \ \forall i \in A(x^*)$$

    This is a problem with equality constraints; there must be a λ* and µⱼ*, j ∈ A(x*), such that

    $$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) + \sum_{j \in A(x^*)} \mu_j^* \nabla g_j(x^*) = 0$$

    which does not change if we assign a zero Lagrange multiplier to the inactive constraints.

  • Constrained optimization

    Letting µⱼ* = 0, j ∉ A(x*),

    $$\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^{r} \mu_j^* \nabla g_j(x^*) = 0$$

    There is one final constraint, which is µⱼ* ≥ 0 for all j, due to the following picture:
    • ∇f has to point inward (otherwise we would have a maximum of f)
    • ∇g has to point outward (otherwise g would increase inward, i.e. g would be non-negative inside).

    (figure: region g(x) ≤ 0, with ∇f pointing inward and ∇g pointing outward at the boundary)

    When we put all these constraints together we obtain the famed Karush-Kuhn-Tucker (KKT) conditions.

  • The KKT conditions

    Theorem: for the problem

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad h(x) = 0, \ g(x) \leq 0$$

    x* is a local minimum if and only if there exist λ* and µ* such that

    i) $\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) + \sum_{j=1}^{r} \mu_j^* \nabla g_j(x^*) = 0$

    ii) $\mu_j^* \geq 0, \ \forall j$

    iii) $\mu_j^* = 0, \ \forall j \notin A(x^*)$

    iv) $h(x^*) = 0$

    v) $y^T \left[ \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*) + \sum_{j=1}^{r} \mu_j^* \nabla^2 g_j(x^*) \right] y \geq 0, \ \forall y \in V(x^*)$

    where

    $$V(x^*) = \left\{ y \;\middle|\; \nabla h_i(x^*)^T y = 0, \ \forall i, \ \text{and} \ \nabla g_j(x^*)^T y = 0, \ \forall j \in A(x^*) \right\}$$
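
    As a sanity check on the box-constrained example above, the first-order KKT conditions can be verified numerically; a small added sketch (the helper arrays are mine, not from the slides):

        import numpy as np

        # f(x) = x1^2 + x2, box constraints written as g_j(x) <= 0
        grad_f = lambda x: np.array([2 * x[0], 1.0])
        g = [lambda x: x[0] - 5, lambda x: -x[0] - 5,
             lambda x: x[1] - 5, lambda x: -x[1] - 5]
        grad_g = [np.array([1.0, 0.0]), np.array([-1.0, 0.0]),
                  np.array([0.0, 1.0]), np.array([0.0, -1.0])]

        x_star = np.array([0.0, -5.0])
        active = [j for j, gj in enumerate(g) if np.isclose(gj(x_star), 0.0)]
        print(active)                     # [3]: only -x2 - 5 = 0 is active

        # stationarity over the active set: grad f + sum_j mu_j grad g_j = 0
        A = np.stack([grad_g[j] for j in active], axis=1)
        mu = np.linalg.lstsq(A, -grad_f(x_star), rcond=None)[0]
        print(mu)                         # [1.]: mu_3* = 1 >= 0, as required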

  • Geometric interpretation

    Let's forget the equality constraints for now; later we will see that they do not change much. Consider the problem

    $$x^* = \arg\min_x f(x) \quad \text{subject to} \quad g(x) \leq 0$$

    From the KKT conditions, the solution satisfies

    i) $\nabla_x L(x^*, \mu^*) = 0$

    ii) $\mu_j^* \geq 0, \ \forall j$

    iii) $\mu_j^* = 0, \ \forall j \notin A(x^*)$

    with

    $$L(x, \mu) = f(x) + \sum_{j=1}^{r} \mu_j \, g_j(x)$$

  • Geometric interpretation

    This is equivalent to

    $$L^* = \min_x \left[ L(x, \mu^*) \right] = \min_x \left[ f(x) + (\mu^*)^T g(x) \right], \quad \text{with } \mu_j^* \geq 0, \ \forall j, \ \text{and} \ \mu_j^* = 0, \ \forall j \notin A(x^*)$$

    We thus have
    • x = x* ⇒ f(x) + (µ*)ᵀg(x) - L* = 0
    • x ≠ x* ⇒ f(x) + (µ*)ᵀg(x) - L* ≥ 0

    or, interpreting this as a plane in "z-space" with normal w and bias b,
    • x = x* ⇒ w*ᵀz - b = 0
    • x ≠ x* ⇒ w*ᵀz - b ≥ 0

    where

    $$b = L^*, \qquad w^* = \begin{bmatrix} 1 \\ \mu^* \end{bmatrix}, \qquad z = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$$

    i.e. L is in the half-space pointed to by w*.
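
    A concrete one-dimensional instance (added here, not from the slides): for min x² subject to g(x) = 1 - x ≤ 0, the KKT conditions give x* = 1, µ* = 2 and L* = f* = 1, so

    $$f(x) + \mu^* g(x) - L^* = x^2 + 2(1 - x) - 1 = (x - 1)^2 \geq 0,$$

    with equality exactly at x = x*: the plane with normal w* = (1, 2)ᵀ supports the set of points z = (f(x), g(x)).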

  • Geometric interpretation

    From

    $$b = L^*, \qquad w^* = \begin{bmatrix} 1 \\ \mu^* \end{bmatrix}, \qquad z = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$$

    we have:
    • since µᵢ* ≥ 0, w* is always in the first quadrant
    • since the first coordinate is 1, w* is never parallel to the g(x) "axis".

    This can be visualized in "z-space". There are two cases. Case i): g(x*) = 0,
    • x = x* ⇒ w*ᵀz(x*) - b = 0 ⇒ f(x*) = L*
    • the y-intercept (0, L*) = (0, f*) is the minimum of L.

    (figure: z-space with f ∈ ℝ against g ∈ ℝʳ⁻¹; admissible planes with normal w* through (g*, f*) = (0, f*))

  • Geometric interpretation

    From

    $$b = L^*, \qquad w^* = \begin{bmatrix} 1 \\ \mu^* \end{bmatrix}, \qquad z = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$$

    case ii): g(x*) < 0. The constraint is inactive, so µ* = 0 and w* = (1, 0)ᵀ; the supporting plane is horizontal and still has y-intercept (0, f*).

  • In summary

    $$L^* = \min_x \left[ L(x, \mu^*) \right] = \min_x \left[ f(x) + (\mu^*)^T g(x) \right], \quad \text{with } \mu_j^* \geq 0, \ \forall j, \ \text{and} \ \mu_j^* = 0, \ \forall j \notin A(x^*)$$

    is equivalent to
    • x = x* ⇒ w*ᵀz - b = 0
    • x ≠ x* ⇒ w*ᵀz - b ≥ 0

    with

    $$b = L^*, \qquad w^* = \begin{bmatrix} 1 \\ \mu^* \end{bmatrix}, \qquad z = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$$

    and can be visualized as a supporting plane in z-space.

    (figure: z-space with f ∈ ℝ, showing the two cases g(x*) = 0 and g(x*) < 0, with normal w*)

  • Duality

    $$L^* = \min_x \left[ L(x, \mu^*) \right] = \min_x \left[ f(x) + (\mu^*)^T g(x) \right], \quad \text{with } \mu_j^* \geq 0, \ \forall j, \ \text{and} \ \mu_j^* = 0, \ \forall j \notin A(x^*)$$

    does not appear terribly difficult once I know µ*. But how do I find its value? Consider this function, for any µ ≥ 0:

    $$q(\mu) = \min_x \left[ L(x, \mu) \right] = \min_x \left[ f(x) + \mu^T g(x) \right]$$

    This is equivalent to
    • x = x* ⇒ wᵀz - b = 0
    • x ≠ x* ⇒ wᵀz - b ≥ 0

    where

    $$b = q(\mu), \qquad w = \begin{bmatrix} 1 \\ \mu \end{bmatrix}, \qquad z = \begin{bmatrix} f(x) \\ g(x) \end{bmatrix}$$

    The picture is the same, with L* replaced by q(µ).

  • Duality

    Noting that
    • the hyperplane (w, b) still has to support the feasible set
    • and we still have µ ≥ 0,

    this leads to the same picture as before.

    (figure: z-space with f ∈ ℝ, the cases g(x*) = 0 and g(x*) < 0, and the supporting plane with normal w*)

  • Duality

    Note that
    • q(µ) ≤ L* = f*
    • if we keep increasing q(µ) we will get q(µ) = L*
    • we cannot go beyond L* (x* would move to g(x*) > 0).

    This is exactly the definition of the dual problem

    $$\max_{\mu \geq 0} q(\mu), \qquad q(\mu) = \min_x \left[ L(x, \mu) \right] = \min_x \left[ f(x) + \mu^T g(x) \right]$$

    Note:
    • q(µ) may go to -∞ for some µ, which means that there is no Lagrange multiplier (the plane would be vertical)
    • this is avoided by introducing the constraint µ ∈ D_q, where

    $$D_q = \{ \mu \;|\; q(\mu) > -\infty \}$$

  • Duality

    We therefore have a two-step recipe to find the optimal solution:

    1. for any µ, solve

    $$q(\mu) = \min_x \left[ L(x, \mu) \right] = \min_x \left[ f(x) + \mu^T g(x) \right]$$

    2. then solve

    $$\max_{\mu \geq 0, \ \mu \in D_q} q(\mu)$$

    One of the reasons why this is interesting is that the second problem turns out to be quite manageable.
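
    A minimal sketch of this recipe (added, assuming scipy is available), on the same toy problem min x² subject to 1 - x ≤ 0, where step 1 has the closed form q(µ) = µ - µ²/4:

        from scipy.optimize import minimize_scalar

        f = lambda x: x**2
        g = lambda x: 1 - x                     # constraint g(x) <= 0

        def q(mu):
            # step 1: minimize the Lagrangian L(x, mu) = f(x) + mu*g(x) over x
            return minimize_scalar(lambda x: f(x) + mu * g(x)).fun

        # step 2: maximize q(mu) over mu >= 0
        res = minimize_scalar(lambda mu: -q(mu), bounds=(0, 10), method="bounded")
        print(res.x, -res.fun)                  # mu* ~ 2, q* ~ 1 = f*

    Here the dual optimum q* = 1 equals f* = f(1), so there is no duality gap (the problem is convex).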

  • Duality

    Theorem: D_q is a convex set and q is concave on D_q. Proof:
    • for any x, µ, µ′ and α ∈ [0, 1],

    $$L(x, \alpha\mu + (1-\alpha)\mu') = f(x) + \left( \alpha\mu + (1-\alpha)\mu' \right)^T g(x)$$
    $$= \alpha \left[ f(x) + \mu^T g(x) \right] + (1-\alpha) \left[ f(x) + \mu'^T g(x) \right]$$
    $$= \alpha L(x, \mu) + (1-\alpha) L(x, \mu')$$

    • and taking the min on both sides,

    $$\min_x L(x, \alpha\mu + (1-\alpha)\mu') = \min_x \left[ \alpha L(x, \mu) + (1-\alpha) L(x, \mu') \right] \geq \alpha \min_x L(x, \mu) + (1-\alpha) \min_x L(x, \mu')$$

    • we have

    $$q(\alpha\mu + (1-\alpha)\mu') \geq \alpha q(\mu) + (1-\alpha) q(\mu')$$
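
    On the same toy problem (an added check): minimizing x² + µ(1 - x) over x gives x = µ/2 and q(µ) = µ - µ²/4, whose second derivative -1/2 is negative, a direct instance of this concavity.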

  • Duality

    We have

    $$q(\alpha\mu + (1-\alpha)\mu') \geq \alpha q(\mu) + (1-\alpha) q(\mu') \qquad (*)$$

    from which two conclusions follow:
    • if µ ∈ D_q and µ′ ∈ D_q, then q(µ) > -∞ and q(µ′) > -∞, so αµ + (1-α)µ′ ∈ D_q; hence D_q is convex
    • by the definition of concavity, (*) implies that q is concave over D_q.

    This is a very appealing result, since convex optimization problems are among the easiest to solve. Note that the dual is always concave, irrespective of the primal problem. The following result only proves what we have already inferred from the graphical interpretation.

  • Duality

    Theorem (weak duality): it is always true that

    $$q^* \leq f^*$$

    Proof:
    • for any µ ≥ 0 and any x with g(x) ≤ 0,

    $$q(\mu) = \min_z L(z, \mu) \leq f(x) + \sum_j \mu_j g_j(x) \leq f(x)$$

    since µⱼgⱼ(x) ≤ 0. Hence

    $$q^* = \max_{\mu \geq 0} q(\mu) \leq f(x)$$

    and, since this holds for any feasible x,

    $$q^* \leq \min_{x:\, g(x) \leq 0} f(x) = f^*$$
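
    On the toy problem (an added check): q(µ) = µ - µ²/4 = 1 - (µ - 2)²/4 ≤ 1 = f* for every µ ≥ 0, with equality exactly at µ* = 2.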

  • Duality gap

    If q* = f* we say that there is no duality gap; otherwise there is a duality gap. The duality gap constrains the existence of Lagrange multipliers. Theorem:
    • if there is no duality gap, the set of Lagrange multipliers is the set of optimal dual solutions
    • if there is a duality gap, there are no Lagrange multipliers.

    Proof:
    • by definition, µ* ≥ 0 is a Lagrange multiplier if and only if

    $$f^* = q(\mu^*) \leq q^*$$

    • which, from the previous theorem, holds if and only if q* = f*, i.e. if there is no duality gap.

  • Duality gap

    Note that there are situations in which the dual problem has a solution, but for which there is no Lagrange multiplier. For example:
    • this is a valid dual problem
    • however the constraint

    $$\mu_i^* \, g(x_i^*) = 0$$

    is not satisfied, and q* ≠ f*.

    (figure: z-space with f ∈ ℝ against g ∈ ℝʳ⁻¹, showing (0, f*) and the contact point g(xᵢ*) away from the feasible optimum)
