Duality - Duality gap: the following theorems are relevant (note: proofs are hard, not...)


  • Duality

    Nuno Vasconcelos, ECE Department, UCSD

  • Optimization
    goal: find the maximum or minimum of a function
    Definition: given functions f, gᵢ, i = 1,...,k and hᵢ, i = 1,...,m defined on some domain Ω ⊆ Rⁿ

        min f(w),  w ∈ Ω
        subject to  gᵢ(w) ≤ 0, ∀i
                    hᵢ(w) = 0, ∀i

    for compactness we write g(w) ≤ 0 instead of gᵢ(w) ≤ 0, ∀i; similarly h(w) = 0
    we derived necessary and sufficient conditions for (local) optimality in
    • the absence of constraints
    • equality constraints only
    • equality and inequality constraints

  • Minima conditions (unconstrained)
    let f(w) be continuously differentiable
    w* is a local minimum of f(w) if and only if
    • f has zero gradient at w*

        ∇f(w*) = 0

    • and the Hessian of f at w* is positive semidefinite

        dᵀ∇²f(w*)d ≥ 0,  ∀d ∈ Rⁿ

    • where ∇²f(x) is the matrix of second derivatives

        ∇²f(x) = ⎡ ∂²f(x)/∂x₀²      …  ∂²f(x)/∂x₀∂xₙ₋₁ ⎤
                 ⎢       ⋮                    ⋮         ⎥
                 ⎣ ∂²f(x)/∂xₙ₋₁∂x₀  …  ∂²f(x)/∂xₙ₋₁²   ⎦
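As a sketch (not from the slides), the two conditions above can be verified numerically for a toy function; the function f(w) = w₀² + 2w₁² and the finite-difference helpers `num_grad` and `num_hess` are my own illustrative choices:

```python
import numpy as np

# Toy function with its unique minimum at w* = 0 (my own example).
f = lambda w: w[0]**2 + 2*w[1]**2

def num_grad(f, w, eps=1e-6):
    """Central-difference approximation of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w); e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2*eps)
    return g

def num_hess(f, w, eps=1e-4):
    """Central-difference approximation of the Hessian of f at w."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(w); e[i] = eps
        H[:, i] = (num_grad(f, w + e) - num_grad(f, w - e)) / (2*eps)
    return H

w_star = np.zeros(2)
grad = num_grad(f, w_star)                       # condition 1: gradient ~ 0
eigs = np.linalg.eigvalsh(num_hess(f, w_star))   # condition 2: eigenvalues >= 0
print(grad, eigs)
```

Checking the eigenvalues of the Hessian is equivalent to checking dᵀ∇²f(w*)d ≥ 0 for all d.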

  • Maxima conditions (unconstrained)
    let f(w) be continuously differentiable
    w* is a local maximum of f(w) if and only if
    • f has zero gradient at w*

        ∇f(w*) = 0

    • and the Hessian of f at w* is negative semidefinite

        dᵀ∇²f(w*)d ≤ 0,  ∀d ∈ Rⁿ

    • where ∇²f(x) is the same matrix of second derivatives as before

  • Constrained optimization with equality constraints only
    Theorem: consider the problem

        x* = argminₓ f(x)  subject to  h(x) = 0

    where the constraint gradients ∇hᵢ(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

        i)  ∇f(x*) + Σᵢ₌₁ᵐ λᵢ ∇hᵢ(x*) = 0

        ii) yᵀ[∇²f(x*) + Σᵢ₌₁ᵐ λᵢ ∇²hᵢ(x*)]y ≥ 0,  ∀y s.t. ∇hᵢ(x*)ᵀy = 0, ∀i

  • Alternative formulation
    state the conditions through the Lagrangian

        L(x, λ) = f(x) + Σᵢ₌₁ᵐ λᵢ hᵢ(x)

    the theorem can then be compactly written as

        i)  ∇L(x*, λ*) = [ ∇ₓL(x*, λ*) ; ∇_λL(x*, λ*) ] = 0

        ii) yᵀ∇²ₓₓL(x*, λ*)y ≥ 0,  ∀y s.t. ∇h(x*)ᵀy = 0

    the entries of λ are referred to as Lagrange multipliers
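As a sketch of how these conditions are used in practice, condition i) for a small toy problem (my own example, not from the slides) reduces to a linear system:

```python
import numpy as np

# Toy problem: minimize f(x) = x1^2 + x2^2 subject to h(x) = x1 + x2 - 1 = 0.
# Stationarity of L(x, lam) = f(x) + lam*h(x) gives the linear system
#   2*x1      + lam = 0     (dL/dx1)
#        2*x2 + lam = 0     (dL/dx2)
#   x1 + x2         = 1     (dL/dlam, i.e. the constraint)
A = np.array([[2., 0., 1.],
              [0., 2., 1.],
              [1., 1., 0.]])
b = np.array([0., 0., 1.])
x1, x2, lam = np.linalg.solve(A, b)
print(x1, x2, lam)   # x1 = x2 = 0.5, lam = -1
```

For quadratic objectives and linear constraints the Lagrangian conditions are exactly linear, which is why this solve recovers the solution directly.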

  • Geometric view
    consider the tangent space to the iso-contour h(x) = 0
    since h grows in any direction along which ∇h(x) is not zero, ∇h(x) is ⊥ to the iso-contour
    hence, the subspace of first order feasible variations is

        V(x*) = { ∆x | ∇hᵢ(x*)ᵀ∆x = 0, ∀i }

    the space of ∆x for which x + ∆x satisfies the constraint up to a first order approximation
    [figure: the constraint surface h(x) = 0, with the gradient ∇h(x*) at x* and the space of feasible variations V(x*) tangent to the surface]

  • Feasible variations
    multiplying our first Lagrangian condition by ∆x

        ∇f(x*)ᵀ∆x + Σᵢ₌₁ᵐ λᵢ ∇hᵢ(x*)ᵀ∆x = 0

    it follows that ∇f(x*) must satisfy

        ∇f(x*)ᵀ∆x = 0,  ∀∆x ∈ V(x*)

    i.e. ∇f(x*) ⊥ V(x*): the gradient is orthogonal to all feasible steps
    no growth is possible along the constraint
    this is a generalization of ∇f(x*) = 0 in the unconstrained case
    note:
    • the Hessian constraint is only defined for y in V(x*)
    • this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)

  • Inequality constraints
    with inequalities

        x* = argminₓ f(x)  subject to  h(x) = 0, g(x) ≤ 0

    the only constraints that matter are those which are active, i.e. those in

        A(x) = { j | gⱼ(x) = 0 }

    and these are equalities
    [figure: an inactive constraint, with x* in the interior of the feasible region, vs. an active one, with x* on the boundary g(x) = 0]
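To make the active-set definition concrete, a minimal sketch (the constraints g₁, g₂ are my own toy example, not from the slides):

```python
import numpy as np

# Toy constraints: g1(x) = x1 - 1 <= 0 and g2(x) = -x2 <= 0.
g = lambda x: np.array([x[0] - 1.0, -x[1]])

def active_set(x, tol=1e-8):
    """Indices j with g_j(x) = 0, i.e. the active constraints A(x)."""
    return [j for j, gj in enumerate(g(x)) if abs(gj) <= tol]

a1 = active_set(np.array([1.0, 2.0]))   # on the boundary x1 = 1 -> g1 active
a2 = active_set(np.array([0.5, 0.0]))   # on the boundary x2 = 0 -> g2 active
print(a1, a2)
```

In floating point, "g_j(x) = 0" has to be tested up to a tolerance, which is why practical active-set methods carry a `tol` parameter like the one above.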

  • Constrained optimization
    hence, the problem

        x* = argminₓ f(x)  subject to  h(x) = 0, g(x) ≤ 0

    is equivalent to

        x* = argminₓ f(x)  subject to  h(x) = 0, gᵢ(x) = 0, ∀i ∈ A(x*)

    this is a problem with equality constraints, so there must be a λ* and µ* such that

        ∇f(x*) + Σᵢ₌₁ᵐ λᵢ* ∇hᵢ(x*) + Σⱼ₌₁ʳ µⱼ* ∇gⱼ(x*) = 0

    with µⱼ* = 0, ∀j ∉ A(x*)
    [figure: at x*, ∇f opposes the gradient ∇g of the active constraint]
    finally, we need µⱼ* ≥ 0, for all j, to guarantee this

  • The KKT conditions
    Theorem: for the problem

        x* = argminₓ f(x)  subject to  h(x) = 0, g(x) ≤ 0

    x* is a local minimum if and only if there exist λ* and µ* such that

        i)   ∇f(x*) + Σᵢ₌₁ᵐ λᵢ* ∇hᵢ(x*) + Σⱼ₌₁ʳ µⱼ* ∇gⱼ(x*) = 0
        ii)  µⱼ* ≥ 0, ∀j
        iii) µⱼ* = 0, ∀j ∉ A(x*)
        iv)  h(x*) = 0
        v)   yᵀ[∇²f(x*) + Σᵢ₌₁ᵐ λᵢ* ∇²hᵢ(x*) + Σⱼ₌₁ʳ µⱼ* ∇²gⱼ(x*)]y ≥ 0,  ∀y ∈ V(x*)

    where

        V(x*) = { y | ∇hᵢ(x*)ᵀy = 0, ∀i  and  ∇gⱼ(x*)ᵀy = 0, ∀j ∈ A(x*) }
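A hypothetical check of conditions i)–iii) on a small quadratic program (the problem, its solution x* = (0, 1), and the multiplier µ* = 2 are my own worked example, not from the slides):

```python
import numpy as np

# Toy QP: minimize f(x) = (x1-1)^2 + (x2-2)^2 subject to g(x) = x1 + x2 - 1 <= 0.
# Projecting the unconstrained minimum (1, 2) onto the line x1 + x2 = 1 gives
# x* = (0, 1), with the single constraint active.
x_star = np.array([0., 1.])
grad_f = 2 * (x_star - np.array([1., 2.]))   # gradient of f at x*: (-2, -2)
grad_g = np.array([1., 1.])                  # gradient of g

# i) stationarity: grad_f + mu*grad_g = 0 is solved by mu = 2
mu = 2.0
stationarity = grad_f + mu * grad_g
g_val = np.sum(x_star) - 1.0                 # g(x*) = 0, so the constraint is active

print(stationarity)   # ~ [0, 0]
print(mu >= 0)        # ii) dual feasibility
print(g_val)          # iii) consistent: mu > 0 only because g(x*) = 0
```

Since there are no equality constraints here, condition iv) is vacuous, and v) holds because the Hessian of f is 2I, which is positive definite everywhere.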

  • Geometric interpretation
    we consider the case without equality constraints

        x* = argminₓ f(x)  subject to  g(x) ≤ 0

    from the KKT conditions, the solution satisfies

        i)   ∇ₓL(x*, µ*) = 0
        ii)  µⱼ* ≥ 0, ∀j
        iii) µⱼ* = 0, ∀j ∉ A(x*)

    with

        L(x, µ*) = f(x) + Σⱼ₌₁ʳ µⱼ* gⱼ(x)

    which is equivalent to

        L* = minₓ L(x, µ*) = minₓ [ f(x) + (µ*)ᵀg(x) ]

    with µⱼ* ≥ 0, ∀j, and µⱼ* = 0, ∀j ∉ A(x*)

  • Geometric interpretation

        L* = minₓ L(x, µ*) = minₓ [ f(x) + (µ*)ᵀg(x) ]
        with µⱼ* ≥ 0, ∀j, and µⱼ* = 0, ∀j ∉ A(x*)

    is equivalent to
    • x = x* ⇒ w*ᵀz − b = 0
    • x ≠ x* ⇒ w*ᵀz − b ≥ 0

    with

        b = L*,  w* = [ 1 ; µ* ],  z = [ f(x) ; g(x) ]

    and can be visualized as
    [figure: in the (g, f) plane, the hyperplane with normal w* = (1, µ*) supports the set of achievable (g(x), f(x)) pairs, touching it at x*, where g(x*) = 0]

  • Duality
    we solve instead

        q(µ) = minₓ L(x, µ) = minₓ [ f(x) + µᵀg(x) ],  with µ ≥ 0

    same picture, with L* replaced by q(µ) and µ* by µ

        b = q(µ),  w = [ 1 ; µ ],  z = [ f(x) ; g(x) ]

    [figure: same supporting-hyperplane picture in the (g, f) plane, now with normal (1, µ) and intercept q(µ)]

  • Duality
    note that
    • q(µ) ≤ L* = f*
    • if we keep increasing q(µ) we eventually reach q(µ) = L*
    • we cannot go beyond L* (x* would have to move to a point with g(x*) > 0)

    this is exactly the definition of the dual problem

        max q(µ) subject to µ ≥ 0,  with  q(µ) = minₓ L(x, µ) = minₓ [ f(x) + µᵀg(x) ]

    note:
    • q(µ) may go to −∞ for some µ
    • this is avoided by introducing the constraint µ ∈ D = { µ | q(µ) > −∞ }
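The bound q(µ) ≤ f* (weak duality) can be seen on a one-dimensional toy problem (my own example, not from the slides):

```python
import numpy as np

# Toy problem: minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.
# The primal optimum is f* = 1, at x* = 1.
# L(x, mu) = x^2 + mu*(1 - x) is minimized over x at x = mu/2, so the
# dual function is q(mu) = mu - mu^2/4.
q = lambda mu: mu - mu**2 / 4.0
f_star = 1.0

mus = np.linspace(0.0, 6.0, 601)
print(np.all(q(mus) <= f_star))   # weak duality: q(mu) <= f* for every mu >= 0
mu_best = mus[np.argmax(q(mus))]
print(mu_best, q(mu_best))        # dual optimum: mu* = 2, with q(mu*) = 1
```

Here the dual maximum q(µ*) = 1 actually equals f*, i.e. the duality gap is zero; for a convex problem like this one that is what strong duality predicts.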

  • Equality constraints
    so far we have disregarded them. What about

        x* = argminₓ f(x)  subject to  h(x) = 0, g(x) ≤ 0

    intuitively, nothing should change, since …