  • Optimization

    Nuno Vasconcelos, ECE Department, UCSD

  • Optimization

    many engineering problems boil down to optimization
    goal: find the maximum or minimum of a function
    Definition: given functions f, gi, i=1,...,k and hi, i=1,...,m, defined on some domain Ω ⊂ R^n,

        min f(w)   subject to   gi(w) ≤ 0, ∀i
        w∈Ω                     hi(w) = 0, ∀i

    f(w): cost; hi (equality), gi (inequality): constraints

    for compactness we write g(w) ≤ 0 instead of gi(w) ≤ 0, ∀i; similarly h(w) = 0

    note that a constraint g(w) ≥ 0 is equivalent to −g(w) ≤ 0, so there is no need for "≥ 0" constraints
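
    for concreteness, a problem of this form can be handed to a numerical solver; the sketch below is
    a minimal example assuming SciPy is available (it is not part of the slides), using the cost
    f(w) = w1 + w2 and the single inequality constraint w1² + w2² − 1 ≤ 0:

        # minimal sketch (assumes SciPy); note SciPy's 'ineq' convention is fun(w) >= 0,
        # so the slide's g(w) <= 0 is passed as -g(w) >= 0
        import numpy as np
        from scipy.optimize import minimize

        f = lambda w: w[0] + w[1]                  # cost f(w) = w1 + w2
        g = lambda w: w[0]**2 + w[1]**2 - 1.0      # constraint g(w) <= 0 (unit disk)

        res = minimize(f, x0=np.zeros(2), method="SLSQP",
                       constraints=[{"type": "ineq", "fun": lambda w: -g(w)}])
        print(res.x)                               # ≈ (-0.707, -0.707)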

  • Optimization

    note: maximizing f(x) is the same as minimizing −f(x), so this definition also works for maximization
    the feasible region is the region where f(.) is defined and all constraints hold

        R = { w ∈ Ω | g(w) ≤ 0, h(w) = 0 }

    w* is a global minimum of f(w) if

        f(w) ≥ f(w*), ∀w ∈ Ω

    w* is a local minimum of f(w) if

        ∃ε > 0 s.t. ||w − w*|| ≤ ε ⇒ f(w) ≥ f(w*)

    (figure: a 1-D function with a local and a global minimum)
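
    a quick 1-D illustration of the distinction (a sketch assuming NumPy; the function is
    illustrative and not from the slides): f(x) = x⁴ − 3x² + x has a local minimum near x ≈ 1.1
    and a smaller, global minimum near x ≈ −1.3

        # evaluate f on a grid and locate the global minimum (assumes NumPy)
        import numpy as np

        x = np.linspace(-3.0, 3.0, 10001)
        f = x**4 - 3 * x**2 + x
        print(x[np.argmin(f)])      # ≈ -1.3: the global minimum; the dip near x ≈ 1.1 is only local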

  • The gradient

    the gradient of a function f(w) at z is

        ∇f(z) = ( ∂f/∂w_0 (z), ..., ∂f/∂w_{n-1} (z) )^T

    Theorem: the gradient points in the direction of maximum growth

    (figure: the gradient vector ∇f)

    proof:
    • from the Taylor series expansion

        f(w + αd) = f(w) + α d^T ∇f(w) + O(α²)

    • the derivative along d is

        lim_{α→0} [ f(w + αd) − f(w) ] / α = d^T ∇f(w) = ||d||·||∇f(w)||·cos(d, ∇f(w))      (*)

    • (*) is maximum when d is in the direction of the gradient
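
    a quick numerical check of the theorem (a sketch assuming NumPy; the test function is
    illustrative only): among many random unit directions d, none has a larger directional
    derivative d^T ∇f(w) than the unit vector along ∇f(w)

        import numpy as np

        f_grad = lambda w: np.array([2.0 * w[0], 6.0 * w[1]])    # gradient of w1^2 + 3*w2^2
        w = np.array([1.0, -2.0])
        g = f_grad(w)

        rng = np.random.default_rng(0)
        d = rng.normal(size=(1000, 2))
        d /= np.linalg.norm(d, axis=1, keepdims=True)            # random unit directions

        assert (d @ g).max() <= np.linalg.norm(g) + 1e-9         # the gradient direction wins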

  • The gradient

    note that if ∇f = 0
    • there is no direction of growth
    • also −∇f = 0, so there is no direction of decrease
    • we are either at a local minimum, a local maximum, or a "saddle" point

    conversely, at a local min, max, or saddle point
    • there is no direction of growth or decrease
    • ∇f = 0

    this shows that we have a critical point if and only if ∇f = 0
    to determine which type it is we need second-order conditions

    (figure: a maximum, a minimum, and a saddle point)

  • The Hessian

    if ∇f = 0, then by the Taylor series

        f(w + αd) = f(w) + α d^T ∇f(w) + (α²/2) d^T ∇²f(w) d + O(α³)

    and, since ∇f(w) = 0,

        f(w + αd) − f(w) = (α²/2) d^T ∇²f(w) d + O(α³)

    pick α small enough that the O(α³) term is negligible; the sign of d^T ∇²f(w) d then tells
    whether w is a minimum, a maximum, or a saddle point
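
    in practice this second-order test is usually carried out through the eigenvalues of the
    Hessian; a minimal sketch (assuming NumPy; the example Hessians are illustrative only):

        import numpy as np

        def classify(hessian):
            """Classify a critical point from the eigenvalues of a symmetric Hessian."""
            eig = np.linalg.eigvalsh(hessian)
            if np.all(eig > 0):
                return "local minimum"
            if np.all(eig < 0):
                return "local maximum"
            return "saddle point (or degenerate)"

        print(classify(np.diag([2.0, 2.0])))      # local minimum
        print(classify(np.diag([2.0, -2.0])))     # saddle point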

  • Minima conditions (unconstrained)

    let f(w) be continuously differentiable
    w* is a local minimum of f(w) if and only if
    • f has zero gradient at w*

        ∇f(w*) = 0

    • and the Hessian of f at w* is positive definite

        d^T ∇²f(w*) d ≥ 0, ∀d ∈ R^n

    • where

        ∇²f(x) = [ ∂²f/∂x_0²           ...   ∂²f/∂x_0 ∂x_{n-1} ]
                 [        ⋮                          ⋮         ]
                 [ ∂²f/∂x_{n-1} ∂x_0   ...   ∂²f/∂x_{n-1}²     ]

  • Maxima conditions (unconstrained)

    let f(w) be continuously differentiable
    w* is a local maximum of f(w) if and only if
    • f has zero gradient at w*

        ∇f(w*) = 0

    • and the Hessian of f at w* is negative definite

        d^T ∇²f(w*) d ≤ 0, ∀d ∈ R^n

    • where

        ∇²f(x) = [ ∂²f/∂x_0²           ...   ∂²f/∂x_0 ∂x_{n-1} ]
                 [        ⋮                          ⋮         ]
                 [ ∂²f/∂x_{n-1} ∂x_0   ...   ∂²f/∂x_{n-1}²     ]

  • Example

    consider the functions

        f(x) = x1 + x2        g(x) = x1² + x2²

    the gradients are

        ∇f(x) = [ 1 ]         ∇g(x) = [ 2x1 ]
                [ 1 ]                 [ 2x2 ]

    f has no minima or maxima
    g has a critical point at the origin x = (0,0); since the Hessian

        ∇²g(x) = [ 2  0 ]
                 [ 0  2 ]

    is positive definite, this is a minimum
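
    the same conclusion can be reached symbolically; a small sketch assuming SymPy is available
    (not part of the slides):

        import sympy as sp

        x1, x2 = sp.symbols("x1 x2")
        g = x1**2 + x2**2

        grad = sp.Matrix([sp.diff(g, v) for v in (x1, x2)])    # [2*x1, 2*x2]
        print(sp.solve(list(grad), [x1, x2]))                  # {x1: 0, x2: 0}: the critical point
        H = sp.hessian(g, (x1, x2))                            # [[2, 0], [0, 2]]
        print(H.is_positive_definite)                          # True, so the origin is a minimum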

  • Example

    this makes sense because

        f(x) = x1 + x2

    is a plane, so the gradient is constant

        ∇f(x) = [ 1 ]
                [ 1 ]

    (figure: iso-contours of f(x), the lines x1 + x2 = −1, x1 + x2 = 0, and x1 + x2 = 1)

  • Example

    this makes sense because

        g(x) = x1² + x2²

    is a quadratic, positive everywhere but the origin
    note how the gradient points towards the direction of largest increase

        ∇g(x) = [ 2x1 ]
                [ 2x2 ]

    (figure: the iso-contours g(x) = 1 and g(x) = 2, with gradient vectors such as (2, 0)^T, (0, 2)^T,
    and (2, 2)^T drawn at points on the contours)
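
    a plot like the one on this slide can be reproduced in a few lines; a sketch assuming NumPy and
    Matplotlib (not part of the slides):

        import numpy as np
        import matplotlib.pyplot as plt

        x1, x2 = np.meshgrid(np.linspace(-2, 2, 200), np.linspace(-2, 2, 200))
        plt.contour(x1, x2, x1**2 + x2**2, levels=[1, 2])     # iso-contours g(x)=1 and g(x)=2

        xq, yq = np.meshgrid(np.linspace(-2, 2, 9), np.linspace(-2, 2, 9))
        plt.quiver(xq, yq, 2 * xq, 2 * yq)                    # gradient field (2*x1, 2*x2), pointing uphill
        plt.gca().set_aspect("equal")
        plt.show()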

  • Convex functions

    Definition: f(w) is convex if ∀w,u ∈ Ω and λ ∈ [0,1]

        f(λw + (1−λ)u) ≤ λf(w) + (1−λ)f(u)

    Theorem: f(w) is convex if and only if its Hessian is positive definite for all w

        w^T ∇²f(w) w ≥ 0, ∀w ∈ Ω

    proof:
    • requires some intermediate results that we will not cover
    • we will skip it

    (figure: a convex f, where the chord λf(w) + (1−λ)f(u) lies above f(λw + (1−λ)u))
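
    the defining inequality is easy to spot-check numerically; a sketch assuming NumPy (the convex
    test function is illustrative only):

        import numpy as np

        f = lambda x: np.sum(x**2)                 # a convex function
        rng = np.random.default_rng(0)

        for _ in range(1000):
            w, u = rng.normal(size=2), rng.normal(size=2)
            lam = rng.uniform()
            # f(lam*w + (1-lam)*u) <= lam*f(w) + (1-lam)*f(u) must hold for convex f
            assert f(lam * w + (1 - lam) * u) <= lam * f(w) + (1 - lam) * f(u) + 1e-12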

  • Concave functions

    Definition: f(w) is concave if ∀w,u ∈ Ω and λ ∈ [0,1]

        f(λw + (1−λ)u) ≥ λf(w) + (1−λ)f(u)

    Theorem: f(w) is concave if and only if its Hessian is negative definite for all w

        w^T ∇²f(w) w ≤ 0, ∀w ∈ Ω

    proof:
    • −f(w) is convex
    • by the previous theorem, the Hessian of −f(w) is positive definite
    • hence the Hessian of f(w) is negative definite

  • Convex functions

    Theorem: if f(w) is convex, any local minimum w* is also a global minimum
    Proof:
    • we need to show that, for any u, f(w*) ≤ f(u)
    • for any u: ||w* − [λw* + (1−λ)u]|| = (1−λ)||w* − u||
    • and, making λ arbitrarily close to 1, we can make ||w* − [λw* + (1−λ)u]|| ≤ ε, for any ε > 0
    • since w* is a local minimum, it follows that f(w*) ≤ f(λw* + (1−λ)u)
    • and, by convexity, that f(w*) ≤ λf(w*) + (1−λ)f(u)
    • or f(w*)(1−λ) ≤ f(u)(1−λ)
    • and f(w*) ≤ f(u)
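
    this is what makes convex costs attractive in practice: a simple local method such as gradient
    descent ends up at the global minimum no matter where it starts; a small sketch assuming NumPy
    (step size and cost chosen for illustration, not from the slides):

        import numpy as np

        grad = lambda w: 2 * (w - np.array([1.0, -2.0]))   # gradient of the convex cost ||w - (1,-2)||^2
        for w0 in (np.array([10.0, 10.0]), np.array([-5.0, 3.0])):
            w = w0.copy()
            for _ in range(200):
                w -= 0.1 * grad(w)                         # fixed-step gradient descent
            print(w)                                       # both starts converge to (1, -2)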

  • Constrained optimization

    in summary:
    • we know the conditions for unconstrained maxima and minima
    • we like convex functions (find a local minimum and it will be the global minimum)

    what about optimization with constraints?
    a few definitions to start with
    the inequality gi(w) ≤ 0:
    • is active if gi(w) = 0, otherwise inactive

    inequalities can be expressed as equalities by introducing slack variables

        gi(w) ≤ 0  ⇔  gi(w) + ξi = 0, with ξi ≥ 0

  • Convex optimization

    Definition: a set Ω is convex if ∀w,u ∈ Ω and λ ∈ [0,1], then λw + (1−λ)u ∈ Ω
    "a line between any two points in Ω is also in Ω"

    (figure: a convex set and a non-convex set)

    Definition: an optimization problem where the set Ω, the cost f, and all constraints g and h are
    convex is said to be convex
    note: linear constraints g(x) = Ax + b are always convex (zero Hessian)

  • Constrained optimization

    we will consider general (not only convex) constrained optimization problems, and start with the
    case of equality constraints only

    Theorem: consider the problem

        x* = argmin f(x)  subject to  h(x) = 0

    where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and
    only if there exists a unique vector λ such that

        1)  ∇f(x*) + Σ_{i=1}^m λi ∇hi(x*) = 0

        2)  y^T [ ∇²f(x*) + Σ_{i=1}^m λi ∇²hi(x*) ] y ≥ 0,  ∀y s.t. ∇h(x*)^T y = 0
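
    for a concrete case, condition 1) can be solved directly; a sketch assuming SymPy (the problem
    min x1 + x2 subject to x1² + x2² = 1 is illustrative only):

        import sympy as sp

        x1, x2, lam = sp.symbols("x1 x2 lam", real=True)
        f = x1 + x2
        h = x1**2 + x2**2 - 1

        # grad f + lam * grad h = 0, together with the constraint h = 0
        eqs = [sp.diff(f, v) + lam * sp.diff(h, v) for v in (x1, x2)] + [h]
        print(sp.solve(eqs, [x1, x2, lam]))
        # two candidates (±1/√2, ±1/√2); the minimum is (-1/√2, -1/√2), with lam = 1/√2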