Optimization

Nuno Vasconcelos, ECE Department, UCSD

Optimization

many engineering problems boil down to optimization
goal: find the maximum or minimum of a function

Definition: given functions f, gi, i = 1,...,k, and hi, i = 1,...,m, defined on some domain Ω ⊆ Rn,

  min f(w) over w ∈ Ω, subject to gi(w) ≤ 0, ∀i, and hi(w) = 0, ∀i

f(w) is the cost; the hi (equality) and gi (inequality) are the constraints

for compactness we write g(w) ≤ 0 instead of gi(w) ≤ 0, ∀i; similarly h(w) = 0
note that a constraint f(w) ≥ 0 is equivalent to -f(w) ≤ 0, so there is no need for ≥ constraints
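This general form maps directly onto numerical solvers. Below is a minimal sketch (an assumption, not part of the slides) using SciPy's minimize; note that SciPy's 'ineq' constraints mean fun(w) ≥ 0, so a constraint g(w) ≤ 0 is passed as -g(w).

  # minimize f(w) = w1 + w2 subject to g(w) = w1^2 + w2^2 - 2 <= 0
  import numpy as np
  from scipy.optimize import minimize

  f = lambda w: w[0] + w[1]                      # cost f(w)
  g = lambda w: w[0]**2 + w[1]**2 - 2.0          # inequality constraint g(w) <= 0

  res = minimize(f, np.zeros(2), method="SLSQP",
                 constraints=[{"type": "ineq", "fun": lambda w: -g(w)}])
  print(res.x)                                   # approximately [-1, -1]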

Optimization

note: maximizing f(x) is the same as minimizing -f(x), so this definition also covers maximization

the feasible region is the region where f(·) is defined and all constraints hold:

  ℜ = { w ∈ Ω | g(w) ≤ 0, h(w) = 0 }

w* is a global minimum of f(w) if

  f(w) ≥ f(w*), ∀w ∈ Ω

w* is a local minimum of f(w) if

  ∃ε > 0 s.t. ||w - w*|| < ε ⇒ f(w) ≥ f(w*)

(figure: a function with both a local and a global minimum)

The gradient

the gradient of a function f(w) at z is

  ∇f(z) = ( ∂f/∂w0 (z), ... , ∂f/∂wn-1 (z) )T

Theorem: the gradient points in the direction of maximum growth
proof:
• from the Taylor series expansion,

    f(w + αd) = f(w) + α ∇f(w)T d + O(α²)

• the derivative of f along direction d is

    lim α→0 [ f(w + αd) - f(w) ] / α = ∇f(w)T d = ||∇f(w)|| ||d|| cos(d, ∇f(w))     (*)

• this is maximal when d points in the direction of the gradient ∇f
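The claim in (*) is easy to check numerically. A small sketch (the test function f(w) = w1² + 3 w2² and the point w are my own choices, not from the slides): sweep unit directions d and compare the direction of steepest finite-difference increase with the gradient direction.

  import numpy as np

  f     = lambda w: w[0]**2 + 3*w[1]**2
  gradf = lambda w: np.array([2*w[0], 6*w[1]])

  w, alpha = np.array([1.0, -1.0]), 1e-6
  directions = [np.array([np.cos(t), np.sin(t)]) for t in np.linspace(0, 2*np.pi, 200)]
  slopes = [(f(w + alpha*d) - f(w)) / alpha for d in directions]

  best = directions[int(np.argmax(slopes))]          # steepest finite-difference increase
  print(best, gradf(w) / np.linalg.norm(gradf(w)))   # both are (close to) the unit gradient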

The gradient

note that if ∇f = 0
• there is no direction of growth
• also -∇f = 0, so there is no direction of decrease
• we are either at a local minimum, a local maximum, or a “saddle” point

conversely, at a local minimum, local maximum, or saddle point
• there is no direction of growth or decrease
• ∇f = 0

this shows that we have a critical point if and only if ∇f = 0
to determine which type it is, we need second-order conditions

(figure: examples of a maximum, a minimum, and a saddle point)

The Hessian

if ∇f(w) = 0, by the Taylor series

  f(w + αd) = f(w) + α ∇f(w)T d + (α²/2) dT ∇²f(w) d + O(α³)

and, since the gradient term vanishes,

  f(w + αd) - f(w) = (α²/2) dT ∇²f(w) d + O(α³)

pick α small enough that the O(α³) remainder is negligible relative to (α²/2) |dT ∇²f d|, ∀d ≠ 0; then
• maximum at w if and only if dT ∇²f d ≤ 0, ∀d ≠ 0
• minimum at w if and only if dT ∇²f d ≥ 0, ∀d ≠ 0
• saddle point otherwise

this proves the following theorems

(figure: examples of a maximum, a minimum, and a saddle point)
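In practice the sign of dT ∇²f d over all d is read off the eigenvalues of the Hessian. A minimal sketch (the function name and tolerance are my own, not from the slides):

  import numpy as np

  def classify_critical_point(hessian, tol=1e-10):
      """Classify a critical point (where grad f = 0) from its Hessian."""
      eigs = np.linalg.eigvalsh(hessian)        # eigenvalues of the symmetric Hessian
      if np.all(eigs >= -tol):
          return "minimum"                      # d^T H d >= 0 for all d
      if np.all(eigs <= tol):
          return "maximum"                      # d^T H d <= 0 for all d
      return "saddle"                           # indefinite: mixed signs

  print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))    # minimum
  print(classify_critical_point(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle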

Minima conditions (unconstrained)

let f(w) be twice continuously differentiable
w* is a local minimum of f(w) if and only if
• f has zero gradient at w*

    ∇f(w*) = 0

• and the Hessian of f at w* is positive definite

    dT ∇²f(w*) d ≥ 0, ∀d ∈ Rn

• where

    ∇²f(x) = ⎡ ∂²f/∂x0² (x)       ...  ∂²f/∂x0∂xn-1 (x) ⎤
             ⎢        ⋮                       ⋮         ⎥
             ⎣ ∂²f/∂xn-1∂x0 (x)   ...  ∂²f/∂xn-1² (x)   ⎦

Maxima conditions (unconstrained)

let f(w) be twice continuously differentiable
w* is a local maximum of f(w) if and only if
• f has zero gradient at w*

    ∇f(w*) = 0

• and the Hessian of f at w* is negative definite

    dT ∇²f(w*) d ≤ 0, ∀d ∈ Rn

• where

    ∇²f(x) = ⎡ ∂²f/∂x0² (x)       ...  ∂²f/∂x0∂xn-1 (x) ⎤
             ⎢        ⋮                       ⋮         ⎥
             ⎣ ∂²f/∂xn-1∂x0 (x)   ...  ∂²f/∂xn-1² (x)   ⎦

Example

consider the functions

  f(x) = x1 + x2        g(x) = x1² + x2²

the gradients are

  ∇f(x) = [1, 1]T        ∇g(x) = [2x1, 2x2]T

f has no minima or maxima; g has a critical point at the origin x = (0, 0)
since the Hessian

  ∇²g(x) = ⎡ 2  0 ⎤
           ⎣ 0  2 ⎦

is positive definite, this critical point is a minimum
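The example can be verified symbolically; a short sketch assuming SymPy is available:

  import sympy as sp

  x1, x2 = sp.symbols("x1 x2")
  f = x1 + x2
  g = x1**2 + x2**2

  grad_f = [sp.diff(f, v) for v in (x1, x2)]     # [1, 1]: never zero
  grad_g = [sp.diff(g, v) for v in (x1, x2)]     # [2*x1, 2*x2]

  print(sp.solve(grad_f, [x1, x2]))              # []: f has no critical point
  print(sp.solve(grad_g, [x1, x2]))              # {x1: 0, x2: 0}

  H = sp.hessian(g, (x1, x2))                    # [[2, 0], [0, 2]]
  print(H.is_positive_definite)                  # True, so the origin is a minimum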

Example

this makes sense because

  f(x) = x1 + x2

is a plane, so its gradient is constant

  ∇f(x) = [1, 1]T

(figure: iso-contours of f(x), the lines x1 + x2 = -1, x1 + x2 = 0, and x1 + x2 = 1, with the constant gradient perpendicular to them)

Example

this makes sense because

  g(x) = x1² + x2²

is a quadratic, positive everywhere except at the origin
note how the gradient

  ∇g(x) = [2x1, 2x2]T

points in the direction of largest increase, e.g. ∇g = [2, 2]T at (1,1), [0, 2]T at (0,1), and [2, 0]T at (1,0)

(figure: iso-contours g(x) = 1 and g(x) = 2, with the gradient vectors perpendicular to them and pointing outward)

Convex functions

Definition: f(w) is convex if ∀w,u ∈ Ω and λ ∈ [0,1]

  f(λw + (1-λ)u) ≤ λ f(w) + (1-λ) f(u)

Theorem: f(w) is convex if and only if its Hessian is positive definite for all w

  wT ∇²f(w) w ≥ 0, ∀w ∈ Ω

proof:
• requires some intermediate results that we will not cover
• we will skip it

(figure: a convex function; the chord value λ f(w) + (1-λ) f(u) lies above the function value f(λw + (1-λ)u))

Concave functions

Definition: f(w) is concave if ∀w,u ∈ Ω and λ ∈ [0,1]

  f(λw + (1-λ)u) ≥ λ f(w) + (1-λ) f(u)

Theorem: f(w) is concave if and only if its Hessian is negative definite for all w

  wT ∇²f(w) w ≤ 0, ∀w ∈ Ω

proof:
• f(w) is concave if and only if -f(w) is convex
• by the previous theorem, the Hessian of -f(w) is positive definite
• hence the Hessian of f(w) is negative definite

Convex functions

Theorem: if f(w) is convex, any local minimum w* is also a global minimum
Proof:
• we need to show that, for any u, f(w*) ≤ f(u)
• for any u: ||w* - [λw* + (1-λ)u]|| = (1-λ) ||w* - u||
• so, making λ arbitrarily close to 1, we can make ||w* - [λw* + (1-λ)u]|| ≤ ε, for any ε > 0
• since w* is a local minimum, it follows that f(w*) ≤ f(λw* + (1-λ)u)
• and, by convexity, that f(w*) ≤ λ f(w*) + (1-λ) f(u)
• or f(w*)(1-λ) ≤ f(u)(1-λ)
• and therefore f(w*) ≤ f(u)
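One practical consequence: for a convex cost, an unconstrained solver started anywhere ends up at the same point. A small illustration (a sketch assuming SciPy; the quadratic test function is my own choice, not from the slides):

  import numpy as np
  from scipy.optimize import minimize

  f = lambda w: (w[0] - 1.0)**2 + (w[1] + 2.0)**2   # convex, unique minimum at (1, -2)

  for x0 in [np.array([10.0, 10.0]), np.array([-5.0, 3.0]), np.zeros(2)]:
      print(minimize(f, x0).x)                      # every start converges near [1, -2]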

Constrained optimization

in summary:
• we know the conditions for unconstrained maxima and minima
• we like convex functions (find a local minimum and it will be the global minimum)

what about optimization with constraints?
a few definitions to start with
the inequality constraint gi(w) ≤ 0:
• is active if gi(w) = 0, otherwise inactive

inequalities can be expressed as equalities by introducing slack variables ξi:

  gi(w) ≤ 0  ⇔  gi(w) + ξi = 0, with ξi ≥ 0

Convex optimization

Definition: a set Ω is convex if ∀w,u ∈ Ω and λ ∈ [0,1], λw + (1-λ)u ∈ Ω
“the line segment between any two points in Ω is also in Ω”

(figure: a convex set and a non-convex set)

Definition: an optimization problem where the set Ω, the cost f, and all constraints g and h are convex is said to be convex
note: linear (affine) constraints g(x) = Ax + b are always convex (zero Hessian)

Constrained optimization

we will consider general (not only convex) constrained optimization problems, starting with the case of equality constraints only

Theorem: consider the problem

  x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

  i)  ∇f(x*) + Σi=1..m λi ∇hi(x*) = 0

  ii) yT [ ∇²f(x*) + Σi=1..m λi ∇²hi(x*) ] y ≥ 0, ∀y s.t. ∇hi(x*)T y = 0, ∀i

Alternative formulation

state the conditions through the Lagrangian

  L(x, λ) = f(x) + Σi=1..m λi hi(x)

the theorem can then be compactly written as

  i)  ∇x L(x*, λ*) = 0
      ∇λ L(x*, λ*) = 0

  ii) yT ∇²xx L(x*, λ*) y ≥ 0, ∀y s.t. ∇hi(x*)T y = 0, ∀i

the entries of λ are referred to as Lagrange multipliers

Gradient (revisited)

recall from (*) that the derivative of f along d is

  lim α→0 [ f(w + αd) - f(w) ] / α = ∇f(w)T d = ||∇f(w)|| ||d|| cos(d, ∇f(w))

this means that
• the increase is greatest when d || ∇f
• there is no increase when d ⊥ ∇f
• since there is no increase when d is tangent to the iso-contour f(x) = k,
  the gradient is perpendicular to the tangent of the iso-contour

this suggests a geometric proof

(figure: an iso-contour with ∇f perpendicular to it and “no increase” along the tangent direction)

Lagrangian optimization

geometric interpretation:
• since h(x) = 0 is an iso-contour of h(x), ∇h(x*) is perpendicular to the iso-contour
• condition i) says that ∇f(x*) ∈ span{∇hi(x*)}
• i.e. ∇f is ⊥ to the tangent space of the constraint surface
• this is intuitive:
  • the direction of largest increase of f is ⊥ to the constraint surface
  • the gradient is zero along the constraint
  • there is no way to take an infinitesimal gradient step without violating the constraint
  • it is impossible to increase f and still satisfy the constraint

(figure: the constraint surface h(x) = 0, its tangent plane at x*, and span{∇h(x*)})

Example

consider the problem

  min x1 + x2 subject to x1² + x2² = 2

it leads to the following picture

(figure: the circle h(x) = 0 with ∇h(x) = [2x1, 2x2]T, the iso-contours of f(x) (the lines x1 + x2 = -1, 0, 1), and ∇f(x) = [1, 1]T)

Example

consider the problem

  min x1 + x2 subject to x1² + x2² = 2

∇f = [1, 1]T is ⊥ to the iso-contours of f (the lines x1 + x2 = k)

(figure: the same picture, highlighting ∇f perpendicular to the lines x1 + x2 = -1, 0, 1)

Example

consider the problem

  min x1 + x2 subject to x1² + x2² = 2

∇h = [2x1, 2x2]T is ⊥ to the iso-contour of h (the circle x1² + x2² - 2 = 0)

(figure: the same picture, highlighting ∇h perpendicular to the circle h(x) = 0)
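The Lagrangian conditions solve this example in closed form: ∇L = 0 together with the constraint gives x1 = x2 = -1/(2λ) and λ = ±1/2. A short SymPy sketch (an assumption; any computer algebra system works):

  import sympy as sp

  x1, x2, lam = sp.symbols("x1 x2 lambda", real=True)
  f = x1 + x2
  h = x1**2 + x2**2 - 2

  L = f + lam * h                                         # Lagrangian L(x, lambda)
  stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]   # grad_x L = 0 and h(x) = 0

  print(sp.solve(stationarity, [x1, x2, lam], dict=True))
  # [{x1: -1, x2: -1, lambda: 1/2}, {x1: 1, x2: 1, lambda: -1/2}]
  # (-1, -1) is the constrained minimum, (1, 1) the constrained maximum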

Example

recall that the derivative along d is

  lim α→0 [ f(w + αd) - f(w) ] / α = ||∇f(w)|| ||d|| cos(d, ∇f(w))

- moving along the tangent to the constraint is descent as long as cos(∇f, tg) < 0
- i.e. as long as π/2 < angle(∇f, tg) < 3π/2
- we can always find such a d unless ∇f ⊥ tg
- so we have a critical point when ∇f || ∇h
- to find which type it is, we need second-order conditions (as before)

(figure: the two critical points on the circle, where ∇f is parallel to ∇h)

Alternative view

consider the tangent space to the iso-contour h(x) = 0
this is the subspace of first-order feasible variations

  V(x*) = { ∆x | ∇hi(x*)T ∆x = 0, ∀i }

i.e. the space of ∆x for which x + ∆x satisfies the constraint up to a first-order approximation

(figure: the constraint surface h(x) = 0, the point x*, the normal ∇h(x*), and the plane V(x*) of feasible variations)

Feasible variations

multiplying our first Lagrangian condition by ∆x, it follows that

  ∇f(x*)T ∆x + Σi=1..m λi ∇hi(x*)T ∆x = 0

for ∆x ∈ V(x*) the constraint terms vanish, so

  ∇f(x*)T ∆x = 0, ∀∆x ∈ V(x*)

this is a generalization of ∇f(x*) = 0 in the unconstrained case
it implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) || ∇h(x*)
note:
• the Hessian condition is only required for y in V(x*)
• this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)
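For the earlier example this orthogonality is easy to check numerically: V(x*) is the null space of ∇h(x*)T, and at the solution x* = (-1, -1) the gradient of f has no component in it. A sketch assuming SciPy:

  import numpy as np
  from scipy.linalg import null_space

  x_star = np.array([-1.0, -1.0])
  grad_f = np.array([1.0, 1.0])                      # grad f(x) = [1, 1]
  grad_h = np.array([2*x_star[0], 2*x_star[1]])      # grad h(x) = [2*x1, 2*x2]

  V = null_space(grad_h.reshape(1, -1))              # basis of the feasible variations V(x*)
  print(V.T @ grad_f)                                # ~[0]: grad f(x*) is orthogonal to V(x*)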

In summary

for a constrained optimization problem with equality constraints:

Theorem: consider the problem

  x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

  i)  ∇f(x*) + Σi=1..m λi ∇hi(x*) = 0

  ii) yT [ ∇²f(x*) + Σi=1..m λi ∇²hi(x*) ] y ≥ 0, ∀y s.t. ∇hi(x*)T y = 0, ∀i

Alternative formulation

state the conditions through the Lagrangian

  L(x, λ) = f(x) + Σi=1..m λi hi(x)

the theorem can then be compactly written as

  i)  ∇x L(x*, λ*) = 0
      ∇λ L(x*, λ*) = 0

  ii) yT ∇²xx L(x*, λ*) y ≥ 0, ∀y s.t. ∇hi(x*)T y = 0, ∀i

the entries of λ are referred to as Lagrange multipliers
