
Optimization

Nuno Vasconcelos, ECE Department, UCSD


Optimization

many engineering problems boil down to optimization
goal: find the maximum or minimum of a function

Definition: given functions f, gi, i=1,...,k and hi, i=1,...,m, defined on some domain Ω ⊂ Rn, solve

min f(w), w ∈ Ω, subject to gi(w) ≤ 0, ∀i, and hi(w) = 0, ∀i

f(w): cost; hi (equality), gi (inequality): constraints

for compactness we write g(w) ≤ 0 instead of gi(w) ≤ 0, ∀i. Similarly h(w) = 0


note that a constraint gi(w) ≥ 0 is equivalent to -gi(w) ≤ 0 (no need for ≥ 0 constraints)
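To make the general form concrete, here is a minimal sketch (not from the slides) of how such a problem could be handed to a numerical solver. The cost f, inequality g, and equality h below are hypothetical placeholders; note that scipy's "ineq" convention is fun(w) ≥ 0, so g is negated.

```python
import numpy as np
from scipy.optimize import minimize

# hypothetical instance of  min f(w)  subject to  g(w) <= 0  and  h(w) = 0
f = lambda w: (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2   # cost
g = lambda w: w[0] + w[1] - 2.0                       # inequality constraint g(w) <= 0
h = lambda w: w[0] - w[1]                             # equality constraint   h(w) = 0

constraints = [
    {"type": "ineq", "fun": lambda w: -g(w)},   # scipy expects fun(w) >= 0
    {"type": "eq", "fun": h},
]
result = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=constraints)
print(result.x, result.fun)   # minimizer (about (1, 1)) and minimum value
```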


Optimization

note: maximizing f(x) is the same as minimizing -f(x), so this definition also works for maximization

the feasible region is the region where f(.) is defined and all constraints hold

ℜ = { w ∈ Ω | g(w) ≤ 0, h(w) = 0 }

w* is a global minimum of f(w) if

f(w) ≥ f(w*), ∀w ∈ Ω

w* is a local minimum of f(w) if

∃ε > 0 s.t. ||w - w*|| < ε ⇒ f(w) ≥ f(w*)

(figure: a function with a local minimum and the global minimum marked)


The gradient

the gradient of a function f(w) at z is

∇f(z) = [ ∂f/∂w0 (z), ... , ∂f/∂wn-1 (z) ]T

Theorem: the gradient points in the direction of maximum growth

proof:
• from the Taylor series expansion

f(w + αd) = f(w) + α dT∇f(w) + O(α²)

• derivative along d

lim α→0 [ f(w + αd) - f(w) ] / α = dT∇f(w) = ||d|| ||∇f(w)|| cos(d, ∇f(w))     (*)

• this is maximum when d is in the direction of the gradient
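As a quick numerical sanity check of the theorem (my own illustration, not part of the slides), one can compare finite-difference directional derivatives of a simple function over many unit directions and confirm that the largest one occurs for d along the gradient; the function below is an arbitrary smooth choice.

```python
import numpy as np

f = lambda w: w[0] ** 2 + 3.0 * w[1] ** 2             # arbitrary smooth function
grad_f = lambda w: np.array([2.0 * w[0], 6.0 * w[1]])

w = np.array([1.0, 1.0])
alpha = 1e-6
best = None
for theta in np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False):
    d = np.array([np.cos(theta), np.sin(theta)])      # unit direction
    deriv = (f(w + alpha * d) - f(w)) / alpha         # finite-difference derivative along d
    if best is None or deriv > best[0]:
        best = (deriv, d)

g = grad_f(w)
print(best[1])                 # direction of largest growth found by the search
print(g / np.linalg.norm(g))   # normalized gradient: the two should (nearly) coincide
```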


The gradient

note that if ∇f = 0
• there is no direction of growth
• also -∇f = 0, so there is no direction of decrease
• we are either at a local minimum, a local maximum, or a "saddle" point

conversely, at a local min, max, or saddle point
• there is no direction of growth or decrease
• ∇f = 0

this shows that we have a critical point if and only if ∇f = 0
to determine which type, we need second order conditions

(figure: sketches of a maximum, a minimum, and a saddle point)


The Hessian

if ∇f = 0, by Taylor series

f(w + αd) = f(w) + α dT∇f(w) + (α²/2) dT∇²f(w) d + O(α³)
          = f(w) + (α²/2) dT∇²f(w) d + O(α³)        (the gradient term vanishes since ∇f(w) = 0)

and

f(w + αd) - f(w) = (α²/2) dT∇²f(w) d + O(α³)

pick α small enough that O(α³) << (α²/2) |dT∇²f(w) d|, ∀d ≠ 0; then
• maximum at w if and only if dT∇²f(w) d ≤ 0, ∀d ≠ 0
• minimum at w if and only if dT∇²f(w) d ≥ 0, ∀d ≠ 0
• saddle otherwise

this proves the following theorems
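The second-order test above is easy to automate: the sign of dT∇²f d over all d is determined by the eigenvalues of the (symmetric) Hessian. The helper below is my own illustration of that test, not something defined in the slides.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-10):
    """Classify a point where the gradient vanishes, using the Hessian's eigenvalues."""
    eigvals = np.linalg.eigvalsh(hessian)          # symmetric matrix -> real eigenvalues
    if np.all(eigvals > tol):
        return "minimum"                           # d^T H d > 0 for every d != 0
    if np.all(eigvals < -tol):
        return "maximum"                           # d^T H d < 0 for every d != 0
    if np.any(eigvals > tol) and np.any(eigvals < -tol):
        return "saddle"                            # sign of d^T H d depends on d
    return "degenerate (second-order test is inconclusive)"

# Hessian of g(x) = x1^2 + x2^2 at the origin (see the example a few slides ahead)
print(classify_critical_point(np.array([[2.0, 0.0], [0.0, 2.0]])))   # -> minimum
```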


Minima conditions (unconstrained)

let f(w) be continuously differentiable
w* is a local minimum of f(w) if and only if
• f has zero gradient at w*

∇f(w*) = 0

• and the Hessian of f at w* is positive definite

dT∇²f(w*) d ≥ 0, ∀d ∈ Rn

• where ∇²f(x0) is the matrix of second-order partial derivatives, with entries

[∇²f(x0)]ij = ∂²f/∂xi∂xj (x0),   i,j = 1,...,n


Maxima conditions (unconstrained)

let f(w) be continuously differentiable
w* is a local maximum of f(w) if and only if
• f has zero gradient at w*

∇f(w*) = 0

• and the Hessian of f at w* is negative definite

dT∇²f(w*) d ≤ 0, ∀d ∈ Rn

• where ∇²f(x0) is the matrix of second-order partial derivatives, with entries

[∇²f(x0)]ij = ∂²f/∂xi∂xj (x0),   i,j = 1,...,n


Example

consider the functions

f(x) = x1 + x2        g(x) = x1² + x2²

the gradients are

∇f(x) = [1, 1]T        ∇g(x) = [2x1, 2x2]T

f has no minima or maxima; g has a critical point at the origin x = (0,0)
since the Hessian

∇²g(x) = [2 0; 0 2]

is positive definite, this is a minimum
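The example can also be verified numerically; the short check below (mine, not in the slides) confirms that the gradient of g vanishes at the origin and that the Hessian 2I has only positive eigenvalues.

```python
import numpy as np

g = lambda x: x[0] ** 2 + x[1] ** 2
grad_g = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
hess_g = np.array([[2.0, 0.0], [0.0, 2.0]])

x0 = np.zeros(2)
print(grad_g(x0))                  # [0. 0.]: the origin is a critical point
print(np.linalg.eigvalsh(hess_g))  # [2. 2.]: positive definite, so the origin is a minimum
```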


Example

makes sense because

f(x) = x1 + x2

is a plane, and its gradient is constant

∇f(x) = [1, 1]T

(figure: iso-contours of f(x): the lines x1 + x2 = -1, x1 + x2 = 0, and x1 + x2 = 1)


Example

makes sense because

g(x) = x1² + x2²

is a quadratic, positive everywhere but the origin
note how the gradient points towards the direction of largest increase

(figure: the iso-contours g(x) = 1 and g(x) = 2, with the gradient ∇g(x) = [2x1, 2x2]T drawn at several points, e.g. [2, 0]T, [0, 2]T, and [2, 2]T, perpendicular to the contours and pointing outwards)


Convex functions

Definition: f(w) is convex if ∀w,u ∈ Ω and λ ∈ [0,1]

f(λw + (1-λ)u) ≤ λf(w) + (1-λ)f(u)

Theorem: f(w) is convex if and only if its Hessian is positive definite for all w

wT∇²f(w) w ≥ 0, ∀w ∈ Ω

proof:
• requires some intermediate results that we will not cover
• we will skip it

(figure: a convex function, where the chord λf(w) + (1-λ)f(u) lies above the function value f(λw + (1-λ)u) at every point λw + (1-λ)u between w and u)
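As a small illustration (my own, not from the slides), the defining inequality can be spot-checked numerically for the convex function g(x) = x1² + x2² used earlier, at random pairs of points and random λ ∈ [0,1].

```python
import numpy as np

g = lambda x: x[0] ** 2 + x[1] ** 2   # convex: its Hessian 2*I is positive definite

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    w, u = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    # convexity: g(lam*w + (1-lam)*u) <= lam*g(w) + (1-lam)*g(u)
    ok &= g(lam * w + (1 - lam) * u) <= lam * g(w) + (1 - lam) * g(u) + 1e-12
print(ok)   # True
```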


Concave functions

Definition: f(w) is concave if ∀w,u ∈ Ω and λ ∈ [0,1]

f(λw + (1-λ)u) ≥ λf(w) + (1-λ)f(u)

Theorem: f(w) is concave if and only if its Hessian is negative definite for all w

wT∇²f(w) w ≤ 0, ∀w ∈ Ω

proof:
• -f(w) is convex
• by the previous theorem, the Hessian of -f(w) is positive definite
• hence the Hessian of f(w) is negative definite


Convex functions

Theorem: if f(w) is convex, any local minimum w* is also a global minimum

Proof:
• we need to show that, for any u, f(w*) ≤ f(u)
• for any u: ||w* - [λw* + (1-λ)u]|| = (1-λ) ||w* - u||
• and, making λ arbitrarily close to 1, we can make ||w* - [λw* + (1-λ)u]|| ≤ ε, for any ε > 0
• since w* is a local minimum, it follows that f(w*) ≤ f(λw* + (1-λ)u)
• and, by convexity, that f(w*) ≤ λf(w*) + (1-λ)f(u)
• or f(w*)(1-λ) ≤ f(u)(1-λ)
• and f(w*) ≤ f(u)


Constrained optimization

in summary:
• we know the conditions for unconstrained max and min
• we like convex functions (find a local minimum, it will be a global minimum)

what about optimization with constraints?
a few definitions to start with
the inequality gi(w) ≤ 0:
• is active if gi(w) = 0, otherwise inactive

inequalities can be expressed as equalities by the introduction of slack variables

gi(w) ≤ 0  ⇔  gi(w) + ξi = 0 and ξi ≥ 0
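A brief sketch of the slack-variable idea on a toy problem of my own (not from the slides): the inequality-constrained form and its slack reformulation should reach the same minimizer; scipy is used purely for illustration.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda w: (w[0] - 2.0) ** 2     # illustrative cost
g = lambda w: w[0] - 1.0            # constraint g(w) <= 0, i.e. w <= 1 (active at the optimum)

# original form: min f(w) subject to g(w) <= 0
direct = minimize(f, x0=[0.0], method="SLSQP",
                  constraints=[{"type": "ineq", "fun": lambda w: -g(w)}])

# slack form: min f(w) subject to g(w) + xi = 0 and xi >= 0, over z = (w, xi)
slack = minimize(lambda z: f(z[:1]), x0=[0.0, 0.0], method="SLSQP",
                 constraints=[{"type": "eq", "fun": lambda z: g(z[:1]) + z[1]}],
                 bounds=[(None, None), (0.0, None)])

print(direct.x, slack.x[:1])   # both approximately w = 1
```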


Convex optimization

Definition: a set Ω is convex if ∀w,u ∈ Ω and λ ∈ [0,1], λw + (1-λ)u ∈ Ω
"a line between any two points in Ω is also in Ω"

(figure: a convex set and a non-convex set)

Definition: an optimization problem where the set Ω, the cost f, and all constraints g and h are convex is said to be convex
note: linear constraints g(x) = Ax + b are always convex (zero Hessian)


Constrained optimization

we will consider general (not only convex) constrained optimization problems, starting with the case of equality constraints only

Theorem: consider the problem

x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

i)  ∇f(x*) + Σi λi∇hi(x*) = 0

ii) yT [ ∇²f(x*) + Σi λi∇²hi(x*) ] y ≥ 0, ∀y s.t. ∇h(x*)T y = 0

(sums over i = 1,...,m)


Alternative formulation

state the conditions through the Lagrangian

L(x, λ) = f(x) + Σi λi hi(x),   sum over i = 1,...,m

the theorem can be compactly written as

i)   ∇x L(x*, λ*) = 0

ii)  ∇λ L(x*, λ*) = 0

iii) yT ∇²xx L(x*, λ*) y ≥ 0, ∀y s.t. ∇h(x*)T y = 0

the entries of λ are referred to as Lagrange multipliers
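To see the Lagrangian machinery on a toy problem of my own choosing (not the slides' example), conditions i) and ii) can be solved symbolically; here for min x1² + x2² subject to x1 + x2 = 1.

```python
import sympy as sp

x1, x2, lam = sp.symbols("x1 x2 lam", real=True)

# Lagrangian L(x, lam) = f(x) + lam * h(x), with f = x1^2 + x2^2 and h = x1 + x2 - 1
L = x1 ** 2 + x2 ** 2 + lam * (x1 + x2 - 1)

# stationarity in x and in lam (the latter just recovers the constraint h(x) = 0)
stationarity = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(stationarity, [x1, x2, lam], dict=True))
# [{lam: -1, x1: 1/2, x2: 1/2}]  -> minimizer (1/2, 1/2) with multiplier -1
```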


Gradient (revisited)

recall from (*) that the derivative of f along d is

lim α→0 [ f(w + αd) - f(w) ] / α = dT∇f(w) = ||d|| ||∇f(w)|| cos(d, ∇f(w))

this means that
• greatest increase when d || ∇f
• no increase when d ⊥ ∇f
• since there is no increase when d is tangent to the iso-contour f(x) = k
• the gradient is perpendicular to the tangent of the iso-contour

this suggests a geometric proof
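A quick numerical illustration of this fact (mine, not in the slides), using g(x) = x1² + x2²: along a direction tangent to the contour through a point, the finite-difference directional derivative is essentially zero, while along the gradient it equals ||∇g||.

```python
import numpy as np

g = lambda x: x[0] ** 2 + x[1] ** 2
grad_g = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

x = np.array([1.0, 1.0])                     # a point on the iso-contour g(x) = 2
n = grad_g(x) / np.linalg.norm(grad_g(x))    # unit vector along the gradient
t = np.array([-n[1], n[0]])                  # unit tangent, perpendicular to the gradient

alpha = 1e-6
along_tangent = (g(x + alpha * t) - g(x)) / alpha   # ~0: no first-order change along the contour
along_gradient = (g(x + alpha * n) - g(x)) / alpha  # ~||grad g(x)|| = 2*sqrt(2): maximum growth
print(along_tangent, along_gradient)
```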


Lagrangian optimization

geometric interpretation:
• since h(x) = 0 is an iso-contour of h(x), ∇h(x*) is perpendicular to the iso-contour
• i) says that ∇f(x*) ∈ span{∇hi(x*)}
• i.e. ∇f is ⊥ to the tangent space of the constraint surface
• intuitive: the direction of largest increase of f is ⊥ to the constraint surface
• the gradient is zero along the constraint
• there is no way to take an infinitesimal gradient step without ending up violating it
• it is impossible to increase f and still satisfy the constraint

(figure: the constraint surface h(x) = 0, its tangent plane at x*, and span{∇h(x*)} perpendicular to it)


Example

consider the problem

min x1 + x2 subject to x1² + x2² = 2

it leads to the following picture

(figure: the constraint circle h(x) = 0, with ∇h(x) = [2x1, 2x2]T, the iso-contours of f(x): x1 + x2 = -1, x1 + x2 = 0, x1 + x2 = 1, and ∇f(x) = [1, 1]T)


Example

consider the problem

min x1 + x2 subject to x1² + x2² = 2

∇f is ⊥ to the iso-contours of f (x1 + x2 = k)

(figure: same picture, with ∇f(x) = [1, 1]T drawn perpendicular to the lines x1 + x2 = -1, 0, 1)


Example

consider the problem

min x1 + x2 subject to x1² + x2² = 2

∇h is ⊥ to the iso-contour of h (x1² + x2² - 2 = 0)

(figure: same picture, with ∇h(x) = [2x1, 2x2]T drawn perpendicular to the constraint circle)


Example

recall that the derivative along d is

lim α→0 [ f(w + αd) - f(w) ] / α = ||d|| ||∇f(w)|| cos(d, ∇f(w))

- moving along the tangent is descent as long as cos(∇f, tg) < 0
- i.e. π/2 < angle(∇f, tg) < 3π/2
- we can always find such a d unless ∇f ⊥ tg
- critical point when ∇f || ∇h
- to find which type we need 2nd order conditions (as before)

(figure: the two critical points, where the iso-contours of f are tangent to the constraint circle)
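Closing the loop on this example with a numerical check (my own sketch, not part of the slides): a constrained solver should land on the critical point that is the minimum, and there the gradients of f and h are parallel, with ratio -λ.

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]
h = lambda x: x[0] ** 2 + x[1] ** 2 - 2.0
grad_f = lambda x: np.array([1.0, 1.0])
grad_h = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])

res = minimize(f, x0=[-1.0, 0.0], method="SLSQP",
               constraints=[{"type": "eq", "fun": h}])
x_star = res.x
print(x_star)                           # approximately (-1, -1)
print(grad_f(x_star) / grad_h(x_star))  # both components ~ -0.5, i.e. grad_f = -lam * grad_h with lam ~ 1/2
```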


Alternative view

consider the tangent space to the iso-contour h(x) = 0
this is the subspace of first order feasible variations

V(x*) = { Δx | ∇hi(x*)T Δx = 0, ∀i }

the space of Δx for which x + Δx satisfies the constraint up to a first order approximation

(figure: the constraint surface h(x) = 0, the point x*, its gradient ∇h(x*), and the tangent space V(x*) of feasible variations)


Feasible variations

multiplying our first Lagrangian condition by Δx

∇f(x*)T Δx + Σi λi∇hi(x*)T Δx = 0

it follows that (since ∇hi(x*)T Δx = 0 for Δx ∈ V(x*))

∇f(x*)T Δx = 0, ∀Δx ∈ V(x*)

this is a generalization of ∇f(x*) = 0 in the unconstrained case
it implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) || ∇h(x*)

note:
• the Hessian constraint is only defined for y in V(x*)
• makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)
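A small check of this orthogonality (my own sketch, assuming the minimizer x* = (-1, -1) of the earlier example): the subspace of first order feasible variations is the null space of ∇h(x*)T, and ∇f(x*) is orthogonal to every vector in it.

```python
import numpy as np
from scipy.linalg import null_space

x_star = np.array([-1.0, -1.0])                         # assumed minimizer of the earlier example
grad_f = np.array([1.0, 1.0])                           # gradient of f(x) = x1 + x2
grad_h = np.array([2.0 * x_star[0], 2.0 * x_star[1]])   # gradient of h(x) = x1^2 + x2^2 - 2

# V(x*) = { dx : grad_h(x*)^T dx = 0 } = null space of the 1 x 2 matrix grad_h^T
V = null_space(grad_h.reshape(1, -1))   # orthonormal basis, here a single column
print(V.ravel())                        # a unit vector along [1, -1] (up to sign)
print(grad_f @ V)                       # ~ [0.]: grad_f(x*) is orthogonal to V(x*)
```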


In summary

for a constrained optimization problem with equality constraints

Theorem: consider the problem

x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hi(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

i)  ∇f(x*) + Σi λi∇hi(x*) = 0

ii) yT [ ∇²f(x*) + Σi λi∇²hi(x*) ] y ≥ 0, ∀y s.t. ∇h(x*)T y = 0

(sums over i = 1,...,m)


Alternative formulation

state the conditions through the Lagrangian

L(x, λ) = f(x) + Σi λi hi(x),   sum over i = 1,...,m

the theorem can be compactly written as

i)   ∇x L(x*, λ*) = 0

ii)  ∇λ L(x*, λ*) = 0

iii) yT ∇²xx L(x*, λ*) y ≥ 0, ∀y s.t. ∇h(x*)T y = 0

the entries of λ are referred to as Lagrange multipliers
