Optimization
Nuno Vasconcelos, ECE Department, UCSD
Optimization

many engineering problems boil down to optimization
goal: find the maximum or minimum of a function
Definition: given functions f, gᵢ, i = 1,…,k and hᵢ, i = 1,…,m, defined on some domain Ω ⊆ ℝⁿ,

  min_w f(w)  subject to  gᵢ(w) ≤ 0, ∀i  and  hᵢ(w) = 0, ∀i

f(w): cost; hᵢ (equality) and gᵢ (inequality): constraints
for compactness we write g(w) ≤ 0 instead of gᵢ(w) ≤ 0, ∀i; similarly h(w) = 0
note that g(w) ≥ 0 ⇔ −g(w) ≤ 0, so there is no need for ≥ constraints
Optimization

note: maximizing f(x) is the same as minimizing −f(x), so this definition also works for maximization
the feasible region is the region where f(·) is defined and all constraints hold:

  ℜ = { w ∈ Ω | g(w) ≤ 0, h(w) = 0 }

w* is a global minimum of f(w) if

  f(w) ≥ f(w*), ∀w ∈ ℜ

w* is a local minimum of f(w) if

  ∃ε > 0 s.t. ||w − w*|| < ε ⇒ f(w) ≥ f(w*)
The gradient

the gradient of a function f(w) at z is

  ∇f(z) = ( ∂f/∂w₀(z), …, ∂f/∂w_{n−1}(z) )ᵀ

Theorem: the gradient points in the direction of maximum growth
proof:
• from the Taylor series expansion

  f(w + αd) = f(w) + α ∇f(w)ᵀd + O(α²)

• the derivative along d is

  lim_{α→0} [f(w + αd) − f(w)] / α = ∇f(w)ᵀd = ||d|| · ||∇f(w)|| · cos(∠(d, ∇f(w)))   (*)

• this is maximum when d is in the direction of the gradient ∇f
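The claim above is easy to check numerically. The sketch below (with a hypothetically chosen test function f(w) = w₁² + 3w₂², not one from the slides) estimates the directional derivative by finite differences over many unit directions d and confirms that the maximizing direction coincides with the gradient direction:

```python
import numpy as np

# Hypothetical test function and its analytic gradient (our choice, not from the slides)
f = lambda w: w[0]**2 + 3.0 * w[1]**2
grad_f = lambda w: np.array([2.0 * w[0], 6.0 * w[1]])

w = np.array([1.0, 1.0])
g = grad_f(w)

def directional_derivative(f, w, d, alpha=1e-6):
    """Finite-difference estimate of lim_{a->0} (f(w + a d) - f(w)) / a."""
    return (f(w + alpha * d) - f(w)) / alpha

# Sample unit directions d around the circle and record the derivative along each
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
derivs = np.array([directional_derivative(f, w, d) for d in dirs])

best = dirs[np.argmax(derivs)]
unit_grad = g / np.linalg.norm(g)

# The maximizing direction matches the gradient direction (up to sampling resolution),
# and the maximum derivative equals ||grad f(w)||, as in (*) with cos = 1
assert np.allclose(best, unit_grad, atol=0.05)
assert abs(np.max(derivs) - np.linalg.norm(g)) < 1e-2
```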
The gradient

note that if ∇f = 0
• there is no direction of growth
• also −∇f = 0, so there is no direction of decrease
• we are either at a local minimum, a local maximum, or a “saddle” point
conversely, at a local min, max, or saddle point
• there is no direction of growth or decrease
• ∇f = 0
this shows that we have a critical point if and only if ∇f = 0
to determine which type, we need second-order conditions

[figure: 1-D sketches of a max, a min, and a saddle point]
The Hessian

if ∇f = 0, by Taylor series

  f(w + αd) = f(w) + α ∇f(w)ᵀd + (α²/2) dᵀ∇²f(w)d + O(α³)

and, since ∇f(w) = 0,

  f(w + αd) − f(w) = (α²/2) dᵀ∇²f(w)d + O(α³)

pick α such that O(α³) << |dᵀ∇²f d|, ∀d ≠ 0
• maximum at w if and only if dᵀ∇²f d ≤ 0, ∀d ≠ 0
• minimum at w if and only if dᵀ∇²f d ≥ 0, ∀d ≠ 0
• saddle otherwise
this proves the following theorems
Minima conditions (unconstrained)

let f(w) be continuously differentiable
w* is a local minimum of f(w) if and only if
• f has zero gradient at w*

  ∇f(w*) = 0

• and the Hessian of f at w* is positive definite

  dᵀ∇²f(w*)d ≥ 0, ∀d ∈ ℝⁿ

• where

  ∇²f(x) = ⎡ ∂²f/∂x₀²(x)         ⋯  ∂²f/∂x₀∂x_{n−1}(x)  ⎤
           ⎢        ⋮                       ⋮            ⎥
           ⎣ ∂²f/∂x_{n−1}∂x₀(x)  ⋯  ∂²f/∂x_{n−1}²(x)    ⎦
Maxima conditions (unconstrained)

let f(w) be continuously differentiable
w* is a local maximum of f(w) if and only if
• f has zero gradient at w*

  ∇f(w*) = 0

• and the Hessian of f at w* is negative definite

  dᵀ∇²f(w*)d ≤ 0, ∀d ∈ ℝⁿ

• where

  ∇²f(x) = ⎡ ∂²f/∂x₀²(x)         ⋯  ∂²f/∂x₀∂x_{n−1}(x)  ⎤
           ⎢        ⋮                       ⋮            ⎥
           ⎣ ∂²f/∂x_{n−1}∂x₀(x)  ⋯  ∂²f/∂x_{n−1}²(x)    ⎦
Example

consider the functions

  f(x) = x₁ + x₂    g(x) = x₁² + x₂²

the gradients are

  ∇f(x) = [1, 1]ᵀ    ∇g(x) = [2x₁, 2x₂]ᵀ

f has no minima or maxima; g has a critical point at the origin x = (0,0)
since the Hessian

  ∇²g(x) = ⎡ 2  0 ⎤
           ⎣ 0  2 ⎦

is positive definite, this is a minimum
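The example above can be verified mechanically: find where the gradient vanishes, then classify the critical point by the eigenvalues of the Hessian (all positive means positive definite, hence a minimum). A minimal sketch for g(x) = x₁² + x₂²:

```python
import numpy as np

# The slides' example: f(x) = x1 + x2 and g(x) = x1^2 + x2^2
grad_f = lambda x: np.array([1.0, 1.0])               # constant: f has no critical points
grad_g = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
hess_g = lambda x: np.array([[2.0, 0.0], [0.0, 2.0]])

origin = np.array([0.0, 0.0])
assert np.allclose(grad_g(origin), 0.0)               # g has a critical point at the origin

# Classify via Hessian eigenvalues: all positive -> positive definite -> minimum
eigvals = np.linalg.eigvalsh(hess_g(origin))
assert np.all(eigvals > 0)
```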
Example

this makes sense because

  f(x) = x₁ + x₂

is a plane, so the gradient is constant: ∇f(x) = [1, 1]ᵀ

[figure: iso-contours of f(x): the lines x₁ + x₂ = −1, x₁ + x₂ = 0, x₁ + x₂ = 1]
Example

this makes sense because

  g(x) = x₁² + x₂²

is a quadratic, positive everywhere but the origin
note how the gradient ∇g(x) = [2x₁, 2x₂]ᵀ points in the direction of largest increase

[figure: iso-contours g(x) = 1 and g(x) = 2, with gradient arrows such as ∇g = [0, 2]ᵀ, [2, 2]ᵀ, and [2, 0]ᵀ at points on the contours]
Convex functions

Definition: f(w) is convex if ∀w, u ∈ Ω and λ ∈ [0,1]

  f(λw + (1−λ)u) ≤ λf(w) + (1−λ)f(u)

Theorem: f(w) is convex if and only if its Hessian is positive definite for all w

  dᵀ∇²f(w)d ≥ 0, ∀d, ∀w ∈ Ω

proof:
• requires some intermediate results that we will not cover
• we will skip it

[figure: the chord λf(w) + (1−λ)f(u) lies above the function value f(λw + (1−λ)u)]
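The defining inequality is easy to test empirically. The sketch below (with f(w) = ||w||², a hypothetically chosen convex function whose Hessian is 2I, positive definite) checks the chord inequality on random pairs of points:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda w: np.sum(w**2)   # convex: its Hessian is 2I, positive definite

# Check f(l*w + (1-l)*u) <= l*f(w) + (1-l)*f(u) on random samples
for _ in range(1000):
    w, u = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = f(lam * w + (1 - lam) * u)
    rhs = lam * f(w) + (1 - lam) * f(u)
    assert lhs <= rhs + 1e-12   # chord lies above the function
```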
Concave functions

Definition: f(w) is concave if ∀w, u ∈ Ω and λ ∈ [0,1]

  f(λw + (1−λ)u) ≥ λf(w) + (1−λ)f(u)

Theorem: f(w) is concave if and only if its Hessian is negative definite for all w

  dᵀ∇²f(w)d ≤ 0, ∀d, ∀w ∈ Ω

proof:
• −f(w) is convex
• by the previous theorem, the Hessian of −f(w) is positive definite
• so the Hessian of f(w) is negative definite
Convex functions

Theorem: if f(w) is convex, any local minimum w* is also a global minimum
Proof:
• we need to show that, for any u, f(w*) ≤ f(u)
• for any u: ||w* − [λw* + (1−λ)u]|| = (1−λ)||w* − u||
• and, making λ arbitrarily close to 1, we can make ||w* − [λw* + (1−λ)u]|| ≤ ε, for any ε > 0
• since w* is a local minimum, it follows that f(w*) ≤ f(λw* + (1−λ)u) and, by convexity, that f(w*) ≤ λf(w*) + (1−λ)f(u)
• or f(w*)(1−λ) ≤ f(u)(1−λ), and f(w*) ≤ f(u)
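This theorem is what makes convexity so useful in practice: any local descent method that reaches a stationary point of a convex function has found the global minimum. A minimal gradient-descent sketch (on a hypothetically chosen convex quadratic, with a step size picked for this example):

```python
import numpy as np

# Convex quadratic f(w) = (1/2) w^T A w - b^T w with A positive definite;
# its unique (hence global) minimum is w* = A^{-1} b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda w: A @ w - b

w = np.zeros(2)
step = 0.1                      # small enough here: both eigenvalues of A are < 2/step
for _ in range(2000):
    w = w - step * grad(w)      # follow the negative gradient

w_star = np.linalg.solve(A, b)
assert np.allclose(w, w_star, atol=1e-6)   # descent reached the global minimum
```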
Constrained optimization

in summary:
• we know the conditions for unconstrained max and min
• we like convex functions (find a minimum, it will be the global minimum)
what about optimization with constraints?
a few definitions to start with
inequality gᵢ(w) ≤ 0:
• is active if gᵢ(w) = 0, otherwise inactive
inequalities can be expressed as equalities by introducing slack variables:

  gᵢ(w) ≤ 0 ⇔ gᵢ(w) + ξᵢ = 0, with ξᵢ ≥ 0
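The slack-variable trick can be illustrated concretely. In this sketch the constraint g(w) = w₁ + w₂ − 1 ≤ 0 is a hypothetical choice: at any feasible point the slack ξ = −g(w) is nonnegative, ξ > 0 means the constraint is inactive, and ξ = 0 means it is active:

```python
import numpy as np

# Hypothetical inequality constraint g(w) = w1 + w2 - 1 <= 0
g = lambda w: w[0] + w[1] - 1.0

def slack(w):
    """Slack xi such that g(w) + xi = 0; xi >= 0 exactly when w is feasible."""
    xi = -g(w)
    assert xi >= 0, "w is infeasible: no nonnegative slack exists"
    return xi

w_interior = np.array([0.2, 0.3])   # g = -0.5 < 0: inactive constraint, xi > 0
w_boundary = np.array([0.5, 0.5])   # g = 0: active constraint, xi = 0
assert slack(w_interior) > 0
assert slack(w_boundary) == 0
```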
Convex optimization

Definition: a set Ω is convex if ∀w, u ∈ Ω and λ ∈ [0,1], λw + (1−λ)u ∈ Ω
“a line between any two points in Ω is also in Ω”

[figure: a convex set vs. a non-convex set]

Definition: an optimization problem where the set Ω, the cost f, and all constraints g and h are convex is said to be convex
note: linear constraints g(x) = Ax + b are always convex (zero Hessian)
Constrained optimization

we will consider general (not only convex) constrained optimization problems, starting with the case of equality constraints only
Theorem: consider the problem

  x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hᵢ(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

  i)  ∇f(x*) + Σᵢ₌₁ᵐ λᵢ ∇hᵢ(x*) = 0

  ii) yᵀ[ ∇²f(x*) + Σᵢ₌₁ᵐ λᵢ ∇²hᵢ(x*) ]y ≥ 0, ∀y s.t. ∇hᵢ(x*)ᵀy = 0, ∀i
Alternative formulation

state the conditions through the Lagrangian

  L(x, λ) = f(x) + Σᵢ₌₁ᵐ λᵢ hᵢ(x)

the theorem can then be compactly written as

  i)   ∇ₓ L(x*, λ*) = 0
  ii)  ∇_λ L(x*, λ*) = 0
  iii) yᵀ ∇²ₓₓ L(x*, λ*) y ≥ 0, ∀y s.t. ∇hᵢ(x*)ᵀy = 0, ∀i

the entries of λ are referred to as Lagrange multipliers
Gradient (revisited)

recall from (*) that the derivative of f along d is

  lim_{α→0} [f(w + αd) − f(w)] / α = ∇f(w)ᵀd = ||d|| · ||∇f(w)|| · cos(∠(d, ∇f(w)))

this means that
• greatest increase when d ∥ ∇f
• no increase when d ⊥ ∇f
• since there is no increase when d is tangent to the iso-contour f(x) = k
• the gradient is perpendicular to the tangent of the iso-contour
this suggests a geometric proof

[figure: ∇f perpendicular to an iso-contour; “no increase” along the tangent directions]
Lagrangian optimization

geometric interpretation:
• since h(x) = 0 is an iso-contour of h(x), ∇h(x*) is perpendicular to the iso-contour
• i) says that ∇f(x*) ∈ span{∇hᵢ(x*)}
• i.e. ∇f is ⊥ to the tangent space of the constraint surface
• intuitive:
• the direction of largest increase of f is ⊥ to the constraint surface
• the gradient is zero along the constraint
• there is no way to take an infinitesimal gradient step without ending up violating the constraint
• it is impossible to increase f and still satisfy the constraint

[figure: constraint surface h(x) = 0, its tangent plane, and span{∇h(x*)}]
Example

consider the problem

  min x₁ + x₂ subject to x₁² + x₂² = 2

it leads to the following picture:
• ∇f = [1, 1]ᵀ is ⊥ to the iso-contours of f (the lines x₁ + x₂ = k)
• ∇h = [2x₁, 2x₂]ᵀ is ⊥ to the iso-contour of h (the circle x₁² + x₂² − 2 = 0)

[figure: iso-contours x₁ + x₂ = −1, 0, 1 of f, the constraint circle h(x) = 0, and the gradients ∇f(x) = [1, 1]ᵀ and ∇h(x) = [2x₁, 2x₂]ᵀ]
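This example can also be solved analytically with the Lagrangian: stationarity gives 1 + 2λxᵢ = 0, so xᵢ = −1/(2λ), and the constraint then forces λ = ±1/2; λ = 1/2 gives the minimizer x* = (−1, −1) with f(x*) = −2. A sketch verifying conditions i) and ii) numerically:

```python
import numpy as np

# min x1 + x2  subject to  h(x) = x1^2 + x2^2 - 2 = 0
grad_f = lambda x: np.array([1.0, 1.0])
grad_h = lambda x: np.array([2.0 * x[0], 2.0 * x[1]])
h = lambda x: x[0]**2 + x[1]**2 - 2.0

# Stationarity 1 + 2*lam*xi = 0 gives xi = -1/(2*lam); the constraint then
# forces 2/(4*lam^2) = 2, i.e. lam = +-1/2; lam = 1/2 gives the minimum
lam = 0.5
x_star = np.array([-1.0, -1.0])

assert np.isclose(h(x_star), 0.0)                               # feasible
assert np.allclose(grad_f(x_star) + lam * grad_h(x_star), 0.0)  # condition i)

# Second-order check: the Hessian of the Lagrangian is 2*lam*I = I,
# positive definite, so x* is a minimum; the optimal value is f(x*) = -2
assert np.isclose(x_star.sum(), -2.0)
```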
Example

recall that the derivative along d is

  lim_{α→0} [f(w + αd) − f(w)] / α = ||d|| · ||∇f(w)|| · cos(∠(d, ∇f(w)))

• moving along the tangent is descent as long as cos(∠(∇f, tg)) < 0
• i.e. π/2 < ∠(∇f, tg) < 3π/2
• we can always find such a d unless ∇f ⊥ tg
• critical point when ∇f ∥ ∇h
• to find which type, we need 2nd-order conditions (as before)

[figure: the two critical points on the constraint circle, where ∇f ∥ ∇h]
Alternative view

consider the tangent space to the iso-contour h(x) = 0
this is the subspace of first-order feasible variations

  V(x*) = { Δx | ∇hᵢ(x*)ᵀΔx = 0, ∀i }

the space of Δx for which x + Δx satisfies the constraint up to a first-order approximation

[figure: V(x*), the feasible variations, tangent to h(x) = 0 at x*, with ∇h(x*) normal to it]
Feasible variations

multiplying our first Lagrangian condition by Δx,

  ∇f(x*)ᵀΔx + Σᵢ₌₁ᵐ λᵢ ∇hᵢ(x*)ᵀΔx = 0

it follows that

  ∇f(x*)ᵀΔx = 0, ∀Δx ∈ V(x*)

this is a generalization of ∇f(x*) = 0 in the unconstrained case
it implies that ∇f(x*) ⊥ V(x*) and therefore ∇f(x*) ∥ ∇h(x*)
note:
• the Hessian constraint is only defined for y in V(x*)
• this makes sense: we cannot move anywhere else, so it does not really matter what the Hessian is outside V(x*)
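At the solution of the running example, min x₁ + x₂ s.t. x₁² + x₂² = 2, these relations can be checked directly: at x* = (−1, −1) the feasible variations are spanned by Δx = (1, −1), and ∇f is orthogonal to them while being parallel to ∇h. A minimal sketch:

```python
import numpy as np

# At the solution x* = (-1, -1) of  min x1 + x2  s.t.  x1^2 + x2^2 = 2:
x_star = np.array([-1.0, -1.0])
grad_f = np.array([1.0, 1.0])
grad_h = 2.0 * x_star                   # = (-2, -2)

# V(x*) = { dx | grad_h^T dx = 0 }: here spanned by (1, -1)
dx = np.array([1.0, -1.0])
assert np.isclose(grad_h @ dx, 0.0)     # dx is a first-order feasible variation
assert np.isclose(grad_f @ dx, 0.0)     # grad f(x*) is perpendicular to V(x*)

# And grad f is parallel to grad h: the 2x2 "cross product" vanishes
assert np.isclose(grad_f[0] * grad_h[1] - grad_f[1] * grad_h[0], 0.0)
```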
In summary

for a constrained optimization problem with equality constraints
Theorem: consider the problem

  x* = argmin f(x) subject to h(x) = 0

where the constraint gradients ∇hᵢ(x*) are linearly independent. Then x* is a solution if and only if there exists a unique vector λ such that

  i)  ∇f(x*) + Σᵢ₌₁ᵐ λᵢ ∇hᵢ(x*) = 0

  ii) yᵀ[ ∇²f(x*) + Σᵢ₌₁ᵐ λᵢ ∇²hᵢ(x*) ]y ≥ 0, ∀y s.t. ∇hᵢ(x*)ᵀy = 0, ∀i
Alternative formulation

state the conditions through the Lagrangian

  L(x, λ) = f(x) + Σᵢ₌₁ᵐ λᵢ hᵢ(x)

the theorem can then be compactly written as

  i)   ∇ₓ L(x*, λ*) = 0
  ii)  ∇_λ L(x*, λ*) = 0
  iii) yᵀ ∇²ₓₓ L(x*, λ*) y ≥ 0, ∀y s.t. ∇hᵢ(x*)ᵀy = 0, ∀i

the entries of λ are referred to as Lagrange multipliers