Slide 1
18-660: Numerical Methods for Engineering Design and Optimization
Xin Li
Department of ECE
Carnegie Mellon University, Pittsburgh, PA 15213
Slide 2
Overview
Unconstrained Optimization
- Gradient method
- Newton method
Slide 3
Unconstrained Optimization
Linear regression with regularization:
$$\min_{\alpha}\ \left\| A\alpha - B \right\|_2^2 + \lambda \cdot \left\| \alpha \right\|_2^2$$
Unconstrained optimization: minimizing a cost function without any constraint
- Golden section search (non-derivative method)
- Downhill simplex method (non-derivative method)
- Gradient method (relies on derivatives; this lecture)
- Newton method (relies on derivatives; this lecture)
Slide 4
Gradient Method
If a cost function is smooth, its derivative information can be used to search for the optimal solution
$$\min_{x}\ f(x)$$
[Figure: a smooth cost function f(x) plotted over x]
Slide 5
Gradient Method
For illustration purposes, we start from the one-dimensional case
$$\min_{x}\ f(x)$$
$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\lambda^{(k)} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}} \qquad \left( \lambda^{(k)} > 0 \right)$$
where $\lambda^{(k)}$ is the step size and $df/dx$ is the derivative. Equivalently,
$$x^{(k+1)} = x^{(k)} - \lambda^{(k)} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}} \qquad \left( \lambda^{(k)} > 0 \right)$$
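As a concrete illustration, here is a minimal Python sketch of this one-dimensional update (the test function, fixed step size, and stopping tolerance are illustrative assumptions, not part of the original slides):

```python
def gradient_descent_1d(df_dx, x0, step=0.1, tol=1e-8, max_iter=1000):
    """Iterate x^(k+1) = x^(k) - step * df/dx until the derivative vanishes."""
    x = x0
    for _ in range(max_iter):
        g = df_dx(x)
        if abs(g) < tol:      # derivative ~ 0: local optimum reached
            break
        x = x - step * g      # move against the derivative
    return x

# Example: f(x) = (x - 2)^2 with df/dx = 2*(x - 2); minimum at x = 2
x_opt = gradient_descent_1d(lambda x: 2.0 * (x - 2.0), x0=0.0)
```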
Slide 6
Gradient Method
One-dimensional case (continued)
$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\lambda^{(k)} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
First-order Taylor expansion:
$$f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] + \left. \frac{df}{dx} \right|_{x^{(k)}} \cdot \Delta x^{(k)}$$
$$f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] - \lambda^{(k)} \cdot \left( \left. \frac{df}{dx} \right|_{x^{(k)}} \right)^2$$
The cost function f(x) keeps decreasing if the derivative is non-zero and λ^(k) > 0.
Slide 7
Gradient Method
One-dimensional case (continued)
$$\Delta x = -\lambda \cdot \frac{df}{dx} = 0$$
[Figure: cost function f(x) versus x, with the iterates settling at the minimum]
The derivative is zero at a local optimum (the gradient method converges).
Slide 8
Gradient Method
Two-dimensional case
$$\min_{x_1, x_2}\ f(x_1, x_2)$$
$$\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
where the gradient is
$$\nabla f(x_1, x_2) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix}$$
Equivalently,
$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix} - \lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
Slide 9
Gradient Method
Two-dimensional case (continued)
First-order Taylor expansion:
$$f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix}$$
Substituting the update $\left[ \Delta x_1^{(k)},\, \Delta x_2^{(k)} \right]^T = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$:
$$f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
$$f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \lambda^{(k)} \cdot \left\| \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] \right\|_2^2$$
The cost function f(x1, x2) keeps decreasing if the gradient is non-zero and λ^(k) > 0.
Slide 10
Gradient Method
N-dimensional case
$$\min_{X}\ f(X)$$
$$\nabla f(X) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \end{bmatrix}$$
$$X^{(k+1)} = X^{(k)} - \lambda^{(k)} \cdot \nabla f\left[ X^{(k)} \right]$$
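The same iteration carries over directly to vectors; a minimal NumPy sketch (the quadratic test function, fixed step size, and tolerance are illustrative assumptions):

```python
import numpy as np

def gradient_descent(grad_f, x0, step=0.1, tol=1e-8, max_iter=10000):
    """N-dimensional update: X^(k+1) = X^(k) - step * grad_f(X^(k))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:   # gradient ~ 0: local optimum reached
            break
        x = x - step * g
    return x

# Example: f(X) = ||X - c||^2 with gradient 2*(X - c); minimum at X = c
c = np.array([1.0, -2.0, 3.0])
x_opt = gradient_descent(lambda x: 2.0 * (x - c), x0=np.zeros(3))
```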
Slide 11
Newton Method
Gradient method relies on first-order derivative information
- Each iteration is simple, but it converges to the optimum slowly, i.e., it requires a large number of iteration steps
- The step size λ^(k) can be optimized by one-dimensional search for fast convergence

Alternative algorithm: Newton method
- Relies on both first-order and second-order derivatives
- Each iteration is more expensive, but it converges to the optimum more quickly, i.e., requires a smaller number of iterations to reach convergence

The one-dimensional search for the step size solves
$$\min_{\lambda^{(k)}}\ f\left\{ X^{(k)} - \lambda^{(k)} \cdot \nabla f\left[ X^{(k)} \right] \right\}$$
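A minimal sketch of this line search built on SciPy's scalar minimizer (the bounded search interval and the badly scaled quadratic test function are illustrative assumptions):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def line_search_step(f, grad_f, x):
    """Choose lambda minimizing f(x - lambda * grad_f(x)) along the gradient ray."""
    g = grad_f(x)
    res = minimize_scalar(lambda lam: f(x - lam * g),
                          bounds=(0.0, 1.0), method='bounded')
    return x - res.x * g

# Example: badly scaled quadratic f(X) = x1^2 + 10 * x2^2
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
x = np.array([1.0, 1.0])
for _ in range(20):
    x = line_search_step(f, grad_f, x)   # x approaches the minimum at the origin
```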
Slide 12
Newton Method
One-dimensional case
$$\min_{x}\ f(x)$$
First-order Taylor expansion of the derivative:
$$\left. \frac{df}{dx} \right|_{x^{(k+1)}} \approx \left. \frac{df}{dx} \right|_{x^{(k)}} + \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]$$
The first-order derivative is zero at a local optimum:
$$0 = \left. \frac{df}{dx} \right|_{x^{(k)}} + \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]$$
Slide 13
Newton Method
One-dimensional case (continued)
$$\left. \frac{df}{dx} \right|_{x^{(k)}} + \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right] = 0$$
Solving for the step:
$$\Delta x^{(k)} = x^{(k+1)} - x^{(k)} = -\left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
$$x^{(k+1)} = x^{(k)} - \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
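A minimal Python sketch of this one-dimensional Newton iteration (the test function and tolerance are illustrative assumptions):

```python
def newton_1d(df_dx, d2f_dx2, x0, tol=1e-10, max_iter=100):
    """Iterate x^(k+1) = x^(k) - [f''(x^(k))]^-1 * f'(x^(k))."""
    x = x0
    for _ in range(max_iter):
        g = df_dx(x)
        if abs(g) < tol:           # first-order optimality reached
            break
        x = x - g / d2f_dx2(x)     # Newton step
    return x

# Example: f(x) = x^4 - 3x with f'(x) = 4x^3 - 3 and f''(x) = 12x^2
x_opt = newton_1d(lambda x: 4 * x**3 - 3, lambda x: 12 * x**2, x0=1.0)
```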
Slide 14
Newton Method
One-dimensional case (continued)
$$\Delta x^{(k)} = -\left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
First-order Taylor expansion:
$$f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] + \left. \frac{df}{dx} \right|_{x^{(k)}} \cdot \Delta x^{(k)}$$
$$f\left[ x^{(k+1)} \right] \approx f\left[ x^{(k)} \right] - \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left( \left. \frac{df}{dx} \right|_{x^{(k)}} \right)^2$$
The second-order derivative is positive for a convex function, so the cost function keeps decreasing.
Slide 15
Newton Method
One-dimensional case (continued)
Newton method gives an estimate of the optimal step size λ^(k) using the second-order derivative

Gradient method:
$$x^{(k+1)} = x^{(k)} - \lambda^{(k)} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
Newton method:
$$x^{(k+1)} = x^{(k)} - \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}}$$
Comparing the two, the implied step size is
$$\lambda^{(k)} = \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} > 0 \qquad \text{(convex function)}$$
Slide 16
Newton Method
One-dimensional case (continued)
The step size estimate is based on the following linear approximation of the first-order derivative:
$$\left. \frac{df}{dx} \right|_{x^{(k+1)}} \approx \left. \frac{df}{dx} \right|_{x^{(k)}} + \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \cdot \left[ x^{(k+1)} - x^{(k)} \right]$$
The approximation is exact if and only if f(x) is quadratic and, therefore, df/dx is linear:
$$f(x) = ax^2 + bx + c \quad \Rightarrow \quad \frac{df}{dx} = 2ax + b$$
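As a quick check (not on the original slides), one Newton step applied to this quadratic lands exactly on the stationary point, regardless of the starting point:
$$x^{(k+1)} = x^{(k)} - \frac{2ax^{(k)} + b}{2a} = -\frac{b}{2a}, \qquad \left. \frac{df}{dx} \right|_{x = -b/2a} = 0$$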
Slide 17
Newton Method
One-dimensional case (continued)
If the actual f(x) is not quadratic, the following step size estimate may be non-optimal:
$$\lambda^{(k)} = \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1}$$
Using this step size may result in bad convergence. In these cases, a damping factor β is typically introduced:
$$x^{(k+1)} = x^{(k)} - \beta \cdot \left[ \left. \frac{d^2 f}{dx^2} \right|_{x^{(k)}} \right]^{-1} \cdot \left. \frac{df}{dx} \right|_{x^{(k)}} \qquad (0 < \beta < 1)$$
Slide 18
Newton Method
Two-dimensional case
$$\min_{x_1, x_2}\ f(x_1, x_2)$$
First-order Taylor expansion of the gradient:
$$\nabla f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \cdot \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix}$$
where
$$\nabla f = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} \qquad \text{and} \qquad \nabla^2 f = \begin{bmatrix} \partial^2 f / \partial x_1^2 & \partial^2 f / \partial x_1 \partial x_2 \\ \partial^2 f / \partial x_2 \partial x_1 & \partial^2 f / \partial x_2^2 \end{bmatrix} \quad \text{(Hessian matrix)}$$
Slide 19
Newton Method
Two-dimensional case (continued)
Setting the gradient at the new point to zero:
$$\nabla f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right] \cdot \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = 0$$
$$\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = \begin{bmatrix} x_1^{(k+1)} - x_1^{(k)} \\ x_2^{(k+1)} - x_2^{(k)} \end{bmatrix} = -\nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \end{bmatrix} = \begin{bmatrix} x_1^{(k)} \\ x_2^{(k)} \end{bmatrix} - \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
Slide 20
Newton Method
Two-dimensional case (continued)
$$\begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix} = -\nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
First-order Taylor expansion:
$$f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] + \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \begin{bmatrix} \Delta x_1^{(k)} \\ \Delta x_2^{(k)} \end{bmatrix}$$
$$f\left[ x_1^{(k+1)}, x_2^{(k+1)} \right] \approx f\left[ x_1^{(k)}, x_2^{(k)} \right] - \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]^T \cdot \nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
The Hessian (and hence its inverse) is positive definite for a convex function, so the cost function keeps decreasing.
Slide 21
Newton Method
Two-dimensional case (continued)
Gradient method and Newton method do not move along the same direction:
Gradient method:
$$\Delta X^{(k)} = -\lambda^{(k)} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
Newton method:
$$\Delta X^{(k)} = -\nabla^2 f\left[ x_1^{(k)}, x_2^{(k)} \right]^{-1} \cdot \nabla f\left[ x_1^{(k)}, x_2^{(k)} \right]$$
[Figure: contours of the quadratic $f(X) = X^T A X + B^T X + C$ in the (x1, x2) plane, with the gradient and Newton directions diverging from the same point]
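A small NumPy sketch comparing the two directions on such a quadratic; the particular A and B below are illustrative assumptions (for symmetric A, the gradient is 2AX + B and the Hessian is 2A):

```python
import numpy as np

A = np.array([[3.0, 0.0], [0.0, 0.5]])    # symmetric positive definite
B = np.array([1.0, -1.0])

x = np.array([1.0, 1.0])
grad = 2.0 * A @ x + B                     # gradient of X^T A X + B^T X + C
hess = 2.0 * A                             # Hessian (constant for a quadratic)

d_gradient = -grad                         # gradient method direction
d_newton = -np.linalg.solve(hess, grad)    # Newton direction

# For a quadratic, one full Newton step lands exactly on the optimum:
x_opt = x + d_newton
assert np.allclose(2.0 * A @ x_opt + B, 0.0)
```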
Slide 22
Newton Method
Newton method can be extended to high-dimensional cases
The following Hessian matrix is N×N if we have N variables
Numerically computing the Hessian matrix and its inverse (by Cholesky decomposition) can be quite expensive for large N
$$\nabla^2 f(X) = \begin{bmatrix} \partial^2 f / \partial x_1^2 & \partial^2 f / \partial x_1 \partial x_2 & \cdots & \partial^2 f / \partial x_1 \partial x_N \\ \partial^2 f / \partial x_2 \partial x_1 & \partial^2 f / \partial x_2^2 & \cdots & \partial^2 f / \partial x_2 \partial x_N \\ \vdots & \vdots & \ddots & \vdots \\ \partial^2 f / \partial x_N \partial x_1 & \partial^2 f / \partial x_N \partial x_2 & \cdots & \partial^2 f / \partial x_N^2 \end{bmatrix}$$
$$X^{(k+1)} = X^{(k)} - \nabla^2 f\left[ X^{(k)} \right]^{-1} \cdot \nabla f\left[ X^{(k)} \right]$$
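A minimal N-dimensional sketch, solving the Newton system with SciPy's Cholesky routines in line with the slide's remark (the convex quadratic test problem is an illustrative assumption):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def newton_method(grad_f, hess_f, x0, tol=1e-10, max_iter=50):
    """X^(k+1) = X^(k) - [Hessian]^-1 * gradient, via Cholesky factorization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        c, low = cho_factor(hess_f(x))    # assumes a positive-definite Hessian
        x = x - cho_solve((c, low), g)    # Newton step without forming the inverse
    return x

# Example: convex quadratic f(X) = 0.5 * X^T Q X - b^T X; minimum at Q^-1 b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_opt = newton_method(lambda x: Q @ x - b, lambda x: Q, x0=np.zeros(2))
```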
Slide 23
Newton Method
A number of modified algorithms have been developed to address this complexity issue, e.g., the quasi-Newton method
The key idea is to approximate the Hessian matrix and its inverse so that:
- The computational cost is significantly reduced
- Fast convergence can still be achieved
More details can be found in Numerical Recipes: The Art of Scientific Computing, 2007
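For instance, SciPy exposes a BFGS quasi-Newton optimizer; a minimal usage sketch (the test function is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative test function: f(X) = (x1 - 1)^2 + 10 * (x2 + 2)^2
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

# BFGS builds an inverse-Hessian approximation from successive gradients,
# avoiding the O(N^3) cost of factoring the exact Hessian at every step
result = minimize(f, x0=np.zeros(2), jac=grad_f, method='BFGS')
print(result.x)   # close to [1, -2]
```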
Slide 24
Summary
Unconstrained Optimization
- Gradient method
- Newton method