Quasi-Newton methods: Symmetric rank 1 (SR1),
Broyden-Fletcher-Goldfarb-Shanno (BFGS), Limited memory BFGS (L-BFGS)
February 6, 2014
-
Roadmap
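Line-search:
  CG: $p_k = -\nabla f_k + \beta_k p_{k-1}$
  Newton: $p_k = -[\nabla^2 f_k]^{-1} \nabla f_k$
  q-Newton: $p_k = -B_k^{-1} \nabla f_k$
Trust-region:
  Newton: $m_k(p) = f_k + \nabla f_k^T p + 0.5\, p^T \nabla^2 f_k\, p$
  q-Newton: $m_k(p) = f_k + \nabla f_k^T p + 0.5\, p^T B_k\, p$
Today:
How to select $B_k \approx \nabla^2 f_k$ for fast convergence of algorithms?
Tomorrow:
How to cheaply approximately solve Newton's subproblem & preserve fast convergence?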
-
Idea
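Taylor:
$\nabla f(x_{k+1}) - \nabla f(x_k) = \underbrace{\nabla^2 f(x_k + t(x_{k+1} - x_k))}_{\approx B_{k+1}} [x_{k+1} - x_k]$
Secant equation:
$s_k := x_{k+1} - x_k$
$y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$
$B_{k+1} s_k = y_k$
$n$ equations in $(n + 1)n/2$ unknowns!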
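As a small illustration (not from the slides), the sketch below checks the secant equation on a quadratic $f(x) = 0.5\, x^T A x - b^T x$, for which the true Hessian $A$ satisfies $A s_k = y_k$ exactly. The matrix `A`, the vector `b` and the two points are arbitrary assumptions chosen for the example.

```python
import numpy as np

# Hypothetical test problem: f(x) = 0.5 x^T A x - b^T x, so grad f(x) = A x - b.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

def grad(x):
    return A @ x - b

x_k = np.array([0.0, 0.0, 0.0])
x_next = np.array([1.0, -1.0, 2.0])

s_k = x_next - x_k               # step between iterates
y_k = grad(x_next) - grad(x_k)   # change in the gradient

# On a quadratic the Hessian itself satisfies the secant equation.
secant_residual = np.linalg.norm(A @ s_k - y_k)

n = A.shape[0]
num_equations = n                 # the secant equation gives n conditions
num_unknowns = n * (n + 1) // 2   # free entries of a symmetric n x n matrix
```

For $n = 3$ this gives 3 equations for 6 unknowns, which is why the secant equation alone does not determine $B_{k+1}$.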
-
SR1 method
Ansatz:
$B_{k+1} = B_k + \sigma v v^T, \quad \sigma = \pm 1$
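A short worked derivation may help to see how the ansatz leads to the SR1 formula. It is a sketch, assuming $y_k \neq B_k s_k$; the scalar $\delta$ is a helper symbol introduced here and is not on the slides.

```latex
\begin{align*}
(B_k + \sigma v v^T) s_k = y_k
  &\implies \sigma (v^T s_k)\, v = y_k - B_k s_k
   \quad\text{so } v \text{ is parallel to } y_k - B_k s_k, \\
v = \delta (y_k - B_k s_k)
  &\implies \sigma \delta^2 \,(y_k - B_k s_k)^T s_k = 1, \\
  &\implies \sigma = \operatorname{sign}\big[(y_k - B_k s_k)^T s_k\big],
   \quad \delta^2 = \big|(y_k - B_k s_k)^T s_k\big|^{-1}, \\
\sigma v v^T
  &= \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}.
\end{align*}
```

The second line has no solution when $(y_k - B_k s_k)^T s_k = 0$, which is the case where no SR1 update exists.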
-
SR1 method
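$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}$, when $(y_k - B_k s_k)^T s_k \neq 0$
Sherman-Morrison:
$\bar{A} = A + ab^T \implies \bar{A}^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a}$,
provided $A$, $\bar{A}$ are non-singular.
$H_{k+1} = B_{k+1}^{-1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k}$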
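The two SR1 formulas can be checked against each other numerically. The following is a minimal sketch, assuming $B_0 = H_0 = I$ and one hand-picked pair $(s, y)$ with nonzero denominators; the function names are hypothetical.

```python
import numpy as np

def sr1_update_B(B, s, y):
    """SR1 update of the Hessian approximation B (denominator assumed nonzero)."""
    v = y - B @ s
    return B + np.outer(v, v) / (v @ s)

def sr1_update_H(H, s, y):
    """SR1 update of the inverse approximation H (denominator assumed nonzero)."""
    w = s - H @ y
    return H + np.outer(w, w) / (w @ y)

# Illustrative data (assumption): identity start and a single (s, y) pair.
B0 = np.eye(3)
H0 = np.eye(3)
s = np.array([1.0, 0.0, 0.0])
y = np.array([2.0, 1.0, 0.0])

B1 = sr1_update_B(B0, s, y)
H1 = sr1_update_H(H0, s, y)

inverse_error = np.linalg.norm(B1 @ H1 - np.eye(3))  # H1 should equal B1^{-1}
secant_error = np.linalg.norm(B1 @ s - y)            # B1 s = y
```

Both errors vanish up to rounding, consistent with the Sherman-Morrison formula.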
-
SR1: cases
1. If $(y_k - B_k s_k)^T s_k \neq 0$ $\implies$ use SR1 update formula
2. If $y_k = B_k s_k$ $\implies$ $B_{k+1} = B_k$
3. If $y_k \neq B_k s_k$ and $(y_k - B_k s_k)^T s_k = 0$ $\implies$ no SR1 update! Skip the update in this case ($B_{k+1} = B_k$)
Note:
$B_k$ does not have to be positive definite $\implies$ good Hessian approximation for non-convex problems
Better suited for the trust-region (TR) than the line-search (LS) approach
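The three cases can be folded into one safeguarded update. The sketch below is one possible implementation, not the lecturer's code: it skips the update whenever the denominator is small relative to $\|y_k - B_k s_k\| \|s_k\|$, and the threshold `r = 1e-8` is an assumed default.

```python
import numpy as np

def sr1_update_safeguarded(B, s, y, r=1e-8):
    """Return (B_next, updated); skip when the SR1 denominator is too small."""
    v = y - B @ s
    denom = v @ s
    # "<=" also covers v = 0 (case 2), where both sides are exactly zero.
    if abs(denom) <= r * np.linalg.norm(v) * np.linalg.norm(s):
        return B, False
    return B + np.outer(v, v) / denom, True

B = np.eye(3)
s = np.array([1.0, 0.0, 0.0])

# Case 1: nonzero denominator, update applied (and B becomes indefinite here).
B_case1, used1 = sr1_update_safeguarded(B, s, np.array([-1.0, 0.0, 0.0]))
# Case 2: y = B s, nothing to correct.
B_case2, used2 = sr1_update_safeguarded(B, s, B @ s)
# Case 3: y != B s but (y - B s)^T s = 0, update skipped.
B_case3, used3 = sr1_update_safeguarded(B, s, np.array([1.0, 1.0, 0.0]))

min_eig_case1 = np.linalg.eigvalsh(B_case1).min()
```

In the first call the updated matrix is diag(-1, 1, 1), which illustrates that SR1 does not force positive definiteness.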
-
SR1: convergence
Theorem 6.1, N&W
Exact in n steps on convex quadratic functions (if updates never skipped)
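One ingredient of this result can be seen numerically: on a quadratic, after $n$ SR1 updates along linearly independent steps (none skipped), $B_n$ equals the Hessian. The sketch below uses unit vectors as steps instead of actual quasi-Newton steps, which is a simplifying assumption, and an arbitrary example matrix `A`.

```python
import numpy as np

# Hessian of a convex quadratic (assumed example, symmetric positive definite).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
n = A.shape[0]
B = np.eye(n)                      # B_0 = I

for k in range(n):
    s = np.eye(n)[k]               # illustrative step: k-th unit vector
    y = A @ s                      # gradient change on a quadratic is A s
    v = y - B @ s
    denom = v @ s
    assert denom != 0.0            # the update must not be skipped
    B = B + np.outer(v, v) / denom

hessian_error = np.linalg.norm(B - A)
```

Once $B_n = A$, the quasi-Newton step coincides with the Newton step, which is exact on a quadratic.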
-
SR1: convergence
Theorem 6.2, N&W
$x_k \to x^*$
$\nabla^2 f$ is bounded & Lipschitz
$|(y_k - B_k s_k)^T s_k| \geq r \|y_k - B_k s_k\| \|s_k\|$ (updates never skipped)
$\{s_k\}$ uniformly linearly independent
Then
$\lim_{k \to \infty} \|B_k - \nabla^2 f(x^*)\| = 0$
-
SR1: convergence
Theorem 6.7, N&W
Under restrictive assumptions (unique stationary point in the vicinity of the algorithmic iterations; updates never skipped; $B_k$ bounded; etc.) $\implies$ convergence with n-step superlinear rate.
Good convergence in practice.
-
Rank-two update algorithms
Idea:
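$\text{minimize}_{B \in \mathbb{R}^{n \times n}} \ \|B - B_k\|$,
subject to $B s_k = y_k$,
$B = B^T$.
or
$\text{minimize}_{H \in \mathbb{R}^{n \times n}} \ \|H - H_k\|$,
subject to $H y_k = s_k$,
$H = H^T$.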
-
Rank-two update algorithms
Idea:
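$\text{minimize}_{B \in \mathbb{R}^{n \times n}} \ \|B - B_k\|$,
subject to $B s_k = y_k$,
$B = B^T$.
Take scaled Frobenius matrix norm:
$\|B - B_k\|^2 = \mathrm{Tr}[W (B - B_k) W (B - B_k)^T]$,
$W = W^T$, $W \succ 0$, $W y_k = s_k$.
Result: Davidon-Fletcher-Powell (DFP) algorithm
$B_{k+1} = (I - \rho_k y_k s_k^T) B_k (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T$,
$\rho_k = (y_k^T s_k)^{-1}$.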
-
Davidon-Fletcher-Powell algorithm:
$B_{k+1} = (I - \rho_k y_k s_k^T) B_k (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T$,
$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}$,
$\rho_k = (y_k^T s_k)^{-1}$.
Historically: first quasi-Newton algorithm. Sometimes gets stuck.
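The two DFP formulas describe the same update, once for $B_k$ and once for its inverse $H_k$. A minimal sketch that checks this on made-up data follows; the function names, the diagonal $B_0$ and the pair $(s, y)$ with $y^T s > 0$ are assumptions.

```python
import numpy as np

def dfp_update_B(B, s, y):
    """DFP update of the Hessian approximation B."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return ((I - rho * np.outer(y, s)) @ B @ (I - rho * np.outer(s, y))
            + rho * np.outer(y, y))

def dfp_update_H(H, s, y):
    """DFP update of the inverse approximation H (H assumed symmetric)."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

# Illustrative data (assumption): B_0 diagonal, y^T s = 3.5 > 0.
B0 = np.diag([1.0, 2.0, 3.0])
H0 = np.linalg.inv(B0)
s = np.array([1.0, 0.5, -1.0])
y = np.array([2.0, 1.0, -1.0])

B1 = dfp_update_B(B0, s, y)
H1 = dfp_update_H(H0, s, y)

dfp_inverse_error = np.linalg.norm(B1 @ H1 - np.eye(3))
dfp_secant_error_B = np.linalg.norm(B1 @ s - y)   # B_{k+1} s_k = y_k
dfp_secant_error_H = np.linalg.norm(H1 @ y - s)   # H_{k+1} y_k = s_k
```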
-
Rank-two update algorithms
Idea:
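$\text{minimize}_{H \in \mathbb{R}^{n \times n}} \ \|H - H_k\|$,
subject to $H y_k = s_k$,
$H = H^T$.
Take scaled Frobenius matrix norm:
$\|H - H_k\|^2 = \mathrm{Tr}[W (H - H_k) W (H - H_k)^T]$,
$W = W^T$, $W \succ 0$, $W s_k = y_k$.
Result: Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm
$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$,
$\rho_k = (y_k^T s_k)^{-1}$.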
-
Broyden-Fletcher-Goldfarb-Shanno algorithm:
$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T$,
$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}$,
$\rho_k = (y_k^T s_k)^{-1}$.
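As with DFP, the BFGS formulas for $H_{k+1}$ and $B_{k+1}$ are inverses of each other. The sketch below checks this numerically; the helper names and the test data are assumptions made for illustration.

```python
import numpy as np

def bfgs_update_H(H, s, y):
    """BFGS update of the inverse Hessian approximation H."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
            + rho * np.outer(s, s))

def bfgs_update_B(B, s, y):
    """BFGS update of the Hessian approximation B (B assumed symmetric)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Illustrative data (assumption): B_0 diagonal, y^T s = 3.5 > 0.
B0 = np.diag([1.0, 2.0, 3.0])
H0 = np.linalg.inv(B0)
s = np.array([1.0, 0.5, -1.0])
y = np.array([2.0, 1.0, -1.0])

B1 = bfgs_update_B(B0, s, y)
H1 = bfgs_update_H(H0, s, y)

bfgs_inverse_error = np.linalg.norm(B1 @ H1 - np.eye(3))
bfgs_secant_error = np.linalg.norm(H1 @ y - s)    # H_{k+1} y_k = s_k
```

Comparing with the DFP formulas, BFGS is obtained by swapping the roles of $(B, s, y)$ and $(H, y, s)$.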
-
Linesearch framework:
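Wolfe curvature condition:
$\nabla f_{k+1}^T p_k \geq c_2 \nabla f_k^T p_k$
$\implies \underbrace{[\nabla f_{k+1} - \nabla f_k]^T}_{= y_k^T} \underbrace{p_k}_{= \alpha_k^{-1} s_k} \geq (c_2 - 1) \nabla f_k^T p_k > 0$,
since $c_2 < 1$ and $p_k$ is a descent direction ($\nabla f_k^T p_k < 0$).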
-
BFGS/DFP+Wolfe linesearch:
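Suppose $H_k \succ 0$.
Wolfe curvature condition $\implies$ $y_k^T s_k > 0$.
$\forall z \in \mathbb{R}^n$, $z \neq 0$:
$z^T H_{k+1} z = [(I - \rho_k y_k s_k^T) z]^T H_k [(I - \rho_k y_k s_k^T) z] + [s_k^T z]^2/(y_k^T s_k) \geq 0$.
If $s_k^T z \neq 0$ $\implies$ $z^T H_{k+1} z \geq [s_k^T z]^2/(y_k^T s_k) > 0$.
If $s_k^T z = 0$ $\implies$ $z^T H_{k+1} z = z^T H_k z > 0$.
Therefore $H_{k+1} \succ 0$! (Same for DFP)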
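The role of the curvature condition can be illustrated with two hand-picked pairs: one with $y^T s > 0$ and one with $y^T s < 0$. This is a sketch on assumed 2-by-2 data, not a proof. In the second case $H_{k+1} y_k = s_k$ still holds, so $y_k^T H_{k+1} y_k = y_k^T s_k < 0$ and positive definiteness must be lost.

```python
import numpy as np

def bfgs_update_H(H, s, y):
    """BFGS update of the inverse Hessian approximation H."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
            + rho * np.outer(s, s))

H = np.eye(2)                      # H_k positive definite
s = np.array([1.0, 0.0])

y_good = np.array([0.5, 1.0])      # y^T s = 0.5 > 0 (curvature condition holds)
y_bad = np.array([-0.5, 1.0])      # y^T s = -0.5 < 0 (curvature condition violated)

# eigvalsh is appropriate because the updated matrices are symmetric.
min_eig_good = np.linalg.eigvalsh(bfgs_update_H(H, s, y_good)).min()
min_eig_bad = np.linalg.eigvalsh(bfgs_update_H(H, s, y_bad)).min()
```

Here `min_eig_good` is positive and `min_eig_bad` is negative.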
-
BFGS/DFP+Wolfe linesearch:
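$B_0 \succ 0$ ($H_0 \succ 0$) + Wolfe linesearch
$\implies$ $B_k \succ 0$ ($H_k \succ 0$), $\forall k$
$\implies$ $p_k = -B_k^{-1} \nabla f_k = -H_k \nabla f_k$ is a descent direction ($\nabla f_k \neq 0$).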
-
BFGS: global convergence
Theorem 6.4, N&W
Exact in n steps on convex quadratic functions (with exact line search); if $B_0 = I$ $\implies$ same iterates as CG.
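The claim can be tried out on a small quadratic. The sketch below runs BFGS with $H_0 = I$ and an exact line search next to linear CG on the same problem; the matrix `A`, the vector `b`, the zero starting point and the function names are assumptions chosen for the example.

```python
import numpy as np

# Assumed test problem: minimize 0.5 x^T A x - b^T x with A positive definite.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 0.0, 0.0])
n = len(b)

def bfgs_exact_linesearch(x0):
    x, H, I = x0.copy(), np.eye(n), np.eye(n)
    g = A @ x - b
    iterates = [x.copy()]
    for _ in range(n):
        if np.linalg.norm(g) < 1e-12:
            break
        p = -H @ g
        alpha = -(g @ p) / (p @ A @ p)        # exact minimizer along p
        s = alpha * p
        x = x + s
        g_new = A @ x - b
        y = g_new - g
        rho = 1.0 / (y @ s)
        H = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
             + rho * np.outer(s, s))
        g = g_new
        iterates.append(x.copy())
    return iterates

def conjugate_gradient(x0):
    x = x0.copy()
    g = A @ x - b
    p = -g
    iterates = [x.copy()]
    for _ in range(n):
        if np.linalg.norm(g) < 1e-12:
            break
        alpha = (g @ g) / (p @ A @ p)
        x = x + alpha * p
        g_new = g + alpha * (A @ p)
        p = -g_new + ((g_new @ g_new) / (g @ g)) * p
        g = g_new
        iterates.append(x.copy())
    return iterates

bfgs_iterates = bfgs_exact_linesearch(np.zeros(n))
cg_iterates = conjugate_gradient(np.zeros(n))
max_gap = max(np.linalg.norm(u - v) for u, v in zip(bfgs_iterates, cg_iterates))
final_grad_norm = np.linalg.norm(A @ bfgs_iterates[-1] - b)
```

Both methods reach the minimizer within $n = 3$ steps, and `max_gap` measures how far the two iterate sequences differ (only rounding error is expected).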
-
BFGS: global convergence
Theorem 6.5, N&W
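1. $B_0 \succ 0$, $B_0 = B_0^T$
2. $f \in C^2$
3. $\exists m, M > 0$: $m\|z\|^2 \leq z^T \nabla^2 f(x) z \leq M\|z\|^2$
Then $x_k \to x^*$: $\nabla f(x^*) = 0$ &
$\sum_{k=1}^{\infty} \|x_k - x^*\| < \infty$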
-
BFGS: local convergence
Theorem 6.6, N&W
1. $f \in C^2$
2. $\nabla^2 f$ - Lipschitz
3. $\sum_{k=1}^{\infty} \|x_k - x^*\| < \infty$
Then $x_k \to x^*$ at a superlinear rate.
-
Presented quasi-Newton algorithms:
Good for small to medium size optimization problems with dense (or costly to compute) Hessian matrices
Factorization/system solve cost for $[\nabla^2 f_k]^{-1} \nabla f_k$: $O(n^3)$
Update + multiplication with $H_k$: $O(n^2)$
-
Limited-memory quasi-Newton algorithms:
Idea
Storage of $B_k$ (or $H_k$): $O(n^2)$
Instead: store only the $m$ most recent pairs $(s_i, y_i)$, which needs $O(mn)$ storage
-
Example: L-BFGS
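$H_{k+1} q = [I - \rho_k y_k s_k^T]^T H_k [I - \rho_k y_k s_k^T] q + \rho_k s_k s_k^T q$
$= [I - \rho_k s_k y_k^T] H_k [q - \{\rho_k s_k^T q\} y_k] + \{\rho_k s_k^T q\} s_k$.
Algorithm (only one pair is stored)
$\alpha_k := \rho_k s_k^T q$
$q := q - \alpha_k y_k$
$r := H_k q$
$\beta_k := \rho_k y_k^T r$
$r := r + (\alpha_k - \beta_k) s_k$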
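The five steps above can be sketched in a few lines and compared with the explicit matrix formula. The function name and the test data below are assumptions; in the limited-memory setting $H_k$ would not be stored as a dense matrix, it is dense here only to allow the comparison.

```python
import numpy as np

def apply_one_pair(H, s, y, q):
    """Compute H_{k+1} q from H_k and the single stored pair (s_k, y_k)."""
    rho = 1.0 / (y @ s)
    alpha = rho * (s @ q)
    q = q - alpha * y            # rebinding, the caller's q is not modified
    r = H @ q
    beta = rho * (y @ r)
    return r + (alpha - beta) * s

# Illustrative data (assumption), with y^T s = 3.5 > 0.
H = np.diag([1.0, 0.5, 2.0])
s = np.array([1.0, 0.5, -1.0])
y = np.array([2.0, 1.0, -1.0])
q = np.array([0.3, -1.0, 2.0])

# Reference: form H_{k+1} explicitly with the BFGS formula.
rho = 1.0 / (y @ s)
I = np.eye(3)
H_next = ((I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s))
          + rho * np.outer(s, s))

one_pair_error = np.linalg.norm(apply_one_pair(H, s, y, q) - H_next @ q)
```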
-
Example: L-BFGS
$q := \nabla f_{k+1}$
for $i = k, k-1, \ldots, k-m+1$ do
    $\alpha_i := \rho_i s_i^T q$
    $q := q - \alpha_i y_i$
end for
$r := H_k^0 q$
for $i = k-m+1, \ldots, k-1, k$ do
    $\beta_i := \rho_i y_i^T r$
    $r := r + (\alpha_i - \beta_i) s_i$
end for
Result: $r \approx H_{k+1} \nabla f_{k+1}$
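A possible NumPy version of the two-loop recursion is sketched below and checked against dense BFGS updates applied to the same pairs. It assumes the initial matrix is a scaled identity, $H_k^0 = \gamma I$, and that the pairs are stored oldest first; the function name, `gamma` and the random test data are hypothetical.

```python
import numpy as np

def two_loop(grad, s_list, y_list, gamma=1.0):
    """Return r = H grad using only the stored pairs (oldest first)."""
    m = len(s_list)
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = [0.0] * m
    q = grad.copy()
    for i in reversed(range(m)):          # newest pair to oldest pair
        alphas[i] = rhos[i] * (s_list[i] @ q)
        q = q - alphas[i] * y_list[i]
    r = gamma * q                         # r := H^0 q with H^0 = gamma * I
    for i in range(m):                    # oldest pair to newest pair
        beta = rhos[i] * (y_list[i] @ r)
        r = r + (alphas[i] - beta) * s_list[i]
    return r

# Assumed test data: pairs from a quadratic, so y = A s and y^T s > 0.
rng = np.random.default_rng(0)
n, m = 5, 3
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
s_list = [rng.standard_normal(n) for _ in range(m)]
y_list = [A @ s for s in s_list]
grad = rng.standard_normal(n)

# Reference: apply the dense BFGS formula pair by pair, starting from H^0 = I.
H = np.eye(n)
I = np.eye(n)
for s, y in zip(s_list, y_list):
    rho = 1.0 / (y @ s)
    H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

two_loop_error = np.linalg.norm(two_loop(grad, s_list, y_list) - H @ grad)
```

The search direction would then be the negative of the returned vector. The recursion touches each stored vector a constant number of times, which matches the $O(mn)$ cost per step.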
-
Storage and step complexity revisited:
Method                 Storage          Step complexity   Convergence rate
CG                     O(n)             O(n)              linear
BFGS/DFP/SR1           O(n^2)           O(n^2)            superlinear
L-BFGS                 O(mn)            O(mn)             ???
Newton (full Hess)     O(n^2)           O(n^3)            quadratic
Newton (sparse Hess)   O(n) to O(n^2)   O(n) to O(n^3)    quadratic
Note: step complexity does not account for complexities in evaluating function/derivatives!!!