Quasi-Newton methods: Symmetric rank 1 (SR1), Broyden–Fletcher–Goldfarb–Shanno (BFGS), Limited memory BFGS (L-BFGS)

February 6, 2014
• Author

buidieu


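| Method | Line-search | Trust-region |
|---|---|---|
| CG | $p_k = -\nabla f_k + \beta_k p_{k-1}$ | |
| Newton | $p_k = -[\nabla^2 f_k]^{-1} \nabla f_k$ | $m_k(p) = f_k + \nabla f_k^T p + 0.5\, p^T \nabla^2 f_k\, p$ |
| q-Newton | $p_k = -B_k^{-1} \nabla f_k$ | $m_k(p) = f_k + \nabla f_k^T p + 0.5\, p^T B_k\, p$ |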

Today:

How to select $B_k \approx \nabla^2 f_k$ for fast convergence of algorithms?

Tomorrow:

How to cheaply approximately solve Newton's subproblem & preserve fast convergence?


• Idea

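Taylor:

$$\nabla f(x_{k+1}) - \nabla f(x_k) = \underbrace{\nabla^2 f(x_k + t(x_{k+1} - x_k))}_{\approx B_{k+1}} [x_{k+1} - x_k]$$

Secant equation

$$s_k := x_{k+1} - x_k, \qquad y_k := \nabla f(x_{k+1}) - \nabla f(x_k)$$

$$B_{k+1} s_k = y_k$$

$n$ equations in $(n + 1)n/2$ unknowns!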
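
To make the counting concrete, here is a worked case for $n = 2$. The entry names $b_{11}, b_{12}, b_{22}$ and the component names $s_1, s_2, y_1, y_2$ are illustrative and do not appear on the slides.

$$
\begin{pmatrix} b_{11} & b_{12} \\ b_{12} & b_{22} \end{pmatrix}
\begin{pmatrix} s_1 \\ s_2 \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \end{pmatrix}
\iff
\begin{cases}
b_{11} s_1 + b_{12} s_2 = y_1 \\
b_{12} s_1 + b_{22} s_2 = y_2
\end{cases}
$$

A symmetric $2 \times 2$ matrix has $(2+1) \cdot 2 / 2 = 3$ free entries but the secant equation supplies only $2$ conditions, so it cannot determine the matrix on its own. Each quasi-Newton method adds extra requirements (a low-rank ansatz, or closeness to the previous matrix) to pick one solution.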


• SR1 method

Ansatz:

$$B_{k+1} = B_k + \sigma v v^T, \qquad \sigma = \pm 1$$


• SR1 method

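$$B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^T}{(y_k - B_k s_k)^T s_k}, \quad \text{when } (y_k - B_k s_k)^T s_k \neq 0$$

Sherman–Morrison

$$\hat{A} = A + a b^T \implies \hat{A}^{-1} = A^{-1} - \frac{A^{-1} a b^T A^{-1}}{1 + b^T A^{-1} a},$$

provided $A$, $\hat{A}$ are non-singular.

$$H_{k+1} = B_{k+1}^{-1} = H_k + \frac{(s_k - H_k y_k)(s_k - H_k y_k)^T}{(s_k - H_k y_k)^T y_k}$$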
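
A short derivation, offered as a sketch, of how the Sherman–Morrison formula turns the SR1 update of $B_k$ into the update of $H_k = B_k^{-1}$. The shorthand $w$, $a$, $b$, $u$ is introduced only for this derivation, and $H_k$ is assumed symmetric.

$$
\begin{aligned}
w &:= y_k - B_k s_k, \qquad a := \frac{w}{w^T s_k}, \qquad b := w, \qquad u := s_k - H_k y_k = -H_k w, \\
1 + b^T H_k a &= \frac{w^T s_k + w^T H_k w}{w^T s_k} = \frac{w^T H_k y_k}{w^T s_k} = \frac{-u^T y_k}{w^T s_k}, \\
H_{k+1} &= H_k - \frac{H_k a\, b^T H_k}{1 + b^T H_k a} = H_k - \frac{u u^T}{-u^T y_k} = H_k + \frac{u u^T}{u^T y_k}.
\end{aligned}
$$

The middle line uses $H_k B_k = I$, so that $s_k + H_k w = H_k y_k$.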


• SR1: cases

1. If $(y_k - B_k s_k)^T s_k \neq 0$ $\implies$ use the SR1 update formula
2. If $y_k = B_k s_k$ $\implies$ $B_{k+1} = B_k$
3. If $y_k \neq B_k s_k$ and $(y_k - B_k s_k)^T s_k = 0$ $\implies$ no SR1 update! Skip the update in this case ($B_{k+1} = B_k$)

Note:

$B_k$ does not have to be positive definite $\implies$ good Hessian approximation for non-convex problems

Better suited for TR than LS approach
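
The three cases can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecturer's code: the function name `sr1_update`, the helper functions, and the relative tolerance `r` with its default value are all assumptions. In floating point the exact test "denominator equals zero" is replaced by a relative test of the form $|(y_k - B_k s_k)^T s_k| \le r \|y_k - B_k s_k\| \|s_k\|$, which covers cases 2 and 3 at once.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def norm(v):
    return dot(v, v) ** 0.5

def sr1_update(B, s, y, r=1e-8):
    """Return the SR1 update of B, or B itself when the update is skipped.

    B is a symmetric matrix stored as a list of rows; r is an assumed
    relative tolerance for the skipping test.
    """
    w = [yi - bi for yi, bi in zip(y, matvec(B, s))]   # w = y - B s
    denom = dot(w, s)
    if abs(denom) <= r * norm(w) * norm(s):
        return B                                       # cases 2 and 3: skip
    n = len(s)
    return [[B[i][j] + w[i] * w[j] / denom for j in range(n)]
            for i in range(n)]

# Hypothetical check on a quadratic with Hessian A, where y = A s exactly.
A = [[2.0, 1.0], [1.0, 3.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
for s in ([1.0, 0.0], [0.0, 1.0]):
    B = sr1_update(B, s, matvec(A, s))
```

For this particular 2×2 quadratic and these two steps, `B` coincides with `A` after the second update, and a further call with the same data is skipped because `y - B s` is then zero.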


• SR1: convergence

Theorem 6.1, N&W

Exact in n steps on convex quadratic functions (if updates never skipped)


• SR1: convergence

Theorem 6.2, N&W

$x_k \to x^*$

$\nabla^2 f$ is bounded & Lipschitz

$|(y_k - B_k s_k)^T s_k| \ge r \|y_k - B_k s_k\| \|s_k\|$ (updates never skipped)

$\{s_k\}$ uniformly linearly independent

Then

$$\lim_{k \to \infty} \|B_k - \nabla^2 f(x^*)\| = 0$$


• SR1: convergence

Theorem 6.7, N&W

Under restrictive assumptions (unique stationary point in the vicinity of the algorithmic iterations; updates never skipped; $B_k$ bounded; etc.) $\implies$ convergence with $n$-step superlinear rate.

Good convergence in practice.


• Rank-two update algorithms

Idea:

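$$\min_{B \in \mathbb{R}^{n \times n}} \|B - B_k\| \quad \text{subject to } B s_k = y_k, \; B = B^T,$$

or

$$\min_{H \in \mathbb{R}^{n \times n}} \|H - H_k\| \quad \text{subject to } H y_k = s_k, \; H = H^T.$$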


• Rank-two update algorithms

Idea:

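$$\min_{B \in \mathbb{R}^{n \times n}} \|B - B_k\| \quad \text{subject to } B s_k = y_k, \; B = B^T.$$

Take scaled Frobenius matrix norm:

$$\|B - B_k\|^2 = \operatorname{Tr}[W (B - B_k) W (B - B_k)^T], \qquad W = W^T, \; W \succ 0, \; W y_k = s_k.$$

Result: Davidon–Fletcher–Powell (DFP) algorithm

$$B_{k+1} = (I - \rho_k y_k s_k^T) B_k (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T, \qquad \rho_k = (y_k^T s_k)^{-1}.$$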


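• Davidon–Fletcher–Powell algorithm:

$$B_{k+1} = (I - \rho_k y_k s_k^T) B_k (I - \rho_k s_k y_k^T) + \rho_k y_k y_k^T,$$

$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k},$$

$$\rho_k = (y_k^T s_k)^{-1}.$$

Historically: first quasi-Newton algorithm. Sometimes gets stuck.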


• Rank-two update algorithms

Idea:

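$$\min_{H \in \mathbb{R}^{n \times n}} \|H - H_k\| \quad \text{subject to } H y_k = s_k, \; H = H^T.$$

Take scaled Frobenius matrix norm:

$$\|H - H_k\|^2 = \operatorname{Tr}[W (H - H_k) W (H - H_k)^T], \qquad W = W^T, \; W \succ 0, \; W s_k = y_k.$$

Result: Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm

$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T, \qquad \rho_k = (y_k^T s_k)^{-1}.$$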


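• Broyden–Fletcher–Goldfarb–Shanno algorithm:

$$H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T,$$

$$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k},$$

$$\rho_k = (y_k^T s_k)^{-1}.$$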
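
As an illustration, the inverse update $H_{k+1}$ can be written in plain Python by expanding the product into rank-one terms, which keeps the cost at $O(n^2)$. This is a sketch under stated assumptions: the name `bfgs_inverse_update` and the helpers are made up for the example, $H_k$ is assumed symmetric, and the function refuses to update when $y_k^T s_k \le 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def bfgs_inverse_update(H, s, y):
    """Dense BFGS update of the inverse Hessian approximation H.

    Uses the expansion
      (I - rho s y^T) H (I - rho y s^T) + rho s s^T
        = H - rho (s (Hy)^T + (Hy) s^T) + (rho^2 y^T H y + rho) s s^T,
    which is valid for symmetric H.
    """
    ys = dot(y, s)
    if ys <= 0.0:
        raise ValueError("y^T s must be positive for the BFGS update")
    rho = 1.0 / ys
    Hy = matvec(H, y)
    coeff = rho * rho * dot(y, Hy) + rho
    n = len(s)
    return [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + coeff * s[i] * s[j]
             for j in range(n)] for i in range(n)]

# Hypothetical data: one step s with gradient change y, starting from H = I.
H0 = [[1.0, 0.0], [0.0, 1.0]]
s, y = [1.0, 0.0], [2.0, 1.0]
H1 = bfgs_inverse_update(H0, s, y)
secant_residual = [a - b for a, b in zip(matvec(H1, y), s)]  # H1 y - s
```

The residual `secant_residual` measures how well the updated matrix satisfies the secant equation in its inverse form, $H_{k+1} y_k = s_k$.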


• Linesearch framework:

Wolfe curvature condition:

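$$\nabla f_{k+1}^T p_k \ge c_2 \nabla f_k^T p_k$$

$$\implies \underbrace{[\nabla f_{k+1} - \nabla f_k]^T}_{= y_k^T} \underbrace{p_k}_{= \alpha_k^{-1} s_k} \ge (c_2 - 1) \nabla f_k^T p_k > 0.$$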
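
A small numerical check of this implication, as a sketch only. The test function, the starting point, the step length `alpha`, and the value `c2 = 0.9` are all assumptions chosen for illustration; the helper name `curvature_ok` is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def curvature_ok(grad_old, grad_new, p, c2=0.9):
    """Wolfe curvature condition: grad_new^T p >= c2 * grad_old^T p."""
    return dot(grad_new, p) >= c2 * dot(grad_old, p)

# Assumed test problem: f(x) = 0.5 * (x1^2 + 4 * x2^2), gradient (x1, 4 * x2).
def grad_f(x):
    return [x[0], 4.0 * x[1]]

x_old = [1.0, 1.0]
g_old = grad_f(x_old)
p = [-g for g in g_old]                 # a descent direction (steepest descent)
alpha = 0.2                             # assumed trial step length
x_new = [xi + alpha * pi for xi, pi in zip(x_old, p)]
g_new = grad_f(x_new)

s = [a - b for a, b in zip(x_new, x_old)]   # s = alpha * p
y = [a - b for a, b in zip(g_new, g_old)]
accepted = curvature_ok(g_old, g_new, p)
curvature = dot(y, s)                       # y^T s
```

With these values the trial step passes the curvature test and `curvature` is positive, as the inequality chain predicts for any descent direction and any step length that passes the test.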


• BFGS/DFP+Wolfe linesearch:

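Suppose $H_k \succ 0$.

Wolfe curvature condition $\implies$ $y_k^T s_k > 0$.

$\forall z \in \mathbb{R}^n$, $z \neq 0$:

$$z^T H_{k+1} z = [(I - \rho_k y_k s_k^T) z]^T H_k [(I - \rho_k y_k s_k^T) z] + [s_k^T z]^2 / (y_k^T s_k) \ge 0.$$

If $s_k^T z \neq 0$ $\implies$ $z^T H_{k+1} z \ge [s_k^T z]^2 / (y_k^T s_k) > 0$.

If $s_k^T z = 0$ $\implies$ $z^T H_{k+1} z = z^T H_k z > 0$.

Therefore $H_{k+1} \succ 0$! (Same for DFP)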


• BFGS/DFP+Wolfe linesearch:

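$B_0 \succ 0$ ($H_0 \succ 0$) + Wolfe linesearch $\implies$ $B_k \succ 0$ ($H_k \succ 0$), $\forall k$ $\implies$ $p_k = -B_k^{-1} \nabla f_k = -H_k \nabla f_k$ is a descent direction ($\nabla f_k \neq 0$).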


• BFGS: global convergence

Theorem 6.4, N&W

Exact in $n$ steps on convex quadratic functions; if $B_0 = I$ $\implies$ same iterates as CG.


• BFGS: global convergence

Theorem 6.5, N&W

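1. $B_0 \succ 0$, $B_0 = B_0^T$
2. $f \in C^2$
3. $\exists\, m, M > 0$: $m \|z\|^2 \le z^T \nabla^2 f(x) z \le M \|z\|^2$

Then $x_k \to x^*$: $\nabla f(x^*) = 0$ & $\sum_{k=1}^{\infty} \|x_k - x^*\| < \infty$.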

• BFGS: local convergence

Theorem 6.6, N&W

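1. $f \in C^2$
2. $\nabla^2 f$ is Lipschitz continuous at $x^*$
3. $\sum_{k=1}^{\infty} \|x_k - x^*\| < \infty$

Then $x_k \to x^*$ at a superlinear rate.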

• Presented quasi-Newton algorithms:

Good for small to medium size optimization problems with dense (or costly to compute) Hessian matrices

Factorization/system solve cost for $[\nabla^2 f_k]^{-1} \nabla f_k$: $O(n^3)$

Update + multiplication with $H_k$: $O(n^2)$


• Limited-memory quasi-Newton algorithms:

Idea

Storage of $B_k$ (or $H_k$): $O(n^2)$

• Example: L-BFGS

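$$H_{k+1} q = [I - \rho_k y_k s_k^T]^T H_k [I - \rho_k y_k s_k^T] q + \rho_k s_k s_k^T q = [I - \rho_k s_k y_k^T] H_k [q - \{\rho_k s_k^T q\} y_k] + \{\rho_k s_k^T q\} s_k.$$

Algorithm (only one pair is stored)

1. $\alpha_k := \rho_k s_k^T q$
2. $q := q - \alpha_k y_k$
3. $r := H_k q$
4. $\beta_k := \rho_k y_k^T r$
5. $r := r + (\alpha_k - \beta_k) s_k$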


• Example: L-BFGS

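1. $q := \nabla f_{k+1}$
2. for $i = k, k-1, \dots, k-m+1$ do
3. $\quad \alpha_i := \rho_i s_i^T q$
4. $\quad q := q - \alpha_i y_i$
5. end for
6. $r := H_k^0 q$
7. for $i = k-m+1, \dots, k-1, k$ do
8. $\quad \beta_i := \rho_i y_i^T r$
9. $\quad r := r + (\alpha_i - \beta_i) s_i$
10. end for
11. Result: $r \approx H_{k+1} \nabla f_{k+1}$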
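
The two-loop recursion above translates almost line by line into plain Python. The following is a minimal sketch, not a reference implementation: the name `lbfgs_two_loop`, the oldest-first ordering of the stored pairs, and the choice $H_k^0 = \gamma I$ with a user-supplied scalar `gamma` are assumptions made for the example.

```python
from collections import deque

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lbfgs_two_loop(grad, pairs, gamma=1.0):
    """Return r, the product of the L-BFGS inverse approximation with grad.

    pairs holds the stored (s_i, y_i) tuples, oldest first.
    The initial matrix is assumed to be gamma * I.
    """
    q = list(grad)
    rhos = [1.0 / dot(y, s) for s, y in pairs]
    alphas = [0.0] * len(pairs)
    for i in reversed(range(len(pairs))):          # newest pair first
        s, y = pairs[i]
        alphas[i] = rhos[i] * dot(s, q)
        q = [qj - alphas[i] * yj for qj, yj in zip(q, y)]
    r = [gamma * qj for qj in q]                   # r := H^0 q
    for i, (s, y) in enumerate(pairs):             # oldest pair first
        beta = rhos[i] * dot(y, r)
        r = [rj + (alphas[i] - beta) * sj for rj, sj in zip(r, s)]
    return r

# Keeping at most m pairs: a bounded deque drops the oldest pair automatically.
m = 5
history = deque(maxlen=m)
history.append(([1.0, 0.0], [2.0, 1.0]))           # hypothetical (s, y) pair
direction = [-ri for ri in lbfgs_two_loop([1.0, 2.0], history)]
```

Each call costs $O(mn)$ operations and touches only the stored vectors, so no $n \times n$ matrix is ever formed. With a single stored pair and `gamma = 1` the result agrees with applying the dense BFGS inverse update to the identity matrix.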


• Storage and step complexity revisited:

| Method | Storage | Step complexity | Convergence rate |
|---|---|---|---|
| CG | $O(n)$ | $O(n)$ | linear |
| BFGS/DFP/SR1 | $O(n^2)$ | $O(n^2)$ | superlinear |
| L-BFGS | $O(mn)$ | $O(mn)$ | ??? |
| Newton (full Hess) | $O(n^2)$ | $O(n^3)$ | quadratic |
| Newton (sparse Hess) | $O(n)$ to $O(n^2)$ | $O(n)$ to $O(n^3)$ | quadratic |

Note: step complexity does not account for the cost of evaluating the function and its derivatives!
