Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1...

25
Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFG February 6, 2014 1 / 25

Transcript of Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1...

Page 1: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Quasi-Newton methods:Symmetric rank 1 (SR1)

Broyden–Fletcher–Goldfarb–Shanno (BFGS)Limited memory BFGS (L-BFGS)

February 6, 2014

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 1 / 25

Page 2: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Roadmap

Line-search Trust-region

CG pk = −∇fk + βkpk−1 —Newton pk = −[∇2fk ]−1∇fk mk(p) = fk +∇f Tk p + 0.5pT∇2fkp

q-Newton pk = −B−1k ∇fk mk(p) = fk +∇f Tk p + 0.5pTBkp

Today:

How to select Bk ≈ ∇2fk for fast convergence of algorithms?

Tomorrow:

How to cheaply approximately solve Newton’s subproblem & preserve fastconvergence?

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 2 / 25

Page 3: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Idea

Taylor:

∇f (xk+1)−∇f (xk) = ∇2f (xk + t(xk+1 − xk))︸ ︷︷ ︸≈Bk+1

[xk+1 − xk ]

Secant equation

sk := xk+1 − xk

yk := ∇f (xk+1)−∇f (xk)

Bk+1sk = yk

n equations in (n + 1)n/2 unknowns!

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 3 / 25

Page 4: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1 method

Ansatz:

Bk+1 = Bk + σvvT , σ = ±1

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 4 / 25

Page 5: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1 method

Bk+1 = Bk +(yk − Bksk)(yk − Bksk)T

(yk − Bksk)T sk, when (yk − Bksk)T sk 6= 0

Sherman–Morrison

A = A + abT =⇒ A−1 = A−1 − A−1abTA−1

1 + bTA−1a,

provided A, A are non-singular.

Hk+1 = B−1k+1 = Hk +(sk − Hkyk)(sk − Hkyk)T

(sk − Hkyk)T yk

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 5 / 25

Page 6: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1: cases

1 If (yk − Bksk)T sk 6= 0 use SR1 update formula

2 If yk = Bksk =⇒ Bk+1 = Bk

3 If yk 6= Bksk and (yk − Bksk)T sk = 0 =⇒ no SR1 update! Skip theupdate in this case (Bk+1 = Bk)

Note:

Bk does not have to be positive definite =⇒Good Hessian approximation for non-convex problems

Better suited for TR than LS approach

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 6 / 25

Page 7: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1: convergence

Theorem 6.1, N&W

Exact in n steps on convex quadratic functions (if updates never skipped)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 7 / 25

Page 8: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1: convergence

Theorem 6.2, N&W

xk → x∗

∇2f is bounded & Lipschitz

|(yk − Bksk)T sk | ≥ r‖yk − Bksk‖‖sk‖ (updates never skipped)

{sk}-uniformly linearly independent

Thenlimk→∞

‖Bk −∇2f (x∗)‖ = 0

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 8 / 25

Page 9: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

SR1: convergence

Theorem 6.7, N&W

Under restrictive assumptions (unique stationary point in the vicinity ofthe algorithmic iterations; updates never skipped; Bk bounded; etc) =⇒convergence with n-step superlinear rate.

Good convergence in practice.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 9 / 25

Page 10: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Rank-two update algorithms

Idea:

minimizeB∈Rn×n

‖B − Bk‖,

subject to Bsk = yk ,

B = BT .

orminimizeH∈Rn×n

‖H − Hk‖,

subject to Hyk = sk ,

H = HT .

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 10 / 25

Page 11: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Rank-two update algorithms

Idea:

minimizeB∈Rn×n

‖B − Bk‖,

subject to Bsk = yk ,

B = BT .

Take scaled Frobenius matrix norm:

‖B − Bk‖2 = Tr[W (B − Bk)W (B − Bk)T ],

W = W T , W � 0, Wyk = sk .

Result: Davidon–Fletcher–Powell (DFP) algorithm

Bk+1 = (I − ρkyksTk )Bk(I − ρkskyTk ) + ρkykyTk ,

ρk = (yTk sk)−1.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 11 / 25

Page 12: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Davidon–Fletcher–Powell algorithm:

Bk+1 = (I − ρkyksTk )Bk(I − ρkskyTk ) + ρkykyTk ,

Hk+1 = Hk −Hkyky

Tk Hk

yTk Hkyk+

sksTk

yTk sk,

ρk = (yTk sk)−1.Historically: first quasi-Newton algorithm. Sometimes gets stuck.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 12 / 25

Page 13: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Rank-two update algorithms

Idea:

minimizeH∈Rn×n

‖H − Hk‖,

subject to Hsk = yk ,

H = HT .

Take scaled Frobenius matrix norm:

‖H − Hk‖2 = Tr[W (H − Hk)W (H − Hk)T ],

W = W T , W � 0, Wsk = yk .

Result: Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm

Hk+1 = (I − ρkskyTk )Hk(I − ρkyksTk ) + ρksksTk ,

ρk = (yTk sk)−1.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 13 / 25

Page 14: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Broyden–Fletcher–Goldfarb–Shanno algorithm:

Hk+1 = (I − ρkskyTk )Hk(I − ρkyksTk ) + ρksksTk ,

Bk+1 = Bk −Bksks

Tk Bk

sTk Bksk+

ykyTk

yTk sk,

ρk = (yTk sk)−1.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 14 / 25

Page 15: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Linesearch framework:

Wolfe curvature condition:

∇f Tk+1pk ≥ c2∇f Tk pk

⇓[∇f Tk+1 −∇fk ]T︸ ︷︷ ︸

=yTk

pk︸︷︷︸=α−1

k sk

≥ (c2 − 1)∇f Tk pk > 0.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 15 / 25

Page 16: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

BFGS/DFP+Wolfe linesearch:

Suppose Hk � 0.

Wolfe curvature condition =⇒ yTk sk > 0.

∀z ∈ Rn, z 6= 0:

zTHk+1z = [(I − ρkyksTk )zk ]THk [(I − ρkyksTk )zk ] + [sTk z ]2/(yTk sk) ≥ 0.

If sTk z 6= 0 =⇒ zTHk+1zT ≥ [sTk z ]2/(yTk sk) > 0.

If sTk z = 0 =⇒ zTHk+1z = zTHkz > 0.

Therefore Hk+1 � 0! (Same for DFP)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 16 / 25

Page 17: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

BFGS/DFP+Wolfe linesearch:

B0 � 0 (H0 � 0) + Wolfe linesearch⇓Bk � 0 (Hk � 0), ∀k⇓pk = −B−1k ∇fk = −Hk∇fk is a descent direction (∇fk 6= 0).

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 17 / 25

Page 18: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

BFGS: global convergence

Theorem 6.4, N&W

Exact in n steps on convex quadratic functions; if B0 = I =⇒ sameiterates as CG.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 18 / 25

Page 19: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

BFGS: global convergence

Theorem 6.5, N&W

1 B0 � 0, B0 = BT0

2 f ∈ C 2

3 ∃m,M > 0: m‖z‖2 ≤ zT∇2f (x)z ≤ M‖z‖2

Then xk → x∗: ∇f (x∗) = 0 &∑∞

k=1 ‖xk − x∗‖ <∞.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 19 / 25

Page 20: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

BFGS: local convergence

Theorem 6.6, N&W

1 f ∈ C 2

2 ∇f - Lipschitz

3∑∞

k=1 ‖xk − x∗‖ <∞.

Then limk→∞ ‖xk+1 − x∗‖/‖xk − x∗‖ = 0.

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 20 / 25

Page 21: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Presented quasi-Newton algorithms:

Good for small to medium size optimization problems with dense (or costlyto compute) Hessian matrices

Factorization/system solve cost for [∇2fk ]−1∇fk : O(n3)

Update + multiplication with Hk : O(n2)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 21 / 25

Page 22: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Limited-memory quasi-Newton algorithms:

Idea

Storage of Bk (or Hk): O(n2)

Instead: store m << n pairs (sk , yk): O(2mn)

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 22 / 25

Page 23: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Example: L-BFGS

Hk+1q = [I − ρkyksTk ]THk [I − ρkyksTk ]q + ρksksTk q

= [I − ρkskyTk ]Hk [q − {ρksTk q}yk ] + {ρksTk q}sk .

Algorithm (only one pair is stored)

1: αk := ρksTk q

2: q := q − αkyk3: r := Hkq4: βk := ρky

Tk r

5: r := r + (αk − βk)sk

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 23 / 25

Page 24: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Example: L-BFGS

1: q := ∇fk+1

2: for i = k , k − 1, . . . , k −m + 1 do3: αi := ρi s

Ti q

4: q := q − αiyi5: end for6: r := H0

kq7: for i = k −m + 1, . . . , k − 1, k do8: βi := ρiy

Ti r

9: r := r + (αi − βi )si10: end for11: Result: r ≈ Hk+1∇fk+1

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 24 / 25

Page 25: Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden{Fletcher{Goldfarb{Shanno (BFGS) Limited memory BFGS (L-BFGS) February

Storage and step complexity revisited:

Method Storage Step complexity Convergence rate

CG O(n) O(n) linearBFGS/DFP/SR1 O(n2) O(n2) superlinear

L-BFGS O(mn) O(mn) ???Newton (full Hess) O(n2) O(n3) quadratic

Newton (sparse Hess) O(n)–O(n2) O(n)–O(n3) quardatic

Note: step complexity does not account for complexities inevaluating function/derivatives!!!

Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 25 / 25