Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1...

of 25 /25
Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFGS) Limited memory BFGS (L-BFGS) February 6, 2014 Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden–Fletcher–Goldfarb–Shanno (BFG February 6, 2014 1 / 25

Embed Size (px)

Transcript of Quasi-Newton methods: Symmetric rank 1 (SR1) Broyden ... · Quasi-Newton methods: Symmetric rank 1...

  • Quasi-Newton methods:Symmetric rank 1 (SR1)

    BroydenFletcherGoldfarbShanno (BFGS)Limited memory BFGS (L-BFGS)

    February 6, 2014

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 1 / 25

  • Roadmap

    Line-search Trust-region

    CG pk = fk + kpk1 Newton pk = [2fk ]1fk mk(p) = fk +f Tk p + 0.5pT2fkp

    q-Newton pk = B1k fk mk(p) = fk +fTk p + 0.5p

    TBkp

    Today:

    How to select Bk 2fk for fast convergence of algorithms?

    Tomorrow:

    How to cheaply approximately solve Newtons subproblem & preserve fastconvergence?

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 2 / 25

  • Idea

    Taylor:

    f (xk+1)f (xk) = 2f (xk + t(xk+1 xk)) Bk+1

    [xk+1 xk ]

    Secant equation

    sk := xk+1 xkyk := f (xk+1)f (xk)

    Bk+1sk = yk

    n equations in (n + 1)n/2 unknowns!

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 3 / 25

  • SR1 method

    Ansatz:

    Bk+1 = Bk + vvT , = 1

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 4 / 25

  • SR1 method

    Bk+1 = Bk +(yk Bksk)(yk Bksk)T

    (yk Bksk)T sk, when (yk Bksk)T sk 6= 0

    ShermanMorrison

    A = A + abT = A1 = A1 A1abTA1

    1 + bTA1a,

    provided A, A are non-singular.

    Hk+1 = B1k+1 = Hk +

    (sk Hkyk)(sk Hkyk)T

    (sk Hkyk)T yk

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 5 / 25

  • SR1: cases

    1 If (yk Bksk)T sk 6= 0 use SR1 update formula2 If yk = Bksk = Bk+1 = Bk3 If yk 6= Bksk and (yk Bksk)T sk = 0 = no SR1 update! Skip the

    update in this case (Bk+1 = Bk)

    Note:

    Bk does not have to be positive definite =Good Hessian approximation for non-convex problems

    Better suited for TR than LS approach

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 6 / 25

  • SR1: convergence

    Theorem 6.1, N&W

    Exact in n steps on convex quadratic functions (if updates never skipped)

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 7 / 25

  • SR1: convergence

    Theorem 6.2, N&W

    xk x

    2f is bounded & Lipschitz|(yk Bksk)T sk | ryk Bksksk (updates never skipped){sk}-uniformly linearly independent

    Thenlimk

    Bk 2f (x) = 0

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 8 / 25

  • SR1: convergence

    Theorem 6.7, N&W

    Under restrictive assumptions (unique stationary point in the vicinity ofthe algorithmic iterations; updates never skipped; Bk bounded; etc) =convergence with n-step superlinear rate.

    Good convergence in practice.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 9 / 25

  • Rank-two update algorithms

    Idea:

    minimizeBRnn

    B Bk,

    subject to Bsk = yk ,

    B = BT .

    orminimizeHRnn

    H Hk,

    subject to Hyk = sk ,

    H = HT .

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 10 / 25

  • Rank-two update algorithms

    Idea:

    minimizeBRnn

    B Bk,

    subject to Bsk = yk ,

    B = BT .

    Take scaled Frobenius matrix norm:

    B Bk2 = Tr[W (B Bk)W (B Bk)T ],

    W = W T , W 0, Wyk = sk .

    Result: DavidonFletcherPowell (DFP) algorithm

    Bk+1 = (I kyksTk )Bk(I kskyTk ) + kykyTk ,

    k = (yTk sk)

    1.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 11 / 25

  • DavidonFletcherPowell algorithm:

    Bk+1 = (I kyksTk )Bk(I kskyTk ) + kykyTk ,

    Hk+1 = Hk Hkyky

    Tk Hk

    yTk Hkyk+

    sksTk

    yTk sk,

    k = (yTk sk)

    1.Historically: first quasi-Newton algorithm. Sometimes gets stuck.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 12 / 25

  • Rank-two update algorithms

    Idea:

    minimizeHRnn

    H Hk,

    subject to Hsk = yk ,

    H = HT .

    Take scaled Frobenius matrix norm:

    H Hk2 = Tr[W (H Hk)W (H Hk)T ],

    W = W T , W 0, Wsk = yk .

    Result: BroydenFletcherGoldfarbShanno (BFGS) algorithm

    Hk+1 = (I kskyTk )Hk(I kyksTk ) + ksksTk ,

    k = (yTk sk)

    1.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 13 / 25

  • BroydenFletcherGoldfarbShanno algorithm:

    Hk+1 = (I kskyTk )Hk(I kyksTk ) + ksksTk ,

    Bk+1 = Bk Bksks

    Tk Bk

    sTk Bksk+

    ykyTk

    yTk sk,

    k = (yTk sk)

    1.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 14 / 25

  • Linesearch framework:

    Wolfe curvature condition:

    f Tk+1pk c2f Tk pk

    [f Tk+1 fk ]T =yTk

    pk=1k sk

    (c2 1)f Tk pk > 0.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 15 / 25

  • BFGS/DFP+Wolfe linesearch:

    Suppose Hk 0.

    Wolfe curvature condition = yTk sk > 0.

    z Rn, z 6= 0:

    zTHk+1z = [(I kyksTk )zk ]THk [(I kyksTk )zk ] + [sTk z ]2/(yTk sk) 0.

    If sTk z 6= 0 = zTHk+1zT [sTk z ]2/(yTk sk) > 0.If sTk z = 0 = zTHk+1z = zTHkz > 0.

    Therefore Hk+1 0! (Same for DFP)

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 16 / 25

  • BFGS/DFP+Wolfe linesearch:

    B0 0 (H0 0) + Wolfe linesearchBk 0 (Hk 0), kpk = B1k fk = Hkfk is a descent direction (fk 6= 0).

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 17 / 25

  • BFGS: global convergence

    Theorem 6.4, N&W

    Exact in n steps on convex quadratic functions; if B0 = I = sameiterates as CG.

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 18 / 25

  • BFGS: global convergence

    Theorem 6.5, N&W

    1 B0 0, B0 = BT02 f C 2

    3 m,M > 0: mz2 zT2f (x)z Mz2

    Then xk x: f (x) = 0 &

    k=1 xk x

  • BFGS: local convergence

    Theorem 6.6, N&W

    1 f C 2

    2 f - Lipschitz3

    k=1 xk x

  • Presented quasi-Newton algorithms:

    Good for small to medium size optimization problems with dense (or costlyto compute) Hessian matrices

    Factorization/system solve cost for [2fk ]1fk : O(n3)Update + multiplication with Hk : O(n

    2)

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 21 / 25

  • Limited-memory quasi-Newton algorithms:

    Idea

    Storage of Bk (or Hk): O(n2)

    Instead: store m

  • Example: L-BFGS

    Hk+1q = [I kyksTk ]THk [I kyksTk ]q + ksksTk q= [I kskyTk ]Hk [q {ksTk q}yk ] + {ksTk q}sk .

    Algorithm (only one pair is stored)

    1: k := ksTk q

    2: q := q kyk3: r := Hkq4: k := ky

    Tk r

    5: r := r + (k k)sk

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 23 / 25

  • Example: L-BFGS

    1: q := fk+12: for i = k , k 1, . . . , k m + 1 do3: i := i s

    Ti q

    4: q := q iyi5: end for6: r := H0kq7: for i = k m + 1, . . . , k 1, k do8: i := iy

    Ti r

    9: r := r + (i i )si10: end for11: Result: r Hk+1fk+1

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 24 / 25

  • Storage and step complexity revisited:

    Method Storage Step complexity Convergence rate

    CG O(n) O(n) linearBFGS/DFP/SR1 O(n2) O(n2) superlinear

    L-BFGS O(mn) O(mn) ???Newton (full Hess) O(n2) O(n3) quadratic

    Newton (sparse Hess) O(n)O(n2) O(n)O(n3) quardatic

    Note: step complexity does not account for complexities inevaluating function/derivatives!!!

    Quasi-Newton methods: Symmetric rank 1 (SR1) BroydenFletcherGoldfarbShanno (BFGS) Limited memory BFGS (L-BFGS)February 6, 2014 25 / 25