
• Mathematics in Quasi-Newton Method

Peng Du and Xijun Wang

Department of Computing and Software

McMaster University

April 2, 2003

• Outline

Elementary Quasi-Newton Method

Update for Cholesky Factorization

Update for Limited Memory

Update for Sparse Hessians

Update for Reduced Hessians

References

• Elementary Quasi-Newton Method

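Unconstrained Optimization: $\min f(x)$

Quasi-Newton Iteration: $x_{k+1} = x_k - t_k H_k g_k$, $H_{k+1} = H_k + E_k$

Notation: $B, \bar B, H, \bar H, x, \bar x, g, \bar g$, $\delta\,(p) = \bar x - x$, $y = \bar g - g$

Desired Update Properties:

Quasi-Newton Property: $\bar B\delta = y$ ($\bar H y = \delta$)

Symmetric

Positive Definite

SR1: $\bar B = B + \dfrac{\eta\eta^T}{\eta^T\delta}$, $\eta = y - B\delta$

DFP: $\bar B = B - \dfrac{B\delta y^T + y\delta^T B}{\delta^T y} + \left(1 + \dfrac{\delta^T B\delta}{\delta^T y}\right)\dfrac{yy^T}{\delta^T y}$

BFGS: $\bar B = B + \dfrac{yy^T}{y^T\delta} - \dfrac{B\delta\delta^T B}{\delta^T B\delta}$

PSB: $\bar B = B + \dfrac{\eta\delta^T + \delta\eta^T}{\delta^T\delta} - \dfrac{(\eta^T\delta)\,\delta\delta^T}{(\delta^T\delta)^2}$, with $\eta = y - B\delta$ as in SR1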
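The quasi-Newton property can be checked numerically. The following is a minimal sketch in plain Python, assuming the standard textbook forms of the SR1, DFP, BFGS and PSB updates of the Hessian approximation; the step is written `d` (the slides' $\bar x - x$), and all helper names (`dot`, `matvec`, `outer`, `add`) are illustrative choices, not part of the slides.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def add(*terms):
    """Return the sum of coefficient * matrix over (coefficient, matrix) pairs."""
    n = len(terms[0][1])
    return [[sum(c * M[i][j] for c, M in terms) for j in range(n)]
            for i in range(n)]

def sr1(B, d, y):
    eta = [yi - bi for yi, bi in zip(y, matvec(B, d))]  # eta = y - B d
    return add((1.0, B), (1.0 / dot(eta, d), outer(eta, eta)))

def dfp(B, d, y):
    Bd, dy = matvec(B, d), dot(d, y)
    # outer(y, Bd) equals y d^T B because B is assumed symmetric
    return add((1.0, B),
               (-1.0 / dy, outer(Bd, y)), (-1.0 / dy, outer(y, Bd)),
               ((1.0 + dot(d, Bd) / dy) / dy, outer(y, y)))

def bfgs(B, d, y):
    Bd = matvec(B, d)
    return add((1.0, B), (1.0 / dot(y, d), outer(y, y)),
               (-1.0 / dot(d, Bd), outer(Bd, Bd)))

def psb(B, d, y):
    eta = [yi - bi for yi, bi in zip(y, matvec(B, d))]
    dd = dot(d, d)
    return add((1.0, B), (1.0 / dd, outer(eta, d)), (1.0 / dd, outer(d, eta)),
               (-dot(eta, d) / dd ** 2, outer(d, d)))
```

For any symmetric `B` and any pair `d`, `y` for which the denominators are nonzero, `matvec(update(B, d, y), d)` should reproduce `y` up to rounding for each of the four functions.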

• Cholesky Factor Modification I Rank-one

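Cholesky Factor Modification (symmetric rank-one positive definite update)

Using Classical Cholesky Factorization

$A = LDL^T \;\Rightarrow\; \bar A = A + zz^T \overset{?}{=} \bar L\bar D\bar L^T$

$\bar A = L(D + pp^T)L^T$, $Lp = z$

Factorize $D + pp^T = \tilde L\tilde D\tilde L^T$, giving $\bar A = \bar L\bar D\bar L^T$, $\bar L = L\tilde L$, $\bar D = \tilde D$

$\tilde L$ is of the special form

$$\tilde L(p, \beta, \gamma) = \begin{pmatrix} \gamma_1 \\ p_2\beta_1 & \gamma_2 \\ p_3\beta_1 & p_3\beta_2 & \gamma_3 \\ \vdots & \vdots & \vdots & \ddots \\ p_n\beta_1 & p_n\beta_2 & p_n\beta_3 & \cdots & p_n\beta_{n-1} & \gamma_n \end{pmatrix}$$

Assertion 1: Only $\tilde d_i$ and $\beta_i$ need to be computed, in $O(n)$ time.

Assertion 2: The product of $L$ and $\tilde L$ can be formed in $O(n^2)$ time.

Drawback: numerically unstable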
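The two assertions can be made concrete with a short sketch. It assumes a unit lower triangular $L$ (so the diagonal of the special factor is 1) and uses the standard recurrence for factoring a diagonal-plus-rank-one matrix; the function name `ldl_rank_one_update` and the variable names are hypothetical. The product with $L$ is formed column by column from a running vector, which is what keeps the cost at $O(n^2)$.

```python
def ldl_rank_one_update(L, d, z):
    """Given A = L diag(d) L^T (L unit lower triangular), return (L_new, d_new)
    with L_new diag(d_new) L_new^T = A + z z^T."""
    n = len(d)
    # Forward substitution: solve L p = z.
    p = [0.0] * n
    for i in range(n):
        p[i] = z[i] - sum(L[i][k] * p[k] for k in range(i))

    # Factor D + p p^T = Lt Dt Lt^T in O(n); Lt has entries p_i * beta_j below
    # the diagonal and ones on it.
    alpha = 1.0
    d_new = [0.0] * n
    beta = [0.0] * n
    for j in range(n):
        d_new[j] = d[j] + alpha * p[j] * p[j]
        beta[j] = alpha * p[j] / d_new[j]
        alpha = alpha * d[j] / d_new[j]

    # Product L_new = L * Lt in O(n^2):
    # column j of L_new is L[:, j] + beta_j * sum_{i > j} p_i L[:, i].
    w = list(z)                      # w = L p = sum over all i of p_i L[:, i]
    L_new = [row[:] for row in L]
    for j in range(n):
        for r in range(n):
            w[r] -= L[r][j] * p[j]   # now w = sum_{i > j} p_i L[:, i]
        for r in range(j + 1, n):
            L_new[r][j] = L[r][j] + beta[j] * w[r]
    return L_new, d_new
```

With `L = [[1, 0], [0.5, 1]]`, `d = [4, 3]` and `z = [1, 2]`, the updated factors reproduce $A + zz^T$ when multiplied back together.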

• Cholesky Factor Modification II Rank-one

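Using Givens Matrices (I)

2-dimensional case

$$\begin{pmatrix} c & s \\ s & -c \end{pmatrix}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = \begin{pmatrix} \rho \\ 0 \end{pmatrix}$$

where $\rho^2 = z_1^2 + z_2^2$, $\rho = \operatorname{sign}(z_1)(\rho^2)^{1/2}$, $c = z_1/\rho$, $s = z_2/\rho$.

n-dimensional case

$$P_i^j z = \begin{pmatrix} 1 \\ & \ddots \\ & & c & \cdots & s \\ & & \vdots & \ddots & \vdots \\ & & s & \cdots & -c \\ & & & & & \ddots \\ & & & & & & 1 \end{pmatrix}\begin{pmatrix} z_1 \\ \vdots \\ z_i \\ \vdots \\ z_j \\ \vdots \\ z_n \end{pmatrix} = \begin{pmatrix} z_1 \\ \vdots \\ \bar z_i \\ \vdots \\ 0 \\ \vdots \\ z_n \end{pmatrix}$$

Here $c$ and $s$ occupy rows and columns $i$ and $j$, all other diagonal entries are 1, and $\bar z_i = \operatorname{sign}(z_i)(z_i^2 + z_j^2)^{1/2}$.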
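A small sketch of these plane transformations, assuming the symmetric form with rows $(c, s)$ and $(s, -c)$ and the sign convention $\rho = \operatorname{sign}(z_i)(z_i^2 + z_j^2)^{1/2}$. The function names are hypothetical, and the zero-vector case is handled by an arbitrary convention that is not taken from the slides.

```python
import math

def givens(zi, zj):
    """Return (c, s, rho) such that [[c, s], [s, -c]] maps (zi, zj) to (rho, 0)."""
    rho = math.copysign(math.hypot(zi, zj), zi)
    if rho == 0.0:
        return 1.0, 0.0, 0.0      # assumed convention when both entries are zero
    return zi / rho, zj / rho, rho

def apply_givens(z, i, j, c, s):
    """Apply P_i^j to the vector z (0-based indices i < j)."""
    out = list(z)
    out[i] = c * z[i] + s * z[j]
    out[j] = s * z[i] - c * z[j]
    return out

def reduce_to_e1(z):
    """Zero z[n-1], ..., z[1] in turn using adjacent planes; the result is
    rho * e_1 with |rho| equal to the Euclidean norm of z."""
    z = list(z)
    for j in range(len(z) - 1, 0, -1):
        c, s, _ = givens(z[j - 1], z[j])
        z = apply_givens(z, j - 1, j, c, s)
    return z
```

Calling `reduce_to_e1([1.0, 2.0, 2.0])` leaves a first component of magnitude 3 (the norm of the input) and numerically zero entries elsewhere.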

• Sequential reduction

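$P_1^2 P_2^3 \cdots P_{n-1}^n z = \rho e_1$ or $P_1^2 P_1^3 \cdots P_1^n z = \rho e_1$

The product $P_{n-1}^n P_{n-2}^{n-1} \cdots P_2^3 P_1^2$ is of the special form

$$H_L(p, \beta, \gamma) = \begin{pmatrix} p_1\beta_1 & \gamma_1 \\ p_2\beta_1 & p_2\beta_2 & \gamma_2 \\ p_3\beta_1 & p_3\beta_2 & p_3\beta_3 & \ddots \\ \vdots & \vdots & \vdots & \ddots & \gamma_{n-2} \\ p_{n-1}\beta_1 & p_{n-1}\beta_2 & p_{n-1}\beta_3 & \cdots & p_{n-1}\beta_{n-1} & \gamma_{n-1} \\ p_n\beta_1 & p_n\beta_2 & p_n\beta_3 & \cdots & p_n\beta_{n-1} & p_n\beta_n \end{pmatrix}$$

Denote $H_U(\beta, p, \gamma) \equiv H_L^T(p, \beta, \gamma)$

Factorization: $A = R^TR \;\Rightarrow\; \bar A = A + zz^T \overset{?}{=} \bar R^T\bar R$

$\bar A = R^T(I + pp^T)R$, $R^Tp = z$

Choose a sequence of Givens matrices such that

$Pp = P_1^2 P_2^3 \cdots P_{n-1}^n p = \rho e_1$

• $\bar A$ can be written as

$\bar A = R^TP^TP(I + pp^T)P^TPR = R^TP^T(I + \rho^2 e_1e_1^T)PR = H^TJ^TJH$

where $H = PR$, and $J$ is the identity matrix except $J_{11} = (1 + p^Tp)^{1/2}$.

Assertion: The product of $P$ and $R$ can be formed in $O(n^2)$ time.

Let $\tilde H = JH$, then $\bar A = \tilde H^T\tilde H$.

Choose a second sequence of Givens matrices such that

$\tilde P\tilde H = \tilde P_{n-1}^n \tilde P_{n-2}^{n-1} \cdots \tilde P_2^3 \tilde P_1^2 \tilde H = \bar R$

Then $\bar A = \bar R^T\bar R$.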
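The steps of this slide (solve $R^Tp = z$, reduce $p$ to a multiple of $e_1$, scale the first row of the resulting upper Hessenberg matrix, then restore triangular form) can be sketched as follows. This is an illustration under assumptions: it always takes $\rho \ge 0$ instead of the signed convention, it applies each transformation to full rows for simplicity, and the name `cholesky_update_givens` is hypothetical.

```python
import math

def cholesky_update_givens(R, z):
    """R is upper triangular with A = R^T R. Returns an upper triangular R_bar
    with R_bar^T R_bar = A + z z^T (diagonal entries may be negative)."""
    n = len(z)
    # Solve R^T p = z by forward substitution (R^T is lower triangular).
    p = [0.0] * n
    for i in range(n):
        p[i] = (z[i] - sum(R[k][i] * p[k] for k in range(i))) / R[i][i]

    # First sequence: reduce p to rho * e_1 from the bottom up and apply the
    # same transformations to the rows of R, giving H = P R (upper Hessenberg).
    H = [row[:] for row in R]
    for j in range(n - 1, 0, -1):
        rho = math.hypot(p[j - 1], p[j])
        if rho == 0.0:
            continue
        c, s = p[j - 1] / rho, p[j] / rho
        p[j - 1], p[j] = rho, 0.0
        for col in range(n):
            a, b = H[j - 1][col], H[j][col]
            H[j - 1][col] = c * a + s * b
            H[j][col] = s * a - c * b

    # Scale the first row by J_11 = sqrt(1 + p^T p); here p[0]**2 == p^T p.
    j11 = math.sqrt(1.0 + p[0] * p[0])
    H[0] = [j11 * h for h in H[0]]

    # Second sequence: remove the subdiagonal of the Hessenberg matrix.
    for j in range(n - 1):
        a, b = H[j][j], H[j + 1][j]
        rho = math.hypot(a, b)
        if rho == 0.0:
            continue
        c, s = a / rho, b / rho
        for col in range(j, n):
            u, v = H[j][col], H[j + 1][col]
            H[j][col] = c * u + s * v
            H[j + 1][col] = s * u - c * v
    return H
```

Each row operation touches $O(n)$ entries and there are $2(n-1)$ of them, which matches the $O(n^2)$ claim.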

• Cholesky Factor Modification III Rank-one

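Using Givens Matrices (II)

$A = R^TR \;\Rightarrow\; \bar A = A + zz^T \overset{?}{=} \bar R^T\bar R$

$\bar A = R^T(I + pp^T)R = R^T(I + \sigma pp^T)(I + \sigma pp^T)R$

where $R^Tp = z$ and $\sigma = 1/(1 + (1 + p^Tp)^{1/2})$.

Choose a sequence of Givens matrices such that

$Pp = P_1^2 P_2^3 \cdots P_{n-1}^n p = \rho e_1$

$\bar A$ can be written as

$\bar A = R^T(I + \sigma pp^T)P^TP(I + \sigma pp^T)R = R^TH^THR$

where $H = P(I + \sigma pp^T) = P + \sigma\rho e_1p^T$.

Note that $P = H_U(\beta, p, \gamma)$ for some $\beta, \gamma$. Then

$H = H_U(\beta, p, \gamma) + \sigma\rho e_1p^T = H_U(\bar\beta, p, \gamma)$

where $\bar\beta$ differs from $\beta$ only in the first element: $\bar\beta = \beta + \sigma\rho e_1$.

Choose a second sequence of Givens matrices such that

$\tilde PH = \tilde P_{n-1}^n \tilde P_{n-2}^{n-1} \cdots \tilde P_2^3 \tilde P_1^2 H = \tilde R$

• Assertion 1: $\tilde R$ is of the form $\tilde R = \tilde R(\beta, p, \gamma) \equiv \tilde L^T(p, \beta, \gamma)$

Then $\bar A = R^TH^T\tilde P^T\tilde PHR = R^T\tilde R^T\tilde RR$.

Assertion 2: The product of $\tilde R$ and $R$ can be formed in $O(n^2)$ time.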

• Cholesky Factor Modification IV Rank-two

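Product Form of BFGS

$B^+ = (I + vu^T)B(I + uv^T)$, $u = p + \alpha B^{-1}y$, $v = \theta_1 Bp + \theta_2 y$ for some $\alpha, \theta_1, \theta_2$.

Triangularization of $I + zw^T$: $(I + zw^T)Q = \tilde L$

First choose a sequence of Givens matrices such that

$Pw = P_1^2 P_2^3 \cdots P_{n-1}^n w = \rho e_1$

Then $H = (I + zw^T)P^T = P^T + \rho z e_1^T$ is a lower Hessenberg matrix.

Choose a second sequence of Givens matrices to reduce the superdiagonal elements of $H$ such that

$H\tilde P = \tilde L$

• Assertion: $\tilde L$ is of the special form

$$\tilde L(w, \beta, z, \gamma, \lambda) = \begin{pmatrix} \lambda_1 \\ \beta_1w_2 + \gamma_1z_2 & \lambda_2 \\ \beta_1w_3 + \gamma_1z_3 & \beta_2w_3 + \gamma_2z_3 & \ddots \\ \vdots & \vdots & & \lambda_{n-1} \\ \beta_1w_n + \gamma_1z_n & \beta_2w_n + \gamma_2z_n & \cdots & \beta_{n-1}w_n + \gamma_{n-1}z_n & \lambda_n \end{pmatrix}$$

Factorization: $B = L_1DL_1^T \;\Rightarrow\; B^+ = (I + vu^T)B(I + uv^T) \overset{?}{=} L_1^+D^+L_1^{+T}$

$B^+ = L(I + zw^T)(I + wz^T)L^T$, $L = L_1D^{1/2}$, $Lz = v$, $Lw = LL^Tu$.

Triangularize $I + zw^T = \tilde LQ^T$, giving

$B^+ = L\tilde LQ^TQ\tilde L^TL^T = L^+L^{+T}$, $L^+ = L\tilde L = L_1D^{1/2}\tilde L$

Assertion: $D^{1/2}\tilde L(w, \beta, z, \gamma, \lambda) = \tilde L(\bar w, \beta, \bar z, \gamma, \bar\lambda) = \tilde L(\bar w, \bar\beta, \bar z, \bar\gamma, e)\bar D^{1/2}$ where

$\bar w = D^{1/2}w$, $\bar z = D^{1/2}z$, $\bar\lambda = D^{1/2}\lambda$, $e = (1, \dots, 1)^T$, $\bar D^{1/2} = \operatorname{diag}(\bar\lambda_1, \dots, \bar\lambda_n)$, $\bar\beta = \bar D^{-1/2}\beta$, $\bar\gamma = \bar D^{-1/2}\gamma$.

Assertion: The product $L_1\tilde L(\bar w, \bar\beta, \bar z, \bar\gamma, e)$ can be formed in $O(n^2)$ time.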

• Cholesky Factor Modification V Rank-two

BFGS: $B^+ = B + uu^T + vv^T$

Using Classical Factorization

$B = L_1DL_1^T \;\Rightarrow\; B^+ = B + uu^T + vv^T \overset{?}{=} L_1^+D^+L_1^{+T}$

$B^+ = B + uu^T + vv^T = L_1(D + ww^T + zz^T)L_1^T$

where $L_1w = u$, $L_1z = v$.

Factorize $D + ww^T + zz^T = \tilde L_1\tilde D\tilde L_1^T$

Similar to the rank-one case, $\tilde L_1$ is of a special form.

Finally

$L_1^+ = L_1\tilde L(w, \beta, z, \gamma, e)$, $D^+ = \tilde D$

Assertion: The product $L_1\tilde L(w, \beta, z, \gamma, e)$ can be formed in $O(n^2)$ time.

• Implementing Quasi-Newton Method

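| Approach 1: Cholesky | Approach 2: Inverse Hessian |
| --- | --- |
| $B_ks_k = -\nabla f(x_k)$ | $B_ks_k = -\nabla f(x_k)$ |
| $R_k^TR_k = B_k$ | $H_k = B_k^{-1}$ |
| solve by back and forward substitution | solving for $s_k$ is straightforward: $s_k = -H_k\nabla f(x_k)$ |
| find $R_{k+1}$ s.t. $R_{k+1}^TR_{k+1} = B_k + \Delta B_k$ | update $H_k$ directly, i.e. $H_{k+1} = f(H_k, \delta_k, y_k) = B_{k+1}^{-1}$ |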
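To illustrate the second approach, here is a hedged sketch of a quasi-Newton loop that keeps the inverse approximation explicitly and obtains each step from a single matrix-vector product. It assumes a strictly convex quadratic $f(x) = \tfrac12 x^TAx - b^Tx$ so that an exact line search has a closed form; the BFGS inverse update is used as one possible choice of update, and the function names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def bfgs_inverse_update(H, delta, y):
    """BFGS update of a symmetric inverse approximation H, expanded as
    H - rho (delta (Hy)^T + Hy delta^T) + (rho + rho^2 y^T H y) delta delta^T."""
    n = len(delta)
    rho = 1.0 / dot(y, delta)
    Hy = matvec(H, y)
    coef = rho + rho * rho * dot(y, Hy)
    return [[H[i][j] - rho * (delta[i] * Hy[j] + Hy[i] * delta[j])
             + coef * delta[i] * delta[j] for j in range(n)] for i in range(n)]

def minimize_quadratic(A, b, x0, max_iter=20, tol=1e-10):
    """Minimize 0.5 x^T A x - b^T x (A symmetric positive definite)."""
    n = len(x0)
    x = list(x0)
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
        if dot(g, g) ** 0.5 < tol:
            break
        s = [-v for v in matvec(H, g)]                     # s = -H g, no solve
        t = -dot(g, s) / dot(s, matvec(A, s))              # exact line search
        delta = [t * si for si in s]
        y = matvec(A, delta)                               # gradient change
        x = [xi + di for xi, di in zip(x, delta)]
        H = bfgs_inverse_update(H, delta, y)
    return x
```

For `A = [[4, 1], [1, 3]]` and `b = [1, 2]` started from the origin, the iterates approach the solution of $Ax = b$, which is $(1/11, 7/11)$.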

• Update Inverse Hessian Approximation: Desired Properties

| Hessian Approximation | Inverse Hessian Approximation |
| --- | --- |
| $B_{k+1}\delta_k = y_k$ | $H_{k+1}y_k = \delta_k$ |
| $B_{k+1}$ is positive definite | $H_{k+1}$ is positive definite |
| $B_{k+1}$ is symmetric | $H_{k+1}$ is symmetric |

The requirements are dual to each other. As a consequence, the update formulas should be dual too.

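• Updating Hessian Approximation

Broyden family on $B_k$ update:

$$B_{k+1} = B_k - \frac{B_k\delta_k(B_k\delta_k)^T}{\delta_k^TB_k\delta_k} + \frac{y_ky_k^T}{y_k^T\delta_k} + \phi_k(\delta_k^TB_k\delta_k)v_kv_k^T$$

where $\phi_k \in [0,1]$ and $v_k = \left(\dfrac{y_k}{y_k^T\delta_k} - \dfrac{B_k\delta_k}{\delta_k^TB_k\delta_k}\right)$

Broyden family on $H_k$ update:

$$H_{k+1} = H_k - \frac{H_ky_k(H_ky_k)^T}{y_k^TH_ky_k} + \frac{\delta_k\delta_k^T}{\delta_k^Ty_k} + \phi_k(y_k^TH_ky_k)w_kw_k^T$$

where $\phi_k \in [0,1]$ and $w_k = \left(\dfrac{\delta_k}{\delta_k^Ty_k} - \dfrac{H_ky_k}{y_k^TH_ky_k}\right)$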

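• BFGS update of $H_k$ when $\phi_k = 1$

$$H_{k+1} = H_k - \frac{H_ky_k(H_ky_k)^T}{y_k^TH_ky_k} + \frac{\delta_k\delta_k^T}{\delta_k^Ty_k} + (y_k^TH_ky_k)w_kw_k^T$$

rewritten as:

$$H_{k+1} = V_k^TH_kV_k + \rho_k\delta_k\delta_k^T$$

where $\rho_k = \dfrac{1}{y_k^T\delta_k}$, and $V_k = I - \rho_ky_k\delta_k^T$.

Difficulties:

$H_k$ is usually dense

$H_k$ can be ill-conditioned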
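The equivalence of the two forms can be confirmed numerically. The sketch below assumes the Broyden family on the inverse approximation is parametrized so that the value 1 gives BFGS, and writes the step as `delta`; the function names `broyden_inverse_update` and `bfgs_product_form` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def broyden_inverse_update(H, delta, y, phi):
    """One-parameter family: rank-two correction plus phi * (y^T H y) w w^T."""
    n = len(y)
    Hy = matvec(H, y)
    yHy = dot(y, Hy)
    dy = dot(delta, y)
    w = [di / dy - hi / yHy for di, hi in zip(delta, Hy)]
    return [[H[i][j] - Hy[i] * Hy[j] / yHy + delta[i] * delta[j] / dy
             + phi * yHy * w[i] * w[j] for j in range(n)] for i in range(n)]

def bfgs_product_form(H, delta, y):
    """V^T H V + rho delta delta^T with V = I - rho y delta^T."""
    n = len(y)
    rho = 1.0 / dot(y, delta)
    V = [[(1.0 if i == j else 0.0) - rho * y[i] * delta[j] for j in range(n)]
         for i in range(n)]
    VtHV = matmul(transpose(V), matmul(H, V))
    return [[VtHV[i][j] + rho * delta[i] * delta[j] for j in range(n)]
            for i in range(n)]
```

For a symmetric `H` and a pair with $y^T\delta > 0$, `broyden_inverse_update(H, delta, y, 1.0)` and `bfgs_product_form(H, delta, y)` agree up to rounding, and every member of the family maps `y` back to `delta`.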

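• Solving Memory Problem: L-BFGS

$$H_{k+1} = V_k^TH_kV_k + \rho_k\delta_k\delta_k^T$$

Expand $H_k = V_{k-1}^TH_{k-1}V_{k-1} + \rho_{k-1}\delta_{k-1}\delta_{k-1}^T$:

$$H_{k+1} = (V_k^TV_{k-1}^T)H_{k-1}(V_{k-1}V_k) + \rho_{k-1}V_k^T\delta_{k-1}\delta_{k-1}^TV_k + \rho_k\delta_k\delta_k^T$$

Expand $m$ times:

$$\begin{aligned} H_{k+1} = {} & (V_k^T \cdots V_{k-m}^T)H_{k-m}(V_{k-m} \cdots V_k) \\ & + \rho_{k-m}(V_k^T \cdots V_{k-m+1}^T)\delta_{k-m}\delta_{k-m}^T(V_{k-m+1} \cdots V_k) \\ & + \rho_{k-m+1}(V_k^T \cdots V_{k-m+2}^T)\delta_{k-m+1}\delta_{k-m+1}^T(V_{k-m+2} \cdots V_k) \\ & + \cdots \\ & + \rho_k\delta_k\delta_k^T. \end{aligned}$$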

• Ideas of L-BFGS

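Maintain the $m$ most recent pairs of $\delta$ and $y$:

$P = \{(\delta_k, y_k), (\delta_{k-1}, y_{k-1}), \dots, (\delta_{k-m+1}, y_{k-m+1})\}$

At the $k$-th iteration:

If ($k \le m$)

  $P = P \cup \{(\delta_k, y_k)\}$

else

  $P = (P \setminus \{(\delta_{k-m}, y_{k-m})\}) \cup \{(\delta_k, y_k)\}$ (drop the oldest pair, so that $P$ again holds the $m$ most recent ones)

end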
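The bookkeeping above maps naturally onto a bounded queue. The sketch below stores the pairs in a `collections.deque` with `maxlen=m`, which discards the oldest pair automatically, and applies the implicit inverse approximation to a vector with the well-known two-loop recursion. The two-loop recursion is not shown on the slides and is brought in here as a standard technique; the class name `LBFGSMemory`, the parameter `gamma` (a scaled-identity base matrix) and the variable names are assumptions for illustration.

```python
from collections import deque

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

class LBFGSMemory:
    def __init__(self, m):
        # Holds at most m (delta, y) pairs, oldest first.
        self.pairs = deque(maxlen=m)

    def push(self, delta, y):
        # When the deque is full, appending drops the oldest pair.
        self.pairs.append((list(delta), list(y)))

    def apply(self, g, gamma=1.0):
        """Return H g, where H is the BFGS inverse approximation built from the
        stored pairs on top of the base matrix gamma * I."""
        q = list(g)
        saved = []
        for delta, y in reversed(self.pairs):          # newest to oldest
            rho = 1.0 / dot(y, delta)
            a = rho * dot(delta, q)
            saved.append((a, rho))
            q = [qi - a * yi for qi, yi in zip(q, y)]
        r = [gamma * qi for qi in q]                   # base matrix times q
        for (delta, y), (a, rho) in zip(self.pairs, reversed(saved)):
            b = rho * dot(y, r)                        # oldest to newest
            r = [ri + (a - b) * di for ri, di in zip(r, delta)]
        return r
```

A useful sanity check is the quasi-Newton property for the newest pair: applying the memory to the latest `y` returns the latest `delta`, whatever base scaling is used. Only $2m$ vectors are stored, and one application costs $O(mn)$ operations instead of the $O(n^2)$ of a dense matrix.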

• Performance of L-BFGS

In the results table (not reproduced in this transcript), the two numbers above are iterations/function evaluations; the three numbers below are iteration time/function time/total time.

• Choosing Appropriate m

In the results table (not reproduced in this transcript), the two numbers above are iterations/function evaluations; the three numbers below are iteration time/function time/total time.

• Dealing With Hk

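The choice of $H_k$ influences the overall performance.

Options found in the literature:

Identity matrix: $H_k = H_0 = I$

Identity matrix scaled at the first step only: $H_k = \dfrac{y_0^T\delta_0}{\|y_0\|^2}I$

Identity matrix scaled at each step: $H_k = \dfrac{y_k^T\delta_k}{\|y_k\|^2}I$

Diagonal matrix minimizing a Frobenius-norm residual:

$H_k = \arg\min_{D \in \text{diag}} \|DY_{k-1} - S_{k-1}\|_F$, where $Y_{k-1} = [y_{k-1}, \dots, y_{k-m}]$, $S_{k-1} = [\delta_{k-1}, \dots, \delta_{k-m}]$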
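Two of these options have simple closed forms, sketched below. The scaled identity uses the factor $y^T\delta/\|y\|^2$. For the diagonal option, the Frobenius-norm objective separates by row, so each diagonal entry is a one-dimensional least-squares solution $d_i = \sum_j Y_{ij}S_{ij} / \sum_j Y_{ij}^2$. The function names are hypothetical, and the code assumes every row of $Y$ has at least one nonzero entry.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scaled_identity_factor(delta, y):
    """Scalar gamma for the base matrix gamma * I: y^T delta / ||y||^2."""
    return dot(y, delta) / dot(y, y)

def diagonal_least_squares(Y_cols, S_cols):
    """Diagonal entries of D minimizing ||D Y - S||_F.

    Y_cols and S_cols are lists of column vectors (the stored y and delta
    vectors, in matching order). Row i of the residual depends only on d_i,
    so each entry is fitted independently.
    """
    n = len(Y_cols[0])
    d = []
    for i in range(n):
        num = sum(y[i] * s[i] for y, s in zip(Y_cols, S_cols))
        den = sum(y[i] * y[i] for y in Y_cols)
        d.append(num / den)   # assumes den != 0
    return d
```

For example, with `Y_cols = [[2, 1], [4, 1]]` and `S_cols = [[1, 3], [2, 3]]`, the data satisfy $S = DY$ exactly for $D = \operatorname{diag}(0.5, 3)$, and the function recovers those entries.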

• Scaling Schemes Comparisons

In the results table (not reproduced in this transcript), the two numbers above are iterations/function evaluations; the three numbers below are iteration time/function time/total time.

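• Update for Sparse Hessian I

Suppose $\nabla^2 f(x)$ has a known sparsity pattern $K$:

$(i,j) \in K \Rightarrow [\nabla^2 f(x)]_{ij} = 0$

$(i,j) \in K \Leftrightarrow (j,i) \in K$ (for $i,j = 1,2,\dots,n$)

$(i,i) \notin K$ (for $i = 1,2,\dots,n$)

Additional desired update property: $(i,j) \in K \Rightarrow \bar B_{ij} = 0$

$N = \{X \in R^{n\times n} \mid Xp = y,\ X = X^T\}$

$S = \{X \in R^{n\times n} \mid (i,j) \in K \Rightarrow X_{ij} = 0\}$

$V = S \cap N$

Row Decomposition: $\bar B = B + \sum_{i=1}^n e_ie_i^T\,\Delta B$

$$\bar B = B + \sum_{i=1}^n e_ie_i^T\left(\frac{yy^T}{p^Ty} - \frac{Bpp^TB}{p^TBp}\right) = B + \sum_{i=1}^n e_i\left(\frac{e_i^Ty}{p^Ty}y^T - \frac{e_i^TBp}{p^TBp}p^TB\right)$$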

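• Update for Sparse Hessian II

Symmetrization

$$\bar B = B + \frac{(y - Bp)z^T}{z^Tp}$$

$$\Rightarrow\; \bar B = B + \frac{(y - Bp)z^T + z(y - Bp)^T}{z^Tp} - \sigma zz^T$$

where $\sigma$ is chosen to assure the QN condition:

$$\sigma = \frac{(y - Bp)^Tp}{(z^Tp)^2}$$

$z = y\sqrt{p^TBp/p^Ty} + Bp \Rightarrow$ BFGS

$z = y \Rightarrow$ DFP

$z = y - Bp \Rightarrow$ SR1

$z = p \Rightarrow$ PSB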
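The symmetrized formula can be exercised directly. The sketch below builds the update for each of the four choices of $z$ and leaves it to the caller to check the QN condition $\bar Bp = y$; the function names `symmetrized_update` and `z_choices` are hypothetical, and the code assumes a symmetric $B$ with $p^TBp > 0$, $p^Ty > 0$ and $z^Tp \neq 0$.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def symmetrized_update(B, p, y, z):
    """B + ((y - Bp) z^T + z (y - Bp)^T) / (z^T p) - sigma z z^T,
    with sigma = (y - Bp)^T p / (z^T p)^2."""
    n = len(p)
    r = [yi - bi for yi, bi in zip(y, matvec(B, p))]   # r = y - B p
    zp = dot(z, p)
    sigma = dot(r, p) / zp ** 2
    return [[B[i][j] + (r[i] * z[j] + z[i] * r[j]) / zp - sigma * z[i] * z[j]
             for j in range(n)] for i in range(n)]

def z_choices(B, p, y):
    """The four choices of z listed above, keyed by the update they produce."""
    Bp = matvec(B, p)
    scale = math.sqrt(dot(p, Bp) / dot(p, y))
    return {
        "BFGS": [scale * yi + bi for yi, bi in zip(y, Bp)],
        "DFP": list(y),
        "SR1": [yi - bi for yi, bi in zip(y, Bp)],
        "PSB": list(p),
    }
```

Multiplying out $\bar Bp$ shows why $\sigma$ has this value: the $z$ terms contributed by the symmetric correction and by $\sigma zz^T$ cancel, leaving $Bp + (y - Bp) = y$.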

• Update for Sparse Hessian III

Update with desired sparse pattern

$$\bar B = B + \sum_{i=1}^n e_ie_i^T\frac{(y - Bp)z_i^T}{z_i^Tp}, \quad z_i = D_iz$$

where $D_i$ is the diagonal matrix whose $j$th component is 1 if $B_{ij} \neq 0$ and 0 if $B_{ij} = 0$.

Symmetrization

$$\bar B = B + \sum_{i=1}^n \lambda_i(e_iz_i^T + z_ie_i^T)$$

where $\lambda_i$ is chosen to assure the QN condition, satisfying

$S\lambda = y - Bp$, where $S = \sum_{i=1}^n \left(z_i^Tp\,e_ie_i^T + e_i^Tp\,z_ie_i^T\right)$.

When $z = y\sqrt{p^TBp/p^Ty} + Bp$, $y$, $y - Bp$, $p$, we get a sparse version of BFGS, DFP, SR1, PSB, respectively.

Drawback of symmetrization method

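• Update for Sparse Hessian IV

Frobenius Norm Projection: $\|X\|_F^2 \equiv \sum_{i,j=1}^n X_{ij}^2 = \mathrm{Tr}(X^TX)$

Define $\bar B = B + E$, and consider the problem

$\min\ \tfrac12\|E\|_F^2 = \tfrac12\mathrm{Tr}(E^TE)$

s.t. $Ep = 0$

$E_{ij} = -B_{ij}$, $(i,j) \in K$

$E = E^T$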

We get

E =1

2

n

i=1

eieTi (p

Ti + p