Conjugate Gradients II: CG and Variants


Transcript of Conjugate Gradients II: CG and Variants

Page 1: Conjugate Gradients II: CG and Variants

Conjugate Gradients II: CG and Variants

CS 205A: Mathematical Methods for Robotics, Vision, and Graphics

Doug James (and Justin Solomon)


Page 2: Conjugate Gradients II: CG and Variants

Idea

“Easy to apply, hard to invert.”

$$\min_{\vec{x}} \left[ \frac{1}{2}\,\vec{x}^\top A\,\vec{x} - \vec{b}^\top \vec{x} + c \right]$$
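Why minimizing this quadratic solves the system: for symmetric $A$,
$$\nabla f(\vec{x}) = A\vec{x} - \vec{b},$$
so $\nabla f(\vec{x}) = \vec{0}$ exactly when $A\vec{x} = \vec{b}$, and the residual $\vec{r} = \vec{b} - A\vec{x}$ is the negative gradient.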

Page 3: Conjugate Gradients II: CG and Variants

Line Search Along $\vec{d}$ from $\vec{x}$

$$\min_\alpha g(\alpha) \equiv f(\vec{x} + \alpha\vec{d})$$

$$\alpha = \frac{\vec{d}^\top(\vec{b} - A\vec{x})}{\vec{d}^\top A\vec{d}} = \frac{\vec{d}^\top\vec{r}}{\vec{d}^\top A\vec{d}}$$
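The closed-form step is easy to verify numerically; a minimal NumPy sketch (the matrix, sizes, and names below are illustrative, not from the lecture):

```python
import numpy as np

# Illustrative SPD system: A = M^T M + I is symmetric positive definite.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)
b = rng.standard_normal(5)

x = np.zeros(5)      # current iterate
d = b - A @ x        # a search direction (here, the residual)
r = b - A @ x        # residual at x

# Closed-form minimizer of g(alpha) = f(x + alpha d) from the slide.
alpha = (d @ r) / (d @ (A @ d))

# g'(alpha) = d^T (A(x + alpha d) - b) should vanish at the optimum.
assert abs(d @ (A @ (x + alpha * d) - b)) < 1e-10
```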

Page 4: Conjugate Gradients II: CG and Variants

Conjugate Directions

A-Conjugate Vectors
Two vectors $\vec{v}$ and $\vec{w}$ are A-conjugate if
$$\vec{v}^\top A\vec{w} = 0.$$

Corollary

Suppose $\{\vec{v}_1, \dots, \vec{v}_n\}$ are A-conjugate. Then $f$ is minimized in at most $n$ steps by line searching in direction $\vec{v}_1$, then direction $\vec{v}_2$, and so on.


Page 5: Conjugate Gradients II: CG and Variants

Another Clue

$$\vec{r}_k \equiv \vec{b} - A\vec{x}_k, \qquad \vec{r}_{k+1} = \vec{r}_k - \alpha_{k+1}A\vec{v}_{k+1}$$

Proposition
When performing gradient descent on $f$,
$$\mathrm{span}\,\{\vec{r}_0, \dots, \vec{r}_k\} = \mathrm{span}\,\{\vec{r}_0, A\vec{r}_0, \dots, A^k\vec{r}_0\}.$$


Page 6: Conjugate Gradients II: CG and Variants

Gradient Descent: Issue

$$\vec{x}_k - \vec{x}_0 \neq \operatorname*{arg\,min}_{\vec{v}\,\in\,\mathrm{span}\{\vec{r}_0, A\vec{r}_0, \dots, A^{k-1}\vec{r}_0\}} f(\vec{x}_0 + \vec{v})$$

But if this did hold... convergence in $n$ steps!


Page 7: Conjugate Gradients II: CG and Variants

Outline for CG

1. [Somehow] generate search direction $\vec{v}_k$ (initialize to $\vec{r}_0$)

2. Line search: $\alpha_k = \dfrac{\vec{v}_k^\top \vec{r}_{k-1}}{\vec{v}_k^\top A\vec{v}_k}$

3. Update estimate: $\vec{x}_k = \vec{x}_{k-1} + \alpha_k\vec{v}_k$

4. Update residual: $\vec{r}_k = \vec{r}_{k-1} - \alpha_k A\vec{v}_k$


Page 8: Conjugate Gradients II: CG and Variants

What We (Greedily) Want

1. An easy way to generate $n$ conjugate directions $\{\vec{v}_1, \dots, \vec{v}_n\}$

2. $\mathrm{span}\,\{\vec{v}_1, \dots, \vec{v}_k\} = \mathrm{span}\,\{\vec{r}_0, A\vec{r}_0, \dots, A^{k-1}\vec{r}_0\}$ for all $k$

Together these would give:

1. Convergence in $n$ steps

2. Always at least as good as gradient descent



Page 10: Conjugate Gradients II: CG and Variants

Method of Conjugate Directions

$$\vec{v}_k \leftarrow A^{k-1}\vec{r}_0 - \sum_{i<k} \frac{\langle A^{k-1}\vec{r}_0, \vec{v}_i\rangle_A}{\langle \vec{v}_i, \vec{v}_i\rangle_A}\,\vec{v}_i$$

(A-)Gram-Schmidt on $\vec{r}_0, A\vec{r}_0, \dots, A^{k-1}\vec{r}_0$
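This construction translates directly to code; a minimal sketch (the function name and tolerance are my own):

```python
import numpy as np

def conjugate_directions(A, b, x0):
    """Method of conjugate directions: A-Gram-Schmidt on the Krylov
    vectors r0, A r0, A^2 r0, ...; note it stores every direction."""
    x = x0.copy()
    r0 = b - A @ x0
    vs = []              # all previous directions -- the memory cost
    w = r0.copy()        # current Krylov vector A^{k-1} r0
    for _ in range(len(b)):
        # Subtract A-projections onto all earlier directions.
        v = w.copy()
        for vi in vs:
            v -= (w @ (A @ vi)) / (vi @ (A @ vi)) * vi
        if np.linalg.norm(v) < 1e-12:    # Krylov space exhausted
            break
        vs.append(v)
        # Exact line search along v, as on the earlier slide.
        r = b - A @ x
        x += (v @ r) / (v @ (A @ v)) * v
        w = A @ w        # next Krylov vector
    return x
```

By the corollary above, this reaches the minimizer in at most $n$ line searches; the next slide explains why it is still unsatisfactory.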


Page 11: Conjugate Gradients II: CG and Variants

Problems with Method of Conjugate Directions

- For large $k$, $A^{k-1}\vec{r}_0$ aligns with the dominant eigenvector of $A$, so the projection is poorly conditioned
- Have to store $\vec{v}_1, \dots, \vec{v}_{k-1}$


Page 12: Conjugate Gradients II: CG and Variants

Small Adjustment

Only need a search direction

$$\vec{w} \in \mathrm{span}\,\{\vec{r}_0, \dots, A^{k-1}\vec{r}_0\} \setminus \mathrm{span}\,\{\vec{r}_0, \dots, A^{k-2}\vec{r}_0\}.$$

$$\vec{r}_k = \vec{r}_{k-1} - \alpha_k A\vec{v}_k \;\Longrightarrow\; \text{take } \vec{v}_k = \vec{r}_{k-1}.$$

$$\vec{v}_k \leftarrow \vec{r}_{k-1} - \sum_{i<k} \frac{\langle \vec{r}_{k-1}, \vec{v}_i\rangle_A}{\langle \vec{v}_i, \vec{v}_i\rangle_A}\,\vec{v}_i$$

Fixed conditioning, but still have to store the $\vec{v}_i$'s.



Page 16: Conjugate Gradients II: CG and Variants

Remarkable Observation

Proposition

$$\langle \vec{r}_k, \vec{v}_\ell\rangle_A = 0 \text{ for all } \ell < k.$$

Warning: the proof is technical.

So only the $i = k-1$ term of the Gram-Schmidt sum survives:

$$\vec{v}_k \leftarrow \vec{r}_{k-1} - \frac{\langle \vec{r}_{k-1}, \vec{v}_{k-1}\rangle_A}{\langle \vec{v}_{k-1}, \vec{v}_{k-1}\rangle_A}\,\vec{v}_{k-1}$$

No memory!



Page 18: Conjugate Gradients II: CG and Variants

Conjugate Gradients Algorithm

1. Update search direction: $\vec{v}_k = \vec{r}_{k-1} - \dfrac{\langle \vec{r}_{k-1}, \vec{v}_{k-1}\rangle_A}{\langle \vec{v}_{k-1}, \vec{v}_{k-1}\rangle_A}\,\vec{v}_{k-1}$

2. Line search: $\alpha_k = \dfrac{\vec{v}_k^\top \vec{r}_{k-1}}{\vec{v}_k^\top A\vec{v}_k}$

3. Update estimate: $\vec{x}_k = \vec{x}_{k-1} + \alpha_k\vec{v}_k$

4. Update residual: $\vec{r}_k = \vec{r}_{k-1} - \alpha_k A\vec{v}_k$
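These four updates translate nearly line-for-line into code; a minimal sketch assuming a dense SPD NumPy array (the residual test is my own addition):

```python
import numpy as np

def cg_basic(A, b, x0, n_steps):
    """CG as stated on this slide: single-term A-Gram-Schmidt for the
    search direction, then line search, estimate, and residual updates."""
    x = x0.copy()
    r = b - A @ x            # r_0
    v = r.copy()             # v_1 = r_0
    for _ in range(n_steps):
        Av = A @ v
        alpha = (v @ r) / (v @ Av)     # 2. line search
        x = x + alpha * v              # 3. update estimate
        r = r - alpha * Av             # 4. update residual
        if np.linalg.norm(r) < 1e-14:  # converged (at most n steps)
            break
        # 1. next search direction, conjugated against the previous v only
        v = r - (r @ Av) / (v @ Av) * v
    return x
```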


Page 19: Conjugate Gradients II: CG and Variants

Properties

- $\vec{x}_k$ is optimal in the subspace spanned by the first $k$ directions $\vec{v}_i$
- $f(\vec{x}_k)$ is upper-bounded by $f$ of the $k$-th iterate of gradient descent
- Converges in $n$ steps



Page 22: Conjugate Gradients II: CG and Variants

Visualization

[Figure shown in lecture; not reproduced in this transcript.]

Page 23: Conjugate Gradients II: CG and Variants

Nicer Form

See book.

- Update search direction: $\beta_k = \dfrac{\vec{r}_{k-1}^\top\vec{r}_{k-1}}{\vec{r}_{k-2}^\top\vec{r}_{k-2}}$, $\quad \vec{v}_k = \vec{r}_{k-1} + \beta_k\vec{v}_{k-1}$

- Line search: $\alpha_k = \dfrac{\vec{r}_{k-1}^\top\vec{r}_{k-1}}{\vec{v}_k^\top A\vec{v}_k}$

- Update estimate: $\vec{x}_k = \vec{x}_{k-1} + \alpha_k\vec{v}_k$

- Update residual: $\vec{r}_k = \vec{r}_{k-1} - \alpha_k A\vec{v}_k$
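This form is what implementations typically use, since it needs only one matrix-vector product per iteration; a minimal sketch, with the relative-residual stopping test from the next slide (keyword names are my own):

```python
import numpy as np

def cg(A, b, x0=None, eps=1e-8, maxiter=None):
    """Standard CG ("nicer form") for symmetric positive definite A."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    v = r.copy()
    rr = r @ r                            # r_{k-1}^T r_{k-1}
    r0_norm = np.sqrt(rr)
    for _ in range(maxiter or n):
        if np.sqrt(rr) < eps * r0_norm:   # ||r_k||_2 / ||r_0||_2 < eps
            break
        Av = A @ v                        # the one matrix-vector product
        alpha = rr / (v @ Av)             # line search
        x += alpha * v                    # update estimate
        r -= alpha * Av                   # update residual
        rr_new = r @ r
        v = r + (rr_new / rr) * v         # beta_k = rr_new / rr
        rr = rr_new
    return x
```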

Page 24: Conjugate Gradients II: CG and Variants

Typical Stopping Conditions

Don’t run to completion!

$$\frac{\|\vec{r}_k\|_2}{\|\vec{r}_0\|_2} < \varepsilon$$


Page 25: Conjugate Gradients II: CG and Variants

Convergence

$$\frac{f(\vec{x}_k) - f(\vec{x}^*)}{f(\vec{x}_0) - f(\vec{x}^*)} \leq 2\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^k$$

Still depends on the condition number! For example, $\kappa = 100$ gives a contraction factor of $\frac{10 - 1}{10 + 1} = \frac{9}{11} \approx 0.82$ per iteration.



Page 27: Conjugate Gradients II: CG and Variants

Observation

$$\mathrm{cond}\,A \neq \mathrm{cond}\,PA$$

But we can solve

$$PA\vec{x} = P\vec{b}$$


Page 28: Conjugate Gradients II: CG and Variants

Preconditioning

Solve $PA\vec{x} = P\vec{b}$ for $P \approx A^{-1}$.


Page 29: Conjugate Gradients II: CG and Variants

Two Issues

- $PA$ may not be symmetric or positive definite
- Need to find an “easy” $P$


Page 30: Conjugate Gradients II: CG and Variants

Symmetrization

$P$ symmetric and positive definite $\Longrightarrow P^{-1} = EE^\top$.

Proposition

$PA$ and $E^{-1}AE^{-\top}$ have the same eigenvalues.

1. Solve $E^{-1}AE^{-\top}\vec{y} = E^{-1}\vec{b}$

2. Solve $\vec{x} = E^{-\top}\vec{y}$
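The proposition is easy to confirm numerically; an illustrative NumPy sketch with random SPD $A$ and $P$:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
A = M.T @ M + np.eye(5)          # SPD system matrix
N = rng.standard_normal((5, 5))
P = N.T @ N + np.eye(5)          # SPD preconditioner

# Factor P^{-1} = E E^T (Cholesky of the inverse).
E = np.linalg.cholesky(np.linalg.inv(P))
Einv = np.linalg.inv(E)

S = Einv @ A @ Einv.T            # E^{-1} A E^{-T}: SPD, so CG applies
eig_PA = np.sort(np.linalg.eigvals(P @ A).real)  # PA is not symmetric
eig_S = np.linalg.eigvalsh(S)                    # returned in ascending order
assert np.allclose(eig_PA, eig_S)                # same spectrum
```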



Page 33: Conjugate Gradients II: CG and Variants

Better-Conditioned CG

Update search direction: $\beta_k = \dfrac{\vec{r}_{k-1}^\top\vec{r}_{k-1}}{\vec{r}_{k-2}^\top\vec{r}_{k-2}}$, $\quad \vec{v}_k = \vec{r}_{k-1} + \beta_k\vec{v}_{k-1}$

Line search: $\alpha_k = \dfrac{\vec{r}_{k-1}^\top\vec{r}_{k-1}}{\vec{v}_k^\top E^{-1}AE^{-\top}\vec{v}_k}$

Update estimate: $\vec{y}_k = \vec{y}_{k-1} + \alpha_k\vec{v}_k$

Update residual: $\vec{r}_k = \vec{r}_{k-1} - \alpha_k E^{-1}AE^{-\top}\vec{v}_k$


Page 34: Conjugate Gradients II: CG and Variants

Preconditioned CG

Update search direction: $\beta_k = \dfrac{\vec{r}_{k-1}^\top P\vec{r}_{k-1}}{\vec{r}_{k-2}^\top P\vec{r}_{k-2}}$, $\quad \vec{v}_k = P\vec{r}_{k-1} + \beta_k\vec{v}_{k-1}$

Line search: $\alpha_k = \dfrac{\vec{r}_{k-1}^\top P\vec{r}_{k-1}}{\vec{v}_k^\top A\vec{v}_k}$

Update estimate: $\vec{x}_k = \vec{x}_{k-1} + \alpha_k\vec{v}_k$

Update residual: $\vec{r}_k = \vec{r}_{k-1} - \alpha_k A\vec{v}_k$


Page 35: Conjugate Gradients II: CG and Variants

Common Preconditioners

- Diagonal (“Jacobi”): $A \approx D$
- Sparse approximate inverse
- Incomplete Cholesky: $A \approx L_* L_*^\top$
- Domain decomposition
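Putting the preconditioned updates and the diagonal (Jacobi) preconditioner together; a minimal sketch where `apply_P` is a hypothetical callable applying $P \approx A^{-1}$ to a vector:

```python
import numpy as np

def pcg(A, b, apply_P, x0=None, eps=1e-8, maxiter=None):
    """Preconditioned CG following the update rules two slides back."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x
    z = apply_P(r)                  # z_k = P r_k
    v = z.copy()
    rz = r @ z                      # r_{k-1}^T P r_{k-1}
    r0_norm = np.linalg.norm(r)
    for _ in range(maxiter or n):
        if np.linalg.norm(r) < eps * r0_norm:
            break
        Av = A @ v
        alpha = rz / (v @ Av)       # line search
        x += alpha * v              # update estimate
        r -= alpha * Av             # update residual
        z = apply_P(r)
        rz_new = r @ z
        v = z + (rz_new / rz) * v   # beta_k = rz_new / rz
        rz = rz_new
    return x

# Jacobi (diagonal) preconditioner: apply P = D^{-1} with D = diag(A).
# x = pcg(A, b, lambda r: r / np.diag(A))
```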


Page 36: Conjugate Gradients II: CG and Variants

Other Iterative Schemes I

- Splitting: $A = M - N \Longrightarrow M\vec{x} = N\vec{x} + \vec{b}$
- Conjugate gradient normal equation residual (CGNR): $A^\top A\vec{x} = A^\top\vec{b}$
- Conjugate gradient normal equation error (CGNE): $AA^\top\vec{y} = \vec{b}$; $\vec{x} = A^\top\vec{y}$
- MINRES, SYMLQ: minimize $g(\vec{x}) \equiv \|\vec{b} - A\vec{x}\|_2^2$ for symmetric $A$
- LSQR, LSMR: normal equations, same $g$

Page 37: Conjugate Gradients II: CG and Variants

Other Iterative Schemes II

- GMRES, QMR, BiCG, CGS, BiCGStab: any invertible $A$
- Fletcher-Reeves, Polak-Ribière: nonlinear problems; replace the residual with $-\nabla f$ and add back the line search
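For reference, most of these solvers ship in SciPy's scipy.sparse.linalg module; an illustrative sketch (default tolerances and keyword names vary across SciPy versions, so treat the calls as approximate):

```python
import numpy as np
import scipy.sparse.linalg as spla

rng = np.random.default_rng(3)
M = rng.standard_normal((50, 50))
A = M.T @ M + np.eye(50)         # SPD, so every solver below applies
b = rng.standard_normal(50)

x_cg, info = spla.cg(A, b)       # conjugate gradients: SPD A
x_mr, info = spla.minres(A, b)   # symmetric (possibly indefinite) A
x_gm, info = spla.gmres(A, b)    # any invertible A
x_ls = spla.lsqr(A, b)[0]        # least-squares / normal equations

assert np.allclose(A @ x_cg, b, atol=1e-3)
```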
