Dependable direct solutions for linear systems using a little extra precision

29
Dependable direct solutions for linear systems using a little extra precision E. Jason Riedy [email protected] CSE Division, College of Computing Georgia Institute of Technology 21 August, 2009 Jason Riedy (GATech) Dependable solver 21 Aug, 2009 1 / 28

description

Georgia Tech CSE Seminar, Friday, August 21,2009 Solving a square linear system Ax=b often is considered a black box. It's supposed to "just work," and failures often are blamed on the original data or subtleties of floating-point. Now that we have an abundance of cheap computations, however, we can do much better. A little extra precision in just the right places produces accurate solutions cheaply or demonstrates when problems are too hard to solve without significant cost. This talk will outline the method, iterative refinement with a new twist; the benefits, small backward and forward errors; and the trade-offs and unexpected benefits.

Transcript of Dependable direct solutions for linear systems using a little extra precision

Page 1: Dependable direct solutions for linear systems using a little extra precision

Dependable direct solutions for linear systems

using a little extra precision

E. Jason [email protected]

CSE Division, College of ComputingGeorgia Institute of Technology

21 August, 2009

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 1 / 28

Page 2: Dependable direct solutions for linear systems using a little extra precision

Ax = b

Workhorse in scientific computingTwo primary linear algebra problems: Ax = b, Av = λv

Many applications reduce problems into those, often Ax = b.I PDEs: Discretize to one of the above.I Optimization: Solve one at each step.I ...

Commercial supercomputers are built for Ax = b: Linpack

Many people work to solve Ax = b faster.

Today’s focus is solving it better.

(I’m oversimplifying in many ways. And better can lead to faster.)

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 2 / 28

Page 3: Dependable direct solutions for linear systems using a little extra precision

Outline

1 What do I mean by “better”?

2 Refining to more accurate solutions with extra precision

3 Other applications of better: faster, more scalable

Most of this work was done at/with UC Berkeley in conjunction withYozo Hida, Dr. James Demmel, Dr. Xiaoye Li (LBL), and a longsequence of then-undergrads (M. Vishvanath, D. Lu, D. Halligan, ...).

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 3 / 28

Page 4: Dependable direct solutions for linear systems using a little extra precision

Errors in Ax = b

(A, b)

(A, b)

x (0)

x

A−1 b

A−1b

Difficulty

Bac

kwar

d E

rror

2^−50

2^−40

2^−30

2^−20

2^−10

2^10 2^20 2^30

backward error (berr)

Difficulty

For

war

d E

rror

2^−25

2^−20

2^−15

2^−10

2^−5

2^5 2^10 2^15 2^20 2^25 2^30

forward error (ferr)Jason Riedy (GATech) Dependable solver 21 Aug, 2009 4 / 28

Page 5: Dependable direct solutions for linear systems using a little extra precision

Goals for errors in Ax = b

(A, b)

(A, b)

x (0)

xx (k)

x (k)A−1 b

A−1b

Difficulty

Bac

kwar

d E

rror

2^−50

2^−40

2^−30

2^−20

2^−10

2^10 2^20 2^30

backward error (berr)

Difficulty

For

war

d E

rror

2^−25

2^−20

2^−15

2^−10

2^−5

2^5 2^10 2^15 2^20 2^25 2^30

forward error (ferr)Jason Riedy (GATech) Dependable solver 21 Aug, 2009 5 / 28

Page 6: Dependable direct solutions for linear systems using a little extra precision

Possible methods

Interval arithmeticI Tells you when there’s a problem, not how to solve it.I Finding the optimal enclosure is NP-hard!

Exact / rational arithmeticI Storage (& computation) grows exponentially with dimension.

Telescoping precisions (increase precision throughout)I Increases the cost of the O(n3) portion.

Iterative refinementI O(n2) extra work after O(n3) factorization.I Only a little extra precision necessary!I Downside: Dependable, but not validated.

Dependable solverReduce the error to the precision’s limit as often as reasonable, or

clearly indicate when the result is unsure.Jason Riedy (GATech) Dependable solver 21 Aug, 2009 6 / 28

Page 7: Dependable direct solutions for linear systems using a little extra precision

What I’m not going to explain deeply

Precise definition of difficulty:I A condition number relevant to the error in consideration, or,I roughly, the error measure’s sensitivity to perturbation near the

solution.

Numerical scaling / equilibration:I Assume all numbers in the input are roughly in the same scale.I Rarely true for computer-produced problems.I Common cases easy to handle; obscures the important points.I Note: Poor scaling produces simple ill-conditioning.

Details of when each error measure is appropriate.I Backward: normwise, columnwise, componentwise, ...I Forward: normwise, componentwise, ...

All three are inter-linked and address norms.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 7 / 28

Page 8: Dependable direct solutions for linear systems using a little extra precision

Outline

1 What do I mean by “better”?

2 Refining to more accurate solutions with extra precision

3 Other applications of better: faster, more scalable

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 8 / 28

Page 9: Dependable direct solutions for linear systems using a little extra precision

Iterative refinement

Newton’s method for Ax = bAssume A is n × n, non-singular, factored PA = LU , etc.

1 Solve Ax (0) = b2 Repeat for i = 0, ...:

1 Compute residual r (i) = b − Ax (i).2 (Check backward error criteria)3 Solve Adx (i) = r (i).4 (Check forward error criteria)5 Update x (i+1) = x (i) + dx (i).

Overall algorithm is well-known (Forsythe & Moler, 1967...).

In exact arithmetic, would converge in one step.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 9 / 28

Page 10: Dependable direct solutions for linear systems using a little extra precision

Iterative refinement

Newton’s method for Ax = bAssume A is n × n, non-singular, factored PA = LU , etc.

1 Solve Ax (0) = b2 Repeat for i = 0, ...:

1 Compute residual r (i) = b − Ax (i). (Using double precision.)2 (Check backward error criteria)3 Solve Adx (i) = r (i). (Using working/single precision.)4 (Check forward error criteria)5 Update x (i+1) = x (i) + dx (i). (New: x with double precision.)

No extra precision: Reduce backward error in one step [Skeel].

A bit of double precision: Reduce errors much, much further.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 9 / 28

Page 11: Dependable direct solutions for linear systems using a little extra precision

Why should this work?

A brief, informal excursion into the analysis...

r (i) = b − Ax (i) +δr (i)

(A +δA(i))dx (i) = r (i)

x (i+1) = x (i) + dx (i) +δx (i+1)

Very roughly (not correct, approximating behavior, see Lawn165):

Backward Error (Residual)

r (i+1) ≈ εwA A−1 r (i) + A δx (i) + δr (i)

Forward Error

e(i+1) ≈ εwA−1 A e(i) + δx (i) + A−1 δr (i)

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 10 / 28

Page 12: Dependable direct solutions for linear systems using a little extra precision

Test cases

One million random, single-precision, 30× 30 systems Ax = bI 250k: A generated to cover factorization difficultyI Four (x , b), two with random x and two with random bI Solve for true x using greater than quad precision.I Working precision: εw = 2−24 ≈ 6× 10−8

I Extra / double precision: εx = 2−53 ≈ 10−16

Using single precision and small becauseI generating and running one million tests takes time, and alsoI it’s easier to hit difficult cases!

Results apply to double, complex & double complex (with 2√

2factor). Also on tests (fewer) running beyond 1k × 1k .

Note: Same plots apply to sparse matrices in various collections,but far fewer than 1M test cases.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 11 / 28

Page 13: Dependable direct solutions for linear systems using a little extra precision

Backward error results (before)

Difficulty

Bac

kwar

d E

rror

2^−50

2^−40

2^−30

2^−20

2^−10

2^10 2^20 2^30

All working

2^10 2^20 2^30

All double

Omitting double-prec residuals; same limiting error as allworking.

All-double backward error is for the double-prec x .Jason Riedy (GATech) Dependable solver 21 Aug, 2009 12 / 28

Page 14: Dependable direct solutions for linear systems using a little extra precision

Backward error results (after)

Difficulty

Bac

kwar

d E

rror

2^−50

2^−40

2^−30

2^−20

2^−10

2^10 2^20 2^30

All working

2^10 2^20 2^30

All double

Omitting double-prec residuals; same limiting error as allworking.

All-double backward error is for the double-prec x .Jason Riedy (GATech) Dependable solver 21 Aug, 2009 13 / 28

Page 15: Dependable direct solutions for linear systems using a little extra precision

Forward error results (before)

Difficulty

For

war

d E

rror

2^−25

2^−20

2^−15

2^−10

2^−5

2^5 2^10 2^15 2^20 2^25 2^30

All working

2^5 2^10 2^15 2^20 2^25 2^30

All double

Omitting double-prec residuals; same limiting error as allworking.

All-double forward error is for the single-prec x .Jason Riedy (GATech) Dependable solver 21 Aug, 2009 14 / 28

Page 16: Dependable direct solutions for linear systems using a little extra precision

Forward error results (after)

Difficulty

For

war

d E

rror

2^−25

2^−20

2^−15

2^−10

2^−5

2^5 2^10 2^15 2^20 2^25 2^30

All working

2^5 2^10 2^15 2^20 2^25 2^30

All double

Omitting double-prec residuals; same limiting error as allworking.

All-double forward error is for the single-prec x .Jason Riedy (GATech) Dependable solver 21 Aug, 2009 15 / 28

Page 17: Dependable direct solutions for linear systems using a little extra precision

Iteration costs: backward error

Convergence to εw

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

0 5 10 15 20 25 30

All workingResidual doubleAll double

Convergence to 10εw

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

1.0 1.5 2.0 2.5 3.0

All workingResidual doubleAll double

Practical: Stop when backward error is tiny or makes little progress.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 16 / 28

Page 18: Dependable direct solutions for linear systems using a little extra precision

Iteration costs: backward error

Convergence to ε2w

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

5 10 15 20 25 30

All workingResidual doubleAll double

Convergence to 10ε2w

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

5 10 15 20 25 30

All workingResidual doubleAll double

Practical: Stop when backward error is tiny or makes little progress.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 17 / 28

Page 19: Dependable direct solutions for linear systems using a little extra precision

Iteration costs: forward error

Convergence to εw

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

5 10 15 20 25 30

All workingResidual doubleAll double

Convergence to√

N · εw

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

0 5 10 15 20 25 30

All workingResidual doubleAll double

Practical: Stop when dx is tiny or makes little progress.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 18 / 28

Page 20: Dependable direct solutions for linear systems using a little extra precision

Performance costs

Overhead each phase by precision and type

Overhead is time for phase (incl. fact.) / time for factorization

Dimension

Ove

rhead

2

4

6

10^1.0 10^1.5 10^2.0 10^2.5 10^3.0 10^3.5

singlereal

doublereal

singlecomplex

10^1.0 10^1.5 10^2.0 10^2.5 10^3.0 10^3.5

2

4

6

doublecomplex

Factorization+ Refinement loop

+ Condition numbers+ Ref. + Cond.

Itanium 2Relatively balancedcpu / mem arch.

Double faster thansingle

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 19 / 28

Page 21: Dependable direct solutions for linear systems using a little extra precision

Performance costs

Overhead each phase by precision and type

Overhead is time for phase (incl. fact.) / time for factorization

Dimension

Ove

rhead

2

4

6

8

10^1.0 10^1.5 10^2.0 10^2.5 10^3.0 10^3.5

singlereal

doublereal

singlecomplex

10^1.0 10^1.5 10^2.0 10^2.5 10^3.0 10^3.5

2

4

6

8

doublecomplex

Factorization+ Refinement loop

+ Condition numbers+ Ref. + Cond.

Xeon 3ghzHorribly unbalancedcpu / mem arch.

(Not parallel)

Vector instructions

No vectorization inextra precision ops.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 20 / 28

Page 22: Dependable direct solutions for linear systems using a little extra precision

Outline

1 What do I mean by “better”?

2 Refining to more accurate solutions with extra precision

3 Other applications of better: faster, more scalable

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 21 / 28

Page 23: Dependable direct solutions for linear systems using a little extra precision

Obvious applications of better

Available in LapackRoutines SGESVXX, DGESVXX, CGESVXX, ZGESVXX

Experimental interface, subject to changes

High-level environmentsDo you want to think about all error conditions all the time?

Should be in Octave & MatlabTM:

x = A\b;The same technique applies to overdetermined least-squares[Lawn188; Demmel, Hida, Li, Riedy]. R or S+ (statistics):

model <- lm(response~var)

I Refine the augmented system

[A αI0 AT

] [x

r/α

]=

[b0

]. [Bjorck]

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 22 / 28

Page 24: Dependable direct solutions for linear systems using a little extra precision

Not so obvious application: Speed!

When single precision is much faster than double...Assume: Targeting backward error, often well-conditioned

Factor A in single precision, use for Adxi = r .

Refine to dp backward error, or fall back to using dp overall.

Earlier Cell (extra slow double): 12 Gflop/s ⇒ 150 Gflop/s![Lawn175; Langou2, Luszczek, Kurzak, Buttari, Dongarra]

(Independent path to the same destination.)

When single precision fits more into memory...Sparse, sparse out-of-core

I Generally limited by indexing performance [Hogg & Scott]I Could use packed data structures from Cell [Williams, et al.]

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 23 / 28

Page 25: Dependable direct solutions for linear systems using a little extra precision

Not so obvious application: Scalability!

When pivoting is a major bottleneck...Sparse, unsymmetric LU factorization:

I Completely separate structural analysis from numerical work.I Introduce backward errors to avoid entry growth.I Fix with refinement.I (SuperLU [Demmel, Li, (+ me)], earlier sym.indef. work)

When pivoting blocks practical theory...

Communication-optimal algorithms for O(n3) linear algebraI Trade some computation for optimal memory transfers / comm.

[Lawn218; Ballard, Demmel, Holtz, Schwartz]I Codes exist, are fast, etc.

But LU cannot use partial pivoting!I Use a new strategy [Demmel, Grigori, Xiang], refine...

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 24 / 28

Page 26: Dependable direct solutions for linear systems using a little extra precision

Summary

We can construct an inexpensive, dependable solver for Ax = b.I Compute an accurate answer whenever feasible.I Reliably detect failures / unsure, even for the forward error.

We can compute better results for Ax = b.I Trade some computation, a little bandwidth for accuracy.I Important bit is keeping all the limiting terms (residual,

solution) to extra precision

Better results can help solve Ax = b more quickly.I Start with a sloppy solver and fix it.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 25 / 28

Page 27: Dependable direct solutions for linear systems using a little extra precision

Questions / Backup

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 26 / 28

Page 28: Dependable direct solutions for linear systems using a little extra precision

Doubled-precision

Represent a ◦ b exactly as a pair (h, t).

Old algorithms [Knuth, Dekker, Linnainmaa, Kahan; 60s & 70s]

Work on any faithful arithmetic [Priest]

Additionh = a + b

z = h − a

t = (a− (h− z)) + (b− z)

Multiplicationh = a · b(ah, at) = split(a)

(bh, bt) = split(b)

t = ah · at − h

t = ((t + (ah ∗ bt)) +(at ∗ bh)) + (at ∗ bt)

See qd package from [Bailey, Hida, Li]; recent pubs from [Rump,Ogita, Oishi].

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 27 / 28

Page 29: Dependable direct solutions for linear systems using a little extra precision

Iteration costs: backward error to double

Convergence to εx

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

5 10 15 20 25 30

All workingResidual doubleAll double

Convergence to 10εx

Convergence step

Em

piric

al C

DF

0.0

0.2

0.4

0.6

0.8

1.0

5 10 15 20 25 30

All workingResidual doubleAll double

Practical: Stop when backward error is tiny or makes little progress.

Jason Riedy (GATech) Dependable solver 21 Aug, 2009 28 / 28