Iterative Methods for Linear Systems

Eran Treister

Computer Science Department, Ben-Gurion University of the Negev, Israel.

March 31, 2019



Iterative methods for linear systems

Definition

An iterative method for solving $Ax = b$ is defined as

$$x^{(k+1)} = \varphi(x^{(k)}),$$

which “looks” only at the previous iterate, or more generally as

$$x^{(k+1)} = \varphi(x^{(k)}, \ldots, x^{(0)}),$$

which may use all previous iterates.

Initialize with an arbitrary guess $x^{(0)}$.

Iteratively improve this guess until the solution of the linear system is achieved up to some accuracy.

Usually applied when direct methods are too expensive or impossible to use.


Requirements for Iterative methods

The first requirement from $\varphi$ is that it converges:

$$\lim_{k\to\infty} x^{(k)} = x^*,$$

where $Ax^* = b$.

The second requirement is that the method converges as fast as possible. We define the convergence rate through

$$\lim_{k\to\infty} \frac{\|x^{(k+1)} - x^*\|}{\|x^{(k)} - x^*\|^p} = C,$$

where $p$ is the order of convergence and $C$ is called the convergence factor. For linear convergence ($p = 1$), the error shrinks only if $C < 1$.
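The definition of the order and the factor can be tried out on a recorded error history. The following is a minimal sketch, assuming the error norms are already available; the function name `estimate_order_and_factor` and both error sequences are hypothetical illustrations, not taken from the slides.

```python
import math

def estimate_order_and_factor(errors):
    """Estimate (p, C) from the last three error norms of a convergent run."""
    e0, e1, e2 = errors[-3:]
    # If e_{k+1} ~ C * e_k**p, then log(e2 / e1) ~ p * log(e1 / e0).
    p = math.log(e2 / e1) / math.log(e1 / e0)
    C = e2 / e1 ** p
    return p, C

# Hypothetical error histories, chosen only to illustrate the definition.
linear = [0.5 ** k for k in range(1, 8)]       # e_{k+1} = 0.5 * e_k
quadratic = [0.5, 0.25, 0.0625, 0.00390625]    # e_{k+1} = e_k ** 2

p_lin, C_lin = estimate_order_and_factor(linear)
p_quad, C_quad = estimate_order_and_factor(quadratic)
print("linear:   ", p_lin, C_lin)
print("quadratic:", p_quad, C_quad)
```

The estimate uses only three consecutive errors, so on noisy histories it is a rough indicator at best.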


Error and Residual

Definition (Error vector)

The vector

$$e^{(k)} = x^* - x^{(k)}$$

is called the error vector at iteration $k$.

For convergence, it should hold that $\lim_{k\to\infty} e^{(k)} = 0$.

Definition (Residual vector)

The vector

$$r^{(k)} = b - Ax^{(k)} = Ae^{(k)}$$

is called the residual vector.


The key difference between the two is that we cannot measure the error without knowing the solution, but we can measure the residual. Note that convergence means

$$\lim_{k\to\infty} e^{(k)} = \lim_{k\to\infty} r^{(k)} = 0.$$
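The relation between the two vectors can be checked numerically. This is a small sketch, assuming a 2×2 system whose solution is known in advance; the helper `matvec` and the trial point `x` are hypothetical and serve only as an illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]
x_star = [1.0, 1.0]   # exact solution of A x = b, known here by construction
x = [0.5, 2.0]        # hypothetical current iterate

# The error needs x_star; the residual needs only A, b and x.
error = [xs - xi for xs, xi in zip(x_star, x)]
residual = [bi - axi for bi, axi in zip(b, matvec(A, x))]
A_times_error = matvec(A, error)

print("error:   ", error)
print("residual:", residual)
print("A*error: ", A_times_error)
```

In practice `x_star` is unavailable, which is why stopping tests are built on the residual.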


Simple iterative methods

Split the matrix $A$ into two: $A = M + N$. Then the linear system is written as

$$Mx + Nx = b.$$

The iteration is defined as

$$x^{(k+1)} = M^{-1}(b - Nx^{(k)}) = x^{(k)} + M^{-1}(b - Ax^{(k)}),$$

where the second form follows from $N = A - M$.

Remark

$M$ (called the preconditioner) is “inverted” every iteration, that is, a linear system with $M$ is solved.

The cost of the solution is naturally the number of iterations times the work needed to “invert” $M$.


Practical stopping conditions

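We usually stop iterating if one of the following is satisfied for some tolerance $\varepsilon$:

$$\frac{\|Ax^{(k)} - b\|}{\|b\|} < \varepsilon \quad\text{or}\quad \frac{\|x^{(k)} - x^{(k-1)}\|}{\|x^{(k)}\|} < \varepsilon.$$

The first criterion indicates that the residual is small enough relative to the residual of the zero guess $x = 0$, which is $b$.

The second criterion indicates that the relative change between consecutive iterates is small enough.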
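Both criteria translate directly into code. The sketch below assumes the Euclidean norm and dense list-of-rows matrices; the function names are hypothetical, and the code assumes $b \neq 0$ and $x^{(k)} \neq 0$ so that the denominators are nonzero.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

def relative_residual(A, b, x):
    """||Ax - b|| / ||b||; assumes b is not the zero vector."""
    r = [axi - bi for axi, bi in zip(matvec(A, x), b)]
    return norm2(r) / norm2(b)

def relative_change(x_new, x_old):
    """||x_new - x_old|| / ||x_new||; assumes x_new is not the zero vector."""
    d = [a - c for a, c in zip(x_new, x_old)]
    return norm2(d) / norm2(x_new)

def should_stop(A, b, x_new, x_old, eps):
    """True if either of the two stopping criteria is satisfied."""
    return (relative_residual(A, b, x_new) < eps
            or relative_change(x_new, x_old) < eps)
```

A small relative change does not by itself guarantee a small error for a slowly converging method, so the residual test is often the safer of the two.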


General Iterative Method

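Input: $A \in \mathbb{R}^{n\times n}$, $b \in \mathbb{R}^n$, $x^{(0)} \in \mathbb{R}^n$, $M, N \in \mathbb{R}^{n\times n}$, maxIter, $\varepsilon$, convergence criterion

Output: $x$ such that $Ax \approx b$

For $k = 1, \ldots,$ maxIter:

  Apply the iteration: $x^{(k)} = M^{-1}(b - Nx^{(k-1)})$ or, equivalently, $x^{(k)} = x^{(k-1)} + M^{-1}(b - Ax^{(k-1)})$.

  If $\|Ax^{(k)} - b\| / \|b\| < \varepsilon$, or alternatively $\|x^{(k)} - x^{(k-1)}\| / \|x^{(k)}\| < \varepsilon$: convergence is reached, stop the iterations.

Return $x^{(k)}$ as the solution.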
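The algorithm above can be sketched in a few lines. This is one possible realization, not the course's reference code: the preconditioner is passed as a hypothetical callable `apply_M_inv` that returns $M^{-1}r$, the stopping test uses only the relative residual, and the 3×3 test system with a diagonal $M$ is an assumption made for the demonstration.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def norm2(v):
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(vi * vi for vi in v))

def iterative_solve(A, b, x0, apply_M_inv, max_iter=100, eps=1e-8):
    """Run x <- x + M^{-1}(b - A x) until ||Ax - b|| / ||b|| < eps."""
    x = list(x0)
    b_norm = norm2(b)                      # assumes b is not the zero vector
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    for k in range(1, max_iter + 1):
        dx = apply_M_inv(r)                # "inverting" M: solves M dx = r
        x = [xi + di for xi, di in zip(x, dx)]
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        if norm2(r) / b_norm < eps:
            return x, k                    # converged after k iterations
    return x, max_iter                     # budget exhausted

# Demonstration with M = diag(A), which is cheap to invert.
A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]
diag_inverse = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]

x, iters = iterative_solve(A, b, [1.0, 2.0, 2.0], diag_inverse)
print(x, iters)
```

Passing the preconditioner as a function keeps the loop independent of how $M$ is stored or solved, which is the part that differs between methods.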


The Jacobi method

Example

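Assume that we need to solve $Ax = b$:

$$\begin{aligned} 4x_1 - x_2 + x_3 &= 7 \\ 4x_1 - 8x_2 + x_3 &= -21 \\ -2x_1 + x_2 + 5x_3 &= 15 \end{aligned} \tag{1}$$

Rewrite:

$$\begin{bmatrix} 4 & -1 & 1 \\ 4 & -8 & 1 \\ -2 & 1 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 7 \\ -21 \\ 15 \end{bmatrix} \tag{2}$$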


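$A$ is diagonally dominant, so it can be approximated well by a diagonal matrix. Let us split the matrix into its diagonal, strictly lower triangular, and strictly upper triangular parts:

$$A = D + L + U = \begin{bmatrix} 4 & 0 & 0 \\ 0 & -8 & 0 \\ 0 & 0 & 5 \end{bmatrix} + \begin{bmatrix} 0 & 0 & 0 \\ 4 & 0 & 0 \\ -2 & 1 & 0 \end{bmatrix} + \begin{bmatrix} 0 & -1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.$$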


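Choosing $M = D$, the method becomes (in matrix form):

$$x^{(k+1)} = D^{-1}(b - (L + U)x^{(k)}) = x^{(k)} + D^{-1}(b - Ax^{(k)}). \tag{4}$$

In our example this is:

$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ x_3^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{1}{4}\left(7 + x_2^{(k)} - x_3^{(k)}\right) \\ \frac{1}{8}\left(21 + 4x_1^{(k)} + x_3^{(k)}\right) \\ \frac{1}{5}\left(15 + 2x_1^{(k)} - x_2^{(k)}\right) \end{bmatrix}$$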


Running the iterations from the initial guess $x^{(0)} = [1, 2, 2]$ yields:

Iter: 0: [1.0, 2.0, 2.0]

Iter: 1: [1.75, 3.375, 3.0]

Iter: 2: [1.84375, 3.875, 3.025]

Iter: 3: [1.9625, 3.925, 2.9625]

Iter: 4: [1.99063, 3.97656, 3.0]

Iter: 5: [1.99414, 3.99531, 3.00094]

Iter: 6: [1.99859, 3.99719, 2.99859]

Iter: 7: [1.99965, 3.99912, 3.0]

Iter: 8: [1.99978, 3.99982, 3.00004]

Iter: 9: [1.99995, 3.99989, 2.99995]
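The iterates above can be regenerated with a short script. This is a minimal sketch of the Jacobi update in scalar form for the example system; the function name `jacobi_step` is a hypothetical choice, and the printed values should agree with the table up to rounding in the last digit.

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep: every entry is updated from the OLD vector x."""
    n = len(b)
    x_new = [0.0] * n
    for i in range(n):
        off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x_new[i] = (b[i] - off_diag) / A[i][i]
    return x_new

A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]

x = [1.0, 2.0, 2.0]
history = [x]
for k in range(9):
    x = jacobi_step(A, b, x)
    history.append(x)

for k, xk in enumerate(history):
    print(f"Iter: {k}: {[round(v, 5) for v in xk]}")
```

Because each entry depends only on the previous iterate, the loop over `i` could run in parallel.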


Figure: The residual and error norm history for the Jacobi iterations. Note the logarithmic scale of the y axis when plotting convergence history.


The Gauss-Seidel method

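The GS method is obtained from the splitting:

$$(L + D)x = b - Ux \;\Rightarrow\; (L + D)x^{(k+1)} = b - Ux^{(k)}.$$

Choosing $M = L + D$, each iteration reads

$$x^{(k+1)} = (L + D)^{-1}\left(b - Ux^{(k)}\right) = x^{(k)} + (L + D)^{-1}\left(b - Ax^{(k)}\right).$$

In scalar form, the method is given by

$$x_i^{(k+1)} = \frac{1}{a_{ii}}\left(b_i - \sum_{j<i} a_{ij}x_j^{(k+1)} - \sum_{j>i} a_{ij}x_j^{(k)}\right), \quad i = 1, \ldots, n.$$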


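Example

$$\begin{bmatrix} x_1^{(k+1)} \\ x_2^{(k+1)} \\ x_3^{(k+1)} \end{bmatrix} = \begin{bmatrix} \frac{1}{4}\left(7 + x_2^{(k)} - x_3^{(k)}\right) \\ \frac{1}{8}\left(21 + 4x_1^{(k+1)} + x_3^{(k)}\right) \\ \frac{1}{5}\left(15 + 2x_1^{(k+1)} - x_2^{(k+1)}\right) \end{bmatrix}$$

Convergence is noticeably faster than for the Jacobi method on this example:

Iter: 0: [1.0, 2.0, 2.0]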

Iter: 1: [1.75, 3.75, 2.95]

Iter: 2: [1.95, 3.96875, 2.98625]

Iter: 3: [1.99562, 3.99609, 2.99903]

Iter: 4: [1.99927, 3.99951, 2.9998]

Iter: 5: [1.99993, 3.99994, 2.99998]

Iter: 6: [1.99999, 3.99999, 3.0]

Iter: 7: [2.0, 4.0, 3.0]
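The Gauss-Seidel iterates can be regenerated in the same way as the Jacobi ones. The sketch below follows the scalar form: entries are overwritten in place, so entry $i$ already sees the new values of entries $j < i$. The function name `gauss_seidel_sweep` is a hypothetical choice.

```python
def gauss_seidel_sweep(A, b, x):
    """One Gauss-Seidel sweep; returns a new list, the input is not modified."""
    n = len(b)
    x = list(x)
    for i in range(n):
        # x[j] for j < i was already updated in this sweep,
        # x[j] for j > i still holds the previous iterate.
        others = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - others) / A[i][i]
    return x

A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]

x = [1.0, 2.0, 2.0]
gs_history = [x]
for k in range(7):
    x = gauss_seidel_sweep(A, b, x)
    gs_history.append(x)

for k, xk in enumerate(gs_history):
    print(f"Iter: {k}: {[round(v, 5) for v in xk]}")
```

Unlike Jacobi, the result depends on the order in which the unknowns are visited, and the sweep is inherently sequential.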


Convergence of Iterative methods

We saw the general iteration:

$$x^{(k+1)} = x^{(k)} + M^{-1}(b - Ax^{(k)}).$$

Using $b = Ax^*$, the error at the $(k+1)$-th iteration is

$$e^{(k+1)} = x^* - x^{(k+1)} = x^* - x^{(k)} - M^{-1}(Ax^* - Ax^{(k)}).$$

Hence the error evolves through the iteration matrix $T$:

$$e^{(k+1)} = \underbrace{(I - M^{-1}A)}_{T}\, e^{(k)}.$$
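The error recursion can be verified on one step of the Jacobi example, where $M = D$ and the solution $[2, 4, 3]$ is known. This is only a numerical sanity check under those assumptions; the variable names are hypothetical.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]
x_star = [2.0, 4.0, 3.0]
n = len(b)

# T = I - D^{-1} A: row i of A is divided by the diagonal entry a_ii.
T = [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
     for i in range(n)]

x0 = [1.0, 2.0, 2.0]
e0 = [xs - xi for xs, xi in zip(x_star, x0)]

# One Jacobi step in residual form: x1 = x0 + D^{-1}(b - A x0).
r0 = [bi - axi for bi, axi in zip(b, matvec(A, x0))]
x1 = [x0[i] + r0[i] / A[i][i] for i in range(n)]
e1 = [xs - xi for xs, xi in zip(x_star, x1)]

T_e0 = matvec(T, e0)
print("e1   =", e1)
print("T e0 =", T_e0)
```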


Assuming $T$ is diagonalizable with eigenpairs $(\lambda_i, v_i)$, and $e^{(0)} = \sum_{i=1}^n \alpha_i v_i$:

$$e^{(k+1)} = T^{k+1}e^{(0)} = T^{k+1}\sum_{i=1}^n \alpha_i v_i = \sum_{i=1}^n \alpha_i \lambda_i^{k+1} v_i.$$

The error $e^{(k+1)}$ goes to 0 (as $k \to \infty$) for every $e^{(0)}$ if and only if the largest eigenvalue in magnitude is smaller than 1.

Recall: the largest eigenvalue magnitude is called the spectral radius, $\rho(T) = \max_i |\lambda_i|$.


Theorem

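Given $Ax = b$ where $A$ is invertible, the general iteration

$$x^{(k+1)} = x^{(k)} + M^{-1}(b - Ax^{(k)})$$

converges for any starting vector $x^{(0)}$ if and only if

$$\rho(I - M^{-1}A) < 1.$$

This spectral radius is also the asymptotic convergence factor of the iteration. That is, for every vector norm, when $T = I - M^{-1}A$ has a single dominant eigenvalue and $e^{(0)}$ has a component along its eigenvector,

$$\lim_{k\to\infty} \frac{\|e^{(k+1)}\|}{\|e^{(k)}\|} = \rho(I - M^{-1}A).$$

If several eigenvalues share the largest magnitude, the ratio may oscillate, and the factor holds on average: $\|e^{(k)}\|^{1/k} \to \rho(I - M^{-1}A)$ for a generic starting error.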
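The convergence factor can be estimated from a run when the solution is known. The sketch below reuses the Jacobi example (solution $[2, 4, 3]$) and the max norm, both chosen here for illustration. In this particular run the step-to-step ratios fluctuate, while the factor averaged over several iterations is stable; the Jacobi table shows the same thing, since the error $[1, 2, 1]$ at iteration 0 becomes $[0.0375, 0.075, 0.0375]$ at iteration 3.

```python
def jacobi_step(A, b, x):
    """One Jacobi sweep: every entry is updated from the old vector x."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]

def max_norm(v):
    """Infinity norm of a vector stored as a list."""
    return max(abs(vi) for vi in v)

A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]
x_star = [2.0, 4.0, 3.0]

x = [1.0, 2.0, 2.0]
errors = [max_norm([s - xi for s, xi in zip(x_star, x)])]
for k in range(9):
    x = jacobi_step(A, b, x)
    errors.append(max_norm([s - xi for s, xi in zip(x_star, x)]))

ratios = [errors[k + 1] / errors[k] for k in range(9)]   # step-to-step
three_step = errors[3] / errors[0]                        # over 3 iterations
avg_factor = (errors[9] / errors[0]) ** (1.0 / 9.0)       # per-step average

print("ratios:", [round(r, 4) for r in ratios])
print("three-step reduction:", three_step)
print("average factor per step:", avg_factor)
```

The averaged value is an empirical estimate from nine iterations of one run, not a computation of the spectral radius itself.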


Checking Convergence

Remark

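The spectral radius is hard to compute. Therefore, we often use matrix norms to check convergence, since any induced matrix norm bounds the spectral radius from above. That is,

$$\|I - M^{-1}A\| < 1 \;\Rightarrow\; \rho(I - M^{-1}A) < 1,$$

and if we find a norm for which $\|I - M^{-1}A\| < 1$, then our method converges.

Example

In the previous Jacobi example, the error iteration matrix is:

$$T = \begin{bmatrix} 0 & \frac{1}{4} & -\frac{1}{4} \\ \frac{4}{8} & 0 & \frac{1}{8} \\ \frac{2}{5} & -\frac{1}{5} & 0 \end{bmatrix} \;\Rightarrow\; \|T\|_\infty = \frac{5}{8} < 1.$$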
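The norm test is cheap to carry out in code. A minimal sketch for the Jacobi choice $M = D$: it builds $T = I - D^{-1}A$ entry by entry and takes the maximum absolute row sum, which is the induced infinity norm. The function names are hypothetical.

```python
def jacobi_iteration_matrix(A):
    """T = I - D^{-1} A, where D is the diagonal of A."""
    n = len(A)
    return [[(1.0 if i == j else 0.0) - A[i][j] / A[i][i] for j in range(n)]
            for i in range(n)]

def inf_norm(T):
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in T)

A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
T = jacobi_iteration_matrix(A)
norm_T = inf_norm(T)
print(T)
print("||T||_inf =", norm_T)
```

A norm of at least 1 would be inconclusive, since the bound only works in one direction.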


Practical Convergence test

Definition (Strictly diagonally dominant (SDD) matrices)

A matrix $A$ is strictly diagonally dominant in rows if for every row $i$

$$|a_{ii}| > \sum_{j \neq i} |a_{ij}|.$$

Theorem

If the matrix $A$ is strictly diagonally dominant in rows, then both the Jacobi and Gauss-Seidel methods converge.
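The SDD condition is a row-by-row check, which makes it a practical test before running either method. A minimal sketch, with the hypothetical name `is_strictly_diagonally_dominant` and a dense list-of-rows matrix:

```python
def is_strictly_diagonally_dominant(A):
    """True if |a_ii| > sum_{j != i} |a_ij| holds for every row i."""
    n = len(A)
    for i in range(n):
        off_diag = sum(abs(A[i][j]) for j in range(n) if j != i)
        if abs(A[i][i]) <= off_diag:
            return False
    return True

# The 3x3 matrix from the Jacobi example: 4 > 2, 8 > 5, 5 > 3.
A = [[4.0, -1.0, 1.0], [4.0, -8.0, 1.0], [-2.0, 1.0, 5.0]]
print(is_strictly_diagonally_dominant(A))

# A hypothetical counterexample: the first row has 1 <= 2.
B = [[1.0, 2.0], [3.0, 4.0]]
print(is_strictly_diagonally_dominant(B))
```

The condition is sufficient but not necessary, so a `False` result does not mean the methods diverge.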


The variational meaning of GS

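Consider the following minimization problem:

$$\min_x f(x), \qquad f(x) = \frac{1}{2}\|x - x^*\|_A^2 = \frac{1}{2}x^\top Ax - x^\top b + \frac{1}{2}(x^*)^\top b,$$

where $A$ is symmetric positive definite.

To minimize $f(x)$, we require $\nabla f(x) = 0$ and get the linear system $Ax = b$.

In GS, we zero the residual $r_i$ for each $i$, given the other $x$'s.

The residuals are basically the equations of the gradient, since $\nabla f(x) = Ax - b = -r$. Thus, for each $i$ the update enforces $\frac{\partial f}{\partial x_i} = 0$, and at convergence $\nabla f(x) = 0$.

Corollary

The update of Gauss-Seidel for each $x_i$ is equivalent to minimizing $f(x)$ with respect to $x_i$ while keeping the other entries fixed.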


variational GS

Example (Variational property of Gauss Seidel)

Consider the following linear system:

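$$A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 3 \\ 4 \end{bmatrix}.$$

It is easy to show that, up to an additive constant, $f(x) = x_1^2 + x_1x_2 + 1.5x_2^2 - 3x_1 - 4x_2$. The condition $\nabla f = 0$ in this case is

$$\frac{\partial f}{\partial x_1} = 2x_1 + x_2 - 3 = 0 \tag{5}$$

$$\frac{\partial f}{\partial x_2} = x_1 + 3x_2 - 4 = 0 \tag{6}$$

These are exactly the two equations of $Ax = b$.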
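The variational property can be observed numerically on this 2×2 example. The sketch below runs Gauss-Seidel updates one coordinate at a time and records $f$ after each update; the starting guess $[0, 0]$ and the number of sweeps are arbitrary assumptions. Each update solves one of the two gradient equations for its own unknown, so $f$ should never increase.

```python
def f(x):
    """Objective of the example, without the additive constant."""
    x1, x2 = x
    return x1 ** 2 + x1 * x2 + 1.5 * x2 ** 2 - 3.0 * x1 - 4.0 * x2

A = [[2.0, 1.0], [1.0, 3.0]]
b = [3.0, 4.0]

x = [0.0, 0.0]            # hypothetical starting guess
f_values = [f(x)]
for sweep in range(20):
    for i in range(2):
        # Zero the i-th residual, i.e. solve df/dx_i = 0 for x_i
        # while the other coordinate stays fixed.
        others = sum(A[i][j] * x[j] for j in range(2) if j != i)
        x[i] = (b[i] - others) / A[i][i]
        f_values.append(f(x))

print("x =", x)
print("f(x) =", f_values[-1])
print("first few f values:", [round(v, 5) for v in f_values[:5]])
```

The iterates approach $x = [1, 1]$, which solves both equations, and $f$ approaches its minimum value $-3.5$ for this form of the objective.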
