Approximate Undo: SVD and Least Squares

Transcript of Approximate Undo: SVD and Least Squares


Singular Value Decomposition i

What is the Singular Value Decomposition (‘SVD’)?

The SVD is a factorization of an m × n matrix into

A = UΣVᵀ, where

• U is an m × m orthogonal matrix. (Its columns are called 'left singular vectors'.)


Singular Value Decomposition ii

• Σ is an m × n diagonal matrix with the singular values on the diagonal:

Σ =
⎡ σ1         ⎤
⎢     ⋱      ⎥
⎢        σn  ⎥
⎣ 0          ⎦

Convention: σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.

• Vᵀ is an n × n orthogonal matrix. (V's columns are called 'right singular vectors'.)

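The factorization and the orthogonality of U and V are easy to check numerically. A minimal numpy sketch (the matrix A is a hypothetical random example, not from the slides):

```python
import numpy as np

# A hypothetical 4x3 example matrix (any real matrix works).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# full_matrices=True gives the "full" SVD: U is 4x4, Vt is 3x3.
U, sigma, Vt = np.linalg.svd(A, full_matrices=True)

# numpy returns the singular values as a 1-D array, sorted descending.
assert np.all(sigma[:-1] >= sigma[1:])

# Rebuild the m x n diagonal matrix Sigma and check A = U Sigma V^T.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, sigma)
assert np.allclose(A, U @ Sigma @ Vt)

# U and V are orthogonal: U^T U = I, V V^T = I.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(Vt @ Vt.T, np.eye(3))
```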

Computing the SVD i

How can I compute an SVD of a matrix A?

1. Compute the eigenvalues and eigenvectors of AᵀA:

AᵀA v1 = λ1 v1, · · · , AᵀA vn = λn vn.

2. Make a matrix V from the vectors vi:

V = [ v1 | · · · | vn ].

(AᵀA is symmetric, so V is orthogonal if its columns are scaled to norm 1.)


Computing the SVD ii

3. Make a diagonal matrix Σ from the square roots of the eigenvalues:

Σ =
⎡ √λ1          ⎤
⎢      ⋱       ⎥
⎢        √λn   ⎥
⎣ 0            ⎦

4. Find U from

A = UΣVᵀ ⇔ UΣ = AV.

(While being careful about non-squareness and zero singular values.)


Computing the SVD iii

In the simplest case:

In the simplest case:

U = AVΣ⁻¹.

Observe that U is orthogonal (use VᵀAᵀAV = Σ²):

UᵀU = Σ⁻¹ (VᵀAᵀAV) Σ⁻¹ = Σ⁻¹ Σ² Σ⁻¹ = I.

(Similarly for UUᵀ.)
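The four steps above can be sketched in numpy for the simplest case. The example matrix is a hypothetical tall random one, which (almost surely) has full column rank, so all σi > 0 and U = AVΣ⁻¹ applies directly; this yields the first n columns of U:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))  # tall, full column rank (almost surely)

# Steps 1-2: eigenvectors of A^T A give V; eigenvalues give sigma_i^2.
lam, V = np.linalg.eigh(A.T @ A)      # eigh returns ascending order
order = np.argsort(lam)[::-1]         # re-sort descending
lam, V = lam[order], V[:, order]

# Step 3: singular values are the square roots of the eigenvalues.
sigma = np.sqrt(np.maximum(lam, 0))

# Step 4 (simplest case, all sigma_i > 0): U = A V Sigma^{-1},
# i.e. divide the i-th column of AV by sigma_i.
U = A @ V / sigma

assert np.allclose(U.T @ U, np.eye(3))              # columns orthonormal
assert np.allclose(A, U @ np.diag(sigma) @ V.T)     # A = U Sigma V^T
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```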


How Expensive is it to Compute the SVD? i

Demo: Relative cost of matrix factorizations (click to visit)


‘Reduced’ SVD i

Is there a ‘reduced’ factorization for non-square matrices?


‘Reduced’ SVD ii


‘Reduced’ SVD iii

Yes: for m > n, keep only the first n columns of U (an m × n matrix) and the top n × n block of Σ, which drops only products with zero blocks.


‘Reduced’ SVD iv

• “Full” version shown in black

• “Reduced” version shown in red


SVD: Applications


Solve Square Linear Systems i

Can the SVD A = UΣVᵀ be used to solve square linear systems? At what cost (once the SVD is known)?

Yes, easily:

Ax = b
⇔ UΣVᵀx = b
⇔ Σ(Vᵀx) = Uᵀb

Setting y := Vᵀx leaves the diagonal (easy to solve) system

Σy = Uᵀb.

Knowing y, find x = Vy.
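The three-step solve can be sketched in numpy (the square system here is a hypothetical random example):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))   # square, invertible (almost surely)
b = rng.standard_normal(4)

U, sigma, Vt = np.linalg.svd(A)

# Sigma y = U^T b: a diagonal solve is just an elementwise division.
y = (U.T @ b) / sigma
x = Vt.T @ y                      # x = V y

assert np.allclose(A @ x, b)
```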


Solve Square Linear Systems ii

Cost: O(n²), but with more operations than forward/backward substitution. The comparison gets even worse when the cost of computing the factorization (LU vs. SVD) is included.


Tall and Skinny Systems i

Consider a 'tall and skinny' linear system, i.e. one that has more equations than unknowns. For an m × n matrix A this means m > n. How could we solve that?


Tall and Skinny Systems ii

First realization: A square linear system often has just a single solution. So imposing more conditions than unknowns will generally mean that no exact solution exists:

Ax = b ← not going to happen.

Instead: Find x so that ‖Ax − b‖2 is as small as possible.

r = Ax − b is called the residual of the problem.

‖Ax − b‖2² = r1² + · · · + rm² ← sum of squares

This is called a (linear) least-squares problem. Since

"Find x so that ‖Ax − b‖2 is as small as possible."


Tall and Skinny Systems iii

is too long to write every time, we introduce a shorter notation:

Ax ≅ b.


Solving Least-Squares i

How can I actually solve a least-squares problem Ax ≅ b?

The job: Make ‖Ax − b‖2 as small as possible.

Equivalent: Make ‖Ax − b‖2² as small as possible.

Use: The SVD A = UΣVᵀ.


Solving Least-Squares ii

Find x to minimize:

‖Ax − b‖2²
= ‖UΣVᵀx − b‖2²
= ‖Uᵀ(UΣVᵀx − b)‖2²    (because U is orthogonal)
= ‖ΣVᵀx − Uᵀb‖2²       (setting y := Vᵀx)
= ‖Σy − Uᵀb‖2².


Solving Least-Squares iii

What y minimizes

‖Σy − Uᵀb‖2² = ‖ diag(σ1, …, σk, 0, …, 0) y − z ‖2² ,

where z := Uᵀb and σ1, …, σk are the nonzero singular values?

Pick

yi = { zi/σi  if σi ≠ 0,
       0      if σi = 0.

Find x = Vy, done.


Solving Least-Squares iv

Slight technicality: There is only a choice if some of the σi are zero. (Otherwise y is uniquely determined.) If there is a choice, this y is the one with the smallest 2-norm that also minimizes the 2-norm of the residual. And since ‖x‖2 = ‖y‖2 (because V is orthogonal), x also has the smallest 2-norm of all x′ for which ‖Ax′ − b‖2 is minimal.
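The recipe, including the σi = 0 case, can be sketched as a small numpy function. The relative tolerance for treating a tiny singular value as zero is an assumption for floating-point use, not from the slides:

```python
import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    """Minimum-norm least-squares solution of A x = b via the SVD."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    z = U.T @ b
    # y_i = z_i / sigma_i where sigma_i != 0 (numerically: above a
    # relative tolerance), else y_i = 0.
    y = np.zeros_like(z)
    nonzero = sigma > tol * sigma[0]
    y[nonzero] = z[nonzero] / sigma[nonzero]
    return Vt.T @ y                     # x = V y

# Hypothetical overdetermined example.
rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
x = svd_lstsq(A, b)

# Agrees with numpy's least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x, x_ref)
```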


In-class activity: SVD and Least Squares


The Pseudoinverse: A Shortcut for Least Squares i

How could the solution process for Ax ≅ b with an SVD A = UΣVᵀ be 'packaged up'?

UΣVᵀx ≈ b ⇔ x ≈ VΣ⁻¹Uᵀb

Problem: Σ may not be invertible.

Idea 1: Define a 'pseudo-inverse' Σ⁺ of a diagonal matrix Σ (with the transposed, n × m, shape) by

(Σ⁺)ii = { σi⁻¹  if σi ≠ 0,
           0     if σi = 0.


The Pseudoinverse: A Shortcut for Least Squares ii

Then Ax ≅ b is solved by x = VΣ⁺Uᵀb.

Idea 2: Call A⁺ = VΣ⁺Uᵀ the pseudo-inverse of A.

Then Ax ≅ b is solved by x = A⁺b.
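Ideas 1 and 2 can be sketched in numpy and checked against numpy's built-in pseudoinverse (the example matrix is hypothetical; the tolerance for "σi = 0" is an assumption for floating point):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 3))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+: invert the nonzero singular values, leave zeros alone.
sigma_plus = np.where(sigma > 1e-12 * sigma[0], 1.0 / sigma, 0.0)

# A^+ = V Sigma^+ U^T.
A_plus = Vt.T @ np.diag(sigma_plus) @ U.T

# The pseudoinverse is unique, so this matches numpy's pinv.
assert np.allclose(A_plus, np.linalg.pinv(A))
```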


The Normal Equations i

You may have learned the 'normal equations' AᵀAx = Aᵀb to solve Ax ≅ b. Why not use those?

cond(AᵀA) ≈ cond(A)²

I.e. if A is even somewhat poorly conditioned, then the conditioning of AᵀA will be a disaster.

The normal equations are not well-behaved numerically.
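The squaring of the condition number is easy to observe numerically. The Vandermonde matrix below is a hypothetical illustration (its columns become nearly dependent, so it is mildly ill-conditioned):

```python
import numpy as np

# Hypothetical example: a Vandermonde matrix with nearly dependent columns.
t = np.linspace(0, 1, 50)
A = np.vander(t, 8)

c = np.linalg.cond(A)            # already fairly large
c_normal = np.linalg.cond(A.T @ A)

# cond_2(A^T A) = cond_2(A)^2 exactly (for full-rank A);
# numerically they agree to good accuracy.
assert np.isclose(c_normal, c**2, rtol=1e-2)
```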


Fitting a Model to Data i

How can I fit a model to measurements? E.g.:

[Figure: measured mood m plotted against time t, the number of days in polar vortex.]

Maybe: m̂(t) = α + βt + γt²


Fitting a Model to Data ii

Have: 300 data points: (t1, m1), …, (t300, m300)

Want: 3 unknowns α, β, γ

Write down equations:

α + β t1 + γ t1² ≈ m1
α + β t2 + γ t2² ≈ m2
⋮
α + β t300 + γ t300² ≈ m300

⎡ 1  t1    t1²   ⎤           ⎡ m1   ⎤
⎢ 1  t2    t2²   ⎥ ⎡α⎤       ⎢ m2   ⎥
⎢ ⋮   ⋮     ⋮    ⎥ ⎢β⎥  ≅   ⎢  ⋮   ⎥
⎣ 1  t300  t300² ⎦ ⎣γ⎦       ⎣ m300 ⎦

So data fitting is just like interpolation, with a Vandermonde matrix:

Vα ≅ m.

Only difference: more rows. Solvable using the SVD.
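The quadratic fit can be sketched in numpy. The data here are synthetic (a known quadratic plus a little noise), purely for illustration; `np.linalg.lstsq` solves the tall system in the least-squares sense:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 300)
# Hypothetical measurements: a known quadratic plus small noise.
m = 2.0 + 0.5 * t - 0.1 * t**2 + 0.01 * rng.standard_normal(300)

# The 300 x 3 Vandermonde-style matrix from the slide.
V = np.column_stack([np.ones_like(t), t, t**2])

# Solve V [alpha, beta, gamma]^T = m in the least-squares sense.
coeffs, *_ = np.linalg.lstsq(V, m, rcond=None)

# The fit recovers the coefficients that generated the data.
assert np.allclose(coeffs, [2.0, 0.5, -0.1], atol=0.05)
```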


Meaning of the Singular Values i

What do the singular values mean? (in particular the first/largest one)

A = UΣVᵀ

‖A‖2 = max_{‖x‖2=1} ‖Ax‖2
     = max_{‖x‖2=1} ‖UΣVᵀx‖2
     = max_{‖x‖2=1} ‖ΣVᵀx‖2      (U orthogonal)
     = max_{‖Vᵀx‖2=1} ‖ΣVᵀx‖2    (V orthogonal)
     = max_{‖y‖2=1} ‖Σy‖2        (let y = Vᵀx)
     = σ1.                       (Σ diagonal)

So the SVD (finally) provides a way to find the 2-norm.


Meaning of the Singular Values ii

Entertainingly, it does so by reducing the problem to finding the 2-norm of a diagonal matrix.

‖A‖2 = σ1.
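A quick numerical check that the 2-norm equals the largest singular value (hypothetical random matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 4))

sigma = np.linalg.svd(A, compute_uv=False)  # descending order

# The matrix 2-norm is the first/largest singular value.
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
```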


Condition Numbers i

How would you compute a 2-norm condition number?

cond2(A) = ‖A‖2 ‖A⁻¹‖2 = σ1/σn.
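A quick numerical check of this formula (hypothetical random square matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((4, 4))

sigma = np.linalg.svd(A, compute_uv=False)  # descending order

# cond_2(A) = sigma_1 / sigma_n.
assert np.isclose(np.linalg.cond(A, 2), sigma[0] / sigma[-1])
```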


SVD as Sum of Outer Products i

What’s another way of writing the SVD?

Starting from (assuming m > n for simplicity)

A = UΣVᵀ = [ u1 | · · · | um ] · Σ · [ v1 | · · · | vn ]ᵀ,

where Σ carries σ1, …, σn on its diagonal, followed by zero rows,


SVD as Sum of Outer Products ii

we find that

A = [ u1 | · · · | um ] ·
⎡ — σ1 v1ᵀ — ⎤
⎢      ⋮      ⎥
⎢ — σn vnᵀ — ⎥
⎣ 0           ⎦
= σ1 u1 v1ᵀ + σ2 u2 v2ᵀ + · · · + σn un vnᵀ.

That means: The SVD writes the matrix A as a sum of outer products (of left/right singular vectors). What could that be good for?
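The outer-product form is easy to verify numerically (hypothetical random matrix):

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((5, 3))

# Reduced SVD: U is 5x3, so only the n = 3 nonzero terms appear.
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Sum of rank-1 terms sigma_i * u_i v_i^T reproduces A.
A_sum = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(3))
assert np.allclose(A, A_sum)
```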


Low-Rank Approximation (I) i

What is the rank of σ1u1v1ᵀ?

1. (One linearly independent column!)

What is the rank of σ1u1v1ᵀ + σ2u2v2ᵀ?

2. (Two linearly independent, in fact orthogonal, columns!)

Demo: Image compression (click to visit)


Low-Rank Approximation i

What can we say about the low-rank approximation

Ak = σ1u1v1ᵀ + · · · + σk uk vkᵀ

to

A = σ1u1v1ᵀ + σ2u2v2ᵀ + · · · + σn un vnᵀ?

For simplicity, assume σ1 > σ2 > · · · > σn > 0.

Observe that Ak has rank k. (And A has rank n.)

Then ‖A − B‖2 among all rank-k (or lower) matrices B is minimized by Ak.

(Eckart–Young theorem)


Low-Rank Approximation ii

Even better:

min_{rank B ≤ k} ‖A − B‖2 = ‖A − Ak‖2 = σk+1.

Ak is called the best rank-k approximation to A. (Here k can be any number.)

This best-approximation property is what makes the SVD extremely useful in applications and ultimately justifies its high cost.

It is also the best rank-k approximation in the Frobenius norm:

min_{rank B ≤ k} ‖A − B‖F = ‖A − Ak‖F = √(σk+1² + · · · + σn²).
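Both error formulas can be checked numerically (hypothetical random matrix, truncation rank k = 2):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((6, 4))

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Best rank-k approximation: keep the first k singular triplets.
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]

# 2-norm error of the best rank-k approximation is sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
# Frobenius error is sqrt(sigma_{k+1}^2 + ... + sigma_n^2).
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(sigma[k:]**2)))
```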
