Approximate Undo: SVD and Least Squares
Singular Value Decomposition i
What is the Singular Value Decomposition (‘SVD’)?
The SVD is a factorization of an m × n matrix into

A = UΣV^T, where

• U is an m × m orthogonal matrix. (Its columns are called ‘left singular vectors’.)
Singular Value Decomposition ii
• Σ is an m × n diagonal matrix with the singular values on the diagonal:

  Σ = [diag(σ1, · · · , σn); 0]   (zero rows/columns pad Σ to size m × n)

  Convention: σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.

• V^T is an n × n orthogonal matrix. (V’s columns are called ‘right singular vectors’.)
Computing the SVD i
How can I compute an SVD of a matrix A?

1. Compute the eigenvalues and eigenvectors of A^T A:

   A^T A v1 = λ1 v1, · · · , A^T A vn = λn vn.

2. Make a matrix V from the vectors vi:

   V = [v1 | · · · | vn].

   (A^T A is symmetric, so V is orthogonal if its columns are normalized to norm 1.)
Computing the SVD ii
3. Make a diagonal matrix Σ from the square roots of the eigenvalues:

   Σ = [diag(√λ1, · · · , √λn); 0].

4. Find U from

   A = UΣV^T ⇔ UΣ = AV.

   (While being careful about non-squareness and zero singular values.)
Computing the SVD iii
In the simplest case:

U = AVΣ^{-1}.

Observe that U is orthogonal (using V^T A^T A V = Σ²):

U^T U = Σ^{-1} (V^T A^T A V) Σ^{-1} = Σ^{-1} Σ² Σ^{-1} = I.

(Similarly for UU^T.)
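The four steps above can be sketched in NumPy. This is a minimal illustration, assuming A is square with full rank so that Σ is invertible (the "simplest case"):

```python
import numpy as np

# Example data: a random square matrix (full rank with probability 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Steps 1-2: eigenvectors of A^T A give V. Since A^T A is symmetric,
# eigh returns an orthogonal V (eigenvalues come out in ascending order).
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]          # sort descending

# Step 3: the singular values are the square roots of the eigenvalues.
sigma = np.sqrt(lam)

# Step 4 (simplest case): U = A V Sigma^{-1}, i.e. divide each column
# of AV by the matching singular value.
U = (A @ V) / sigma

# Check: U is orthogonal and A = U Sigma V^T.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```

In practice one would call `np.linalg.svd(A)` directly; forming A^T A squares the conditioning, which is exactly the issue the normal-equations discussion below warns about.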
Demo: Computing the SVD (click to visit)
How Expensive is it to Compute the SVD? i
Demo: Relative cost of matrix factorizations (click to visit)
‘Reduced’ SVD

Is there a ‘reduced’ factorization for non-square matrices?

Yes. If m > n, the bottom m − n rows of Σ are zero, so only the first n columns of U contribute. Keeping just those columns (U becomes m × n) and the top n × n block of Σ gives the ‘reduced’ SVD, which still satisfies A = UΣV^T.

[Figure: the “full” version shown in black, the “reduced” version shown in red.]
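In NumPy, the reduced form corresponds to `full_matrices=False`; a quick shape check on a made-up tall matrix:

```python
import numpy as np

# Full vs. reduced SVD of a tall matrix (m = 5, n = 3).
A = np.arange(15.0).reshape(5, 3)

U_full, s, Vt = np.linalg.svd(A)                              # U: 5x5
U_red, s_red, Vt_red = np.linalg.svd(A, full_matrices=False)  # U: 5x3

print(U_full.shape, U_red.shape)  # (5, 5) (5, 3)

# Both reproduce A; the reduced form drops the columns of U that
# only ever multiply zero rows of Sigma.
assert np.allclose(U_red @ np.diag(s_red) @ Vt_red, A)
```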
SVD: Applications
Solve Square Linear Systems i
Can the SVD A = UΣV^T be used to solve square linear systems? At what cost (once the SVD is known)?
Yes, easy:

Ax = b
UΣV^T x = b
Σ (V^T x) = U^T b        (set y := V^T x)

Solve the diagonal system Σy = U^T b (easy to solve). Knowing y, find x = Vy.
Solve Square Linear Systems ii
Cost: O(n²), but with more operations than forward/backward substitution. The comparison gets even worse once the cost of computing LU vs. SVD is included.
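A minimal sketch of the three-step solve, using illustrative example data:

```python
import numpy as np

# Solving a square system with a precomputed SVD:
# 1) form U^T b, 2) solve the diagonal system, 3) map back with V.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

U, sigma, Vt = np.linalg.svd(A)
y = (U.T @ b) / sigma        # diagonal system Sigma y = U^T b
x = Vt.T @ y                 # x = V y

assert np.allclose(A @ x, b)
```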
Tall and Skinny Systems i
Consider a ‘tall and skinny’ linear system, i.e. one that has more equations than unknowns:

In the figure: m > n. How could we solve that?
Tall and Skinny Systems ii
First realization: a square linear system often only has a single solution. So applying more conditions to the solution will mean we have no exact solution.

Ax = b ← Not going to happen.

Instead: Find x so that ‖Ax − b‖2 is as small as possible.

r = Ax − b is called the residual of the problem.

‖Ax − b‖2² = r1² + · · · + rm² ← a sum of squares
This is called a (linear) least-squares problem. Since

“Find x so that ‖Ax − b‖2 is as small as possible.”

is too long to write every time, we introduce a shorter notation:

Ax ≅ b.
Solving Least-Squares i
How can I actually solve a least-squares problem Ax ≅ b?

The job: make ‖Ax − b‖2 as small as possible.

Equivalent: make ‖Ax − b‖2² as small as possible.

Use: the SVD A = UΣV^T.
Solving Least-Squares ii
Find x to minimize:

‖Ax − b‖2²
= ‖UΣV^T x − b‖2²
= ‖U^T (UΣV^T x − b)‖2²     (because U is orthogonal)
= ‖Σ (V^T x) − U^T b‖2²     (set y := V^T x)
= ‖Σy − U^T b‖2²
Solving Least-Squares iii
What y minimizes

‖Σy − U^T b‖2² = ‖ [diag(σ1, · · · , σk); 0] y − z ‖2² ,   where z := U^T b?

Pick

yi = zi/σi  if σi ≠ 0,
yi = 0      if σi = 0.

Find x = Vy, done.
Solving Least-Squares iv
Slight technicality: there only is a choice if some of the σi are zero. (Otherwise y is uniquely determined.) If there is a choice, this y is the one with the smallest 2-norm that also minimizes the 2-norm of the residual. And since ‖x‖2 = ‖y‖2 (because V is orthogonal), x also has the smallest 2-norm of all x′ for which ‖Ax′ − b‖2 is minimal.
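The recipe, including the yi = 0 choice for zero singular values, can be sketched as follows (treating σi below a tolerance as zero is an assumption of this sketch, not part of the slides):

```python
import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    """Least-squares solution of A x ~= b via the SVD recipe above."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    z = U.T @ b
    # y_i = z_i / sigma_i where sigma_i is nonzero, else 0.
    y = np.where(sigma > tol, z / np.maximum(sigma, tol), 0.0)
    return Vt.T @ y          # x = V y

# Example data: a random full-rank tall system.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x = svd_lstsq(A, b)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```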
In-class activity: SVD and Least Squares
The Pseudoinverse: A Shortcut for Least Squares i
How could the solution process for Ax ≅ b with an SVD A = UΣV^T be ‘packaged up’?

UΣV^T x ≈ b ⇔ x ≈ VΣ^{-1} U^T b

Problem: Σ may not be invertible.

Idea 1: Define a ‘pseudo-inverse’ Σ⁺ of a diagonal matrix Σ, entrywise, as

Σ⁺ii = 1/σi  if σi ≠ 0,
Σ⁺ii = 0     if σi = 0.
The Pseudoinverse: A Shortcut for Least Squares ii
Then Ax ≅ b is solved by x = VΣ⁺U^T b.

Idea 2: Call A⁺ = VΣ⁺U^T the pseudo-inverse of A.

Then Ax ≅ b is solved by x = A⁺b.
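A sketch of Ideas 1 and 2 in NumPy (the tolerance for treating σi as zero is an assumption of the sketch):

```python
import numpy as np

def pinv_from_svd(A, tol=1e-12):
    """Assemble the pseudo-inverse A^+ = V Sigma^+ U^T from the SVD."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_plus = np.where(sigma > tol, 1.0 / np.maximum(sigma, tol), 0.0)
    return Vt.T @ (sigma_plus[:, None] * U.T)   # V Sigma^+ U^T

# Example data: agrees with NumPy's built-in pseudo-inverse.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
assert np.allclose(pinv_from_svd(A), np.linalg.pinv(A))
```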
The Normal Equations i
You may have learned the ‘normal equations’ A^T A x = A^T b to solve Ax ≅ b. Why not use those?

cond(A^T A) ≈ cond(A)²

I.e. if A is even somewhat poorly conditioned, then the conditioning of A^T A will be a disaster.

The normal equations are not well-behaved numerically.
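A small numerical illustration of the squaring effect, on a made-up matrix constructed to have known singular values:

```python
import numpy as np

# Build A with singular values 1, 1e-2, 1e-4, 1e-6, so cond(A) = 1e6.
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sigma = np.array([1.0, 1e-2, 1e-4, 1e-6])
A = (U[:, :4] * sigma) @ V.T      # singular values of A are exactly sigma

print(np.linalg.cond(A))          # about 1e6
print(np.linalg.cond(A.T @ A))    # about 1e12: squared!
```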
Fitting a Model to Data i
How can I fit a model to measurements? E.g.:

[Figure: data points of mood m over time t, ‘Number of days in polar vortex’.]

Maybe: m̂(t) = α + βt + γt²
Fitting a Model to Data ii

Have: 300 data points: (t1, m1), . . . , (t300, m300)

Want: 3 unknowns α, β, γ

Write down equations:

α + βt1 + γt1² ≈ m1
α + βt2 + γt2² ≈ m2
⋮
α + βt300 + γt300² ≈ m300

→

⎡ 1  t1    t1²   ⎤ ⎡α⎤    ⎡ m1   ⎤
⎢ 1  t2    t2²   ⎥ ⎢β⎥ ≅  ⎢ m2   ⎥
⎢ ⋮   ⋮     ⋮    ⎥ ⎣γ⎦    ⎢  ⋮   ⎥
⎣ 1  t300  t300² ⎦        ⎣ m300 ⎦

So data fitting is just like interpolation, with a Vandermonde matrix:

Vα = m.

Only difference: more rows. Solvable using the SVD.
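The quadratic fit above can be sketched with synthetic data standing in for the measurements (the true coefficients and noise level are made up for illustration):

```python
import numpy as np

# Synthetic measurements from m(t) = 2 + 3t - 1.5t^2 plus a little noise.
rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 300)
m = 2.0 + 3.0 * t - 1.5 * t**2 + 0.01 * rng.standard_normal(300)

# Vandermonde matrix with columns 1, t, t^2; solve V alpha ~= m.
V = np.column_stack([np.ones_like(t), t, t**2])
coeffs, *_ = np.linalg.lstsq(V, m, rcond=None)
alpha, beta, gamma = coeffs
print(alpha, beta, gamma)   # close to 2.0, 3.0, -1.5
```

(`np.linalg.lstsq` uses an SVD-based LAPACK routine under the hood, so this is the slide's recipe in library form.)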
Demo: Data Fitting with Least Squares (click to visit)
Meaning of the Singular Values i
What do the singular values mean? (In particular the first/largest one.)

A = UΣV^T

‖A‖2 = max_{‖x‖2=1} ‖Ax‖2
     = max_{‖x‖2=1} ‖UΣV^T x‖2
     = max_{‖x‖2=1} ‖ΣV^T x‖2        (U orthogonal)
     = max_{‖V^T x‖2=1} ‖ΣV^T x‖2    (V orthogonal)
     = max_{‖y‖2=1} ‖Σy‖2            (let y = V^T x)
     = σ1.                           (Σ diagonal)
So the SVD (finally) provides a way to find the 2-norm.
Meaning of the Singular Values ii
Entertainingly, it does so by reducing the problem to finding the 2-norm of a diagonal matrix.

‖A‖2 = σ1.
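A quick numerical spot check of ‖A‖2 = σ1 on a random example matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

# Largest singular value equals the matrix 2-norm.
sigma1 = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(np.linalg.norm(A, 2), sigma1)

# And no unit vector beats it: ||A x||_2 <= sigma1 for ||x||_2 = 1.
X = rng.standard_normal((3, 1000))
X /= np.linalg.norm(X, axis=0)
assert np.all(np.linalg.norm(A @ X, axis=0) <= sigma1 + 1e-12)
```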
Condition Numbers i
How would you compute a 2-norm condition number?
cond2(A) = ‖A‖2 ‖A^{-1}‖2 = σ1/σn.
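As a sketch, on a small diagonal example where the answer is obvious:

```python
import numpy as np

# Singular values of this diagonal matrix are 2 and 0.5,
# so cond2(A) = sigma1/sigma_n = 4.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(sigma[0] / sigma[-1], np.linalg.cond(A, 2))  # = 4.0
```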
SVD as Sum of Outer Products i
What’s another way of writing the SVD?

Starting from (assuming m > n for simplicity)

A = UΣV^T = [u1 | · · · | um] · [diag(σ1, · · · , σn); 0] · [− v1^T −; · · · ; − vn^T −],

where “; 0” denotes the m − n zero rows padding Σ,
SVD as Sum of Outer Products ii
we find that

A = [u1 | · · · | um] · [− σ1v1^T −; · · · ; − σnvn^T −; 0]
  = σ1u1v1^T + σ2u2v2^T + · · · + σnunvn^T.
That means: the SVD writes the matrix A as a sum of outer products (of left/right singular vectors). What could that be good for?
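The outer-product form is easy to verify numerically (random example data):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A term by term as sigma_i * u_i * v_i^T.
A_sum = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(3))
assert np.allclose(A_sum, A)
```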
Low-Rank Approximation (I) i
What is the rank of σ1u1v1^T?

1. (One linearly independent column!)

What is the rank of σ1u1v1^T + σ2u2v2^T?

2. (Two linearly independent, in fact orthogonal, columns!)
Demo: Image compression (click to visit)
Low-Rank Approximation i
What can we say about the low-rank approximation
Ak = σ1u1v1^T + · · · + σkukvk^T

to

A = σ1u1v1^T + σ2u2v2^T + · · · + σnunvn^T?

For simplicity, assume σ1 > σ2 > · · · > σn > 0.

Observe that Ak has rank k. (And A has rank n.)

Then Ak minimizes ‖A − B‖2 among all matrices B of rank k (or lower).
(Eckart-Young Theorem)
Even better:

min_{rank B ≤ k} ‖A − B‖2 = ‖A − Ak‖2 = σk+1.

Ak is called the best rank-k approximation to A. (Here k can be any number.)

This best-approximation property is what makes the SVD extremely useful in applications and ultimately justifies its high cost.

Ak is also the best rank-k approximation in the Frobenius norm:

min_{rank B ≤ k} ‖A − B‖F = ‖A − Ak‖F = √(σk+1² + · · · + σn²).
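Both error formulas can be checked numerically; a sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 6))
U, sigma, Vt = np.linalg.svd(A)

# Truncated SVD: best rank-k approximation per Eckart-Young.
k = 2
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]

# 2-norm error is sigma_{k+1} (index k, 0-based); Frobenius error is
# the root-sum-of-squares of the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(sigma[k:] ** 2)))
```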