Approximate Undo: SVD and Least Squares
Singular Value Decomposition i
What is the Singular Value Decomposition (‘SVD’)?
The SVD is a factorization of an m × n matrix into

A = UΣV^T, where

• U is an m × m orthogonal matrix. (Its columns are called ‘left singular vectors’.)
Singular Value Decomposition ii
• Σ is an m × n diagonal matrix with the singular values on the diagonal:

  Σ = [diag(σ1, · · · , σn); 0]   (zero rows/columns pad Σ to size m × n)

  Convention: σ1 ≥ σ2 ≥ · · · ≥ σn ≥ 0.

• V^T is an n × n orthogonal matrix. (V’s columns are called ‘right singular vectors’.)
Computing the SVD i
How can I compute an SVD of a matrix A?

1. Compute the eigenvalues and eigenvectors of A^T A:

   A^T A v1 = λ1 v1, · · · , A^T A vn = λn vn.

2. Make a matrix V from the vectors vi:

   V = [v1 | · · · | vn].

   (A^T A is symmetric, so V is orthogonal if its columns are normalized to norm 1.)
Computing the SVD ii
3. Make a diagonal matrix Σ from the square roots of the eigenvalues:

   Σ = [diag(√λ1, · · · , √λn); 0].

4. Find U from

   A = UΣV^T ⇔ UΣ = AV.

   (While being careful about non-squareness and zero singular values.)
Computing the SVD iii
In the simplest case:

U = AVΣ^{-1}.

Observe that U is orthogonal (using V^T A^T A V = Σ²):

U^T U = Σ^{-1} (V^T A^T A V) Σ^{-1} = Σ^{-1} Σ² Σ^{-1} = I.

(Similarly for UU^T.)
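The four steps above can be sketched in NumPy. This is a minimal illustration, assuming A is square with full rank so that Σ is invertible (the "simplest case"):

```python
import numpy as np

# Example data: a random square matrix (full rank with probability 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Steps 1-2: eigenvectors of A^T A give V. Since A^T A is symmetric,
# eigh returns an orthogonal V (eigenvalues come out in ascending order).
lam, V = np.linalg.eigh(A.T @ A)
lam, V = lam[::-1], V[:, ::-1]          # sort descending

# Step 3: the singular values are the square roots of the eigenvalues.
sigma = np.sqrt(lam)

# Step 4 (simplest case): U = A V Sigma^{-1}, i.e. divide each column
# of AV by the matching singular value.
U = (A @ V) / sigma

# Check: U is orthogonal and A = U Sigma V^T.
assert np.allclose(U.T @ U, np.eye(4))
assert np.allclose(U @ np.diag(sigma) @ V.T, A)
```

In practice one would call `np.linalg.svd(A)` directly; forming A^T A squares the conditioning, which is exactly the issue the normal-equations discussion below warns about.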
Demo: Computing the SVD (click to visit)
How Expensive is it to Compute the SVD? i
Demo: Relative cost of matrix factorizations (click to visit)
‘Reduced’ SVD

Is there a ‘reduced’ factorization for non-square matrices?

Yes. If m > n, the bottom m − n rows of Σ are zero, so only the first n columns of U contribute. Keeping just those columns (U becomes m × n) and the top n × n block of Σ gives the ‘reduced’ SVD, which still satisfies A = UΣV^T.

[Figure: the “full” version shown in black, the “reduced” version shown in red.]
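In NumPy, the reduced form corresponds to `full_matrices=False`; a quick shape check on a made-up tall matrix:

```python
import numpy as np

# Full vs. reduced SVD of a tall matrix (m = 5, n = 3).
A = np.arange(15.0).reshape(5, 3)

U_full, s, Vt = np.linalg.svd(A)                              # U: 5x5
U_red, s_red, Vt_red = np.linalg.svd(A, full_matrices=False)  # U: 5x3

print(U_full.shape, U_red.shape)  # (5, 5) (5, 3)

# Both reproduce A; the reduced form drops the columns of U that
# only ever multiply zero rows of Sigma.
assert np.allclose(U_red @ np.diag(s_red) @ Vt_red, A)
```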
SVD: Applications
Solve Square Linear Systems i
Can the SVD A = UΣV^T be used to solve square linear systems? At what cost (once the SVD is known)?
Yes, easy:

Ax = b
UΣV^T x = b
Σ (V^T x) = U^T b        (set y := V^T x)

Solve the diagonal system Σy = U^T b (easy to solve). Knowing y, find x = Vy.
Solve Square Linear Systems ii
Cost: O(n²), but with more operations than forward/backward substitution. The comparison gets even worse once the cost of computing LU vs. SVD is included.
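A minimal sketch of the three-step solve, using illustrative example data:

```python
import numpy as np

# Solving a square system with a precomputed SVD:
# 1) form U^T b, 2) solve the diagonal system, 3) map back with V.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

U, sigma, Vt = np.linalg.svd(A)
y = (U.T @ b) / sigma        # diagonal system Sigma y = U^T b
x = Vt.T @ y                 # x = V y

assert np.allclose(A @ x, b)
```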
Tall and Skinny Systems i
Consider a ‘tall and skinny’ linear system, i.e. one that has more equations than unknowns:

In the figure: m > n. How could we solve that?
Tall and Skinny Systems ii
First realization: a square linear system often only has a single solution. So applying more conditions to the solution will mean we have no exact solution.

Ax = b ← Not going to happen.

Instead: Find x so that ‖Ax − b‖2 is as small as possible.

r = Ax − b is called the residual of the problem.

‖Ax − b‖2² = r1² + · · · + rm² ← a sum of squares
This is called a (linear) least-squares problem. Since

“Find x so that ‖Ax − b‖2 is as small as possible.”

is too long to write every time, we introduce a shorter notation:

Ax ≅ b.
Solving Least-Squares i
How can I actually solve a least-squares problem Ax ≅ b?

The job: make ‖Ax − b‖2 as small as possible.

Equivalent: make ‖Ax − b‖2² as small as possible.

Use: the SVD A = UΣV^T.
Solving Least-Squares ii
Find x to minimize:

‖Ax − b‖2²
= ‖UΣV^T x − b‖2²
= ‖U^T (UΣV^T x − b)‖2²     (because U is orthogonal)
= ‖Σ (V^T x) − U^T b‖2²     (set y := V^T x)
= ‖Σy − U^T b‖2²
Solving Least-Squares iii
What y minimizes

‖Σy − U^T b‖2² = ‖ [diag(σ1, · · · , σk); 0] y − z ‖2² ,   where z := U^T b?

Pick

yi = zi/σi  if σi ≠ 0,
yi = 0      if σi = 0.

Find x = Vy, done.
Solving Least-Squares iv
Slight technicality: there only is a choice if some of the σi are zero. (Otherwise y is uniquely determined.) If there is a choice, this y is the one with the smallest 2-norm that also minimizes the 2-norm of the residual. And since ‖x‖2 = ‖y‖2 (because V is orthogonal), x also has the smallest 2-norm of all x′ for which ‖Ax′ − b‖2 is minimal.
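The recipe, including the yi = 0 choice for zero singular values, can be sketched as follows (treating σi below a tolerance as zero is an assumption of this sketch, not part of the slides):

```python
import numpy as np

def svd_lstsq(A, b, tol=1e-12):
    """Least-squares solution of A x ~= b via the SVD recipe above."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    z = U.T @ b
    # y_i = z_i / sigma_i where sigma_i is nonzero, else 0.
    y = np.where(sigma > tol, z / np.maximum(sigma, tol), 0.0)
    return Vt.T @ y          # x = V y

# Example data: a random full-rank tall system.
rng = np.random.default_rng(1)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

x = svd_lstsq(A, b)
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```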
In-class activity: SVD and Least Squares
The Pseudoinverse: A Shortcut for Least Squares i
How could the solution process for Ax ≅ b with an SVD A = UΣV^T be ‘packaged up’?

UΣV^T x ≈ b ⇔ x ≈ VΣ^{-1} U^T b

Problem: Σ may not be invertible.

Idea 1: Define a ‘pseudo-inverse’ Σ⁺ of a diagonal matrix Σ, entrywise, as

Σ⁺ii = 1/σi  if σi ≠ 0,
Σ⁺ii = 0     if σi = 0.
The Pseudoinverse: A Shortcut for Least Squares ii
Then Ax ≅ b is solved by x = VΣ⁺U^T b.

Idea 2: Call A⁺ = VΣ⁺U^T the pseudo-inverse of A.

Then Ax ≅ b is solved by x = A⁺b.
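A sketch of Ideas 1 and 2 in NumPy (the tolerance for treating σi as zero is an assumption of the sketch):

```python
import numpy as np

def pinv_from_svd(A, tol=1e-12):
    """Assemble the pseudo-inverse A^+ = V Sigma^+ U^T from the SVD."""
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    sigma_plus = np.where(sigma > tol, 1.0 / np.maximum(sigma, tol), 0.0)
    return Vt.T @ (sigma_plus[:, None] * U.T)   # V Sigma^+ U^T

# Example data: agrees with NumPy's built-in pseudo-inverse.
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
assert np.allclose(pinv_from_svd(A), np.linalg.pinv(A))
```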
The Normal Equations i
You may have learned the ‘normal equations’ A^T A x = A^T b to solve Ax ≅ b. Why not use those?

cond(A^T A) ≈ cond(A)²

I.e. if A is even somewhat poorly conditioned, then the conditioning of A^T A will be a disaster.

The normal equations are not well-behaved numerically.
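A small numerical illustration of the squaring effect, on a made-up matrix constructed to have known singular values:

```python
import numpy as np

# Build A with singular values 1, 1e-2, 1e-4, 1e-6, so cond(A) = 1e6.
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
sigma = np.array([1.0, 1e-2, 1e-4, 1e-6])
A = (U[:, :4] * sigma) @ V.T      # singular values of A are exactly sigma

print(np.linalg.cond(A))          # about 1e6
print(np.linalg.cond(A.T @ A))    # about 1e12: squared!
```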
Fitting a Model to Data i
How can I fit a model to measurements? E.g.:

[Figure: data points of mood m over time t, ‘Number of days in polar vortex’.]

Maybe: m̂(t) = α + βt + γt²
Fitting a Model to Data ii

Have: 300 data points: (t1, m1), . . . , (t300, m300)

Want: 3 unknowns α, β, γ

Write down equations:

α + βt1 + γt1² ≈ m1
α + βt2 + γt2² ≈ m2
⋮
α + βt300 + γt300² ≈ m300

→

⎡ 1  t1    t1²   ⎤ ⎡α⎤    ⎡ m1   ⎤
⎢ 1  t2    t2²   ⎥ ⎢β⎥ ≅  ⎢ m2   ⎥
⎢ ⋮   ⋮     ⋮    ⎥ ⎣γ⎦    ⎢  ⋮   ⎥
⎣ 1  t300  t300² ⎦        ⎣ m300 ⎦

So data fitting is just like interpolation, with a Vandermonde matrix:

Vα = m.

Only difference: more rows. Solvable using the SVD.
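The quadratic fit above can be sketched with synthetic data standing in for the measurements (the true coefficients and noise level are made up for illustration):

```python
import numpy as np

# Synthetic measurements from m(t) = 2 + 3t - 1.5t^2 plus a little noise.
rng = np.random.default_rng(5)
t = np.linspace(0.0, 1.0, 300)
m = 2.0 + 3.0 * t - 1.5 * t**2 + 0.01 * rng.standard_normal(300)

# Vandermonde matrix with columns 1, t, t^2; solve V alpha ~= m.
V = np.column_stack([np.ones_like(t), t, t**2])
coeffs, *_ = np.linalg.lstsq(V, m, rcond=None)
alpha, beta, gamma = coeffs
print(alpha, beta, gamma)   # close to 2.0, 3.0, -1.5
```

(`np.linalg.lstsq` uses an SVD-based LAPACK routine under the hood, so this is the slide's recipe in library form.)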
Demo: Data Fitting with Least Squares (click to visit)
Meaning of the Singular Values i
What do the singular values mean? (In particular the first/largest one.)

A = UΣV^T

‖A‖2 = max_{‖x‖2=1} ‖Ax‖2
     = max_{‖x‖2=1} ‖UΣV^T x‖2
     = max_{‖x‖2=1} ‖ΣV^T x‖2        (U orthogonal)
     = max_{‖V^T x‖2=1} ‖ΣV^T x‖2    (V orthogonal)
     = max_{‖y‖2=1} ‖Σy‖2            (let y = V^T x)
     = σ1.                           (Σ diagonal)
So the SVD (finally) provides a way to find the 2-norm.
Meaning of the Singular Values ii
Entertainingly, it does so by reducing the problem to finding the 2-norm of a diagonal matrix.

‖A‖2 = σ1.
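A quick numerical spot check of ‖A‖2 = σ1 on a random example matrix:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 3))

# Largest singular value equals the matrix 2-norm.
sigma1 = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(np.linalg.norm(A, 2), sigma1)

# And no unit vector beats it: ||A x||_2 <= sigma1 for ||x||_2 = 1.
X = rng.standard_normal((3, 1000))
X /= np.linalg.norm(X, axis=0)
assert np.all(np.linalg.norm(A @ X, axis=0) <= sigma1 + 1e-12)
```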
Condition Numbers i
How would you compute a 2-norm condition number?
cond2(A) = ‖A‖2 ‖A^{-1}‖2 = σ1/σn.
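As a sketch, on a small diagonal example where the answer is obvious:

```python
import numpy as np

# Singular values of this diagonal matrix are 2 and 0.5,
# so cond2(A) = sigma1/sigma_n = 4.
A = np.array([[2.0, 0.0], [0.0, 0.5]])
sigma = np.linalg.svd(A, compute_uv=False)
assert np.isclose(sigma[0] / sigma[-1], np.linalg.cond(A, 2))  # = 4.0
```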
SVD as Sum of Outer Products i
What’s another way of writing the SVD?

Starting from (assuming m > n for simplicity)

A = UΣV^T = [u1 | · · · | um] · [diag(σ1, · · · , σn); 0] · [− v1^T −; · · · ; − vn^T −],

where “; 0” denotes the m − n zero rows padding Σ,
SVD as Sum of Outer Products ii
we find that

A = [u1 | · · · | um] · [− σ1v1^T −; · · · ; − σnvn^T −; 0]
  = σ1u1v1^T + σ2u2v2^T + · · · + σnunvn^T.
That means: the SVD writes the matrix A as a sum of outer products (of left/right singular vectors). What could that be good for?
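The outer-product form is easy to verify numerically (random example data):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A term by term as sigma_i * u_i * v_i^T.
A_sum = sum(sigma[i] * np.outer(U[:, i], Vt[i]) for i in range(3))
assert np.allclose(A_sum, A)
```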
Low-Rank Approximation (I) i
What is the rank of σ1u1v1^T?

1. (One linearly independent column!)

What is the rank of σ1u1v1^T + σ2u2v2^T?

2. (Two linearly independent, in fact orthogonal, columns!)
Demo: Image compression (click to visit)
Low-Rank Approximation i
What can we say about the low-rank approximation
Ak = σ1u1v1^T + · · · + σkukvk^T

to

A = σ1u1v1^T + σ2u2v2^T + · · · + σnunvn^T?

For simplicity, assume σ1 > σ2 > · · · > σn > 0.

Observe that Ak has rank k. (And A has rank n.)

Then Ak minimizes ‖A − B‖2 among all matrices B of rank k (or lower).
(Eckart-Young Theorem)
Even better:

min_{rank B ≤ k} ‖A − B‖2 = ‖A − Ak‖2 = σk+1.

Ak is called the best rank-k approximation to A. (Here k can be any number.)

This best-approximation property is what makes the SVD extremely useful in applications and ultimately justifies its high cost.

Ak is also the best rank-k approximation in the Frobenius norm:

min_{rank B ≤ k} ‖A − B‖F = ‖A − Ak‖F = √(σk+1² + · · · + σn²).
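Both error formulas can be checked numerically; a sketch on a random matrix:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((6, 6))
U, sigma, Vt = np.linalg.svd(A)

# Truncated SVD: best rank-k approximation per Eckart-Young.
k = 2
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k]

# 2-norm error is sigma_{k+1} (index k, 0-based); Frobenius error is
# the root-sum-of-squares of the discarded singular values.
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
assert np.isclose(np.linalg.norm(A - A_k, 'fro'),
                  np.sqrt(np.sum(sigma[k:] ** 2)))
```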