Transcript of web.mit.edu/~ecprice/www/slides/fjlt.pdf

Page 1:

Fast RIP matrices with fewer rows

Jelani Nelson Eric Price Mary Wootters

Princeton MIT Michigan

2013-04-05


Page 2: Outline

1 Introduction (compressive sensing; Johnson-Lindenstrauss transforms; our result)

2 Concentration of measure: a toolbox (overview; symmetrization; Gaussian processes; Lipschitz concentration)

3 Proof (overview; covering number)

4 Conclusion


Page 7: Compressive Sensing

Given: a few linear measurements of an (approximately) k-sparse vector x ∈ R^n.
Goal: recover x (approximately).

y = Φx, where Φ ∈ R^{m×n} and y ∈ R^m.
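To make the setup concrete, here is a minimal numpy sketch of the measurement model y = Φx; the sizes n, m, k and the choice of a Gaussian Φ are illustrative assumptions, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, k = 400, 100, 5                      # ambient dim, measurements, sparsity

    # k-sparse signal: k random coordinates get Gaussian values
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)

    Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian matrix
    y = Phi @ x                                      # the m linear measurements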

Pages 8-14: Compressive Sensing Algorithms: Two Classes

y = Φx, with Φ ∈ R^{m×n}. Often the algorithms fall into two classes:

Structure-aware: the recovery algorithm is tied to the matrix structure (e.g. Count-Sketch). Typically faster, with sparse matrices, but less robust.

Structure-oblivious: the recovery algorithms just multiply by Φ, Φ^T (e.g. L1 minimization). Typically slower, with dense matrices, but more robust.

Yesterday: Fourier → sparse (the structure-aware side). Today: the structure-oblivious side.

Pages 15-19: Algorithms for compressive sensing

Goal: recover approximately k-sparse x from y = Φx.

A lot of people use convex optimization:

    min ‖x‖₁  subject to  Φx = y

Also Iterative Hard Thresholding, CoSaMP, OMP, StOMP, ROMP, ... For all of these:

- the time it takes to multiply by Φ or Φ^T is the bottleneck;
- the Restricted Isometry Property is a sufficient condition.

(A sketch of the L1 program as a linear program follows below.)
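The L1 minimization above is a linear program, so a generic LP solver recovers x. A hedged sketch using scipy (the split x = p − q with p, q ≥ 0 is the standard LP reformulation; the problem sizes are illustrative):

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    n, m, k = 200, 80, 5
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    y = Phi @ x

    # min ||x||_1 s.t. Phi x = y, as an LP over (p, q) with x = p - q:
    # minimize sum(p) + sum(q) subject to [Phi, -Phi](p; q) = y, p, q >= 0.
    res = linprog(c=np.ones(2 * n),
                  A_eq=np.hstack([Phi, -Phi]), b_eq=y,
                  bounds=(0, None))
    x_hat = res.x[:n] - res.x[n:]
    print(np.linalg.norm(x_hat - x))   # near zero when m is large enough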

Pages 20-21: Restricted Isometry Property (RIP)

Φ has the RIP if all of its m × k column submatrices are well conditioned, i.e.

(1 − ε)‖x‖₂² ≤ ‖Φx‖₂² ≤ (1 + ε)‖x‖₂²   for all k-sparse x ∈ R^n.
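Certifying the RIP exactly is believed to be computationally hard, but sampling random k-sparse vectors gives an empirical lower bound on the RIP constant ε. A minimal sketch (all sizes are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    n, m, k = 128, 64, 4
    Phi = rng.standard_normal((m, n)) / np.sqrt(m)

    # Track the worst observed deviation of ||Phi x||_2^2 from 1 over
    # random unit-norm k-sparse x; this only lower-bounds the true RIP
    # constant, since the sup is over all k-sparse directions.
    worst = 0.0
    for _ in range(10000):
        x = np.zeros(n)
        S = rng.choice(n, k, replace=False)
        x[S] = rng.standard_normal(k)
        x /= np.linalg.norm(x)
        worst = max(worst, abs(np.linalg.norm(Phi @ x) ** 2 - 1))
    print(worst)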

Pages 22-27: Goals

What properties should an RIP matrix have?

Good compression: m small.
- Random Gaussian matrix: Θ(k log(n/k)) rows.

Fast multiplication:
- reconstruction is dominated by roughly log n multiplications by Φ, Φ^T;
- random Gaussian matrix: Θ(nk log n) time.

Goal: an RIP matrix with O(n log n) multiplication time and small m.

* The talk assumes n^0.1 < k < n^0.9, so log k ≃ log n ≃ log(n/k).

Pages 28-33: An open question

Let A ∈ R^{m×n} contain m random rows from an n × n Fourier matrix F. You can multiply by A in O(n log n) time. How many rows do you need to ensure that A has the RIP?

Known: m = O(k log⁴ n) [CT06, RV08, CGV13]. Ideal: m = O(k log n).

(Related: how about partial circulant matrices? m = O(k log⁴ n) [RRT12, KMR12].)
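The O(n log n) multiplication is just an FFT followed by subsampling. A minimal numpy sketch (the row choice and normalization below are standard conventions, written here as an assumption):

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 1024, 128
    rows = rng.choice(n, m, replace=False)   # m random rows of F

    def partial_fourier(x):
        # A x = (unitary DFT of x) restricted to the sampled rows,
        # rescaled by sqrt(n/m) so that E[||A x||^2] = ||x||^2.
        return np.fft.fft(x)[rows] / np.sqrt(n) * np.sqrt(n / m)

    x = rng.standard_normal(n)
    print(np.linalg.norm(partial_fourier(x))**2, np.linalg.norm(x)**2)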

Page 34: Outline (as on page 2)

Pages 35-39: Another motivation: Johnson-Lindenstrauss (JL) Transforms

High-dimensional data S ⊂ R^n  →  linear map Φ  →  low-dimensional sketch Φ(S) ⊂ R^m.

Φ preserves the geometry of S:

(1 − ε)‖x‖₂ ≤ ‖Φx‖₂ ≤ (1 + ε)‖x‖₂
⟨Φx, Φy⟩ = ⟨x, y⟩ ± ε ‖x‖₂ ‖y‖₂

Pages 40-41: Johnson-Lindenstrauss Lemma

Theorem (variant of Johnson-Lindenstrauss '84). Let x ∈ R^n. A random Gaussian matrix Φ will have

(1 − ε)‖x‖₂ ≤ ‖Φx‖₂ ≤ (1 + ε)‖x‖₂

with probability 1 − δ, so long as

m ≳ (1/ε²) log(1/δ).

Set δ = 2^{−k}: embed 2^k points into O(k) dimensions.
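A quick Monte Carlo check of the lemma; the leading constant 8 in m and the problem sizes are illustrative assumptions, not claims from the slides:

    import numpy as np

    rng = np.random.default_rng(4)
    n, eps, delta, trials = 400, 0.3, 0.05, 500
    m = int(np.ceil(8 / eps**2 * np.log(1 / delta)))

    x = rng.standard_normal(n)
    fails = 0
    for _ in range(trials):
        Phi = rng.standard_normal((m, n)) / np.sqrt(m)
        r = np.linalg.norm(Phi @ x) / np.linalg.norm(x)
        fails += not (1 - eps <= r <= 1 + eps)
    print(fails / trials)   # empirical failure rate, should be below delta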

Pages 42-44: What do we want in a JL matrix?

Target dimension should be small (close to (1/ε²) k for 2^k points).

Fast multiplication, e.g. for:
- approximate numerical linear algebra problems (linear regression, low-rank approximation);
- k-means clustering.

Pages 45-51: How do we get a JL matrix?

Gaussians:
- dimension O((1/ε²) k);
- (1/ε²) nk multiplication time.

Best way known for fast JL: by [Krahmer-Ward '11], RIP ⇒ JL.
- Existing results: dimension O((1/ε²) k log⁴ n).
- n log n multiplication time.

And by [BDDW '08], JL ⇒ RIP; so the two are equivalent.¹

¹ The round trip loses a log n factor in dimension.

Page 52: Outline (as on page 2)

Page 53: Our result: a fast RIP matrix with fewer rows

Φ = H A, where H is a sparse matrix of random sign flips that hashes blocks of B rows down to single rows, and A is a subsampled Fourier matrix, so Φ has m rows.

New construction of fast RIP matrices: sparse times Fourier. k log³ n rows and n log n multiplication time.

Theorem. If m ≃ k log³ n, B ≃ log^c n, and A is a random partial Fourier matrix, then Φ has the RIP with probability at least 2/3.
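A minimal sketch of applying such a Φ = H A in O(n log n) time; the block structure (B signed Fourier rows summed per output coordinate) follows the figure, while the concrete sizes are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 4096
    m, B = 64, 8       # theorem regime is m ~ k log^3 n, B ~ log^c n

    rows = rng.choice(n, m * B, replace=False)     # subsampled Fourier: mB rows
    signs = rng.choice([-1.0, 1.0], size=m * B)    # random sign flips

    def apply_Phi(x):
        # A x via one FFT, normalized so E[||A x||^2] = ||x||^2; then
        # hash: sum each block of B signed coordinates into one output.
        Ax = np.fft.fft(x)[rows] / np.sqrt(n) * np.sqrt(n / (m * B))
        return (signs * Ax).reshape(m, B).sum(axis=1)

    x = rng.standard_normal(n)
    # equal in expectation over the randomness in rows and signs
    print(np.linalg.norm(apply_Phi(x))**2, np.linalg.norm(x)**2)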

Pages 54-58: Generalization

Our approach actually works for more general A. Write A's mB rows as m blocks A_1, ..., A_m of B rows each; H hashes each block into one row of Φ with random sign flips.

If A is a "decent" RIP matrix:
- A has the RIP (whp), but too many (mB) rows;
- RIP-ness degrades "gracefully" as the number of rows decreases: for every A_i, the RIP constant, although ≫ 1, is still controlled.

Then Φ is a good RIP matrix:
- Φ has the RIP (whp) with m = O(k log³ n) rows;
- time to multiply by Φ = time to multiply by A, plus mB.

Pages 59-62: Results in the area

Construction                           | Measurements m    | Multiplication time
---------------------------------------|-------------------|--------------------
Sparse JL matrices [KN12]              | (1/ε²) k log n    | εmn
Partial Fourier [RV08, CGV13]          | (1/ε²) k log⁴ n   | n log n
Partial circulant [KMR12]              | (1/ε²) k log⁴ n   | n log n
Our result: hash of partial Fourier    | (1/ε²) k log³ n   | n log n
Our result: hash of partial circulant  | (1/ε²) k log³ n   | n log n
Iterated Fourier [AC06, AL09, AR13]    | (1/ε²) k log n †  | n log n

† Requires k ≤ n^{1/2−δ}. This is the "easy" case.

[Figure: dimension vs. multiplication time trade-off; labeled points include Gaussian and [RV08], with values k log n, k² log⁵ n, k log⁴ n, and n log n.]

Pages 63-66: Concentration of Measure

Let Σ_k be the set of unit-norm k-sparse vectors. We want to show for our distribution Φ on matrices that

E sup_{x ∈ Σ_k} | ‖Φx‖₂² − ‖x‖₂² | < ε.

This is the expected deviation of Φ^T Φ from its mean I_n, in a funny norm. Probabilists have lots of tools to analyze this.

Page 67: Outline (as on page 2)

Pages 68-73: Tools

[Figure: a screwdriver, a drill, bit sets, and a single bit.]

Pages 74-76: Tools

Common interface: m drivers, n bits ⇒ mn combinations.

Hex shanks: a common interface for drill bits.
Gaussians: a common interface for probability.

Pages 77-78: A Probabilist's Toolbox

Convert to Gaussians: symmetrization, subgaussians, Berry-Esseen.
Gaussian concentration: Hoeffding bound, Dudley's entropy integral, Lipschitz concentration.

Will prove: symmetrization and Dudley's entropy integral.

Page 79: Outline (as on page 2)

Pages 80-97: Symmetrization

Lemma (Symmetrization). Suppose X_1, ..., X_t are i.i.d. with mean μ. For any norm ‖·‖,

E[ ‖(1/t) Σ_i X_i − μ‖ ] ≤ 2 E[ ‖(1/t) Σ_i s_i X_i‖ ] ≤ 3 E[ ‖(1/t) Σ_i g_i X_i‖ ],

where the s_i ∈ {±1} are independent random signs and the g_i ∼ N(0, 1) are independent Gaussians.

This answers: how well does (1/t) Σ X_i concentrate about its mean?

Example (RIP). For some norm ‖·‖, the RIP constant of a subsampled Fourier matrix A with rows A_i is

‖A^T A − I‖ = ‖Σ_i A_i^T A_i − I‖,

which has exactly this form.

Proof of the first inequality. Draw X'_1, ..., X'_t independently from the same distribution. Then

E[ ‖(1/t) Σ X_i − E[(1/t) Σ X'_i]‖ ] ≤ E[ ‖(1/t) Σ (X_i − X'_i)‖ ] = E[ ‖(1/t) Σ s_i (X_i − X'_i)‖ ],

where the inequality is Jensen's and the equality holds because each X_i − X'_i is symmetric; now apply the triangle inequality to split the right-hand side into two equal terms.

Proof of the second inequality. We have E[|g_i|] = √(2/π) ≈ 0.8 > 2/3, and s_i |g_i| has the same distribution as g_i, so

2 E[ ‖Σ s_i X_i‖ ] ≤ 3 E[ ‖Σ s_i E[|g_i|] X_i‖ ] ≤ 3 E[ ‖Σ s_i |g_i| X_i‖ ] = 3 E[ ‖Σ g_i X_i‖ ],

where the middle step is Jensen's inequality applied to the |g_i|.
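A quick Monte Carlo illustration of the lemma's chain of inequalities for the Euclidean norm; the distribution of the X_i (uniform on [0,1]^d, so μ = (1/2, ..., 1/2)) and all sizes are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(6)
    t, d, trials = 50, 10, 2000

    lhs = mid = rhs = 0.0
    for _ in range(trials):
        X = rng.random((t, d))                     # X_i uniform on [0,1]^d
        s = rng.choice([-1.0, 1.0], size=(t, 1))   # random signs s_i
        g = rng.standard_normal((t, 1))            # Gaussians g_i
        lhs += np.linalg.norm(X.mean(axis=0) - 0.5)
        mid += np.linalg.norm((s * X).mean(axis=0))
        rhs += np.linalg.norm((g * X).mean(axis=0))
    # The three averages should come out in increasing order.
    print(lhs / trials, 2 * mid / trials, 3 * rhs / trials)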

Page 98: Outline (as on page 2)


Gaussian Processes

Gaussian process Gx: a Gaussian at each point x ∈ T.
Standard problem: bound E sup_{x∈T} Gx.

Example (Maximum singular value of a random Gaussian matrix)
Let A be a random m × n Gaussian matrix. For any u ∈ R^m and v ∈ R^n, define

    G_{u,v} := u^T A v = ⟨u v^T, A⟩.

Then G_{u,v} ∼ N(0, ‖u v^T‖_F²), and

    E‖A‖₂ = E sup_{(u,v) ∈ S^{m−1} × S^{n−1}} u^T A v = E sup_{(u,v)} G_{u,v}.

The answer depends on the geometry of T.
Distance: ‖x − y‖ is the standard deviation of Gx − Gy.
In the example: ‖(u, v) − (u′, v′)‖ = ‖u v^T − u′ v′^T‖_F.



Gaussian Processes

Goal: bound E sup_{x∈T} Gx, where Gx − Gy ∼ N(0, ‖x − y‖²).
Ignoring the geometry:
  ▸ Pr[Gx > σmax · t] ≤ e^{−t²/2}
  ▸ Union bound: with high probability, Gx ≲ σmax √(log n).
  ▸ E sup_{x∈T} Gx ≲ σmax √(log n).
Two levels: σmax √(log 4) + σsmall √(log n).

[Figure: position = x, color = Gx, with G0 = 0; a coarse scale σmax and a fine scale σsmall.]
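The union-bound step is easy to see numerically; this sketch of mine (toy sizes) compares the empirical expected maximum of n i.i.d. N(0, σ²) variables with σ√(2 log n):

import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 10**5, 1.0, 50
mx = [sigma * rng.standard_normal(n).max() for _ in range(trials)]
print(f"empirical E[max_i G_i] = {np.mean(mx):.2f}, "
      f"sigma*sqrt(2 log n) = {sigma * np.sqrt(2 * np.log(n)):.2f}")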

Gaussian Processes: chaining

Bound E sup_{x∈T} Gx, where Gx − Gy has variance ‖x − y‖².
Two levels: σ1 √(log N(σ2)) + σ2 √(log n).
Why stop at two?

    E sup_{x∈T} Gx ≲ σ1 √(log N(σ2)) + σ2 √(log N(σ3)) + σ3 √(log N(σ4)) + · · ·

[Figure: T covered by balls of shrinking radii σ1, σ2, σ3, σ4, . . . ; the number of balls of radius σ2 needed is the covering number N(σ2), which depends on T and ‖·‖.]

Choosing the scales σr = σ1/2^r:

    E sup_{x∈T} Gx ≲ ∑_{r=0}^{∞} (σ1/2^r) √(log N(σ1/2^{r+1})) ≲ ∫_0^∞ √(log N(σ)) dσ.
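The collapse of the dyadic sum into an integral can be checked on a concrete covering-number bound. The sketch below (mine) uses the standard bound N(σ) ≤ (3/σ)^n for covering the unit Euclidean ball at scale σ ≤ 1 — an assumed example, not the set T of the slides — and compares the dyadic chaining sum with the Dudley integral; they agree up to constants:

import numpy as np

n, sigma1 = 10, 1.0

def log_N(s):
    # assumed example: (3/sigma)^n balls cover the unit l2 ball for sigma <= 1
    return n * np.log(3.0 / s) if s < 1.0 else 0.0

dyadic = sum((sigma1 / 2**r) * np.sqrt(log_N(sigma1 / 2**(r + 1)))
             for r in range(60))
ss = np.linspace(1e-9, 1.0, 200001)
integral = np.trapz(np.sqrt([log_N(s) for s in ss]), ss)
print(f"dyadic chaining sum = {dyadic:.2f}, Dudley integral = {integral:.2f}")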

Gaussian Processes
Dudley's Entropy Integral, Talagrand's generic chaining

Theorem (Dudley's Entropy Integral)
Define the norm ‖·‖ of a Gaussian process G by

    ‖x − y‖ = standard deviation of (Gx − Gy).

Then

    γ2(T, ‖·‖) := E sup_{x∈T} Gx ≲ ∫_0^∞ √(log N(T, ‖·‖, u)) du.

Bound a random variable using geometry.

Outline

1 Introduction
    Compressive sensing
    Johnson Lindenstrauss Transforms
    Our result
2 Concentration of measure: a toolbox
    Overview
    Symmetrization
    Gaussian Processes
    Lipschitz Concentration
3 Proof
    Overview
    Covering Number
4 Conclusion

Lipschitz Concentration of Gaussians

Theorem
If f : R^n → R is C-Lipschitz and g ∼ N(0, I_n), then for any t > 0,

    Pr[f(g) > E[f(g)] + C t] ≤ e^{−Ω(t²)}.

f concentrates as well as an individual Gaussian.
Replacing f with −f gives the lower tail bound.

Example
If g ∼ N(0, I_n), then with probability 1 − δ,

    ‖g‖₂ ≤ √n + O(√(log(1/δ))).

For n = O((1/ε²) log(1/δ)), this is a 1 ± ε approximation
⟹ the Johnson-Lindenstrauss lemma.
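The example is easy to reproduce: f(g) = ‖g‖₂ is 1-Lipschitz, so its fluctuations stay O(1) no matter how large n is. A small numpy check of mine (toy sizes):

import numpy as np

rng = np.random.default_rng(3)
n, trials = 1000, 5000
norms = np.linalg.norm(rng.standard_normal((trials, n)), axis=1)
# f(g) = ||g||_2 is 1-Lipschitz, so the spread is O(1), independent of n.
print(f"mean = {norms.mean():.2f} (sqrt(n) = {np.sqrt(n):.2f}), "
      f"std = {norms.std():.3f}")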

A Probabilist's Toolbox (recap)

[Diagram: a toolbox centered on the Gaussian. Convert to Gaussians: Symmetrization; Subgaussians; Berry-Esseen. Work with Gaussians: Gaussian concentration; Hoeffding bound; Dudley's entropy integral; Lipschitz concentration.]

Outline

1 Introduction
    Compressive sensing
    Johnson Lindenstrauss Transforms
    Our result
2 Concentration of measure: a toolbox
    Overview
    Symmetrization
    Gaussian Processes
    Lipschitz Concentration
3 Proof
    Overview
    Covering Number
4 Conclusion

Goal

[Figure: H · A = Φ, where H has m rows of random sign flips and A consists of mB rows of a Fourier matrix, with B = log^c n.]

For Σk denoting the set of unit-norm k-sparse vectors, we want

    E sup_{x∈Σk} |‖Φx‖₂² − ‖x‖₂²| < ε.

(Expectation of ∗) = ∗
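For intuition, here is a toy numpy construction in this spirit (entirely my own sketch: the exact hashing inside H and all sizes are assumptions, and sampling random k-sparse x only probes the supremum rather than bounding it):

import numpy as np

rng = np.random.default_rng(4)
n, m, B, k = 1024, 32, 8, 8        # toy sizes (assumptions)
mB = m * B

# A: mB randomly chosen rows of the n x n DFT, scaled so E||Ax||^2 = ||x||^2.
rows = rng.choice(n, size=mB, replace=False)
A = np.fft.fft(np.eye(n))[rows] / np.sqrt(mB)

# H: each row of A is hashed to one of m output rows with a random sign,
# so E_s ||H A x||^2 = ||A x||^2.
H = np.zeros((m, mB))
H[rng.integers(m, size=mB), np.arange(mB)] = rng.choice([-1.0, 1.0], size=mB)
Phi = H @ A

def rand_ksparse():
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    return x / np.linalg.norm(x)

# Sampling only probes the sup over Sigma_k; it does not bound it.
devs = [abs(np.linalg.norm(Phi @ rand_ksparse()) ** 2 - 1) for _ in range(1000)]
print(f"largest observed | ||Phi x||_2^2 - 1 | = {max(devs):.3f}")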

Proof outline

Rudelson-Vershynin: subsampled Fourier, O(k log^4 n) rows.
Nelson-P-Wootters: sparse times Fourier, O(k log^3 n) rows.

[Flowchart:]
Expected sup deviation E sup ‖Φ^T Φ − I‖
  → (Symmetrization) expected sup norm of a Gaussian
  → (Krahmer-Mendelson-Rauhut '13) γ2(Σk, ‖·‖)
  → (Dudley) covering number N(Σk, ‖·‖, u)
  → (Maurey: randomize) expected deviation E‖z − E[z]‖
  → (Symmetrization) expected norm of a Gaussian E‖g‖
  → (Union bound) Answer!
  (annotations under the chart: log² n loss; log n loss)

γ2: supremum of a Gaussian process.
Σk: k-sparse unit vectors.
‖·‖: a norm that depends on A (specified in a few slides).

Proof part I: triangle inequality

[Figure: H · A = Φ]

    E sup_{x∈Σk} |‖Φx‖₂² − ‖x‖₂²|
      ≤ E sup_{x∈Σk} |‖Φx‖₂² − ‖Ax‖₂²| + E sup_{x∈Σk} |‖Ax‖₂² − ‖x‖₂²|
      = E sup_{x∈Σk} |‖X_A s‖₂² − E_s ‖X_A s‖₂²| + (RIP constant of A),

where X_A is a matrix depending on x and A, and s is the vector of random sign flips used in H.

Proof part I: triangle inequality (continued)

    E sup_{x∈Σk} |‖X_A s‖₂² − E_s ‖X_A s‖₂²| + (RIP constant of A)

The second term is small by assumption (recall A has extra rows).
The first term is a Rademacher chaos process; we have to do some work to show that it is small.
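To see what "Rademacher chaos" means here: for a fixed matrix X, ‖Xs‖₂² − E_s‖Xs‖₂² = s^T (X^T X) s − tr(X^T X), a homogeneous quadratic in the signs. A small sketch of mine (X fixed and Gaussian purely for illustration):

import numpy as np

rng = np.random.default_rng(5)
d, t, trials = 30, 20, 5000
X = rng.standard_normal((d, t))              # a fixed matrix, for illustration
M = X.T @ X
devs = []
for _ in range(trials):
    s = rng.choice([-1.0, 1.0], size=t)
    # ||X s||^2 - E_s ||X s||^2 = s^T M s - tr(M), since E[s_i s_j] = delta_ij.
    devs.append(s @ M @ s - np.trace(M))
print(f"mean deviation = {np.mean(devs):.2f} (~0), std = {np.std(devs):.1f}")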

Proof part II: probability and geometry

By [KMR12] and some manipulation, we can bound the Rademacher chaos using

    γ2(Σk, ‖·‖_A),

the supremum of a Gaussian process over Σk with the norm ‖·‖_A (a norm induced by A).

Dudley's entropy integral: we can estimate this by bounding the covering number N(Σk, ‖·‖_A, u).

Definition of the Norm

We bound N(Σk, ‖·‖_A, u) for the following norm ‖x‖_A.

[Figure: Ax is split into m blocks A1x, . . . , Amx of B coordinates each; take the ℓ2 norm within each block and the ℓ∞ norm over the m blocks.]

    ‖x‖_A = max_{i∈[m]} ‖Ai x‖₂.

Rudelson-Vershynin: estimates N(Σk, ‖·‖_A, u) when B = 1.
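In code, this norm is just a max of blockwise ℓ2 norms; the sketch below (mine, with made-up sizes) assumes the rows of A come grouped into m consecutive blocks of B rows:

import numpy as np

def norm_A(A, x, m, B):
    # ||x||_A = max over the m blocks of B rows of the l2 norm of (A x).
    y = (A @ x).reshape(m, B)
    return np.linalg.norm(y, axis=1).max()

rng = np.random.default_rng(6)
m, B, n = 5, 3, 12                  # made-up sizes
A = rng.standard_normal((m * B, n))
x = rng.standard_normal(n)
print(norm_A(A, x, m, B))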

Progress

[The proof-outline flowchart again: symmetrization and the KMR bound reduce the problem to γ2(Σk, ‖·‖), and Dudley reduces that to the covering number N(Σk, ‖·‖, u), which is bounded next.]

Outline

1 Introduction
    Compressive sensing
    Johnson Lindenstrauss Transforms
    Our result
2 Concentration of measure: a toolbox
    Overview
    Symmetrization
    Gaussian Processes
    Lipschitz Concentration
3 Proof
    Overview
    Covering Number
4 Conclusion

Covering Number Bound

    N(Σk, ‖·‖_A, u) ≤ N(B1, ‖·‖_A, u/√k),

since

    Σk = {k-sparse x : ‖x‖₂ ≤ 1} ⊂ √k · B1 = {x : ‖x‖₁ ≤ √k}.

[Figure: Σk sitting inside the scaled ℓ1 ball √k B1, covered by balls of radius u.]
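The containment is just Cauchy-Schwarz: a unit-norm k-sparse x has ‖x‖₁ ≤ √k · ‖x‖₂ = √k. A one-off numeric check of mine:

import numpy as np

rng = np.random.default_rng(7)
n, k = 100, 7
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
x /= np.linalg.norm(x)                 # unit-norm k-sparse vector
# Cauchy-Schwarz: ||x||_1 <= sqrt(k) ||x||_2 = sqrt(k).
print(np.abs(x).sum(), "<=", np.sqrt(k))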

Covering number bound

    N(B1, ‖·‖_A, u)

Simpler to imagine: what about ℓ2? How many ℓ2 balls of radius u are required to cover B1?

    N(B1, ‖·‖₂, u) ≲ (1/u)^{O(n)}   by an easy volume argument
    N(B1, ‖·‖₂, u) ≲ n^{O(1/u²)}    trickier; next few slides

The latter bound is better when u ≫ 1/√n.
Maurey's empirical method generalizes to arbitrary norms; for ‖·‖_A it gives

    N(B1, ‖·‖_A, u) ≲ (√B/u)^{O(n)}  and  n^{O(B/u²)}.
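Comparing the two bounds is easiest on a log scale; this sketch of mine (constants inside the O(·) dropped, toy n) prints the exponents n log(1/u) and (1/u²) log n and shows the crossover near u ≈ 1/√n:

import numpy as np

n = 10**4
for u in [0.5, 0.1, 2 / np.sqrt(n), 0.5 / np.sqrt(n)]:
    vol = n * np.log(1 / u)            # log of (1/u)^{O(n)}, constants dropped
    maurey = (1 / u**2) * np.log(n)    # log of n^{O(1/u^2)}, constants dropped
    print(f"u = {u:.4f}: volume exponent ~ {vol:.0f}, "
          f"Maurey exponent ~ {maurey:.0f}")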

Covering Number Bound
Maurey's empirical method

How many balls of radius u are required to cover B1^+?

Consider any x ∈ B1^+.
Let z1, . . . , zt be i.i.d. randomized roundings of x to the simplex.
The sample mean z = (1/t) ∑_i zi converges to x as t → ∞.

[Figure: roundings z1, . . . , z5 scattered around x; the sample mean falls within radius u of x, with variance decaying like 1/t.]

Let t be large enough that, regardless of x,

    E[‖z − x‖] ≤ u.

Then all x lie within u of at least one possible z.
  ▸ Then N(B1, ‖·‖, u) ≤ (number of possible z) ≤ (n + 1)^t.
  ▸ Only (n + 1)^t possible tuples (z1, . . . , zt) ⟹ z.

I Only (n + 1)t possible tuples (z1, . . . , zt ) =⇒ z.

Eric Price (MIT) Fast RIP matrices with fewer rows 2013-04-05 45 / 52

Page 195: Princeton MIT Michiganweb.mit.edu/~ecprice/www/slides/fjlt.pdf · Faster Sparse matrices Less robust Often: Recovery algorithms just multiply by ; T (e.g. L1 minimization) Structure-oblivious

Covering Number BoundMaurey’s empirical method

Radius u1/t

How many balls of radius u required to cover B+1 ?

Consider any x ∈ B+1 .

Let z1, . . . , zt be i.i.d. randomized roundings of x to simplex.The sample mean z = 1

t∑

zi converges to x as t →∞.Let t be large enough that, regardless of x ,

E[‖z− x‖] ≤ u.

All x lie within u of at least one possible z.I Then N(B1, ‖·‖,u) ≤ number of z ≤ (n + 1)t .I Only (n + 1)t possible tuples (z1, . . . , zt ) =⇒ z.

Eric Price (MIT) Fast RIP matrices with fewer rows 2013-04-05 45 / 52

Page 196: Princeton MIT Michiganweb.mit.edu/~ecprice/www/slides/fjlt.pdf · Faster Sparse matrices Less robust Often: Recovery algorithms just multiply by ; T (e.g. L1 minimization) Structure-oblivious

Covering Number BoundMaurey’s empirical method

Radius u1/t

How many balls of radius u required to cover B+1 ?

Consider any x ∈ B+1 .

Let z1, . . . , zt be i.i.d. randomized roundings of x to simplex.The sample mean z = 1

t∑

zi converges to x as t →∞.Let t be large enough that, regardless of x ,

E[‖z− x‖] ≤ uE[‖z− x‖] ≤ uE[‖z− x‖] ≤ u.

All x lie within u of at least one possible z.I Then N(B1, ‖·‖,u) ≤ number of z ≤ (n + 1)t .I Only (n + 1)t possible tuples (z1, . . . , zt ) =⇒ z.

Eric Price (MIT) Fast RIP matrices with fewer rows 2013-04-05 45 / 52

Page 197: Princeton MIT Michiganweb.mit.edu/~ecprice/www/slides/fjlt.pdf · Faster Sparse matrices Less robust Often: Recovery algorithms just multiply by ; T (e.g. L1 minimization) Structure-oblivious

Covering Number BoundMaurey’s empirical method

Radius u1/t

How many balls of radius u required to cover B+1 ?

Consider any x ∈ B+1 .

Let z1, . . . , zt be i.i.d. randomized roundings of x to simplex.The sample mean z = 1

t∑

zi converges to x as t →∞.Let t be large enough that, regardless of x ,

E[‖z− x‖] ≤ uE[‖z− x‖] ≤ uE[‖z− x‖] ≤ u.

All x lie within u of at least one possible z.I Then N(B1, ‖·‖,u) ≤ number of z ≤ (n + 1)t .I Only (n + 1)t possible tuples (z1, . . . , zt ) =⇒ z.

Will show: E[‖z− x‖A] ≤√

B/t

=⇒ N(T , ‖·‖A,u) ≤ nB/u2

Eric Price (MIT) Fast RIP matrices with fewer rows 2013-04-05 45 / 52

Covering Number Bound: Maurey's empirical method

How many balls of radius u are required to cover B₁⁺?

Consider any x ∈ B₁⁺.
Let z₁, . . . , z_t be i.i.d. randomized roundings of x to the simplex.
The sample mean z̄ = (1/t) ∑ zᵢ converges to x as t → ∞.
Let t be large enough that, regardless of x, E[‖z̄ − x‖] ≤ u.
Then all x lie within u of at least one possible z̄:
  - Only (n + 1)^t possible tuples (z₁, . . . , z_t) ⟹ z̄.
  - So N(B₁, ‖·‖, u) ≤ number of possible z̄ ≤ (n + 1)^t.

Will show: E[‖z̄ − x‖_A] ≤ √(B/t) ⟹ N(T, ‖·‖_A, u) ≤ n^{B/u²}.
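To make the rounding step concrete, here is a minimal numerical sketch (illustrative only, not from the talk; the helper name round_to_simplex and all constants are made up). It draws t i.i.d. roundings of a point x in the positive ℓ₁ ball and checks that the sample mean drifts from x at roughly the 1/√t rate:

    import numpy as np

    def round_to_simplex(x, t, rng):
        # Each rounding z_i is the vertex e_j with probability x_j, and the
        # zero vector with the leftover probability 1 - ||x||_1, so E[z_i] = x.
        n = len(x)
        probs = np.append(x, 1.0 - x.sum())
        idx = rng.choice(n + 1, size=t, p=probs)
        z = np.zeros((t, n))
        hit = idx < n
        z[np.nonzero(hit)[0], idx[hit]] = 1.0
        return z

    rng = np.random.default_rng(0)
    n, t = 1000, 64
    x = rng.random(n)
    x /= 2.0 * x.sum()                  # some point in the positive l1 ball
    zbar = round_to_simplex(x, t, rng).mean(axis=0)
    print(np.linalg.norm(zbar - x), 1.0 / np.sqrt(t))   # deviation vs. 1/sqrt(t)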

Covering Number Bound

Goal: E[‖z̄ − x‖_A] ≲ √(B/t).

Symmetrize!

  E[‖(1/t) ∑ zᵢ − x‖_A] ≲ E[‖(1/t) ∑ gᵢ zᵢ‖_A] =: (1/√t) · E[‖g‖_A],

where g ∈ Rⁿ has gⱼ ∼ N(0, (number of zᵢ at eⱼ)/t) independently in each coordinate.
Hence E[‖g‖₂²] = (fraction of zᵢ that are nonzero) ≤ 1.

Goal: E[‖g‖_A] ≤ √B.
(Note: E[‖g‖₂] ≤ 1 ⟹ N(B₁, ℓ₂, u) ≤ n^{1/u²}.)
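For the record, the normalization in the display can be checked coordinate by coordinate (a worked restatement of the slide, writing #{i : zᵢ = eⱼ} for the number of roundings landing on eⱼ):

    \frac{1}{t}\sum_i g_i z_i \ \text{has } j\text{-th coordinate}\quad
    \frac{1}{t}\sum_{i:\, z_i = e_j} g_i
    \ \sim\ N\!\Big(0,\ \frac{\#\{i : z_i = e_j\}}{t^2}\Big),

so pulling out one factor of 1/√t leaves gⱼ ∼ N(0, #{i : zᵢ = eⱼ}/t), and summing the variances gives E[‖g‖₂²] = #{i : zᵢ ≠ 0}/t ≤ 1, as claimed.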

Progress

[Roadmap diagram: expected sup deviation E sup ‖ΦᵀΦ − I‖ → (symmetrization) expected sup norm of a Gaussian → (Krahmer–Mendelson–Rauhut '13) γ₂(Σₖ, ‖·‖) → (Dudley) covering number N(Σₖ, ‖·‖, u) → (Maurey: randomize) expected deviation E‖z̄ − E[z̄]‖ → (symmetrization) expected norm of a Gaussian E‖g‖ → (union bound) Answer! The diagram marks a log² n loss and a log n loss along the chain.]

Bounding the norm (intuition)

Just want to bound E[‖g‖_A].

[Figure: ‖·‖_A takes the ℓ∞ norm over the blocks A₁g, A₂g, A₃g, A₄g, each measured in ℓ₂; each Aᵢg is a subset of the coordinates of g.]

g ∈ Rⁿ has Gaussian coordinates, is k-sparse, and has total variance 1.
Each coordinate gⱼ = Fⱼg ∼ N(0, 1).
Rudelson–Vershynin: B = 1.
  - Naive: ‖g‖∞ ≲ √(log n); this gives the log n loss in their bound.
If the gᵢ were independent:

  ‖Aᵢg‖₂ ≤ √B + O(√(log n)) w.h.p.

(Lipschitz concentration, just like √n + √(log(1/δ)) in the tutorial.)
Would get E[‖g‖_A] ≲ √B, so the union bound just loses a constant.
They're not independent... but the Aᵢ satisfy a "very weak" RIP.
  - Bound ‖Aᵢg‖₂ using ‖g‖₂, which has independent entries.
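For intuition on where the √(log n) comes from, a toy simulation (not from the talk; all constants illustrative): the maximum of n independent standard Gaussians concentrates near √(2 log n), which is exactly the price the naive ℓ∞ bound pays.

    import numpy as np

    rng = np.random.default_rng(1)
    n, trials = 1 << 14, 200
    g = rng.standard_normal((trials, n))
    # Empirical E[ ||g||_inf ] versus the sqrt(2 log n) prediction.
    print(np.abs(g).max(axis=1).mean(), np.sqrt(2.0 * np.log(n)))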

Bounding the norm (by example)

[Figure: ‖·‖_A = max over blocks ‖Aᵢg‖₂, as before, plus a bar chart of the coordinates xᵢ.]

Just want to bound E[‖g‖_A].
Recall: x ∈ Σₖ/√k −→ z₁, . . . , z_t −→ g.
The full proof is complicated; will assume gᵢ ∼ N(0, xᵢ) (the "large t" limit).
Two examples:
  1. Very concentrated: x₁ = 1/√k, rest is zero.
  2. Very spread out: x₁ = · · · = xₖ = 1/k.
Example (1):
  ‖x‖₁ ≪ 1.
  The Fourier transform ĝ is Gaussian with variance 1/√k.
  ‖ĝ‖∞ ≲ k^{−1/4} · √(log n) ≪ 1.
  Hence ‖Aᵢg‖₂ ≪ √B for all i.
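Example (1) can be made fully explicit. A worked version of the slide's computation (assuming, as the figure suggests, that each block Aᵢg consists of B Fourier coordinates of g, with unit-modulus Fourier entries F_{j1}):

    \widehat g_j = F_j g = F_{j1}\, g_1 \sim N\big(0,\ 1/\sqrt{k}\big)
    \ \Longrightarrow\
    \|\widehat g\|_\infty \lesssim k^{-1/4}\sqrt{\log n} \ll 1
    \ \Longrightarrow\
    \|A_i g\|_2 \le \sqrt{B}\,\|\widehat g\|_\infty \ll \sqrt{B}.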

Bounding the norm (example 2)

Just want to bound E[‖g‖_A] when gᵢ ∼ N(0, 1/k) for i ∈ [k].

[Figure: one block Aᵢg, measured in ℓ₂.]

‖Aᵢg‖₂ is C-Lipschitz with factor C = ‖Aᵢ‖_RIP · 1/√k.
Then with high probability,

  ‖Aᵢg‖₂ ≲ √B + C · √(log n).

Naive bound:

  ‖Aᵢ‖_RIP ≤ ‖Aᵢ‖_F = √(Bk) ⟹ C ≤ √B.

"Very weak" RIP bound, with B = log^c n:

  ‖Aᵢ‖_RIP ≲ ‖Aᵢ‖_F / √(log n).

So C ≲ √(B / log n) ⟹ ‖Aᵢg‖₂ ≲ √B w.h.p.
So E‖g‖_A = E[max_i ‖Aᵢg‖₂] ≲ √B.
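The Lipschitz step can be eyeballed numerically (an illustrative sketch, not the talk's construction: A below is a generic random block standing in for Aᵢ). The map g ↦ ‖Ag‖₂ is ‖A‖-Lipschitz, so its fluctuations around the mean are O(‖A‖), with no dimension dependence; that is what makes the union bound over blocks cheap.

    import numpy as np

    rng = np.random.default_rng(2)
    B, k, trials = 32, 256, 5000
    A = rng.standard_normal((B, k)) / np.sqrt(k)   # generic stand-in for one block A_i
    C = np.linalg.norm(A, 2)                       # spectral norm = Lipschitz constant of g -> ||A g||_2
    g = rng.standard_normal((trials, k))
    vals = np.linalg.norm(g @ A.T, axis=1)
    # Mean is ~ ||A||_F = sqrt(B); the fluctuation (std) is only O(C).
    print(vals.mean(), vals.std(), C)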

Unrolling everything

[The roadmap diagram again, now annotated with the bound obtained at each stage: E‖g‖ ≲ √B; E‖z̄ − E[z̄]‖ ≲ √(B/t); N(Σₖ, ‖·‖, u) ≤ n^{kB/u²}; entropy integral √(kB log³ n); RIP constant √(k log³ n / m).]

  - Union bound just loses a constant factor.
  - The sample mean z̄ lies within u of x (in expectation) once t ≥ B/u².
  - Covering number of B₁ is (n + 1)^{B/u²}.
  - Entropy integral is √(kB log³ n).
  - RIP constant: ε ≲ √(k log³ n / m).
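A sketch of how the two middle annotations combine (my arithmetic, not the talk's: it assumes the covering bound reads log N(Σₖ, ‖·‖, u) ≲ (kB/u²) log n and sums over the ~log n dyadic scales of u):

    \gamma_2(\Sigma_k, \|\cdot\|)
    \ \lesssim\ \sum_{\text{dyadic } u}\ u\sqrt{\log N(\Sigma_k,\|\cdot\|,u)}
    \ \lesssim\ \sum_{\text{dyadic } u} \sqrt{kB\log n}
    \ \approx\ \sqrt{kB\log^3 n}.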

Summary and Open Questions

We get fast RIP matrices with O(k log³ n) rows.
Is the extra log² n necessary?
  - The loss seems to come from Dudley's entropy integral: sup ∑ ≤ ∑ sup.
  - Generic chaining is tight but harder to use. [Fernique, Talagrand]
For JL, is the extra log³ n necessary?
  - [Krahmer-Ward] only needs "model-based" RIP; that could save a log n.
Any other constructions?
Lots of cool techniques in the field; can we use them elsewhere?

Thanks!

Thoughts on loss

Recall that N(B₁, ℓ₂, u) ≤ n^{1/u²}, so the entropy integral gives γ₂(B₁, ℓ₂) ≤ log^{3/2} n.
But the associated Gaussian process is just

  γ₂(B₁, ℓ₂) ≃ E sup_{x ∈ B₁} ⟨g, x⟩

for g ∼ N(0, Iₙ). (Check: E[(⟨g, x⟩ − ⟨g, y⟩)²] = ‖x − y‖₂².)
We can compute this directly:

  E sup_{x ∈ B₁} ⟨g, x⟩ = E‖g‖∞ ≈ √(log n).

Generic chaining: there exists a partition A₁, A₂, . . . such that

  γ₂ ≃ sup_x ∑ᵢ √(log|Aᵢ₊₁|) · d(x, Aᵢ).

Dudley: choose the Aᵢ so that sup_x d(x, Aᵢ) ≤ σ₁/2ⁱ.
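The "Check" is one line, using E[ggᵀ] = Iₙ:

    \mathbb{E}\big[(\langle g,x\rangle - \langle g,y\rangle)^2\big]
    = \mathbb{E}\big[\langle g,\, x-y\rangle^2\big]
    = (x-y)^\top\,\mathbb{E}[gg^\top]\,(x-y)
    = \|x-y\|_2^2.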

Covering Number Bound: Maurey's empirical method

Answer is nᵗ, where t is such that

  E := E[‖(1/t) ∑ zᵢ − x‖] ≤ u.

(The roundings are unbiased: E[(1/t) ∑ zᵢ] = x.)
Symmetrize:

  E ≲ (1/t) · E[‖∑ gᵢ zᵢ‖]   for gᵢ ∼ N(0, 1) i.i.d.

Then g := ∑ gᵢ zᵢ is an independent Gaussian in each coordinate.
In ℓ₂,

  (1/t) · E[‖g‖₂] ≤ (1/t) · E[‖g‖₂²]^{1/2} = √(number of nonzero zᵢ) / t ≤ 1/√t,

giving an n^{O(1/u²)} bound.

Page 270: Princeton MIT Michiganweb.mit.edu/~ecprice/www/slides/fjlt.pdf · Faster Sparse matrices Less robust Often: Recovery algorithms just multiply by ; T (e.g. L1 minimization) Structure-oblivious

Bounding the norm in our case (part 1)

x ∈ Σk/√

k ⊂ B1 rounded to z1, . . . , zt symmetrized to g.

G(x) = Ez,g‖g‖A

First: split x into “large” and “small” coordinates.

G(x) ≤ G(xlarge) + G(xsmall)

xlarge: Locations where xi > (log n)/kI Bound:

‖xlarge‖1 ≤ 1/ log n.

I Given ‖x‖22 ≤ 1/k , maximal ‖xlarge‖1 if spread out.

I k/(log2 n) of value (log n)/kI Absorbs the loss from union bound.

So can focus on ‖x‖∞ < (log n)/k .

Eric Price (MIT) Fast RIP matrices with fewer rows 2013-04-05 56 / 52

Page 271: Princeton MIT Michiganweb.mit.edu/~ecprice/www/slides/fjlt.pdf · Faster Sparse matrices Less robust Often: Recovery algorithms just multiply by ; T (e.g. L1 minimization) Structure-oblivious

Bounding the norm in our case (part 2)

k-sparse x, rounded to z1, . . . , zt, symmetrized to g; ‖x‖∞ < (log n)/k.

gi ∼ N(0, σi²) for σi² = (#zj at vertex ei)/t² ≈ xi/t.
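A hedged simulation of this variance calculation (Python/NumPy). The sampling model, where each zj lands on vertex ei with probability xi and signs are omitted, is my reading of the rounding step rather than a detail stated on this slide; it checks σi² = (#zj at ei)/t² ≈ xi/t, since the expected count at ei is t·xi.

import numpy as np

rng = np.random.default_rng(1)
n, k, t = 10000, 100, 50000
x = np.zeros(n)
x[:k] = 1.0 / k                         # k-sparse, ||x||_1 = 1

idx = rng.choice(n, size=t, p=x)        # vertex e_i chosen by each z_j
counts = np.bincount(idx, minlength=n)  # #z_j at vertex e_i

sigma2 = counts / t**2                  # sigma_i^2 = (#z_j at e_i)/t^2
print(sigma2[:3])                       # empirical variances
print(x[:3] / t)                        # predicted x_i/t (E[count_i] = t x_i)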

‖Aig‖₂ is C-Lipschitz with constant C = ‖Ai‖RIP · ‖σ‖∞.

Naive bound: C ≲ ‖Ai‖F · √(‖x‖∞/t) ≤ √(Bk) · √(log n/(kt)) = √(B log n/t).

“Very weak” RIP bound: for some B = log^c n,

‖Ai‖RIP ≲ log⁴ n · (√B + √k) ≤ ‖Ai‖F / log n,

which gives C ≲ √(B/(t log n)).

So with high probability, ‖Aig‖₂ ≲ √(B/t) + C·√(log n) ≲ √(B/t).

So E‖g‖A = maxᵢ ‖Aig‖₂ ≲ √(B/t); a small concentration sketch follows below.
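To see the Lipschitz-concentration step numerically, here is a minimal sketch (Python/NumPy; the random A and σ below are stand-ins, not the structured RIP matrices of the talk). Writing g = diag(σ)h with h ∼ N(0, I), the map h ↦ ‖A diag(σ) h‖₂ is Lipschitz with constant ‖A diag(σ)‖ ≤ ‖A‖ · ‖σ‖∞ = C, so ‖Ag‖₂ fluctuates around its mean by O(C), and by C·√(log n) with high probability.

import numpy as np

rng = np.random.default_rng(2)
m, n, trials = 64, 512, 5000

A = rng.standard_normal((m, n)) / np.sqrt(m)   # stand-in measurement matrix
sigma = rng.uniform(0.0, 0.1, size=n)          # stand-in per-coordinate sigmas
C = np.linalg.norm(A, 2) * sigma.max()         # Lipschitz constant bound

h = rng.standard_normal((trials, n))
norms = np.linalg.norm((A * sigma) @ h.T, axis=0)  # ||A diag(sigma) h||_2

print(norms.std())   # empirical fluctuation of ||A g||_2
print(C)             # Gaussian concentration: std is O(C)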
