Sparse Fourier Transform (Lecture 4)
Michael Kapralov
IBM Watson → EPFL
St. Petersburg CS Club, November 2015
Given x ∈ ℂⁿ, compute the Discrete Fourier Transform of x:

x̂_i = (1/n) ∑_{j∈[n]} x_j ω^{−ij},

where ω = e^{2πi/n} is the n-th root of unity.
Goal: find the top k coefficients of x̂ approximately

In previous lectures:
• exactly k-sparse: O(k log n) runtime and samples
• approximately k-sparse: O(k log² n (log log n)) runtime and samples

This lecture, for the approximately k-sparse case:
• k log n · log^{O(1)} log n samples in k log² n · log^{O(1)} log n time;
• O(k log n) samples (optimal).
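For concreteness, here is a minimal numpy sketch of the transform with this 1/n normalization (my own illustration; numpy's fft uses the same exponent but no 1/n factor, so we rescale to compare).

```python
import numpy as np

def dft_hat(x):
    """x_hat_i = (1/n) * sum_{j in [n]} x_j * omega^{-ij}, with omega = e^{2*pi*i/n}."""
    n = len(x)
    i = np.arange(n).reshape(-1, 1)
    j = np.arange(n).reshape(1, -1)
    return (x * np.exp(-2j * np.pi * i * j / n)).sum(axis=1) / n

n, k = 256, 5
x_hat_true = np.zeros(n, dtype=complex)
x_hat_true[np.random.choice(n, k, replace=False)] = np.random.randn(k) + 1j * np.random.randn(k)
x = np.fft.ifft(x_hat_true) * n     # time-domain signal whose (1/n)-normalized DFT is x_hat_true

assert np.allclose(dft_hat(x), np.fft.fft(x) / n)
assert np.allclose(dft_hat(x), x_hat_true)
```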
Improvements?

List ← ∅
For t = 1 to log k
    B_t ← C k/4^t
    γ_t ← 1/(C 2^t)
    List ← List + PARTIALRECOVERY(B_t, γ_t, List)
End

Summary:
• independent invocations of PARTIALRECOVERY: use fresh samples in every iteration
• reduce (approximate) sparsity at a geometric rate
• need sharp filters to reduce sparsity
• lose Ω(log n) in time and sample complexity because of sharpness
Why not use simpler filters with smaller support? Let

G_j := 1/(B+1) if j ∈ [−B/2, B/2], and 0 otherwise.

[Plots: G in the time domain (a boxcar of support B+1) and Ĝ in the frequency domain (sinc-shaped buckets).]

supp(G) = B ≈ k as opposed to B ≈ k log n, but buckets leak.

Can only identify and approximate elements of value at least ≈ ‖x‖₂²/k, and estimate up to ≈ ‖x‖₂²/k additive error, so need to repeat Ω(log n) times.
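A small numpy sketch (my own illustration) of why such a boxcar leaks: its Fourier transform is a Dirichlet (sinc-like) kernel whose side lobes decay only polynomially, so a constant fraction of the filter's energy falls outside its own bucket.

```python
import numpy as np

n, B = 1024, 32                                   # signal length, filter support ~ k
idx = np.arange(-(B // 2), B // 2 + 1) % n        # j in [-B/2, B/2], wrapped mod n
G = np.zeros(n)
G[idx] = 1.0 / (B + 1)                            # boxcar filter in the time domain

G_hat = np.fft.fft(G)                             # Dirichlet kernel in the frequency domain
energy = np.abs(G_hat) ** 2
dist = np.minimum(np.arange(n), n - np.arange(n)) # circular distance from frequency 0
in_bucket = energy[dist <= n // (2 * B)].sum()    # energy inside a bucket of width ~ n/B

print("fraction of filter energy leaking outside its bucket:", 1 - in_bucket / energy.sum())
```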
Sample complexity

Sample complexity = number of samples accessed in the time domain. In some applications it is at least as important as runtime (Shi-Andronesi-Hassanieh-Ghazi-Katabi-Adalsteinsson, ISMRM'13).

Given access to x ∈ ℂⁿ, find ŷ such that

‖x̂ − ŷ‖₂ ≤ C · min_{k-sparse z} ‖x̂ − z‖₂

using the smallest possible number of samples?
Uniform bounds (for all): Candes-Tao'06, Rudelson-Vershynin'08, Cheraghchi-Guruswami-Velingker'12, Bourgain'14, Haviv-Regev'15
• Deterministic, Ω(n) runtime
• O(k log² k log n) samples

Non-uniform bounds (for each): Goldreich-Levin'89, Kushilevitz-Mansour'91, Mansour'92, Gilbert-Guha-Indyk-Muthukrishnan-Strauss'02, Gilbert-Muthukrishnan-Strauss'05, Hassanieh-Indyk-Katabi-Price'12a, Hassanieh-Indyk-Katabi-Price'12b, Indyk-K.-Price'14
• Randomized, O(k · poly(log n)) runtime
• O(k log² n) samples

Lower bound: Ω(k log(n/k)) for non-adaptive algorithms (Do-Ba-Indyk-Price-Woodruff'10)

Theorem. There exists an algorithm for ℓ2/ℓ2 sparse recovery from Fourier measurements using O(k log n · log^{O(1)} log n) samples and O(k log² n · log^{O(1)} log n) runtime.

Optimal up to poly(log log n) factors for k ≤ n^{1−δ}.
ℓ2/ℓ2 sparse recovery guarantees:

‖x̂ − ŷ‖₂ ≤ C · min_{k-sparse z} ‖x̂ − z‖₂

Sort the coefficients so that |x̂₁| ≥ ... ≥ |x̂_k| ≥ |x̂_{k+1}| ≥ |x̂_{k+2}| ≥ ..., and let Err²_k(x̂) = ∑_{j=k+1}^n |x̂_j|².

Then the guarantee says the residual error is bounded by the noise energy: ‖x̂ − ŷ‖₂² ≤ C · Err²_k(x̂) (up to renaming the constant). Equivalently, the signal-to-noise ratio of the output, R = ‖x̂ − ŷ‖₂²/Err²_k(x̂), is at most C.

Sufficient to ensure that most elements are below the average noise level:

|x̂_i − ŷ_i|² ≤ c · Err²_k(x̂)/k =: µ²
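To make these quantities concrete, here is a small numpy sketch (illustration only, with hypothetical parameter choices) computing the tail energy Err²_k, the per-coordinate noise level µ², and the ℓ2/ℓ2 check for a candidate ŷ:

```python
import numpy as np

def err2_k(x_hat, k):
    """Tail energy Err_k^2(x_hat): energy outside the top-k coefficients."""
    mags2 = np.sort(np.abs(x_hat) ** 2)[::-1]
    return mags2[k:].sum()

def satisfies_l2l2(x_hat, y_hat, k, C=2.0):
    """Check the guarantee ||x_hat - y_hat||_2^2 <= C * Err_k^2(x_hat)."""
    return np.sum(np.abs(x_hat - y_hat) ** 2) <= C * err2_k(x_hat, k)

n, k = 1024, 10
x_hat = 0.01 * (np.random.randn(n) + 1j * np.random.randn(n))   # tail noise
x_hat[np.random.choice(n, k, replace=False)] += 10              # k head coefficients

mu2 = err2_k(x_hat, k) / k                             # average noise level per head coordinate
y_hat = np.where(np.abs(x_hat) ** 2 > mu2, x_hat, 0)   # keep only elements above the noise level
print(satisfies_l2l2(x_hat, y_hat, k))                 # True: the residual is only dropped tail noise
```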
Iterative recovery

Many algorithms use the iterative recovery scheme:

Input: x ∈ ℂⁿ
y_0 ← 0
For t = 1 to L
    z ← PARTIALRECOVERY(x, y_{t−1})    ▷ takes random samples of x − y
    Update y_t ← y_{t−1} + z

PARTIALRECOVERY(x, y)
    return dominant Fourier coefficients z of x − y (approximately)

dominant coefficients ≈ those with |x̂_i − ŷ_i|² ≥ µ² (above the average noise level)
PARTIALRECOVERY(x, y)
    return dominant Fourier coefficients z of x − y (approximately)

dominant coefficients ≈ those with |x̂_i − ŷ_i|² ≥ µ² (above the average noise level)

Main questions:
• How many samples per SNR reduction step?
• How many iterations?

Summary of techniques from Gilbert-Guha-Indyk-Muthukrishnan-Strauss'02, Akavia-Goldwasser-Safra'03, Gilbert-Muthukrishnan-Strauss'05, Iwen'10, Akavia'10, Hassanieh-Indyk-Katabi-Price'12a, Hassanieh-Indyk-Katabi-Price'12b
1-sparse recovery from Fourier measurements

[Plot: time-domain samples of a pure tone plus noise.]

x_a = ω^{a·f} + noise

O(log_SNR n) measurements for random a

[Plot: frequency domain, a single spike at frequency f (angle 2πf/n).]
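A minimal numpy sketch of the basic idea in the noiseless case (my own illustration; the lecture's robust version uses O(log_SNR n) measurements at random locations a): for a pure tone x_a = ω^{a·f}, the ratio of two consecutive samples equals ω^f, so its phase reveals f.

```python
import numpy as np

n = 1024
f = 357                                        # unknown frequency to recover
x = lambda t: np.exp(2j * np.pi * f * t / n)   # pure tone x_t = omega^{t*f}, no noise

a = np.random.randint(n)                       # random sample location
ratio = x(a + 1) / x(a)                        # equals omega^f = e^{2*pi*i*f/n}
f_rec = int(round(np.angle(ratio) * n / (2 * np.pi))) % n
assert f_rec == f
```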
Reducing k-sparse recovery to 1-sparse recovery

Permute with a random linear transformation and phase shift, and partition the frequency space into B = k/α buckets for a constant α ∈ (0,1).

[Plots: the permuted time-domain signal, and the frequency domain partitioned into buckets.]

Choose a filter G, Ĝ such that
• Ĝ approximates the buckets
• G has small support

Compute x̂ ∗ Ĝ (= the DFT of x · G)

Sample complexity = |supp(G)|!
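A rough numpy sketch of the bucketing step (my own illustration: it uses a plain boxcar for G and omits the random permutation and phase shift), showing that touching only supp(G) time samples already gives one value per bucket of x̂ ∗ Ĝ:

```python
import numpy as np

n, k = 4096, 8
B = 2 * k                                    # number of buckets, B = k/alpha
rng = np.random.default_rng(0)

x_hat = np.zeros(n, dtype=complex)           # k-sparse spectrum
x_hat[rng.choice(n, k, replace=False)] = 1 + 1j
x = np.fft.ifft(x_hat) * n                   # time-domain signal with x_hat = fft(x)/n

G = np.zeros(n)                              # small-support filter: boxcar of width B in time
G[:B] = 1.0 / B
samples_used = B                             # only supp(G) time samples of x are touched

bucketed = np.fft.fft(x * G)                 # proportional to x_hat convolved with G_hat
bucket_values = bucketed[:: n // B]          # one value per bucket center
print(samples_used, np.round(np.abs(bucket_values), 2))   # buckets near a spike have large magnitude
```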
PARTIALRECOVERY step

PARTIALRECOVERY(x, y)
• Make measurements (independent permutation + filtering)
• Locate and estimate large frequencies (1-sparse recovery)
    return dominant Fourier coefficients z of x − y (approximately)

Sample complexity = support of G

• How many measurements do we need?
• How effective is a refinement step?

Both are determined by the signal-to-noise ratio in each bucket – a function of the filter choice.
Time domain: support O(k) [GMS'05]
Frequency domain: [plot: leaky buckets]
• SNR = O(1)
• Reduce SNR by O(1) factor
• Ω(k log² n) samples

Time domain: support Θ(k log n) [HIKP'12]
Frequency domain: [plot: sharp buckets]
• SNR can be poly(n)
• Reduce sparsity by O(1) factor
• Ω(k log² n) samples

This paper: interpolate between the two extremes, get all benefits.
Main idea

A new family of filters that adapt to the current upper bound on the SNR.
• Sharp filters initially, more blurred later

[Plots: three bucket filters of decreasing sharpness in the frequency domain.]
When the SNR is bounded by R:
• filter support O(k log R) (≈ convolve a boxcar with itself log R times)
• (most) 1-sparse recovery subproblems for dominant frequencies have high SNR (about R), so O*(log_R n) measurements!

O*(k log R · log_R n) = O*(k log n) samples per step!
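A small numpy sketch (my own illustration) of the support-vs-sharpness trade-off: convolving the width-B boxcar with itself t times multiplies the time-domain support by t while making the out-of-bucket leakage decay roughly geometrically in t, which is why support O(k log R) suffices for 1/poly(R) leakage.

```python
import numpy as np

n, B = 4096, 64                      # signal length; B ~ k buckets, base boxcar of width B

def leak_fraction(t):
    """Filter = width-B boxcar convolved with itself t times (time support ~ t*B).
    Return the fraction of its spectral energy outside its own bucket of width n/B."""
    box = np.zeros(n)
    box[:B] = 1.0
    g = box
    for _ in range(t - 1):
        g = np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(box)))   # circular convolution
    energy = np.abs(np.fft.fft(g)) ** 2
    dist = np.minimum(np.arange(n), n - np.arange(n))                # circular distance from 0
    return 1 - energy[dist <= n // (2 * B)].sum() / energy.sum()

for t in (1, 2, 3, 4, 6):
    print(t, round(leak_fraction(t), 4))    # leakage drops roughly geometrically with t
```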
R → R^{1/2} → R^{1/4} → ... → C² → C    (O(log log n) iterations)

[Diagram for PARTIALRECOVERY(x, y, R): coefficient magnitudes with thresholds R·µ² and R^{1/2}·µ², where kµ² is the tail noise energy.] One call reduces the SNR bound from R to R^{1/2}; the next call, PARTIALRECOVERY(x, y, R^{1/2}), brings it from R^{1/2} to R^{1/4}, and so on down to PARTIALRECOVERY(x, y, C²), which brings it to C.

After O(log log n) iterations the residual satisfies ‖x̂ − ŷ‖₂²/Err²_k(x̂) ≤ C.
Algorithm

Input: x ∈ ℂⁿ
y_0 ← 0, R_0 ← poly(n)
For t = 1 to O(log log n)
    z ← PARTIALRECOVERY(x, y_{t−1}, R_{t−1})    ▷ takes samples of x − y
    Update y_t ← y_{t−1} + z
    R_t ← √R_{t−1}

PARTIALRECOVERY step:
• takes O*(k log n) samples, independent of R
• is very effective: reduces R → R^{1/2}, so O(log log n) iterations suffice
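A schematic Python skeleton of this outer loop (my own paraphrase; partial_recovery is a placeholder, not the lecture's implementation), mainly to show why the SNR-halving schedule terminates after O(log log n) iterations:

```python
import numpy as np

def sparse_recovery(x, partial_recovery, C=4.0):
    """Outer loop: each partial_recovery(x, y, R) call is assumed to return an update z
    (approximate dominant Fourier coefficients of x - y) that reduces the SNR bound
    from R to sqrt(R), using O*(k log n) samples of x - y."""
    n = len(x)
    y = np.zeros(n, dtype=complex)      # current estimate of the Fourier coefficients
    R = float(n) ** 3                   # initial SNR bound, poly(n)
    iterations = 0
    while R > C:
        z = partial_recovery(x, y, R)   # placeholder for one partial recovery step
        y = y + z
        R = np.sqrt(R)
        iterations += 1
    return y, iterations

# sanity check of the iteration count alone, with a no-op partial recovery step
_, it = sparse_recovery(np.zeros(1024, dtype=complex), lambda x, y, R: 0)
print(it, np.log2(np.log2(1024.0 ** 3)))    # iterations ~ log log of the initial SNR bound
```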
Partial recovery analysis

[Diagram for PARTIALRECOVERY(x, y, R): coefficient magnitudes with thresholds R·µ² and R^{1/2}·µ², where kµ² is the tail noise energy.]

• Need to reduce most 'large' frequencies, i.e. those with |x̂_i|² ≥ √R · µ²
• Most = a 1 − 1/poly(R) fraction
• Iterative process, O(log log n) steps
[Diagram: coefficient magnitudes with thresholds R·µ² and R^{1/2}·µ², where kµ² is the tail noise energy.]

• partition elements into geometric weight classes
• write down a recursion that governs the dynamics
• top half of the classes are reduced at a doubly exponential rate* if we use Ω(log log R) levels

*Not a problem if the number of levels is large enough (C log log R suffices)
Sample optimal algorithm (reusing measurements)
Uniform bounds (for all): Candes-Tao'06, Rudelson-Vershynin'08, Cheraghchi-Guruswami-Velingker'12, Bourgain'14, Haviv-Regev'15
• Deterministic, Ω(n) runtime
• O(k log² k log n) samples

Non-uniform bounds (for each): Goldreich-Levin'89, Kushilevitz-Mansour'91, Mansour'92, Gilbert-Guha-Indyk-Muthukrishnan-Strauss'02, Gilbert-Muthukrishnan-Strauss'05, Hassanieh-Indyk-Katabi-Price'12a, Hassanieh-Indyk-Katabi-Price'12b, Indyk-K.-Price'14
• Randomized, O(k · poly(log n)) runtime
• O(k log n · (log log n)^C) samples

Lower bound: Ω(k log(n/k)) for non-adaptive algorithms (Do-Ba-Indyk-Price-Woodruff'10)

Theorem. There exists an algorithm for ℓ2/ℓ2 sparse recovery from Fourier measurements using O(k log n) samples and O(n log³ n) runtime.

Optimal up to constant factors for k ≤ n^{1−δ}.
Higher dimensional Fourier transform is needed in some applications

Given x ∈ ℂ^{[n]^d}, N = n^d, compute

x_j = (1/√N) ∑_{i∈[n]^d} ω^{iᵀj} x̂_i    and    x̂_j = (1/√N) ∑_{i∈[n]^d} ω^{−iᵀj} x_i,

where ω is the n-th root of unity, and n is a power of 2.
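A quick numpy check (illustration only) that this is the unitary d-dimensional transform: np.fft.fftn uses the same exponent but no 1/√N factor, so rescaling by √N matches it and Parseval holds.

```python
import numpy as np

n, d = 8, 3
N = n ** d
x = np.random.randn(*([n] * d)) + 1j * np.random.randn(*([n] * d))

x_hat = np.fft.fftn(x) / np.sqrt(N)        # x_hat_j = (1/sqrt(N)) sum_i omega^{-i.j} x_i
x_back = np.fft.ifftn(x_hat) * np.sqrt(N)  # x_j = (1/sqrt(N)) sum_i omega^{i.j} x_hat_i

assert np.allclose(x_back, x)
assert np.isclose(np.linalg.norm(x_hat), np.linalg.norm(x))   # unitary: Parseval holds
```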
Previous sample complexity bounds:
• O(k log^d N) samples in sublinear time algorithms
• runtime k log^{O(d)} N, for each
• O(k log⁴ N) samples for any d
• Ω(N) time, for all

This lecture:

Theorem. There exists an algorithm for ℓ2/ℓ2 sparse recovery from Fourier measurements using O_d(k log N) samples and O(N log³ N) runtime.

Sample-optimal up to constant factors for any constant d.
ℓ2/ℓ2 sparse recovery guarantees:

‖x̂ − ŷ‖₂ ≤ C · min_{k-sparse z} ‖x̂ − z‖₂

Sort |x̂₁| ≥ ... ≥ |x̂_k| ≥ |x̂_{k+1}| ≥ |x̂_{k+2}| ≥ ... and let Err²_k(x̂) = ∑_{j=k+1}^n |x̂_j|², so the guarantee reads ‖x̂ − ŷ‖₂² ≤ C · Err²_k(x̂): the residual error is bounded by the noise energy Err²_k(x̂). The top k coefficients form the head, the rest the tail, and µ ≈ (tail noise)/√k.

For ℓ2/ℓ2 it is sufficient to ensure that most elements are below the average noise level:

|x̂_i − ŷ_i|² ≤ c · Err²_k(x̂)/k =: µ²

Here we will ensure the stronger ℓ∞/ℓ2 guarantee: all elements are below the average noise level,

‖x̂ − ŷ‖∞² ≤ C · Err²_k(x̂)/k, i.e. ‖x̂ − ŷ‖∞² ≤ µ².
Iterative recovery

Input: x ∈ ℂⁿ
y_0 ← 0
For t = 1 to L
    z ← PARTIALRECOVERY(x − y_{t−1})    ▷ takes random samples of x − y
    Update y_t ← y_{t−1} + z

In most prior works the sampling complexity is

    (samples per PARTIALRECOVERY step) × (number of iterations)

Lots of work on carefully choosing filters and reducing the number of iterations: Hassanieh-Indyk-Katabi-Price'12, Ghazi-Hassanieh-Indyk-Katabi-Price-Shi'13, Indyk-K.-Price'14
• still lose Ω(log log n) in sample complexity (number of iterations)
• lose Ω((log n)^{d−1} log log n) in higher dimensions
Iterative recovery

Input: x ∈ ℂⁿ
y_0 ← 0
For t = 1 to L
    z ← PARTIALRECOVERY(x − y_{t−1})    ▷ takes random samples of x − y
    Update y_t ← y_{t−1} + z

Our sampling complexity is just the samples per PARTIALRECOVERY step, not multiplied by the number of iterations.

Do not use fresh randomness in each iteration! In general this is challenging: only one paper (Bayati-Montanari'11) gives provable guarantees, with Gaussians.

Can use very simple filters!
Our filter = boxcar convolved with itself O(1) times

Filter support is O(k) (= samples per measurement)

O(k log n) samples in the PARTIALRECOVERY step

[Plots: the filter in the time domain and its Fourier transform (buckets) in the frequency domain.]

Can choose a rather weak filter, but do not need fresh randomness.
G ← B ∗ B ∗ B
Let y^m ← (P_m x) · G for m = 0, ..., M = C log n      ▷ take samples of x
z^0 ← 0
For t = 1, ..., T = O(log n):                           ▷ loop over thresholds
    For f ∈ [n]:
        w_f ← median(ŷ^1_f, ..., ŷ^M_f)                 ▷ estimate, prune small elements
        If |w_f| < 2^{T−t} µ/3 then w_f ← 0
    End
    z^{t+1} ← z^t + w
    y^m ← y^m − (P_m w) · G for m = 1, ..., M           ▷ update samples
End

Main challenge: lack of fresh randomness. Why does the median work?
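Below is a toy, self-contained Python sketch of this scheme (my own simplification, not the lecture's algorithm: aliasing-based buckets obtained by subsampling with random shifts stand in for the permutations P_m and the filter B∗B∗B, and µ is computed from the true signal). It keeps the structure above: measurements taken once up front, median estimation, exponentially decreasing thresholds, and in-place updates of the stored measurements instead of fresh samples.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B, M, k = 1024, 64, 25, 4            # n divisible by B; B buckets, M measurements, sparsity k

# k-sparse spectrum plus a small noise tail
x_hat = 0.01 * (rng.normal(size=n) + 1j * rng.normal(size=n))
support = rng.choice(n, k, replace=False)
x_hat[support] += rng.choice([4.0, 8.0, 16.0], k) * np.exp(2j * np.pi * rng.random(k))
x = np.fft.ifft(x_hat) * n              # time-domain signal with x_hat = fft(x)/n

L = n // B
shifts = rng.integers(0, n, size=M)     # one random shift per measurement

def measure(sig, a):
    """B-point DFT of a shifted subsampling of sig: bucket b collects frequencies = b (mod B),
    each multiplied by the known phase e^{2*pi*i*f*a/n}."""
    return np.fft.fft(sig[(a + L * np.arange(B)) % n]) / B

y = [measure(x, a) for a in shifts]     # samples of x taken once, reused in every iteration

freqs = np.arange(n)
z = np.zeros(n, dtype=complex)          # current estimate of x_hat
mu = np.sqrt(np.sort(np.abs(x_hat) ** 2)[: n - k].sum() / k)   # noise level, assumed known here
T = 8
for t in range(1, T + 1):
    # estimate every frequency from its bucket, undoing the phase; median over measurements
    est = np.empty((M, n), dtype=complex)
    for m in range(M):
        est[m] = y[m][freqs % B] * np.exp(-2j * np.pi * freqs * shifts[m] / n)
    w = np.median(est.real, axis=0) + 1j * np.median(est.imag, axis=0)
    w[np.abs(w) < 2 ** (T - t) * mu / 3] = 0          # prune small elements
    z = z + w
    w_time = np.fft.ifft(w) * n                       # subtract w from the stored measurements
    for m in range(M):
        y[m] -= measure(w_time, shifts[m])

print("l_inf error:", np.abs(z - x_hat).max(), " noise level mu:", mu)
```

The key design point this is meant to illustrate: the M·B time samples are taken once, and only the stored bucket values are updated as coefficients are subtracted, which is exactly where the lack of fresh randomness (and the question of why the median still works) comes from.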
Lecture so far

• Optimal sample complexity by reusing randomness
• Very simple algorithm, can be implemented
• Extension to higher dimensions: the algorithm is the same, the permutations are different.
• Choose a random invertible linear transformation over Z_n^d
Experimental evaluation

Problem: recover the support of a random k-sparse signal from Fourier measurements.
Parameters: n = 2^15, k = 10, 20, ..., 100
Filter: boxcar filter with support k + 1
Comparison to ℓ1-minimization (SPGL1)

O(k log³ k log n) sample complexity, requires an LP solve

[Phase-transition plots (number of measurements vs. sparsity, n = 32768): left, ℓ1 minimization; right, this algorithm with B = k, random phase, non-monotone.]

Within a factor of 2 of ℓ1 minimization
Open questions:

• O(k log n) samples in O(k log² n) time?
• O(k log n) runtime?
• remove the dependence on dimension? Current approaches lose C^d in sample complexity, (log n)^d in runtime

More on sparse FFT: http://groups.csail.mit.edu/netmit/sFFT/index.html