1+eps-Approximate Sparse Recovery
Transcript of 1+eps-Approximate Sparse Recovery

1+eps-Approximate Sparse Recovery
Eric Price, MIT
David Woodruff, IBM Almaden

Compressed Sensing
• Choose an r x n matrix A
• Given x ∈ R^n, compute Ax
• Output a vector y so that
  |x - y|_p ≤ (1+ε) |x - x_top k|_p
• x_top k is the k-sparse vector of the largest-magnitude coefficients of x
• p = 1 or p = 2
• Minimize the number r = r(n, k, ε) of “measurements”
• Pr_A[ |x - y|_p ≤ (1+ε) |x - x_top k|_p ] > 2/3 (a sketch of this guarantee follows below)
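
A minimal Python sketch of what the recovery guarantee asks for; `top_k` and `meets_guarantee` are illustrative helper names, not from the paper, and the stand-in output y below is not computed from Ax.

import numpy as np

def top_k(x, k):
    """The k-sparse vector keeping the k largest-magnitude coefficients of x."""
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    out[idx] = x[idx]
    return out

def meets_guarantee(x, y, k, eps, p):
    """True if |x - y|_p <= (1 + eps) * |x - x_top k|_p."""
    err = np.linalg.norm(x - y, ord=p)
    opt = np.linalg.norm(x - top_k(x, k), ord=p)
    return err <= (1 + eps) * opt

# Toy usage with a stand-in "output" y (a real scheme would compute y from Ax only).
rng = np.random.default_rng(0)
n, k, eps, p = 1000, 5, 0.1, 1
x = np.zeros(n); x[:k] = 1.0; x += rng.normal(0, 1e-3, size=n)
print(meets_guarantee(x, top_k(x, k), k, eps, p))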

Previous Work
• p = 1: [IR, …] r = O(k log(n/k) / ε) (deterministic A)
• p = 2: [GLPS] r = O(k log(n/k) / ε)
In both cases, r = Ω(k log(n/k)) [DIPW]. What is the dependence on ε?

Why 1+ε is Important
• Suppose x = e_i + u
  – e_i = (0, 0, …, 0, 1, 0, …, 0)
  – u is a random unit vector orthogonal to e_i
• Consider y = 0^n
  – |x - y|_2 = |x|_2 ≤ 2^{1/2} · |x - e_i|_2, so y = 0 is a trivial solution! (numeric check below)
• (1+ε)-approximate recovery fixes this
• In some applications, can have 1/ε = 100, log n = 32
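
A quick numeric check of the calculation above (a sketch assuming only the slide's setup): with y = 0^n, the error |x - y|_2 already equals 2^{1/2} · |x - e_i|_2, so any constant-factor guarantee is met trivially.

import numpy as np

rng = np.random.default_rng(1)
n, i = 1000, 0
e_i = np.zeros(n); e_i[i] = 1.0
u = rng.normal(size=n); u[i] = 0.0; u /= np.linalg.norm(u)   # random unit vector orthogonal to e_i
x = e_i + u

lhs = np.linalg.norm(x - 0.0)       # |x - y|_2 with the trivial output y = 0
rhs = np.linalg.norm(x - e_i)       # |x - x_top 1|_2 = |u|_2 = 1
print(lhs, np.sqrt(2) * rhs)        # both ~sqrt(2): y = 0 is already a 2^{1/2}-approximation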

Our Results Vs. Previous Work
• p = 1: [IR, …] r = O(k log(n/k) / ε)
  This work: r = O(k log(n/k) · log^2(1/ε) / ε^{1/2}) (randomized), r = Ω(k log(1/ε) / ε^{1/2})
• p = 2: [GLPS] r = O(k log(n/k) / ε)
  This work: r = Ω(k log(n/k) / ε)
Previous lower bounds: Ω(k log(n/k)). Our lower bounds are for randomized schemes with constant success probability.

Comparison to Deterministic Schemes
• We get an r = Õ(k/ε^{1/2}) randomized upper bound for p = 1
• We show r = Ω(k log(n/k) / ε) for p = 1 for deterministic schemes
• So randomized is easier than deterministic

Our Sparse-Output Results
• Output a vector y from Ax so that
  |x - y|_p ≤ (1+ε) |x - x_top k|_p
• Sometimes we want y to be k-sparse: then r = Θ̃(k/ε^p)
• Both results are tight up to logarithmic factors
• Recall that for non-sparse output r = Θ̃(k/ε^{p/2})

Talk Outline
1. Õ(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds

Simplifications
• Want Õ(k/ε^{1/2}) for p = 1
• Replace k with 1:
  – Sample a 1/k fraction of the coordinates
  – Solve the problem for k = 1 on the sample
  – Repeat Õ(k) times independently
  – Combine the solutions found (sketch below)
• Example: a vector (ε/k, ε/k, …, ε/k, 1/n, 1/n, …, 1/n) becomes (ε/k, 1/n, …, 1/n) in a typical sample
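
A minimal sketch of this reduction, assuming an idealized black-box solver for k = 1 (the `solve_k1` argument is hypothetical; the real algorithm works from measurements rather than from x and combines solutions more carefully).

import numpy as np

def reduce_to_k1(x, k, reps, solve_k1, rng):
    n = len(x)
    y = np.zeros(n)
    for _ in range(reps):                      # ~k independent repetitions
        mask = rng.random(n) < 1.0 / k         # keep each coordinate with probability 1/k
        sample = np.where(mask, x, 0.0)
        i, val = solve_k1(sample)              # heavy coordinate found in the sample
        if i is not None:
            y[i] = val                         # combine the solutions found
    return y

# Toy usage with an idealized k = 1 solver (returns the largest sampled coordinate).
def exact_k1(sample):
    i = int(np.argmax(np.abs(sample)))
    return (i, sample[i]) if sample[i] != 0 else (None, None)

rng = np.random.default_rng(2)
x = np.zeros(10000); x[:10] = 1.0
print(np.nonzero(reduce_to_k1(x, k=10, reps=60, solve_k1=exact_k1, rng=rng))[0])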

k = 1
• Assume |x - x_top|_1 = 1 and x_top = ε
• First attempt (sketch below):
  – Use CountMin [CM]
  – Randomly partition the coordinates into B buckets and maintain the sum in each bucket, e.g. bucket 2 holds Σ_{i : h(i) = 2} x_i
• The expected l1-mass of “noise” in a bucket is 1/B
• If B = Θ(1/ε), most buckets have count < ε/2, but the bucket that contains x_top has count > ε/2
• Repeat O(log n) times
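
A small CountMin-style sketch of this first attempt, for nonnegative x (illustrative only; the bucket count and repetition count are not the paper's constants).

import numpy as np

def countmin_measure(x, B, reps, rng):
    h = rng.integers(0, B, size=(reps, len(x)))      # random bucket assignments per repetition
    sketch = np.zeros((reps, B))
    for r in range(reps):
        np.add.at(sketch[r], h[r], x)                # bucket j holds sum_{i: h(i)=j} x_i
    return h, sketch

def countmin_estimate(i, h, sketch):
    # Estimate x_i as the minimum bucket count over repetitions (valid for nonnegative x).
    return min(sketch[r, h[r, i]] for r in range(len(sketch)))

rng = np.random.default_rng(3)
n, eps = 5000, 0.05
x = np.full(n, (1 - eps) / (n - 1)); x[0] = eps      # x_top = eps, tail l1-mass ~1
h, sk = countmin_measure(x, B=int(2 / eps), reps=10, rng=rng)
print(countmin_estimate(0, h, sk))                   # close to eps when B = Theta(1/eps)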

Second Attempt
• But we wanted Õ(1/ε^{1/2}) measurements
• The error in a bucket is 1/B, so we need B ≈ 1/ε
• What about CountSketch? [CCF-C] (sketch below)
  – Give each coordinate i a random sign σ(i) ∈ {-1, 1}
  – Randomly partition the coordinates into B buckets and maintain Σ_{i : h(i) = j} σ(i)·x_i in the j-th bucket, e.g. bucket 2 holds Σ_{i : h(i) = 2} σ(i)·x_i
  – The bucket error is (Σ_{i ≠ top} x_i^2 / B)^{1/2}
  – Is this better?
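
A single-repetition CountSketch sketch matching the description above (illustrative; the parameter choices are not the paper's).

import numpy as np

def countsketch_measure(x, B, rng):
    n = len(x)
    h = rng.integers(0, B, size=n)            # random partition into B buckets
    sigma = rng.choice([-1.0, 1.0], size=n)   # random sign per coordinate
    sketch = np.zeros(B)
    np.add.at(sketch, h, sigma * x)           # bucket j holds sum_{i: h(i)=j} sigma(i)*x_i
    return h, sigma, sketch

def countsketch_estimate(i, h, sigma, sketch):
    # Estimate of x_i; its error is ~ (sum_{j != top} x_j^2 / B)^{1/2}.
    return sigma[i] * sketch[h[i]]

rng = np.random.default_rng(4)
n, eps = 5000, 0.05
x = np.full(n, (1 - eps) / (n - 1)); x[0] = eps
h, sigma, sk = countsketch_measure(x, B=int(2 / eps), rng=rng)
print(countsketch_estimate(0, h, sigma, sk))

In practice one keeps O(log n) independent repetitions and takes the median of the per-repetition estimates; a single repetition is shown here only to mirror the slide.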

CountSketch
• Bucket error Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2}
• All |x_i| ≤ ε and |x - x_top|_1 = 1
• Σ_{i ≠ top} x_i^2 ≤ (1/ε) · ε^2 = ε
• So Err ≤ (ε/B)^{1/2}, which needs to be at most ε
• Solving, B ≥ 1/ε
• CountSketch isn’t better than CountMin

Main Idea
• We insist on using CountSketch with B = 1/ε^{1/2}
• Suppose Err = (Σ_{i ≠ top} x_i^2 / B)^{1/2} = ε
• This means Σ_{i ≠ top} x_i^2 = ε^{3/2}
• Forget about xtop !
• Let’s make up the mass another way

Main Idea
• We have: Σ_{i ≠ top} x_i^2 = ε^{3/2}
• Intuition: suppose all x_i, i ≠ top, are the same or 0
• Then: (# non-zero) · value = 1 and (# non-zero) · value^2 = ε^{3/2}
• Hence, value = ε^{3/2} and # non-zero = 1/ε^{3/2}
• Sample an ε-fraction of the coordinates uniformly at random!
  – value = ε^{3/2} and # non-zero sampled = 1/ε^{1/2}, so the l1-contribution is ε
  – Find all the non-zeros with Õ(1/ε^{1/2}) measurements

General Setting
• Σ_{i ≠ top} x_i^2 = ε^{3/2}
• S_j = {i | 1/4^j < x_i^2 ≤ 1/4^{j-1}}
• Σ_{i ≠ top} x_i^2 = ε^{3/2} implies there is a j for which |S_j|/4^j = Ω̃(ε^{3/2})
(Illustration of the level sets: ε^{3/2}, …, ε^{3/2}; 4ε^{3/2}, …, 4ε^{3/2}; 16ε^{3/2}, …, 16ε^{3/2}; …; up to ε^{3/4})

General Setting
• If |S_j| < 1/ε^{1/2}, then 1/4^j > ε^2, so 1/2^j > ε, which can’t happen
• Else, sample at rate 1/(|S_j| ε^{1/2}) to get 1/ε^{1/2} elements of S_j
• The l1-mass of S_j in the sample is > ε
• Can we find the sampled elements of S_j? Use Σ_{i ≠ top} x_i^2 = ε^{3/2}
• The l2^2 of the sample is about ε^{3/2} · 1/(|S_j| ε^{1/2}) = ε/|S_j|
• Using CountSketch with 1/ε^{1/2} buckets (sketch below):
  Bucket error = sqrt(ε^{1/2} · ε^{3/2} · 1/(|S_j| ε^{1/2})) = sqrt(ε^{3/2}/|S_j|) < 1/2^j, since |S_j|/4^j > ε^{3/2}
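
A small sketch of this level-set bookkeeping, using the "all tail values equal" intuition from the Main Idea slide; `heavy_level` and its threshold are illustrative choices, not the paper's.

import numpy as np

def level_sets(tail, j_max):
    sq = tail ** 2
    return {j: np.flatnonzero((sq > 4.0 ** -j) & (sq <= 4.0 ** -(j - 1)))
            for j in range(1, j_max + 1)}

def heavy_level(sets, eps):
    # Return a level j with |S_j| / 4^j = Omega~(eps^{3/2}), and its sampling rate 1/(|S_j| eps^{1/2}).
    for j, idx in sets.items():
        if len(idx) / 4.0 ** j >= eps ** 1.5 / (2 * len(sets)):
            return j, 1.0 / (len(idx) * np.sqrt(eps))
    return None, None

eps = 0.01
tail = np.full(int(eps ** -1.5), eps ** 1.5)     # 1/eps^{3/2} values eps^{3/2}: l1 = 1, l2^2 = eps^{3/2}
j, rate = heavy_level(level_sets(tail, j_max=40), eps)
print(j, rate)                                   # sampling at this rate keeps ~1/eps^{1/2} elements of S_j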

Algorithm Wrapup
• Sub-sample O(log(1/ε)) times, at rates decreasing in powers of 2
• At each level of sub-sampling, maintain a CountSketch with Õ(1/ε^{1/2}) buckets
• Find as many heavy coordinates as you can! (sketch below)
• Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling
• This wouldn’t work for CountMin, since the bucket error could be ε because of n-1 items each of value ε/(n-1)
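
A highly simplified end-to-end sketch of the k = 1 scheme just described (subsampling levels, each with a small CountSketch); the constants, repetition counts, and the final "take the largest estimate" rule are illustrative simplifications, not the paper's algorithm.

import numpy as np

def countsketch(x, B, reps, rng):
    n = len(x)
    h = rng.integers(0, B, size=(reps, n))
    sigma = rng.choice([-1.0, 1.0], size=(reps, n))
    sk = np.zeros((reps, B))
    for r in range(reps):
        np.add.at(sk[r], h[r], sigma[r] * x)
    # Per-coordinate estimates: median over repetitions of sigma(i) * bucket(h(i)).
    return np.median(sigma * sk[np.arange(reps)[:, None], h], axis=0)

def recover_k1(x, eps, rng, reps=9):
    n = len(x)
    B = max(2, int(np.ceil(1.0 / np.sqrt(eps))))        # ~1/eps^{1/2} buckets per level
    levels = int(np.ceil(np.log2(1.0 / eps))) + 1       # O(log 1/eps) subsampling levels
    best_i, best_val = None, 0.0
    for level in range(levels):                         # subsample at rates 1, 1/2, 1/4, ...
        mask = rng.random(n) < 2.0 ** -level
        est = countsketch(np.where(mask, x, 0.0), B, reps, rng)
        i = int(np.argmax(np.abs(est)))
        if abs(est[i]) > abs(best_val):
            best_i, best_val = i, est[i]
    return best_i, best_val

rng = np.random.default_rng(5)
eps, n = 0.04, 20000
x = np.full(n, (1 - eps) / (n - 1)); x[0] = eps
print(recover_k1(x, eps, rng))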

Talk Outline
1. Õ(k/ε^{1/2}) upper bound for p = 1
2. Lower bounds

Our Results
• General results:
  – Ω̃(k/ε^{1/2}) for p = 1
  – Ω(k log(n/k) / ε) for p = 2
• Sparse output:
  – Ω̃(k/ε) for p = 1
  – Ω̃(k/ε^2) for p = 2
• Deterministic:
  – Ω(k log(n/k) / ε) for p = 1

Simultaneous Communication Complexity
(Picture: Alice holds x and Bob holds y; Alice sends a message M_A(x) and Bob sends M_B(y) to a referee, who must answer “What is f(x,y)?”)
• Alice and Bob each send a single message to the referee, who outputs f(x,y) with constant probability
• The communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs
• The parties share randomness

Reduction to Compressed Sensing
• Shared randomness decides the matrix A
• Alice sends Ax to the referee
• Bob sends Ay to the referee
• The referee computes A(x+y) = Ax + Ay and runs the compressed sensing recovery algorithm (sketch below)
• If the output of the algorithm solves f(x,y), then
  (# rows of A) × (# bits per measurement) ≥ CC(f)
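
A sketch of this protocol; the `recover` argument stands in for any compressed sensing recovery procedure (here a plain least-squares solve with r = n, used only to exercise the plumbing).

import numpy as np

def protocol(x, y, r, recover, seed=0):
    n = len(x)
    A = np.random.default_rng(seed).normal(size=(r, n))   # shared randomness decides A
    msg_alice = A @ x                                      # Alice's single message
    msg_bob = A @ y                                        # Bob's single message
    return recover(A, msg_alice + msg_bob)                 # referee: recovery on A(x + y)

# Hypothetical "recovery" for the demo: least squares with a square, invertible A.
def recover(A, b):
    return np.linalg.lstsq(A, b, rcond=None)[0]

n, r = 50, 50
x = np.zeros(n); x[3] = 1.0
y = np.zeros(n); y[7] = 2.0
print(np.round(protocol(x, y, r, recover), 2)[[3, 7]])

Each party's message consists of the r measurements; after the rounding step discussed later, each measurement costs O(log n) bits, which is the factor the counting argument multiplies by.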

A Unified View
• General results: Direct-Sum Gap-l1
  – Ω̃(k/ε^{1/2}) for p = 1
  – Ω̃(k/ε) for p = 2
• Sparse output: Indexing
  – Ω̃(k/ε) for p = 1
  – Ω̃(k/ε^2) for p = 2
• Deterministic: Equality
  – Ω(k log(n/k) / ε) for p = 1
Tighter log factors are achievable by looking at Gaussian channels

General Results: k = 1, p = 1
• Alice and Bob have x and y, respectively, in R^m
• There is a unique i* for which (x+y)_{i*} = d; for all j ≠ i*, (x+y)_j ∈ {0, c, -c}, where |c| < |d|
• Finding i* requires Ω(m/(d/c)^2) communication [SS, BJKS]
• m = 1/ε^{3/2}, c = ε^{3/2}, d = ε
• Need Ω(1/ε^{1/2}) communication

General Results: k = 1, p = 1
• But the compressed sensing algorithm doesn’t need to find i*
• If it doesn’t, then it needs to transmit a lot of information about the tail
  – Tail: a random low-weight vector in {0, ε^{3/2}, -ε^{3/2}}^{1/ε^3}
  – Uses a distributional lower bound and RS codes
• Send a vector y within 1-ε of the tail in l1-norm
• Needs Ω(1/ε^{1/2}) communication

General Results: k = 1, p = 2
• Same argument, different parameters
• Ω(1/ε) communication
• What about general k?

Handling General k
• Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function f, with input pairs drawn independently from μ, solving a 2/3 fraction of the copies needs Ω(k · CC_μ(f)) communication
(Figure, instance for p = 1: k blocks, each consisting of entries ε^{3/2}, …, ε^{3/2} together with a single entry ε^{1/2})

Handling General k
• CC = Ω(k/ε^{1/2}) for p = 1
• CC = Ω(k/ε) for p = 2
• What is implied about compressed sensing?

Rounding Matrices [DIPW]
• A is a matrix of real numbers
• Can assume orthonormal rows
• Round the entries of A to O(log n) bits, obtaining a matrix A’ (sketch below)
• Careful:
  – A’x = A(x+s) for a “small” s
  – But s depends on A, so there is no guarantee that recovery works
  – Can be fixed by looking at A(x+s+u) for a random u
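
A sketch of the rounding step (illustrative): orthonormalize the rows of A, then keep roughly log n bits of precision per entry.

import numpy as np

def round_matrix(A, bits):
    Q, _ = np.linalg.qr(A.T)                    # columns of Q = orthonormal rows for A
    A_ortho = Q.T
    scale = 2.0 ** bits
    return np.round(A_ortho * scale) / scale    # each entry now described by O(bits) bits

rng = np.random.default_rng(6)
n, r = 256, 20
A = rng.normal(size=(r, n))
A_prime = round_matrix(A, bits=int(np.ceil(np.log2(n))))
print(np.max(np.abs(A_prime - np.linalg.qr(A.T)[0].T)))   # entrywise rounding error <= 2^-(bits+1)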

Lower Bounds for Compressed Sensing
• (# rows of A) × (# bits per measurement) ≥ CC(f)
• By rounding, # bits per measurement = O(log n)
• In our hard instances, the universe size is poly(k/ε)
• So (# rows of A) × O(log(k/ε)) ≥ CC(f)
• # rows of A = Ω̃(k/ε^{1/2}) for p = 1
• # rows of A = Ω̃(k/ε) for p = 2

Sparse-Output Results
Sparse output: Indexing
– Ω̃(k/ε) for p = 1
– Ω̃(k/ε^2) for p = 2

Sparse Output Results - Indexing
(Alice holds x ∈ {0,1}^n; Bob holds an index i ∈ {1, 2, …, n}; the referee must answer “What is x_i?”)
CC(Indexing) = Ω(n)

Ω(1/ε) Bound for k = 1, p = 1
• Alice holds x ∈ {-ε, ε}^{1/ε}; Bob holds y = e_i
• Consider x+y
• If the output is required to be 1-sparse, it must place its mass on the i-th coordinate
• The mass must be 1+ε if x_i = ε, and otherwise 1-ε
• Generalizes to k > 1 to give Ω̃(k/ε)
• Generalizes to p = 2 to give Ω̃(k/ε^2)

Deterministic Results
Deterministic: Equality
– Ω(k log(n/k) / ε) for p = 1

Deterministic Results - Equality
(Alice holds x ∈ {0,1}^n; Bob holds y ∈ {0,1}^n; the referee must answer “Is x = y?”)
Deterministic CC(Equality) = Ω(n)

Ω(k log(n/k) / ε) for p = 1
Choose log n signals x^1, …, x^{log n}, each with k/ε values equal to ε/k
x = Σ_{i=1}^{log n} 10^i x^i
Choose log n signals y^1, …, y^{log n}, each with k/ε values equal to ε/k
y = Σ_{i=1}^{log n} 10^i y^i
Consider x-y: the compressed sensing output is 0^n iff x = y (sketch below)
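
A sketch of this construction with illustrative parameter values: each of the log n blocks is a (k/ε)-sparse signal with entries ε/k, and the blocks are combined with geometric weights 10^i so that the highest-order disagreement dominates x - y.

import numpy as np

def build_signal(blocks, n, k, eps):
    # blocks: list of log n index sets, each of size k/eps, defining x^1, ..., x^{log n}
    x = np.zeros(n)
    for i, idx in enumerate(blocks, start=1):
        xi = np.zeros(n)
        xi[list(idx)] = eps / k                  # k/eps values equal to eps/k
        x += (10.0 ** i) * xi
    return x

rng = np.random.default_rng(7)
n, k, eps = 1 << 10, 2, 0.25
num, size = int(np.log2(n)), int(k / eps)
blocks_a = [rng.choice(n, size, replace=False) for _ in range(num)]
blocks_b = [b.copy() for b in blocks_a]          # identical inputs, so x - y = 0^n
x, y = build_signal(blocks_a, n, k, eps), build_signal(blocks_b, n, k, eps)
print(np.allclose(x - y, 0.0))                   # True exactly when the underlying signals agree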

General Results – Gaussian Channels (k = 1, p = 2)
• Alice has a signal x = ε^{1/2} e_i for a random i ∈ [n]
• Alice transmits x over a noisy channel with independent N(0, 1/n) noise on each coordinate
• Consider any row vector a of A
• Channel output = <a,x> + <a,y>, where <a,y> is N(0, |a|_2^2/n)
• E_i[<a,x>^2] = ε |a|_2^2/n
• Shannon–Hartley Theorem: I(i; <a,x>+<a,y>) = I(<a,x>; <a,x>+<a,y>) ≤ ½ log(1+ε) = O(ε)
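
A numeric illustration of this step (not a proof, and the constants are only indicative): each measurement has SNR ε, so it conveys at most ½·log2(1+ε) = O(ε) bits about i, while identifying i ∈ [n] needs about log2(n) bits, suggesting Ω(log(n)/ε) measurements.

import numpy as np

eps, n = 0.01, 1 << 20
per_measurement_bits = 0.5 * np.log2(1.0 + eps)        # Shannon-Hartley capacity bound per measurement
print(per_measurement_bits, eps / (2 * np.log(2)))     # both ~0.0072: the bound is O(eps)
print(np.log2(n) / per_measurement_bits)               # ~log(n)/eps measurements needed to pin down i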

Summary of Results
• General results: Θ̃(k/ε^{p/2})
• Sparse output: Θ̃(k/ε^p)
• Deterministic: Θ(k log(n/k) / ε) for p = 1