$L_p$-Sampling

  • $L_p$-Sampling. David Woodruff, IBM Almaden

    Joint work with Morteza Monemizadeh, TU Dortmund

  • Given a stream of updates $(i, a)$ to coordinates $i$ of an $n$-dimensional vector $x$, where $|a| < \mathrm{poly}(n)$, $a$ is an integer, and the stream length is $< \mathrm{poly}(n)$

    Output $i$ with probability $|x_i|^p/F_p$, where $F_p = |x|_p^p = \sum_{i=1}^{n} |x_i|^p$

    Easy cases:

    $p = 1$ and updates all of the form $(i, 1)$ for some $i$. Solution: choose a random update in the stream, output the coordinate it updates [Alon, Matias, Szegedy]. Generalizes to all positive updates

    $p = 0$ and there are no deletions. Solution: min-wise hashing. Hash all distinct coordinates as you see them, maintain the minimum hash and item [Broder, Charikar, Frieze, Mitzenmacher] [Indyk] [Cormode, Muthukrishnan]
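
The two easy cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the cited papers: the function names are invented here, and a salted BLAKE2b digest is an assumed stand-in for a min-wise independent hash family.

```python
import hashlib
import random


def l1_sample_unit_updates(stream, rng):
    """p = 1 with updates (i, 1): keep one uniformly random update
    (reservoir sampling) and return the coordinate it touches."""
    chosen = None
    for count, (i, a) in enumerate(stream, start=1):
        if a != 1:
            raise ValueError("this sketch only handles updates of the form (i, 1)")
        # Replace the kept update with probability 1/count.
        if rng.randrange(count) == 0:
            chosen = i
    return chosen


def salted_hash(i, salt):
    """Assumed stand-in for a min-wise independent hash function."""
    digest = hashlib.blake2b(f"{salt}:{i}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def l0_sample_minhash(stream, salt):
    """p = 0 with no deletions: return the distinct coordinate whose hash
    is smallest, storing only the current minimum and its item."""
    best = None
    for i, _a in stream:
        h = salted_hash(i, salt)
        if best is None or h < best[0]:
            best = (h, i)
    return None if best is None else best[1]
```

A coordinate updated three times out of four is returned by `l1_sample_unit_updates` about three quarters of the time, while `l0_sample_minhash` ignores repetitions and picks among the distinct coordinates roughly uniformly over the choice of salt.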

  • Our main result

    For every $0 \le p \le 2$, there is an algorithm that fails with probability $n^{-100}$, and otherwise outputs an $I \in [n]$ such that for all $j \in [n]$, $\Pr[I = j] = (1 \pm \epsilon)|x_j|^p/F_p$

    Because failures are this rare, any poly(n)-time algorithm can condition on every invocation succeeding

    The algorithm is 1-pass, uses $\mathrm{poly}(\epsilon^{-1} \log n)$ space and update time, and for the sampled index $I = j$ also returns $w_j = (1 \pm \epsilon)|x_j|^p/F_p$

    Generalizes to 1-pass, $n^{1-2/p}\,\mathrm{poly}(\epsilon^{-1} \log n)$-space for $p > 2$

    Additive-error samplers, with $\Pr[I = j] = |x_j|^p/F_p \pm \epsilon$, are given explicitly in [Jayram, W] and implicitly in [Andoni, Do Ba, Indyk, W]
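
For reference, the target distribution can be computed offline from the final vector. The sketch below is not the streaming algorithm; it only spells out what a relative-error sampler must approximate. The helper names are hypothetical, coordinates are 0-indexed, and `eps` plays the role of the relative error.

```python
def apply_updates(stream, n):
    """Apply turnstile updates (i, a) to an n-dimensional zero vector."""
    x = [0] * n
    for i, a in stream:
        x[i] += a
    return x


def lp_distribution(x, p):
    """Exact probabilities |x_i|^p / F_p; for p = 0 this is uniform
    over the nonzero coordinates."""
    weights = [abs(v) ** p if v != 0 else 0.0 for v in x]
    fp = sum(weights)
    if fp == 0:
        raise ValueError("x is the zero vector, so F_p = 0")
    return [w / fp for w in weights]


def within_relative_error(observed, target, eps):
    """Check observed[j] is within a (1 +/- eps) factor of target[j] for all j."""
    return all((1 - eps) * t <= o <= (1 + eps) * t
               for o, t in zip(observed, target))
```

Note that the relative-error condition forces coordinates with $x_j = 0$ to have probability exactly zero, which an additive-error guarantee does not.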

  • $L_p$-sampling solves and unifies many well-studied streaming problems:

  • Solves Sampling with Deletions:

    [Cormode, Muthukrishnan, Rozenbaum] want importance sampling with deletions: maintain a sample $i$ with probability $|x_i|/|x|_1$. Set $p = 1$ in our theorem

    [Chaudhuri, Motwani, Narasayya] ask to sample from the result of a SQL operation, e.g., self-join. Set $p = 2$ in our theorem

    [Frahling, Indyk, Sohler] study maintaining approximate range spaces and costs of Euclidean spanning trees. They need and obtain a routine to sample a point from a set undergoing insertions and deletions. Alternatively, set $p = 0$ in our theorem

  • Alternative solution to the Heavy Hitters Problem for any $F_p$:

    Output all $i$ for which $|x_i|^p > \phi F_p$

    Do not output any $i$ for which $|x_i|^p < (\phi/2) F_p$

    Studied by Charikar, Chen, Cormode, Farach-Colton, Ganguly, Muthukrishnan, and many others

    Invoke our algorithm $\tilde{O}(1/\phi)$ times, use the approximations to the values

    Optimal up to $\mathrm{poly}(\epsilon^{-1} \log n)$ factors
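
A minimal sketch of this reduction, assuming the heaviness threshold is called `phi`. An exact offline sampler stands in for the streaming $L_p$-sampler, and the repetition constant `c = 20` and the cutoff $0.75\,\phi$ are arbitrary illustrative choices, not values from the talk.

```python
import math
import random


def exact_lp_sampler(x, p, rng):
    """Stand-in for the streaming sampler: returns (i, w) with
    Pr[i] = |x_i|^p / F_p and w equal to that probability."""
    weights = [abs(v) ** p if v != 0 else 0.0 for v in x]
    total = sum(weights)
    i = rng.choices(range(len(x)), weights=weights, k=1)[0]
    return i, weights[i] / total


def heavy_hitters_via_sampling(x, p, phi, rng, c=20):
    """Draw about c/phi samples and keep the indices whose reported weight
    clears a cutoff strictly between phi/2 and phi."""
    reps = math.ceil(c / phi)
    seen = {}
    for _ in range(reps):
        i, w = exact_lp_sampler(x, p, rng)
        seen[i] = w
    return sorted(i for i, w in seen.items() if w >= 0.75 * phi)
```

An index with $|x_i|^p > \phi F_p$ is missed by a single draw with probability below $1 - \phi$, so it is missed by all `reps` draws with probability below $(1-\phi)^{c/\phi} \le e^{-c}$.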

  • Solves Block Heavy Hitters: given an $n \times d$ matrix, return the indices $i$ of rows $R_i$ with $|R_i|_p^p > \phi \sum_j |R_j|_p^p$

    [Andoni, Do Ba, Indyk] study the case $p = 1$

    Used by [Andoni, Indyk, Krauthgamer] for constructing a small-size sketch for the Ulam metric under the edit distance

    Treat $R$ as a big $(nd)$-dimensional vector

    Sample an entry $(i, j)$ using our theorem for general $p$

    The probability that row $i$ is sampled is $|R_i|_p^p / \sum_j |R_j|_p^p$, so we can recover the IDs of all the heavy rows

    We do not use Cauchy random variables or Nisan's pseudorandom generator, so this could be more practical than [ADI]
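
The row-sampling probability follows by summing the entry probabilities along a row. Writing $R_{i,j}$ for the entries (the primed indices are just summation variables introduced here):

```latex
\Pr[\text{row } i \text{ is sampled}]
  = \sum_{j=1}^{d} \frac{|R_{i,j}|^p}{\sum_{i'=1}^{n}\sum_{j'=1}^{d} |R_{i',j'}|^p}
  = \frac{|R_i|_p^p}{\sum_{i'=1}^{n} |R_{i'}|_p^p}.
```

With a relative-error sampler each entry probability is only correct up to a $(1 \pm \epsilon)$ factor, and the same factor carries over to the row probability.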

  • Alternative Solution to $F_k$-Estimation for any $k \ge 2$:

    Optimal up to $\mathrm{poly}(\epsilon^{-1} \log n)$ factors

    Reduction given by [Coppersmith, Kumar]:

    Take $r = O(n^{1-2/k})$ $L_2$-samples $w_{i_1}, \ldots, w_{i_r}$

    In parallel estimate $F_2$, call it $\tilde{F}_2$

    Output $(\tilde{F}_2/r) \cdot \sum_j w_{i_j}^{k-2}$

    Proof: second moment method

    First algorithm not to use Nisan's pseudorandom generator
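
The estimator can be sketched as follows, under the assumption that each $L_2$-sample comes with the magnitude $|x_i|$ of the sampled coordinate. Here the sampler and $F_2$ are computed exactly offline as stand-ins for their streaming counterparts, and the function names are made up for illustration.

```python
import random


def l2_samples(x, r, rng):
    """Stand-in sampler: draw r indices with probability x_i^2 / F_2 and
    return the magnitudes |x_i| of the sampled coordinates."""
    weights = [v * v for v in x]
    idx = rng.choices(range(len(x)), weights=weights, k=r)
    return [abs(x[i]) for i in idx]


def estimate_fk(x, k, r, rng):
    """Output (F_2 / r) * sum_j |x_{i_j}|^(k-2) over r L_2-samples."""
    f2 = sum(v * v for v in x)  # exact here; a streaming estimate in the talk
    values = l2_samples(x, r, rng)
    return f2 * sum(v ** (k - 2) for v in values) / r
```

Each term is unbiased, since $\sum_i (x_i^2/F_2) \cdot F_2 |x_i|^{k-2} = F_k$; the number of samples $r$ controls the variance.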

  • Solves Cascaded Moment Estimation: given an $n \times d$ matrix $A$ with rows $A_j$, $F_k(F_p)(A) = \sum_j |A_j|_p^{kp}$

    Problem initiated by [Cormode, Muthukrishnan]. They show $F_2(F_0)(A)$ uses $O(n^{1/2})$ space if there are no deletions, and ask about the complexity for other $k$ and $p$

    For any $p \in [0, 2]$, our theorem gives $O(n^{1-1/k})$ space for $F_k(F_p)(A)$

    We get entry $(i, j)$ with probability $|A_{i,j}|^p / \sum_{i',j'} |A_{i',j'}|^p$

    The probability that row $A_i$ is returned is $F_p(A_i) / \sum_j F_p(A_j)$

    If 2 passes are allowed, take $O(n^{1-1/k})$ sampled rows $A_i$ in the 1st pass, compute $F_p(A_i)$ in the 2nd pass, and feed into the $F_k$ AMS estimator

    To get 1 pass, feed the row IDs into an $O(n^{1-1/k})$-space algorithm of [Jayram, W] for estimating $F_k$ based only on item IDs

    The algorithm is space-optimal [Jayram, W]

    Our theorem with $p = 0$ gives $O(n^{1/2})$ space for $F_2(F_0)(A)$ with deletions
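
To make the definition concrete, here is an offline reference computation of $F_k(F_p)(A)$ together with a toy two-pass estimate. The estimator shown is a plain importance-sampling average, swapped in for simplicity; it is not the AMS estimator named above, and all function names are hypothetical.

```python
import random


def row_fp(row, p):
    """F_p of one row; for p = 0 this counts the nonzero entries."""
    return sum(abs(v) ** p for v in row if v != 0)


def cascaded_moment(A, k, p):
    """Exact F_k(F_p)(A) = sum over rows of F_p(row)^k."""
    return sum(row_fp(row, p) ** k for row in A)


def two_pass_estimate(A, k, p, r, rng):
    """Pass 1 (stand-in): sample r rows with probability proportional to
    F_p(A_i). Pass 2: compute F_p exactly for the sampled rows and average
    the importance-weighted terms."""
    y = [row_fp(row, p) for row in A]
    total = sum(y)
    rows = rng.choices(range(len(A)), weights=y, k=r)
    return total * sum(y[i] ** (k - 1) for i in rows) / r
```

The estimate is unbiased because a row sampled with probability $y_i/\sum_j y_j$ contributes $(\sum_j y_j)\, y_i^{k-1}$, whose expectation is $\sum_i y_i^k$.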

  • Ok, so how does it work?

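  • General Framework [Indyk, W]

    $S_t = \{i \mid |x_i| \in [\eta^{t-1}, \eta^t)\}$ for $\eta = 1 + \Theta(\epsilon)$

    $S_t$ contributes if $|S_t|\,\eta^{pt} \ge \gamma F_p(x)$, where $\gamma = \mathrm{poly}(\epsilon/\log n)$ (assume $p > 0$ in this talk)

    Let $h: [n] \to [n]$ be a hash function. Create $\log n$ substreams $\mathrm{Stream}_1, \mathrm{Stream}_2, \ldots, \mathrm{Stream}_{\log n}$, where $\mathrm{Stream}_j$ is the stream restricted to updates $(i, c)$ with $h(i) \le n/2^j$

    Suppose $2^j \approx |S_t|$. Then:

    $\mathrm{Stream}_j$ contains about 1 item of $S_t$

    $F_p(\mathrm{Stream}_j) \approx F_p(x)/2^j$

    $|S_t|\,\eta^{pt} \ge \gamma F_p(x)$ means $\eta^{pt} \ge \gamma F_p(\mathrm{Stream}_j)$

    Can find the item of $S_t$ in $\mathrm{Stream}_j$ with an $F_p$-heavy hitters algorithm

    Repeat the sampling $\mathrm{poly}(\epsilon^{-1} \log n)$ times, count the number of times there was an item in $\mathrm{Stream}_j$ from $S_t$

    Use this to estimate the sizes of the contributing $S_t$, and $F_p(x) \approx \sum_t |S_t|\,\eta^{pt}$

    1. Form streams by subsampling

    2. Run the heavy hitters algorithm on the streams

    3. Use the heavy hitters to estimate the contributing $S_t$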
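
The level sets and the nested substreams can be illustrated offline. In this sketch `eta` denotes the class base (any value greater than 1), the classes are computed from the exact vector rather than from heavy hitters, and a salted BLAKE2b digest is an assumed stand-in for the hash function $h$; none of the names come from the talk.

```python
import hashlib


def class_index(value, eta):
    """Return t with eta**(t-1) <= |value| < eta**t, for |value| >= 1."""
    v = abs(value)
    if v < 1:
        raise ValueError("expected a nonzero integer value")
    t = 1
    while eta ** t <= v:
        t += 1
    return t


def level_sets(x, eta):
    """Group the nonzero coordinates of x into classes S_t."""
    sets = {}
    for i, v in enumerate(x):
        if v != 0:
            sets.setdefault(class_index(v, eta), []).append(i)
    return sets


def fp_from_classes(sets, eta, p):
    """Approximate F_p(x) by sum_t |S_t| * eta**(p*t)."""
    return sum(len(items) * eta ** (p * t) for t, items in sets.items())


def hash_to_range(i, n, salt=0):
    """Assumed stand-in for h: maps a coordinate into {1, ..., n}."""
    digest = hashlib.blake2b(f"{salt}:{i}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % n + 1


def substream(stream, n, j, salt=0):
    """Stream_j: keep the updates (i, c) with h(i) <= n / 2**j."""
    return [(i, c) for i, c in stream if hash_to_range(i, n, salt) <= n / 2 ** j]
```

Because every member of $S_t$ has $|x_i|^p$ within a factor `eta**p` of `eta**(p*t)`, the class-based sum overestimates $F_p(x)$ by at most that factor. Using one hash function for all levels makes the substreams nested.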

  • Additive Error Sampler [Jayram, W]

    For contributing $S_t$, we also get $\mathrm{poly}(\epsilon^{-1} \log n)$ items from the heavy hitters routine

    If the sub-sampling is sufficiently random (Nisan's generator, min-wise independence), these items are random in $S_t$

    Since we have $(1 \pm \epsilon)$-approximations $s_t$ to $|S_t|$ for all contributing $S_t$, we can:

    Choose a contributing $t$ with probability $s_t \eta^{pt} / \sum_{t'} s_{t'} \eta^{pt'}$

    Output a random heavy hitter found in $S_t$

    For an item $i$ in a contributing $S_t$, $\Pr[i \text{ output}] = [s_t \eta^{pt} / \sum_{t'} s_{t'} \eta^{pt'}] \cdot 1/|S_t| = (1 \pm \epsilon)|x_i|^p/F_p$

    For an item $i$ in a non-contributing $S_t$, $\Pr[i \text{ output}] = 0$
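
The two-stage choice (first a class, then an item inside it) can be written out directly. This sketch idealizes the heavy hitters step by assuming the full membership of each contributing class is available in `classes`; `s` holds the size estimates, `eta` is the class base, and all names are hypothetical.

```python
import random


def class_weights(s, eta, p):
    """Probability of choosing class t: s_t * eta**(p*t), normalized."""
    raw = {t: s_t * eta ** (p * t) for t, s_t in s.items()}
    total = sum(raw.values())
    return {t: w / total for t, w in raw.items()}


def output_probabilities(classes, s, eta, p):
    """Pr[i is output] = Pr[class t chosen] * 1/|S_t| for i in S_t."""
    probs = {}
    for t, weight in class_weights(s, eta, p).items():
        for i in classes[t]:
            probs[i] = weight / len(classes[t])
    return probs


def sample(classes, s, eta, p, rng):
    """Choose a class by weight, then a uniformly random member of it."""
    weights = class_weights(s, eta, p)
    ts = list(weights)
    t = rng.choices(ts, weights=[weights[u] for u in ts], k=1)[0]
    return rng.choice(classes[t])
```

Items in classes that are absent from `s` (the non-contributing ones) never appear in the output, which is exactly the source of the additive error.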

  • Relative Error in Words

    Force all classes to contribute

    Inject additional coordinates in each class whose purpose is to make every class contribute

    Inject just enough so that overall, $F_p$ does not change by more than a $(1+\epsilon)$-factor

    Run [Jayram, W]-sampling on the resulting vector

    If the item sampled is an injected coordinate, forget about it

    Repeat many times in parallel and take the first repetition that is not an injected coordinate

    Since injected coordinates only contribute $O(\epsilon)$ to the $F_p$ mass, a small number of repetitions suffices

  • Some Minor Points

    Before seeing the stream, we don't know which classes contribute, so we inject coordinates into every class

    For $S_t = \{i \mid |x_i| \in [\eta^{t-1}, \eta^t)\}$, inject $\Theta(\epsilon F_p/(\eta^{pt} \cdot \#\text{classes}))$ coordinates, where $\#\text{classes} = O(\epsilon^{-1} \log n)$

    Need to know $F_p$: just guess it, and verify at the end of the stream

    For some classes, $\Theta(\epsilon F_p/(\eta^{pt} \cdot \#\text{classes})) < 1$, e.g. if $t$ is very large, so we can't inject any new coordinates

    Find all elements in these classes, and $(1 \pm \epsilon)$-approximations to their frequencies, separately using a heavy hitters algorithm

    When sampling, either choose a heavy hitter with the appropriate probability, or select from the contributing sets using [Jayram, W]
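
The injection budget can be sketched numerically. The code below assumes the per-class count is about $\epsilon F_p/(\eta^{pt} \cdot \#\text{classes})$, with `eta` the class base, `eps` the error parameter and `fp_guess` the guessed value of $F_p$; the constant hidden in the count is taken to be 1, and the function names are illustrative only.

```python
import math


def injection_plan(fp_guess, eta, p, eps, num_classes):
    """Number of coordinates to inject into each class t = 1..num_classes."""
    plan = {}
    for t in range(1, num_classes + 1):
        budget = eps * fp_guess / num_classes  # F_p mass allowed for class t
        plan[t] = math.floor(budget / eta ** (p * t))
    return plan


def injected_mass_bound(plan, eta, p):
    """Upper bound on the added F_p mass: each injected coordinate in
    class t has |value|^p below eta**(p*t)."""
    return sum(count * eta ** (p * t) for t, count in plan.items())


def first_real_sample(samples, injected_ids):
    """Take the first repetition whose output is not an injected coordinate."""
    for i in samples:
        if i not in injected_ids:
            return i
    return None
```

Summing the per-class budgets shows the injected mass is at most `eps * fp_guess`, so $F_p$ grows by at most a $(1+\epsilon)$ factor. Classes whose count rounds down to zero are the ones that need the separate heavy hitters treatment.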

  • There is a Problem

    The [Jayram, W]-sampler fails with probability $\mathrm{poly}(\epsilon/\log n)$, in which case it can output any item

    This is due to some of the subroutines of [Indyk, W] that it relies on, which only succeed with this probability

    So the large $\mathrm{poly}(\epsilon/\log n)$ additive error is still there

    Cannot repeat [Jayram, W] multiple times for amplification, since we get a collection of samples and there is no obvious way of detecting failure

    On the other hand, for the simpler $F_k$-estimation problem one could just repeat [Indyk, W] and take the median

    Our solution: dig into the guts of the [Indyk, W] algorithm and amplify the success probability of its subroutines to $1 - n^{-100}$

  • A Technical Point About [Indyk, W]

    In [Indyk, W], create $\log n$ substreams $\mathrm{Stream}_j$, where $\mathrm{Stream}_j$ includes each coordinate independently with probability $2^{-j}$

    Can find the items of contributing $S_t$ in $\mathrm{Stream}_j$ with $F_p$-heavy hitters

    Repeat the sampling $\mathrm{poly}(\epsilon^{-1} \log n)$ times, observe the fraction of repetitions in which there is an item in $\mathrm{Stream}_j$ from $S_t$

    Can use [Indyk, W] to estimate every $|S_t|$, since every class contributes

    Issue of misclassification: $S_t = \{i \mid |x_i| \in [\eta^{t-1}, \eta^t)\}$, and the $F_p$-heavy hitters algorithm only reports approximate frequencies of the items $i$ it finds

    If $|x_i| = \eta^t$, it may be classified into $S_t$ or $S_{t+1}$; it doesn't matter

    Simpler solution than in [Indyk, W]: if an item is misclassified, just classify it consistently if we see it again. This is equivalent to sampling from a vector $x'$ with $|x'|_p = (1 \pm \epsilon)|x|_p$

    Can ensure that with probability $1 - n^{-100}$, we obtain $s_t = (1 \pm \epsilon)|S_t|$ for all $t$

  • A Technical Point About [Jayram, W]

    Since we have $s_t = (1 \pm \epsilon)|S_t|$ for all $t$:

    Choose a class $t$ with probability $s_t \eta^{pt} / \sum_{t'} s_{t'} \eta^{pt'}$

    Output a random heavy hitter found in $S_t$

    How do we output a random item in $S_t$?

    Use a min-wise independent hash function $h$: for each $i \in S_t$, $h(i) = \min_{j \in S_t} h(j)$ with probability $(1 \pm \epsilon)/|S_t|$

    $h$ can be an $O(\log 1/\epsilon)$-wise independent hash function

    We recover the $i^* \in S_t$ for which $h(i^*)$ is minimum

    This is compatible with sub-sampling, where $\mathrm{Stream}_j$ is the set of items $i$ for which $h(i) \le n/2^j$

    Our goal is to recover $i^*$ with probability $1 - n^{-100}$

    We have $s_t$, and look at the level $j^*$ where $|S_t|/2^{j^*} = \Theta(\log n)$

    If $h$ is $O(\log n)$-wise independent, then with probability $1 - n^{-100}$, $i^*$ is in $\mathrm{Stream}_{j^*}$

    A worry: maybe $F_p(\mathrm{Stream}_{j^*}) \gg F_p(x)/2^{j^*}$, so the heavy hitters algorithm doesn't work

    This can be resolved with enough independent repetitions
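
The choice of level can be simulated in a few lines. The sketch below uses fully random hash values as an assumed stand-in for a limited-independence hash function, picks the level at which the class has about $\log_2 n$ surviving items (the constant `c` is a made-up knob), and checks whether the minimum-hash item survives the subsampling at that level.

```python
import math
import random


def target_level(s_t, n, c=1.0):
    """Level j* with s_t / 2**j* roughly c * log2(n), never below 0."""
    ratio = s_t / (c * math.log2(n))
    if ratio < 1:
        return 0
    return math.floor(math.log2(ratio))


def min_item_survives(class_items, n, rng):
    """Does the minimum-hash item i* of the class land in Stream_{j*},
    i.e. is h(i*) <= n / 2**j*?"""
    h = {i: rng.randrange(1, n + 1) for i in class_items}
    i_star = min(class_items, key=h.get)
    j_star = target_level(len(class_items), n)
    return h[i_star] <= n / 2 ** j_star
```

The minimum-hash item is lost only if no item of the class survives at level $j^*$. With about $\log_2 n$ survivors expected, that happens with probability roughly $e^{-\log_2 n}$ under full randomness, and a larger `c` drives it down further.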

  • Beyond the Moments: Sampling Records

    Given an $n \times d$ matrix $M$ of rows $M_1, \ldots, M_n$, sample $i$ with probability $|M_i|_X / \sum_j |M_j|_X$, where $X$ is a norm

    If $i$ is sampled, return a vector $v$ for which $|v|_X = (1 \pm \epsilon)|M_i|_X$

    Applications:

    Estimating planar EMD [Andoni, Do Ba, Indyk, W]

    Sampling records in a relational database

    Define classes $S_t = \{i \mid |M_i|_X \in [\eta^{t-1}, \eta^t)\}$ for $\eta = 1 + \Theta(\epsilon)$

    If we have a heavy hitters algorithm for rows of a matrix, then we can apply a similar approach as before

    Space should be $d \cdot \mathrm{poly}(\epsilon^{-1} \log n)$

  • Heavy Hitters for Rows

    Algorithm in [Andoni, Do Ba, Indyk, W]:

    Partition the rows into $B$ buckets

    In each bucket maintain the vector sum of the rows hashed to it

    If $|M_i|_X > \phi \sum_j |M_j|_X$, and if $v$ is the vector in the bucket containing $M_i$, then by the triangle inequality:

    $|v|_X < |M_i|_X + |\mathrm{Noise}|_X \approx |M_i|_X + \sum_j |M_j|_X/B$

    $|v|_X > |M_i|_X - |\mathrm{Noise}|_X \approx |M_i|_X - \sum_j |M_j|_X/B$
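
A small offline sketch of the bucketing step, taking the $\ell_1$ norm as one concrete choice of the norm $X$ and a caller-supplied `bucket_of` function in place of a random hash. Both choices, and the function names, are assumptions made for illustration.

```python
def l1_norm(v):
    """One concrete choice for the norm X."""
    return sum(abs(c) for c in v)


def bucket_sums(M, B, bucket_of):
    """Maintain, for each of B buckets, the vector sum of its rows."""
    d = len(M[0])
    sums = [[0] * d for _ in range(B)]
    for i, row in enumerate(M):
        b = bucket_of(i)
        for c in range(d):
            sums[b][c] += row[c]
    return sums


def noise_norm(M, i, bucket_of):
    """Norm of the sum of the other rows sharing a bucket with row i."""
    noise = [0] * len(M[0])
    for k, row in enumerate(M):
        if k != i and bucket_of(k) == bucket_of(i):
            noise = [a + b for a, b in zip(noise, row)]
    return l1_norm(noise)
```

Since the bucket vector is $M_i$ plus the noise vector, its norm always lies within the noise norm of $|M_i|_X$. The sums are linear in the matrix entries, so they can be maintained under insertions and deletions.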