Space-Efficient Online Computation of Quantile Summaries Michael Greenwald & Sanjeev Khanna University of Pennsylvania Presented by nir levy




Introduction

• The problem

We are given a very large data set and wish to compute Φ-quantiles in a single pass, using as little space as possible.

• Def: The Φ-quantile of an ordered sequence of N data items is the value with rank ΦN (the element in position ΦN).

• We are going to see an online algorithm for computing ε-approximate quantile summaries of a very large data sequence.

• Def: An ε-approximate quantile summary of a sequence of N elements is a data structure that can answer quantile queries about the sequence to within a precision of εN.

• Def: A quantile summary consists of a small number of points from the input data sequence, and uses those quantile estimates to give approximate responses to any arbitrary quantile query.


Introduction cont…

• EXAMPLE

• Input data: 14, 2, 12, 5, 6, 19, 1, 14, 4, 9, 12, 3, 8, 11, 15, 4.

Ordered: 19, 15, 14, 14, 12, 12, 11, 9, 8, 6, 5, 4, 4, 3, 2, 1
Rank: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16

What is the 2nd biggest number? (15) What is the 25% quantile? (14)

Summary: 19, 14, 11, 6, 4, 1
Rank: 1, 4, 7, 10, 13, 16

From the summary: what is the 2nd biggest number? Rank 2 is not stored, so the nearest stored rank, the 1st, answers (19), which is one rank off. What is the 25% quantile? 16*0.25 = 4, so the 4th (14).
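The example above can be sketched in a few lines of Python. This is only an illustration, assuming the summary keeps every third element of the ordered data and answers a rank query with the stored value of nearest rank; the names `summary`, `approx_by_rank` and `approx_quantile` are hypothetical and do not come from the slides.

```python
data = [14, 2, 12, 5, 6, 19, 1, 14, 4, 9, 12, 3, 8, 11, 15, 4]
ordered = sorted(data, reverse=True)  # rank 1 = largest, as on the slide

# Assumed spacing: keep ranks 1, 4, 7, 10, 13, 16 as (rank, value) pairs.
summary = [(rank, ordered[rank - 1]) for rank in range(1, len(ordered) + 1, 3)]


def approx_by_rank(rank):
    """Return the stored value whose rank is closest to the requested rank."""
    return min(summary, key=lambda entry: abs(entry[0] - rank))[1]


def approx_quantile(phi):
    """Answer a phi-quantile query by converting phi to a rank first."""
    return approx_by_rank(round(phi * len(data)))


print(approx_by_rank(2))      # nearest stored rank is 1
print(approx_quantile(0.25))  # rank 4 is stored exactly
```

With six stored points out of sixteen, any rank query is at most one rank away from a stored rank, which is the kind of guarantee the rest of the talk makes precise.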


Quantile estimation for Database Applications

• Estimate the size of intermediate results, so that query optimizers can estimate the cost of competing plans for resolving database queries.

• Partition data into roughly equal parts for parallel databases.

• Prevent expensive and incorrect queries from being issued, by estimating result sizes and giving feedback to users.

• Characterize the distribution of real-world data sets for database users.


Properties

Properties for quantile estimators

1. Provide tunable and explicit guarantees on the precision of the approximation.

That is, for any given rank r, an ε-approximate quantile summary returns a value whose rank r’ is guaranteed to be within the interval [r-εN, r+εN].

2. Be data independent. That is, the guarantees should not be affected by the arrival order or the distribution of the values, nor should the estimator require a priori knowledge of the size of the data set.

3. Execute in a single pass over the data.

4. Have as small a memory footprint as possible (this also applies to temporary storage during the computation).


Previous Work

• Manku, Rajagopalan and Lindsay (MRL) presented a single-pass algorithm that builds an ε-approximate quantile summary in O((1/ε) log^2(εN)) space, but it needs advance knowledge of N (otherwise it provides only a probabilistic guarantee on the precision).

• Gibbons, Matias and Poosala presented a multiple-pass algorithm with a probabilistic guarantee.

• Munro and Paterson showed that any algorithm that exactly computes Φ-quantiles in only p passes requires Ω(N^(1/p)) space.


This algorithm

• Has a worst-case space requirement of O((1/ε) log(εN)), improving upon the previous best result of O((1/ε) log^2(εN)).

• In contrast to earlier algorithms, it does not require a priori knowledge of the length of the input sequence.

• Is based on a novel data structure that effectively maintains the range of possible ranks for each quantile it stores.

• Its behavior rests on the fact that no input sequence can be “bad” across the entire distribution; that is, the input sequence cannot keep presenting new observations that must be stored without allowing old stored observations to be deleted.


The Data Structure

• Assume w.l.o.g. that a new observation arrives after each unit of time.

• Denote by n the number of observations seen so far, which is also the current time.

• Denote by ε the given precision requirement.

• Denote by S = S(n) the summary data structure at time n. S(n) consists of an ordered sequence of elements corresponding to a subset of the observations seen thus far.

• For each observation v in S, maintain implicit bounds on the minimum and the maximum possible rank of v among the first n observations (denoted Rmin(v) and Rmax(v)).


Data structure cont…

• More formally

Let S(n) be the sequence of tuples t_0, t_1, …, t_(s-1), where t_i = (v_i, g_i, Δ_i):

v_i: one of the elements of the data stream

g_i: equals Rmin(v_i) - Rmin(v_(i-1)), with g_0 = Rmin(v_0)

Δ_i: equals Rmax(v_i) - Rmin(v_i)

• ∑_(j<=i) g_j = [Rmin(v_i) - Rmin(v_(i-1))] + [Rmin(v_(i-1)) - Rmin(v_(i-2))] + ... + [Rmin(v_1) - Rmin(v_0)] + Rmin(v_0) = Rmin(v_i)

• (∑_(j<=i) g_j) + Δ_i = Rmin(v_i) + [Rmax(v_i) - Rmin(v_i)] = Rmax(v_i)


Data structure cont…

• At all times, ensure that v_0 and v_(s-1) are the minimum and maximum elements seen so far.

• g_i + Δ_i - 1 is an upper bound on the total number of observations that may have fallen between v_(i-1) and v_i.

• ∑_i g_i = n, the number of observations seen so far.
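The relationship between the stored (v, g, Δ) tuples and the implicit rank bounds can be sketched as follows. This is a minimal illustration under assumed names: `Entry` and `rank_bounds` are hypothetical, and the sample summary is made up for the example rather than taken from the slides.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass
class Entry:
    """One tuple t_i = (v_i, g_i, delta_i); the class name is illustrative."""
    v: float
    g: int
    delta: int


def rank_bounds(summary):
    """Recover (v, Rmin, Rmax) for every tuple from the running sum of g."""
    rmins = list(accumulate(t.g for t in summary))
    return [(t.v, rmin, rmin + t.delta) for t, rmin in zip(summary, rmins)]


# Made-up summary of n = 10 observations, ordered by value.
summary = [Entry(1, 1, 0), Entry(5, 3, 2), Entry(9, 2, 1), Entry(19, 4, 0)]
bounds = rank_bounds(summary)
n = sum(t.g for t in summary)
```

Storing g and Δ instead of Rmin and Rmax means that an insertion or deletion only touches neighboring tuples; the absolute ranks of everything to the right shift implicitly.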


Answering Quantile Queries

• Proposition 1: Given a quantile summary S in the above form, a Φ-quantile can always be identified to within an error of max_i(g_i + Δ_i)/2.

Proof. Let r = Φn and let e = max_i(g_i + Δ_i)/2.

- Search for an index i such that r - e <= Rmin(v_i) and Rmax(v_i) <= r + e.

[Figure: number line from v_0 to v_(s-1), marking Φn, the interval [Rmin(v_i), Rmax(v_i)] around v_i, and the width max_i(g_i + Δ_i)]

Such a v_i approximates the Φ-quantile within the claimed error bound.


Answering Quantile Queries cont…

All that is left to show is that such an index i must always exist.

• Consider the case r > n - e.

We have Rmin(v_(s-1)) = Rmax(v_(s-1)) = n, and therefore i = s-1 is valid.

• Otherwise r <= n - e.

Choose the smallest j such that Rmax(v_j) > r + e. It follows that Rmin(v_(j-1)) >= r - e.

Indeed, if Rmin(v_(j-1)) < r - e, we would get Rmax(v_j) = Rmin(v_(j-1)) + g_j + Δ_j > r + e > Rmin(v_(j-1)) + 2e, so g_j + Δ_j > 2e.

This contradicts the assumption that e = max_i(g_i + Δ_i)/2.

[Figure: number line from v_0 to v_(s-1), marking n-e, r-e, r, r+e, Rmin(v_(j-1)) and Rmax(v_j)]


Answering Quantile Queries cont…

• By the choice of j, Rmax(v_(j-1)) <= r + e; therefore j-1 is an example of an index i with the desired property.

• Corollary 1

If at any time n the summary S(n) satisfies the property that max_i(g_i + Δ_i) <= 2εn, then we can answer any Φ-quantile query to within εn precision.
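The search in the proof of Proposition 1 can be sketched as a linear scan. This is an illustration under stated assumptions: the summary is a plain list of (v, g, Δ) triples ordered by value, the rank is taken as r = ceil(Φn) clamped to at least 1, and the function name `query` and the sample data are hypothetical.

```python
import math


def query(summary, phi):
    """Return some v_i with r - e <= Rmin(v_i) and Rmax(v_i) <= r + e.

    summary is a list of (v, g, delta) triples in increasing order of v.
    """
    n = sum(g for _, g, _ in summary)
    r = max(1, math.ceil(phi * n))                     # assumed rounding of phi*n
    e = max(g + delta for _, g, delta in summary) / 2  # error of Proposition 1
    rmin = 0
    for v, g, delta in summary:
        rmin += g                                      # Rmin(v_i) as a running sum
        if r - e <= rmin and rmin + delta <= r + e:
            return v
    raise ValueError("summary violates the assumptions of Proposition 1")


# Made-up summary of n = 10 observations; max(g + delta) = 5, so e = 2.5.
summary = [(1, 1, 0), (5, 3, 2), (9, 2, 1), (19, 4, 0)]
median = query(summary, 0.5)
```

When the summary also satisfies max_i(g_i + Δ_i) <= 2εn, the tolerance e is at most εn, which is exactly Corollary 1.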


Data structure cont…

At a high level

• On a new observation, insert into the summary a tuple corresponding to this observation.

• Periodically, perform a sweep over the summary to “merge” some of the tuples into their neighbors so as to free space.

• Maintain several conditions in order to bound the space used by S at any time.

• By Corollary 1 it suffices to ensure that at all times max_i(g_i + Δ_i) <= 2εn.

• Def: An individual tuple is full if g_i + Δ_i = 2εn.

• Def: The capacity of an individual tuple is the maximum number of observations that can be counted by g_i before the tuple becomes full.


BANDS

• General strategy: delete tuples with small capacities and preserve tuples with large capacities.

• In the merge phase, free up space by merging tuples with small capacities into tuples with “similar” or larger capacities.

• We say that two tuples t_i and t_j have similar capacities if

log capacity(t_i) ≈ log capacity(t_j)

• This notion of similarity partitions the possible values of Δ into bands.

• We try to divide the Δ’s into bands that lie between the elements of

0, (1/2)(2εn), (3/4)(2εn), …, ((2^i - 1)/2^i)(2εn), …, 2εn-1, 2εn

• These boundaries correspond to capacities of 2εn, εn, (1/2)εn, …, (1/2^(i-1))εn, …, 8, 4, 2, 1


BANDS cont…

• Define band_α to be the set of all Δ such that:

p - 2^α - (p mod 2^α) < Δ <= p - 2^(α-1) - (p mod 2^(α-1))

where

p = ⌊2εn⌋ and α = 1 .. ⌈log(2εn)⌉

• The above definition ensures that if two Δ’s are ever in the same band, they never appear in different bands as n increases.

• Define band_0 simply to be p.

• Consider the first 1/(2ε) observations, with Δ = 0, to be in a band of their own.


BANDS cont…

• Example

• Consider ε = 1/8. Every four consecutive observations are inserted with the same Δ; label these groups a..g:

Group:  a     b     c      d       e       f       g
Δ:      0     1     2      3       4       5       6
N:      1..4  5..8  9..12  13..16  17..20  21..24  25..28

Bands as time advances (columns are ranges of N; group a, with Δ = 0, stays in a band of its own):

        5..8   9..12   13..16   17..20   21..24   25..28
Band0   {b}    {c}     {d}      {e}      {f}      {g}
Band1          {b}     {b,c}    {d}      {d,e}    {f}
Band2                           {b,c}    {b,c}    {b,c,d,e}
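The band definition can be checked against the example with a short sketch. It assumes p = ⌊2εn⌋ and 0 <= Δ <= p, and simply tries α = 1, 2, … until the defining inequality holds; the names `band` and `bands_at` are illustrative.

```python
def band(delta, p):
    """Band index of delta when floor(2*eps*n) == p, assuming 0 <= delta <= p."""
    if delta == p:
        return 0  # band_0 is just the value p itself
    alpha = 1
    while True:
        lower = p - 2**alpha - (p % 2**alpha)
        upper = p - 2**(alpha - 1) - (p % 2**(alpha - 1))
        if lower < delta <= upper:
            return alpha
        alpha += 1  # bands are contiguous, so some alpha eventually matches


def bands_at(p):
    """Group the nonzero delta values 1..p by their band index."""
    groups = {}
    for delta in range(1, p + 1):
        groups.setdefault(band(delta, p), []).append(delta)
    return groups


# Last column of the example (p = 6): g alone, then f, then b..e together.
print(bands_at(6))
```

The loop terminates because the upper end of band α+1 equals the lower end of band α, and the lower end eventually drops below zero.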


BANDS cont…

• Proposition 2: At any point in time n and for any α >= 1, band_α(n) contains either 2^α or 2^(α-1) distinct values of Δ.

PROOF

According to the upper and lower bounds of band_α:

2εn - 2^α - (2εn mod 2^α) < Δ <= 2εn - 2^(α-1) - (2εn mod 2^(α-1))

so |band_α| = 2^α - 2^(α-1) + (2εn mod 2^α) - (2εn mod 2^(α-1)).

If (2εn mod 2^α) < 2^(α-1) then (2εn mod 2^α) = (2εn mod 2^(α-1)), and

|band_α| = 2^α - 2^(α-1) = 2^(α-1) distinct values of Δ

If (2εn mod 2^α) >= 2^(α-1) then (2εn mod 2^α) = 2^(α-1) + (2εn mod 2^(α-1)), and

|band_α| = 2^(α-1) + 2^(α-1) = 2^α distinct values of Δ
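The two cases of the proof can be confirmed numerically. The sketch below counts the integers in the half-open interval that defines band_α for an integer p standing in for 2εn; `band_size` is a hypothetical helper, and the ranges tested are arbitrary.

```python
def band_size(p, alpha):
    """Number of integers delta with lower < delta <= upper for band alpha."""
    lower = p - 2**alpha - (p % 2**alpha)
    upper = p - 2**(alpha - 1) - (p % 2**(alpha - 1))
    return upper - lower


# Every band has width 2^(alpha-1) or 2^alpha, as Proposition 2 states.
sizes_ok = all(
    band_size(p, alpha) in (2**(alpha - 1), 2**alpha)
    for p in range(1, 1025)
    for alpha in range(1, 11)
)
```

For small p and large α the interval extends below zero; the width formula still holds, although those negative values never occur as a Δ.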


A tree representation

• For S = t_0, t_1, …, t_(s-1), impose a tree structure T over the tuples of S.

• Assign a special root node R.

• For every tuple t_i assign a node V_i.

• The parent of every node V_i is the node V_j such that j is the least index greater than i with band(t_j) > band(t_i). If no such j exists, then R is the parent.

• All children (and all descendants) of a given node V_i have Δ values larger than Δ_i.
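The parent rule is a "next strictly greater element to the right" computation, which a stack handles in one pass. The sketch below assumes the bands of the tuples are already known; the function name `parents` and the sample band list are made up for illustration.

```python
def parents(bands):
    """bands[i] is the band of tuple t_i.

    Returns the parent index of every node, or None when the parent is the root R.
    """
    result = [None] * len(bands)
    stack = []  # indices still waiting for a parent
    for j, b in enumerate(bands):
        # t_j becomes the parent of every waiting node in a strictly lower band
        while stack and bands[stack[-1]] < b:
            result[stack.pop()] = j
        stack.append(j)
    return result


# Made-up bands for six tuples t_0..t_5.
print(parents([0, 1, 0, 0, 2, 1]))
```

In this example node 4 ends up with descendants 0, 1, 2 and 3, which sit directly to its left in S as one contiguous run.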


A tree representation

• Proposition 3:

The children of any node in T are always arranged in non-increasing order of band in S.

• Proposition 4:

For any node V, the set of all its descendants in T forms a contiguous segment in S.


Operations

• The following operations are used to compute an ε-approximate Φ-quantile from S(n) after n observations.

• During the operations we wish to maintain the correct relationship between g_i, Δ_i, Rmin and Rmax.

• QUANTILE(Φ): compute the rank r = Φn, find i such that r - Rmin(v_i) <= εn and Rmax(v_i) - r <= εn, and return v_i.

• INSERT(v): find the smallest i such that v_(i-1) <= v < v_i and insert the tuple (v, 1, ⌊2εn⌋) between t_(i-1) and t_i. If v is the new minimum or maximum seen, insert (v, 1, 0) instead.
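INSERT can be sketched with a binary search over the stored values. This is an illustration only: the summary is assumed to be a Python list of `[v, g, delta]` entries sorted by v, `n` is the number of observations seen before v, and the function name and sample data are hypothetical.

```python
import bisect
import math


def insert(summary, v, n, eps):
    """Insert v into summary and return the index where it was placed."""
    values = [t[0] for t in summary]
    i = bisect.bisect_right(values, v)  # smallest i with v < v_i
    is_extreme = i == 0 or i == len(summary)
    # A new minimum or maximum has an exactly known rank, so delta = 0.
    delta = 0 if is_extreme else math.floor(2 * eps * n)
    summary.insert(i, [v, 1, delta])
    return i


# Made-up summary of n = 10 observations with eps = 1/4, so floor(2*eps*n) = 5.
summary = [[1, 1, 0], [5, 3, 2], [9, 2, 1], [19, 4, 0]]
position = insert(summary, 7, 10, 0.25)
```

Rebuilding the list of values on every call keeps the sketch short; a real implementation would more likely keep the tuples in a balanced search structure.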


Operations Cont…

• INSERT(v) maintains the correct relationship between g_i, Δ_i, Rmin and Rmax.

• If v is inserted before v_i, the value of Rmin(v) may be as small as Rmin(v_(i-1)) + 1, hence g = 1.

Similarly, Rmax(v) may be as large as the current Rmax(v_i), which exceeds Rmin(v_(i-1)) by at most 2εn, hence Δ = 2εn is sufficient.

• Note that Rmin(v_i) and Rmax(v_i) increase by 1 after the insertion.


Operations Cont…

• DELETE(v_i): replace the tuples (v_i, g_i, Δ_i) and (v_(i+1), g_(i+1), Δ_(i+1)) with the new tuple (v_(i+1), g_i + g_(i+1), Δ_(i+1)).

• Deleting v_i has no effect on Rmin(v_(i+1)) and Rmax(v_(i+1)), so the operation should simply preserve them.

• The relationship between Rmin(v_(i+1)) and Rmax(v_(i+1)) is preserved as long as Δ_(i+1) is unchanged.

• Since Rmin(v_(i+1)) = ∑_(j<=i+1) g_j and we removed g_i from the sum, we must increase g_(i+1) by g_i to keep Rmin(v_(i+1)).
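DELETE is small enough to sketch directly, together with a check that the rank bounds of the surviving tuples do not move. The list-of-lists representation and the helper names `delete` and `rank_bounds` are assumptions made for the example.

```python
def delete(summary, i):
    """Merge tuple i into its right neighbor; summary holds [v, g, delta] lists."""
    if not 0 <= i < len(summary) - 1:
        raise IndexError("tuple i needs a right neighbor")
    summary[i + 1][1] += summary[i][1]  # g_(i+1) absorbs g_i, delta_(i+1) is kept
    del summary[i]


def rank_bounds(summary):
    """Map every stored value to its (Rmin, Rmax) pair."""
    bounds, rmin = {}, 0
    for v, g, delta in summary:
        rmin += g
        bounds[v] = (rmin, rmin + delta)
    return bounds


summary = [[1, 1, 0], [5, 3, 2], [9, 2, 1], [19, 4, 0]]
before = rank_bounds(summary)
delete(summary, 1)              # drop the tuple for value 5
after = rank_bounds(summary)
```

After the call, values 9 and 19 keep the same (Rmin, Rmax) pairs they had before; only the resolution between 1 and 9 is lost.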


COMPRESS

• The operation COMPRESS tries to merge a node together with all its descendants into either its parent node or its right sibling (by deleting them).

• During COMPRESS we must ensure that the tuple that results from the merge is not full.

• Two adjacent tuples t_i, t_(i+1) are mergeable if the resulting tuple is not full and band(t_i, n) <= band(t_(i+1), n).

• Note that a pair of tuples that is not mergeable at some point in time may become so at a later point, as the term 2εn increases over time.

• Let g_i* denote the sum of the g-values of tuple t_i and all its descendants in T.


Operations Cont…

• COMPRESS()

for i from s-2 to 0 do
    if (BAND(Δ_i, 2εn) <= BAND(Δ_(i+1), 2εn)) && (g_i* + g_(i+1) + Δ_(i+1) < 2εn) then
        DELETE all descendants of t_i and the tuple t_i itself
    end if
end for

• COMPRESS inspects tuples from right (highest index) to left. It first combines children (and their entire subtrees of descendants) into their parents, and only when the parent is full does it combine siblings.


Operations Cont…

• Initial state: S ← ∅; s = 0; n = 0.

Algorithm

To add the (n+1)st observation, v, to summary S(n):

if (n ≡ 0 mod 1/(2ε)) then
    COMPRESS();
end if
INSERT(v);
n = n+1;
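Putting INSERT, COMPRESS and the query together gives the following compact sketch. It is not the authors' implementation: the class name `GKSummary`, the list-of-lists storage, the rounding of 1/(2ε) to an integer period, and the guard that never merges away index 0 (the minimum) are all assumptions. The query uses the tolerance e = max_i(g_i + Δ_i)/2 from Proposition 1 rather than a fixed εn.

```python
import bisect
import math
import random


def band(delta, p):
    """Band of delta for p = floor(2*eps*n); assumes 0 <= delta <= p."""
    if delta == p:
        return 0
    alpha = 1
    while True:
        lower = p - 2**alpha - (p % 2**alpha)
        upper = p - 2**(alpha - 1) - (p % 2**(alpha - 1))
        if lower < delta <= upper:
            return alpha
        alpha += 1


class GKSummary:
    def __init__(self, eps):
        self.eps = eps
        self.n = 0
        self.tuples = []  # each entry is [v, g, delta], sorted by v

    def add(self, v):
        period = max(1, int(1 / (2 * self.eps)))  # assumed rounding of 1/(2*eps)
        if self.n % period == 0:
            self.compress()
        self.insert(v)
        self.n += 1

    def insert(self, v):
        i = bisect.bisect_right([t[0] for t in self.tuples], v)
        extreme = i == 0 or i == len(self.tuples)
        delta = 0 if extreme else math.floor(2 * self.eps * self.n)
        self.tuples.insert(i, [v, 1, delta])

    def compress(self):
        limit = 2 * self.eps * self.n
        p = math.floor(limit)
        i = len(self.tuples) - 2
        while i >= 1:  # assumption: index 0 (the minimum) is never merged away
            b = band(self.tuples[i][2], p)
            nxt = self.tuples[i + 1]
            if b <= band(nxt[2], p):
                # descendants of t_i: the contiguous run to its left in lower bands
                j, g_star = i, self.tuples[i][1]
                while j - 1 >= 1 and band(self.tuples[j - 1][2], p) < b:
                    j -= 1
                    g_star += self.tuples[j][1]
                if g_star + nxt[1] + nxt[2] < limit:
                    nxt[1] += g_star           # right neighbor absorbs the subtree
                    del self.tuples[j:i + 1]
                    i = j
            i -= 1

    def query(self, phi):
        r = max(1, math.ceil(phi * self.n))
        e = max(g + d for _, g, d in self.tuples) / 2  # Proposition 1 tolerance
        rmin = 0
        for v, g, d in self.tuples:
            rmin += g
            if r - e <= rmin and rmin + d <= r + e:
                return v
        raise ValueError("no tuple within the error bound")


summary = GKSummary(eps=0.01)
values = list(range(1, 2001))
random.Random(0).shuffle(values)
for x in values:
    summary.add(x)
print(len(summary.tuples), summary.query(0.5))
```

Because the input here is a permutation of 1..2000, the true rank of a value equals the value itself, which makes it easy to compare the answer of `query` with the requested rank.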


Analysis

• The INSERT and COMPRESS operations always ensure that g_i + Δ_i <= 2εn.

• We will now see that the total number of tuples in the summary S(n) is bounded by (11/(2ε)) log(2εn).

• Def: coverage. We say that a tuple t_i in S(n) covers an observation v at time n if either the tuple for v has been directly merged into t_i, or a tuple t that covered v has been merged into t_i.

• A tuple always covers itself.

• It is easy to see that the number of observations covered by t_i is exactly g_i = g_i(n).


Analysis Cont…

• Lemma 1:

At no point in time does a tuple from band α cover an observation from a band > α.

• Lemma 2:

At any point in time n, and for any integer α, the total number of observations covered cumulatively by all tuples with band value in [0..α] is bounded by 2^α/ε.

• Lemma 3:

At any time n and for any given α, there are at most 3/(2ε) nodes in T(n) that have a child with band value α. That is, there are at most 3/(2ε) parents of nodes from band_α(n).


Analysis Cont…

• PROOF of Lemma 3

• Let m_min and m_max denote the earliest and the latest time at which a node from band_α could have been seen.

• m_min = (2εn - 2^α - (2εn mod 2^α))/(2ε)

• m_max = (2εn - 2^(α-1) - (2εn mod 2^(α-1)))/(2ε)

• Choose a parent-child pair (V_i, V_j) in which the child V_j is in band_α.

• Since V_j still exists at time n, we can show that:

$$g_j^*(n) + \sum_{k=j+1}^{i-1} g_k(n) + g_i(n) + \Delta_i > 2\varepsilon n$$

• At time m_j (when V_j showed up) we had: g_i(m_j) + Δ_i < 2εm_j


Analysis Cont…

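• Since m_j is at most m_max, it follows that:

$$g_j^*(n) + \sum_{k=j+1}^{i-1} g_k(n) + \bigl(g_i(n) - g_i(m_j)\bigr) > 2\varepsilon (n - m_{\max})$$

• For any other such pair (V_i', V_j'), the observations counted above are distinct from those counted for (V_i, V_j).

• The number of observations that arrived after m_min is n - m_min.

• So the number of such pairs is at most (n - m_min)/(2ε(n - m_max)) <= 3/(2ε).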


Analysis Cont…

• Def: Given a full pair of tuples (t_(i-1), t_i), we say that t_(i-1) is the left partner and t_i is the right partner in this full pair.

• Lemma 4:

At any time n and for any given α, there are at most 4/ε tuples from band_α(n) that are right partners in a full tuple pair.

• PROOF

• Let t_i, t_(i+1), …, t_(i+p-1) be the longest contiguous segment of tuples from band_α(n) in S(n).

• Since they still exist after the COMPRESS operation, it must be the case that

g*_(j-1) + g_j + Δ_j > 2εn for all i <= j < i+p


Analysis Cont…

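• Summing over all j:

$$\sum_{j=i}^{i+p-1} g_{j-1}^* + \sum_{j=i}^{i+p-1} g_j + \sum_{j=i}^{i+p-1} \Delta_j > 2p\varepsilon n$$

therefore

$$2\sum_{j=i-1}^{i+p-1} g_j^* + \sum_{j=i}^{i+p-1} \Delta_j > 2p\varepsilon n$$

• According to Lemma 2, the first term is bounded by 2^(α+1)/ε.

• The second term is bounded by p(2εn - 2^(α-1)).

• Combining the two bounds we get p < 4/ε.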


Analysis Cont…

• For non-contiguous segments, just consider the above summations over all such segments.

• Lemma 5: At any time n and for any given α, the maximum number of tuples possible from each band_α(n) is 11/(2ε).

• Proof

• Each node of band_α(n) is one of the following:

1. a right partner in a full pair

2. a left partner in a full pair

3. not a participant in any full pair

• The first case is bounded by 4/ε (Lemma 4).

• The last two are bounded by 3/(2ε) (Lemma 3).

• Summing, 4/ε + 3/(2ε) = 11/(2ε), and the claim follows.


Analysis Cont…

• Theorem 1:

At any time n, the number of tuples stored in S(n) is at most (11/(2ε)) log(2εn).

• PROOF

• There are at most 1 + log(2εn) bands at time n.

• Summing over their sizes we get (11/(2ε)) log(2εn).
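To get a feel for the size of this bound, it can be evaluated for concrete parameters. The sketch assumes the logarithm is base 2 (the bands are defined by powers of two); `tuple_bound` is a hypothetical helper name.

```python
import math


def tuple_bound(eps, n):
    """Worst-case tuple count from Theorem 1, assuming a base-2 logarithm."""
    return 11 / (2 * eps) * math.log2(2 * eps * n)


for n in (10**5, 10**6, 10**7):
    print(n, round(tuple_bound(0.001, n)))
```

The bound grows only logarithmically in n for a fixed ε, so multiplying the stream length by ten adds a constant number of tuples rather than multiplying the summary size.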


Experimental results

• The experiments were run on 3 different classes of input data:

1. Hard case: an adversarially generated data sequence; that is, each new observation is placed in the largest current “gap” of the quantile summary.

2. Sorted input data: the data arrives in sorted order.

3. Random input data: each datum is selected (without replacement) uniformly at random from the remaining elements of the data set.


Experimental results cont…

• The sorted and random inputs follow the MRL experiments.

• Random input data gives insight into the behavior of the algorithm on “average” inputs.

• In general, the algorithm used less space than indicated by the analysis, and less than MRL’s space requirement.


Experimental results cont…

• For each case there are 2 kinds of experiment:

1. Adaptive: the regular algorithm (with a slight variation)

2. Pre-allocated: uses the same space as MRL

• We will see that in the latter case the observed error is significantly better than that of MRL.

• Differences in the algorithm used for the experiments:

1. An observation is inserted as a tuple (v, 1, g_i + Δ_i - 1) and not (v, 1, 2εn). The latter is used strictly to simplify the theoretical analysis.

2. Rather than running COMPRESS after every 1/(2ε) observations, one tuple was deleted, when possible, for each observation inserted. If no tuple could be deleted without making its successor full, the size of S grew by 1.


Experimental results cont…

• We apply the following measurements:

1. The maximum space used to produce the summary: the number of stored tuples (multiplied by 3 for comparison with MRL, to account for the Rmin and Rmax values stored in each tuple).

2. The observed precision of the results.


Experimental results cont…

• HARD INPUT

• The required number of quantiles is approximately a factor of 11 less than the worst-case bound of the analysis.

• We almost always require less space than MRL.

• The only exception is ε = .001 and N = 10^5, where MRL requires less space.


Experimental results cont…

• SORTED INPUT

• Fix ε = .001 and construct summaries of sorted sequences of size 10^5, 10^6 and 10^7.

• Sample 15 quantiles at (q_i/16)*N for q_i = 1..15, and also compute the maximum error over all possible quantile queries.

• Compare 3 algorithms:

1. MRL: pre-allocates the storage required by MRL as a function of N and ε.

2. Pre-allocated: uses 1/3 as many stored quantiles as MRL.

3. Adaptive: allocates storage for a new quantile only if no quantile could be deleted without exceeding a precision of .001n.


Experimental results cont…

• |S|: the number of stored quantiles needed to achieve the desired precision

• Max ε: the maximum error over all possible quantile queries on the summary

• The remaining rows list the approximation error of the response to the query for the (q_i/16)th quantile.


Experimental results cont…

• RANDOM INPUT

• Same setup as for the sorted input (ε and sequence lengths).

• Run each experiment 50 times and report the max, min, mean and standard deviation of every measurement.


Conclusions

• The algorithm improves upon the earlier results in two significant ways:

1. It improves the space complexity by a factor of Ω (log(εN)).

2. It doesn’t require a priori knowledge of the parameter N – that is, it allocates more space dynamically as the data sequence grows in size.