Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565


Page 1: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

1

Approximating Quantiles over Sliding Windows

Srimathi Harinarayanan

CMPS 565

Page 2: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

2

Streams Here, There, Everywhere!


Network Traffic Engineering.

Call Record Analysis.

Sensor Data Analysis.

Medical, Financial Monitoring.

Etc, etc, etc.

Page 3: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

3

Problem Definition Data Stream Environment

One Pass

Data element is a value

Φ-quantile (Φ ∈ (0, 1]): the element with rank ⌈ΦN⌉ in an ordered sequence of N data elements.

Page 4: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

4

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15

12 10 11 10 1 10 11 9 6 7 8 11 4 5 2 3

sort

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12

N = 16

0.5-quantile returns the element ranked 8 (0.5 × 16), which is 8

0.75-quantile returns the element ranked 12 (0.75 × 16), which is 10
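The rank-based definition can be checked directly on this 16-element example. The following is a minimal sketch that sorts the whole sequence, so it is an exact baseline rather than a streaming method; the function name `exact_quantile` is an illustrative choice, not something from the slides.

```python
import math

def exact_quantile(values, phi):
    """Return the element of rank ceil(phi * N) (1-indexed) in sorted order."""
    if not values or not 0 < phi <= 1:
        raise ValueError("need a non-empty sequence and phi in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(phi * len(ordered))
    return ordered[rank - 1]

# The stream shown on the slide, in arrival order t0..t15.
stream = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]

print(exact_quantile(stream, 0.5))   # element of rank 8
print(exact_quantile(stream, 0.75))  # element of rank 12
```

Sorting costs O(N) space, which is exactly what the approximate summaries discussed later try to avoid.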

Page 5: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

5

3 Models

Data Stream Model: computing the Φ-quantile over all the data items seen so far

Sliding Window Model: computing the Φ-quantile over the N most recent elements in the data stream

n-of-N Model: for any n ≤ N, computing the Φ-quantile among the n most recent elements in the data stream

Page 6: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

6

Sliding Window Model

….1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1…

Time Increases

Current Time

Window Size = N

• Most Recent N Elements

Page 7: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

7

Sliding window model

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15

12 10 11 10 1 10 11 9 6 7 8 11 4 5 2 3

1 6 7 8 9 10 10 10 11 11 11 12

1 2 3 4 5 6 7 8 9 10 11 11

Window size = 12 , 0.5-quantile returns 10 at time t11

0.5-quantile returns 6 at time t15
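The two answers on this slide can be reproduced with an exact sliding window. This is only a baseline sketch that stores all N elements (the class name `ExactSlidingQuantile` is hypothetical); the algorithms in the rest of the talk replace the stored window with a small summary.

```python
import math
from collections import deque

class ExactSlidingQuantile:
    """Keeps the N most recent elements and answers quantile queries exactly."""

    def __init__(self, window_size):
        # deque(maxlen=N) silently drops the oldest element on overflow.
        self.window = deque(maxlen=window_size)

    def add(self, value):
        self.window.append(value)

    def quantile(self, phi):
        ordered = sorted(self.window)
        return ordered[math.ceil(phi * len(ordered)) - 1]

stream = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]
window = ExactSlidingQuantile(window_size=12)
answers = {}
for t, value in enumerate(stream):
    window.add(value)
    if t in (11, 15):
        answers[t] = window.quantile(0.5)

print(answers)  # median at t11 and at t15
```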

Page 8: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

8

n-of-N model

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15

12 10 11 10 1 10 11 9 6 7 8 11 4 5 2 3

1 6 7 8 9 10 11 11

2 3 4 5

N = 12, 0.5-quantile returns 8 at time t11 for n = 8,

0.5-quantile returns 3 at time t15 for n = 4
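In the n-of-N model the query chooses n at query time, so the same stored window must answer for every n ≤ N. A minimal exact sketch, assuming the window is kept as a `deque` of the N most recent elements (the helper name `n_of_N_quantile` is illustrative):

```python
import math
from collections import deque

def n_of_N_quantile(window, n, phi):
    """Exact phi-quantile over the n most recent elements of the window."""
    if not 1 <= n <= len(window):
        raise ValueError("n must be between 1 and the current window length")
    recent = list(window)[-n:]
    ordered = sorted(recent)
    return ordered[math.ceil(phi * len(ordered)) - 1]

stream = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]
window = deque(maxlen=12)  # N = 12

for value in stream[:12]:  # elements t0..t11
    window.append(value)
at_t11 = n_of_N_quantile(window, n=8, phi=0.5)

for value in stream[12:]:  # elements t12..t15
    window.append(value)
at_t15 = n_of_N_quantile(window, n=4, phi=0.5)

print(at_t11, at_t15)
```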

Page 9: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

9

Applications - Sliding Window Model in Data Streams

Useful for Network Traffic Management, Sensor Data.

To find out Top Ranked Web pages from Most Recently accessed N pages

In the financial market, investors are often interested in finding out the most recent N bids.

Page 10: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

10

Previous Work on Approximating Quantiles in One Scan of Data

G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. [O((1/ε) log²(εN)) space]

G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets.

M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. [O((1/ε) log(εN)) space] {GK Algorithm}

The GK Algorithm is the most efficient of these because it uses the least space, and it does not require advance knowledge of N

Page 11: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

11

Definitions

Φ-Quantile: A Φ-quantile (Φ ∈ (0, 1]) of an ordered sequence of N data elements is the element with rank ⌈ΦN⌉.

Quantile Query: Given Φ, find the data element with rank ⌈ΦN⌉ among all elements in the stream. Variation: the N most recent elements (sliding window model).

ε-approximate: For a query rank r, find an element whose rank is within the interval [r − εN, r + εN].

Page 12: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

12

Computation of Quantile Summaries over Sliding Windows – 2 Methods

Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream, Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu, 2004 IEEE

Approximating frequency counts and quantiles using sliding window model, Arvind Arasu, Gurmeet Singh Manku,Stanford University, 2004

Page 13: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

13

Computation of Quantile Summaries over Sliding Windows – LLXY04

GK Algorithm + Concept Of Aging (Computing quantiles over a Sliding Window of Most Recent N Elements)

Under sliding window model, a summary is maintained for the most recently seen N data elements.

Eliminating exactly the outdated elements requires O(N) space.

Page 14: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

14

ε-approximate

A quantile summary for a data sequence is ε-approximate if, for any given rank r, it returns a value whose rank r’ is guaranteed to be within the interval [r − εN, r + εN]

Example : A data stream with 100 elements,

0.5 – quantile with ε= 0.1 returns a value v.

The true rank of v is within [40,60]
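The tolerance in this example is simple arithmetic, sketched below. The helper names `rank_interval` and `within_tolerance` are hypothetical, and floating-point ε is used for brevity.

```python
import math

def rank_interval(phi, n, eps):
    """Allowed rank range [r - eps*n, r + eps*n] for the phi-quantile of n elements."""
    r = math.ceil(phi * n)
    return (r - eps * n, r + eps * n)

def within_tolerance(true_rank, phi, n, eps):
    """True if an answer with the given true rank is an eps-approximate phi-quantile."""
    low, high = rank_interval(phi, n, eps)
    return low <= true_rank <= high

low, high = rank_interval(phi=0.5, n=100, eps=0.1)
print(low, high)                              # bounds around rank 50
print(within_tolerance(45, 0.5, 100, 0.1))    # inside the interval
print(within_tolerance(61, 0.5, 100, 0.1))    # outside the interval
```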

Page 15: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

15

Quantile Sketch Data structure

$\{(v_i, r_i^-, r_i^+) : 1 \le i \le m\}$

A value $v_i$ is one of the elements seen so far

$r_i^-$ is the lower bound on the rank of $v_i$

$r_i^+$ is the upper bound on the rank of $v_i$

$v_i \le v_{i+1}$, for $1 \le i \le m-1$

$r_i^- \le r_{i+1}^-$, for $1 \le i \le m-1$

$r_i^- \le r_i \le r_i^+$, where $r_i$ is the rank of $v_i$

Page 16: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

16

Example

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15

12 10 11 10 1 10 11 9 6 7 8 11 4 5 2 3

Quantile sketch consisting of 6 tuples

{(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}

Page 17: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

17

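ε-approximate sketch

Theorem. Suppose a sketch S satisfies:

1. $r_1^+ \le \epsilon N + 1$,

2. $r_m^- \ge (1-\epsilon) N$,

3. for $2 \le i \le m$, $r_i^+ \le r_{i-1}^- + 2\epsilon N$.

Then sketch S is ε-approximate. That is, for each Φ ∈ (0, 1], there is a $(v_i, r_i^-, r_i^+)$ in S such that

$\lceil \Phi N \rceil - \epsilon N \le r_i^- \le r_i^+ \le \lceil \Phi N \rceil + \epsilon N$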
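The theorem's three conditions are easy to test mechanically. The sketch below reads them as: r_1^+ ≤ εN + 1, r_m^- ≥ (1 − ε)N, and r_i^+ ≤ r_{i−1}^- + 2εN for every later tuple. That reading is reconstructed from the slide text, and the function name is hypothetical. It is applied to the 6-tuple sketch from the earlier example slide.

```python
def satisfies_theorem_conditions(sketch, n, eps):
    """Check the three sufficient conditions for a sketch of (v, r_minus, r_plus) tuples."""
    first_ok = sketch[0][2] <= eps * n + 1           # condition 1: r_1^+ <= eps*N + 1
    last_ok = sketch[-1][1] >= (1 - eps) * n         # condition 2: r_m^- >= (1 - eps)*N
    gaps_ok = all(                                   # condition 3: consecutive tuples
        sketch[i][2] <= sketch[i - 1][1] + 2 * eps * n
        for i in range(1, len(sketch))
    )
    return first_ok and last_ok and gaps_ok

# Sketch of the 16-element example stream, as listed on the slide.
sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]

print(satisfies_theorem_conditions(sketch, n=16, eps=0.25))
print(satisfies_theorem_conditions(sketch, n=16, eps=0.1))
```

With ε = 0.25 every condition holds; with ε = 0.1 the gap between the first two tuples (r_2^+ = 9 versus r_1^- + 2εN = 4.2) is too wide, so the conditions fail.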

Page 18: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

18

Query

t0 t1 t2 t3 t4 t5 t6 t7 t8 t9 t10 t11 t12 t13 t14 t15

12 10 11 10 1 10 11 9 6 7 8 11 4 5 2 3

Quantile sketch consisting of 6 tuples ε= 0.25

{(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}

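0.5-quantile: return the $v_i$ for rank r = 8, with εN = 4

Rule: $r - \epsilon N \le r_i^- \le r_i^+ \le r + \epsilon N$, that is, $8 - 4 \le r_i^- \le r_i^+ \le 8 + 4$. Find the first tuple to satisfy the rule, and return $v_i$

(5,4,10) => return 5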
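The query rule can be sketched as a linear scan over the tuples. This is a minimal illustration with a hypothetical function name `query_sketch`; with the sketch exactly as listed on this slide, the first tuple whose bounds fall inside [4, 12] is (5, 4, 10), so the scan returns 5.

```python
import math

def query_sketch(sketch, n, phi, eps):
    """Return v_i of the first tuple with r - eps*n <= r_minus and r_plus <= r + eps*n."""
    r = math.ceil(phi * n)
    for value, r_minus, r_plus in sketch:
        if r - eps * n <= r_minus and r_plus <= r + eps * n:
            return value
    return None  # no tuple qualifies, so the sketch is not eps-approximate for this rank

sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
answer = query_sketch(sketch, n=16, phi=0.5, eps=0.25)
print(answer)
```

The true rank of 5 in the sorted stream is 5, which lies within the allowed interval [4, 12] around rank 8.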

Page 19: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

19

One-Pass summary for sliding windows

Continuously divide a stream into buckets based on the arrival order of data elements

The capacity of each bucket is εN/2

For each bucket, we maintain an ε/4-approximate sketch continuously by the GK algorithm

Once a bucket is full, its ε/4-approximate sketch is compressed into an ε/2-approximate sketch

The oldest bucket is expired if currently the total number of elements is N+1
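The bucket bookkeeping can be sketched separately from the summaries themselves. In the illustration below each bucket stores its raw values as a stand-in for its GK sketch, which is an assumption made only to keep the example short; the class name `BucketedWindow` is hypothetical. Capacity is εN/2, and the oldest bucket is dropped as soon as the element count reaches N + 1.

```python
from collections import deque

class BucketedWindow:
    """Partitions a stream into buckets of capacity eps*N/2 and expires the oldest."""

    def __init__(self, window_size, eps):
        self.window_size = window_size
        self.capacity = max(1, int(eps * window_size / 2))
        self.buckets = deque()  # raw values stand in for per-bucket sketches
        self.count = 0

    def add(self, value):
        """Insert a value; return the expired bucket if one was dropped, else None."""
        if not self.buckets or len(self.buckets[-1]) == self.capacity:
            self.buckets.append([])  # previous bucket is full: start a new current bucket
        self.buckets[-1].append(value)
        self.count += 1
        if self.count > self.window_size:  # N + 1 elements: expire the oldest bucket
            expired = self.buckets.popleft()
            self.count -= len(expired)
            return expired
        return None

window = BucketedWindow(window_size=8, eps=1)
expired_log = [window.add(v) for v in range(1, 10)]  # stream 1, 2, ..., 9
print(window.capacity, list(window.buckets), expired_log[-1])
```

Expiring a whole bucket at once means the summary may cover slightly fewer than N elements, which is the discrepancy the lift operation later compensates for.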

Page 20: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

20

Summary Technique

[Figure: the most recent N elements divided into buckets of εN/2 elements each, with the expired bucket at the old end and the current bucket at the new end. Each full bucket holds a compressed ε/2-approximate sketch; the current bucket is maintained by GK.]

Page 21: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

21

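Example: N = 8, ε = 1, εN/2 = 4

[Figure: elements 1 2 3 4 fill the first bucket, 5 6 7 8 fill the second, and 9 starts a new current bucket. The current bucket keeps an ε/4-approximate sketch; when it is full, the sketch is compressed into an ε/2-approximate sketch. When 9 arrives, the bucket holding 1 2 3 4 expires.]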

Page 22: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

22

Compress

Compress an ε/2-approximate sketch into an ε-approximate sketch

Memory space is at most ⌈1/ε⌉ + 2 tuples

Why not use an ε/2-approximate sketch in each bucket directly? The compress technique takes about half of the number of tuples given by an ε/2-approximate sketch

Page 23: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

23

Merge

There are h data streams $D_i$, and each $D_i$ has $N_i$ data elements. Suppose each $S_i$ is an ε-approximate sketch of $D_i$.

$S_{merge}$ is a sketch of $\bigcup_{1 \le i \le h} D_i$

$|S_{merge}| = \left|\bigcup_{1 \le i \le h} S_i\right|$

Since each $S_i$ is an ε-approximate sketch, $S_{merge}$ is also an ε-approximate sketch

Page 24: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

24

Another Problem

[Figure: the bucket 1, 2, 3, 4 is expired; the bucket 5, 6, 7, 8 is full; 9 is in the current bucket]

ε = 1 and N = 8

ε/2-approximate sketches: {(5, 1, 1), (7, 3, 3)} and {(9, 1, 1)}

The first tuple in Smerge is (5, 1, 3.5), but the rank of 5 is 4. Smerge is not an ε/2-approximate sketch

Page 25: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

25

Lift

To solve the previous problem, we use a “lift” operation to lift the value of $r_i^+$ by εN/2 for each tuple i

If S is an ε/2-approximate sketch, then Slift is an ε-approximate sketch

That is why the bucket size is εN/2 and we maintain an ε/2-approximate sketch of each bucket summary
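The lift operation itself is a one-line transformation. As a hedged illustration, the sketch below applies it to the local sketch {(5, 1, 1), (7, 3, 3)} of the bucket 5, 6, 7, 8 from the previous slide, with N = 8 and ε = 1 so that the lift amount is εN/2 = 4. Skipping the merge step and the function name `lift` are simplifications for this example only.

```python
def lift(sketch, amount):
    """Raise the upper rank bound r_plus of every tuple by the given amount."""
    return [(value, r_minus, r_plus + amount) for value, r_minus, r_plus in sketch]

n, eps = 8, 1
bucket_sketch = [(5, 1, 1), (7, 3, 3)]  # local ranks inside the bucket 5, 6, 7, 8
lifted = lift(bucket_sketch, eps * n / 2)

# True ranks in the actual window of the 8 most recent elements: 2, 3, ..., 9.
window = list(range(2, 10))
true_rank = {v: sorted(window).index(v) + 1 for v, _, _ in lifted}

for value, r_minus, r_plus in lifted:
    print(value, (r_minus, r_plus), true_rank[value])
```

Before the lift, the bounds for 5 were [1, 1] although its rank in the window is 4; afterwards the bounds [1, 5] cover the true rank, because at most εN/2 elements of the partially expired bucket can still belong to the window.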

Page 26: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

26

Query

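Step 1. Merge the local sketches of the buckets (each covering εN/2 elements, up to and including the current bucket) into Smerge

Step 2. Lift Smerge to obtain Slift

Step 3. For a given rank $r = \lceil \Phi N \rceil$, find the first tuple $(v_i, r_i^-, r_i^+)$ in Slift such that $r - \epsilon N \le r_i^- \le r_i^+ \le r + \epsilon N$, and return $v_i$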

Page 27: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

27

Space – Sliding Window LLXY ‘04: O(1/ε² + log(ε²N)/ε)

Reason: The sketch in each bucket produced by the GK algorithm takes O(log(ε²N)/ε) space, which will be compressed to O(1/ε) once the bucket is full

O(1/ε) buckets
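The bound can be traced term by term. The derivation below is a sketch that assumes the parameters stated on the earlier slides: bucket capacity εN/2, an ε/4-approximate GK sketch for the current bucket, and the GK space bound O((1/ε) log(εN)).

```latex
\begin{align*}
\text{number of full buckets} &\le \frac{N}{\epsilon N/2} = \frac{2}{\epsilon} = O(1/\epsilon),\\
\text{space of full buckets} &= O(1/\epsilon)\cdot O(1/\epsilon) = O(1/\epsilon^{2}),\\
\text{space of current bucket} &= O\!\left(\frac{1}{\epsilon/4}\,
  \log\!\left(\frac{\epsilon}{4}\cdot\frac{\epsilon N}{2}\right)\right)
  = O\!\left(\frac{\log(\epsilon^{2}N)}{\epsilon}\right),\\
\text{total} &= O\!\left(\frac{1}{\epsilon^{2}} + \frac{\log(\epsilon^{2}N)}{\epsilon}\right).
\end{align*}
```

Only the current bucket pays the logarithmic GK cost, since every full bucket has already been compressed.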

Page 28: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

28

Performance Studies

Sliding window model: compare with the ARS-algorithm

Avg Errors

Space Consumption

Distributions

n-of-N model: compare with the heuristic algorithm nN’

Avg Errors

Space Consumption

Query performance

Page 29: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

29

Conclusion

The work presented is among the first attempts to develop space-efficient, one-pass, deterministic quantile summary algorithms with performance guarantees under the sliding window model of data streams.

Page 30: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

30

Approximating quantiles using sliding window model - Manku’s

Approximating Quantiles: GK Algorithm + Concept of Aging

Improves over [LLXY ‘04]

[LLXY ‘04] space: O(1/ε² + log(ε²N)/ε)

Manku’s space: O((1/ε) log(1/ε) log N)

The space complexity is achieved by minimising the space used for maintaining the state. At any point in time, ε-approximate quantiles, for any Φ ∈ (0, 1], over the current contents of the sliding window can be computed using the maintained state.

The goal is to minimize the space required for maintaining the state.

Page 31: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

31

Overview

N


Page 39: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

39

Details

[Figure: a window of size N divided into blocks at several levels. Labels: εN/4 = O(εN), log(1/ε), ε0, ε1, ε2]

Page 40: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

40

Space Requirement: O((1/ε) log(1/ε) log N)

Space required for level-ℓ blocks = (size of a quantile sketch) × (number of “active” blocks) = (1/εℓ) log(…) × (N / Nℓ)

Space required for GK Algorithm = (1/ε) log(εN)

(1/ε) log(εN) = O((1/ε) log(1/ε) log N)

Page 41: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

41

Conclusion

The work presented is better than the first method with respect to space.

This paper also provides a randomized quantile finding algorithm with further improvement in space.

Page 42: Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565

42

Any Questions?