Bandwidth-Efficient Continuous Query Processing over DHTs


  • Bandwidth-Efficient Continuous Query Processing over DHTs

    Yingwu Zhu

  • Background
    - Instantaneous Query
    - Continuous Query

  • Instantaneous Query (1)
    - Documents are indexed
    - The node responsible for keyword t stores the IDs of documents containing that term (i.e., inverted lists)
    - Retrieve relevant docs one time
    - Latency is a top priority
    - Query Q = t1 ∧ t2: fetch the lists of doc IDs stored under t1 and t2, then intersect these lists
    - E.g.: Google search engine

  • Instantaneous Query (2)
    [Figure: nodes A, B, C, D hold the inverted lists cat: 1,4,7,19,20; dog: 1,5,7,26; cow: 2,4,8,18; bat: 1,8,31. Result sent: Docs 1, 7.]
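    The lookup in this example can be sketched in a few lines of Python. This is a minimal single-process illustration, not the system from the talk: the `inverted_lists` dictionary stands in for lists that would live on separate DHT nodes, and the function name `instantaneous_query` is hypothetical. The query terms cat and dog are an assumption; they are the pair of lists on the slide whose intersection is Docs 1, 7.

```python
from functools import reduce

# Inverted lists from the slide. In a DHT each list would be stored on the
# node responsible for its term; one dictionary stands in for all nodes here.
inverted_lists = {
    "cat": [1, 4, 7, 19, 20],
    "dog": [1, 5, 7, 26],
    "cow": [2, 4, 8, 18],
    "bat": [1, 8, 31],
}

def instantaneous_query(terms, lists=inverted_lists):
    """Return the sorted doc IDs that contain every term in `terms`."""
    if not terms:
        return []
    fetched = [set(lists.get(t, ())) for t in terms]  # one fetch per term
    return sorted(reduce(set.intersection, fetched))  # intersect the lists

print(instantaneous_query(["cat", "dog"]))  # [1, 7]
```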

  • Continuous Query (1)
    - Reverse the roles of documents and queries
    - Queries are indexed
    - Query Q = t1 ∧ t2 is stored under one of its terms, t1 or t2
    - Question 1: How is the index term selected? (query indexing)
    - Push new relevant docs (incrementally)
    - Enabled by long-lived queries
    - E.g.: Google News Alert feature

  • Continuous Query (2)
    - Upon insertion of a new doc D with keywords t1 and t2:
    - Contact the nodes responsible for the inverted query lists of D's keywords t1 and t2
    - Question 2: How to locate the nodes (query nodes, QN)? (document announcement)
    - Resolve the query lists into the final list of queries satisfied by D
    - Question 3: What is the resolution strategy? (query resolution)
    - E.g., Term Dialogue, Bloom filters (Infocom06)
    - Notify the owners of satisfied queries
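    The indexing and matching steps on the two Continuous Query slides can be sketched as follows. This is a simplified single-process model under stated assumptions: the class `ContinuousQueryIndex`, its method names, and the owner names are hypothetical, one dictionary stands in for the query lists held by separate DHT nodes, and the subset test replaces the distributed resolution strategies (Term Dialogue, Bloom filters) named on the slide.

```python
from collections import defaultdict

class ContinuousQueryIndex:
    """Single-process stand-in for the per-term query lists held by DHT nodes."""

    def __init__(self, choose_index_term):
        self.choose_index_term = choose_index_term  # the answer to Question 1
        self.query_lists = defaultdict(list)        # term -> [(owner, query terms)]

    def register(self, owner, terms):
        """Index a long-lived query under exactly one of its terms."""
        terms = frozenset(terms)
        self.query_lists[self.choose_index_term(terms)].append((owner, terms))

    def insert_document(self, doc_terms):
        """Return the owners of the queries satisfied by a new document."""
        doc_terms = set(doc_terms)
        notified = []
        for term in sorted(doc_terms):              # one lookup per document keyword
            for owner, q_terms in self.query_lists.get(term, []):
                if q_terms <= doc_terms:            # every query term appears in D
                    notified.append(owner)
        return notified

# Placeholder indexing rule for the demo: alphabetically smallest term.
index = ContinuousQueryIndex(choose_index_term=min)
index.register("alice", {"cat", "dog"})
index.register("bob", {"cat", "horse"})
print(index.insert_document({"cat", "dog", "fish"}))  # ['alice']
```

    Both queries are stored under cat, so both are examined when the document arrives, but only the first is satisfied. The second is an irrelevant query of the kind a good indexing scheme tries to avoid examining.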

  • Query Resolution: Term Dialogue
    [Figure: nodes A, B, C, D. The inverted query list for cat holds the entries "dog", "horse & dog", and "horse & cow"; other nodes hold the inverted lists for dog and cow. The owner of Q1 is notified.]

  • Query Resolution: Bloom filters
    [Figure: nodes A, B, C, D. The inverted query list for cat holds the entries "dog", "horse & dog", and "horse & cow"; other nodes hold the inverted lists for dog and cow. The owner of Q1 is notified.]

  • Motivation
    - Latency is not the primary concern, but bandwidth can be an important design issue
    - Different query indexing schemes incur different costs
    - Different query resolution strategies incur different costs
    - Goal: design a bandwidth-efficient continuous query system with proper query indexing (Question #1), document announcement (Question #2), and query resolution (Question #3) approaches

  • Contributions
    - Novel query indexing schemes (Question #1)
      - Focus of this talk!
    - Multicast-based document announcement (Question #2)
      - In the paper
    - Adaptive query resolution (Question #3)
      - Make intelligent decisions in resolving query terms
      - Minimize the bandwidth cost
      - In the full tech. report

  • Design
    - Focus on simple keyword queries, e.g., Q = t1 ∧ t2 ∧ … ∧ tn
    - Leverage DHTs
      - Location & storage of documents and continuous queries
    - Query indexing
      - How to choose index terms for queries?
    - Doc. announcement, query resolution
      - Not covered in this talk!

  • Current Indexing Schemes
    - Random Indexing (RI)
    - Optimal Indexing (OI)

  • Random Indexing (RI)
    - Randomly chooses a term as the index term
      - Q = t1 ∧ … ∧ tm
      - Index term ti is randomly selected
      - Q is indexed at the DHT node responsible for ti
    - Pros: simple
    - Cons:
      - Popular terms are more likely to be index terms for queries
      - Load imbalance
      - Introduces many irrelevant queries into query resolution, wasting bandwidth

  • Optimal Indexing (OI)
    - Q = t1 ∧ … ∧ tm
    - Index term ti is deterministically chosen: the most selective term, i.e., the one with the lowest frequency
    - Q is indexed at the DHT node responsible for ti
    - Pros: maximizes load balance & minimizes bandwidth cost
    - Cons:
      - Assumes perfect knowledge of term statistics
      - Impractical, e.g., due to the large number of documents, node churn, continuous doc updates, …
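    The two existing schemes differ only in how the index term is picked, which a short sketch makes concrete. The function names, the `doc_frequency` table, and its counts are made up for illustration; a real OI deployment would need the global term statistics that the slide calls impractical to maintain.

```python
import random

def random_index_term(query_terms, rng=random):
    """RI: pick any query term uniformly at random."""
    return rng.choice(sorted(query_terms))

def optimal_index_term(query_terms, doc_frequency):
    """OI: pick the most selective term, i.e. the one in the fewest documents.

    `doc_frequency` is the perfect global statistic that OI assumes. Ties are
    broken alphabetically so that the choice stays deterministic.
    """
    return min(sorted(query_terms), key=lambda t: doc_frequency.get(t, 0))

doc_frequency = {"the": 9000, "dht": 40, "bandwidth": 120}  # made-up counts
query = {"the", "dht", "bandwidth"}

print(optimal_index_term(query, doc_frequency))             # dht
print(random_index_term(query, random.Random(0)) in query)  # True
```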

  • Solution 1: MHI
    - Minimum Hash Indexing
    - Order query terms by their hashes
    - Select the term with the minimum hash as the index term
    - Q = t1 ∧ … ∧ tm
    - Index term ti is deterministically chosen, s.t. h(ti) < h(tx) for all x ≠ i
    - Q is indexed at the DHT node responsible for ti
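    The selection rule above is small enough to write down directly. In this sketch the use of SHA-1 as h and the names `term_hash` and `mhi_index_term` are assumptions; the slides only require some hash function over terms.

```python
import hashlib

def term_hash(term):
    """Hash a term to an integer. SHA-1 is an assumption of this sketch."""
    return int.from_bytes(hashlib.sha1(term.encode("utf-8")).digest(), "big")

def mhi_index_term(query_terms, h=term_hash):
    """Return the query term with the minimum hash value."""
    return min(query_terms, key=h)

q = ["dog", "cat", "horse"]
# The choice depends only on the set of terms, not on their order or on any
# term statistics, so every node computes the same index term for a query.
assert mhi_index_term(q) == mhi_index_term(reversed(q))
print(mhi_index_term(q) in q)  # True
```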

  • RI vs. MHI
    - Terms t1, t2, t3, t4, t5, t6, t7, where h(ti) < h(tj) for i < j
    - D = {t2, t4, t5, t6}
    - 3 queries, irrelevant to D:
      - Q1 = t1 ∧ t2 ∧ t4
      - Q2 = t3 ∧ t4 ∧ t5
      - Q3 = t3 ∧ t5 ∧ t6
    - (1) RI: Q1, Q2, and Q3 will each be considered in query resolution with a probability of 67% (need to resolve terms t1, t2, t3, t4, t5, and t6)
    - (2) MHI: all of them will be filtered out! Bandwidth savings! How?

  • MHI: filtering irrelevant queries!
    - D = {t2, t4, t5, t6}
    - Q1 = t1 ∧ t2 ∧ t4, Q2 = t3 ∧ t4 ∧ t5, Q3 = t3 ∧ t5 ∧ t6
    [Figure: DHT nodes A to G. Query lists per index term: t1: Q1; t3: Q2, Q3; t2, t4, t5, t6: none. The four nodes contacted for D's terms take no action.]
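    The numbers in this example can be checked with a short script. The toy hash `h` simply reads the index out of the term name so that h(ti) < h(tj) for i < j, as the slides assume; it is not a real hash function, and the `results` dictionary is only there for illustration.

```python
from fractions import Fraction

def h(term):
    """Toy hash satisfying h(ti) < h(tj) for i < j, as the slides assume."""
    return int(term[1:])

D = {"t2", "t4", "t5", "t6"}
queries = {
    "Q1": {"t1", "t2", "t4"},
    "Q2": {"t3", "t4", "t5"},
    "Q3": {"t3", "t5", "t6"},
}

results = {}
for name, terms in sorted(queries.items()):
    # RI: the query is examined only if its random index term is a term of D.
    p_ri = Fraction(len(terms & D), len(terms))
    # MHI: the index term is fixed; D reaches the query only if that term is in D.
    index_term = min(terms, key=h)
    results[name] = (p_ri, index_term, index_term in D)
    print(name, "| RI probability:", p_ri,
          "| MHI index term:", index_term,
          "| reached under MHI:", index_term in D)
```

    Each query shares two of its three terms with D, which gives the 2/3 (about 67%) chance under RI. Under MHI the queries sit at t1 and t3, neither of which is a term of D, so the nodes contacted for D never see them.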

  • MHI
    - Pros:
      - Simple and deterministic
      - Does not require term stats
      - Saves bandwidth over RI (up to 39.3% saving for various query types)
    - Cons:
      - Some popular terms can still become index terms when they have the minimum hash in their queries!
      - Load imbalance & irrelevant queries to process

  • Solution 2: SAP-MHI
    - MHI is good but may still index queries under popular terms
    - SAmPling-based MHI (SAP-MHI)
      - Sampling (synopsis of K popular terms) + MHI
      - Avoid indexing queries under the K popular terms
    - Challenge: support duplicate-sensitive aggregates of popular terms, as synopses may be gossiped over multiple DHT overlay links and term frequencies may be overestimated!
    - Borrow the idea from duplicate-sensitive aggregation in sensor networks
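    Given a synopsis of popular terms, the SAP-MHI selection rule can be sketched as MHI restricted to the non-popular terms. The names `term_hash` and `sap_mhi_index_term`, the SHA-1 hash, and the sample synopsis are assumptions. The fallback to plain MHI when every query term is popular is also an assumption of this sketch; the slides do not say how that case is handled.

```python
import hashlib

def term_hash(term):
    """Hash a term to an integer. SHA-1 is an assumption of this sketch."""
    return int.from_bytes(hashlib.sha1(term.encode("utf-8")).digest(), "big")

def sap_mhi_index_term(query_terms, popular_terms, h=term_hash):
    """Pick the minimum-hash term among the terms not in the synopsis.

    Assumption: if every query term is popular, fall back to plain MHI.
    """
    candidates = [t for t in query_terms if t not in popular_terms]
    return min(candidates or query_terms, key=h)

synopsis = {"the", "of"}  # hypothetical synopsis of K = 2 popular terms
print(sap_mhi_index_term({"the", "dht"}, synopsis))         # dht
print(sap_mhi_index_term({"the", "of"}, synopsis) in synopsis)  # True
```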

  • SAP-MHI
    - Duplicate-sensitive aggregation
      - Goal: a synopsis of K popular terms
      - Based on the coin tossing experiment CT(y)
        - Toss a fair coin until either the first head occurs or y coin tosses end up with no head, and return the number of tosses
    - Each node a
      - Produces a local synopsis Sa containing K popular terms (the terms with the highest values of CT(y))
      - Gossips Sa to its neighbor nodes
      - Upon receiving a synopsis Sb from a neighbor b, aggregates Sa and Sb, producing a new synopsis Sa (max() operations)
    - Thus, each node has a synopsis of K popular terms after a sufficient number of gossip rounds
    - Intuition: if a term appears in more documents, then its value produced by CT(y) is likely to be larger than the values of rare terms
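    The steps above can be sketched as follows. Several details are assumptions, since the slides do not spell them out: one CT(y) trial is run per (document, term) pair and the maximum is kept per term, y defaults to 32, ties are broken by term name, and all function names are hypothetical.

```python
import heapq
import random

def coin_toss(y, rng=random):
    """CT(y): toss a fair coin until the first head or y tosses; return the count."""
    for tosses in range(1, y + 1):
        if rng.random() < 0.5:  # head
            return tosses
    return y                    # y tosses, no head

def top_k(values, k):
    """Keep the k terms with the largest values (ties broken by term name)."""
    return dict(heapq.nlargest(k, values.items(), key=lambda kv: (kv[1], kv[0])))

def local_synopsis(documents, k, y=32, rng=random):
    """Assumed construction: one CT(y) trial per (document, term), max per term."""
    values = {}
    for doc_terms in documents:
        for term in set(doc_terms):
            values[term] = max(values.get(term, 0), coin_toss(y, rng))
    return top_k(values, k)

def merge(sa, sb, k):
    """Aggregate two synopses with per-term max(), then keep the top k."""
    merged = dict(sa)
    for term, value in sb.items():
        merged[term] = max(merged.get(term, 0), value)
    return top_k(merged, k)

sa = {"cat": 5, "dog": 2}
sb = {"cat": 3, "cow": 4}
print(merge(sa, sb, k=2) == {"cat": 5, "cow": 4})  # True
print(merge(sa, sa, k=2) == sa)                    # True
```

    Because max() is idempotent, receiving the same synopsis again over another overlay link does not inflate a term's value, which is the property the gossip step relies on. A term that occurs in many documents gets many trials, so its maximum tends to be higher.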

  • SAP-MHI: Indexing Example
    Query Q = t1 t2 t3 t4 t5, where h(t1)
  • Simulations

  • SAP-MHI vs. MHI
    - SAP-MHI improves load balance over MHI with increasing synopsis size K, for Skew queries.

  • SAP-MHI vs. MHI
    - Bloom filters are used in query resolution.
    [Chart: bandwidth saving (%) vs. synopsis size K (100 to 3000), for Skew, Uniform, and InverSkew queries.]

  • SAP-MHI vs. MHI
    - Term Dialogue is used in query resolution.
    [Chart: bandwidth saving (%) vs. synopsis size K (100 to 3000), for Skew, Uniform, and InverSkew queries.]

  • SAP-MHI vs. MHI
    - This shows why SAP-MHI saves bandwidth over MHI!
    [Chart: % of queries filtered vs. synopsis size K (100 to 3000), for Skew, Uniform, and InverSkew queries.]

  • Summary
    - Focus on a simple keyword query model
    - Bandwidth is a top priority
    - Query indexing impacts bandwidth cost
      - Goal: sift out as many irrelevant queries as possible!
    - MHI and SAP-MHI
      - SAP-MHI is a more viable solution
      - Load is more balanced, more bandwidth saving!
      - Sampling cost is controlled
        - # of popular terms is relatively low
        - Memberships of popular terms do not change rapidly
    - Document announcement & adaptive query resolution further cut down bandwidth consumption (not covered in this talk)

  • Thank You!

    ICPP'08