Massive Data Algorithmics - SHARIF UNIVERSITY...


Page 1: Massive Data Algorithmics - SHARIF UNIVERSITY …ce.sharif.edu/courses/94-95/2/ce686-1/resources/root/...2d range searchingin O(N B log B N log B log B N) space - O(log BN+T=B) I/O

Massive Data Algorithmics

Lecture 8: Range Searching


Range Searching

We have now discussed structures for special cases of two-dimensional range searching:

- Space: $O(N/B)$

- Query: $O(\log_B N + T/B)$

- Updates: $O(\log_B N)$

Cannot be obtained for general (4-sided) 2d range searching:

- $O(\log_B^c N)$ query requires $\Omega\left(\frac{N}{B} \frac{\log_B N}{\log_B \log_B N}\right)$ space

- $O(N/B)$ space requires $\Omega(\sqrt{N/B})$ query


External Range Tree

Base tree: Weight-balanced tree with branching parameter $\Theta(\log_B N)$ and leaf parameter $B$ on x-coordinates ⇒ $O(\log_{\log_B N} N) = O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ height

Points below each node stored in 4 linear space secondary structures:

- Right priority search tree

- Left priority search tree

- B-tree on y-coordinates

- Interval (priority search) tree

⇓ $O\left(\frac{N}{B} \frac{\log_B N}{\log_B \log_B N}\right)$ space
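As a quick numeric check of the height bound for the base tree, the sketch below evaluates both $\log_{\log_B N} N$ and $\log_B N / \log_B \log_B N$. The values of B and N are illustrative assumptions, not taken from the lecture; the same factor multiplies the linear space used per level.

```python
import math

def range_tree_height_terms(n, b):
    """Return (log_{log_B N} N, log_B N / log_B log_B N) for the given N and B."""
    log_b_n = math.log(n, b)                 # log_B N, the branching parameter
    height = math.log(n, log_b_n)            # log_{log_B N} N
    ratio = log_b_n / math.log(log_b_n, b)   # log_B N / log_B log_B N
    return height, ratio

# Illustrative values (assumed): B = 1024 points per block, N = 2**40 points.
height, ratio = range_tree_height_terms(2**40, 1024)
print(height, ratio)
```

Both expressions agree up to floating-point error, which is just the change-of-base rule for logarithms.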


External Range Tree

Secondary interval tree:


- Connect the points in each slab in y-order

- Project the obtained segments onto the y-axis

- Store the resulting intervals in an interval tree

* Each interval is augmented with a pointer to the corresponding points in the y-coordinate B-tree in the corresponding child node
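A minimal in-memory sketch of this construction for a single slab is given below. The function names are hypothetical, the linear scan stands in for the interval tree, the returned index stands in for the pointer into the child's y-coordinate B-tree, and the sentinel interval starting at minus infinity is an added assumption so that queries below the lowest point are also answered.

```python
import math

def slab_intervals(slab_points):
    """Connect the points of one slab in y-order and project onto the y-axis.

    Returns (y_low, y_high, index) triples, where index is the position of the
    upper endpoint in the slab's y-sorted order (stand-in for a B-tree pointer).
    """
    ys = sorted(y for _, y in slab_points)
    lows = [-math.inf] + ys[:-1]             # sentinel below the lowest point (assumption)
    return [(low, high, i) for i, (low, high) in enumerate(zip(lows, ys))]

def first_point_above(intervals, q3):
    """Stabbing query with q3: index of the first point with y >= q3, or None."""
    for y_low, y_high, index in intervals:   # linear scan stands in for the interval tree
        if y_low < q3 <= y_high:
            return index
    return None                              # q3 lies above every point in the slab
```

Since the intervals of one slab are disjoint, a stabbing query returns at most one interval per slab, which is what keeps the output of the interval-tree query small.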


External Range Tree

Query $(q_1, q_2, q_3, q_4)$ is answered in the topmost node where $q_1$ and $q_2$ lie in different slabs $v_1$ and $v_2$

Points in $v_1$: found with a 3-sided query using the right priority search tree

Points in $v_2$: found with a 3-sided query using the left priority search tree

Points in slabs between $v_1$ and $v_2$:

- Answer stabbing query with $q_3$ using the interval tree ⇒ first point above $q_3$ in each of the $O(\log_B N)$ slabs

- Find points using the y-coordinate B-tree in each of the $O(\log_B N)$ slabs
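The decomposition at a single node can be sketched in memory as follows. This is only an illustration under stated assumptions: the query is taken to mean $q_1 \le x \le q_2$ and $q_3 \le y \le q_4$, the helper names are hypothetical, list filters stand in for the priority search trees, and binary search on a y-sorted list stands in for the interval tree and B-tree.

```python
import bisect

def make_slabs(points, boundaries):
    """Distribute points into slabs; slab i holds x in (boundaries[i-1], boundaries[i]]."""
    slabs = [[] for _ in range(len(boundaries) + 1)]
    for p in sorted(points, key=lambda p: p[1]):       # keep each slab in y-order
        slabs[bisect.bisect_left(boundaries, p[0])].append(p)
    return slabs

def four_sided_query(slabs, boundaries, q1, q2, q3, q4):
    """Report points with q1 <= x <= q2 and q3 <= y <= q4 at one node."""
    v1 = bisect.bisect_left(boundaries, q1)            # slab containing q1
    v2 = bisect.bisect_left(boundaries, q2)            # slab containing q2
    if v1 == v2:
        # In the real structure the search would continue in the child node.
        return [p for p in slabs[v1] if q1 <= p[0] <= q2 and q3 <= p[1] <= q4]
    out = []
    # 3-sided query in v1 (stand-in for the right priority search tree).
    out += [p for p in slabs[v1] if p[0] >= q1 and q3 <= p[1] <= q4]
    # 3-sided query in v2 (stand-in for the left priority search tree).
    out += [p for p in slabs[v2] if p[0] <= q2 and q3 <= p[1] <= q4]
    for slab in slabs[v1 + 1:v2]:                      # slabs fully spanned in x
        ys = [p[1] for p in slab]
        start = bisect.bisect_left(ys, q3)             # first point at or above q3
        end = bisect.bisect_right(ys, q4)              # scan upward until y exceeds q4
        out += slab[start:end]
    return out
```

In the spanned slabs no x-test is needed, so once the first point above $q_3$ is known, every point read from the y-ordered list is part of the answer.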


External Range Tree

Query analysis:

- $O(\log_B N)$ I/Os to find relevant node

- $O(\log_B N + T/B)$ I/Os to answer two 3-sided queries

- $O\left(\log_B N + \frac{\log_B N}{B}\right) = O(\log_B N)$ I/Os to query interval tree

- $O(\log_B N + T/B)$ I/Os to traverse $O(\log_B N)$ B-trees

⇒ $O(\log_B N + T/B)$ I/Os


External Range Tree

Insert:

- Insert x-coordinate in weight-balanced B-tree

* Split of $v$ can be performed in $O(w(v) \log_B w(v))$ I/Os

⇒ $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os

- Update secondary structures in all $O\left(\frac{\log_B N}{\log_B \log_B N}\right)$ nodes on one root-leaf path

* Update priority search trees

* Update interval tree

* Update B-tree

⇒ $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/Os

Delete:

- Similar and using global rebuilding


Summary: External Range Tree

2d range searching in $O\left(\frac{N}{B} \frac{\log_B N}{\log_B \log_B N}\right)$ space

- $O(\log_B N + T/B)$ I/O query

- $O\left(\frac{\log_B^2 N}{\log_B \log_B N}\right)$ I/O update

Optimal among $O(\log_B N + T/B)$ query structures


kdB-tree

kd-tree

- Recursive subdivision of the point set into two halves using a vertical/horizontal line

- Horizontal line on even levels, vertical line on odd levels

- One point in each leaf

⇒ Linear space
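The subdivision can be sketched as a small in-memory builder. This is an illustrative sketch, not the lecture's implementation: it assumes distinct coordinates, treats the root as level 0 (so the root splits with a horizontal line, i.e. on y), and re-sorts at every level for brevity rather than efficiency. The dictionary layout and function names are hypothetical.

```python
def build_kd(points, depth=0):
    """Build a kd-tree as nested dicts; each leaf holds exactly one point."""
    if len(points) == 1:
        return {"point": points[0]}
    axis = 1 if depth % 2 == 0 else 0        # even level: horizontal line (split on y)
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "axis": axis,
        "split": pts[mid - 1][axis],         # largest coordinate in the low half
        "low": build_kd(pts[:mid], depth + 1),
        "high": build_kd(pts[mid:], depth + 1),
    }

def count_leaves(node):
    """Number of leaves, which equals the number of stored points."""
    if "point" in node:
        return 1
    return count_leaves(node["low"]) + count_leaves(node["high"])

def height(node):
    """Number of edges on the longest root-to-leaf path."""
    if "point" in node:
        return 0
    return 1 + max(height(node["low"]), height(node["high"]))
```

Because every split halves the point set, the tree has one leaf per point and logarithmic height, which is where the linear space bound comes from.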


kdB-tree

Query

- Recursively visit nodes corresponding to regions intersecting the query

- Report all points in subtrees/nodes whose regions are completely contained in the query

Query analysis

- Horizontal line intersects $Q(N) = 2 + 2Q(N/4) = O(\sqrt{N})$ regions

- Query covers $T$ regions

⇒ $O(\sqrt{N} + T)$ I/Os
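The query and the $\sqrt{N}$ behaviour of a line can be tried out with the following self-contained sketch. It makes several assumptions: distinct coordinates, a root that splits on y, and a simplified query that always recurses down to the leaves instead of reporting fully contained subtrees wholesale. All names are hypothetical.

```python
def build_kd(points, depth=0):
    """kd-tree with one point per leaf; even levels split on y, odd levels on x."""
    if len(points) == 1:
        return {"point": points[0]}
    axis = 1 if depth % 2 == 0 else 0
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid - 1][axis],
            "low": build_kd(pts[:mid], depth + 1),
            "high": build_kd(pts[mid:], depth + 1)}

def range_query(node, lo, hi, stats):
    """Report points p with lo[a] <= p[a] <= hi[a] on both axes; count visited nodes."""
    stats["visited"] += 1
    if "point" in node:
        p = node["point"]
        return [p] if all(lo[a] <= p[a] <= hi[a] for a in (0, 1)) else []
    a, s = node["axis"], node["split"]
    out = []
    if lo[a] <= s:                           # query region reaches the low side
        out += range_query(node["low"], lo, hi, stats)
    if hi[a] > s:                            # query region reaches the high side
        out += range_query(node["high"], lo, hi, stats)
    return out

# Illustrative data (assumed): 1024 points with distinct x and distinct y.
points = [(i, (i * 37) % 1024) for i in range(1024)]
tree = build_kd(points)
line_stats = {"visited": 0}
range_query(tree, (-1, 500.5), (2000, 500.5), line_stats)   # a horizontal line
print(line_stats["visited"])
```

A horizontal line descends into only one child at every y-split and into both children at every x-split, which is exactly the recurrence $Q(N) = 2 + 2Q(N/4)$.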


kdB-tree

kdB-tree:

- Stop subdivision when leaf contains between $B/2$ and $B$ points

- BFS-blocking of internal nodes

Query as before

- Analysis as before but each region now contains $\Theta(B)$ points ⇒ $O(\sqrt{N/B} + T/B)$ I/Os
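A minimal sketch of the leaf rule, assuming distinct coordinates and an in-memory representation where each leaf list plays the role of one disk block. BFS-blocking of the internal nodes is not modelled, and the names and the `blocks_read` counter are hypothetical.

```python
def build_kdb(points, b, depth=0):
    """Stop subdividing once a node fits in one block of at most b points."""
    if len(points) <= b:
        return {"block": list(points)}
    axis = 1 if depth % 2 == 0 else 0        # even level: split on y
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"axis": axis, "split": pts[mid - 1][axis],
            "low": build_kdb(pts[:mid], b, depth + 1),
            "high": build_kdb(pts[mid:], b, depth + 1)}

def leaf_sizes(node):
    """Sizes of all leaf blocks, left to right."""
    if "block" in node:
        return [len(node["block"])]
    return leaf_sizes(node["low"]) + leaf_sizes(node["high"])

def query_kdb(node, lo, hi, stats):
    """Range query that counts how many leaf blocks are read."""
    if "block" in node:
        stats["blocks_read"] += 1
        return [p for p in node["block"]
                if all(lo[a] <= p[a] <= hi[a] for a in (0, 1))]
    a, s = node["axis"], node["split"]
    out = []
    if lo[a] <= s:
        out += query_kdb(node["low"], lo, hi, stats)
    if hi[a] > s:
        out += query_kdb(node["high"], lo, hi, stats)
    return out
```

A node is only split when it holds more than b points, so both halves hold at least about b/2 points, which gives the stated leaf occupancy.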


Construction of kdB-tree

Simple $O\left(\frac{N}{B} \log_2 \frac{N}{B}\right)$ algorithm

- Find median of y-coordinates (construct root)

- Distribute points based on median

- Recursively build subtrees

- Construct BFS-blocking top-down

Idea in improved $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ algorithm

- Construct $\log_2 \sqrt{M/B}$ levels at a time using $O(N/B)$ I/Os


Construction of kdB-tree

Sort $N$ points by x- and by y-coordinates using $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os

Building $\log_2 \sqrt{M/B}$ levels ($\sqrt{M/B}$ nodes) in $O(N/B)$ I/Os:

1. Construct a $\sqrt{M/B} \times \sqrt{M/B}$ grid with $O(N/\sqrt{M/B})$ points in each slab

2. Count the number of points in each grid cell and store the counts in memory

3. Find slab $s$ with median x-coordinate

4. Scan slab $s$ to find median x-coordinate and construct node

5. Split slab containing median x-coordinate and update counts

6. Recurse on each side of median x-coordinate using grid (step 3)

⇒ Grid grows to $M/B + \sqrt{M/B} \cdot \sqrt{M/B} = \Theta(M/B)$ during algorithm

⇒ Each node constructed in $O\left(\frac{N}{\sqrt{M/B} \cdot B}\right)$ I/Os
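The idea behind steps 2 to 4 is that the in-memory cell counts locate the slab containing a median, so only that one slab has to be scanned. The sketch below illustrates this for the median y-coordinate of the points lying in a range of vertical slabs, as would be needed one level below an x-split. It is an in-memory illustration with hypothetical names; it assumes distinct coordinates, a non-empty column range, and a grid parameter `k` playing the role of $\sqrt{M/B}$, and it omits slab splitting and count updates (step 5).

```python
import bisect

def build_grid(points, k):
    """Build k vertical and k horizontal slabs of nearly equal size plus cell counts."""
    n = len(points)
    xs = sorted(x for x, _ in points)
    by_y = sorted(points, key=lambda p: p[1])
    x_bounds = [xs[(i * n) // k] for i in range(1, k)]        # first x of slabs 1..k-1
    y_bounds = [by_y[(j * n) // k][1] for j in range(1, k)]   # first y of slabs 1..k-1
    rows = [[] for _ in range(k)]            # horizontal slabs, each in y-order
    counts = [[0] * k for _ in range(k)]     # counts[i][j]: vertical slab i, horizontal slab j
    for p in by_y:
        i = bisect.bisect_right(x_bounds, p[0])
        j = bisect.bisect_right(y_bounds, p[1])
        rows[j].append(p)
        counts[i][j] += 1
    return x_bounds, rows, counts

def median_y(x_bounds, rows, counts, col_lo, col_hi):
    """Lower median y of the points in vertical slabs col_lo..col_hi-1."""
    k = len(rows)
    row_totals = [sum(counts[i][j] for i in range(col_lo, col_hi)) for j in range(k)]
    target = (sum(row_totals) - 1) // 2      # 0-based rank of the lower median
    seen = 0
    for j in range(k):                       # uses only the in-memory counts
        if seen + row_totals[j] > target:
            break
        seen += row_totals[j]
    # Scan only horizontal slab j, keeping the points inside the column range.
    in_range = [p for p in rows[j]
                if col_lo <= bisect.bisect_right(x_bounds, p[0]) < col_hi]
    return in_range[target - seen][1]
```

In external memory each horizontal slab would be a contiguous run of the y-sorted file, so the scan touches only that slab's blocks.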


kdB-tree

kdB-tree:

- Linear space

- Query in $O(\sqrt{N/B} + T/B)$ I/Os

- Construction in $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os

- Point search in $O(\log_B N)$ I/Os

Dynamic:

- Deletions relatively easily in $O(\log_B^2 N)$ I/Os (partial rebuilding)


kdB-tree Insertion using Logarithmic Method

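Partition point set $S$ into subsets $S_0, \dots, S_{\log N}$, with $|S_i| = 2^i$ or $|S_i| = 0$

Build kdB-tree $D_i$ on $S_i$

Query: Query each $D_i$ ⇒ $\sum_{i=0}^{\log N} O\left(\sqrt{2^i/B} + T_i/B\right) = O\left(\sqrt{N/B} + T/B\right)$

Insert: Find first empty $D_i$ and construct $D_i$ out of $1 + \sum_{j=0}^{i-1} 2^j = 2^i$ elements in $S_0, S_1, \dots, S_{i-1}$

- $O\left(\frac{2^i}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os ⇒ $O\left(\frac{1}{B} \log_{M/B} \frac{N}{B}\right)$ per moved point

- Point moved $O(\log N)$ times

⇒ $O\left(\frac{1}{B} \log_{M/B} \frac{N}{B} \cdot \log N\right) = O(\log_B^2 N)$ I/Os amortized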
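The logarithmic method behaves like a binary counter, which the following sketch makes concrete. The static structure here is just a sorted tuple standing in for a kdB-tree; `build_static`, `query_static`, and the class name are hypothetical helpers introduced for illustration only.

```python
def build_static(points):
    """Stand-in for building a kdB-tree D_i on the given points (assumed helper)."""
    return tuple(sorted(points))

def query_static(structure, x1, x2, y1, y2):
    """Stand-in for a range query on one static structure."""
    return [p for p in structure if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]

class LogarithmicMethod:
    def __init__(self):
        self.levels = []                     # levels[i] is None or holds exactly 2**i points

    def insert(self, p):
        carry = [p]
        i = 0
        # Collect S_0, ..., S_{i-1} until the first empty level i is found.
        while i < len(self.levels) and self.levels[i] is not None:
            carry.extend(self.levels[i])
            self.levels[i] = None
            i += 1
        if i == len(self.levels):
            self.levels.append(None)
        self.levels[i] = build_static(carry)  # 1 + (2**i - 1) = 2**i points

    def query(self, x1, x2, y1, y2):
        out = []
        for d in self.levels:                 # query every non-empty D_i
            if d is not None:
                out.extend(query_static(d, x1, x2, y1, y2))
        return out

    def sizes(self):
        return [0 if d is None else len(d) for d in self.levels]
```

After n insertions the non-empty levels correspond to the 1-bits in the binary representation of n, and a point only ever moves to a higher level, so it is rebuilt at most about $\log N$ times.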


kdB-tree Insertion and Deletion

Insert: Use logarithmic method ignoring deletes

Delete: Simply delete point $p$ from relevant $D_i$

- $i$ can be calculated based on # insertions since $p$ was inserted

- # insertions calculated by storing insertion number of each point in separate B-tree

⇒ $O(\log_B N)$ extra update cost

To maintain $O(\log N)$ structures $D_i$

- Perform global rebuild after every $\Theta(N)$ updates

⇒ $O\left(\frac{1}{B} \log_{M/B} \frac{N}{B}\right) = O(\log_B N)$ extra update cost
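One possible way to compute $i$ from insertion numbers is sketched below. The formula is not from the lecture; it is derived here under the assumption that insertions follow the plain binary-counter scheme with insertion numbers starting at 1 and no global rebuild since the point was inserted. The simulator exists only to check the formula against that scheme.

```python
def level_of(t, n):
    """Index i of the structure D_i holding the t-th inserted point after n insertions.

    Assumes 1 <= t <= n and a plain binary-counter logarithmic method.
    """
    return ((t - 1) ^ n).bit_length() - 1    # highest bit where t - 1 and n differ

def simulate_levels(n):
    """Replay n insertions; return {insertion number: level index}."""
    levels = []
    for t in range(1, n + 1):
        carry = [t]
        i = 0
        while i < len(levels) and levels[i]:
            carry += levels[i]
            levels[i] = []
            i += 1
        if i == len(levels):
            levels.append([])
        levels[i] = carry
    return {t: i for i, members in enumerate(levels) for t in members}
```

After a global rebuild the insertion numbers would have to be reassigned for this formula to stay valid.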


Summary: kdB-tree

2d range searching in O(N/B) space

- Query in $O(\sqrt{N/B} + T/B)$ I/Os

- Construction in $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$ I/Os

- Updates in $O(\log_B^2 N)$ I/Os

Optimal query among linear space structures


Summary/Conclusion: Tools and Techniques

Tools

- B-trees

- Persistent B-trees

- Buffer trees

- Logarithmic method

- Weight-balanced B-trees

- Global rebuilding

Techniques:

- Bootstrapping

- Filtering
