Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree...

36
Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n: E.G.M. Petrakis B-trees 1 MST of order n: Each node has n or fewer sub-trees S1 S2…. Sm , m n Each node has n-1 or fewer keys K1 Κ2 …Κm-1 : m-1 keys in ascending order K(Si) Κi K(Si+1) , K(Sm-1) < K(Sm)

Transcript of Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree...

Page 1: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Multiway Search Tree (MST)

� Generalization of BSTs� Suitable for disk� MST of order n:

E.G.M. Petrakis B-trees 1

� MST of order n: � Each node has n or fewer sub-trees� S1 S2…. Sm , m ≤ n

� Each node has n-1 or fewer keys

� K1 Κ2 …Κm-1 : m-1 keys in ascending order K(Si) ≤ Κi ≤ K(Si+1) , K(Sm-1) < K(Sm)

Page 2: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n=4

n=3

E.G.M. Petrakis B-trees 2

Page 3: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

MSTs for Disk

� Nodes correspond to disk pages

� Pros:� tree height is low for large n

E.G.M. Petrakis B-trees 3

� tree height is low for large n

� fewer disk accesses

� Cons:� low space utilization if non-full

� MSTs are non-balanced in general!

Page 4: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

4000 keys, n=5

� At least 4000/(5-1) nodes (pages)� 1st level (root): 1 node, 4 keys, 5 sub-trees +� 2nd level: 5 nodes, 20 keys, 25 sub-trees +� 3rd level: 25 nodes, 100 keys, 125 sub-trees +

E.G.M. Petrakis B-trees 4

� 3rd level: 25 nodes, 100 keys, 125 sub-trees +� 4th level: 125 nodes, 500 keys,525 sub-trees +� 5th level: 525 nodes, 2100 keys, 2625 sub-

tress +� 6th level: 2625 nodes, 10500 keys, …. � tree height = 6 (including root)

� If n = 11 at least 400 nodes and tree height = 3

Page 5: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Operations on MSTs

� Search: returns pointer to node containing the key and position of key in the node

E.G.M. Petrakis B-trees 5

key in the node

� Insert new key if not already in the tree

� Delete existing key

Page 6: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Important Issues

� Keep MST balanced after insertions or deletions

� Balanced MSTs: B-trees, B+-trees

E.G.M. Petrakis B-trees 6

� Balanced MSTs: B-trees, B+-trees

� Reduce number of disk accesses

� Data storage: two alternatives 1. inside nodes: less sub-trees, nodes

2. pointers from the nodes to data pages

Page 7: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Search Algorithm

MST *search (MST *tree, int n, keytype key, int *position) {

MST *p = tree;

while (p != null) { // search a node

i = nodesearch(p, key);

E.G.M. Petrakis B-trees 7

i = nodesearch(p, key);

if (i < numtrees(p) – 1 && key == key(p, i))

{ *position = i; return p; }

p = son(p, i);

}

position = -1;

return(-1);

}

Page 8: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

E.G.M. Petrakis B-trees 8

Page 9: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Insertions

� Search & insert key it if not there a) the tree grows at the leafs => the tree

becomes imbalanced => not equal number of disk accesses for every key

E.G.M. Petrakis B-trees 9

becomes imbalanced => not equal number of disk accesses for every key

b) the shape of the tree depends on the insertion sequence

c) low storage utilization, many non-full nodes

Page 10: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

70 75 82

n = 4

max 3 keys/node

insert: 70 75 82 77 71 73 84 86 87 85

E.G.M. Petrakis B-trees 10

70 75 82

84 86 87

85

7771 73

Page 11: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

MST Deletions

� Find and delete a key from a node� free the node if empty

� If the key has right or/and left sub-trees => find its successor or

E.G.M. Petrakis B-trees 11

trees => find its successor orpredecessor � min value of right subtree or max value of

left subtree

� put this in the position of the key and delete the successor or predecessor

� The tree becomes imbalanced

Page 12: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

60 180 300

30 50 150 170 220 280

n=4

153 162 187 202173 178

E.G.M. Petrakis B-trees 12

delete 150, 180

60 187 300

30 50 153 170 220 280

162 202173 178

Page 13: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Definition of n-Order B-Tree

� Every path from the root to a leaf has the same length h >= 0

� Each node except the root and the leafs has at least sub-trees and keysn

1n

E.G.M. Petrakis B-trees 13

has at least sub-trees and keys

� Each node has at most n sub-trees and n-1 keys

� The root has at least 1 node and 2 sub-trees or it is a leaf

2n

12

n−

Page 14: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

22

9 17 30 40

4 8 12 1318 12 13 12 13

n=3

13 25 70n=5

E.G.M. Petrakis B-trees 14

4 8 9 12 17 18 22 27 30 32

4 8 9 12 13 17 22 25 27 30 32

18 50n=7

Page 15: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B-Tree Insertions

� Find leaf to insert new node� If not full, insert key in proper position else

� Create a new node and split contents of old node� left and right node

E.G.M. Petrakis B-trees 15

old node� left and right node� the n/2 lower keys go into the left node� the n/2 larger keys go into the right node� the separator key goes up to the father node

� If n is even: one more key in left (left bias) or right node (right bias)

Page 16: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n=5

E.G.M. Petrakis B-trees 16

Page 17: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n=4

E.G.M. Petrakis B-trees 17

Page 18: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B-Tree Insertions (cont.)

� If the father node is full: split it and proceed the same way, the new father may split again if it is full …

� The splits may reach the root which

E.G.M. Petrakis B-trees 18

� The splits may reach the root which may also split creating a new root

� The changes are propagated from the leafs to the root

� The tree remains balanced

Page 19: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n=5

E.G.M. Petrakis B-trees 19

Page 20: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Advantages

� The tree remains balanced!!

� Good storage utilization: about 50%

� Low tree height, equal for all leafs

E.G.M. Petrakis B-trees 20

� Low tree height, equal for all leafs => same number of disk accesses for all leafs

� B-trees: MSTs with additional properties� special insertion & deletion algorithms

Page 21: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B-Tree Deletions

Two cases:a) Merging: If less than n/2 keys in left and right

node then, sum of keys + separator key in father node < n� the two nodes are merged,

E.G.M. Petrakis B-trees 21

� the two nodes are merged, � one node is freed

b) Borrowing: if after deletion the node has less than n/2 keys and its left (or right) node has more than n/2 keys� find successor of separator key � put separator in the place of the deleted key � put the successor as separator key

Page 22: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n = 4

E.G.M. Petrakis B-trees 22

n = 5

Page 23: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

n = 5

E.G.M. Petrakis B-trees 23

n = 5

Page 24: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Performance

discaccesses

n N

450

4 65 73

3

2 43

100 2 32 4

10

3

103 104 105 106 107

E.G.M. Petrakis B-trees 24

[logn (N+1) ] ≤ h ≤[log(n+1)/2 ((N+1)/2) ] +1

� Search: the cost increases with the lognN

� Tree height ~ lognN

3

3

100 2 32 4

2 432150

Page 25: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Performance (cont.)

� Insertions/Deletions: Cost proportional to 2h� the changes propagate towards the

root

E.G.M. Petrakis B-trees 25

root

� Observation: The cost drops with n� larger size of track (page) but, in fact

the cost increases with the amount of information which is transferred from the disk into the main memory

Page 26: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Problem

� B-trees are good for randomaccesses � E.g., search(30)

E.G.M. Petrakis B-trees 26

� E.g., search(30)

� But not for range queries: require multiple tree traversals� search(20-30)

Page 27: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B+-Trees

� At most n sub-trees and n-1 keys

� At least sub-trees and keys

� Root: at least 2 sub-trees and 1 key

12

n−

2n

E.G.M. Petrakis B-trees 27

� Root: at least 2 sub-trees and 1 key

� The keys can be repeated in non-leaf nodes

� Only the leafs point to data pages

� The leafs are linked together with pointers

Page 28: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

index

E.G.M. Petrakis B-trees 28

data pages

Page 29: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Performance

� Better than B-trees for range searching� B-trees are better for random accesses� The search must reach a leaf before it is

confirmed

E.G.M. Petrakis B-trees 29

confirmed� internal keys may not correspond to actual

record information (can be separator keys only)

� insertions: leave middle key in father node� deletions: do not always delete key from

internal node (if it is a separator key)

Page 30: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Insertions

� Find appropriate leaf node to insert key� If it is full:� allocate new node� split its contents and �

E.G.M. Petrakis B-trees 30

� split its contents and � insert separator key in father node

� If the father is full� allocate new node and split the same way� continue upwards if necessary � if the root is split create new root with two

sub-trees

Page 31: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

Deletions

� Find and delete key from the leaf� If the leaf has < n/2 keys

a) borrowing if its neighbor leaf has more than n/2 keys update father node (the separator

E.G.M. Petrakis B-trees 31

n/2 keys update father node (the separator key may change) or

b) merging with neighbor if both have < n keys � causes deletion of separator in father node� update father node

� Continue upwards if father node is not the root and has less than n/2 keys

Page 32: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

46 69

19 31 46 61 69 82

1319

2831

4146

505361

6669

757982

8488

index

E.G.M. Petrakis B-trees 32

19 31 46 5361

69 7982

88

data pages

data pages: min 2, max 3 records

index pages: min 2, max 3 pointers

B+ order: n = 4, n/2 =2

data pages don’t store pointers as non-leaf nodes do but they are kinked together

Page 33: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

6953 61

5053 55

61

insert 55 with page splitting

delete 84 with borrowing:

E.G.M. Petrakis B-trees 33

delete 84 with borrowing:

79

7579 82

88

Page 34: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

delete 31 with merging

19 46

284146

freepage

E.G.M. Petrakis B-trees 34

46 page

Page 35: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B-trees and Variants

� Many variants: � B-trees: Bayer and Mc Greight, 1971� B+-trees, B*-trees the most successful

variants

� The most successful data organization

E.G.M. Petrakis B-trees 35

� The most successful data organization for secondary storage (based on primary key)� very good performance, comparable to

hashing� fully dynamic behavior� good space utilization (69% on the average)

Page 36: Multiway Search Tree (MST) - TUCpetrakis/courses/datastructures/btrees.pdf · Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n : E.G.M. Petrakis

B+/B-Trees Comparison

� B-trees: no key repetition, better for random accesses (do not always reach a leaf), data pages on any node

E.G.M. Petrakis B-trees 36

reach a leaf), data pages on any node

� B+-trees: key repetition, data page on leaf nodes only, better for range queries, easier implementation