Exercise Session 4 – Associative Data Structures

Exercise Session 4 – Associative Data Structures Computer Science II, D-ITET, ETH Zurich


Program Today

1. Feedback of last exercise
2. Repetition theory: AVL Condition, AVL Insert, Hashing
3. Programming Task


1. Feedback of last exercise


2. Repetition theory


Comparison of binary Trees

                 Search trees   Heaps (Min-/Max-Heap)   Balanced trees (AVL, red-black tree)
in C++:                         std::make_heap          std::map
Insertion        Θ(h(T))        Θ(log n)                Θ(log n)
Search           Θ(h(T))        Θ(n) (!!)               Θ(log n)
Deletion         Θ(h(T))        Search + Θ(log n)       Θ(log n)

[Figure: one example tree of each kind over the keys 1, 2, 3, 4, 5, 7, 9, 16]

Recall: Θ(log n) ≤ Θ(h(T)) ≤ Θ(n)


AVL Condition

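AVL Condition: for each node v of a tree, bal(v) ∈ {−1, 0, 1}, where bal(v) = h(Tr(v)) − h(Tl(v)).

[Figure: node v with left subtree Tl(v) of height h and right subtree Tr(v) of height h + 1; the tree rooted at v has height h + 2]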
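The AVL condition can be checked directly from its definition. The following is a minimal sketch, assuming a hypothetical `Node` type with `left` and `right` pointers (not part of the course material) and the convention that the empty tree has height 0. It recomputes heights at every node, so it is meant for illustration rather than efficiency.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical node type for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    explicit Node(int k) : key(k) {}
};

// Height of the subtree rooted at v; the empty tree has height 0.
int height(const Node* v) {
    if (v == nullptr) return 0;
    return 1 + std::max(height(v->left), height(v->right));
}

// bal(v) = h(Tr(v)) - h(Tl(v))
int bal(const Node* v) {
    return height(v->right) - height(v->left);
}

// True if every node v of the tree satisfies bal(v) in {-1, 0, 1}.
bool isAVL(const Node* v) {
    if (v == nullptr) return true;
    return std::abs(bal(v)) <= 1 && isAVL(v->left) && isAVL(v->right);
}
```

A practical AVL implementation would store the balance (or the height) in each node instead of recomputing it.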


Balance at Insertion Point

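case 1: bal(p) = +1. The new node n becomes the left child of p; bal(p) changes from +1 to 0.

case 2: bal(p) = −1. The new node n becomes the right child of p; bal(p) changes from −1 to 0.

Finished in both cases because the subtree height did not change.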


Balance at Insertion Point

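case 3.1: bal(p) = 0, n inserted to the right; bal(p) changes from 0 to +1.

case 3.2: bal(p) = 0, n inserted to the left; bal(p) changes from 0 to −1.

Not finished in either case, because the subtree from p has grown. Call upin(p).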


upin(p) - invariant

When upin(p) is called it holds that
- the subtree from p has grown, and
- bal(p) ∈ {−1, +1}.


upin(p)

Assumption: p is left son of pp. (If p is a right son: symmetric cases with exchange of +1 and −1.)

case 1: bal(pp) = +1. bal(pp) changes from +1 to 0; done.

case 2: bal(pp) = 0. bal(pp) changes from 0 to −1; call upin(pp).

In both cases the AVL-Condition holds for the subtree from pp.


upin(p)

Assumption: p is left son of pp

case 3: bal(pp) = −1.

This case is problematic: adding n to the subtree from pp has violated the AVL-condition. Re-balance!
Two cases: bal(p) = −1, bal(p) = +1.


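Rotations

case 1.1: bal(p) = −1. (If p is a right son: bal(pp) = bal(p) = +1, left rotation.)

Before: pp = y with bal(pp) = −2 and left child p = x with bal(p) = −1. Subtrees: t1 (left of x, height h), t2 (right of x, height h − 1), t3 (right of y, height h − 1). The subtree from y has height h + 2.

=⇒ rotation right

After: x is the root of the subtree with left subtree t1 and right child y; y has the subtrees t2 and t3. bal(pp) = bal(p) = 0, and the subtree has height h + 1.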
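The right rotation of case 1.1 only rewires three pointers. The sketch below assumes a hypothetical `Node` type without parent pointers or stored balances; in a full implementation the balances of x and y (both 0 after the rotation) and the pointer from the parent of pp would also have to be updated.

```cpp
// Hypothetical node type for illustration only.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Rotates the subtree rooted at y to the right and returns the new root x.
// Precondition: y and y->left are not null.
Node* rotateRight(Node* y) {
    Node* x = y->left;   // x = p, the left child of pp = y
    y->left = x->right;  // t2 moves from x to y
    x->right = y;        // y becomes the right child of x
    return x;            // t1 and t3 stay where they are
}
```

The left rotation for the symmetric case is obtained by exchanging `left` and `right`.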


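Rotations

case 1.2: bal(p) = +1. (If p is a right son: bal(pp) = +1, bal(p) = −1, double rotation right-left.)

Before: pp = z with bal(pp) = −2 and left child p = x with bal(p) = +1; y is the right child of x with bal(y) = −1 or +1. Subtrees: t1 (left of x, height h − 1), t2 and t3 (left and right of y, heights h − 1 and h − 2 if bal(y) = −1, heights h − 2 and h − 1 if bal(y) = +1), t4 (right of z, height h − 1). The subtree from z has height h + 2.

=⇒ double rotation left-right

After: y is the root of the subtree with left child x and right child z; x has the subtrees t1 and t2, z has the subtrees t3 and t4. bal(y) = 0, bal(x) = 0 or −1, bal(z) = +1 or 0 (for bal(y) = −1 or +1 before the rotation, respectively). The subtree has height h + 1.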


Quiz

In the following AVL tree, insert key 12 and rebalance (as shown in class). What does the AVL tree look like after the operation?

            30
          /    \
        10      50
       /  \    /  \
      3    17 40   60
     /    /  \
    1   14    19


Solution

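Key 12 becomes the left child of 14. This gives bal(30) = −2 with bal(10) = +1, so a double rotation left-right is applied at 30:

            17
          /    \
        10      30
       /  \    /  \
      3    14 19   50
     /    /       /  \
    1   12      40    60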
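The quiz can be reproduced with a small program. The following is a sketch of a recursive AVL insertion that stores subtree heights in the nodes; this is a swapped-in variant and not the upin(p) formulation with parent pointers from the slides, although it performs the same rotations. The `Node` and `Tree` types and all function names are assumptions made for this example.

```cpp
#include <algorithm>
#include <memory>
#include <utility>

struct Node {
    int key;
    int h = 1;  // height of the subtree rooted at this node
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};
using Tree = std::unique_ptr<Node>;

int height(const Tree& t) { return t ? t->h : 0; }
int bal(const Tree& t) { return height(t->right) - height(t->left); }
void update(Tree& t) { t->h = 1 + std::max(height(t->left), height(t->right)); }

// Right rotation: the left child of y becomes the new subtree root.
void rotateRight(Tree& y) {
    Tree x = std::move(y->left);
    y->left = std::move(x->right);
    update(y);
    x->right = std::move(y);
    y = std::move(x);
    update(y);
}

// Left rotation: the right child of x becomes the new subtree root.
void rotateLeft(Tree& x) {
    Tree y = std::move(x->right);
    x->right = std::move(y->left);
    update(x);
    y->left = std::move(x);
    x = std::move(y);
    update(x);
}

// Inserts key and restores the AVL condition on the way back up.
void insert(Tree& t, int key) {
    if (!t) { t = std::make_unique<Node>(key); return; }
    if (key < t->key) insert(t->left, key);
    else if (key > t->key) insert(t->right, key);
    else return;  // key already present
    update(t);
    if (bal(t) == -2) {
        if (bal(t->left) == +1) rotateLeft(t->left);    // double rotation left-right
        rotateRight(t);
    } else if (bal(t) == +2) {
        if (bal(t->right) == -1) rotateRight(t->right);  // double rotation right-left
        rotateLeft(t);
    }
}
```

Inserting 30, 10, 50, 3, 17, 40, 60, 1, 14, 19 in this order builds the quiz tree without any rotation; inserting 12 afterwards triggers the double rotation and makes 17 the root.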


Hashing well-done

Useful Hashing...
- distributes the keys as uniformly as possible in the hash table.
- avoids probing over long areas of used entries (e.g. primary clustering).


Hashing Examples

Insert the keys 25, 4, 17, 45 into the hash table, using the function h(k) = k mod 7 and probing to the right, (h(k) + s(j, k)) mod 7:

- linear probing, s(j, k) = j.
- Double Hashing, s(j, k) = j · (1 + (k mod 5)).

index            0    1    2    3    4    5    6
linear probing   -    -    -    17   25   4    45
double hashing   -    -    4    17   25   45   -
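Both probing strategies from the example can be simulated with one insertion routine that takes the probing function s(j, k) as a parameter. This is a minimal sketch under the assumptions that keys are non-negative, that −1 marks a free slot, and that the table size plays the role of the modulus 7; the names `insertKey`, `linearProbe` and `doubleHash` are chosen for this example.

```cpp
#include <vector>

const int EMPTY = -1;  // marks a free slot; assumes all keys are non-negative

// Inserts key k with open addressing: the j-th probe goes to
// (h(k) + s(j, k)) mod m, where h(k) = k mod m and m is the table size.
// Returns the slot used, or -1 if no free slot was found within m probes.
int insertKey(std::vector<int>& table, int k, int (*s)(int, int)) {
    const int m = static_cast<int>(table.size());
    for (int j = 0; j < m; ++j) {
        const int pos = (k % m + s(j, k)) % m;
        if (table[pos] == EMPTY) {
            table[pos] = k;
            return pos;
        }
    }
    return -1;
}

// s(j, k) = j
int linearProbe(int j, int /*k*/) { return j; }

// s(j, k) = j * (1 + (k mod 5))
int doubleHash(int j, int k) { return j * (1 + k % 5); }
```

With a table of size 7, key 45 needs four probes under linear probing (slots 3, 4, 5, 6) but only three under double hashing (slots 3, 4, 5), and key 4 jumps directly from slot 4 to slot 2.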


3. Programming Task


Finding a Sub-Array

Given: two integer arrays A = (a_0, ..., a_{n−1}) and B = (b_0, ..., b_{k−1}).
Task: Find the position of B in A.

Naive: Loop through A and check whether the following k entries match B.
O(nk) comparison operations.

Solution using hashing: Calculate the hash h(B) and compare it to h((a_i, a_{i+1}, ..., a_{i+k−1})).
Avoid re-computing h((a_i, a_{i+1}, ..., a_{i+k−1})) for each i =⇒ O(n) expected.
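For reference, the naive approach can be written in a few lines. This sketch assumes the arrays are given as `std::vector<int>` and returns an index rather than an iterator; the function name `findNaive` is chosen for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Returns the first index i with (A[i], ..., A[i+k-1]) == B, or -1 if there is none.
// Performs up to O(nk) element comparisons.
int findNaive(const std::vector<int>& A, const std::vector<int>& B) {
    const std::size_t n = A.size();
    const std::size_t k = B.size();
    for (std::size_t i = 0; i + k <= n; ++i) {
        // compare the k entries starting at position i with B
        if (std::equal(B.begin(), B.end(), A.begin() + i)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```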


Sliding Window Hash

Possible hash function: sum of all elements.

Can be updated easily: subtract a_i and add a_{i+k}.
However: bad hash function (for example, every reordering of the window has the same sum).

Better:

$$H_{c,m}((a_i, \dots, a_{i+k-1})) = \left( \sum_{j=0}^{k-1} a_{i+j} \cdot c^{k-j-1} \right) \bmod m$$

c = 1021 (prime number)
m = 2^15 (int, no overflows at calculations)
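The polynomial hash of a single window can be evaluated with Horner's scheme, reducing modulo m after every step. The following is a sketch, assuming the array is a `std::vector<int>` and using the constants c = 1021 and m = 2^15 from above; the function name `windowHash` is chosen for this example. Since every intermediate value stays below m · c + m, which is about 3.3 · 10^7, 32-bit unsigned arithmetic does not overflow.

```cpp
#include <cstddef>
#include <vector>

const unsigned M = 32768;  // m = 2^15
const unsigned C = 1021;   // c, a prime number

// Computes H_{c,m} of the k elements starting at index i with Horner's scheme:
// (...((a_i * c + a_{i+1}) * c + a_{i+2}) * c + ...) + a_{i+k-1}, all modulo m.
unsigned windowHash(const std::vector<int>& a, std::size_t i, std::size_t k) {
    const int m = static_cast<int>(M);
    unsigned h = 0;
    for (std::size_t j = 0; j < k; ++j) {
        // map the element into [0, m) first so that negative values are handled
        const unsigned x = static_cast<unsigned>(((a[i + j] % m) + m) % m);
        h = (h * C + x) % M;
    }
    return h;
}
```

Unlike the plain sum, this hash depends on the order of the elements: the windows (1, 2, 3) and (3, 2, 1) receive different values.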


Computing with Modulo

(a + b) mod m = ((a mod m) + (b mod m)) mod m

(a − b) mod m = ((a mod m) − (b mod m) + m) mod m

(a · b) mod m = ((a mod m) · (b mod m)) mod m
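In C++ the `%` operator can return a negative result when the left operand is negative, which is why the subtraction rule adds m before reducing. The helpers below are a sketch of the three rules, assuming m > 0 and that m · m fits into `long long`; the function names are chosen for this example.

```cpp
// Reduces a into the range [0, m), also for negative a.
long long mod(long long a, long long m) { return ((a % m) + m) % m; }

// (a + b) mod m
long long addMod(long long a, long long b, long long m) {
    return (mod(a, m) + mod(b, m)) % m;
}

// (a - b) mod m; adding m keeps the intermediate value non-negative
long long subMod(long long a, long long b, long long m) {
    return (mod(a, m) - mod(b, m) + m) % m;
}

// (a * b) mod m
long long mulMod(long long a, long long b, long long m) {
    return (mod(a, m) * mod(b, m)) % m;
}
```

For example, `(3 - 5) % 7` evaluates to −2 in C++, whereas `subMod(3, 5, 7)` yields 5.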

Exercise: Compute

12746357 mod 11


Computing Modulo

Exercise: Compute

12746357 mod 11
= (7 + 5 · 10 + 3 · 10^2 + 6 · 10^3 + 4 · 10^4 + 7 · 10^5 + 2 · 10^6 + 1 · 10^7) mod 11
= (7 + 50 + 3 + 60 + 4 + 70 + 2 + 10) mod 11
= (7 + 6 + 3 + 5 + 4 + 4 + 2 + 10) mod 11
= 8 mod 11.

For the second equality we used the fact that 10^2 mod 11 = 1.
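The same result can be obtained mechanically by processing the digits from left to right and reducing after every step, so that no intermediate value exceeds 10 · m. This is a small sketch; the function name `decimalMod` is chosen for this example and the input is assumed to consist of decimal digits only.

```cpp
#include <string>

// Computes the value of a decimal digit string modulo m using Horner's scheme.
// After every step r is in [0, m), so r * 10 + digit stays below 10 * m.
int decimalMod(const std::string& digits, int m) {
    int r = 0;
    for (char d : digits) {
        r = (r * 10 + (d - '0')) % m;
    }
    return r;
}
```

Calling `decimalMod("12746357", 11)` gives 8, in agreement with the calculation above.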


Sliding Window Hash

template<typename It1, typename It2>
It1 findOccurrence(const It1 from, const It1 to,
                   const It2 begin, const It2 end)
{
    const unsigned k = end - begin;
    const unsigned M = 32768;
    const unsigned C = 1021;

    // your code here
    // ...


    // elements can be compared using std::equal:
    if (std::equal(window_left, window_right, begin, end))
        return current;

    // if no occurrence is found return end of array
    return to;
}


Sliding Window Hash

Make sure that
- the algorithm computes c^k only once,
- all computations are modulo m for all values in order not to get an overflow (recall the rules of modular arithmetic), and
- the values are always positive (e.g., by adding multiples of m).
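One possible way to put these points together is sketched below. It is not the reference solution: it assumes random-access iterators over values convertible to an integer, returns `from` for an empty pattern, and uses `window_left` where the template refers to `current`. The update step is H' = (H · c − a_i · c^k + a_{i+k}) mod m, with m added before the subtraction so that the value never becomes negative.

```cpp
#include <algorithm>

template <typename It1, typename It2>
It1 findOccurrence(const It1 from, const It1 to, const It2 begin, const It2 end) {
    const unsigned k = end - begin;
    const unsigned M = 32768;
    const unsigned C = 1021;
    if (k == 0) return from;                               // assumption: empty pattern matches at the start
    if (static_cast<unsigned>(to - from) < k) return to;   // pattern longer than the array

    // maps any integer value into [0, M)
    auto norm = [=](long long x) -> unsigned {
        return static_cast<unsigned>(((x % M) + M) % M);
    };

    unsigned ck = 1;                  // c^k mod M, computed only once
    unsigned hashB = 0, hashW = 0;    // hash of B and of the current window
    It1 window_left = from;
    It1 window_right = from;          // one past the last element of the window
    It2 b = begin;
    for (unsigned j = 0; j < k; ++j, ++window_right, ++b) {
        ck = (ck * C) % M;
        hashB = (hashB * C + norm(*b)) % M;
        hashW = (hashW * C + norm(*window_right)) % M;
    }

    while (true) {
        // compare elements only if the hashes agree
        if (hashW == hashB && std::equal(window_left, window_right, begin, end))
            return window_left;
        if (window_right == to) return to;  // no occurrence found

        // slide the window: drop *window_left, append *window_right
        const unsigned drop = (norm(*window_left) * ck) % M;
        hashW = ((hashW * C) % M + M - drop + norm(*window_right)) % M;
        ++window_left;
        ++window_right;
    }
}
```

All intermediate products stay below 2^30, so the 32-bit `unsigned` arithmetic does not overflow with these constants.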
