data structures and algorithms 2020 09 17 lecture 6femke/courses/ds/slides/6.pdf · 2020. 9....

data structures and algorithms 2020 09 17

lecture 6

http://www.cs.vu.nl/~tcs/ds

heaps and heapsort on n elements

height of a heap is in Θ(log n)

building a heap bottom-up in O(n)

analysis via picture (learn) or using summations (know result)

building a heap one-by-one in O(n log n)

maxheapify in O(log n) (also Θ(log n))

heapsort in O(n log n); also in Θ(n log n)?

yes, see worst-case input or see lower-bound

more: see smooth sort

https://en.wikipedia.org/wiki/Smoothsort
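
The heap operations recalled above can be sketched as follows. This is a minimal illustration, not the course's reference code: it assumes 0-based list indices (textbook pseudocode is often 1-based), and the function names are chosen for this example only.

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the subtree rooted at i is a max-heap (0-based)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Bottom-up construction: heapify every internal node, last one first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort the list a in place in ascending order."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final place
        max_heapify(a, 0, end)        # restore the heap on the prefix a[0:end]


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(data)
print(data)
```

Each call to max_heapify walks down at most one root-to-leaf path, which is where the O(log n) bound per call comes from.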

use of priority queue: Dijkstra’s algorithm

input: directed graph G = (V, E) with positive weights w and start vertex s in G

Algorithm Dijkstra(G, w, s):
    initialize(G, s)
    S := ∅
    Q := G.V
    while Q ≠ ∅ do
        u := extractMin(Q)
        S := S ∪ {u}
        for each v ∈ G.Adj[u] do
            relax(u, v, w)

the set S contains the vertices for which the weight of a shortest path has been found

if the priority queue is implemented with a heap then in O(|E| · log |V|)
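
A possible Python rendering of the algorithm, as a sketch under stated assumptions. The graph is assumed to be a dictionary mapping every vertex to a list of (neighbour, weight) pairs. Python's heapq module has no decrease-key operation, so this version pushes a new entry on every successful relax and skips stale entries when they are popped; that is a common substitute, not the queue used in the pseudocode.

```python
import heapq
import math


def dijkstra(adj, s):
    """adj maps each vertex to a list of (neighbour, weight) pairs, weights > 0."""
    dist = {v: math.inf for v in adj}   # initialize(G, s)
    dist[s] = 0
    done = set()                        # plays the role of the set S
    queue = [(0, s)]                    # heap of (distance, vertex) entries
    while queue:
        d, u = heapq.heappop(queue)     # extractMin(Q)
        if u in done:
            continue                    # stale entry: u was finished earlier
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:         # relax(u, v, w)
                dist[v] = d + w
                heapq.heappush(queue, (dist[v], v))
    return dist


# Small example graph, invented for illustration.
graph = {
    "s": [("a", 4), ("b", 1)],
    "a": [("c", 1)],
    "b": [("a", 2), ("c", 5)],
    "c": [],
}
print(dijkstra(graph, "s"))
```

With this variant the heap can hold up to |E| entries, so each heap operation costs O(log |E|), which is still O(log |V|) because |E| ≤ |V|².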

quicksort on n elements

worst-case in Θ(n²)

best-case in Θ(n log n)

balanced-case in O(n log n)

average case (assuming all n! possible inputs are equally likely) in O(n log n)

expected case (using randomization) in O(n log n)

more: replace small recursive calls by insertion sort and sorting networks

https://arxiv.org/abs/1505.01962
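
The expected case with randomization can be illustrated by a quicksort that picks its pivot uniformly at random. This is a sketch only: the Lomuto partition scheme and 0-based indices are assumptions of this example, not necessarily the variant used in the course.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] around a randomly chosen pivot; return its final index."""
    p = random.randint(lo, hi)          # random pivot index, bounds inclusive
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        q = partition(a, lo, hi)
        quicksort(a, lo, q - 1)
        quicksort(a, q + 1, hi)


data = [3, 8, 2, 5, 1, 4, 7, 6]
quicksort(data)
print(data)
```

Because the pivot is random, no fixed input forces the quadratic behaviour on every run; the Θ(n²) worst case remains possible but depends on the random choices.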

overview

lower bound on sorting

counting sort

radix sort

bucket sort

recall: bounds

asymptotic upper bound f ∈ O(g): f is eventually bounded from above by g up to a constant

asymptotic tight bound f ∈ Θ(g): f is eventually sandwiched by g up to two constants

asymptotic lower bound f ∈ Ω(g): f is eventually bounded from below by g up to a constant

(there are more bounds!)

upper and lower bound

f, g : N → R+

asymptotic upper bound: f(n) ∈ O(g(n)) if

∃ c > 0 (c ∈ R+) ∃ n0 > 0 (n0 ∈ N) ∀ n ∈ N : n ≥ n0 ⇒ f(n) ≤ c · g(n)

asymptotic lower bound: f(n) ∈ Ω(g(n)) if

∃ c > 0 (c ∈ R+) ∃ n0 > 0 (n0 ∈ N) ∀ n ∈ N : n ≥ n0 ⇒ f(n) ≥ c · g(n)
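
A small worked instance of both definitions may help. The function f(n) = 3n + 5 and the witnesses c and n0 are chosen purely for illustration; other choices work as well.

```latex
\begin{align*}
3n + 5 \in O(n):      &\quad c = 4,\ n_0 = 5: \quad n \ge 5 \;\Rightarrow\; 3n + 5 \le 3n + n = 4n \\
3n + 5 \in \Omega(n): &\quad c = 3,\ n_0 = 1: \quad n \ge 1 \;\Rightarrow\; 3n + 5 \ge 3n
\end{align*}
```

Since both bounds hold, 3n + 5 ∈ Θ(n).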

question

comparison-based sorting: the crucial tests are

a < b, a ≤ b, a = b, a ≥ b, a > b

which are all decidable in elementary time

so far we have seen only comparison-based sorting algorithms

is O(n log n) the best worst-case that we have?

lower bound on running time

we consider sorting algorithms based on comparisons k < k ′

examples: merge sort, heapsort, quicksort

non-examples: counting sort, bucket sort

Ω(n log n) comparisons are needed in the worst case

so merge sort, heapsort are asymptotically optimal

lower bound: proof

we consider decision trees

node contains a comparison a < b?

leaf corresponds to a permutation of {1, . . . , n}

every comparison-based sorting algorithm has for every number of inputs a decision tree

execution of that sorting algorithm corresponds to a path in the tree

every possible permutation (of total n!) must occur

every permutation must be reachable

decision tree: example (sorting three elements a, b, c)

a < b?
    yes: b < c?
        yes: a b c
        no:  a < c?
            yes: a c b
            no:  c a b
    no:  b < c?
        yes: a < c?
            yes: b a c
            no:  b c a
        no:  c b a

we analyze the height of a decision tree for n input elements

the worst-case number of comparisons is the height h of the tree

the tree has at least n! leaves (because it is a decision tree)

and at most 2^h leaves (because it is a binary tree)

hence:

n! ≤ #leaves ≤ 2^h

n! ≤ 2^h

log n! ≤ h

h ≥ log n!

we continue to find a bound on the height

we have h ≥ ⌈log n!⌉

omitting floors and ceilings we find:

h ≥ log(n!)
  = log(1 · 2 · . . . · (n/2 − 1) · (n/2) · . . . · n)
  ≥ log(1 · 1 · . . . · 1 · (n/2) · . . . · (n/2))
  = log((n/2)^(n/2))
  = (n/2) · log(n/2)

hence h ∈ Ω(n log n)

(see also 3.19 in the book that uses Stirling’s approximation)
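
To get a feeling for the numbers, the exact bound ⌈log n!⌉ can be compared with the weaker bound (n/2) · log(n/2) from the derivation. The sketch below assumes logarithms to base 2; the function names are made up for this example.

```python
import math


def min_comparisons(n):
    """ceil(log2(n!)): lower bound on worst-case comparisons for n elements."""
    return math.ceil(math.log2(math.factorial(n)))


def half_bound(n):
    """(n/2) * log2(n/2): the weaker bound obtained in the derivation."""
    return (n / 2) * math.log2(n / 2)


for n in (4, 5, 8, 16, 64):
    print(n, min_comparisons(n), round(half_bound(n), 1))
```

The weaker bound stays below the exact one, but both grow proportionally to n log n.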

results

Theorem: any comparison-based sorting algorithm uses Ω(n log n) comparisons in the worst case.

Consequence: heapsort and merge sort are asymptotically optimal

However: for specific inputs there are linear sorting algorithms,

not based on comparisons

back to puzzle from Knuth

can we sort 5 elements in ⌈log 5!⌉ = 7 comparisons?

overview


counting sort

radix sort

bucket sort

counting sort

assumption: numbers in input come from fixed range {0, . . . , k}.

algorithm idea: count the number of occurrences of each i from 0 to k

time complexity: in Θ(n + k) for an input array of length n

drawback: fixed range, and requires an additional counting array C and output array B

counting sort is a stable sorting algorithm

what happens if the last loop is done for j := 1 up to j := A.length?

counting sort: pseudo-code

input array A, output array B, range from 0 up to k

Algorithm countingSort(A, B, k):
    new array C[0 . . . k]
    for i := 0 to k do
        C[i] := 0
    for j := 1 to A.length do
        C[A[j]] := C[A[j]] + 1
    for i := 1 to k do
        C[i] := C[i] + C[i − 1]
    for j := A.length downto 1 do
        B[C[A[j]]] := A[j]
        C[A[j]] := C[A[j]] − 1
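
The pseudocode translates to Python roughly as follows. This is a sketch with 0-based indices, so the counter is decremented before the write instead of after it; returning a new list instead of filling a given output array B is also a choice made for this example.

```python
def counting_sort(a, k):
    """Return a sorted copy of a; every element must be an int in 0..k."""
    count = [0] * (k + 1)
    for x in a:                      # count occurrences of each value
        count[x] += 1
    for i in range(1, k + 1):        # count[i] = number of elements <= i
        count[i] += count[i - 1]
    out = [None] * len(a)
    for x in reversed(a):            # right-to-left pass keeps the sort stable
        count[x] -= 1
        out[count[x]] = x
    return out


print(counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5))
```

Traversing the input from right to left in the last loop is what places equal keys in their original relative order.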

overview


counting sort

radix sort

bucket sort

radix sort

radix sort: intuition

the old days: used for sorting punched cards

80 columns with one hole in one of the 12 possible places

our use: sorting numbers considered as tuples

number of columns: (fixed) amount of digits used

number of places: 10 for decimal numbers

input    after digit 1    after digit 2    after digit 3
329      457              329              329
457      657              839              457
657      329              457              657
839      839              657              839

recall the lexicographic ordering for tuples: (x1, . . . , xd) < (y1, . . . , yd) if (x1 < y1) or ((x1 = y1) and (x2, . . . , xd) < (y2, . . . , yd))

radix sort: ‘pseudo-pseudocode’

intuition: sort per dimension using a stable sorting algorithm

Algorithm radixSort(A, d):
    for i := 1 to d do
        use some stable sort on digit i

1 is the lowest-order digit and d is the highest-order digit

what happens if we take the other order?
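
A minimal Python sketch of radix sort, assuming non-negative integers with at most d digits and using a counting sort per digit as the stable sort. The helper name sort_by_digit and the base parameter are introduced for this example only.

```python
def sort_by_digit(a, exp, base=10):
    """Stable counting sort of a on the digit with place value exp."""
    count = [0] * base
    for x in a:
        count[(x // exp) % base] += 1
    for i in range(1, base):
        count[i] += count[i - 1]
    out = [0] * len(a)
    for x in reversed(a):            # right-to-left pass keeps the sort stable
        digit = (x // exp) % base
        count[digit] -= 1
        out[count[digit]] = x
    return out


def radix_sort(a, d, base=10):
    """Sort non-negative ints with at most d digits, lowest-order digit first."""
    exp = 1
    for _ in range(d):
        a = sort_by_digit(a, exp, base)
        exp *= base
    return a


print(radix_sort([329, 457, 657, 839, 436, 720, 355], 3))
```

Stability matters here: each pass must preserve the order established by the passes on the lower-order digits.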

radix sort: time complexity

if counting sort (stable!) per dimension is in Θ(n + k)

then radix sort is in Θ(d(n + k))

if d is constant and k is in O(n) then radix sort is in linear time

is radix sort preferable to comparison-based sorting?

application: order {0, . . . , 8} in their representation in base 3:

00, 01, 02, 10, 11, 12, 20, 21, 22

overview


counting sort

radix sort

bucket sort

bucket sort

similar to counting sort

assumption for correctness: keys in [0, 1)

assumption for time complexity: keys uniformly distributed over [0, 1)

an elementary operation on the key gives the index

several keys can belong to the same index

bucket sort: pseudo-pseudocode

array A with 0 ≤ A[i] < 1 for i = 1, . . . , A.length

Algorithm bucketSort(A):
    n := A.length
    new array B[0 . . . n − 1]
    for i := 0 to n − 1 do
        make B[i] an empty list    (init B)
    for i := 1 to n do
        insert A[i] into list B[⌊n · A[i]⌋]
    for i := 0 to n − 1 do
        insertionSort(B[i])
    concatenate B[0], B[1], . . . , B[n − 1]
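
In Python the same steps might look like this. It is a sketch assuming floating-point keys in [0, 1); the clamp to n − 1 is a precaution added here against rounding in the product n · x and is not part of the pseudocode.

```python
def insertion_sort(b):
    """Sort the list b in place."""
    for j in range(1, len(b)):
        key = b[j]
        i = j - 1
        while i >= 0 and b[i] > key:
            b[i + 1] = b[i]
            i -= 1
        b[i + 1] = key


def bucket_sort(a):
    """Return a sorted copy of a; every key must satisfy 0 <= key < 1."""
    n = len(a)
    buckets = [[] for _ in range(n)]             # init B
    for x in a:
        index = min(int(n * x), n - 1)           # floor(n * x), clamped for safety
        buckets[index].append(x)
    result = []
    for b in buckets:
        insertion_sort(b)
        result.extend(b)                         # concatenate the sorted buckets
    return result


keys = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
print(bucket_sort(keys))
```

If all keys land in the same bucket, the single insertionSort call dominates the running time.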

bucket sort: average-case time complexity

bucket sort on an input-array A of length n

elements of A in [0, 1)

elements of A uniformly distributed over [0, 1)

average-case time complexity (without proof): in Θ(n)

worst-case time complexity in Θ(n²); why?

can we improve the worst-case time complexity?
