data structures and algorithms 2020 09 17 lecture 6 · femke/courses/ds/slides/6.pdf · 2020. 9....
-
data structures and algorithms
2020 09 17
lecture 6
http://www.cs.vu.nl/~tcs/ds
-
heaps and heapsort on n elements
height of a heap is in Θ(log n)
building a heap bottom-up in O(n)
analysis via picture (learn) or using summations (know result)
building a heap one-by-one in O(n log n)
maxheapify in O(log n) (also Θ(log n))
heapsort in O(n log n); also in Θ(n log n)?
yes, see worst-case input or see lower-bound
more: see smooth sort
https://en.wikipedia.org/wiki/Smoothsort
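the recap above can be sketched in Python. This is a minimal illustration, not the course's reference code: it uses 0-based indices, and the names max_heapify, build_max_heap and heapsort are chosen here for readability.

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the subtree rooted at i is a max-heap (0-based)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Bottom-up construction: heapify every internal node, last one first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort the list a in place."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]   # move the current maximum to its final position
        max_heapify(a, 0, end)        # restore the heap on the remaining prefix
```

build_max_heap is the bottom-up variant: it starts at the last internal node, so the many nodes near the leaves only need short sift-downs.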
-
use of priority queue: Dijkstra’s algorithm
input: directed graph G = (V, E) with positive weights w and start vertex s in G
Algorithm Dijkstra(G, w, s):
  initialize(G, s)
  S := ∅
  Q := G.V
  while Q ≠ ∅ do
    u := extractMin(Q)
    S := S ∪ {u}
    for each v ∈ G.Adj[u] do
      relax(u, v, w)
the set S contains vertices for which the weight of a shortest path has been found
if the priority queue is implemented with a heap, then in O(|E| · log |V|)
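a hedged Python sketch of the algorithm above, using the standard heapq module as the priority queue. heapq has no decrease-key operation, so this version swaps in a different technique from the slide: it pushes a new entry on every successful relaxation and skips stale entries when they are popped. The adjacency-dict input format is an assumption made for this example.

```python
import heapq
import math


def dijkstra(adj, s):
    """adj maps every vertex to a list of (neighbour, weight) pairs; weights are positive."""
    dist = {v: math.inf for v in adj}
    dist[s] = 0
    done = set()                 # the set S of the slide
    queue = [(0, s)]             # heap of (tentative distance, vertex)
    while queue:
        d, u = heapq.heappop(queue)      # extractMin
        if u in done:
            continue                     # stale entry: u was already finalised
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:          # relax(u, v, w)
                dist[v] = d + w
                heapq.heappush(queue, (dist[v], v))
    return dist
```

The heap holds at most one entry per relaxed edge plus the start entry, so each heap operation costs O(log |E|) = O(log |V|), which matches the bound stated above.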
-
quicksort on n elements
worst-case in Θ(n²)
best-case in Θ(n log n)
balanced-case in O(n log n)
average case (assuming all n! possible inputs are equally likely) in O(n log n)
expected case (using randomization) in O(n log n)
more: replace small recursive calls by insertion sort and sorting networks
https://arxiv.org/abs/1505.01962
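the expected case with randomization can be illustrated with a small Python sketch. The partition scheme (pivot moved to the end, single left-to-right scan) is one common choice and is an assumption here, not necessarily the one used in the lecture.

```python
import random


def quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place using a uniformly random pivot."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    p = random.randint(lo, hi)          # random pivot index (bounds inclusive)
    a[p], a[hi] = a[hi], a[p]
    pivot = a[hi]
    i = lo                              # a[lo..i-1] holds the elements < pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]           # put the pivot in its final position
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

Choosing the pivot at random means no fixed input forces the Θ(n²) behaviour, as long as the keys are distinct; inputs with many equal keys still split badly with this simple partition.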
-
overview
lower bound on sorting
counting sort
radix sort
bucket sort
-
recall: bounds
asymptotic upper bound f ∈ O(g): f is eventually bounded from above by g up to a constant
asymptotic tight bound f ∈ Θ(g): f is eventually sandwiched by g up to two constants
asymptotic lower bound f ∈ Ω(g): f is eventually bounded from below by g up to a constant
(there are more bounds!)
-
upper and lower bound
f, g : N → R+
asymptotic upper bound: f(n) ∈ O(g(n)) if
∃c ∈ R+ (c > 0) ∃n₀ ∈ N (n₀ > 0) ∀n ∈ N : n ≥ n₀ ⇒ f(n) ≤ c · g(n)
asymptotic lower bound: f(n) ∈ Ω(g(n)) if
∃c ∈ R+ (c > 0) ∃n₀ ∈ N (n₀ > 0) ∀n ∈ N : n ≥ n₀ ⇒ f(n) ≥ c · g(n)
-
question
comparison-based sorting: the crucial tests are
a < b, a ≤ b, a = b, a ≥ b, a > b
which are all decidable in elementary time
so far we have seen only comparison-based sorting algorithms
is O(n log n) the best worst-case running time that we can have?
-
lower bound on running time
we consider sorting algorithms based on comparisons k < k′
examples: merge sort, heapsort, quicksort
non-examples: counting sort, bucket sort
Ω(n log n) comparisons are needed in the worst case
so merge sort, heapsort are asymptotically optimal
-
lower bound: proof
we consider decision trees
node contains a comparison a < b?
leaf corresponds to a permutation of {1, ..., n}
every comparison-based sorting algorithm has, for every number of inputs, a decision tree
execution of that sorting algorithm corresponds to a path in the tree
every possible permutation (of total n!) must occur
every permutation must be reachable
-
decision tree: example
a < b?
  yes: b < c?
    yes: a b c
    no:  a < c?
      yes: a c b
      no:  c a b
  no:  b < c?
    yes: a < c?
      yes: b a c
      no:  b c a
    no:  c b a
-
we analyze the height of a decision tree for inputs of n elements
the worst-case number of comparisons is the height h of the tree
the tree has at least n! leaves (because it is a decision tree for sorting),
and at most 2^h leaves (because it is a binary tree),
hence:
n! ≤ #leaves ≤ 2^h
n! ≤ 2^h
log n! ≤ h
h ≥ log n!
-
we continue to find a bound on the height
we have h ≥ ⌈log n!⌉
omitting floors and ceilings we find:
h ≥ log(n!)
  = log(1 · 2 · ... · (n/2 − 1) · n/2 · ... · n)
  ≥ log(1 · 1 · ... · 1 · n/2 · ... · n/2)
  = log((n/2)^(n/2))
  = (n/2) · log(n/2)
hence h ∈ Ω(n log n)
(see also 3.19 in the book that uses Stirling’s approximation)
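the two bounds in this derivation can be checked numerically. The sketch below is only a sanity check, not part of the proof; logarithms are taken base 2, and the function names are chosen for this example.

```python
import math


def min_comparisons(n):
    """Exact value of ceil(log2(n!)), the decision-tree lower bound on the height."""
    # For an integer x >= 1, ceil(log2(x)) equals the bit length of x - 1.
    return (math.factorial(n) - 1).bit_length()


def half_bound(n):
    """The weaker bound (n/2) * log2(n/2) obtained in the derivation."""
    return (n / 2) * math.log2(n / 2)


for n in (2, 5, 10, 100, 1000):
    print(n, min_comparisons(n), round(half_bound(n), 1))
```

For n = 5 the first function gives 7, since 5! = 120 lies between 2^6 = 64 and 2^7 = 128.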
-
results
Theorem: any comparison-based sorting algorithm uses Ω(n log n) comparisons in the worst case.
Consequence: heapsort and merge sort are asymptotically optimal
However: for specific inputs there are linear sorting algorithms,
not based on comparisons
-
back to puzzle from Knuth
can we sort 5 elements in ⌈log 5!⌉ = 7 comparisons?
-
overview
lower bound on sorting
counting sort
radix sort
bucket sort
-
counting sort
assumption: numbers in input come from fixed range {0, . . . , k}.
algorithm idea: count the number of occurrences of each i from 0 to k
time complexity: in Θ(n + k) for an input array of length n
drawback: fixed range, and requires an additional counting array C and an output array B
counting sort is a stable sorting algorithm
what happens if the last loop is done for j := 1 up to j := A.length?
-
counting sort: pseudo-code
input array A, output array B, range from 0 up to k
Algorithm countingSort(A, B, k):
  new array C[0 ... k]
  for i := 0 to k do
    C[i] := 0
  for j := 1 to A.length do
    C[A[j]] := C[A[j]] + 1
  for i := 1 to k do
    C[i] := C[i] + C[i − 1]
  for j := A.length downto 1 do
    B[C[A[j]]] := A[j]
    C[A[j]] := C[A[j]] − 1
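a possible Python rendering of the pseudo-code above. Python lists are 0-based, so this sketch decrements the counter before writing to the output instead of after; it also returns a new list rather than filling a caller-supplied B. Both are adaptations made for this example.

```python
def counting_sort(a, k):
    """Return a stably sorted copy of a, whose entries are integers in 0..k."""
    count = [0] * (k + 1)
    for x in a:
        count[x] += 1                  # count[i] = number of occurrences of i
    for i in range(1, k + 1):
        count[i] += count[i - 1]       # count[i] = number of elements <= i
    out = [None] * len(a)
    for x in reversed(a):              # right-to-left keeps equal keys in input order
        count[x] -= 1                  # decrement first: output positions are 0-based
        out[count[x]] = x
    return out
```

The three loops run over n, k and n items respectively, which gives the Θ(n + k) running time.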
-
overview
lower bound on sorting
counting sort
radix sort
bucket sort
-
radix sort
-
radix sort: intuition
the old days: used for sorting punched cards
80 columns with one hole in one of the 12 possible places
our use: sorting numbers considered as tuples
number of columns: (fixed) amount of digits used
number of places: 10 for decimal numbers
input   sorted on digit 1   sorted on digit 2   sorted on digit 3
329     457                 329                 329
457     657                 839                 457
657     329                 457                 657
839     839                 657                 839
recall the lexicographic ordering for tuples: (x1, ..., xd) < (y1, ..., yd) if
(x1 < y1) or ((x1 = y1) and (x2, ..., xd) < (y2, ..., yd))
-
radix sort: ‘pseudo-pseudocode’
intuition: sort per dimension using a stable sorting algorithm
Algorithm radixSort(A, d):
  for i := 1 to d do
    use some stable sort on digit i
1 is the lowest-order digit and d is the highest-order digit
what happens if we take the other order?
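a minimal Python sketch of the idea, for non-negative integers. As the stable per-digit sort it distributes the numbers over one list per digit value and concatenates the lists; this is a swapped-in stand-in for counting sort, chosen here because it is short. The base parameter is an assumption of this example.

```python
def radix_sort(a, d, base=10):
    """Return a sorted copy of a: non-negative integers with at most d digits."""
    for i in range(d):                         # i = 0 is the lowest-order digit
        buckets = [[] for _ in range(base)]
        for x in a:
            digit = (x // base ** i) % base
            buckets[digit].append(x)           # appending preserves order: stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

Each pass relies on the stability of the previous ones: numbers that agree on digit i keep the order established by the lower-order digits.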
-
radix sort: time complexity
if counting sort (stable!) per dimension is in Θ(n + k)
then radix sort is in Θ(d(n + k))
if d is constant and k is in O(n) then radix sort is in linear time
is radix sort preferable to comparison-based sorting?
application: order {0, ..., 8} in representation with base 3:
00, 01, 02, 10, 11, 12, 20, 21, 22
-
overview
lower bound on sorting
counting sort
radix sort
bucket sort
-
bucket sort
similar to counting sort
assumption for correctness: keys in [0, 1)
assumption for time complexity: keys uniformly distributed over [0, 1)
an elementary operation on the key gives the index
several keys can belong to the same index
-
bucket sort: pseudo-pseudocode
array A with 0 ≤ A[i] < 1 for i = 1, ..., A.length
Algorithm bucketSort(A):
  n := A.length
  new array B[0 ... n − 1]
  for i := 0 to n − 1 do
    make B[i] an empty list   (init B)
  for i := 1 to n do
    insert A[i] into list B[⌊n · A[i]⌋]
  for i := 0 to n − 1 do
    insertionSort(B[i])
  concatenate B[0], B[1], ..., B[n − 1]
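the pseudo-pseudocode above could look as follows in Python. This is a sketch under the stated assumption that all keys lie in [0, 1); the guard with min is an addition of this example, to protect against floating-point rounding of n · x up to n for keys extremely close to 1.

```python
def insertion_sort(items):
    """Sort the list items in place."""
    for j in range(1, len(items)):
        key = items[j]
        i = j - 1
        while i >= 0 and items[i] > key:
            items[i + 1] = items[i]
            i -= 1
        items[i + 1] = key


def bucket_sort(a):
    """Return a sorted copy of a, whose entries are floats in [0, 1)."""
    n = len(a)
    buckets = [[] for _ in range(n)]
    for x in a:
        index = min(int(n * x), n - 1)     # floor(n * x), clamped to a valid bucket
        buckets[index].append(x)
    for bucket in buckets:
        insertion_sort(bucket)
    return [x for bucket in buckets for x in bucket]
```

With uniformly distributed keys each bucket is expected to hold only a few elements, so the insertion sorts stay cheap; if all keys land in one bucket, the single insertion sort dominates.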
-
bucket sort: average-case time complexity
bucket sort on an input-array A of length n
elements of A in [0, 1)
elements of A uniformly distributed over [0, 1)
average-case time complexity (without proof): in Θ(n)
worst-case time complexity in Θ(n²); why?
can we improve the worst-case time complexity?