data structures and algorithms 2020 09 17 lecture 6

  • data structures and algorithms 2020 09 17

    lecture 6

    http://www.cs.vu.nl/~tcs/ds

  • heaps and heapsort on n elements

    height of a heap is in Θ(log n)

    building a heap bottom-up in O(n)

    analysis via picture (learn) or using summations (know result)

    building a heap one-by-one in O(n log n)

    maxheapify in O(log n) (also Θ(log n))

    heapsort in O(n log n); also in Θ(n log n)?

    yes, see worst-case input or see lower-bound

    more: see smooth sort

    https://en.wikipedia.org/wiki/Smoothsort
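    The recap above (bottom-up heap construction, maxheapify, heapsort) can be sketched as follows. This is a minimal illustration, assuming a 0-based Python list as the heap array; the function names are chosen here for illustration and are not taken from the slides.

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the subtree rooted at i is a max-heap: O(log n)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2   # children in a 0-based array
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Bottom-up construction: heapify every internal node, last one first."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heapsort(a):
    """Sort a in place: build a heap, then repeatedly move the maximum to the end."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)   # the heap now occupies a[0 .. end-1]
```

    Calling heapsort(data) on a list sorts it in place; build_max_heap alone leaves the list in heap order, with every parent at least as large as its children.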

  • use of priority queue: Dijkstra’s algorithm

    input: directed graph G = (V, E) with positive weights w and start vertex s in G

    Algorithm Dijkstra(G, w, s):

    initialize(G, s)

    S := ∅

    Q := G.V

    while Q ≠ ∅ do

        u := extractMin(Q)

        S := S ∪ {u}

        for each v ∈ G.Adj[u] do

            relax(u, v, w)

    the set S contains vertices for which the weight of a shortest path has been found

    if priority queue implemented with heap then in O(|E| · log |V|)
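    A minimal Python sketch of the algorithm above, using the standard heapq module as the priority queue. Two assumptions differ from the slide: the graph is given as a dictionary adj mapping every vertex to a list of (neighbour, weight) pairs, and instead of a decrease-key operation the sketch pushes a new queue entry on each successful relax step and skips stale entries when they are extracted.

```python
import heapq


def dijkstra(adj, s):
    """Return a dict with the weight of a shortest path from s to every vertex.

    adj: dict mapping each vertex to a list of (neighbour, weight) pairs,
         with positive weights (assumed input format, not from the slides).
    """
    dist = {v: float("inf") for v in adj}   # initialize(G, s)
    dist[s] = 0
    done = set()                            # the set S
    queue = [(0, s)]                        # heap of (distance, vertex)
    while queue:
        d, u = heapq.heappop(queue)         # extractMin(Q)
        if u in done:
            continue                        # stale entry, u was finished earlier
        done.add(u)
        for v, w in adj[u]:
            if d + w < dist[v]:             # relax(u, v, w)
                dist[v] = d + w
                heapq.heappush(queue, (dist[v], v))
    return dist
```

    The queue holds at most one entry per relaxed edge, so each heap operation costs O(log |E|), which is O(log |V|); this keeps the sketch within the O(|E| · log |V|) bound when every vertex is reachable from s.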

  • quicksort on n elements

    worst-case in Θ(n²)

    best-case in Θ(n log n)

    balanced-case in O(n log n)

    average case (assuming all n! possible inputs are equally likely) in O(n log n)

    expected case (using randomization) in O(n log n)

    more: replace small recursive calls by insertion sort and sorting networks

    https://arxiv.org/abs/1505.01962
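    The expected case with randomization can be illustrated by the sketch below. It assumes a Lomuto-style partition with a pivot chosen uniformly at random; the slides do not fix a partition scheme here, so this choice is an assumption.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] around a random pivot and return the pivot's final index."""
    p = random.randint(lo, hi)          # uniformly random pivot position
    a[p], a[hi] = a[hi], a[p]           # move the pivot to the end
    pivot = a[hi]
    i = lo - 1
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]   # put the pivot between the two parts
    return i + 1


def randomized_quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place; expected O(n log n) for inputs with distinct keys."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        q = partition(a, lo, hi)
        randomized_quicksort(a, lo, q - 1)
        randomized_quicksort(a, q + 1, hi)
```

    Because the pivot is random, no fixed input ordering forces the Θ(n²) behaviour; that worst case can still occur, but only with small probability when the keys are distinct.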

  • overview

    lower bound on sorting

    counting sort

    radix sort

    bucket sort

  • recall: bounds

    asymptotic upper bound f ∈ O(g): f is eventually bounded from above by g up to a constant

    asymptotic tight bound f ∈ Θ(g): f is eventually sandwiched by g up to two constants

    asymptotic lower bound f ∈ Ω(g): f is eventually bounded from below by g up to a constant

    (there are more bounds!)

  • upper and lower bound

    f, g : ℕ → ℝ⁺

    asymptotic upper bound: f(n) ∈ O(g(n)) if

    ∃c ∈ ℝ⁺ (c > 0) ∃n₀ ∈ ℕ (n₀ > 0) ∀n ∈ ℕ : n ≥ n₀ ⇒ f(n) ≤ c · g(n)

    asymptotic lower bound: f(n) ∈ Ω(g(n)) if

    ∃c ∈ ℝ⁺ (c > 0) ∃n₀ ∈ ℕ (n₀ > 0) ∀n ∈ ℕ : n ≥ n₀ ⇒ f(n) ≥ c · g(n)
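    To make the role of the witnesses c and n0 concrete, the sketch below tests the inequality from the definition of O on a finite range. The helper name holds_from and the example function 3n + 5 are illustrative assumptions, and a finite check is of course not a proof of an asymptotic bound.

```python
def holds_from(f, g, c, n0, n_max=10_000):
    """Check f(n) <= c * g(n) for every n with n0 <= n <= n_max (finite check only)."""
    return all(f(n) <= c * g(n) for n in range(n0, n_max + 1))


def f(n):
    return 3 * n + 5    # example function, chosen for illustration


def g(n):
    return n


# c = 4 and n0 = 5 are witnesses for 3n + 5 in O(n), since 3n + 5 <= 4n iff n >= 5.
ok_with_witnesses = holds_from(f, g, c=4, n0=5)
# c = 3 cannot work for any n0, because 3n + 5 > 3n for all n.
ok_with_small_c = holds_from(f, g, c=3, n0=5)
```

    Swapping the comparison to >= gives the corresponding finite check for the lower bound Ω.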

  • question

    comparison-based sorting: the crucial tests are

    a < b, a ≤ b, a = b, a ≥ b, a > b

    which are all decidable in elementary time

    so far we have seen only comparison-based sorting algorithms

    is O(n log n) the best worst-case that we have?

  • lower bound on running time

    we consider sorting algorithms based on comparisons k < k′

    examples: merge sort, heapsort, quicksort

    non-examples: counting sort, bucket sort

    Ω(n log n) comparisons are needed in the worst case

    so merge sort, heapsort are asymptotically optimal

  • lower bound: proof

    we consider decision trees

    node contains a comparison a < b?

    leaf corresponds to a permutation of {1, …, n}

    every comparison-based sorting algorithm has for every number of inputs a decision tree

    execution of that sorting algorithm corresponds to a path in the tree

    every possible permutation (of total n!) must occur

    every permutation must be reachable

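  • decision tree: example

    [figure: decision tree for sorting three elements a, b, c; leaves are the six orderings]

    a < b?

        yes: b < c?

            yes: a b c

            no: a < c?

                yes: a c b

                no: c a b

        no: b < c?

            yes: a < c?

                yes: b a c

                no: b c a

            no: c b a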
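    The decision tree for three elements can also be read as code: each inner node becomes an if-test and each leaf a returned ordering. The sketch below assumes the three keys are distinct and that the tree asks a < b? at the root, then b < c?, then a < c?; the function name sort3 is chosen here for illustration.

```python
from itertools import permutations


def sort3(a, b, c):
    """Sort three distinct keys; return (ordering, number of comparisons used)."""
    if a < b:
        if b < c:
            return (a, b, c), 2
        if a < c:
            return (a, c, b), 3
        return (c, a, b), 3
    # here b < a, because the keys are assumed distinct
    if b < c:
        if a < c:
            return (b, a, c), 3
        return (b, c, a), 3
    return (c, b, a), 2


# Each of the 3! = 6 input orderings follows its own root-to-leaf path.
worst = max(sort3(*p)[1] for p in permutations("abc"))
```

    The longest path has three comparisons, so worst equals the height of the tree.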

  • we analyze the height of a decision tree for n input elements

    the worst-case number of comparisons is the height h of the tree

    we have at least n! (because decision tree),

    and at most 2^h (because binary tree) leaves,

    hence:

    n! ≤ #leaves ≤ 2^h

    n! ≤ 2^h

    log n! ≤ h

    h ≥ log n!

  • we continue to find a bound on the height

    we have h ≥ ⌈log n!⌉

    omitting floors and ceilings we find:

    h ≥ log(n!)

      = log(1 · 2 · … · (n/2 − 1) · n/2 · … · n)

      ≥ log(1 · 1 · … · 1 · n/2 · … · n/2)

      = log((n/2)^(n/2))

      = (n/2) · log(n/2)

    hence h ∈ Ω(n log n)

    (see also 3.19 in the book that uses Stirling’s approximation)
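    The bound can be checked numerically for small n. The sketch below computes ⌈log n!⌉ exactly with integer arithmetic and compares it with (n/2) · log(n/2); logarithms are taken to base 2, and the function names are illustrative assumptions.

```python
import math


def min_comparisons(n):
    """Exact value of ceil(log2(n!)), the decision-tree lower bound on the height."""
    # For an integer m >= 1, ceil(log2(m)) equals the bit length of m - 1.
    return (math.factorial(n) - 1).bit_length()


def half_bound(n):
    """The weaker bound (n/2) * log2(n/2) obtained in the derivation above."""
    return (n / 2) * math.log2(n / 2)


# The decision-tree bound dominates the (n/2) log(n/2) bound on this range.
bound_holds = all(min_comparisons(n) >= half_bound(n) for n in range(2, 200))
```

    For example, min_comparisons(3) is 3, which matches the height of the decision tree for three elements.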

  • results

    Theorem: any comparison-based sorting algorithm uses Ω(n log n) comparisons in the worst case.

    Consequence: heapsort and merge sort are asymptotically optimal

    However: for specific inputs there are linear sorting algorithms, not based on comparisons

  • back to puzzle from Knuth

    can we sort 5 elements in ⌈log 5!⌉ = 7 comparisons?
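    One way to explore the puzzle is the merge insertion idea (Ford-Johnson), which is not part of the slides and is named here as a swapped-in technique: sort two pairs, compare their larger elements, then insert the remaining two elements by binary search. The sketch assumes five distinct keys and counts every key comparison it makes.

```python
def sort5(items):
    """Sort five distinct keys; return (sorted list, number of comparisons used)."""
    count = 0

    def less(x, y):
        nonlocal count
        count += 1
        return x < y

    a, b, c, d, e = items
    if less(b, a):                      # comparison 1: order the first pair
        a, b = b, a
    if less(d, c):                      # comparison 2: order the second pair
        c, d = d, c
    if less(d, b):                      # comparison 3: compare the pair maxima
        a, b, c, d = c, d, a, b
    # Now a < b < d and c < d. Insert e into the chain a < b < d (2 comparisons).
    if less(e, b):
        main = [e, a, b] if less(e, a) else [a, e, b]
        tail = [d]
    elif less(e, d):
        main, tail = [a, b, e], [d]
    else:
        main, tail = [a, b], [d, e]
    # c < d, so c belongs in main: binary insertion, at most 2 comparisons.
    lo, hi = 0, len(main)
    while lo < hi:
        mid = (lo + hi) // 2
        if less(c, main[mid]):
            hi = mid
        else:
            lo = mid + 1
    main.insert(lo, c)
    return main + tail, count
```

    Running sort5 on all 120 orderings of five distinct keys is a quick way to see how many comparisons the worst case needs.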

  • counting sort

    assumption: numbers in input come from fixed range {0, . . . , k}.

    algorithm idea: count the number of occurrences of each i from 0 to k

    time complexity: in Θ(n + k) for an input array of length n

    drawback: fixed range, and requires additional counting array C and output array B

    counting sort is a stable sorting algorithm

    what happens if the last loop is done for j := 1 up to j := A.length?

  • counting sort: pseudo-code

    input array A, output array B, range from 0 up to k

    Algorithm countingSort(A, B, k):

    new array C[0 … k]

    for i := 0 to k do

        C[i] := 0

    for j := 1 to A.length do

        C[A[j]] := C[A[j]] + 1

    for i := 1 to k do

        C[i] := C[i] + C[i − 1]

    for j := A.length downto 1 do

        B[C[A[j]]] := A[j]

        C[A[j]] := C[A[j]] − 1
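    The pseudo-code translates almost line by line into Python. The sketch below assumes 0-based lists, so the counter is decremented before it is used as an output index, and it adds an optional key parameter (an assumption, not in the slides) so that stability can be observed on records with equal keys.

```python
def counting_sort(a, k, key=lambda x: x):
    """Return a stably sorted copy of a; key(x) must be an integer in 0..k."""
    count = [0] * (k + 1)                 # the counting array C
    for x in a:
        count[key(x)] += 1                # count[i] = number of items with key i
    for i in range(1, k + 1):
        count[i] += count[i - 1]          # count[i] = number of items with key <= i
    out = [None] * len(a)                 # the output array B
    for x in reversed(a):                 # right to left keeps equal keys in order
        count[key(x)] -= 1
        out[count[key(x)]] = x
    return out
```

    For example, sorting the records (1, "a"), (0, "b"), (1, "c"), (0, "d") on their first component keeps "b" before "d" and "a" before "c", which is the stability property used later by radix sort.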

  • radix sort

  • radix sort: intuition

    the old days: used for sorting punched cards

    80 columns with one hole in one of the 12 possible places

    our use: sorting numbers considered as tuples

    number of columns: (fixed) amount of digits used

    number of places: 10 for decimal numbers

    input    sorted on digit 1    sorted on digit 2    sorted on digit 3

    329      457                  329                  329

    457      657                  839                  457

    657      329                  457                  657

    839      839                  657                  839

    recall the lexicographic ordering for tuples: (x1, …, xd) < (y1, …, yd) if (x1 < y1) or ((x1 = y1) and (x2, …, xd) < (y2, …, yd))

  • radix sort: ‘pseudo-pseudocode’

    intuition: sort per dimension using a stable sorting algorithm

    Algorithm radixSort(A, d):

    for i := 1 to d do

        use some stable sort on digit i

    1 is the lowest-order digit and d is the highest-order digit

    what happens if we take the other order?
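    A minimal sketch of this scheme for non-negative integers with at most d digits. As the stable sort per digit it distributes the numbers into one list per digit value and concatenates the lists in order; this stands in for counting sort and is an assumption of the sketch, as is the base parameter.

```python
def radix_sort(a, d, base=10):
    """Return a sorted copy of a; every item must be in 0 .. base**d - 1."""
    for i in range(d):                            # digit 0 is the lowest-order digit
        buckets = [[] for _ in range(base)]
        for x in a:
            digit = (x // base**i) % base         # extract the i-th digit of x
            buckets[digit].append(x)              # appending preserves order: stable
        a = [x for bucket in buckets for x in bucket]
    return a
```

    With d = 3 and base 10 the intermediate lists for 329, 457, 657, 839 go through the stages of the example on the intuition slide; with d = 2 and base = 3 the same function orders the numbers 0 to 8 by their two base-3 digits.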

  • radix sort: time complexity

    if counting sort (stable!) per dimension is in Θ(n + k)

    then radix sort is in Θ(d(n + k))

    if d is constant and k is in O(n) then radix sort is in linear time

    is radix sort preferable to comparison-based sorting?

    application: order {0, …, 8} in representation with base 3:

    00, 01, 02, 10, 11, 12, 20, 21, 22

  • bucket sort

    similar to counting sort

    assumption for correctness: keys in [0, 1)

    assumption for time complexity: keys uniformly distributed over [0, 1)

    an elementary operation on the key gives the index

    several keys can belong to the same index

  • bucket sort: pseudo-pseudocode

    array A with 0 ≤ A[i] < 1 for i = 1, …, A.length

    Algorithm bucketSort(A):

    n := A.length

    new array B[0 … n − 1]

    for i := 0 to n − 1 do

        make B[i] an empty list (init B)

    for i := 1 to n do

        insert A[i] into list B[⌊n · A[i]⌋]

    for i := 0 to n − 1 do

        insertionSort(B[i])

    concatenate B[0], B[1], …, B[n − 1]
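    The pseudo-pseudocode above can be sketched in Python as follows, assuming the keys are floats in [0, 1) and using plain Python lists as the buckets. The insertion sort helper is included only to keep the block self-contained.

```python
def insertion_sort(lst):
    """Sort lst in place; quadratic in general, fast on very short lists."""
    for j in range(1, len(lst)):
        key = lst[j]
        i = j - 1
        while i >= 0 and lst[i] > key:
            lst[i + 1] = lst[i]       # shift larger items one place to the right
            i -= 1
        lst[i + 1] = key


def bucket_sort(a):
    """Return a sorted copy of a; every key must satisfy 0 <= key < 1."""
    n = len(a)
    buckets = [[] for _ in range(n)]          # B[0 .. n-1], all empty
    for x in a:
        buckets[int(n * x)].append(x)         # int() is the floor for x >= 0
    result = []
    for bucket in buckets:
        insertion_sort(bucket)
        result.extend(bucket)                 # concatenate B[0], ..., B[n-1]
    return result
```

    If all keys fall into the same bucket, the single insertion sort call does all the work, which is where the quadratic worst case comes from.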

  • bucket sort: average-case time complexity

    bucket sort on an input-array A of length n

    elements of A in [0, 1)

    elements of A uniformly distributed over [0, 1)

    average-case time complexity (without proof): in Θ(n)

    worst-case time complexity in Θ(n²); why?

    can we improve the worst-case time complexity?
