Non-Comparison Based Sorting

21
Non-Comparison Based Sorting

description

Non-Comparison Based Sorting. How fast can we sort?. Insertion Sort: O(n 2 ) Merge Sort, Quick Sort (expected), Heap Sort: O(nlgn) Can we sort faster than O(nlgn)?. Comparison-Based Sort. Can prove that any comparison-based sort MUST make Ω (nlgn) comparisons. - PowerPoint PPT Presentation

Transcript of Non-Comparison Based Sorting

Page 1: Non-Comparison Based Sorting

Non-Comparison Based Sorting

Page 2: Non-Comparison Based Sorting

How fast can we sort?

• Insertion Sort: O(n2)

• Merge Sort, Quick Sort (expected), Heap Sort: O(nlgn)

• Can we sort faster than O(nlgn)?

Page 3: Non-Comparison Based Sorting

Comparison-Based Sort

• Can prove that any comparison-based sort MUST make Ω(nlgn) comparisons. – Comparison-based sort: algorithm directly compares

elements of the array to one another, makes some decision based upon <, =, >

– This means that heapsort and mergesort are optimal.– Not always possible to prove that your algorithm is the

best possible!• Proof idea

– Use a decision tree to represent comparisons of a sorting algorithm

– Actual sorting algorithm maps to the decision tree

Page 4: Non-Comparison Based Sorting

Decision Tree Example

• Three elements to sort: a1, a2, a3

<= >

<= >

<= >

<= >

<= >

a1 ,a2 ,a3

a1 ,a3 ,a2 a3 ,a1 ,a2

a1 ,a3

a2 ,a3

a2 ,a1 ,a3

a2 ,a3 ,a1 a3 ,a2 ,a1

a2 ,a3

a1 ,a3

a1 ,a2

Decision tree must have n! leaves, one for each permutation of the input

Page 5: Non-Comparison Based Sorting

Decision Tree

• For any comparison-based sorting algorithm– One tree exists for each input of length n– An algorithm “splits” at each decision/comparison

unwinding the actual execution into a tree path– The tree is for all possible execution traces for an

input of length n• Height of the tree tells us the minimum number

of comparisons necessary to sort the input– There must be n! leaves– But we also have a binary tree; binary tree of height h

has no more than 2h leaves– This means that n! 2h

Page 6: Non-Comparison Based Sorting

Height of Tree

• n! 2h

• lg(n!) lg(2h)

• Stirlings approximation says:

• Giving us:n n n e n en n! ( / ) ( / ) 2

hennn

henn

henn

hen n

lglg

)lg(lg

)/lg(

)/lg(

This means h is Ω(nlgn)

Page 7: Non-Comparison Based Sorting

Better than O(nlgn)?

• This means any comparison-based sorting algorithm must require at least time O(nlgn)

• But we can do better if we use a non-comparison-based sorting algorithm!– Count Sort– Radix Sort– Bucket Sort– Postman Sort

Page 8: Non-Comparison Based Sorting

Count Sort

• May run in O(n) time– But requires assumptions about the size and nature of

the input• Input: A[1..n] where a[i] is an integer in the range

1..k• Output: B[1..n], array input values sorted• Uses: C[1..k] auxiliary storage

• Idea: Using the random access of the array C, count up the number of times each input element appears and then collect them together

Page 9: Non-Comparison Based Sorting

Count Sort Algorithm

Count-Sort(A,n)for i=1 to k do C[i]=0 ; Initialize to 0for j=1 to n do C[A[j]] ++ ; Countj=1for i=1 to k do

if (C[i]>0) then for z=1 to C[i] do

B[j]=ij++

Ex: A=[1 5 3 2 2 4 9] ; Initial values C=[0 0 0 0 0 0 0 0 0] ; Initialize C to 0

C=[1 2 1 1 1 0 0 0 1] ; After count loopB=[1 2 2 3 4 5 9] ; After last loop

Runtime: O(n+k). If k is close to n, then runs in O(n) time.A bad example would be a list such as: A[1,2,999999999999]

Page 10: Non-Comparison Based Sorting

Stable Count Sort

• A sorting algorithm is stable if the occurrence of a value I appears in the same value in the output as they do in the input– Ties broken by the rule of whichever appeared first in

the input appears first in the output– Desired for various algorithms where sorting a key of

a record (e.g. a zip code)• Example:

– A[3,5a,9,2,4,5b,6]– Sorts to : A[2,3,4,5a,5b,6,9] not A[2,3,4,5b,5a,6,9]

• Prior algorithm was not stable but we can modify to make it stable

Page 11: Non-Comparison Based Sorting

Stable Count Sort Stable-Count-Sort(A,n) for I 1 to k do C[I] 0 ; Initialize to 0 for j 1 to n do C[A[j]] ++ ; Count for I 2 to k do C[I] C[I]+C[I-1] ; Sum elements so far ; C[I] contains num elements <= I for j n downto 1 do B[C[A[j]]] A[j] C[A[j]] C[A[j]]-1

Key idea: fetch from original array to make it stable.

Example: A=[1 5 3 2 2 4 9] C=[0 0 0 0 0 0 0 0 0]

C=[1 2 1 1 1 0 0 0 1] C=[1 3 4 5 6 6 6 6 7] B=[ . . . . . . 9] C=[1 3 4 5 6 6 6 6 6] B=[ . . . . 4 . 9] … B=[1 2 2 3 4 5 9]

Page 12: Non-Comparison Based Sorting

Radix Sort

• Works like the punch card readers of the early 1900’s

• Only works on input items with digits– E.g. numbers, letters

• Idea: Sort on the least significant digit first

; Array A of n elements, each element d digitsRadix-Sort(A,d,n)

for i = 1 to dStable-Sort(A) with digit i as the key

Page 13: Non-Comparison Based Sorting

Radix Sort Example

A 492 031 102 031 299 492 204 102 102 102 031 204 031 204 835 299 996 835 492 492 204 996 996 835 835 299 299 996

Page 14: Non-Comparison Based Sorting

Radix Sort Notes

• Internal sort must be stable to preserve order of elements. Based on the observation that if all the lower order digits are already sorted, then we only need to sort on the highest order digit.

• Good choice to use if each digit is small in magnitude– If k is the maximum value of a digit– If using stable count sort internally– Stable-Count-Sort takes Θ(k+n) time for one pass– With d passes total, runtime is Θ(dk+dn)– Runtime is Θ(n) if d is a constant, k is same

magnitude as n

Page 15: Non-Comparison Based Sorting

Bucket Sort

• Bucket sort works similarly to count sort, but uses a “bucket” to hold a range of inputs. This allows it to operate on real numbers.

• Runs in O(n) time with the following assumptions:– Input elements are in the interval [0..1].

• If not, in many cases we can divide by some “max” value to scale the input to be between 0 and 1

– Input elements are evenly distributed• Uniform probability over [0..1]

Page 16: Non-Comparison Based Sorting

Bucket Sort Algorithm

• Idea:– Divide [0..1] into n equal sized “buckets”– Put each of the n inputs into the proper bucket; some may be

empty and others may have more than 1 element– Sort each bucket using insertion sort– To produce output, go through the buckets in order, listing the

elements in each• Algorithm:

Bucket-Sort(A,n)for i=1 to n do ; implicitly make buckets 0..n-1

Insert A[i] into bucket B[n*A[i]]for i =0 to n-1 do

sort bucket B[i] using insertion sort ; < O(n) if small bucketconcatenate buckets B[0], B[1], .. B[n-1] in order

Page 17: Non-Comparison Based Sorting

Bucket Sort Example

A=[0.44 0.12 0.73 0.29 0.67 0.49] Bucket I will get the values between I/n and (I+1)/n since buckets are numbered from 0 to n-1. B 0..0.16 0.12 0.16..0.33 0.29 0.33..0.50 0.44 0.49 0.50..0.66 0.66..0.83 0.73 0.67 0.83..1 Sort the buckets with insertion sort and then combine buckets to get:

0.12 0.29 0.44 0.49 0.67 0.73

Page 18: Non-Comparison Based Sorting

Bucket Sort Notes

• Expected runtime:– If any element in A comes from [0..1] with equal

probability, then the probability that an element e is in bucket B[i] is 1/n

– Expect on average each bucket to have a single element

– Call to insertion sort for each bucket will be extremely fast; essentially constant time

– Rest of the algorithm runs in O(n)– Overall runtime O(n) if the uniform probabilty

assumption holds• Worst case O(n2) if everything ends up in one

bucket

Page 19: Non-Comparison Based Sorting

Postman Sort

• Algorithm proposed by Robert Ramey, August 1992 issue of C Programming Journal

• Analysis left as a homework exercise• From the article:

– When a postal clerk receives a huge bag of letters he distributes them into other bags by state. Each bag gets sent to the indicated state. Upon arrival, another clerk distributes the letters in his bag into other bags by city. So the process continues until the bags are the size one man can carry and deliver. This is the basis for my sorting method which I call the postman's sort.

Page 20: Non-Comparison Based Sorting

Postman Sort Idea

• Given a large list of records to sort alphabetically– Make one pass, reading each record into one of 26

lists depending on the first letter in the field• First list contains all records starting with “A”• Second list contains all records starting with “B”, etc.

– Recurse on each of the 26 sublists if it contains more than one element, but this time using the next letter as the field to split up the sublists

– In-order traversal of the resulting tree gives a sorted list

Page 21: Non-Comparison Based Sorting

Postman Example

A = [bob, bill, boy, cow, dog]

A = [bob, bill, boy] A = [cow] A = [dog] Empty for all other letters

A = [bill] A = [bob, boy] Empty for all other letters

A = [bob] A = [boy] Empty for all other letters

b c d

i o

b y

Runtime? Left as a homework exercise