Big-Oh notation classes Advanced Algorithmics (6EAP) … · Class Informal Intuition Analogy f(n)...


14.09.17

Advanced Algorithmics (6EAP) MTAT.03.238

Linear structures, sorting, searching, etc.

Jaak Vilo, 2017 Fall

Big-Oh notation classes

Class              Informal                        Intuition        Analogy
f(n) ∈ o(g(n))     f is dominated by g             Strictly below   <
f(n) ∈ O(g(n))     Bounded from above              Upper bound      ≤
f(n) ∈ Θ(g(n))     Bounded from above and below    "equal to"       =
f(n) ∈ Ω(g(n))     Bounded from below              Lower bound      ≥
f(n) ∈ ω(g(n))     f dominates g                   Strictly above   >

Conclusions

• Algorithm complexity deals with the behavior in the long term
  – worst case: typical
  – average case: quite hard
  – best case: bogus, "cheating"
• In practice, the long term is sometimes not necessary
  – E.g. for sorting 20 elements, you don't need fancy algorithms…

Linear, sequential, ordered, list…

Memory, disk, tape, etc. are ordered, sequentially addressed media.

Physical ordered list ~ array

• Memory /address/
  – Garbage collection
• Files (character/byte list/lines in text file, …)
• Disk
  – Disk fragmentation

Linear data structures: Arrays

• Array • Bidirectional map • Bit array • Bit field • Bitboard • Bitmap • Circular buffer • Control table • Image • Dynamic array • Gap buffer • Hashed array tree • Heightmap • Lookup table • Matrix • Parallel array • Sorted array • Sparse array • Sparse matrix • Iliffe vector • Variable-length array

Linear data structures: Lists

• Doubly linked list • Array list • Linked list • Self-organizing list • Skip list • Unrolled linked list • VList • Xor linked list • Zipper • Doubly connected edge list • Difference list

Lists: Array

0-based indexing (size is the index of the first free slot):

  index:  0   1   2   3   4   ...   MAX_SIZE-1
  value:  3   6   7   5   2

  L = int[MAX_SIZE]
  L[2] = 7
  L[size++] = new

1-based indexing (size is the index of the last used slot):

  index:  1   2   3   4   5   ...   MAX_SIZE
  value:  3   6   7   5   2

  L[3] = 7
  L[++size] = new

Multiple lists, 2-D arrays, etc…

2D array

&A[i,j] = A + i*(nr_el_in_row*el_size) + j*el_size
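
The address formula above can be checked against what the compiler does for a real 2-D array. This is a minimal sketch; the helper names `offset_of` and `address_of` are hypothetical and not part of the slides, and row-major layout (as in C/C++) is assumed.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of A[i][j] in a row-major 2-D array
// with `cols` elements per row, each `el_size` bytes wide.
std::size_t offset_of(std::size_t i, std::size_t j,
                      std::size_t cols, std::size_t el_size) {
    return i * (cols * el_size) + j * el_size;
}

// Address computed by hand from the base address; for an int array this
// should be the same address the compiler produces for &a[i][j].
const int* address_of(const int* base, std::size_t i, std::size_t j,
                      std::size_t cols) {
    const char* bytes = reinterpret_cast<const char*>(base);
    return reinterpret_cast<const int*>(
        bytes + offset_of(i, j, cols, sizeof(int)));
}
```

For `int a[3][4]`, `address_of(&a[0][0], 2, 1, 4)` should match `&a[2][1]`.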

Linear Lists

• Operations which one may want to perform on a linear list of n elements include:
  – gain access to the kth element of the list to examine and/or change the contents
  – insert a new element before or after the kth element
  – delete the kth element of the list

Reference: Knuth, The Art of Computer Programming, Vol 1, Fundamental Algorithms, 3rd Ed., p. 238-9

Abstract Data Type (ADT)

• High-level definition of data types
• An ADT specifies
  – A collection of data
  – A set of operations on the data or subsets of the data
• An ADT does not specify how the operations should be implemented
• Examples
  – vector, list, stack, queue, deque, priority queue, table (map), associative array, set, graph, digraph

ADT

• A data type is a set of values and an associated set of operations
• A data type is abstract iff it is completely described by its set of operations regardless of its implementation
• This means that it is possible to change the implementation of the data type without changing its use
• The data type is thus described by a set of procedures
• These operations are the only thing that a user of the abstraction can assume

Primitive & composite types

Primitive types
• Boolean (for boolean values True/False)
• Char (for character values)
• int (for integral or fixed-precision values)
• Float (for storing real number values)
• Double (a larger size of type float)
• String (for string of chars)
• Enumerated type

Composite types
• Array
• Record (also called tuple or struct)
• Union
• Tagged union (also called a variant, variant record, discriminated union, or disjoint union)
• Plain old data structure

Abstract Data Types (ADT)

• Some common ADTs, which have proved useful in a great variety of applications, are:
  – Container, List, Set, Multiset, Map, Multimap
  – Stack, Graph, Queue, Priority queue, Double-ended queue, Double-ended priority queue
• Each of these ADTs may be defined in many ways and variants, not necessarily equivalent.

Abstract data types:

• Dictionary (key, value)
• Stack (LIFO)
• Queue (FIFO)
• Queue (double-ended)
• Priority queue (fetch highest-priority object)
• ...

Dictionary

• Container of key-element (k,e) pairs
• Required operations:
  – insert(k,e),
  – remove(k),
  – find(k),
  – isEmpty()
• May also support (when an order is provided):
  – closestKeyBefore(k),
  – closestElemAfter(k)
• Note: No duplicate keys

Some data structures for Dictionary ADT

• Unordered
  – Array
  – Sequence/list
• Ordered
  – Array
  – Sequence (Skip Lists)
  – Binary Search Tree (BST)
  – AVL trees, red-black trees
  – (2,4) Trees
  – B-Trees
• Valued
  – Hash Tables
  – Extendible Hashing

Linear data structures

Arrays
• Array • Bidirectional map • Bit array • Bit field • Bitboard • Bitmap • Circular buffer • Control table • Image • Dynamic array • Gap buffer • Hashed array tree • Heightmap • Lookup table • Matrix • Parallel array • Sorted array • Sparse array • Sparse matrix • Iliffe vector • Variable-length array

Lists
• Doubly linked list • Linked list • Self-organizing list • Skip list • Unrolled linked list • VList • Xor linked list • Zipper • Doubly connected edge list

Trees…

Binary trees
• AA tree • AVL tree • Binary search tree • Binary tree • Cartesian tree • Pagoda • Randomized binary search tree • Red-black tree • Rope • Scapegoat tree • Self-balancing binary search tree • Splay tree • T-tree • Tango tree • Threaded binary tree • Top tree • Treap • Weight-balanced tree

B-trees
• B-tree • B+ tree • B*-tree • B sharp tree • Dancing tree • 2-3 tree • 2-3-4 tree • Queap • Fusion tree • Bx-tree

Heaps
• Heap • Binary heap • Binomial heap • Fibonacci heap • AF-heap • 2-3 heap • Soft heap • Pairing heap • Leftist heap • Treap • Beap • Skew heap • Ternary heap • D-ary heap

Tries
• Trie • Radix tree • Suffix tree • Suffix array • Compressed suffix array • FM-index • Generalised suffix tree • B-trie • Judy array • X-fast trie • Y-fast trie • Ctrie

Multiway trees
• Ternary search tree • And–or tree • (a,b)-tree • Link/cut tree • SPQR-tree • Spaghetti stack • Disjoint-set data structure • Fusion tree • Enfilade • Exponential tree • Fenwick tree • Van Emde Boas tree

Space-partitioning trees
• Segment tree • Interval tree • Range tree • Bin • Kd-tree • Implicit kd-tree • Min/max kd-tree • Adaptive k-d tree • Kdb tree • Quadtree • Octree • Linear octree • Z-order • UB-tree • R-tree • R+ tree • R* tree • Hilbert R-tree • X-tree • Metric tree • Cover tree • M-tree • VP-tree • BK-tree • Bounding interval hierarchy • BSP tree • Rapidly-exploring random tree

Application-specific trees
• Syntax tree • Abstract syntax tree • Parse tree • Decision tree • Alternating decision tree • Minimax tree • Expectiminimax tree • Finger tree

Hashes, Graphs, Other

Hashes
• Bloom filter • Distributed hash table • Hash array mapped trie • Hash list • Hash table • Hash tree • Hash trie • Koorde • Prefix hash tree

Graphs
• Adjacency list • Adjacency matrix • Graph-structured stack • Scene graph • Binary decision diagram • Zero suppressed decision diagram • And-inverter graph • Directed graph • Directed acyclic graph • Propositional directed acyclic graph • Multigraph • Hypergraph

Other
• Lightmap • Winged edge • Quad-edge • Routing table • Symbol table

Lists: Array

  index:  0   1   2   3   4   ...   MAX_SIZE-1
  value:  3   6   7   5   2                       (size = 5)

Insert 8 after L[2]:

  value:  3   6   7   8   5   2                   (size = 6)

Delete last:

  value:  3   6   7   8   5                       (size = 5)

The list is described by: *array (memory address), size, MAX_SIZE

• Access i          O(1)
• Insert to end     O(1)
• Delete from end   O(1)
• Insert            O(n)
• Delete            O(n)
• Search            O(n)
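
The costs listed above come from the element shifting that an array forces on inserts and deletes in the middle. A minimal sketch follows; the struct `ArrayList`, its member names, and the capacity of 8 are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative capacity; the slides only name the constant MAX_SIZE.
constexpr std::size_t MAX_SIZE = 8;

struct ArrayList {
    int data[MAX_SIZE];
    std::size_t size = 0;

    // O(1): append at the end.
    void push_back(int x) { assert(size < MAX_SIZE); data[size++] = x; }

    // O(1): delete the last element.
    void pop_back() { assert(size > 0); --size; }

    // O(n): insert x after position k by shifting the tail one slot right.
    void insert_after(std::size_t k, int x) {
        assert(k < size && size < MAX_SIZE);
        for (std::size_t i = size; i > k + 1; --i) data[i] = data[i - 1];
        data[k + 1] = x;
        ++size;
    }

    // O(n): delete position k by shifting the tail one slot left.
    void erase(std::size_t k) {
        assert(k < size);
        for (std::size_t i = k; i + 1 < size; ++i) data[i] = data[i + 1];
        --size;
    }
};
```

Starting from 3 6 7 5 2, `insert_after(2, 8)` gives 3 6 7 8 5 2, and `pop_back()` then removes the trailing 2, matching the slide.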

Linear Lists

• Other operations on a linear list may include:
  – determine the number of elements
  – search the list
  – sort a list
  – combine two or more linear lists
  – split a linear list into two or more lists
  – make a copy of a list

Stack

• push(x) -- add to end (add to top)
• pop() -- fetch from end (top)
• O(1) in all reasonable cases :)
• LIFO – Last In, First Out

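Linked lists

[Figure: a singly linked list and a doubly linked list, each with head and tail pointers]

Linked lists: add

[Figure: adding a node to a linked list with head, tail, and size fields]

Linked lists: delete (+ garbage collection?)

[Figure: deleting a node from a linked list with head, tail, and size fields]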

Operations

• Array indexed from 0 to n – 1:

                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(1)         O(1)
  insert before or after the kth element    O(n)     O(n)         O(1)
  delete the kth element                    O(n)     O(n)         O(1)

• Singly-linked list with head and tail pointers:

                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(n)         O(1)
  insert before the kth element             O(1)     O(n)         O(n)
  insert after the kth element              O(1)     O(1)¹        O(1)
  delete the kth element                    O(1)     O(n)         O(n)

¹ under the assumption we have a pointer to the kth node, O(n) otherwise

Improving Run-Time Efficiency

• We can improve the run-time efficiency of a linked list by using a doubly-linked list:

  [Figure: a singly-linked list and a doubly-linked list]

  – Improvements at operations requiring access to the previous node
  – Increases memory requirements...

Improving Efficiency

• Array indexed from 0 to n – 1:

                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(1)         O(1)
  insert before or after the kth element    O(n)     O(n)         O(1)
  delete the kth element                    O(n)     O(n)         O(1)

• Singly-linked list with head and tail pointers:

                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(n)         O(1)
  insert before the kth element             O(1)     O(n)         O(n)
  insert after the kth element              O(1)     O(1)¹        O(1)
  delete the kth element                    O(1)     O(n)         O(n)

• Doubly linked list:

                                            k = 1    1 < k < n    k = n
  access/change the kth element             O(1)     O(n)         O(1)
  insert before or after the kth element    O(1)     O(1)¹        O(1)
  delete the kth element                    O(1)     O(1)¹        O(1)

¹ under the assumption we have a pointer to the kth node, O(n) otherwise

Introduction to linked lists

• Consider the following struct definition

    struct node {
        string word;
        int num;
        node *next;   // pointer for the next node
    };
    node *p = new node;

[Figure: p points to a new node whose num, word, and next fields are still uninitialized (?)]

Introduction to linked lists: inserting a node

    node *p;
    p = new node;
    p->num = 5;
    p->word = "Ali";
    p->next = NULL;

[Figure: p points to a node with num = 5, word = "Ali", and next = NULL]

Introduction to linked lists: adding a new node

• How can you add another node that is pointed to by p->next?

    node *p;
    p = new node;
    p->num = 5;
    p->word = "Ali";
    p->next = NULL;
    node *q;

[Figure: p points to the node (5, "Ali"); q is declared but does not point to a node yet]

Introduction to linked lists

First, allocate a second node through q:

    node *p;
    p = new node;
    p->num = 5;
    p->word = "Ali";
    p->next = NULL;

    node *q;
    q = new node;

[Figure: p points to (5, "Ali"); q points to a new node whose fields are uninitialized (?)]

Then fill in the new node:

    q->num = 8;
    q->word = "Veli";

[Figure: q points to (8, "Veli"); the two nodes are not linked yet]

Finally, link the two nodes:

    p->next = q;
    q->next = NULL;

[Figure: p points to (5, "Ali"), whose next points to (8, "Veli"), whose next is NULL; q points to the second node]
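
The slide sequence above can be collected into a small self-contained program fragment. This is a sketch under assumptions: the helpers `make_node`, `length`, and `destroy` are hypothetical names added for illustration, and `nullptr` is used in place of `NULL`.

```cpp
#include <string>

struct node {
    std::string word;
    int num;
    node* next;   // pointer to the next node, nullptr at the end
};

// Hypothetical helper: allocate a node and place it in front of `rest`.
node* make_node(int num, const std::string& word, node* rest = nullptr) {
    return new node{word, num, rest};
}

// Walk the chain and count the nodes: O(n).
int length(const node* p) {
    int n = 0;
    for (; p != nullptr; p = p->next) ++n;
    return n;
}

// Release every node; standard C++ has no garbage collector.
void destroy(node* p) {
    while (p != nullptr) {
        node* next = p->next;
        delete p;
        p = next;
    }
}
```

Building the slide's list then reads `node* q = make_node(8, "Veli"); node* p = make_node(5, "Ali", q);`, and `destroy(p)` frees both nodes.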

Pointers in C/C++

    p = new node;   delete p;
    p = new node[20];

    p = malloc(sizeof(node));   free(p);

    p = malloc(sizeof(node) * 20);
    (p + 10)->next = NULL;   /* 11th element */

Book-keeping

• malloc, new: "remember" what has been created, so that it can be released with free(p) (C) or delete (C++)
• When you need many small areas to be allocated, reserve a big chunk (array) and maintain your own set of free objects
• Elements of an array of objects can be pointed to by a pointer to an object.

Object

• Object = new object_type;
• Equals creating a new object with the necessary size of allocated memory (delete can free it)

Some links

• http://en.wikipedia.org/wiki/Pointer
• Pointer basics: http://cslibrary.stanford.edu/106/
• C++ Memory Management: What is the difference between malloc/free and new/delete?
  – http://www.codeguru.com/forum/showthread.php?t=401848

Alternative: arrays and integers

• If you want to test pointers, linked lists, and similar data structures, but are not familiar with pointers (yet)
• Use arrays and indexes to array elements instead…

Replacing pointers with array index

  index:  1   2   3   4   5   6   7
  next:   /       5       1
  key:    7       8       4
  prev:   5       /       3

head = 3; the list is 8, 4, 7 ("/" marks the end of a chain)

Maintaining list of free objects

  index:  1   2   3   4   5   6   7
  next:   /   4   5   /   1   7   2
  key:    7       8       4
  prev:   5       /       3

head = 3 (list 8, 4, 7)
free = 6 (free slots 6, 7, 2, 4)
free = -1 => array is full

allocate object:
    new = free;
    free = next[free];

free object x:
    next[x] = free;
    free = x;
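
The allocate and free steps above can be tried out without any pointers. A minimal sketch follows; the struct name `Pool`, the constants `NIL` and `CAP`, and the helper `push_front` are assumptions for illustration. Slot 0 is left unused so that indices match the 1..7 numbering on the slide.

```cpp
#include <cassert>

constexpr int NIL = -1;   // plays the role of "/" on the slide
constexpr int CAP = 7;

struct Pool {
    int next[CAP + 1], key[CAP + 1], prev[CAP + 1];
    int free_head;

    Pool() {
        // Initially every slot is free: 1 -> 2 -> ... -> CAP -> NIL.
        for (int i = 1; i <= CAP; ++i) next[i] = (i < CAP) ? i + 1 : NIL;
        free_head = 1;
    }

    // allocate object: take the first slot off the free list.
    int allocate() {
        assert(free_head != NIL);   // NIL here means the array is full
        int x = free_head;
        free_head = next[x];
        return x;
    }

    // free object x: push the slot back onto the free list.
    void release(int x) {
        next[x] = free_head;
        free_head = x;
    }

    // Insert key k at the front of the list that starts at `head`;
    // returns the new head index.
    int push_front(int head, int k) {
        int x = allocate();
        key[x] = k;
        prev[x] = NIL;
        next[x] = head;
        if (head != NIL) prev[head] = x;
        return x;
    }
};
```

Several lists can share one `Pool`, each with its own head index, which is the situation the next slide describes.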

Multiple lists, single free list

  index:  1   2   3   4   5   6   7
  next:   /   4   5   /   1   7   /
  key:    7       8       4   3   9
  prev:   5       /       3   /   6

head1 = 3 => 8, 4, 7
head2 = 6 => 3, 9
free = 2 (2 free slots: 2, 4)

Hack: allocate more arrays…

AA:
   1   2   3   4   5   6   7
   8   9  10  11  12  13  14
  15  16  17  18  19  20  21

LIST(i) = AA[ (i-1)/7 ][ (i-1) % 7 ]    (use integer division and mod)

LIST(10) = AA[ 1 ][ 2 ]
LIST(19) = AA[ 2 ][ 4 ]

Queue

• enqueue(x) - add to end
• dequeue() - fetch from beginning
• FIFO – First In, First Out
• O(1) in all reasonable cases :)

Queue (FIFO)

[Figure: array holding 3 6 7 5 2, with F at the first element and L just past the last]

Queue (basic idea, does not contain all controls!)

[Figure: array 3 6 7 5 2 with markers F, L, and MAX_SIZE-1; after removing 3 and 6 from the front the array holds 7 5 2 and F has moved right]

First = List[F]
Last = List[L-1]
Pop_first : { return List[F++] }
Pop_last : { return List[--L] }
Full: return ( L == MAX_SIZE )
Empty: F < 0 or F >= L

Circular buffer

• A circular buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end. This structure lends itself easily to buffering data streams.

Circular Queue

[Figure: circular array in which the queue 7 5 2 3 6 wraps around the end of the array at MAX_SIZE-1; F marks the first element, L the slot after the last]

First = List[F]
Last = List[ (L-1+MAX_SIZE) % MAX_SIZE ]
Full: return ( (L+1) % MAX_SIZE == F )
Empty: F == L
Add_to_end( x ) : { List[L] = x ; L = (L+1) % MAX_SIZE }   // % = modulo
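
The circular queue rules above translate almost line by line into code. This is a minimal sketch; the struct name `CircularQueue` and the `pop_first` operation (the circular counterpart of Pop_first from the earlier slide) are assumptions. One slot stays unused so that full and empty can be told apart.

```cpp
#include <cassert>
#include <cstddef>

// F indexes the first element, L the slot after the last one.
template <std::size_t MAX_SIZE>
struct CircularQueue {
    int list[MAX_SIZE];
    std::size_t F = 0, L = 0;

    bool empty() const { return F == L; }
    bool full() const { return (L + 1) % MAX_SIZE == F; }

    void add_to_end(int x) {
        assert(!full());
        list[L] = x;
        L = (L + 1) % MAX_SIZE;   // wrap around at the end of the array
    }

    int pop_first() {
        assert(!empty());
        int x = list[F];
        F = (F + 1) % MAX_SIZE;
        return x;
    }
};
```

With `CircularQueue<4>`, three elements fill the queue; after one `pop_first()` a further `add_to_end()` reuses slot 3 and wraps L back to 0.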

Stack

• push(x) -- add to end (add to top)
• pop() -- fetch from end (top)
• O(1) in all reasonable cases :)
• LIFO – Last In, First Out

Stack based languages

• Implement a postfix calculator
  – Reverse Polish notation
• 5 4 3 * 2 - +  =>  5+((4*3)-2)
• Very simple to parse and interpret
• FORTH, Postscript are stack-based languages
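
A postfix calculator of the kind mentioned above needs little more than a stack. The sketch below is restricted, as an assumption, to single-digit operands and the operators + - * /; the function name `eval_postfix` is hypothetical and error handling is deliberately minimal.

```cpp
#include <cassert>
#include <cctype>
#include <stack>
#include <string>

// Evaluate a postfix expression; spaces between tokens are skipped.
int eval_postfix(const std::string& expr) {
    std::stack<int> st;
    for (char c : expr) {
        if (std::isspace(static_cast<unsigned char>(c))) continue;
        if (std::isdigit(static_cast<unsigned char>(c))) {
            st.push(c - '0');           // operand: push its value
            continue;
        }
        assert(st.size() >= 2);
        int b = st.top(); st.pop();     // right operand is on top
        int a = st.top(); st.pop();
        switch (c) {
            case '+': st.push(a + b); break;
            case '-': st.push(a - b); break;
            case '*': st.push(a * b); break;
            case '/': st.push(a / b); break;
            default:  assert(false && "unknown operator");
        }
    }
    assert(st.size() == 1);
    return st.top();
}
```

For the slide's example, `eval_postfix("5 4 3 * 2 - +")` computes 4*3 = 12, then 12-2 = 10, then 5+10 = 15.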

Array based stack

• How to know how big a stack shall ever be?
• When full, allocate a bigger table dynamically, and copy all previous values there
• O(n)?

  3 6 7 5        (full)
  3 6 7 5 2      (copied into a bigger table, then 2 is added)

• When full, create a 2x bigger table, copy the previous n elements:
• After every 2^k insertions, perform an O(n) copy
• O(n) individual insertions +
• n/2 + n/4 + n/8 … copying
• Total: O(n) effort!

What about deletions?

• when n = 32 -> 33 (copy 32, insert 1)
• delete: 33 -> 32
  – should you shrink the table immediately?
  – Shrink only when it becomes less than 1/4th full
  – Have to delete at least n/2 to decrease
  – Have to add at least n to increase size
  – Most operations, O(1) effort
  – But few operations take O(n) to copy
  – For any m operations, O(m) time
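
The doubling and quarter-full rules above can be observed directly by counting how many elements get copied. A minimal sketch, assuming an initial capacity of 1; the struct `GrowingStack` and its `copies` counter are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Array-based stack that doubles when full and halves when less than
// a quarter full, counting every element moved during a resize.
struct GrowingStack {
    int* data = new int[1];
    std::size_t size = 0, capacity = 1;
    std::size_t copies = 0;   // total elements moved by resizes

    GrowingStack() = default;
    GrowingStack(const GrowingStack&) = delete;             // keep the sketch simple
    GrowingStack& operator=(const GrowingStack&) = delete;
    ~GrowingStack() { delete[] data; }

    void resize(std::size_t new_cap) {
        int* table = new int[new_cap];
        for (std::size_t i = 0; i < size; ++i) table[i] = data[i];
        copies += size;
        delete[] data;
        data = table;
        capacity = new_cap;
    }

    void push(int x) {
        if (size == capacity) resize(2 * capacity);   // grow when full
        data[size++] = x;
    }

    int pop() {
        int x = data[--size];
        // shrink only below 1/4 full, so push/pop at a boundary cannot thrash
        if (capacity > 1 && size < capacity / 4) resize(capacity / 2);
        return x;
    }
};
```

After n pushes the counter stays below 2n, which is the "Total: O(n) effort" claim in concrete form.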

Amortized analysis

• Analyze the time complexity over the entire "lifespan" of the algorithm
• Some operations that cost more will be "covered" by many other operations taking less

Lists and dictionary ADT…

• How to maintain a dictionary using (linked) lists?
• Is k in D?
  – go through all elements d of D, test if d == k: O(n)
  – If sorted: d = first(D); while (d < k) d = next(D); then test if d == k
  – on average n/2 tests…
• Add(k,D) => insert(k,D) = O(1) or O(n)
  – test for uniqueness

Array based sorted list

• is d in D?
• Binary search in D

[Figure: sorted array with markers low, mid, high]

Binary search – recursive

    BinarySearch(A[0..N-1], value, low, high) {
        if (high < low)
            return -1    // not found
        mid = (low + high) / 2
        if (A[mid] > value)
            return BinarySearch(A, value, low, mid-1)
        else if (A[mid] < value)
            return BinarySearch(A, value, mid+1, high)
        else
            return mid   // found
    }

Binary search – recursive

    BinarySearch(A[0..N-1], value, low, high) {
        if (high < low)
            return -1    // not found
        mid = low + ((high - low) / 2)    // Note: not (low + high) / 2 !!
        if (A[mid] > value)
            return BinarySearch(A, value, low, mid-1)
        else if (A[mid] < value)
            return BinarySearch(A, value, mid+1, high)
        else
            return mid   // found
    }

Binary search – iterative

    BinarySearch(A[0..N-1], value) {
        low = 0; high = N - 1;
        while (low <= high) {
            mid = low + ((high - low) / 2)    // Note: not (low + high) / 2 !!
            if (A[mid] > value)
                high = mid - 1
            else if (A[mid] < value)
                low = mid + 1
            else
                return mid   // found
        }
        return -1    // not found
    }
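
The iterative pseudocode above maps directly onto C++. A minimal sketch, assuming a sorted `std::vector<int>`; the function name `binary_search_index` is hypothetical and chosen to avoid clashing with `std::binary_search`.

```cpp
#include <vector>

// Returns the index of `value` in the sorted vector `a`, or -1 if absent.
int binary_search_index(const std::vector<int>& a, int value) {
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   // avoids overflow of low + high
        if (a[mid] > value)      high = mid - 1;
        else if (a[mid] < value) low = mid + 1;
        else                     return mid;   // found
    }
    return -1;   // not found
}
```

The midpoint is written as `low + (high - low) / 2` because `low + high` can exceed the range of `int` for very large arrays, which is the point of the "Note" in the pseudocode.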

Work performed

[Figure: sorted array with positions 1 to 36; the first probe is at position 18]

• x <=> A[18] ?  <
• x <=> A[9] ?   >
• x <=> A[13] ?  ==
• O(lg n)

Sorting

• given a list, arrange values so that L[1] <= L[2] <= … <= L[n]
• n elements => n! possible orderings
• One test L[i] <= L[j] can divide the n! orderings into 2 groups
  – Make a binary tree and calculate the depth
• log(n!) = Ω(n log n)
• Hence, the lower bound for sorting is Ω(n log n)
  – using comparisons…

Decision tree model

• n! orderings (leaves)
• Height of such a tree?

Proof: log(n!) = Ω(n log n)

• log(n!) = log n + log(n-1) + log(n-2) + … + log(1)
          >= n/2 * log(n/2)
          = Ω(n log n)

Half of the terms are at least log(n/2).

The divide-and-conquer design paradigm

1. Divide the problem (instance) into subproblems.
2. Conquer the subproblems by solving them recursively.
3. Combine subproblem solutions.

Merge sort

    Merge-Sort(A, p, r)
      if p < r
        then q = (p+r)/2    // floor
             Merge-Sort(A, p, q)
             Merge-Sort(A, q+1, r)
             Merge(A, p, q, r)

It was invented by John von Neumann in 1945.

Example

• Applying the merge sort algorithm:

Merge of two lists: Θ(n)

    A, B – lists to be merged
    L = new list;    // empty
    while (A not empty and B not empty)
      if A.first() <= B.first()
        then append(L, A.first()); A = rest(A);
        else append(L, B.first()); B = rest(B);
    append(L, A);    // all remaining elements of A
    append(L, B);    // all remaining elements of B
    return L
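
The merge step and the recursive sort above can be sketched together. This version, as an assumption made for brevity, works on `std::vector<int>` and returns new vectors instead of sorting in place; the names `merge_lists` and `merge_sort` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Merge two sorted vectors into a new sorted vector: Theta(n).
std::vector<int> merge_lists(const std::vector<int>& a,
                             const std::vector<int>& b) {
    std::vector<int> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] <= b[j]) out.push_back(a[i++]);   // <= keeps the sort stable
        else              out.push_back(b[j++]);
    }
    while (i < a.size()) out.push_back(a[i++]);    // remaining elements of a
    while (j < b.size()) out.push_back(b[j++]);    // remaining elements of b
    return out;
}

// Divide in half, conquer each half recursively, combine by merging.
std::vector<int> merge_sort(const std::vector<int>& v) {
    if (v.size() <= 1) return v;
    std::size_t q = v.size() / 2;
    std::vector<int> left(v.begin(), v.begin() + q);
    std::vector<int> right(v.begin() + q, v.end());
    return merge_lists(merge_sort(left), merge_sort(right));
}
```

The extra vectors make the additional space needed for merging arrays visible; an in-place variant is a different and harder exercise.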

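[Figure: merge sort visualization from Wikipedia; axes: value vs. position in array]

Run-time Analysis of Merge Sort

• Thus, the time required to sort an array of size n > 1 is:
  – the time required to sort the first half,
  – the time required to sort the second half, and
  – the time required to merge the two lists
• That is:

\[
T(n) = \begin{cases} \Theta(1) & n = 1 \\ 2\,T\!\left(\frac{n}{2}\right) + \Theta(n) & n > 1 \end{cases}
\]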

Merge sort

• Worst case, average case, best case… Θ(n log n)
• Common wisdom:
  – Requires additional space for merging (in case of arrays)
• Homework*: develop an in-place merge of two lists implemented in arrays /compare speed/

[Figure: in-place merge with pointers a and b; the two cases are L[a] <= L[b] and L[a] > L[b]]

Quicksort

• Proposed by C.A.R. Hoare in 1962.
• Divide-and-conquer algorithm.
• Sorts "in place" (like insertion sort, but not like merge sort).
• Very practical (with tuning).

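Partitioning version 2

    pivot = A[R];
    i = L; j = R-1;
    while (i <= j)
      while (A[i] < pivot) i++;               // will stop at pivot latest
      while (i <= j and A[j] >= pivot) j--;
      if (i < j) { swap(A[i], A[j]); i++; j--; }
    A[R] = A[i]; A[i] = pivot;
    return i;

[Figure: array A[L..R] with the pivot at R; i scans from the left and j from the right, gathering elements <= pivot on the left and elements > pivot on the right. See also the Wikipedia "video" animation.]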
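
The partitioning pseudocode above can be wrapped into a complete quicksort. A minimal sketch on `std::vector<int>` with the last element as pivot; the names `partition_v2` and `quicksort` are hypothetical, and no pivot-selection tuning is included.

```cpp
#include <utility>
#include <vector>

// Partition A[L..R] around pivot A[R]; returns the pivot's final index.
int partition_v2(std::vector<int>& A, int L, int R) {
    int pivot = A[R];
    int i = L, j = R - 1;
    while (i <= j) {
        while (A[i] < pivot) ++i;                 // stops at the pivot at the latest
        while (i <= j && A[j] >= pivot) --j;
        if (i < j) { std::swap(A[i], A[j]); ++i; --j; }
    }
    A[R] = A[i];     // move the first element >= pivot to the end
    A[i] = pivot;    // and put the pivot between the two regions
    return i;
}

// Sort A[L..R] in place.
void quicksort(std::vector<int>& A, int L, int R) {
    if (L >= R) return;
    int p = partition_v2(A, L, R);
    quicksort(A, L, p - 1);
    quicksort(A, p + 1, R);
}
```

A call such as `quicksort(v, 0, static_cast<int>(v.size()) - 1)` sorts the whole vector; on already sorted input this pivot choice degrades to quadratic time.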

Choice of pivot in Quicksort

• Select median of three…
• Select random – opponent cannot choose the winning strategy against you!

http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/detail/embed04.htm

Random pivot

• Select the pivot randomly from the region (blue) and swap it with the last position
• Select the pivot as a median of 3 [or more] random values from the region
• Apply a non-recursive sort for arrays smaller than 10-20 elements

[Figure: region A[L..R] with scan positions i and j; elements <= pivot on the left, elements > pivot on the right]

• 2-pivot version of Quicksort
  – (split in 3 regions!)
• Handle equal values (equal to pivots)


Alternative materials

• Quicksort average case analysis http://eid.ee/10z
• https://secweb.cs.odu.edu/~zeil/cs361/web/website/Lectures/quick/pages/ar01s05.html
• http://eid.ee/10y - MIT OpenCourseware - Asymptotic notation, Recurrences, Substitution, Master Method

[Figure: recursion tree with root f(n) and children f(n/b); the Master Method compares n^(log_b a) vs f(n)]

Back to sorting

We can sort in O(n log n)

• Is that the best we can do?
• Remember: using comparisons <, >, <=, >= we cannot do better than O(n log n)

How fast can we sort n integers?

• Sort people by their sex? (F/M, 0/1)
• Sort people by year of birth?

Radix sort

    Radix-Sort(A, d)
      for i = 1 to d    /* least significant to most significant */
        use a stable sort to sort A on digit i

Radix sort using lists (stable)

Input: bba bbb adb aad aac ccc ccb aba cca

1. Distribute by the last character:
   a: bba aba cca
   b: bbb adb ccb
   c: aac ccc
   d: aad

2. Collect the lists in order and distribute by the middle character:
   a: aac aad
   b: bba aba bbb
   c: cca ccb ccc
   d: adb

3. Collect again and distribute by the first character:
   a: aac aad aba adb
   b: bba bbb
   c: cca ccb ccc
   d:

Result: aac aad aba adb bba bbb cca ccb ccc
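
The list-based passes above can be sketched as follows. Assumptions made for this illustration: all words have the same length `width`, the alphabet is the lowercase letters 'a' to 'z', and the function name `radix_sort` is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// LSD radix sort: one stable distribution pass per character position,
// starting from the last character.
void radix_sort(std::vector<std::string>& words, std::size_t width) {
    for (std::size_t pos = width; pos-- > 0; ) {
        std::vector<std::vector<std::string>> buckets(26);
        for (const std::string& w : words)
            buckets[w[pos] - 'a'].push_back(w);    // appending keeps the pass stable
        words.clear();
        for (const auto& bucket : buckets)         // collect lists a, b, c, ...
            words.insert(words.end(), bucket.begin(), bucket.end());
    }
}
```

Stability of each pass is what lets the earlier (less significant) passes survive the later ones.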

Why not from left to right?

• Swap '0' with first '1'
• Idea 1: recursively sort first and second half
  – Exercise?

01011001001010111100010010010010010100100101010000010000

01011000010010111100010010011001010100100101010000010000

01011000010010010100010010011001010100100111110000010000

01011000010010010100000100001001010100100111110001001001

Bitwise sort left to right

• Idea 2:
  – swap elements only if the prefixes match…
  – For all bits from most significant
    • advance when 0
    • when 1 -> look for next 0
      – if prefix matches, swap
      – otherwise keep advancing on 0's and look for next 1

Bitwise left to right sort

/* Historical sorting – was used in Univ. of Tartu using assembler…. */
/* C implementation – Jaak Vilo, 1989 */
/* SORTTYPE is assumed to be an unsigned integer type */

void bitwisesort(SORTTYPE *ARRAY, int size)
{
    int i, j, nrbits;
    SORTTYPE tmp;
    register SORTTYPE mask, curbit, group;

    nrbits = sizeof(SORTTYPE) * 8;

    curbit = (SORTTYPE)1 << (nrbits - 1);  /* set most significant bit 1 */
    mask = 0;                              /* mask of the already sorted area */

    do {                                   /* For each bit */
        i = 0;

    new_mask:
        for ( ; (i < size) && (!(ARRAY[i] & curbit)); i++)
            ;                              /* Advance while bit == 0 */
        if (i >= size) goto array_end;
        group = ARRAY[i] & mask;           /* Save current prefix snapshot */

        j = i;                             /* memorize location of 1 */
        for (;;) {
            if (++i >= size) goto array_end;               /* reached end of array */
            if ((ARRAY[i] & mask) != group) goto new_mask; /* new prefix */

            if (!(ARRAY[i] & curbit)) {
                /* bit is 0 – need to swap with previous location of 1, A[i] <-> A[j] */
                tmp = ARRAY[i]; ARRAY[i] = ARRAY[j]; ARRAY[j] = tmp;
                j += 1;                    /* swap and increase j to the next possible 1 */
            }
        }

    array_end:
        mask = mask | curbit;              /* area under mask is now sorted */
        curbit >>= 1;                      /* next bit */
    } while (curbit);                      /* until all bits have been sorted… */
}

Jaak Vilo, Univ. of Tartu

Bitwise from left to right

• Swap '0' with first '1'

00100000010010010100001011001001010100100110010011111000

Bucket sort

• Assume uniform distribution
• Allocate O(n) buckets
• Assign each value to pre-assigned bucket
• Sort small buckets with insertion sort

Input: .78 .17 .39 .26 .72 .94 .21 .12 .23 .68

  bucket 0: /
  bucket 1: .12 .17
  bucket 2: .21 .23 .26
  bucket 3: .39
  bucket 4: /
  bucket 5: /
  bucket 6: .68
  bucket 7: .72 .78
  bucket 8: /
  bucket 9: .94
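
The bucket layout above follows from mapping each value x in [0, 1) to bucket floor(n * x). A minimal sketch under that assumption; the function name `bucket_sort` is hypothetical, and each bucket is kept sorted by one insertion-sort step per element.

```cpp
#include <cstddef>
#include <vector>

// Bucket sort for values in [0, 1), assuming a roughly uniform distribution.
std::vector<double> bucket_sort(const std::vector<double>& a) {
    const std::size_t n = a.size();
    std::vector<std::vector<double>> buckets(n);
    for (double x : a) {
        std::size_t b = static_cast<std::size_t>(x * n);   // pre-assigned bucket
        std::vector<double>& bk = buckets[b];
        bk.push_back(x);
        // insertion sort step: sink x to its sorted position in the bucket
        for (std::size_t i = bk.size() - 1; i > 0 && bk[i - 1] > bk[i]; --i) {
            double t = bk[i]; bk[i] = bk[i - 1]; bk[i - 1] = t;
        }
    }
    std::vector<double> out;
    out.reserve(n);
    for (const auto& bk : buckets)                          // concatenate buckets
        out.insert(out.end(), bk.begin(), bk.end());
    return out;
}
```

With n = 10 and the slide's input, .78 lands in bucket 7 and .12 in bucket 1, as in the table above. If the distribution is far from uniform, most values share a few buckets and the insertion sort dominates the running time.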

http://sortbenchmark.org/

• The sort input records must be 100 bytes in length, with the first 10 bytes being a random key
• Minutesort – max amount sorted in 1 minute
  – 116 GB in 58.7 sec (Jim Wyllie, IBM Research)
  – 40-node 80-Itanium cluster, SAN array of 2,520 disks
• 2009, 500 GB Hadoop 1406 nodes x (2 Quadcore Xeons, 8 GB memory, 4 SATA), Owen O'Malley and Arun Murthy, Yahoo Inc.
• Performance/Price Sort and PennySort

Sort Benchmark

• http://sortbenchmark.org/
• Sort Benchmark Home Page
• We have a new benchmark called GraySort, in memory of the father of the sort benchmarks, Jim Gray. It replaces TeraByteSort which is now retired.
• Unlike 2010, we will not be accepting early entries for the 2011 year. The deadline for submitting entries is April 1, 2011.
  – All hardware used must be off-the-shelf and unmodified.
  – For Daytona cluster sorts where input sampling is used to determine the output partition boundaries, the input sampling must be done evenly across all input partitions.

New rules for GraySort:
• The input file size is now minimum ~100 TB or 1T records. Entries with larger input sizes also qualify.
• The winner will have the fastest SortedRecs/Min.
• We now provide a new input generator that works in parallel and generates binary data. See below.
• For the Daytona category, we have two new requirements. (1) The sort must run continuously (repeatedly) for a minimum 1 hour. (This is a minimum reliability requirement). (2) The system cannot overwrite the input file.

Order statistics

• Minimum – the smallest value
• Maximum – the largest value
• In general, the i'th value.
• Find the median of the values in the array
• Median in sorted array A:
  – n is odd: A[(n+1)/2]
  – n is even: A[⌊(n+1)/2⌋] or A[⌈(n+1)/2⌉]

Order statistics

• Input: A set A of n numbers and i, 1 ≤ i ≤ n
• Output: x from A that is larger than exactly i-1 elements of A

Q: Find i'th value in unsorted data

A. O(n)
B. O(n log log n)
C. O(n log n)
D. O(n log² n)

Minimum

    Minimum(A)
      min = A[1]
      for i = 2 to length(A)
        if min > A[i]
          then min = A[i]
      return min

n-1 comparisons.

Min and max together

• compare every two elements A[i], A[i+1]
• Compare larger against current max
• Smaller against current min
• 3n/2 comparisons

Selection in expected O(n)

    Randomised-Select(A, p, r, i)
      if p = r then return A[p]
      q = Randomised-Partition(A, p, r)
      k = q – p + 1    // nr of elements in subarr
      if i <= k
        then return Randomised-Select(A, p, q, i)
        else return Randomised-Select(A, q+1, r, i-k)
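
Selection by random partitioning can be sketched as below. Note the assumptions: this version uses a Lomuto-style partition that places the pivot at its final index q and then continues in A[p..q-1] or A[q+1..r], which differs slightly from the pseudocode above (which recurses on A[p..q] and A[q+1..r]); it is written as a loop rather than recursion, and the random generator is passed in by the caller.

```cpp
#include <random>
#include <utility>
#include <vector>

// Return the i-th smallest element (i is 1-based) of A[p..r]; expected O(n).
int randomised_select(std::vector<int>& A, int p, int r, int i,
                      std::mt19937& rng) {
    while (p < r) {
        std::uniform_int_distribution<int> pick(p, r);
        std::swap(A[pick(rng)], A[r]);            // random pivot moved to the end
        int pivot = A[r], q = p;
        for (int j = p; j < r; ++j)
            if (A[j] < pivot) std::swap(A[j], A[q++]);
        std::swap(A[q], A[r]);                    // pivot is now at index q
        int k = q - p + 1;                        // rank of the pivot within A[p..r]
        if (i == k) return A[q];
        if (i < k) r = q - 1;                     // answer lies left of the pivot
        else { p = q + 1; i -= k; }               // answer lies right of the pivot
    }
    return A[p];
}
```

Only one side of each partition is explored, which is why the expected cost is linear rather than O(n log n).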

Conclusion

• Sorting in general O(n log n)
• Quicksort is rather good
• Linear time sorting is achievable when one does not assume only direct comparisons
• Find i'th value in unsorted data – expected O(n)
• Find i'th value: worst case O(n) – see CLRS

Ok…

• lists – a versatile data structure for various purposes
• Sorting – a typical algorithm (many ways)
• Which sorting methods for array/list?
• Array: most of the important (e.g. update) tasks seem to be O(n), which is bad

Q: search for a value X in linked list?

A. O(1)
B. O(log n)
C. O(n)

Can we search faster in linked lists?

• Why sort linked lists if search is O(n) anyway?
• Linked lists:
  – what is the "mid-point" of any sublist?
  – Therefore, binary search cannot be used…
• Or can it?

Skip lists

• Build several lists at different "skip" steps
• O(n) list
• Level 1: ~ n/2
• Level 2: ~ n/4
• …
• Level log n: ~ 2-3 elements…

(Skip Lists slides, 9/14/17 1:33 PM)

Skip Lists

  S3: -∞ +∞
  S2: -∞ 15 +∞
  S1: -∞ 15 23 +∞
  S0: -∞ 10 15 23 36 +∞

Skip List

    typedef struct nodeStructure *node;
    struct nodeStructure {
        keyType key;
        valueType value;
        node forward[1];   /* variable sized array of forward pointers */
    };

What is a Skip List

• A skip list for a set S of distinct (key, element) items is a series of lists S0, S1, …, Sh such that
  – Each list Si contains the special keys +∞ and -∞
  – List S0 contains the keys of S in nondecreasing order
  – Each list is a subsequence of the previous one, i.e., S0 ⊇ S1 ⊇ … ⊇ Sh
  – List Sh contains only the two special keys
• We show how to use a skip list to implement the dictionary ADT

  S3: -∞ +∞
  S2: -∞ 31 +∞
  S1: -∞ 23 31 34 64 +∞
  S0: -∞ 12 23 26 31 34 44 56 64 78 +∞

Illustration of lists

[Figure: a skip list between the sentinels -inf and inf; keys shown include 25, 30, 47, 99]

• Key
• Right[ .. ] - right links in array
• Left[ .. ] - left links in array
• array size – how high is the list

Search

• We search for a key x in a skip list as follows:
  – We start at the first position of the top list
  – At the current position p, we compare x with y ← key(after(p))
      x = y: we return element(after(p))
      x > y: we "scan forward"
      x < y: we "drop down"
  – If we try to drop down past the bottom list, we return NO_SUCH_KEY
• Example: search for 78

  S3: -∞ +∞
  S2: -∞ 31 +∞
  S1: -∞ 23 31 34 64 +∞
  S0: -∞ 12 23 26 31 34 44 56 64 78 +∞

Randomized Algorithms

• A randomized algorithm performs coin tosses (i.e., uses random bits) to control its execution
• It contains statements of the type

    b ← random()
    if b = 0
      do A …
    else { b = 1 }
      do B …

• Its running time depends on the outcomes of the coin tosses
• We analyze the expected running time of a randomized algorithm under the following assumptions
  – the coins are unbiased, and
  – the coin tosses are independent
• The worst-case running time of a randomized algorithm is often large but has very low probability (e.g., it occurs when all the coin tosses give "heads")
• We use a randomized algorithm to insert items into a skip list

Insertion

• To insert an item (x, o) into a skip list, we use a randomized algorithm:
  – We repeatedly toss a coin until we get tails, and we denote with i the number of times the coin came up heads
  – If i ≥ h, we add to the skip list new lists S(h+1), …, S(i+1), each containing only the two special keys
  – We search for x in the skip list and find the positions p0, p1, …, pi of the items with largest key less than x in each list S0, S1, …, Si
  – For j ← 0, …, i, we insert item (x, o) into list Sj after position pj
• Example: insert key 15, with i = 2

Before:
  S2: -∞ +∞
  S1: -∞ 23 +∞
  S0: -∞ 10 23 36 +∞

After (p0, p1, p2 mark the positions after which 15 is inserted):
  S3: -∞ +∞
  S2: -∞ 15 +∞
  S1: -∞ 15 23 +∞
  S0: -∞ 10 15 23 36 +∞
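
Search and insertion as described above can be sketched with per-node arrays of forward pointers. This is not the quad-node layout and not the slides' exact scheme; assumptions made here: keys are plain `int`, the head node stands in for -∞ and a null pointer for +∞, the tower height is capped at a hypothetical `MAX_LEVEL`, and a fixed seed is used so runs are repeatable.

```cpp
#include <climits>
#include <random>
#include <vector>

struct SkipList {
    static constexpr int MAX_LEVEL = 16;

    struct Node {
        int key;
        std::vector<Node*> forward;   // forward[j] = successor in list Sj
        Node(int k, int levels) : key(k), forward(levels, nullptr) {}
    };

    Node head{INT_MIN, MAX_LEVEL};    // sentinel playing the role of -infinity
    int height = 1;                   // number of lists currently in use
    std::mt19937 rng{12345};

    SkipList() = default;
    SkipList(const SkipList&) = delete;
    SkipList& operator=(const SkipList&) = delete;
    ~SkipList() {
        Node* p = head.forward[0];
        while (p) { Node* next = p->forward[0]; delete p; p = next; }
    }

    // Toss a coin until tails; each heads adds one level to the tower.
    int random_levels() {
        int levels = 1;
        while (levels < MAX_LEVEL && (rng() & 1u)) ++levels;
        return levels;
    }

    bool contains(int x) const {
        const Node* p = &head;
        for (int j = height - 1; j >= 0; --j)             // drop down
            while (p->forward[j] && p->forward[j]->key < x)
                p = p->forward[j];                         // scan forward
        const Node* next = p->forward[0];
        return next != nullptr && next->key == x;
    }

    void insert(int x) {
        if (contains(x)) return;                           // no duplicate keys
        Node* update[MAX_LEVEL];
        Node* p = &head;
        for (int j = MAX_LEVEL - 1; j >= 0; --j) {
            while (p->forward[j] && p->forward[j]->key < x) p = p->forward[j];
            update[j] = p;                                 // position pj in list Sj
        }
        int levels = random_levels();
        if (levels > height) height = levels;
        Node* node = new Node(x, levels);
        for (int j = 0; j < levels; ++j) {                 // splice into S0..S(levels-1)
            node->forward[j] = update[j]->forward[j];
            update[j]->forward[j] = node;
        }
    }
};
```

Inserting 23, 10, 36, and then 15 gives a bottom list 10 15 23 36, as in the example above; which keys appear in the higher lists depends on the coin tosses.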

Deletion

• To remove an item with key x from a skip list, we proceed as follows:
  – We search for x in the skip list and find the positions p0, p1, …, pi of the items with key x, where position pj is in list Sj
  – We remove positions p0, p1, …, pi from the lists S0, S1, …, Si
  – We remove all but one list containing only the two special keys
• Example: remove key 34

Before (p0, p1, p2 mark the positions of key 34):
  S3: -∞ +∞
  S2: -∞ 34 +∞
  S1: -∞ 23 34 +∞
  S0: -∞ 12 23 34 45 +∞

After:
  S2: -∞ +∞
  S1: -∞ 23 +∞
  S0: -∞ 12 23 45 +∞

Implementation v2

• We can implement a skip list with quad-nodes
• A quad-node stores:
  – item
  – link to the node before
  – link to the node after
  – link to the node below
  – link to the node above
• Also, we define special keys PLUS_INF and MINUS_INF, and we modify the key comparator to handle them

[Figure: a quad-node holding item x with four links]

Space Usage

• The space used by a skip list depends on the random bits used by each invocation of the insertion algorithm
• We use the following two basic probabilistic facts:
  Fact 1: The probability of getting i consecutive heads when flipping a coin is 1/2^i
  Fact 2: If each of n items is present in a set with probability p, the expected size of the set is np
• Consider a skip list with n items
  – By Fact 1, we insert an item in list Si with probability 1/2^i
  – By Fact 2, the expected size of list Si is n/2^i
• The expected number of nodes used by the skip list is

\[
\sum_{i=0}^{h} \frac{n}{2^i} = n \sum_{i=0}^{h} \frac{1}{2^i} < 2n
\]

• Thus, the expected space usage of a skip list with n items is O(n)

Height

• The running time of the search and insertion algorithms is affected by the height h of the skip list
• We show that with high probability, a skip list with n items has height O(log n)
• We use the following additional probabilistic fact:
  Fact 3: If each of n events has probability p, the probability that at least one event occurs is at most np
• Consider a skip list with n items
  – By Fact 1, we insert an item in list Si with probability 1/2^i
  – By Fact 3, the probability that list Si has at least one item is at most n/2^i
• By picking i = 3 log n, we have that the probability that S(3 log n) has at least one item is at most
    n/2^(3 log n) = n/n^3 = 1/n^2
• Thus a skip list with n items has height at most 3 log n with probability at least 1 - 1/n^2

Search and Update Times

• The search time in a skip list is proportional to
  – the number of drop-down steps, plus
  – the number of scan-forward steps
• The drop-down steps are bounded by the height of the skip list and thus are O(log n) with high probability
• To analyze the scan-forward steps, we use yet another probabilistic fact:
  Fact 4: The expected number of coin tosses required in order to get tails is 2
• When we scan forward in a list, the destination key does not belong to a higher list
  – A scan-forward step is associated with a former coin toss that gave tails
• By Fact 4, in each list the expected number of scan-forward steps is 2
• Thus, the expected number of scan-forward steps is O(log n)
• We conclude that a search in a skip list takes O(log n) expected time
• The analysis of insertion and deletion gives similar results

Summary

• A skip list is a data structure for dictionaries that uses a randomized insertion algorithm
• In a skip list with n items
  – The expected space used is O(n)
  – The expected search, insertion and deletion time is O(log n)
• Using a more complex probabilistic analysis, one can show that these performance bounds also hold with high probability
• Skip lists are fast and simple to implement in practice

Conclusions

• Abstract data types hide implementations
• Important is the functionality of the ADT
• Data structures and algorithms determine the speed of the operations on data
• Linear data structures provide good versatility
• Sorting – a most typical need/algorithm
• Sorting in O(n log n): Merge Sort, Quicksort
• Solving Recurrences – means to analyse
• Skip lists – log n randomised data structure