Chapter 2: Complexity Analysis


Page 1: Chapter 2: Complexity Analysis

Chapter 2: Complexity Analysis

Page 2: Chapter 2: Complexity Analysis

Objectives

Looking ahead – in this chapter, we’ll consider:
• Computational and Asymptotic Complexity
• Big-O Notation
• Properties of the Big-O Notation
• Ω and Θ Notations
• Possible Problems with the Notation

Data Structures and Algorithms in C++, Fourth Edition

Page 3: Chapter 2: Complexity Analysis

Objectives (continued)

• Examples of Complexities
• Finding Asymptotic Complexity
• Best, Average, and Worst Cases
• Amortized Complexity
• NP-Completeness


Page 4: Chapter 2: Complexity Analysis

Computational and Asymptotic Complexity

• Algorithms are an essential aspect of data structures
• Data structures are implemented using algorithms
• Some algorithms are more efficient than others
• Efficiency is preferred, so we need metrics to compare algorithms
• An algorithm’s complexity is a function describing the efficiency of the algorithm in terms of the amount of data the algorithm must process
• There are two main complexity measures of efficiency


Page 5: Chapter 2: Complexity Analysis

Computational and Asymptotic Complexity (continued)

• Time complexity describes the amount of time an algorithm takes in terms of the amount of input

• Space complexity describes the amount of memory (space) an algorithm takes in terms of the amount of input

• For both measures, we are interested in the algorithm’s asymptotic complexity

• This asks: when n (number of input items) goes to infinity, what happens to the algorithm’s performance?


Page 6: Chapter 2: Complexity Analysis

Computational and Asymptotic Complexity (continued)

• To illustrate this, consider f(n) = n² + 100n + log₁₀n + 1,000
• As the value of n increases, the importance of each term shifts until, for large n, only the n² term is significant

Fig. 2-1 The growth rate of all terms of function f(n) = n² + 100n + log₁₀n + 1,000.


Page 7: Chapter 2: Complexity Analysis

Big-O Notation

• The most commonly used notation for asymptotic complexity is "big-O" notation

• In the previous example we would say n² + 100n + log₁₀n + 1,000 = O(n²) (read "big-oh of n squared")

Definition: Let f(n) and g(n) be functions, where n is a positive integer. We write f(n) = O(g(n)) if and only if there exist a positive real number c and a positive integer N satisfying f(n) ≤ cg(n) for all n ≥ N. (And we say, "f of n is big-oh of g of n.")

• This means that functions like n² + n, 4n² − n log n + 12, n²/5 − 100n, n log n, and so forth are all O(n²)


Page 8: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• Although the definition of big-O is correct, it lacks important information

• It tells us that c and N exist, but not how to calculate them or what to do when multiple candidates exist (and they often do)

• Consider the functions f(n) = 2n² + 3n + 1 and g(n) = n²

• Clearly f(n) is O(n²); possible candidates for c and N are shown on the next slide


Page 9: Chapter 2: Complexity Analysis

Big-O Notation (continued)

Fig. 2.2 Different values of c and N for function f(n) = 2n² + 3n + 1 = O(n²), calculated according to the definition of big-O

• These values are obtained by solving the inequality from the definition of big-O: f(n) ≤ cg(n)

• Substituting for f(n) and g(n) from the previous slide, we have 2n² + 3n + 1 ≤ cn², or equivalently 2 + 3/n + 1/n² ≤ c

• Since n ≥ N, and N is a positive integer, we can start with N = 1 and substitute in either expression to obtain c


Page 10: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• Generally, we choose an N that allows one term of f to dominate the expression

• Here there are only two terms to consider, 2n² and 3n, since the last term is a constant

• As long as n is greater than 1.5, 2n² dominates the expression

• So N must be 2 or more, and then c must be at least 2 + 3/2 + 1/4 = 3.75

• This illustrates the central point of the definition of big-O: whether f(n) ≤ cg(n) holds depends on the choice of both c and N

• The choice of c depends on the choice of N, and vice versa


Page 11: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• For example, if we wanted the relationship to hold for all n starting at N = 1, c would have to be 6

• The graphs of f(n) and cg(n) are shown below:

Fig. 2.3 Comparison of functions for different values of c and N from Figure 2.2


Page 12: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• The graph of f(n) crosses the graph of each cg(n) at n = N

• This does not mean other values of c and N are not useful; we can start with any c, provided we choose the correct N

• We can also use this information to determine the big-O of a given function

• Consider the function 3n² + 4n − 2

• We need a c and N such that 3n² + 4n − 2 ≤ cn² for all n ≥ N


Page 13: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• Dividing by n² gives us 3 + 4/n − 2/n² ≤ c

• Choosing N = 1, we need to find a c such that 3 + 4 − 2 ≤ c

• We can set c = 6, so we have 3n² + 4n − 2 ≤ 6n² for all n ≥ 1

• So our function is O(n²)

• Big-O provides a formal method for expressing asymptotic upper bounds, bounding the growth of a function from above


Page 14: Chapter 2: Complexity Analysis

Big-O Notation (continued)

• Knowing where a function lies within the big-O hierarchy lets us compare it quickly with other functions

• Thus we have an idea of which algorithm has the best time performance


Page 15: Chapter 2: Complexity Analysis

Properties of Big-O Notation

• The big-O notation is awkward to work with all the time

• The following is a list of useful theorems you can use to simplify big-O calculations

• Big-O is transitive: if f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n))

• If f(n) = O(h(n)) and g(n) = O(h(n)), then f(n) + g(n) = O(h(n))

• The function an^k is O(n^k) for any a > 0

• Any kth-degree polynomial is O(n^(k+j)) for any j > 0


Page 16: Chapter 2: Complexity Analysis

Properties of Big-O Notation (continued)

• f(n) = O(g(n)) is true if lim_{n→∞} f(n)/g(n) is a constant; put another way, if f(n) grows at most as fast as a constant multiple of g(n), then f(n) = O(g(n))

• log_a n = O(log_b n) for any a, b > 1; this means that, except for a few cases, we don’t care what base our logarithms are

• Given the preceding, we can use just one base and rewrite the relationship as log_a n = O(lg n) for positive a ≠ 1, where lg n = log₂n


Page 17: Chapter 2: Complexity Analysis

Ω and Θ Notations

• Big-O only gives us an upper bound on a function

• So if we ignore constant factors and let n get big enough, we know some function will never exceed some other function

• This can give us too much freedom

• For instance, it is correct to say selection sort is O(n³), since n² is O(n³) – but O(n²) is a more meaningful upper bound

• We need a lower bound, a function that always grows more slowly than f(n), and a tight bound, a function that grows at about the same rate as f(n)

• Section 2.4 gives a good introduction to these concepts; let’s look at a different way to approach this


Page 18: Chapter 2: Complexity Analysis

Ω and Θ Notations (continued)

• Big-Ω is for lower bounds what big-O is for upper bounds

Definition: Let f(n) and g(n) be functions, where n is a positive integer. We write f(n) = Ω(g(n)) if and only if g(n) = O(f(n)). We say "f of n is omega of g of n."

• So g is a lower bound for f; past a certain n, and ignoring multiplicative constants, f will never fall below g

• Finally, theta notation combines upper bounds with lower bounds to get a tight bound

Definition: Let f(n) and g(n) be functions, where n is a positive integer. We write f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)). We say "f of n is theta of g of n."


Page 19: Chapter 2: Complexity Analysis

Ω and Θ Notations (continued)

• There are some additional theorems we can consider when we take Ω and Θ into account; the first four theorems for big-O are also true for Ω and Θ

• Replace O with Ω and "largest" with "smallest" in the fifth theorem for big-O and it remains true

• f(n) = Ω(g(n)) is true if lim_{n→∞} g(n)/f(n) is a constant

• f(n) = Θ(g(n)) is true if lim_{n→∞} f(n)/g(n) is a non-zero constant

• n^k = O((1 + ε)^n) for any positive k and ε

• This means any polynomial is bounded from above by any exponential


Page 20: Chapter 2: Complexity Analysis

Ω and Θ Notations (continued)

• So an algorithm that runs in polynomial time is (eventually) preferable to an algorithm that runs in exponential time

• (log n)^ε = O(n^k) for any positive k and ε

• This means a logarithm to any power grows more slowly than a polynomial

• So an algorithm that runs in logarithmic time is (eventually) preferable to an algorithm that runs in polynomial (or, from above, exponential) time


Page 21: Chapter 2: Complexity Analysis

Possible Problems

• All the notations we’ve considered focus on comparing algorithms designed to solve the same problem

• We still have to exercise care; it is possible at first glance to eliminate potentially useful candidate functions

• Consider again the definition of big-O: f(n) = O(g(n)) if f(n) ≤ cg(n) for all n ≥ N

• The number of values of n that violate this inequality is finite and can be reduced by proper choice of c

• But if c is extremely large, it can cause us to reject a function g even if the function itself is promising


Page 22: Chapter 2: Complexity Analysis

Possible Problems (continued)

• Consider two algorithms that solve a problem, one requiring 10⁸n steps (O(n)), the other 10n² (O(n²))

• Using big-O alone, we’d reject the second algorithm, because it grows too fast

• But that is only true if n > 10⁷; often n is much smaller, in which case the second algorithm is actually faster than the first

• So we need to consider other factors in our analysis

• A “double-O” notation has been proposed for such cases

• f is OO(g(n)) if it is O(g(n)) and the constant c is too large to be useful; in this sense, 10⁸n is OO(n)


Page 23: Chapter 2: Complexity Analysis

Examples of Complexities

• Since we examine algorithms in terms of their time and space complexity, we can classify them this way, too

• This is illustrated in the next figure

Fig. 2.4 Classes of algorithms and their execution times on a computer executing 1 million operations per second (1 sec = 10⁶ μsec = 10³ msec)


Page 24: Chapter 2: Complexity Analysis

Examples of Complexities (continued)

Fig. 2.4 (concluded)

• The class of an algorithm is the name used to refer to its big-O notation; it is a more convenient way to describe its behavior

• For example, a linear function is O(n); its time increases in direct proportion to the amount of data processed


Page 25: Chapter 2: Complexity Analysis

Examples of Complexities (continued)

• This relationship can also be expressed graphically:

Fig. 2.5 Typical functions applied in big-O estimates.

• This graph, and the previous chart, show that some algorithms have no practical application

• Even with today’s supercomputers, algorithms of cubic or higher order are impractical for large numbers of elements


Page 26: Chapter 2: Complexity Analysis

Finding Asymptotic Complexity

• As we have seen, asymptotic bounds are used to determine the time and space efficiency of algorithms

• Generally, we are interested in time complexity, which is based on assignments and comparisons in a program

• We’ll focus on assignments for the time being

• Consider a simple loop:

for (i = sum = 0; i < n; i++)
    sum = sum + a[i];

• Two assignments are executed once during initialization (sum = 0 and i = sum)

• In the loop, sum = sum + a[i] is executed n times


Page 27: Chapter 2: Complexity Analysis

Finding Asymptotic Complexity(continued)

• In addition, the i++ in the loop header is executed n times

• So there are 2 + 2n assignments in this loop’s execution, and it is O(n)

• Typically, as loops are nested, the complexity grows by a factor of n, although this isn’t always the case

• Consider:

for (i = 0; i < n; i++) {
    for (j = 1, sum = a[0]; j <= i; j++)
        sum += a[j];
    cout << "sum for subarray 0 through " << i
         << " is " << sum << endl;
}


Page 28: Chapter 2: Complexity Analysis

Finding Asymptotic Complexity(continued)

• The outer loop initializes i, then executes n times

• During each pass through the loop, the variable i is updated, and the inner loop and cout statement are executed

• The inner loop initializes j and sum each time, so the number of assignments so far is 1 + n + 2n = 1 + 3n

• The inner loop body executes i times, where i ranges from 1 to n − 1, based on the outer loop (when i is 0, it doesn’t run)

• Each time the inner loop body executes, it increments j and assigns a value to sum

• So the inner loop performs 2(1 + 2 + … + (n − 1)) = n(n − 1) assignments


Page 29: Chapter 2: Complexity Analysis

Finding Asymptotic Complexity(continued)

• The total number of assignments is then 1 + 3n + n(n − 1), which is O(1) + O(n) + O(n²) = O(n²)

• As mentioned earlier, not all loops increase complexity, so care has to be taken to analyze the processing that takes place

• However, additional complexity can be involved if the number of iterations changes during execution

• This can be the case in some of the more powerful searching and sorting algorithms


Page 30: Chapter 2: Complexity Analysis

Best, Average, and Worst Cases

• If we want to truly get a handle on the complexity of more complicated algorithms, we need to distinguish three cases:
  – Worst case – the algorithm takes the maximum number of steps
  – Best case – the algorithm takes the fewest number of steps
  – Average case – performance falls between the extremes

• For simple situations we can determine the average case by adding together the number of steps required for each input and dividing by the number of inputs

• However, this is based on each input occurring with equal probability, which isn’t always likely


Page 31: Chapter 2: Complexity Analysis

Best, Average, and Worst Cases(continued)

• To be more precise, we need to weight the number of steps that occur for a given input by the probability of that input occurring, and sum this over all inputs:

avg = Σᵢ p(inputᵢ) · steps(inputᵢ)

• In probability theory, this defines the expected value, which assumes the probabilities can be determined and their distribution known

• Because p is a probability distribution, it satisfies two constraints:
  – The function p can never be negative
  – The sum of all the probabilities is equal to 1



Page 32: Chapter 2: Complexity Analysis

Best, Average, and Worst Cases(continued)

• Consider the example of sequentially searching an unordered array to find a target value

• The best and worst cases are straightforward:
  – Best case occurs when we find the target in the first cell
  – Worst case occurs when we find the target in the last cell, or not at all (but end up searching the entire array)

• For the average case, we first have to consider the probability of finding the target

• If we assume a uniform distribution of n values, then the probability of finding the target in any one location is


1/n

Page 33: Chapter 2: Complexity Analysis

Best, Average, and Worst Cases(continued)

• So we would find the target in the first location with p = 1/n, in the second location with p = 1/n, etc.

• Since the number of steps required to get to each location is the same as the location itself, our sum becomes:

1/n · (1 + 2 + … + n) = (n + 1)/2

• Again, this is based on an equally likely chance of finding the target in any cell

• If the probabilities differ, then the computation becomes more involved


Page 34: Chapter 2: Complexity Analysis

Amortized Complexity

• In many cases, data structures are manipulated by sequences of operations

• As a consequence, operations early in the sequence can impact the performance of those later in the sequence

• To determine overall performance, we could add together the worst-case cost of each operation in the sequence to determine the result

• However, this can give very inaccurate (overly pessimistic) results

• A more useful approach is to consider the entire sequence of operations of the program as a whole

• This approach is referred to as amortized analysis


Page 35: Chapter 2: Complexity Analysis

Amortized Complexity (continued)

• This allows us to determine a worst-case bound irrespective of the inputs by looking at all of the operations

• The idea is that while some operations may be costly, they do not occur frequently enough to bias the entire program

• This is because less costly operations will outnumber the costly ones in the long run, "paying back" the program over a number of iterations

• This is particularly useful because, rather than relying on assumptions about the input, it guarantees worst-case performance


Page 36: Chapter 2: Complexity Analysis

Amortized Complexity (continued)

• Consider a dynamic array application where we double the size of the array each time it fills up

• Array reallocation may be required, so in the worst case a single insertion may be O(n)

• Yet because the remaining insertions are done in constant time, any sequence of n insertions can still be completed in O(n) time

• So the amortized time per insertion is O(n) / n = O(1)


Page 37: Chapter 2: Complexity Analysis

NP-Completeness

• Informally, a deterministic algorithm is one that behaves predictably

• Given a particular input, the underlying algorithm has only one way to decide what step to perform at any given point

• As opposed to this, a nondeterministic algorithm uses some type of operation to “guess” what to do next when a decision is made

• Consequently, it can exhibit different behaviors on different runs

• There are several ways this can happen; for example concurrent algorithms may experience race conditions


Page 38: Chapter 2: Complexity Analysis

NP-Completeness (continued)

• A decision problem is any arbitrary yes-or-no question on an infinite set of inputs

• Because of this, it can be defined equivalently as the set of inputs for which the problem returns yes

• A nondeterministic algorithm can solve a decision problem if there is a path in the decision tree of the algorithm that leads to a “yes” answer; otherwise it would answer “no”

• If the number of steps in the decision tree path to the affirmative answer is O(n^k), where n is the size of the specific problem, the algorithm is considered polynomial

• A problem that can be solved this way is called tractable


Page 39: Chapter 2: Complexity Analysis

NP-Completeness (continued)

• Problems that can be solved by a deterministic algorithm in polynomial time belong to a class P of problems

• If the problem can be solved in polynomial time by a nondeterministic algorithm, it is of class NP

• Class P problems are tractable; NP problems are tractable only if a nondeterministic algorithm is used

• Clearly P ⊆ NP, since deterministic algorithms are simply nondeterministic algorithms that never make nondeterministic decisions

• It is also generally believed P ≠ NP, although this is a famous open problem in computer science


Page 40: Chapter 2: Complexity Analysis

NP-Completeness (continued)

• The reason for this belief is the existence of NP-complete problems, which is related to the concept of reducibility

• A problem is reducible to another problem if every instance of the first can be transformed into instances of the second, using a process referred to as a reduction algorithm

• If this transformation can be done efficiently (in polynomial time), then efficient solutions of the second problem can be transformed into efficient solutions of the original problem

• A problem is NP-complete if it is NP and every other problem in NP is reducible to the problem in polynomial time

• Thus, all NP-complete problems are equivalent, computationally


Page 41: Chapter 2: Complexity Analysis

NP-Completeness (continued)

• If we can solve an NP-complete problem with a deterministic algorithm, all NP-complete problems can be solved the same way

• However, if any NP problem is intractable, so are all NP-complete problems

• The reducibility process uses an NP-complete problem to show another problem is NP-complete

• However, there has to be at least one problem that can be proven to be NP-complete through a means other than reducibility to make reduction possible
