Basic Analysis

Reading Time: 8 minutes


Algorithm analysis means predicting the computational complexity of an algorithm, that is, predicting the amount of time, space, communication bandwidth, and any other resources the algorithm requires. Most often, we are interested in the time complexity and space complexity of an algorithm.

Measuring the actual complexity of an algorithm on real hardware, however, is quite difficult. We, therefore, analyze the rate of growth of the resources required by an algorithm with respect to its input size. This gives a simple characterization of the algorithm's efficiency and allows us to compare the relative performance of alternative algorithms.

Analyzing algorithms on real hardware involves many independent factors, such as

  • the speed of the processor,
  • the size of the memory unit,
  • the load on the operating system,
  • the efficiency of the programming language.

It is, therefore, convenient to measure the computational time of an algorithm in a way that is as machine-independent as possible. Measuring efficiency based on input size helps us analyze our algorithms correctly.


To find the computational complexity of an algorithm for a particular resource, we generally

  1. Assume a model of computation on which our algorithm will run.
  2. Classify inputs of the same size by resource use: the inputs requiring the least, the most, and an average amount of the resource.
  3. Determine the computational complexity as a function of input size for each input class.
  4. Classify the obtained functions into sets of functions and compare them with those of other algorithms for each input class.
As computational time is our primary concern, we study complexity here on the basis of time, but everything applies, with minor modifications, to complexity with respect to other resources.


Time Complexity

In general, the time taken by an algorithm grows with the size of its input, so we can describe the running time of an algorithm as a function of the size of its input.

\[n \mapsto T(n), \text{ where}\]

  • $n$ is the input size: it depends on the problem being studied. For example, in sorting, the input size is the number of items in the array, whereas in multiplication, the input size is the total number of bits.

  • $T(n)$ is the running time: it is the sum of the running times of each primitive operation of the code for a particular input. Each step is assumed to take a constant amount of time.

Analysis of Insertion Sort

Let us analyze the insertion sort algorithm.

extension Array where Element: Comparable {
  mutating func insertionSorted() {
    // An empty array would trip the index arithmetic below, so bail out early.
    guard !isEmpty else { return }

    // One element is trivially already sorted, so start from the second element.
    var sortedEnd = index(after: startIndex)
    while sortedEnd != endIndex {
      var current = sortedEnd

      /*
       * Look backwards for current's position in the sorted sequence sliding each
       * element forward to make room.
       */
      repeat {
        let predecessor = index(before: current)

        // If current belongs at or after predecessor, we've found its position.
        if self[predecessor] <= self[current] {
          break
        }
        
        swapAt(current, predecessor)
        current = predecessor
      } while current != startIndex

      sortedEnd = index(after: sortedEnd)
    }
  }
}
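
For instance, the extension can be used to sort a small array in place:

var numbers = [3, 1, 5, 2, 4]
numbers.insertionSorted()
print(numbers)  // [1, 2, 3, 4, 5]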

Step 1. Assume a Model of Computation

Before analyzing, we must have a general model of the machine on which our algorithm will run. We, therefore, assume a single processor random-access machine (RAM) model of computation and understand that our algorithms will be implemented as computer programs.

In this model, it is assumed that:

  • Instructions are executed one after another, with no concurrent operations.
  • Each operation takes unit time to execute.
  • Loops and function calls are not primitive operations; they are composed of primitive operations.
  • Each memory access takes unit time, with no shortage of memory.
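
To make the unit-cost assumption concrete, here is a small sketch, not part of the original text, that runs the insertion sort above on a copy of the array and counts its comparisons and swaps, the operations we charge unit time for. The helper name insertionSortCountingSteps and its returned tuple are purely illustrative.

extension Array where Element: Comparable {
  // Hypothetical helper: sorts a copy of the array and reports how many
  // comparisons and swaps (unit-cost operations in the RAM model) were made.
  func insertionSortCountingSteps() -> (comparisons: Int, swaps: Int) {
    guard count > 1 else { return (0, 0) }

    var copy = self
    var comparisons = 0
    var swaps = 0

    var sortedEnd = copy.index(after: copy.startIndex)
    while sortedEnd != copy.endIndex {
      var current = sortedEnd
      repeat {
        let predecessor = copy.index(before: current)
        comparisons += 1
        if copy[predecessor] <= copy[current] { break }
        copy.swapAt(current, predecessor)
        swaps += 1
        current = predecessor
      } while current != copy.startIndex
      sortedEnd = copy.index(after: sortedEnd)
    }
    return (comparisons, swaps)
  }
}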

Step 2. Classify Input Based on Resource Use

Sorting a thousand numbers will always take longer than sorting five numbers. What interests us is that, even for inputs of a given size, an algorithm's running time may depend on which input of that size is given. In sorting, common scenarios include:

  • Best Case Scenario: An array is already sorted, e.g., array: [1, 2, 3, 4, 5].
  • Worst Case Scenario: An array is reverse sorted, e.g., array: [5, 4, 3, 2, 1].
  • Average Case Scenario: An array is in random order, e.g., array: [3, 1, 5, 2, 4].
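
Using the hypothetical step counter sketched in Step 1, these scenarios can be compared directly; the exact counts depend on the input, but the sorted array needs no swaps at all while the reverse-sorted one needs the most:

let alreadySorted = [1, 2, 3, 4, 5]
let reverseSorted = [5, 4, 3, 2, 1]
let randomOrder = [3, 1, 5, 2, 4]

print(alreadySorted.insertionSortCountingSteps())  // no swaps
print(reverseSorted.insertionSortCountingSteps())  // the most swaps
print(randomOrder.insertionSortCountingSteps())    // somewhere in between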

Step 3. Determine Complexity for Each Input Class

In the worst-case scenario, when an array of size $n$ is (almost) reverse sorted, putting the element at position $\color{red}i$ into its sorted place may require sliding at most $\color{blue}i-1$ elements forward. For example, to put 4 in position we slide one element, and to put 2 in position we slide three elements:

\[ [\color{blue}5,\color{red}4,\color{green}3,2,1\color{black}] \Rightarrow [\color{blue}4,5,\color{red}3,\color{green}2,1\color{black}] \Rightarrow [\color{blue}3,4,5,\color{red}2,\color{green}1\color{black}] \Rightarrow [\color{blue}2,3,4,5,\color{red}1\color{black}] \Rightarrow [1,2,3,4,5] \]

Now, instead of counting the exact number of steps each slide takes, let us assume that sliding one element forward costs $c$ steps, so putting the element at position $i$ into place costs at most $c \cdot (i - 1)$ steps. The total number of steps required to sort the array is then:

\[T(n) = c \cdot 1 + c \cdot 2 + \cdots + c \cdot (n-1) = c \cdot \left(\frac{1+(n-1)}{2} \cdot (n-1) \right) = \frac{cn^2}{2} - \frac{cn}{2}\]
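
As a quick check, plugging the five-element example above into this sum gives

\[T(5) = c \cdot (1 + 2 + 3 + 4) = 10c = \frac{c \cdot 5^2}{2} - \frac{c \cdot 5}{2},\]

which matches the ten slides performed in the diagram.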

Even when an element is already in its position, however, we still have to test it; over the whole array these tests take at most $d \cdot n$ steps, where $d$ is a constant. The total running time in the worst-case scenario, including both the sliding and these additional checks, is then:

\[T(n) = \frac{cn^2}{2} - \frac{cn}{2} + d \cdot n\]

The function above is the actual time complexity of insertion sort in the worst-case scenario, when the array is (almost) reverse sorted.

In the best-case scenario, when the array is already sorted, no sliding is done. In that case, time is spent only on testing every element of the array. Therefore, the total running time is:

\[T(n) = d \cdot n\]

A similar calculation can be made for the average-case scenario; its computational complexity turns out to be almost the same as the worst-case scenario.


Step 4. Classify the Functions into Sets of Functions

We can simplify the obtained functions further by ignoring lower-order terms and coefficients, since the highest-order term dominates for large values of $n$. The complexity can thus be described and classified as:

  • Best case complexity: The running time grows linearly when the input is (almost) sorted, so in the best case insertion sort belongs to the linear family of algorithms, $T(n) \in \Theta(n)$. The best case complexity gives a lower bound on the algorithm's running time, that is, insertion sort can take no less time than $\Theta(n)$.

  • Worst case complexity: The running time grows quadratically when the input is (almost) reverse sorted, so in the worst case insertion sort belongs to the quadratic family of algorithms, $T(n) \in \Theta(n^2)$. The worst case complexity gives an upper bound on the algorithm's running time, that is, insertion sort can take no more time than $\Theta(n^2)$.

  • Average case complexity: Average case input behaves asymptotically the same as worst case input here. Moreover, the average case complexity does not interest us much when studying algorithms in general.
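
Concretely, dropping the coefficients and lower-order terms from the functions obtained in Step 3 gives:

\[\frac{cn^2}{2} - \frac{cn}{2} + d \cdot n \in \Theta(n^2) \qquad \text{and} \qquad d \cdot n \in \Theta(n)\]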

To make a blanket statement for all cases, and not just the worst case, we can also place insertion sort in the $O(n^2)$ family of algorithms. In general, when a complexity is stated without mentioning a case, it is assumed to be the worst-case complexity.

Similarly, the complexity of other well-known sorting algorithms can be found:

Algorithm      | Worst Case         | Best Case
Selection Sort | $\Theta(n^2)$      | $\Theta(n^2)$
Insertion Sort | $\Theta(n^2)$      | $\Theta(n)$
Quicksort      | $\Theta(n^2)$      | $\Theta(n \log n)$
Merge Sort     | $\Theta(n \log n)$ | $\Theta(n \log n)$

From the table, we can clearly see that:

  • Once the input size becomes large enough, merge sort will always beat insertion sort in the worst case.
  • Insertion sort, however, is the most efficient of these algorithms when the input sequence is almost sorted.
  • Quicksort, in its worst case, is as inefficient as insertion sort.
  • Selection sort is inefficient in all situations.

In this way, we can compute the time complexity of algorithms, and analyzing them in this way lets us leave out many unimportant details and focus on the true efficiency of an algorithm.

Even though selection sort belongs to the $\Theta(n^2)$ family of algorithms for all kinds of input, an important advantage of selection sort is that it makes the minimum number of swaps to sort an array. In situations where writing is costly, we may opt for selection sort.

This also illustrates a disadvantage of over-simplifying our analysis.
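
As a rough illustration of the swap-count point above (a sketch not taken from the text), here is a minimal selection sort written in the same style as the insertion sort extension; its outer loop performs at most one swap per position, so at most $n - 1$ swaps in total:

extension Array where Element: Comparable {
  mutating func selectionSorted() {
    var sortedEnd = startIndex
    while sortedEnd != endIndex {
      // Find the smallest element in the unsorted suffix.
      var minimum = sortedEnd
      var candidate = index(after: sortedEnd)
      while candidate != endIndex {
        if self[candidate] < self[minimum] {
          minimum = candidate
        }
        candidate = index(after: candidate)
      }

      // At most one swap per position, hence at most n - 1 swaps overall.
      if minimum != sortedEnd {
        swapAt(sortedEnd, minimum)
      }
      sortedEnd = index(after: sortedEnd)
    }
  }
}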


Though merge sort belongs to the $\Theta(n \log n)$ family of algorithms for all kinds of input, an important disadvantage of merge sort is that it takes extra $\Theta(n)$ storage space to sort the array. So, merge sort is avoided when space is limited.

The space requirements can easily be found by analyzing the space complexity of each algorithm.
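
To see where this extra space comes from, here is a rough top-down merge sort sketch, not taken from the text; the buffer built in the merge step accounts for the $\Theta(n)$ auxiliary storage, unlike the in-place insertion sort above.

extension Array where Element: Comparable {
  // A minimal sketch: returns a sorted copy rather than sorting in place.
  func mergeSorted() -> [Element] {
    guard count > 1 else { return self }

    let middle = count / 2
    let left = Array(self[..<middle]).mergeSorted()
    let right = Array(self[middle...]).mergeSorted()

    // Merge the two sorted halves into a freshly allocated buffer.
    var merged: [Element] = []
    merged.reserveCapacity(count)
    var i = left.startIndex
    var j = right.startIndex
    while i != left.endIndex && j != right.endIndex {
      if left[i] <= right[j] {
        merged.append(left[i])
        i = left.index(after: i)
      } else {
        merged.append(right[j])
        j = right.index(after: j)
      }
    }
    merged.append(contentsOf: left[i...])
    merged.append(contentsOf: right[j...])
    return merged
  }
}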
