Basic Analysis
Reading Time: 8 minutes
Algorithm analysis means predicting the computational complexity of an algorithm, that is, predicting the amount of time, space, communication bandwidth, and any other resources the algorithm requires. Most often, it is the time complexity and space complexity of an algorithm that interest us.
Measuring the actual complexity of an algorithm on real hardware, however, is quite difficult. We, therefore, analyze the rate of growth of the resources required by an algorithm with respect to its input size. This gives a simple characterization of the algorithm's efficiency and allows us to compare the relative performance of alternative algorithms.
Time Complexity
In general, the time taken by an algorithm grows with the size of the input, so the running time of an algorithm can be described as a function of the size of its input.
\[n \mapsto T(n), \text{ where}\]
$n$ is the input size: it depends on the problem being studied. For example, in sorting, the input size is the number of items in the array, whereas in multiplication, the input size is the total number of bits.
$T(n)$ is the running time: it is the sum of the running times of each primitive operation of the pseudocode for a particular input. It is assumed that each step takes a constant amount of time.
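As a concrete illustration, here is a minimal Python sketch, not taken from any pseudocode above, that counts the primitive operations of a simple summation loop under the constant-time-per-step assumption:

```python
def array_sum(arr):
    """Sum an array while counting primitive operations.

    Under the RAM model, each assignment and addition is
    treated as a single constant-time step.
    """
    steps = 1          # one step: initializing `total`
    total = 0
    for x in arr:
        total += x     # one addition + one assignment per element
        steps += 2
    return total, steps

total, steps = array_sum([3, 1, 4, 1, 5])
print(total, steps)    # prints 14 11, i.e. T(n) = 2n + 1 steps for n = 5
```

Counting two steps per iteration plus one initial assignment gives $T(n) = 2n + 1$; the exact constants depend on what one chooses to count as a single step, which is why the analysis below works with unnamed constants.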
Analysis of Insertion Sort
Let us analyze the insertion sort algorithm.
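Since the pseudocode itself is not reproduced here, the following minimal Python sketch of insertion sort (assuming an in-place, 0-indexed array) can serve as a reference for the analysis:

```python
def insertion_sort(arr):
    """Sort arr in place by growing a sorted prefix arr[0:i]."""
    for i in range(1, len(arr)):
        key = arr[i]             # next element to insert
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]  # slide larger elements one slot forward
            j -= 1
        arr[j + 1] = key         # drop key into its sorted position
    return arr

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```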
Step 1. Assume a Model of Computation
Before analyzing, we must have a general model of the machine on which our algorithm will run. We, therefore, assume a single-processor random-access machine (RAM) model of computation and understand that our algorithms will be implemented as computer programs.
Step 2. Classify Input Based on Resource Use
Sorting a thousand numbers will always take longer than sorting five numbers. What interests us is that, even for inputs of a given size, an algorithm's running time may depend on which input of that size is given. In sorting, common scenarios include:
- Best Case Scenario: An array is already sorted, e.g., array: [1, 2, 3, 4, 5].
- Worst Case Scenario: An array is reverse sorted, e.g., array: [5, 4, 3, 2, 1].
- Average Case Scenario: An array is nearly sorted, e.g., array: [1, 3, 2, 5, 4].
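To see this dependence on the input concretely, here is a sketch that counts how many element slides insertion sort performs on each of the three example arrays (the `count_slides` helper is ours, for illustration):

```python
def count_slides(arr):
    """Count how many times insertion sort slides an element forward."""
    a, slides = list(arr), 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            slides += 1
            j -= 1
        a[j + 1] = key
    return slides

print(count_slides([1, 2, 3, 4, 5]))  # best case: 0 slides
print(count_slides([5, 4, 3, 2, 1]))  # worst case: 10 slides
print(count_slides([1, 3, 2, 5, 4]))  # nearly sorted: 2 slides
```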
Step 3. Determine Complexity for Each Input Class
In the worst-case scenario, when an array of size $n$ is (almost) reverse sorted, putting the element at position $\color{red}i$ into its sorted place may require sliding at most $\color{blue}i-1$ elements forward. For example, to put 4 in position we slide one element, and to put 2 in position we slide three elements:
\[ [\color{blue}5,\color{red}4,\color{green}3,2,1\color{black}] \Rightarrow [\color{blue}4,5,\color{red}3,\color{green}2,1\color{black}] \Rightarrow [\color{blue}3,4,5,\color{red}2,\color{green}1\color{black}] \Rightarrow [\color{blue}2,3,4,5,\color{red}1\color{black}] \Rightarrow [1,2,3,4,5] \]
Now, instead of counting the exact number of operations, let us assume that it takes $c$ steps to slide one element forward in the sorted sequence, so putting the $i$-th element in place costs at most $c \cdot (i-1)$ steps. The total number of steps required to sort the array is then:
\[T(n) = c \cdot 1 + c \cdot 2 + \cdots + c \cdot (n-1) = c \cdot \left(\frac{1+(n-1)}{2} \cdot (n-1) \right) = \frac{cn^2}{2} - \frac{cn}{2}\]
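For instance, for the five-element array above, this formula sums exactly the slides shown in the example:

\[T(5) = c \cdot (1 + 2 + 3 + 4) = c \cdot \left(\frac{1 + 4}{2} \cdot 4\right) = 10c\]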
Even when an element is already in its position, we still have to test it, which can take at most $d \cdot n$ steps in total, where $d$ is a constant. The total running time in the worst-case scenario, including both the sliding and these additional checks, is then:
\[T(n) = \frac{cn^2}{2} - \frac{cn}{2} + d \cdot n\]
The above function is the actual time complexity of insertion sort in the worst-case scenario, when the array is (almost) reverse sorted.
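As a quick sanity check of the closed-form formula, the following sketch (assuming $c = 1$, i.e., one slide counts as one step, and ignoring the $d \cdot n$ checking term) counts the slides on reverse-sorted arrays of increasing size:

```python
def worst_case_slides(n):
    """Count slides for the reverse-sorted array [n, n-1, ..., 1]."""
    a, slides = list(range(n, 0, -1)), 0
    for i in range(1, n):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            slides += 1
            j -= 1
        a[j + 1] = key
    return slides

for n in (10, 100, 1000):
    predicted = n * n // 2 - n // 2            # n^2/2 - n/2 with c = 1
    print(n, worst_case_slides(n), predicted)  # the two counts match
```

The counted slides match $\frac{n^2}{2} - \frac{n}{2}$ exactly for every $n$.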
Step 4. Classify the Functions into Sets of Functions
We can simplify the obtained functions further by ignoring all the lower-order terms and coefficients, since the highest-order term dominates for large values of $n$. The complexity can thus be described and classified as follows:
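To see why dropping the lower-order terms is justified for the worst-case function above, note that the ratio of $T(n)$ to $n^2$ approaches a constant:

\[\lim_{n \to \infty} \frac{T(n)}{n^2} = \lim_{n \to \infty} \left(\frac{c}{2} - \frac{c}{2n} + \frac{d}{n}\right) = \frac{c}{2}\]

so for large $n$ the quadratic term alone determines the rate of growth.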
Best case complexity: The running time grows linearly when the input is almost sorted, so the best-case running time belongs to the linear family of functions, $T(n) \in \Theta(n)$. The best-case complexity determines the lower bound of an algorithm; that is, insertion sort can take no less than $\Theta(n)$ time.
Worst case complexity: The running time grows quadratically for every other input, so the worst-case running time belongs to the quadratic family of functions, $T(n) \in \Theta(n^2)$. The worst-case complexity determines the upper bound of an algorithm; that is, insertion sort can take no longer than $\Theta(n^2)$ time.
Average case complexity: The average case behaves asymptotically the same as the worst case here: on a random input, each element slides past about half of the elements before it, which is still quadratic. Moreover, the average-case complexity often does not interest us much when studying algorithms in general.
To make a blanket statement covering all cases, and not just the worst case, we can also classify insertion sort into the $O(n^2)$ family of algorithms, since big-$O$ gives an upper bound that holds for every input. In general, when no case is mentioned alongside a complexity, it is assumed to be the worst-case complexity.
Similarly, complexities can be derived for other well-known sorting algorithms:
| Algorithm | Worst Case | Best Case |
|---|---|---|
| Selection Sort | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Insertion Sort | $\Theta(n^2)$ | $\Theta(n)$ |
| Quicksort | $\Theta(n^2)$ | $\Theta(n \log n)$ |
| Merge Sort | $\Theta(n \log n)$ | $\Theta(n \log n)$ |
From the table, we can clearly see that:
- Once the input size becomes large enough, merge sort will always beat insertion sort in the worst case (the benchmark sketch after this list illustrates the gap).
- Insertion sort, however, is the most efficient of these algorithms when the input sequence is almost sorted.
- Quicksort, with a naive choice of pivot, is as inefficient as insertion sort when the array is reverse sorted.
- Selection sort is inefficient in all situations, since even its best case is quadratic.
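The following rough benchmark sketch illustrates the first point; the implementations, the input size, and the use of `random`/`time` are our own choices, and the absolute timings will vary by machine:

```python
import random
import time

def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # slide larger elements forward
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the sorted halves
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

data = [random.randint(0, 10**6) for _ in range(5000)]
for sort in (insertion_sort, merge_sort):
    start = time.perf_counter()
    sort(list(data))
    print(sort.__name__, round(time.perf_counter() - start, 3), "s")
```

On typical hardware, merge sort finishes this input in milliseconds while insertion sort takes orders of magnitude longer, and the gap widens as the input grows.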
In this way, we can compute the time complexity of algorithms; analyzing them at this level of abstraction lets us set aside unimportant details and focus on the true efficiency of an algorithm.