Basic Analysis
Reading Time: 8 minutes
Algorithm analysis means predicting the computational complexity of an algorithm, that is, predicting the amount of time, space, communication bandwidth, and any other resources the algorithm requires. Most often, it is the time complexity and space complexity of an algorithm that interest us.
Measuring the actual complexity of an algorithm on real hardware, however, is quite difficult. We, therefore, analyze the rate of growth of the resources required by an algorithm with respect to its input size. This gives a simple characterization of the algorithm's efficiency and allows us to compare the relative performance of alternative algorithms.
Time Complexity
In general, the time taken by an algorithm grows with the size of the input, so the running time of an algorithm can be described as a function of the size of its input.
\[n \mapsto T(n), \text{ where}\]
$n$ is the input size: it depends on the problem being studied. For example, in sorting, the input size is the number of items in the array, whereas in multiplication, the input size is the total number of bits.
$T(n)$ is the running time: it is the sum of the running times of each primitive operation of the pseudocode for a particular input. It is assumed that each step takes a constant amount of time.
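As a concrete illustration, here is a minimal Python sketch, not taken from any pseudocode above, that counts the primitive operations of a simple summation loop under the constant-time-per-step assumption:

```python
def array_sum(arr):
    """Sum an array while counting primitive operations.

    Under the RAM model, each assignment and addition is
    treated as a single constant-time step.
    """
    steps = 1          # one step: initializing `total`
    total = 0
    for x in arr:
        total += x     # one addition + one assignment per element
        steps += 2
    return total, steps

total, steps = array_sum([3, 1, 4, 1, 5])
print(total, steps)    # prints 14 11, i.e. T(n) = 2n + 1 steps for n = 5
```

Counting two steps per iteration plus one initial assignment gives $T(n) = 2n + 1$; the exact constants depend on what one chooses to count as a single step, which is why the analysis below works with unnamed constants.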
Analysis of Insertion Sort
Let us analyze the insertion sort algorithm.
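Since the pseudocode itself is not reproduced here, the following minimal Python sketch of insertion sort (assuming an in-place, 0-indexed array) can serve as a reference for the analysis:

```python
def insertion_sort(arr):
    """Sort arr in place by growing a sorted prefix arr[0:i]."""
    for i in range(1, len(arr)):
        key = arr[i]             # next element to insert
        j = i - 1
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]  # slide larger elements one slot forward
            j -= 1
        arr[j + 1] = key         # drop key into its sorted position
    return arr

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```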
Step 1. Assume a Model of Computation
Before analyzing, we must have a general model of the machine on which our algorithm will run. We, therefore, assume a single-processor random-access machine (RAM) model of computation and understand that our algorithms will be implemented as computer programs.
Step 2. Classify Input Based on Resource Use
Sorting a thousand numbers will always take longer than sorting five numbers. What interests us is that, even for inputs of a given size, an algorithm's running time may depend on which input of that size is given. In sorting, common scenarios include:
- Best Case Scenario: An array is already sorted, e.g., array: [1, 2, 3, 4, 5].
- Worst Case Scenario: An array is reverse sorted, e.g., array: [5, 4, 3, 2, 1].
- Average Case Scenario: An array is nearly sorted, e.g., array: [1, 3, 2, 5, 4].
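To see this dependence on the input concretely, here is a sketch that counts how many element slides insertion sort performs on each of the three example arrays (the `count_slides` helper is ours, for illustration):

```python
def count_slides(arr):
    """Count how many times insertion sort slides an element forward."""
    a, slides = list(arr), 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            slides += 1
            j -= 1
        a[j + 1] = key
    return slides

print(count_slides([1, 2, 3, 4, 5]))  # best case: 0 slides
print(count_slides([5, 4, 3, 2, 1]))  # worst case: 10 slides
print(count_slides([1, 3, 2, 5, 4]))  # nearly sorted: 2 slides
```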
Step 3. Determine Complexity for Each Input Class
In the worst-case scenario, when an array of size $n$ is (almost) reverse sorted, putting the element at position $\color{red}i$ into its sorted place may require sliding at most $\color{blue}i-1$ elements forward. For example, to put 4 in position we slide one element, and to put 2 in position we slide three elements:
\[ [\color{blue}5,\color{red}4,\color{green}3,2,1\color{black}] \Rightarrow [\color{blue}4,5,\color{red}3,\color{green}2,1\color{black}] \Rightarrow [\color{blue}3,4,5,\color{red}2,\color{green}1\color{black}] \Rightarrow [\color{blue}2,3,4,5,\color{red}1\color{black}] \Rightarrow [1,2,3,4,5] \]
Now, instead of counting the exact number of operations, let us assume that it takes $c$ steps to slide one element forward in the sorted sequence, so putting the $i$-th element in place costs at most $c \cdot (i-1)$ steps. The total number of steps required to sort the array is then:
\[T(n) = c \cdot 1 + c \cdot 2 + \cdots + c \cdot (n-1) = c \cdot \left(\frac{1+(n-1)}{2} \cdot (n-1) \right) = \frac{cn^2}{2} - \frac{cn}{2}\]
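For instance, for the five-element array above, this formula sums exactly the slides shown in the example:

\[T(5) = c \cdot (1 + 2 + 3 + 4) = c \cdot \left(\frac{1 + 4}{2} \cdot 4\right) = 10c\]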
Even when an element is already in its position, we still have to test it, which can take at most $d \cdot n$ steps in total, where $d$ is a constant. The total running time in the worst-case scenario, including both the sliding and these additional checks, is then:
\[T(n) = \frac{cn^2}{2} - \frac{cn}{2} + d \cdot n\]
The above function is the actual time complexity of insertion sort in the worst-case scenario, when the array is (almost) reverse sorted.
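As a quick sanity check of the closed-form formula, the following sketch (assuming $c = 1$, i.e., one slide counts as one step, and ignoring the $d \cdot n$ checking term) counts the slides on reverse-sorted arrays of increasing size:

```python
def worst_case_slides(n):
    """Count slides for the reverse-sorted array [n, n-1, ..., 1]."""
    a, slides = list(range(n, 0, -1)), 0
    for i in range(1, n):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            slides += 1
            j -= 1
        a[j + 1] = key
    return slides

for n in (10, 100, 1000):
    predicted = n * n // 2 - n // 2            # n^2/2 - n/2 with c = 1
    print(n, worst_case_slides(n), predicted)  # the two counts match
```

The counted slides match $\frac{n^2}{2} - \frac{n}{2}$ exactly for every $n$.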
Step 4. Classify the Functions into Sets of Functions
We can simplify the obtained functions further by ignoring all the lower-order terms and coefficients, since the highest-order term dominates for large values of $n$. The complexity can thus be described and classified as follows:
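To see why dropping the lower-order terms is justified for the worst-case function above, note that the ratio of $T(n)$ to $n^2$ approaches a constant:

\[\lim_{n \to \infty} \frac{T(n)}{n^2} = \lim_{n \to \infty} \left(\frac{c}{2} - \frac{c}{2n} + \frac{d}{n}\right) = \frac{c}{2}\]

so for large $n$ the quadratic term alone determines the rate of growth.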
Best case complexity: The running time grows linearly when the input is almost sorted, so the best-case running time belongs to the linear family of functions, $T(n) \in \Theta(n)$. The best-case complexity determines the lower bound of an algorithm; that is, insertion sort can take no less than $\Theta(n)$ time.
Worst case complexity: The running time grows quadratically for every other input, so the worst-case running time belongs to the quadratic family of functions, $T(n) \in \Theta(n^2)$. The worst-case complexity determines the upper bound of an algorithm; that is, insertion sort can take no longer than $\Theta(n^2)$ time.
Average case complexity: The average case behaves asymptotically the same as the worst case here: on a random input, each element slides past about half of the elements before it, which is still quadratic. Moreover, the average-case complexity often does not interest us much when studying algorithms in general.
To make a blanket statement covering all cases, and not just the worst case, we can also classify insertion sort into the $O(n^2)$ family of algorithms, since big-$O$ gives an upper bound that holds for every input. In general, when no case is mentioned alongside a complexity, it is assumed to be the worst-case complexity.
Similarly, complexities can be derived for other well-known sorting algorithms:
| Algorithm | Worst Case | Best Case |
|---|---|---|
| Selection Sort | $\Theta(n^2)$ | $\Theta(n^2)$ |
| Insertion Sort | $\Theta(n^2)$ | $\Theta(n)$ |
| Quicksort | $\Theta(n^2)$ | $\Theta(n \log n)$ |
| Merge Sort | $\Theta(n \log n)$ | $\Theta(n \log n)$ |
From the table, we can clearly see that:
- Once the input size becomes large enough, merge sort will always beat insertion sort in the worst case (the benchmark sketch after this list illustrates the gap).
- Insertion sort, however, is the most efficient of these algorithms when the input sequence is almost sorted.
- Quicksort, with a naive choice of pivot, is as inefficient as insertion sort when the array is reverse sorted.
- Selection sort is inefficient in all situations, since even its best case is quadratic.
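The following rough benchmark sketch illustrates the first point; the implementations, the input size, and the use of `random`/`time` are our own choices, and the absolute timings will vary by machine:

```python
import random
import time

def insertion_sort(a):
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # slide larger elements forward
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the sorted halves
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

data = [random.randint(0, 10**6) for _ in range(5000)]
for sort in (insertion_sort, merge_sort):
    start = time.perf_counter()
    sort(list(data))
    print(sort.__name__, round(time.perf_counter() - start, 3), "s")
```

On typical hardware, merge sort finishes this input in milliseconds while insertion sort takes orders of magnitude longer, and the gap widens as the input grows.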
In this way, we can compute the time complexity of algorithms; analyzing them at this level of abstraction lets us set aside unimportant details and focus on the true efficiency of an algorithm.