Merge Sort Analysis

Merge Sort

Merge Sort is a comparison-based, stable, and efficient sorting algorithm that follows the "divide and conquer" strategy: it breaks a problem down into smaller, more manageable subproblems, solves those, and then combines their solutions to solve the original problem. Merge Sort accomplishes this by recursively dividing the unsorted list into smaller sub-lists, sorting them, and then merging them back together.

Process

The core steps of the Merge Sort algorithm can be summarized as follows:

  1. Divide: The unsorted list is repeatedly split into two halves until each sub-list contains only one element. This repeated halving produces a structure similar to a binary tree.
  2. Conquer: Each sub-list is sorted recursively by applying Merge Sort to it; a sub-list containing a single element is inherently sorted and serves as the base case.
  3. Merge: The sorted sub-lists are merged back together by repeatedly taking the smaller of the two front elements, so that each merged list, and ultimately the final list, is sorted (a worked trace follows below).
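
For example, the sample array {38, 27, 43, 3, 9, 82, 10} used in the code below is divided and merged as follows (an illustrative trace of the algorithm on that input):

  Divide:  {38, 27, 43, 3, 9, 82, 10}
           {38, 27, 43, 3}           {9, 82, 10}
           {38, 27}  {43, 3}         {9, 82}  {10}
           {38}  {27}  {43}  {3}     {9}  {82}  {10}
  Merge:   {27, 38}  {3, 43}         {9, 82}  {10}
           {3, 27, 38, 43}           {9, 10, 82}
           {3, 9, 10, 27, 38, 43, 82}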

[Figure: merge-sort-example-0.png]

Code

#include <iostream>
#include <vector>
void merge(std::vector<int>& arr, int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    // Create temporary arrays
    std::vector<int> leftArr(n1);
    std::vector<int> rightArr(n2);
    // Copy data to temp arrays leftArr[] and rightArr[]
    for (int i = 0; i < n1; i++) {
        leftArr[i] = arr[left + i];
    }
    for (int j = 0; j < n2; j++) {
        rightArr[j] = arr[mid + 1 + j];
    }
    // Merge the two sorted arrays back into arr
    int i = 0; // Initial index of left subarray
    int j = 0; // Initial index of right subarray
    int k = left; // Initial index of merged subarray
    while (i < n1 && j < n2) {
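        // The <= comparison keeps equal elements in their original order, which is what makes the sort stable.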
        if (leftArr[i] <= rightArr[j]) {
            arr[k] = leftArr[i];
            i++;
        } else {
            arr[k] = rightArr[j];
            j++;
        }
        k++;
    }
    // Copy the remaining elements of leftArr[], if any
    while (i < n1) {
        arr[k] = leftArr[i];
        i++;
        k++;
    }
    // Copy the remaining elements of rightArr[], if any
    while (j < n2) {
        arr[k] = rightArr[j];
        j++;
        k++;
    }
}
void mergeSort(std::vector<int>& arr, int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        // Sort the first and second halves
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        // Merge the sorted halves
        merge(arr, left, mid, right);
    }
}
int main() {
    std::vector<int> arr = {38, 27, 43, 3, 9, 82, 10};
    int n = arr.size();
    std::cout << "Original array: ";
    for (int num : arr) {
        std::cout << num << " ";
    }
    std::cout << std::endl;
    mergeSort(arr, 0, n - 1);
    std::cout << "Sorted array: ";
    for (int num : arr) {
        std::cout << num << " ";
    }
    std::cout << std::endl;
    return 0;
}

Output:
Original array: 38 27 43 3 9 82 10
Sorted array: 3 9 10 27 38 43 82


Complexity

The time complexity of the Merge Sort algorithm is O(n log n) in all cases, which makes it an efficient and reliable sorting algorithm.

Here's a breakdown of the time complexity in different cases:

Worst-case time complexity (O(n log n))

Merge Sort consistently achieves a time complexity of O(n log n) regardless of the initial order of elements in the array. This is because it always divides the array into two halves, recursively sorts them, and then merges them. The merging step, which compares and combines elements from two sorted subarrays, takes O(n) time per level of recursion, and since the repeated halving produces about log₂(n) levels (where n is the number of elements), the total time complexity remains O(n log n).
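
One way to see this is through the recurrence for the running time on n elements:

    T(n) = 2·T(n/2) + O(n)

where the O(n) term is the merging work. Each level of the recursion performs O(n) total merging, and the repeated halving produces about log₂(n) levels, so T(n) = O(n log n).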

Average-case time complexity (O(n log n))

Merge Sort also has an average-case time complexity of O(n log n). It performs well on random or semi-sorted data, as its divide-and-conquer approach ensures that the array is consistently divided into approximately equal parts, leading to the log₂(n) levels of recursion.

Best-case time complexity (O(n log n))

Unlike some other sorting algorithms that may have a better best-case time complexity for already sorted data, Merge Sort maintains a consistent O(n log n) time complexity in the best case as well. Even when the data is nearly sorted, Merge Sort still divides the array into halves, resulting in the same number of recursive levels.

Space Complexity

The space complexity of the Merge Sort algorithm is O(n), where "n" is the number of elements in the array being sorted; the additional memory it requires grows linearly with the size of the input.

The primary reason for this space complexity is the need to create temporary storage for the subarrays during the merging step of the algorithm. Specifically, Merge Sort divides the input array into smaller subarrays repeatedly until they contain only one element each. Then, during the merging phase, it combines these subarrays back into larger sorted subarrays.

To perform the merging step, Merge Sort requires additional memory to temporarily store the subarrays being merged. At the top level of the recursion these temporary arrays together hold all n elements, which is what gives the O(n) bound; for large arrays this extra memory can be significant.
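
As a rough, illustrative figure: sorting an array of 1,000,000 four-byte integers with the code above means that the top-level call to merge() allocates leftArr and rightArr holding 1,000,000 elements in total, about 4 MB of temporary storage, in addition to roughly log₂(1,000,000) ≈ 20 levels of recursion on the call stack.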

However, it's important to note that Merge Sort is typically described as "out-of-place" or "not in situ." Although the implementation above writes the sorted result back into the original array, it needs O(n) of auxiliary memory for the temporary subarrays while doing so. In contrast, some other sorting algorithms, such as Quick Sort, sort the array in place and need only about O(log n) extra space for the recursion stack on average.

In practice, Merge Sort's space usage is acceptable for most applications, and its consistent O(n log n) running time often makes it a preferred choice when additional memory usage is not a limiting factor.

Advantages

  1. Stable Sorting: Merge Sort is a stable sorting algorithm, which means it preserves the relative order of equal elements. This property is valuable in many applications where maintaining the original order of equivalent items is important.
  2. Predictable Performance: Merge Sort consistently exhibits a time complexity of O(n log n) in all cases, regardless of the initial order of elements. This predictable performance is particularly advantageous when sorting large datasets where worst-case scenarios can be costly.
  3. Efficient for Large Datasets: Merge Sort's efficient O(n log n) time complexity makes it a reliable choice for sorting large datasets, such as in external sorting algorithms used in database systems.
  4. Parallelization: Merge Sort can be parallelized efficiently, taking advantage of multi-core processors and distributed computing environments. The two halves can be sorted independently and then merged back together (a minimal sketch follows after this list).
  5. External Sorting: Merge Sort is well-suited for external sorting tasks, where data does not fit entirely in memory and must be sorted from external storage (e.g., on disk). Its merge phase reads and writes data sequentially, which suits the access patterns of disks and tapes.
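
The following is a minimal sketch of the parallel variant mentioned in point 4, assuming the merge() and mergeSort() functions from the listing above are available in the same file; the fixed depth limit and the use of std::async are illustrative choices, not a tuned implementation.

#include <future>
#include <vector>
// merge() and mergeSort() as defined in the listing above.
void merge(std::vector<int>& arr, int left, int mid, int right);
void mergeSort(std::vector<int>& arr, int left, int right);
// Sort the two halves on separate threads down to a small recursion depth,
// then fall back to the sequential mergeSort(). The halves cover disjoint
// index ranges, so the two tasks never touch the same elements.
void parallelMergeSort(std::vector<int>& arr, int left, int right, int depth) {
    if (left >= right) {
        return;
    }
    int mid = left + (right - left) / 2;
    if (depth > 0) {
        // Sort the left half asynchronously while this thread sorts the right half.
        auto leftTask = std::async(std::launch::async, parallelMergeSort,
                                   std::ref(arr), left, mid, depth - 1);
        parallelMergeSort(arr, mid + 1, right, depth - 1);
        leftTask.get(); // wait for the left half before merging
    } else {
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
    }
    merge(arr, left, mid, right);
}
// Example call: parallelMergeSort(arr, 0, static_cast<int>(arr.size()) - 1, 2);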

Disadvantages

  1. Space Complexity: Merge Sort has a space complexity of O(n), which means it requires additional memory for storing temporary subarrays during the merging process. In situations with limited available memory, this can be a drawback.
  2. Slower for Small Datasets: For very small datasets, the overhead of function calls and memory allocation during the recursive process can make Merge Sort less efficient than simpler sorting algorithms like Insertion Sort or Selection Sort (a common mitigation is sketched after this list).
  3. Not In-Place: Merge Sort is typically implemented in an "out-of-place" manner: during merging it allocates temporary arrays for the subproblems rather than working entirely within the original array's memory. This may not be suitable when memory usage is a critical concern.
  4. More Complex Implementation: Implementing Merge Sort can be more complex compared to some other sorting algorithms. It requires careful handling of indices and temporary storage for subarrays.
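
As a sketch of the common mitigation for point 2, the recursion can switch to Insertion Sort once a sub-range becomes small. The cutoff of 16 below is an illustrative value rather than a tuned one, and merge() is assumed to be the function from the listing above.

#include <vector>
// merge() as defined in the listing above.
void merge(std::vector<int>& arr, int left, int mid, int right);
// Sort arr[left..right] with insertion sort, which is fast on small ranges.
void insertionSort(std::vector<int>& arr, int left, int right) {
    for (int i = left + 1; i <= right; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= left && arr[j] > key) {
            arr[j + 1] = arr[j]; // shift larger elements one slot to the right
            j--;
        }
        arr[j + 1] = key;
    }
}
// Merge Sort that falls back to insertion sort below a small cutoff.
void hybridMergeSort(std::vector<int>& arr, int left, int right) {
    const int cutoff = 16; // illustrative threshold
    if (right - left + 1 <= cutoff) {
        insertionSort(arr, left, right);
        return;
    }
    int mid = left + (right - left) / 2;
    hybridMergeSort(arr, left, mid);
    hybridMergeSort(arr, mid + 1, right);
    merge(arr, left, mid, right);
}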