Sorting Fundamentals
Sorting is the process of arranging a sequence of records in either ascending or descending order based on one or more specified keys.
Stability: A sorting algorithm is considered stable if, for records with identical keys, their relative order remains unchanged after sorting. If the input has r[i] = r[j] with i < j, stability ensures r[i] still appears before r[j] in the output.
Internal Sorting: All data elements reside in memory during the sorting processs.
External Sorting: Data volume exceeds memory capacity, requiring data to be moved between internal and external storage during sorting.
Algorithm Implementations
1. Straight Insertion Sort
This method builds a sorted sequence by taking one element at a time and inserting it into its correct position within the already sorted portion of the array.
// Stable
// Time Complexity: O(N^2)
// Space Complexity: O(1)
void StraightInsert(int* dataset, int count) {
assert(dataset);
for (int i = 0; i < count - 1; i++) {
int current = i;
int value = dataset[current + 1];
while (current >= 0) {
if (value < dataset[current]) {
dataset[current + 1] = dataset[current];
current--;
} else {
break;
}
}
dataset[current + 1] = value;
}
}
- Efficiency increases as the input dataset becomes closer to being sorted.
- Time Complexity: O(N^2)
- Space Complexity: O(1)
- Stability: Stable
2. Shell Sort (Diminishing Increment)
An optimization over insertion sort. It starts by sorting elements far apart and progressively reduces the gap between elements to be compared.
// Unstable
// Time Complexity: O(N^1.3 - N^2)
// Space Complexity: O(1)
void Shell(int* dataset, int count) {
assert(dataset);
int gap = count;
while (gap > 1) {
gap = gap / 3 + 1;
for (int i = 0; i < count - gap; i++) {
int current = i;
int value = dataset[current + gap];
while (current >= 0) {
if (value < dataset[current]) {
dataset[current + gap] = dataset[current];
current -= gap;
} else {
break;
}
}
dataset[current + gap] = value;
}
}
}
- Time complexity is difficult to pin down precisely due to gap selection strategies.
- Stability: Unstable
3. Selection Sort
Repeatedly finds the minimum (or maximum) element from the unsorted portion and moves it to the beginning.
// Unstable
// Time Complexity: O(N^2)
// Space Complexity: O(1)
void Selection(int* dataset, int count) {
assert(dataset);
int start = 0, finish = count - 1;
while (start < finish) {
int min_idx = start, max_idx = start;
for (int i = start + 1; i <= finish; i++) {
if (dataset[i] < dataset[min_idx]) min_idx = i;
if (dataset[i] > dataset[max_idx]) max_idx = i;
}
Swap(&dataset[start], &dataset[min_idx]);
if (start == max_idx) max_idx = min_idx;
Swap(&dataset[finish], &dataset[max_idx]);
start++; finish--;
}
}
- Generally inefficient for large datasets.
- Time Complexity: O(N^2)
- Stability: Unstable
4. Heap Sort
Leverages the heap data structure. For ascending order, a max-heap is constructed. The root (maximum) is swapped with the last element, and the heap property is restored.
// Unstable
// Time Complexity: O(N log N)
// Space Complexity: O(1)
static void SiftDown(int* dataset, int size, int root) {
int child = root * 2 + 1;
while (child < size) {
if (child + 1 < size && dataset[child + 1] > dataset[child]) child++;
if (dataset[root] < dataset[child]) {
Swap(&dataset[root], &dataset[child]);
root = child;
child = root * 2 + 1;
} else break;
}
}
void Heapify(int* dataset, int count) {
for (int i = (count - 2) / 2; i >= 0; i--) SiftDown(dataset, count, i);
}
void Heap(int* dataset, int count) {
Heapify(dataset, count);
int end = count - 1;
while (end > 0) {
Swap(&dataset[0], &dataset[end]);
SiftDown(dataset, end, 0);
end--;
}
}
- Time Complexity: O(N log N)
- Stability: Unstable
5. Bubble Sort
Repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order.
// Stable
// Time Complexity: O(N^2)
// Space Complexity: O(1)
void Bubble(int* dataset, int count) {
assert(dataset);
for (int i = 0; i < count - 1; i++) {
int swapped = 0;
for (int j = 0; j < count - 1 - i; j++) {
if (dataset[j] > dataset[j + 1]) {
Swap(&dataset[j], &dataset[j + 1]);
swapped = 1;
}
}
if (!swapped) break;
}
}
- Easy to implement but rarely used in practice for performance reasons.
- Time Complexity: O(N^2)
- Stability: Stable
6. Quick Sort
Uses a divide-and-conquer strategy. A pivot is selected, and the array is partitioned so that elements less than the pivot are on the left, and greater elements are on the right.
Partition Methods
Hoare Partition:
static int HoarePartition(int* a, int lo, int hi) {
int pivot_val = a[hi];
int i = lo - 1, j = hi + 1;
while (1) {
do { i++; } while (a[i] < pivot_val);
do { j--; } while (a[j] > pivot_val);
if (i >= j) return j;
Swap(&a[i], &a[j]);
}
}
Diggging Method:
static int DigPartition(int* a, int lo, int hi) {
int pivot = a[hi];
int low = lo, high = hi;
while (low < high) {
while (low < high && a[low] <= pivot) low++;
a[high] = a[low];
while (low < high && a[high] >= pivot) high--;
a[low] = a[high];
}
a[low] = pivot;
return low;
}
Forward-Backward Pointers:
static int TwoPointerPartition(int* a, int lo, int hi) {
int pivot_val = a[hi];
int prev = lo - 1;
for (int cur = lo; cur < hi; cur++) {
if (a[cur] < pivot_val) {
prev++;
if (prev != cur) Swap(&a[prev], &a[cur]);
}
}
Swap(&a[prev + 1], &a[hi]);
return prev + 1;
}
Recursive Implementation
// Unstable
// Time Complexity: O(N log N)
// Space Complexity: O(log N)
void Quick(int* a, int lo, int hi) {
if (lo >= hi) return;
int p = TwoPointerPartition(a, lo, hi);
Quick(a, lo, p - 1);
Quick(a, p + 1, hi);
}
Optimizations
Median-of-Three: Selects the median of the first, middle, and last elements as the pivot to avoid worst-case scenarios on sorted data.
static int MedianIndex(int* a, int lo, int hi) {
int mid = lo + (hi - lo) / 2;
if ((a[lo] < a[mid] && a[mid] < a[hi]) || (a[hi] < a[mid] && a[mid] < a[lo])) return mid;
if ((a[mid] < a[lo] && a[lo] < a[hi]) || (a[hi] < a[lo] && a[lo] < a[mid])) return lo;
return hi;
}
Small Range Optimization: Switch to Insertion Sort for small sub-arrays (e.g., size < 10) to reduce recursion overhead.
void OptimizedQuick(int* a, int lo, int hi) {
if (lo >= hi) return;
if (hi - lo + 1 < 10) {
StraightInsert(a + lo, hi - lo + 1);
} else {
int mid = MedianIndex(a, lo, hi);
Swap(&a[mid], &a[hi]);
int p = DigPartition(a, lo, hi);
OptimizedQuick(a, lo, p - 1);
OptimizedQuick(a, p + 1, hi);
}
}
Non-Recursive Implementation
Uses an explicit stack to manage sub-array boundaries.
void QuickIterative(int* a, int lo, int hi) {
Stack st;
StackInit(&st);
StackPush(&st, hi);
StackPush(&st, lo);
while (!StackEmpty(&st)) {
int l = StackTop(&st); StackPop(&st);
int h = StackTop(&st); StackPop(&st);
if (l >= h) continue;
int p = HoarePartition(a, l, h);
if (p - 1 > l) { StackPush(&st, p - 1); StackPush(&st, l); }
if (h > p + 1) { StackPush(&st, h); StackPush(&st, p + 1); }
}
StackDestroy(&st);
}
- Often the fastest general-purpose comparison sort in practice.
- Time Complexity: O(N log N)
- Stability: Unstable
7. Merge Sort
A divide-and-conquer algorithm that splits the array into halves, recursively sorts them, and then merges the sorted halves.
Recursive Implementation
// Stable
// Time Complexity: O(N log N)
// Space Complexity: O(N)
static void MergeRecurse(int* src, int* dest, int l, int r) {
if (l >= r) return;
int m = l + (r - l) / 2;
MergeRecurse(src, dest, l, m);
MergeRecurse(src, dest, m + 1, r);
int i = l, j = m + 1, k = l;
while (i <= m && j <= r) dest[k++] = (src[i] <= src[j]) ? src[i++] : src[j++];
while (i <= m) dest[k++] = src[i++];
while (j <= r) dest[k++] = src[j++];
memcpy(src + l, dest + l, (r - l + 1) * sizeof(int));
}
void Merge(int* a, int n) {
int* temp = malloc(n * sizeof(int));
MergeRecurse(a, temp, 0, n - 1);
free(temp);
}
Non-Recursive Implementation
void MergeIterative(int* a, int n) {
int* temp = malloc(n * sizeof(int));
for (int width = 1; width < n; width *= 2) {
for (int i = 0; i < n; i += 2 * width) {
int l = i, m = fmin(i + width, n), r = fmin(i + 2 * width, n);
int p1 = l, p2 = m, k = l;
while (p1 < m && p2 < r) temp[k++] = (a[p1] <= a[p2]) ? a[p1++] : a[p2++];
while (p1 < m) temp[k++] = a[p1++];
while (p2 < r) temp[k++] = a[p2++];
memcpy(a + l, temp + l, (r - l) * sizeof(int));
}
}
free(temp);
}
External Sorting Aplpication
Merge Sort is ideal for external sorting (data too large for RAM). Data is split into manageable chunks, sorted in memory (using Quick Sort or similar), written to files, and then merged file-by-file.
void MergeFiles(const char* in1, const char* in2, const char* out) {
FILE *f1 = fopen(in1, "r"), *f2 = fopen(in2, "r"), *outf = fopen(out, "w");
int v1, v2, r1 = fscanf(f1, "%d", &v1), r2 = fscanf(f2, "%d", &v2);
while (r1 != EOF && r2 != EOF) {
if (v1 <= v2) { fprintf(outf, "%d\n", v1); r1 = fscanf(f1, "%d", &v1); }
else { fprintf(outf, "%d\n", v2); r2 = fscanf(f2, "%d", &v2); }
}
while (r1 != EOF) { fprintf(outf, "%d\n", v1); r1 = fscanf(f1, "%d", &v1); }
while (r2 != EOF) { fprintf(outf, "%d\n", v2); r2 = fscanf(f2, "%d", &v2); }
fclose(f1); fclose(f2); fclose(outf);
}
- Time Complexity: O(N log N)
- Space Complexity: O(N)
- Stability: Stable
8. Counting Sort
A non-comparative sorting algorithm. It counts the number of objects having distinct key values and uses arithmetic to determine positions.
// Stable
// Time Complexity: O(N + Range)
// Space Complexity: O(Range)
void Counting(int* a, int n) {
int min = a[0], max = a[0];
for (int i = 1; i < n; i++) {
if (a[i] < min) min = a[i];
if (a[i] > max) max = a[i];
}
int range = max - min + 1;
int* count = calloc(range, sizeof(int));
for (int i = 0; i < n; i++) count[a[i] - min]++;
int idx = 0;
for (int i = 0; i < range; i++)
while (count[i]--) a[idx++] = i + min;
free(count);
}
- Efficient only when the range of input data (max-min) is not significantly larger than the number of elements.
- Only works for integer data.
- Time Complexity: O(N + Range)
- Stability: Stable
Performance Comparison
| Algorithm | Average Time | Space | Stable |
|---|---|---|---|
| Insertion | O(N^2) | O(1) | Yes |
| Shell | O(N^1.3) | O(1) | No |
| Selection | O(N^2) | O(1) | No |
| Heap | O(N log N) | O(1) | No |
| Bubble | O(N^2) | O(1) | Yes |
| Quick | O(N log N) | O(log N) | No |
| Merge | O(N log N) | O(N) | Yes |
| Counting | O(N + Range) | O(Range) | Yes |