[Beginning Database Design Solutions, Second Edition]

Title: Solve the inversion counting problem in C#

[Solve the inversion counting problem in C#]

This example uses an interesting algorithm to solve the inversion counting problem in C#. It's very similar to the mergesort algorithm described in my book Essential Algorithms.

An inversion in an array is when two values are out of order. More precisely, if value[a] > value[b] and a < b, then those two values are out of order and count as an inversion.

For example, the array [1, 4, 3, 2, 5] has 3 inversions: {4, 3}, {4, 2}, and {3, 2}.

In the inversion counting problem, the goal is to count the number of inversions in an array.

This example demonstrates two approaches: a simple O(N²) algorithm and a more complicated but faster O(N log N) approach.

O(N²) Algorithm

One very simple solution is to loop through each item in the array. For each item A, you loop through the following items and count the number of items that are smaller than the A. The following code shows how the program implements this simple algorithm.

// Count inversions the simple O(N^2) way. private int N2Count(int[] values) { int count = 0; for (int i = 0; i < values.Length - 1; i++) { for (int j = i + 1; j < values.Length; j++) { if (values[i] > values[j]) count++; } } return count; }

That works, but it takes O(N²) time where N is the size of the array. That's fast enough for practical purposes, but we can do better. Before I describe that algorithm, however, you should know a bit about mergesort.

Mergesort

Mergesort is one of many O(N log N) sorting algorithms. The way it works is it first splits the array into two equally sized pieces. It then recursively calls itself to split the pieces into smaller and smaller chunks. When a piece contains a single item, it is already sorted. (Because a single item is always in sorted order.)

As the recursions return, the algorithm merges the sorted sub-algorithms to make bigger and bigger sorted pieces until the entire array is sorted.

For example, consider these values:

[3, 5, 7, 2, 4, 1, 8, 6]

The algorithm first splits this into two pieces:

[3, 5, 7, 2] [4, 1, 8, 6]

Next it recursively splits those pieces:

[3, 5] [7, 2] [4, 1] [8, 6]

And it recursively splits the pieces again:

[3] [5] [7] [2] [4] [1] [8] [6]

Each of the subarrays is sorted because it contains a single item.

Now the last recursive calls return and merge their subarrays. For example, the calls that reached subarrays [3] and [5] return and the calling code merges those subarrays to get [3, 5]. This is no big deal because those items were already in order.

Similarly the calls that reached [2] and [7] return and the calling code merges them to get [7, 2]. This time the merge code had to rearrange the 2 and 7 to put them in the right order.

The remaining calls at this level merge subarrays to get [1, 4] and [6, 8].

Now those calls return and the higher-level calls merge the new subarrays. The values [3, 5] and [2, 7] merge to get [2, 3, 5, 7]. Similarly the values [4, 1] and [6, 8] merge to get [1, 4, 6, 8].

Finally the first call to the algorithm merges those subarrays to get [1, 2, 3, 4, 5, 6, 7, 8].

O(N log N) Algorithm

The inversion counting algorithm follows a similar approach.

First, here's the method that recursively splits the array into smaller and smaller pieces.

private int MergesortAndCount(int[] arr, int l, int r) { int count = 0; if (l < r) { int m = (l + r) / 2; count += MergesortAndCount(arr, l, m); count += MergesortAndCount(arr, m + 1, r); count += MergeAndCount(arr, l, m, r); } return count; }

The parameters are the array arr, and the indexes l and r of the leftmost and rightmost items in this call's subarray.

The code creates variable count to keep track of the number of inversions.

If l < r, then the subarray contains more than one item so the method must split it. To do that, it first calculates the middle of the subarray m.

Next the method recursively calls itself to sort the left and right halves of its subarray. Those calls to MergesortAndCount return the number of inversions that they found, and this call adds those to its count variable.

Having sorted the two halves of the subarray, the method calls MergeAndCount to merge the two halves. That method returns the number of inversions that it finds, so this method adds those to count.

The method finishes by returning its count.

The following code shows the MergeAndCount method.

// Merge the pieces and return the number of inversions. private int MergeAndCount(int[] arr, int l, int m, int r) { int left_length = m - l + 1; int[] left = new int[left_length]; Array.Copy(arr, l, left, 0, left_length); int right_length = r - (m + 1) + 1; int[] right = new int[right_length]; Array.Copy(arr, m + 1, right, 0, right_length); int i = 0; int j = 0; int k = l; int count = 0; while (i < left_length && j < right_length) { if (left[i] <= right[j]) { arr[k++] = left[i++]; } else { arr[k++] = right[j++]; count += left_length - i; } } if (i < left_length) { Array.Copy(left, i, arr, k, left_length - i); } if (j < right_length) { Array.Copy(right, j, arr, k, right_length - j); } return count; }

This method first copies the left and right subarrays into new arrays left and right. The left half includes the indices l through m inclusive. The right half includes the indices m + 1 through r inclusive.

Next the code enters a while loop where i iterates over the left array and j iterates over the right array. The value k keeps track of the next position in the array arr. (It starts at l so the code copies the left and right values back into the same indices l through r where the values started.)

Each time through the loop, the code compares left[i] to right[j] to see which value is smaller. It copies the smaller value to the next available position arr[k], increments k, and increments i or j as appropriate.

If right[j] is smaller than left[i], then that represents an inversion. When we move right[i] into arr, we move it to the left of each of the items remaining in the left array. The value right[j] was originally to the right of those items, so each of those represents an inversion.

To count those, the code adds the number of items in left that have not yet been moved into arr. That number is the total number of items in left minus the current position in that array i.

When it has finished copying either left or right, the code copies the remaining items from the other array back into arr.

When the method has finished copying all of the left and right values back into arr, now in sorted order, the method returns its inversion count.

The following code shows the method that invokes NlogNCount to calculate the number of inversions.

private int NlogNCount(int[] values) { return MergesortAndCount(values, 0, values.Length - 1); }

This method simply calls MergesortAndCount, passing it the array and the indices of its first and last values.

The rest of the program is relatively straightforward. If you enter a number in the Length text box and click Generate, the program creates a random list containing that many integers. When you click Solve, the program uses both algorithms to count the inversions and displays the results.

If you look at the picture at the top of this post, you'll see that both algorithms were very fast finding inversions in an array containing 7,000 items, but the mergesort-style algorithm was much faster, taking only around 2% as long in this example.

Download the example to experiment with it and to see additional details.

Title: Solve the inversion counting problem in C#

O(N2) Algorithm

Mergesort

O(N log N) Algorithm

O(N²) Algorithm