Title: Compare sorting algorithms in C#, part 5 of 5 (Countingsort)
Countingsort
The discussion of Quicksort in my previous post mentions that the fastest possible sorting algorithms that use comparisons take on the order of N * log(N) steps. Countingsort does not use comparisons so it is not bound by that result. In fact Countingsort is so fast that it seems to sort using magic rather than comparisons.
Unfortunately Countingsort only works well very specific circumstances. First, the items you are sorting must be integers. You cannot use Countingsort to sort strings. Second, the range of values that the items have must be fairly limited. If your items range from 1 to 100, Countingsort will work extremely well. If the items range from 1 to 30 thousand, Countingsort still be fast but not quite as fast. If the items range from 1 to 10 billion, you may be better off with Array.Sort.
The basic idea is that the algorithm counts the number of items that have each value. For example, if the list contains 39 items that have value 17, then the algorithm needs to know that.
The algorithm starts by allocating a temporary counts array of integers to hold the count for each value. This array has bounds that cover the range of the items. If the items range in value from 0 to MaxValue, then the algorithm creates an array with indexes ranging from 0 to MaxValue like this:
int[] counts = new int[MaxValue + 1];
Because C# arrays always have 0 for a lower bound, working with a range of values that doesn't start at 0 is trickier. For example, if values range from -100 to +100, the algorithm must take extra steps to translate those indexes into something that will fit in an array. I'll leave that as an exercise for you to work on.
Next, the algorithm looks through the items in the list and increments the counts entry corresponding to that item. When this stage is finished, counts[i] holds a count of the number of items that have the value i.
Note that allocating the counts array initializes all of its entries to 0 so the algorithm doesn't need to do that.
// Count the items.
for (int i = 0; i < NumItems; i++)
{
counts[values[i]]++;
}
The algorithm then runs through the counts array copying the necessary number of items with a particular value into the sorted array. For example, if counts[7] is 12, then the program writes the value 7 into the sorted array 12 times. The following code shows this step.
// Place items in the sorted array.
int index = 0;
for (int i = 0; i <= MaxValue; i++)
{
for (int j = 0; j < counts[i]; j++)
{
values[index++] = i;
}
}
The following code shows the entire algorithm. It's remarkably simple.
// Countingsort.
private void Countingsort(int[] values)
{
// Create the Counts array.
int[] counts = new int[MaxValue + 1];
// Count the items.
for (int i = 0; i < NumItems; i++)
{
counts[values[i]]++;
}
// Place items in the sorted array.
int index = 0;
for (int i = 0; i <= MaxValue; i++)
{
for (int j = 0; j < counts[i]; j++)
{
values[index++] = i;
}
}
}
To sort N numbers that have a range that spans M values, Countingsort executes roughly 2 * N + M steps.
First it reads the N items to fill in the counts array. That takes N steps.
Then the algorithm runs through the M values in the counts array and adds their values to the sorted array. It takes M steps to run through the counts array and the code drops N values into the sorted array, so this second step takes roughly N + M steps.
That makes the total run time on the order of N + (N + M) = 2 * N + M.
If N is large and M is relatively small, then 2 * N + M is much faster than the N * log(N) performance given by Quicksort. To sort 30,000 numbers that ranged from 1 to 1,000, for example, Quicksort might execute more than 300,000 steps. Countingsort would execute only about 61,000 steps and take about a fifth as long.
If you look at the picture above, you'll see that Array.Sort took 2.38 seconds to sort 30,000 items 1,000 times with values between 0 and 1,000 while Countingsort took only 0.58 seconds.
This algorithm easily beats Array.Sort but only when the values have a limited range.
Conclusion
Which sorting algorithm you should choose depends on your data. The Array.Sort method provided by the .NET Framework works extremely well and is easy to use so it should be your first choice. If you find that Array.Sort isn't giving you good enough performance and you have special information about the data you are sorting, you may be able to improve performance by using another algorithm. If the data is almost completely sorted already, you may get better performance from Bubblesort. If the list of items is very small, you may get better performance from Bubblesort or Selectionsort. Finally if the range of values used by the data is very small, Countingsort may be much faster than Array.Sort and the other methods.
Download the example to experiment with it and to see additional details.
|