Standard deviation is a measure of how spread out the values in a sample are. It is easiest to interpret when the values are normally distributed. A standard deviation close to 0 means the values are relatively close together. A large standard deviation means they are spread far apart.

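You can also use standard deviation as an indication of how far from the mean a value is. The picture on the right (from Wikipedia) shows the standard deviations for a set of normally distributed data. For example, 34.1% of the values lie between the mean and 1 standard deviation above it, and another 34.1% lie between the mean and 1 standard deviation below it, so about 68.2% of the values lie within 1 standard deviation of the mean. To think of it another way, if you pick a random value from among the data, there is only about a 0.1% chance that it lies more than 3 standard deviations above the mean, and the same chance that it lies more than 3 standard deviations below it.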

Sometimes you can extend that to get an idea of the likelihood of a scenario. For example, suppose you give a test to 100 students, the mean is 75, and the standard deviation is 10. If the scores are roughly normally distributed, then there’s only about a 0.1% chance that a randomly chosen student has a score below 45, which is 3 standard deviations below the mean. It is important to know that such students may still exist; they will just be rare.
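To make the arithmetic in the test-score example explicit, the score of 45 can be expressed as a number of standard deviations from the mean (often called a z-score). The symbol $z$ is introduced here only for illustration.

```latex
z = \frac{\text{score} - \text{mean}}{\text{standard deviation}}
  = \frac{45 - 75}{10}
  = -3
```

A score of 45 is therefore 3 standard deviations below the mean, which is where the figure of roughly 0.1% comes from.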

The standard deviation is defined as the square root of the variance.

To calculate variance, you calculate the mean (add up all the values and divide by the number of values). Then you loop through the values and calculate the square of the difference between each value and the mean. You average those squared differences and that’s the variance.
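The steps just described can be sketched directly as a loop. This is only an illustration of the description above, not the extension method used by the example program. The class name `VarianceSketch` and the method name `PopulationVariance` are made up for this sketch.

```csharp
using System;

public static class VarianceSketch
{
    // Hypothetical helper: the average of the squared
    // differences between each value and the mean.
    public static double PopulationVariance(double[] values)
    {
        // Step 1: add up all the values and divide by the number of values.
        double total = 0;
        foreach (double value in values)
        {
            total += value;
        }
        double mean = total / values.Length;

        // Step 2: add up the squares of the differences from the mean.
        double sumOfSquares = 0;
        foreach (double value in values)
        {
            double difference = value - mean;
            sumOfSquares += difference * difference;
        }

        // Step 3: average the squared differences.
        return sumOfSquares / values.Length;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, the mean is 5, the squared differences add up to 32, and the variance is 32 / 8 = 4, so the standard deviation is 2.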

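Mathematically the equation for the sample standard deviation is:

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$$

Here $x_1, \ldots, x_N$ are the values, $N$ is the number of values, and $\bar{x}$ is their mean. Note that this divides by $N - 1$ rather than $N$.

Actually there’s one more twist. You calculate the standard deviation for a population like this:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$$

Here $\mu$ is the mean of the whole population, calculated the same way as $\bar{x}$, and the sum is divided by $N$.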

The difference between the two definitions is somewhat subtle. If you have a complete population of values, in other words a value for every member of the group you are studying, then you use the second, population definition. If you have only a subset of the values for a population and you want to deduce something about the population as a whole (for example, you only polled 10% of the electorate), then you use the first, sample definition. For more on this issue, see these posts at Laerd Statistics and Libweb.
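A small worked example may help show how much the choice matters. The eight values below are hypothetical and are not taken from the example program. Their mean is 5 and their squared differences from the mean add up to 32.

```latex
\text{values: } 2,\ 4,\ 4,\ 4,\ 5,\ 5,\ 7,\ 9 \qquad N = 8

\text{population: } \sqrt{\frac{32}{8}} = \sqrt{4} = 2

\text{sample: } \sqrt{\frac{32}{8 - 1}} = \sqrt{\frac{32}{7}} \approx 2.14
```

The sample result is always a little larger than the population result for the same data, and the gap shrinks as the number of values grows.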

This example uses the following extension method to calculate standard deviation.

    // Return the standard deviation of a sequence of integers.
    //
    // If the second argument is true, evaluate as a sample.
    // If the second argument is false, evaluate as a population.
    public static double StdDev(this IEnumerable<int> values, bool as_sample)
    {
        // Get the mean. Average returns a double, which avoids the
        // truncation that integer division (Sum() / Count()) would cause.
        double mean = values.Average();

        // Get the sum of the squares of the differences
        // between the values and the mean.
        var squares_query =
            from int value in values
            select (value - mean) * (value - mean);
        double sum_of_squares = squares_query.Sum();

        if (as_sample)
        {
            return Math.Sqrt(sum_of_squares / (values.Count() - 1));
        }
        else
        {
            return Math.Sqrt(sum_of_squares / values.Count());
        }
    }

The calculation is straightforward.

The oddest thing about this example is the fact that one extension method cannot calculate standard deviation for more than one data type. This version assumes the values are in an `IEnumerable` containing integers, but the exact same calculation works for floats, doubles, and other numeric data types.

You could try to make this a generic extension method but unfortunately you cannot specify that the parameter data type must be numeric. You might be able to use an `IEnumerable<object>` and convert the objects into doubles, but that seems inelegant and inefficient. If you figure out how to do this properly without making a separate extension method for each data type, please let me know. (If you look at the documentation, it seems that pre-defined extension methods such as `Average` that perform calculations on numeric types have multiple versions for different data types, one for int, one for float, etc.)
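In .NET 7 and later, the generic math interfaces in `System.Numerics` do let you constrain a type parameter to numeric types, which may be one way to avoid a separate method per data type. The following is a minimal sketch under that assumption. The names `GenericStatistics` and `StdDevGeneric` are hypothetical and are not part of the example program, and the code will not compile on older frameworks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class GenericStatistics
{
    // Hypothetical generic variant of StdDev.
    // Assumes .NET 7 / C# 11 or later (generic math).
    public static double StdDevGeneric<T>(
        this IEnumerable<T> values, bool asSample)
        where T : INumber<T>
    {
        // Convert each value to double once so the
        // source sequence is only enumerated a single time.
        List<double> converted = new List<double>();
        foreach (T value in values)
        {
            converted.Add(double.CreateChecked(value));
        }

        // Get the mean and the sum of the squared differences from it.
        double mean = converted.Average();
        double sumOfSquares =
            converted.Sum(d => (d - mean) * (d - mean));

        // Divide by N - 1 for a sample or by N for a population.
        int divisor = asSample ? converted.Count - 1 : converted.Count;
        return Math.Sqrt(sumOfSquares / divisor);
    }
}
```

With this sketch, the same call works for an `int[]`, a `List<double>`, or a sequence of any other type that implements `INumber<T>`. For example, `new[] { 2, 4, 4, 4, 5, 5, 7, 9 }.StdDevGeneric(false)` returns 2. Like the original method, it does not guard against empty sequences.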

Using the extension method is easy. The following code shows how the main program uses it to display the standard deviations for the list of integers named `values`.

    // Display statistics.
    txtAverage.Text = values.Average().ToString("0.00");
    txtStddevSample.Text = values.StdDev(true).ToString("0.00");
    txtStddevPopulation.Text = values.StdDev(false).ToString("0.00");

The program also contains commented code to make `values` an array instead of a list. Either will work because both implement `IEnumerable`.
