This example shows how you can make a linear least squares fit to a set of data points.

Suppose you have a set of data points that you believe were generated by a process that should ideally be linear. In that case, you might like to find the best parameters `m` and `b` to make the line `y = m * x + b` fit those points as closely as possible.

A common approach to this problem is to minimize the sum of the squares of the vertical distances between the line and the points. For example, suppose the point `P0 = (x0, y0)` is one of your data points. The vertical error squared for that point is the difference between `y0` and the line’s Y coordinate for that X position. In this case, that’s `y0 - (m * x0 + b)`. To calculate the total error squared, you square this error and add up the results for all of the data points.

Keep in mind that you know all of the points so, for given values of `m` and `b`, you can easily loop through all of the points and calculate the error.

Here’s a function that does just that:

// Return the error squared. public static double ErrorSquared(List<PointF> points, double m, double b) { double total = 0; foreach (PointF pt in points) { double dy = pt.Y - (m * pt.X + b); total += dy * dy; } return total; }

This code loops through the points subtracting each point’s Y coordinate from the coordinate of that line at the point’s X position. It squares the error and adds it to the total. When it finishes its loop, the method returns the total of the squared errors.

As a mathematical equation, the error function E is:

where the sum is performed over all of the points (x_{i}, y_{i}).

To find the least squares fit, you need to minimize this function E(m, b). That sounds intimidating until you remember that the x_{i} and y_{i} values are all known—they’re the values you’re trying to fit with the line.

The only variables in this equation are `m` and `b` so it’s relatively easy to minimize this equation by using a little calculus. Simply take the partial derivatives of E with respect to `m` and `b`, set the two resulting equations equal to 0, and solve for `m` and `b`.

Taking the partial derivative with respect to `m` and rearranging a bit to gather common terms and pull constants out of the sums you get:

Taking the partial derivative with respect to `b` and rearranging a bit you get:

To find the minimum for the error function, you set these two equations equal to 0 and solve for `m` and `b`.

To make working with the equations easier, let:

If you make these substitutions and set the equations equal to 0 you get:

Solving for `m` and `b` gives:

Again these look like intimidating equations, but all of the S’s are values that you can calculate given the data points that you are trying to fit.

The following code calculates the S’s and uses them to find the linear least squares fit for the points in a `List<PointF>`.

// Find the least squares linear fit. // Return the total error. public static double FindLinearLeastSquaresFit( List<PointF> points, out double m, out double b) { // Perform the calculation. // Find the values S1, Sx, Sy, Sxx, and Sxy. double S1 = points.Count; double Sx = 0; double Sy = 0; double Sxx = 0; double Sxy = 0; foreach (PointF pt in points) { Sx += pt.X; Sy += pt.Y; Sxx += pt.X * pt.X; Sxy += pt.X * pt.Y; } // Solve for m and b. m = (Sxy * S1 - Sx * Sy) / (Sxx * S1 - Sx * Sx); b = (Sxy * Sx - Sy * Sxx) / (Sx * Sx - S1 * Sxx); return Math.Sqrt(ErrorSquared(points, m, b)); }

For a slightly different explanation, see my DevX article Predicting Your Firm’s Future with Least Squares, Part I (free registration required).

Dear Rod Stephens.

Thank you for your help with Centroid, this info helped me a lot.

I’m not good at programming and mathematics, but still would like to do some project. In this regard, there is also the question about linear approximation. But me need it in a 3D space, in XYZ plane, not XY.

Prompt please, whether it is possible to approximate a point in space, first in the plane XY, and then in the plane XZ, and then apply Z coordinates of the first and last points obtained in the approximation in the plane XZ to the line, approximated in the plane XY?

I can’t find any info about it in internet, only for 2D.

Just CGAL library (http://www.cgal.org/Manual/latest/doc_html/cgal_manual/Principal_component_analysis/Chapter_main.html#Subsection_68.2.5), but it for C++, not C#.

In any case thank you very much!

Sorry for my english.

Best regards,

Alexander

Sorry but I haven’t done this in 3D. It sounds interesting but I don’t have time right now. (If I have the chance, I’ll post a followup here.)

If you search the internet for “linear least squares 3d” you will find some articles that describe how to use linear least squares to fit a line or plane in 3D.

Pingback: Find a polynomial least squares fit for a set of points in C# -

Hi Rod,

I am not sure this actually works. I copied your code, which compiles OK and passed in a list containing these numbers:

X Y

5 -0.15

6 -0.13

7 -0.11

8 -0.09

9 -0.07

10 -0.05

11 -0.03

12 -0.01

13 0.01

14 0.03

I ran this through your code and tested the result by calculating what Y should be when X = 10. I got -0.045165746272104221. The data is a perfect straight line and so are the results calculated using your functions. However they have different gains. Maybe me and if so I apologise.

When I try it, I get y = -0.0500000008127906.

Double check your numbers to make sure they are all entered correctly. You might also set a break point in the Fit button’s Click event handler and verify that the Points array doesn’t contain any extra values that you don’t expect.

If neither of those helps, email me and I can send you the version of the program I used to get the value above.