Use VBA code to add and remove a watermark on all pages in a Word document


[watermark]

This post shows one way that you can add and remove a watermark in a Word document. To add a watermark in this way, you add a “building block” to the header of each of the document’s sections. If headers are marked so the first page is different or if odd and even pages have different watermarks, then you need to add the watermark to each separately.

The following code adds a watermark to each page.

Sub AddWatermarks()
Dim doc As Document
Dim sec As Section
Dim hdr As HeaderFooter
Dim rng As Range
Dim strBBPath As String

' Building block names.
Const confidential_1 As String = "CONFIDENTIAL 1"
Const confidential_2 As String = "CONFIDENTIAL 2"
Const do_not_copy_1 As String = "DO NOT COPY 1"
Const do_not_copy_2 As String = "DO NOT COPY 2"
Const draft_1 As String = "DRAFT 1"
Const draft_2 As String = "DRAFT 2"
Const sample_1 As String = "SAMPLE 1"

    ' Where to find the building blocks.
    strBBPath = "C:\Users\" & (Environ$("Username")) & _
        "\AppData\Roaming\Microsoft\Document Building " & _
        "Blocks\1033\14\Built-In Building Blocks.dotx"

    ' Loop through the sections.
    Set doc = ActiveDocument
    For Each sec In doc.Sections
        ' Loop through the section's headers.
        For Each hdr In sec.Headers
            ' Set rng to the end of the header.
            Set rng = hdr.Range
            rng.Start = rng.End
            
            ' Insert the desired building block.
            Application.Templates(strBBPath). _
                BuildingBlockEntries(confidential_1).Insert _
                    Where:=rng, RichText:=True
        Next hdr
    Next sec

    ' Uncomment if it's a big file and
    ' you want to know when it's done.
    'MsgBox "Done"
End Sub

This code declares some variables and then defines strings to identify the six standard watermark types. It then loops through document’s sections. For each section, it loops through the section’s headers. It creates a Range representing the end of the header and adds the desired building block to it.

If the document is long, this can take a while, so the code finishes by displaying a message box to let you know that it is done.

The following code removes watermarks from the active Word document.

' Remove watermarks from the active document.
Sub RemoveWatermarks()
Dim sec As Section

    For Each sec In ActiveDocument.Sections
        RemoveWatermarksFromRange sec.Headers(wdHeaderFooterFirstPage).Range
        RemoveWatermarksFromRange sec.Headers(wdHeaderFooterPrimary).Range
        RemoveWatermarksFromRange sec.Headers(wdHeaderFooterEvenPages).Range
    Next sec
End Sub

This subroutine loops through the document’s sections and calls the following RemoveWatermarksFromRange subroutine for each of the three kinds of headers for each section.

' Remove shapes that have a name containing the
' string PowerPlusWaterMarkObject from this range.
Sub RemoveWatermarksFromRange(rng As Range)
Dim shape_range As Shape
    
    For Each shape_range In rng.ShapeRange
        If (InStr(shape_range.Name, "PowerPlusWaterMarkObject") > 0) Then
            shape_range.Delete
        End If
    Next shape_range
End Sub

This code loops through the range’s shapes. If a shape has a name that contains the string “PowerPlusWaterMarkObject,” then it is a watermark so the code deletes it.

Note that this code worked for me, but I have not tested it extensively. I recommend that you make a copy of your document before you run the code on it, just in case something goes wrong.

I didn’t need a custom watermark, so this code does not deal with those. If you write code to add and remove custom watermarks, or if you make other changes that you think might help someone else, please post a comment below.


Download Example   Follow me on Twitter   RSS feed   Donate



Posted in Office, VBA | Tagged , , , , , , , | Leave a comment

Graph the gamma function in C#


[gamma function]

The post Calculate the gamma function in C# uses numerical integration to calculate the gamma function. (See my book Essential Algorithms, Second Edition for information on numerical integration.)

That method works fairly well for calculating Γ(x) where x ≤ 1. In particular, its values match the factorial fairly well for some rather large values of x.

Unfortunately that method is not very accurate for values of x ≤ 0.5. For smaller values of x, it’s better to use a different method. This post shows such a method and graphs it. The graph uses some useful tools and also shows one way to draw discontinuous functions.

The Lanczos Approximation

The Lanczos approximation gives one way to approximate the gamma function. Follow that link for a description of the approximation and a basic Python implementation. For a C# translation, see this Rosetta Code entry.

This example uses the following Rosetta Code function.

private static int g = 7;
private static double[] p =
{
    0.99999999999980993,
    676.5203681218851,
    -1259.1392167224028,
    771.32342877765313,
    -176.61502916214059,
    12.507343278686905,
    -0.13857109526572012,
    9.9843695780195716e-6,
    1.5056327351493116e-7
};
private double MyGammaDouble(double z)
{
    if (z < 0.5)
        return Math.PI / (Math.Sin(Math.PI * z) * MyGammaDouble(1 - z));
    z -= 1;
    double x = p[0];
    for (var i = 1; i < g + 2; i++)
        x += p[i] / (z + i);
    double t = z + g + 0.5;
    return Math.Sqrt(2 * Math.PI) * (Math.Pow(t, z + 0.5)) * Math.Exp(-t) * x;
}

This code simply follows the approximation as described on Wikipedia.

Graphing Tools

One of the biggest issues with drawing graphics is that you sometimes want to draw something in device coordinates (pixels) at a position specified in drawing coordinates. For example, you might want to draw a circle with a radius of 3 pixels at a specific point (x, y) on the graph. You need to map the point (x, y) onto the screen, but then you can’t use the same mapping to draw the circle or it may come out stretched.

To make it easier to perform that kind if drawing operation, this example provides several useful methods. The basic approach is to draw everything in pixels. The program then uses a transformation matrix to figure out where things should be drawn.

Making the Transformation

The following MakeTransforms method creates a transformation matrix to map world (drawing) coordinates to device coordinates.

// Make transformation matrices to map between
// world coordinates to device coordinates.
public static void MakeTransforms(this Graphics gr,
    float wxmin, float wymin, float wxmax, float wymax,
    float dxmin, float dymin, float dxmax, float dymax,
    bool apply_transform,
    out Matrix transform, out Matrix inverse)
{
    RectangleF world_rect = new RectangleF(
        wxmin, wymin, wxmax - wxmin, wymax - wymin);
    PointF[] device_points =
    {
        new PointF(dxmin, dymax),
        new PointF(dxmax, dymax),
        new PointF(dxmin, dymin),
    };

    transform = new Matrix(world_rect, device_points);
    inverse = transform.Clone();
    inverse.Invert();

    if (apply_transform) gr.Transform = transform;
}

This Graphics extension method first creates a rectangle to represent the world coordinates. It then makes an array of three points that indicate where the rectangle’s upper left, upper right, and lower left corners should be mapped in device coordinates.

The method assumes that you are drawing a graph, so it inverts the Y coordinates. For example, the world coordinate rectangle’s upper left corner is at (xmin, ymin). If you were drawing that rectangle directly on the screen, that would be its upper left corner. However, in a graph the Y coordinates normally increase upward, so in world coordinate space (xmin, ymin) is the lower left corner of the coordinate area. To map that location (which the RectangleF uses as its upper left corner) to the correct location, we map it to (dxmin, dymax), the bottom left corner of the drawing area. (Yes, this is all very confusing and counterintuitive. That’s the price we pay for having pixel coordinates increase downward instead of upward.)

Having define the world coordinate rectangle and the device coordinates where it should be mapped, the code uses the rectangle and point array to define a transformation matrix. The method copies the matrix and inverts the copy, so we can map back from device to world coordinates if we like. (Although this example does not do that.)

If the method’s apply_transform parameter is true, the method sets the Graphics object’s Transform property to the transformation matrix. (This example does not do that.)

Transforming Points

If you have a transformation matrix, you can use it to transform points from one coordinate system to another. In this example, that lets you transform points from world to device coordinates.

Unfortunately the Matrix class’s TransformPoints method transforms an array if points, but this example often needs to transform one point at a time. To make that easier, the program defines the following Matrix extension method.

public static PointF TransformPoint(this Matrix transform, PointF point)
{
    PointF[] points = { point };
    transform.TransformPoints(points);
    return points[0];
}

This method simply creates an array to hold the single point passed in as a parameter. It uses the Matrix object’s TransformPoints method to transform the array and returns the result.

Drawing Tick Marks

If you apply a transformation to a Graphics object, then when you call that object’s drawing methods, the result is appropriately transformed. Things become more complicated when you want to size the object in device coordinates.

For example, suppose you want to draw five pixel long tick marks along the X and Y axes. You need to position them in world coordinates, but make them five pixels long in device coordinates.

The following WDrawTick method does that.

public static void WDrawTick(
    this Graphics gr, Matrix transform,
    Pen pen, PointF wp, float ddx, float ddy)
{
    wp = transform.TransformPoint(wp);
    gr.DrawLine(pen,
        wp.X - ddx, wp.Y - ddy,
        wp.X + ddx, wp.Y + ddy);
}

This method transforms the point wp (which stands for “world point”) into device coordinates. It then draws from that point offset by the distances (-ddx, -ddy) to that point offset by (ddx, ddy). For example, suppose you want to draw a vertical tick mark that is 6 pixels tall at the point (50, 100) in device coordinates. Then you would set ddx = 0 and ddy = 3, so the method would draw from (50 – 0, 100 – 3) to point (50 + 0, 100 + 3).

Plotting Points

The following WPlotPoint method follows a similar pattern to the one used by WDrawTick.

public static void WPlotPoint(
    this Graphics gr, Matrix transform,
    Brush brush, Pen pen, 
    PointF point, float drx, float dry)
{
    point = transform.TransformPoint(point);
    RectangleF rect = new RectangleF(
        point.X - drx, point.Y - dry,
        2 * drx, 2 * dry);
    if (brush != null) gr.FillEllipse(brush, rect);
    if (pen != null) gr.DrawEllipse(pen, rect);
}

This method first transforms the point wp into device coordinates. It then creates a rectangle centered at that point with half width (X radius) drx and half height (Y radius) dry. If the method’s brush parameter is not null, the method fills the ellipse define by the rectangle. Similarly if the method’s pen parameter is not null, the method draws the ellipse.

Drawing Text

The following WDrawString method uses the same pattern.

public static void WDrawString(
    this Graphics gr, Matrix transform,
    string text, Font font, Brush brush,
    PointF point, float ddx, float ddy, StringFormat sf)
{
    point = transform.TransformPoint(point);
    point.X += ddx;
    point.Y += ddy;
    gr.DrawString(text, font, brush, point, sf);
}

This method transforms the point where it should draw the text into device coordinates. It then draws the text at that point. The StringFormat parameter lets you decide how the text is aligned (left, right, above, below, or centered) with respect to the point.

Drawing the Curve

The basic idea is simple: make x iterate over values in some desired range, use the function to calculate the corresponding Y values, and draw lines connecting the points. There are two complicating issues here.

First, we calculate the X and Y values in world coordinates but then draw them in device coordinates. By now you can probably figure out how to do that.

Second, this particular function has some discontinuities where its value is undefined and where Y values jump abruptly from negative to positive infinity or vice versa. (See the picture at the top of the post.) We don’t really want the graph to draw a vertical line connecting those two points.

The following PlotCurve method draws a curve while handling those issues.

private void PlotCurve(Graphics gr, Matrix transform,
    Pen pen, Func<double, double> function,
    double xmin, double xmax, double dx)
{
    const double ymin = -10000;
    const double ymax = 10000;

    List<PointF> points = new List<PointF>();
    double last_y = function(xmin);
    for (double x = xmin; x <= xmax; x += dx)
    {
        // Calculate y.
        double y = function(x);
        if (y < ymin) y = ymin;
        if (y > ymax) y = ymax;

        // If y changed by too much, draw whatever we have.
        if (Math.Abs(y - last_y) > 1000)
        {
            if (points.Count > 1)
            {
                gr.WDrawLines(transform, pen, points.ToArray());
            }
            points.Clear();
        }
        points.Add(new PointF((float)x, (float)y));
        last_y = y;
    }

    // Draw any remaining points.
    if (points.Count > 1)
    {
        gr.WDrawLines(transform, pen, points.ToArray());
    }
}

The method creates a points list to hold points. It calculates the function’s Y coordinate at the first X coordinate and saves it in the variable last_y. The method then makes variable x loop from xmin to xmax.

For each x value, the code calculates the corresponding function value and makes sure it lies within a reasonable range, in this case between -10,000 and 10,000.

The code then compares the new y value to the last saved value stored in last_y. If the value has changed by more than 1000, then the function is basically vertical. That means the function has entered or crossed one of its discontinuities.

In that case, the code calls the WDrawLines extension method to draw any points that are currently in the points list and clears the list.

Next, whether it drew the previous points or not, the code adds the new Y value to the points list.

After it has finished looping through all of the X values, the code checks the points list one last time to see if it contains at least two points. If it does, the method draws it.

The following code shows the WDrawLines extension method.

public static void WDrawLines(
    this Graphics gr, Matrix transform,
    Pen pen, PointF[] wpoints)
{
    PointF[] dpoints = (PointF[])wpoints.Clone();
    transform.TransformPoints(dpoints);
    gr.DrawLines(pen, dpoints);
}

Like the drawing methods described earlier, this method transforms its data from world to device coordinates and then uses the result to draw. This code creates a copy of the wpoints array so it doesn’t mess up the original array. (That wouldn’t hurt this example because the array passed into this method isn’t used again later, but it’s a good practice.) The code then uses the transformation matrix to transform the points into device coordinates and draws the points.

Conclusion

The main program performs a lot of graph drawing stuff. It draws the axes, tick marks, and tick mark labels. It uses the numerical integration method described by the previous example to draw the gamma function in red. It then uses the Lanczos approximation to draw the curve again in green.

If you compare the picture at the top of the post to the following picture at Wikipedia, you can see that the green curve looks pretty good. Click the image below to go to the gamma function’s Wikipedia entry.



By Alessio Damato – own work  This W3C-unspecified plot was created with Gnuplot., CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=365942

The Γ(x) = (x – 1)! for positive integers, but it is also defined for the real numbers that are not non-positive integers (integers that are negative or zero). In fact, the gamma function is even defined for all complex numbers except for the non-positive integers. The following picture shows a three-dimensional plot of the absolute value of the gamma function for complex values.



By Geek3 – Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=5156881

Download the example to experiment with it and to see additional details.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in algorithms, graphics, mathematics | Tagged , , , , , , , , | Leave a comment

Calculate the gamma function in C#


[gamma function]

The gamma function, represented by the capital Greek letter gamma Γ, was derived by Daniel Bernoulli for complex numbers with a positive real part. The interesting thing about the gamma function is that &Gamma(n + 1) = n! for integers n greater than zero. The function is continuous for positive integers, so it defines interpolated factorial values for non-integers. For example, 3! = &Gamma(3 + 1) = 6 and 4! = &Gamma(4 + 1) = 12. If you want to know what 3.5! would be, you can calculate &Gamma(3.5 + 1) ≈ 11.631728396567.

Bernoulli defined the gamma function using the following improper integral.


[gamma function]

Calculating the Gamma Function

Bernoulli’s integral doesn’t have a closed-form solution, but the following function approximates its value.

// Integrate: x^(z-1)*e^(-x) dx from 0 to infinity.
private double Gamma(double z, double dx, int num_slices)
{
    double result = 0;
    double x = 0;
    for (int i = 0; i < num_slices; i++)
    {
        double new_term = Math.Pow(x, z - 1) * Math.Exp(-x);
        if (double.IsNaN(new_term)) break;
        if (double.IsInfinity(new_term)) break;
        result += new_term;
        x += dx;
    }
    return result * dx;
}

[Essential Algorithms]

This function basically adds up num_slices slices of curve that are dx thick. (This is covered in my book Essential Algorithms, Second Edition.) To get a good result, use a small value for dt and a large value for num_slices. If you use too many slices, however, rounding errors will add up so you may not improve the resulting accuracy.

The method first initializes result to zero. It then loops through the slices, calculating the height of the function for each slice. When x is big enough, a slice’s height may turn out to be not a number (NaN) or infinity, at least as far as the double data type is concerned. In that case, the code breaks out of its loop and stops considering new slices because further values will not be helpful.

After is has added up all of the slices, the code multiplies the total height by dx to convert the heights into areas and returns the result.

There are other ways to approximate the gamma function. Believe it or not, this one seemed simplest. See Wikipedia for details.

Displaying Gamma Function Values

The example program uses the following code to display factorial values and their corresponding gamma function values.

private void btnCalculate_Click(object sender, EventArgs e)
{
    lvwValues.Items.Clear();
    Cursor = Cursors.WaitCursor;

    int minx = int.Parse(txtMinX.Text);
    int maxx = int.Parse(txtMaxX.Text);
    double dt = double.Parse(txtDt.Text);
    int num_slices = int.Parse(txtNumSlices.Text);
    for (int x = minx; x <= maxx; x++)
    {
        double x_factorial = Factorial(x);
        if (double.IsInfinity(x_factorial)) break;
        double gamma_x = Gamma(x + 1, dt, num_slices);
        double difference = Math.Abs(gamma_x - x_factorial);
        double percent_difference = difference / gamma_x * 100.0;
        lvwValues.AddRow(
            x.ToString("G4"),
            x_factorial.ToString("G4"),
            gamma_x.ToString("G4"),
            difference.ToString("G4"),
            percent_difference.ToString("G4"));
    }
    lvwValues.AutoSizeColumns();
    Cursor = Cursors.Default;
}

This code gets the parameters that you entered on the form and then loops through the desired X values. For each value, the code first calculates the factorial. If that value is too big to fit in a double, the code breaks out of its loop. (Because it won’t be able to calculate larger factorial values, either.)

Next the code calls the gamma function for the value plus one, calculates the difference between the factorial and the gamma function, and calculates the percentage difference. It then uses the AddRow extension method (described shortly) to add the values to the program’s ListView control. You can see from the picture at the top of the post that the error is less than one with the given parameters for values up to 15! = Γ(15 + 1).

ListView Extensions

This example uses two useful ListView control extensions. The following AddRow extension adds an item and subitems to create a row in the ListView control.

// Add an item and subitems to a ListView.
public static void AddRow(this ListView lvw,
    string item, params string[] subitems)
{
    ListViewItem list_view_item = lvw.Items.Add(item);
    foreach (string subitem in subitems)
        list_view_item.SubItems.Add(subitem);
}

The AddRow extension uses its first parameter to create a new ListView item. It then loops through its other parameters and adds them as subitems.

The following code automatically sizes a ListView control’s column widths.

// Autosize all columns.
public static void AutoSizeColumns(this ListView lvw,
    params string[] values)
{
    foreach (ColumnHeader column in lvw.Columns)
        column.Width = -2;
}

This code simply loops through the ListView columns and sets their widths to -2. That tells each column to set its width so it is big enough to hold its values.

Conclusion

The gamma function gives you a new way to calculate factorials, but the obvious way of multiplying integers together is much faster. The real reason why the gamma function is interesting is that it lets you calculate factorials for non-integer values.

The program can calculate values up to 170!. Values larger than that are too big for the double data type.

If you look at the picture at the top of the post, you’ll see that the differences between the factorial and gamma function are quite small. As the values grow larger, so do the errors. The following picture shows the results between 125! and 141!.


[gamma function]

Notice that the first errors are large but still a small percentage of the values. For example, the error in calculating 125! is more than 1×10196, but it’s less than 0.0000000000068% of the value 125!.

For larger values, the percent error quickly grows much larger. At those larger values, rounding errors dominate so you can’t reduce the errors too much if you use the double data type.

Download the example to experiment with it. If you like, try implementing some of the other ways for calculating the gamma function and see which give the best results.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in algorithms, mathematics | Tagged , , , , , , , | 1 Comment

Find rectangles defined by a side and aspect ratio in C#

[find rectangles]

You probably haven’t needed to make your program draw rectangles that are defined by specifying one of its sides. I haven’t either, but I want this for another program that I plan to write.

Of course a single side isn’t quite enough to completely determine a rectangle. To use this example, you also need to specify the rectangle’s desired aspect ratio. (The width / height ratio.)

This program performs several tasks such as letting you draw a line segment and reading the aspect ratio either as a floating point value or in a width:height format. Those details aren’t terribly interesting, however, so I won’t cover them here. Download the example to see then in all their glory. Here I’m just going to explain how to find rectangles from the edge segment and aspect ratio.

There are two main tasks that the program must perform to find rectangles defined by an edge and an aspect ratio. First, it must find edges that make right angles with the original segment. Second, it must make those edges the correct lengths to give the desired aspect ratio.

Perpendicular Vectors

[example]

I’ve covered this before (and probably will again), but it’s worth repeating. If you have a vector <vx, vy>, then the vectors <vx, -vy> and <-vx, vy> are perpendicular to, and have the same length as, the original vector.

In the picture on the right, the red vector is the original. Note that I’ve chosen a vector where vx and vy are both positive to reduce confusion. (Remember that Y coordinates increase downward.) This all works if vx or dy is negative, but it’s a bit more confusing.

If you look at the picture, you can see how the blue and green vectors are defined. Note that the blue vector <vx, -vy> points to the right and the green vector <-vx, vy> points to the left as you look down the length of the red original vector.

Setting Edge Lengths

After we find a vector perpendicular to the given edge, we need to figure out how far to go in that direction to define the two adjacent sides. We can use the aspect ratio to do that.

Suppose the given edge has length M and let N represent the length of the adjacent edges. Then we know that either M / N = aspect_ratio or N / M = aspect_ratio, depending on whether you consider the given edge as along the rectangle’s length or height. We can solve those equations for N to give two possible values for the length of the adjacent edges.

N1 = M / aspect_ratio
N2 = M * aspect_ratio

Now suppose the given edge has vertices p1 and p2, and the rectangle’s remaining vertices are p3 and p4. We can use the perpendicular vectors to find points p3 and p4.

For example, consider the first solution for N1. If you take a vector of length M and divide its X and Y components by the aspect ratio, you get a new vector pointing in the same direction but with length N1 as desired. Similarly you can multiply the vector’s components by the aspect ratio to get a vector pointing in the same direction but with length N2.

[example]

Now add the scaled vector’s components to the coordinates of point p1 to get point D as shown in the picture on the right. Similarly add the scaled vector to point B to get point C. That gives you the four points that define one of the rectangles.

Repeat this process for the two possible values for N1 and N2 and for the two perpendicular directions <vx, -vy> and <-vx, vy> to get a total of four rectangles defined by the original side and the aspect ratio.

FindRectangles

The following FindRectangles method uses the technique described in the preceding section to find rectangles defined by a side and aspect ratio.

private List<PointF[]> FindRectangles(float aspect_ratio,
    PointF p1, PointF p2)
{
    // Get the vector p1 --> p2.
    float vx = p2.X - p1.X;
    float vy = p2.Y - p1.Y;
    if (Math.Sqrt(vx * vx + vy * vy) < 0.1) return null;

    PointF p3, p4;
    List<PointF[]> result = new List<PointF[]>();
    float perp_x, perp_y;

    // Make rectangle 1.
    perp_x = vy * aspect_ratio;
    perp_y = -vx * aspect_ratio;
    p3 = new PointF(p2.X + perp_x, p2.Y + perp_y);
    p4 = new PointF(p1.X + perp_x, p1.Y + perp_y);
    result.Add(new PointF[] { p1, p2, p3, p4 });

    // Make rectangle 2.
    p3 = new PointF(p2.X - perp_x, p2.Y - perp_y);
    p4 = new PointF(p1.X - perp_x, p1.Y - perp_y);
    result.Add(new PointF[] { p1, p2, p3, p4 });

    // Make rectangle 3.
    perp_x = vy / aspect_ratio;
    perp_y = -vx / aspect_ratio;
    p3 = new PointF(p2.X + perp_x, p2.Y + perp_y);
    p4 = new PointF(p1.X + perp_x, p1.Y + perp_y);
    result.Add(new PointF[] { p1, p2, p3, p4 });

    // Make rectangle 4.
    p3 = new PointF(p2.X - perp_x, p2.Y - perp_y);
    p4 = new PointF(p1.X - perp_x, p1.Y - perp_y);
    result.Add(new PointF[] { p1, p2, p3, p4 });

    return result;
}

The method takes as parameters the aspect ratio and two points that define the initial edge. It returns a list of arrays of points. Each array of points contains the four points that define a rectangle.

The code starts by finding the vector between the two known points.

To make the first rectangle, it switches the vx and vy components to get a perpendicular vector and multiples the components by the aspect ratio to get a vector with the necessary length N2. It then adds the vector’s components to the initial points p1 and p2 to find the rectangle’s remaining two points p3 and p4.

The method adds the points to an array and adds the array to the result list.

The code then repeats those steps to generate the other three rectangles and add them to the result list. When it has defined all three rectangles, the method returns the list.

Drawing the Rectangles

The last part of the example that I’m going to describe is the code that draws the rectangles. The program uses the following code to store various drawing values.

private PointF Point1, Point2;
private List<PointF[]> Rectangles = new List();

private Brush[] RectBrushes =
{
    new SolidBrush(Color.FromArgb(128, Color.Red)),
    new SolidBrush(Color.FromArgb(128, Color.Red)),
    new SolidBrush(Color.FromArgb(128, Color.Blue)),
    new SolidBrush(Color.FromArgb(128, Color.Blue)),
};
private Pen[] RectPens =
{
    Pens.Red,
    Pens.Red,
    Pens.Blue,
    Pens.Blue,
};

The values Point1 and Point2 are the endpoints of the initial edge that you draw. The program uses the FindRectangles method to store the found rectangles in the Rectangles list. The RectBrushes and RectPens arrays hold the brushes and pens that the program uses to draw the rectangles.

The following Paint event handler does the drawing.

private void picCanvas_Paint(object sender, PaintEventArgs e)
{
    e.Graphics.Clear(picCanvas.BackColor);
    e.Graphics.SmoothingMode = SmoothingMode.AntiAlias;
    
    for (int i = 0; i < Rectangles.Count; i++)
    {
        e.Graphics.FillPolygon(RectBrushes[i], Rectangles[i]);
        e.Graphics.DrawPolygon(RectPens[i], Rectangles[i]);
    }

    if (Point1 != Point2)
    {
        e.Graphics.DrawLine(Pens.Black, Point1, Point2);
    }
}

After some setup, the code loops through the rectangles in the Rectangles list, filling and then outlining each. If you look at the RectBrushes array, you’ll see that the program first draws two translucent red rectangles and then draws two translucent blue rectangles. if you look at the picture at the top of this post, you’ll see that the two first rectangles come out purple (blue on top of red). The third and fourth rectangles are blue and they cover the first two rectangles.

Conclusion

As I said, you may never need to find rectangles defined by an edge and an aspect ratio. In my next post I’ll use this technique to find a special kind of rectangle that is related to an interesting mathematical problem, the solution to which demonstrates the most understandable use of topology that I’ve seen.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in drawing, geometry, graphics | Tagged , , , , , , , , , | 3 Comments

Zip files with the ZipFile class in C#

[example]

The static ZipFile class is relatively new (.NET Framework 4.5 and later), so this example uses Visual Studio 2019.

The following section explains how to load the ZipFile class in a Visual Studio 2019 project. The rest of this post explains the example program.

Using ZipFile

Create a new Windows Forms .NET Framework project. If you use .NET Core, some of this will be different. I’m not going to cover that, but you can probably figure it out.

The ZipFile class is stored in the System.IO.Compression namespace. Rather than just adding a reference to that as you would in older versions of Visual Studio, the currently approved way to add the necessary library to your project is to use the NuGet package manager. Fortunately adding NuGet packages is fairly easy.

In the Properties window, right-click Dependencies and select Manage NuGet Packages. Select the Browse tab and search for Sytem.IO.Compression.ZipFile. Click on that entry and click the Install button on the right. The Preview Changes dialog lists the package and any dependencies that it requires. Look over the list, wish you could do without all of those dependencies, and click OK. If you get a license dialog, click I Accept. After the installation finishes, close the NuGet Package Manager.

To make using the ZipFile class easier, add the directive using System.IO.Compression; to the top of the form’s code.

After you’ve added the package that contains ZipFile, you’re ready to build the user interface.

User Interface

Take a look at the picture at the top of the post. The large ListBox holds the files that you want to zip. Click the + button or use the File menu’s Add Files command to display a file open dialog that lets you select files to add to the list. If you select one or more files and click Open, the program adds the files to the list.

To remove one or more files from the list, select them and then click the X button or the File menu’s Remove Files command.

After you have selected the files that you want to zip, select the File menu’s Save Zip File command. In the file selection dialog that appears, enter or select the Zip file that you want to create and click Save.

Removing Files

The program contains some code to make the user interface work better. For example, it disables the X button and the File menu’s Remove Files command if there are not files selected in the list. Download the example to see that code.

I want to show one piece of user interface code because it highlights an important issue when removing items from a collection. When you click the X button or select the File menu’s Remove Files button, the following method executes.

private void RemoveFiles()
{
    // Remove the first selected item by index until none are selected.
    while (lstFiles.SelectedIndices.Count > 0)
    {
        lstFiles.Items.RemoveAt(lstFiles.SelectedIndices[0]);
    }

    // Remove the first selected item as an object until none are selected.
    //while (lstFiles.SelectedItems.Count > 0)
    //{
    //    lstFiles.Items.Remove(lstFiles.SelectedItems[0]);
    //}

    // Remove the (first) selected item until none are selected.
    //while (lstFiles.SelectedItem != null)
    //{
    //    lstFiles.Items.Remove(lstFiles.SelectedItem);
    //}

    // Remove the selected items by index in reverse order.
    //for (int i = lstFiles.SelectedIndices.Count - 1; i >= 0; i--)
    //{
    //    lstFiles.Items.RemoveAt(i);
    //}
    EnableMenuItems();
}

As long as the list has selected files, the program removes the first of them. When it removes the first selected file file, the indices of any files that come after it in the list decrease by 1. The SelectedIndices list updates so it holds the new indices of the remaining items, and the process can continue. The loop continues removing the first selected item from the list until there are no more selected items.

Another approach that you might try would be to make a variable, say i, loop through the indices of the selected items and remove them from the list. Unfortunately when you do that, the indices update and the variable i increments so it skips entries.

For example, the first time through the loop, the code removes the first selected index. Now the item that was second becomes the first (with index 0), but i increments to 1 so it is past the new first item.

You also cannot use a foreach loop to iterate through the SelectedItems or SelectedIndices list and remove the items in those lists. Whenever you remove an item, that modifies the lists and that messes up the foreach loop.

The code shows a few other strategies that work commented out. If you study them, you should be able to figure out how they work.

Zipping Files

The easiest way to zip files is to put them in the same directory and then use the ZipFile class’s CreateFromDirectory method to zip the directory. I didn’t take that approach because I wanted to show how to add specific files to a zip file.

When you add files to a zip archive, the file names within the archive are relative. That way when you unzip a directory hierarchy, the files go where they belong.

However, in this example you select individual files, so there isn’t a hierarchy. That’s fine unless you happen to select two files that have the same name because the zip archive cannot have two entries with the same name.

To solve that problem, the example appends a number to files with the same names. For example, if you select three files named myfile.txt, then inside the zip archive they are named myfile.txt, myfile(1).txt, and myfile(2).txt.

Without further ado, here’s the code. When you select the File menu’s Save Zip File command, its event handler calls the following SaveZipFile method.

private void SaveZipFile()
{
    // Get the Zip file name. If the user cancels, do nothing.
    if (sfdZipFile.ShowDialog() != DialogResult.OK) return;

    // If the file exists, delete it.
    // (The dialog made the user agree to that.)
    string zip_file_name = sfdZipFile.FileName;
    if (File.Exists(zip_file_name)) File.Delete(zip_file_name);

    Cursor = Cursors.WaitCursor;

    // Create the Zip archive for update.
    using (ZipArchive archive =
        ZipFile.Open(zip_file_name, ZipArchiveMode.Update))
    {
        // Add the files to the archive.
        for (int i = 0; i < lstFiles.Items.Count; i++)
        {
            // Select the file in the list so the user can see progress.
            lstFiles.SelectedIndex = i;
            lstFiles.Refresh();

            // Get the file's name without the path.
            string filename = lstFiles.Items[i].ToString();
            FileInfo file_info = new FileInfo(filename);

            // Make an entry name for the file.
            string entry_name = file_info.Name;
            for (int version = 1; version < 10000; version++)
            {
                ZipArchiveEntry old_entry = archive.GetEntry(entry_name);
                if (old_entry == null) break;
                entry_name =
                    Path.GetFileNameWithoutExtension(file_info.Name) +
                    $"({version})" + file_info.Extension;
            }

            // Add the file to this entry.
            archive.CreateEntryFromFile(filename, entry_name);
        }
    }

    lstFiles.SelectedIndex = -1;
    Cursor = Cursors.Default;
    MessageBox.Show("Done");
}

This code displays a SaveFileDialog to let you select the soon-to-be-created zip file. If you click Cancel, the method returns.

If you pick a file and click Save, the code gets the name of the file and uses File.Exists to see if the file already exists. If it does, the code uses File.Delete to delete it. (The SaveFileDialog asked if you wanted to replace the existing file.)

Next the code uses the ZipFile class’s Open method to create a new zip archive (which is stored in the file). The program uses the Update parameter so it can look at the file as it is constructed. Notice that the code places the archive in a using block so it is automatically disposed when the block finishes.

The program then loops through the files in the list box and gets each file’s name. It then enters a loop that runs until the code finds a name for this file that is not already in the archive.

Inside the loop, the program uses the archive’s GetEntry method to see if the file’s name is already in the archive. If the name is not already present, the code breaks out of the loop and uses the current name.

If the name is already in use, the code composes a new file name consisting of the original file’s name without the path or extension, followed by the loop number inside parentheses, followed by the original file’s extension. The loop continues until the program finds a file name that is not in use.

Notice how the code uses string interpolation $"({version})" to compose the file’s number. String interpolation is one of the best features added to C# in several versions. Unfortunately it’s a fairly recent development, so you can’t use it with older versions of C# (which load and run faster than the newer versions).

Having found a file name that is not already in use, the code uses the archive’s CreateEntryFromFile method to add the file to the archive using the recently composed name.

The program then continues to loop through the files in the list box, adding them to the archive.

Summary

This example uses the ZipFile class to add a collection of files to a zip archive. Because of the way it lets you select the files from arbitrary locations, it flattens the files’ directory hierarchy and places them all at the same level in the zip file.

Download the example to see additional details and to experiment with the program.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in files | Tagged , , , , , , , , , | Leave a comment

Simulate a doubling betting strategy in C#

[betting strategy]

Casinos love customers who have a betting strategy because those customers always end up losing. As long as the game’s events are independent of each other (for example, one roll of the dice does not depend on the previous rolls), the odds favor the house, so no strategy will win in the long run.

One well-known betting strategy doubles your bet every time you lose. This example simulates that strategy so you can get an idea of why it doesn’t work.

The Strategy

This example uses roulette, but any game that pays 1-to-1 will work. A European/French style roulette wheel has 37 slots numbered 0 through 36. Number 0 is green. Of the numbers 1 through 36, half are red and half are black. American style wheels have an additional green value, 00.

If you bet black and the ball lands on black, you win. If you bet black and the ball lands on red or green, you lose. Roulette is a pretty complicated game, but for this example that’s about all you need to know. Google “roulette” for more details.

Because you lose if the ball lands red or green (assuming you bet black), the chance of you winning is 18/37, which is slightly less than 1/2. That means your expected value from betting $1 is 18/37 * $2 ≈ $0.973. That should be your tip off right there. Because a $1.00 bet earns you only $0.973, you’re going to lose in the long run.

Note that it doesn’t matter which color you bet on. You can bet on red if you like or even switch colors fron bet to bet, either in a pattern or randomly. (Just stay away from green because the odds aren’t close to 1-to-1. The result is the same but the analysis is harder.)

Okay, so here’s the betting strategy. You start by betting $1 on black. If you lose, you double your bet and place $2 on black. If you lose again, you bet $4 on black. You keep doubling your bet until you win.

Suppose your last bet was $N. Then after this process, you have bet 1 + 2 + 4 + 8 + … + N = 2 * N – 1. You keep your final bet plus you win $N, so you end up with $2 * N, a profit of $1.

Now you start the whole thing over again by betting $1.

The betting strategy’s logic is that you will eventually win, so you always win $1. Simply repeat until you’re rich!

The Catch

That seems reasonable and works except for one small but important detail: you don’t have an infinite amount of money. That means eventually the wheel will produce enough black spins in a row to wipe you out.

For example, suppose you start with $63. Then you can afford to bet $1, $2, $4, $8, $16, and $32 before you run out of money. That means to lose this round, the wheel would need to come up black six times in a row. The odds of that happening are roughly (1/2)^6 = 1/256. That seems like it won’t happen, so you think you’re safe.

In fact, you probably will win this first round. Unfortunately you won’t stop there. You’ll use the betting strategy again and again until you eventually lose. You may win for a while, but eventually the odds will catch up to you.

Using the Program

To use the program, enter your starting amount of money in the Current Bank text box. Enter the total amount that you would like to win in the Stop At box. Then click Go.

As it simulates the betting strategy, the program displays your maximum bank, the number of rounds it has played, the current bet, whether you won or lost the previous round, and your largest bet.

The program simulates the betting strategy until your bank reaches the stopping amount or you no longer have enough money to cover your next bet. (Another condition that you might include would be a maximum number of rounds. You probably can’t run 10,000 rounds at a real roulette wheel before you collapse from exhaustion.)

If you pick specific values, for example a $100 initial bank and a stopping amount of $150, you’ll win some times and lose others. If you perform many trials and keep track, you’ll find that you lose overall.

If you enter very large values, you can test the program to see what happens if you have an almost unlimited amount of money. You can play for quite a long time, but eventually you’ll lose.

The Code

When you click the Go button, the program executes the following code.

private int Round;
private decimal MaxBet, CurrentBank, MaxBank, LastBet;
private bool LastWon;
private Random Rand = new Random();
private int StopAt = 0;

private void btnGo_Click(object sender, EventArgs e)
{
    if (tmrBet.Enabled)
        StopSimulation();
    else
        StartSimulation();
}

private void StopSimulation()
{
    tmrBet.Enabled = false;
    tmrBet.Tag = "Go";
}

private void StartSimulation()
{
    tmrBet.Tag = "Stop";
    Round = 0;
    MaxBet = 0;
    StopAt = int.Parse(txtStopAt.Text);
    CurrentBank = decimal.Parse(txtCurrentBank.Text);
    MaxBank = CurrentBank;
    LastBet = 0;
    LastWon = true;
    tmrBet.Enabled = true;
}

This code begins with variables to track various values. The button’s Click event handler checks the program’s tmrBet timer. If the timer is enabled, then the simulation is currently running. In that case the code calls the StopSimulation method.

If the timer is disabled, then you are trying to start a new run so the code calls the SgtartSimulation method.

The StopSimulation method disables the timer and sets the button’s caption to Go so it is ready to start a new run.

The StartSimulation method sets the button’s caption to Stop, resets the program’s counters, and enables the timer.

The following code shows the timer’s Tick event handler taht executes when the timer fires.

private void tmrBet_Tick(object sender, EventArgs e)
{
    decimal current_bet;
    if (LastWon) current_bet = 1;
    else current_bet = LastBet * 2;

    Round++;
    txtRound.Text = Round.ToString();

    txtCurrentBet.Text = current_bet.ToString();

    if (StopAt < CurrentBank)
    {
        StopSimulation();
        return;
    }

    if (current_bet > CurrentBank)
    {
        StopSimulation();
        return;
    }

    LastBet = current_bet;
    if (MaxBet < current_bet)
    {
        MaxBet = current_bet;
        txtMaxBet.Text = MaxBet.ToString();
    }

    int spin = Rand.Next(37);
    LastWon = ((spin != 0) && (spin % 2 == 0));
    if (LastWon)
    {
        CurrentBank += current_bet;
        txtWinLoss.Text = "WIN";
    }
    else
    {
        CurrentBank -= current_bet;
        txtWinLoss.Text = "LOSS";
    }

    txtCurrentBank.Text = CurrentBank.ToString();
    if (MaxBank < CurrentBank)
    {
        MaxBank = CurrentBank;
        txtMaxBank.Text = MaxBank.ToString();
    }

    if (CurrentBank < 1) StopSimulation();
    tmrBet.Enabled = false;
}

This code simulates the betting strategy. It’s fairly straightforward.

The code first checks LastWon to see if the previous bet was won. If the last bet was one, the code sets the next bet to $1. If the previous bet was lost, then the code sets the next bet to twice the previous bet.

Next the code increments and displays the round number, and displays the current bet. It then checks the current bank for stopping conditions. If the bank has reached the desired stopping amount, the code calls the StopSimulation method and returns. If the bank is smaller than the next required bet, the code also calls the StopSimulation method and returns. The code saves the current bet in variable LastBet and updates MaxBet if necessary.

Next the program picks a random number between 0 and 36. If the number is greater than 0 (the green number) and even (the black numbers are even), then we have won and the program adds the current bet to the bank. If the number is 0 (green) or odd (red), the code subtracts the current bet from the bank.

The event handler updates MaxBank if appropriate. Finally, if the current bank is less than 1, the code calls the StopSimulation method.

Conclusion

The example demonstrates a couple of useful techniques such as using a timer to run a simulation and using the same button to start and stop the simulation.

Download the example and experiment with it. It often produces some winnings before eventually losing everything. If only you could know when to stop, you could come out ahead. But if you knew when to stop, you could dominate the stock market, too.

Before I leave this alone, I’ll mention one other betting strategy that is even worse than the doubling strategy. The odds of the ball landing red five times in a row are roughly 1/(2^5) = 1/32. The strategy is to wait until the ball has landed on red four times in a row and then bet on black. The reasoning is that it is unlikely to land on red again.

This is probably the most common mistake people make in statistical reasoning. Because every spin of the wheel is independent of the previous spins, there is a roughly 1/2 chance that the next spin will be black, not 1/32. That should be obvious, but I’ve talked with people that defended this strategy with great enthusiasm. These days there are a lot of people who believe far weirder things like the Earth is flat, so this shouldn’t be a huge surprise. If you don’t see why the strategy won’t work, trying modifying this example so it simulates that approach and see what happens.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in algorithms, mathematics | Tagged , , , , , , , , , | 1 Comment

Book Drawing: C++ All-in-One For Dummies, 4th Edition

John Mueller has kindly donated five copies of his book C++ All-in-One For Dummies, 4th Edition for me to give away. The only catch is that you should post a review on Amazon when you’ve had a chance to look the book over.

This is a massive 912 page volume, so I wouldn’t expect anyone to read it cover-to-cover immediately. This is, however, a good chance for you to see what C++ is like. You’ll learn that it’s a lot like C#, but with a few differences, particularly when it comes to pointers.

The book uses Code::Blocks (www.codeblocks.org) as its programming environment and includes instructions for downloading and installing it. (It’s can’t be too hard because I managed it! 😉)

To sign up to win a free book, simply email me at RodStephens@CSharpHelper.com. Include “C++ book drawing” in the subject line and your name and address in the body.

From previous experience with book giveaways, I can say that your odds of winning are fairly good. Normally I only get a dozen or so requests for these books, and this time it’s not a C# book so interest may be even lower. The gist of it is, if you’re interested in learning a bit about C++ programming, this is your big chance to win a free book to get you started! Megabucks would be better, but this is more likely.

The entry deadline is next Wednesday February 10, 2021. I’ll announce the winners (without email addresses) shortly after that.

Good luck!

Disclaimer: Note that I tech edited this edition, despite what it says at the back of the book.


Follow me on Twitter   RSS feed   Donate




Posted in books | Tagged , , , , , , | Leave a comment

Easily save and restore CheckedListBox values in the Registry in C#

example

The example Easily save and restore a form’s settings in the Registry in C# shows one way you can save a form’s settings and the values placed in its controls. That example really just demonstrates the basic ideas and shows how to save the values in a few kinds of controls.

This example extends the previous one to save and restore the items that are checked in a CheckedListBox control.

Saving Values

The program’s SaveAllSettings method saves the form’s width, height, and position. It then calls the SaveChildSettings method passing it the form in its parent parameter.

The SaveChildSettings method loops through the parent’s child controls. It uses a switch statement to save the values that are appropriate for each child.

The following code shows the new case for a CheckedListBox control.

case "CheckedListBox":
    CheckedListBox lst = child as CheckedListBox;
    for (int i = 0; i < lst.Items.Count; i++)
    {
        // Default is false.
        if (lst.GetItemChecked(i))
        {
            string item_name = child.Name + "(" + i.ToString() + ")";
            SaveSetting(app_name, item_name, "True");
        }
    }
    break;

This code converts the generic child object into a CheckedListBox and then loops through the indices of the control’s items. For each item, the code uses the list’s GetItemChecked metod to determine whether the item is checked.

If the item is checked, the code creates the item’s name consisting of the control’s name plus the item’s index inside parentheses. For example, the clbMonster control’s item number 4 would have the name “clbMonster(4).” The code then calls the SaveSetting method to save the value “True” for the item. (See the previous example for a description of the SaveSetting method.)

Notice that the code does not save a value for items that are not checked. When it reloads the values, the program assumes that any missing values are false.

After it has saved the child control’s values, the SaveChildSettings method recursively calls itself to process the children of the child (if it has any).

Loading Values

The code that loads values follows a similar pattern to the one used to save values. The LoadAllSettings method loads the form’s dimensions and position, and then calls the LoadChildSettings method, passing it the form as the parent parameter.

The LoadChildSettings method loops through the parent’s child controls and uses a switch statement to save appropriate values. The following code shows the case for the CheckedListBox control.

case "CheckedListBox":
    CheckedListBox lst = child as CheckedListBox;
    for (int i = 0; i < lst.Items.Count; i++)
    {
        // Default is false.
        string item_name = child.Name + "(" + i.ToString() + ")";
        bool is_checked =
            bool.Parse(GetSetting(app_name, item_name, "False").ToString());
        lst.SetItemChecked(i, is_checked);
    }
    break;

This code converts the child control into a CheckedListBox and then loops through its items. For each item, the code composes the item’s name as before. It then calls the GetSetting method to get the item’s value using the default value “False.” The code parses the result as a Boolean value and uses the list’s SetItemChecked method to check or uncheck the item.

Summary

The code shown here only handles CheckedListBox controls. See the previous post for additional details such as how the code handles other control types.

Using similar techniques, you should be able to add other control types if you need them.

This example uses a single form. It’s Load and FormClosing load and save settings, respectively. You should be able to use a similar technique to save and restore settings for multiple forms. Simply use different app names for the different forms. Alternatively you could prepend the form’s name to each of the control names.

You can also use a similar approach to save and restore settings in places other than the Registry. For example, you could save values in a text, XML, JSON, or settings file.

Download the example to experiment with it and to try making these modifications.


Download Example   Follow me on Twitter   RSS feed




Posted in forms, programs, registry | Leave a comment

Use k-means clustering to find clusters of data in C#

[k-means clustering]

I was recently editing John Mueller and Luca Massaron’s book Machine Learning For Dummies, Second Edition. (This link is for the first edition. I’ll try to remember to update it when the new edition is available.) The book is about building machine learning algorithm programs in the Python or R programming languages, but this algorithm is so simple that you can implement it easily without any of the special special libraries that Python and R provide.

The Goal

The goal is to partition data into clusters. For example, suppose you have a database of customers and their video purchases. If you can cluster the datam then you might be able to identify customers with similar interests. If a customer is in a cluster that seems to like the cartoon sci-fi movies like Monsters vs. Aliens, then you might recommend the movie Megamind to that customer.

The idea is to partition N pieces of data into k clusters. To do that, we need to find centers or centroids for the clusters and assign each piece of data to the center that is closest.

Figuring out how to represent qualitative data such as liking a particular movie is tricky. For example, you might use a different dimension for each movie and assign a customer the values 1 through 5 along that dimension.

[k-means clustering]

It’s much easier to see clusters in two-dimensional point data, so that’s what this example considers. The picture on the right shows 400 data points in a two-dimensional plane. It’s easy to see that the points should probably be grouped into three clusters. This example will identify the clusters in cases such as this one and in harder cases where the clusters are not as clearly defined.

The Example

Right-click to create data points.

The Make Points button lets you create many random points all at once. If the points were distributed completely randomly across the program’s picture box, the result wouldn’t have obvious clusters so it would be as interesting. To create non-randomly positioned points, left-click to define seed points. (Those are shown as crosses on the program.) After you have defined some seeds, clicking the Make Points button will make the program generate points that are random but grouped around the seeds.

After you have defined some points, enter the number of clusters you want to create and click the Make Clusters button. The program runs the algorithm, coloring the clusters as it goes.

The label at the bottom of the form shows the current score and the final solution’s score (I’ll say more about that later) and the number of steps it took to find the clusters.

Click the Clear button to remove the points and seeds so you can start over.

K-Means Clustering

The k-means clustering algorithm is one way to find clusters for the points. There are other versions of this algorithm, but because this one is so common, it’s often called simply “the k-means algorithm.” It’s also called Lloyd’s algorithm, named after Stuart Lloyd who first proposed the algorithm at Bell Labs in 1957. There are faster versions of the algorithm, so this version is also sometimes called the “standard” or “naïve” version.

This version of the algorithm is remarkably simple.

  1. To find k clusters, pick k of the data points randomly to be the initial cluster centers.
  2. Repeat:
    1. For each data point P, find the closest cluster center and assign the point to that cluster.
    2. For each cluster, move the center to the centroid of the points assigned to that cluster.
    3. Repeat step 2 until the cluster centers no longer move significantly.

The result you get depends on the initial centers that you pick randomly, so sometimes the algorithm will reach a stable result even though it may not be the best possible solution. For example, the following picture shows two stable solutions for the same points.


[k-means clustering]

To find the best solution, repeat the algorithm several times with different initial cluster centers and pick the one that gives the best result. I’ll talk later about how you might decide which solution is best.

Source Code

The algorithm itself is simple, but there are a lot of details that you need to handle to make the example work nicely. I’m not going to cover them here. Instead I’ll focus on the details that are needed for the algorithm itself. Download the example to see the rest.

PointData

The program uses the following PointData class to store the data points.

public class PointData
{
    public PointF Location;
    public int ClusterNum;
    public PointData(PointF location, int cluster_num)
    {
        Location = location;
        ClusterNum = cluster_num;
    }
    public PointData(float x, float y, int set)
        : this(new PointF(x, y), set)
    {
    }
}

This class defines two key properties: Location and ClusterNum. The Location property gives the data point’s location in two-dimensional space. The ClusterNum property gives the index of the cluster to which the point has been assigned by the k-means clustering algorithm.

MakePoints

The program uses the following MakePoints method to generate random points grouped around the seeds that you have defined.

private void MakePoints()
{
    // Make sure there is at least one seed.
    if (Seeds.Count < 1)
    {
        MessageBox.Show("Please define at least one seed first.");
        return;
    }

    // Clear the Centroids list.
    Centroids.Clear();

    // Make the points.
    Random rand = new Random();
    double max_r = Math.Min(
        picItems.ClientSize.Width,
        picItems.ClientSize.Height) / 6;
    int num_points = int.Parse(txtNumPoints.Text);
    for (int i = 0; i < num_points; i++)
    {
        int seed_num = rand.Next(Seeds.Count);
        double r =
            max_r * rand.NextDouble() +
            max_r * rand.NextDouble();
        double theta = rand.Next(360);
        float x = Seeds[seed_num].X + (float)(r * Math.Cos(theta));
        float y = Seeds[seed_num].Y + (float)(r * Math.Sin(theta));
        Points.Add(new PointData(x, y, 0));
    }

    picItems.Refresh();
}

This method does some setup, such as clearing the current centroid list, and then loops through the desired number of points. For each point, it picks a random seed point. It then generates two values and an angle. It places the new point a distance equal to the sum of the two values away from the chosen point. It uses the cosine and sine of the angle to move the point away in the random direction.

Assigning Points

The code that assigns the points to clusters is the most interesting part of the program. When you click the Make Clusters button, the following code executes.

private void btnMakeClusters_Click(object sender, EventArgs e)
{
    int num_clusters = int.Parse(txtNumClusters.Text);
    if (Points.Count < num_clusters) return;

    // Reset the data.
    // Pick random centroids.
    Centroids = new List<PointF>();
    Points.Randomize();
    for (int i = 0; i < num_clusters; i++)
        Centroids.Add(Points[i].Location);
    foreach (PointData point_data in Points)
        point_data.ClusterNum = 0;

    NumSteps = 0;
    picItems.Refresh();
    lblScore.Text = "";
    Cursor = Cursors.WaitCursor;
    tmrUpdate.Enabled = true;
}

This code creates a new Centroids list to hold the centroids for the clusters. It calls the Randomize extension method to randomize the points. (See the code for details of how that works.) It then uses the first points as the initial centroids for the clusters.

The code then loops through the points and assigns them all initially to cluster number 0. It resets the NumSteps counter and enables the tmrUpdate timer component.

The following code shows the tmrUpdate timer’s Tick event handler.

private void tmrUpdate_Tick(object sender, EventArgs e)
{
    NumSteps++;
    UpdateSolution();
    picItems.Refresh();
}

This code simply increments the step counter, calls UpdateSolution to do the interesting work, and refreshes the display.

The following code shows the UpdateSolution method.

private void UpdateSolution()
{
    // Find new centroids.
    int num_clusters = Centroids.Count;
    PointF[] new_centers = new PointF[num_clusters];
    int[] num_points = new int[num_clusters];
    foreach (PointData point in Points)
    {
        double best_dist =
            Distance(point.Location, Centroids[0]);
        int best_cluster = 0;
        for (int i = 1; i < num_clusters; i++)
        {
            double test_dist =
                Distance(point.Location, Centroids[i]);
            if (test_dist < best_dist)
            {
                best_dist = test_dist;
                best_cluster = i;
            }
        }
        point.ClusterNum = best_cluster;
        new_centers[best_cluster].X += point.Location.X;
        new_centers[best_cluster].Y += point.Location.Y;
        num_points[best_cluster]++;
    }

    // Calculate the new centroids.
    List<PointF> new_centroids = new List<PointF>();
    for (int i = 0; i < num_clusters; i++)
    {
        new_centroids.Add(new PointF(
            new_centers[i].X / num_points[i],
            new_centers[i].Y / num_points[i]));
    }

    // See if the centroids have moved.
    bool centroids_changed = false;
    for (int i = 0; i < num_clusters; i++)
    {
        const float min_change = 0.5f;
        if ((Math.Abs(Centroids[i].X - new_centroids[i].X) > min_change) ||
            (Math.Abs(Centroids[i].Y - new_centroids[i].Y) > min_change))
        {
            centroids_changed = true;
            break;
        }
    }
    if (!centroids_changed)
    {
        tmrUpdate.Enabled = false;
        //System.Media.SystemSounds.Beep.Play();
        lblScore.Text = "Score: " + Score().ToString() +
            ", # Steps: " + NumSteps.ToString();
        Cursor = Cursors.Default;
        return;
    }

    // Update the centroids.
    Centroids = new_centroids;
}

This method starts with some initialization. It then loops through the data points. For each point, it loops through the current cluster centroids to see which is closest to the point. It then assigns that point to the closest cluster.

This loop also adds the point’s X and Y coordinates to the new_centers value for the cluster. After it has assigned all of the points, the program will divide those values by the number of points assigned to each cluster to find the cluster’s new centroid.

After it finishes the first loop, the method loops through the centroids. As I mentioned in the preceding paragraph, it divides the values stored in the new_centroids array by the number of points in each cluster to find the cluster’s new centroid.

Next the program determines whether the new centroids are significantly different from the previous ones. To do that, it loops through the clusters and calculates the X and Y distances between the old and new centroids. If either coordinate has moved by more than 0.5, the program considers the centroid to have moved. (Note that the distances here are pixels, so if the centroids move by less than 0.5 pixels, they are essentially fixed.)

If the centroids have not moved, then the method disables the tmrUpdate timer and displays the results.

Score

I mentioned earlier that you may get different results from different starting centers. To find the best solution, you need to repeat the algorithm several times and pick the best arrangement of clusters. To do that, you need to be able to somehow score the solutions.

The following Score method calculates a score for a solution.

private double Score(List centroids, List points)
{
    double total = 0;
    foreach (PointData point_data in points)
    {
        total += Distance2(point_data.Location,
            centroids[point_data.ClusterNum]);
    }
    return total;
}

This method calculates the total of the squares of the distances between each point and the center of its cluster. It simply loops through the points, calls the Distance2 method (not shown) to calculate the distance between the point and its assigned center, and adds the result to a total.

Summary

The k-means clustering algorithm shown here is fairly simple. It usually produces a result that looks reasonably quickly in only a few steps. Download the example and give it a try. Some of the most interesting runs happen when the randomly selected initial cluster centers are close to each other so they need to move several times before they settle down.

[k-means clustering]

Note that this algorithm does not pick the number of clusters that it should make. In the earlier examples shown here, it’s fairly obvious that the points should be grouped into three clusters, but consider the picture on the right. Here the algorithm used three clusters when it probably should have used two. Basically one of the two obvious clusters is separated into two pieces. If you run the program several times, it will sometimes divide the left cluster and sometimes the right.

In these simple two-dimensional examples, it’s usually easy to guess the correct number of clusters, but it might be much harder for higher-dimensional data. For example, suppose the points were in six-dimensional space. It could be very hard to figure out how many clusters to use.

To solve that problem, you will probably need to come up with a new scoring system that can compare solutions containing different numbers of clusters. You couldn’t just use the simple Score method shown earlier, because it will favor using a separate cluster for each data point.

Download the example and experiment with it. If you find anything interesting, such as a good technique for picking the best number of clusters, post a comment below.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in algorithms | Tagged , , , , , , , , | Leave a comment

Automatically set camera distance in WPF and C#

[camera distance]

This example extends the post Build a geodesic sphere with WPF and C# to let you automatically set the camera distance to make the scene fill the viewport.

Making a 3D scene fill the viewport nicely can be tricky. The basic idea is to position the camera the right distance from the scene to give a good result. However, some scenes are asymmetric so the “best” distance will depend on the angle from which you are viewing the scene.

One approach that often gives fairly good results is to use the camera’s field of view and the radius of the scene’s bounding sphere to set the camera distance. To do that, you need to know how the field of view works.

Field of View

[camera distance]

The picture on the right shows how the camera’s field of view affects the drawing. The field of view is the (horizontal) angle that the camera can see. Here width is the width of the viewport and distance is the distance between the camera and the virtual viewing plane. For our purposes, distance is the camera distance, or the distance between the camera and the center of the scene’s objects.

Simple trigonometry gives us:

[camera distance]
If we solve for distance, we get:

[camera distance]
In order to make the scene’s objects fill the viewport, we set the distance so the width/2 is the radius of the scene’s bounding sphere.

C# Code

When you click the program’s Set Size button, the following code executes.

private void btnSetSize_Click(object sender, RoutedEventArgs e)
{
    const double width = 2.5;
    double fov_radians = TheCamera.FieldOfView * Math.PI / 180.0;
    double distance = (width / 2) / Math.Tan(fov_radians / 2);

    const double distance_scale = 1.0;
    CameraR = distance * distance_scale;
    lblDistance.Content = CameraR.ToString("0.0");
    PositionCamera();
}

This example’s scene is a geodesic sphere with some axes. The axes extend to points 1.25 units away from the origin, so I’m giving the bounding sphere for this example a radius of 1.25 or a diameter of 2.5. That means the desired width value should be 2.5.

The camera’s FieldOfView property holds the field of view in degrees but the Math.Tan function needs its input angle in radians, so the program converts the field of view value into radians.

Next the code uses the formula shown earlier to calculate the desired camera distance. It then sets the camera distance to that value multiplied by a scale factor. You can use the scale factor if you want to add or subtract a little distance to make your scene look better. This example looks okay without scaling, so distance_scale is set to 1.0.

The code finishes by displaying the new distance and repositioning the camera to show the new result.

Summary

[WPF 3d]

Automatically setting camera distance can be tricky depending on the objects in your scene. This example calculates a camera distance from the camera’s field of view and a bounding sphere diameter (width). You can add a scale factor to adjust the result if that makes sense for your scene.

For more information on creating three-dimensional scenes with WPF and C#, see my book WPF 3d.

Meanwhile, download the example to see additional details and to experiment with it.


Download Example   Follow me on Twitter   RSS feed   Donate




Posted in 3D graphics, algorithms, graphics, mathematics, wpf | Tagged , , , , , , , , , , , , , , , , | 6 Comments