Extract comments from a group of files in C#

[extract comments]

I’m finishing up another book (I’ll announce it when it’s ready in a week or so) and I’m working through one of the thornier issues of any programming book: spell-checking code comments. I already have an example to extract comments from a file: Extract comments from a C# file in C#. However, this book’s examples include more than 2,000 C# files. Many of those are automatically generated by Visual Studio, but plenty are not.

This example lets you search for and select files. It then loops through the selected files to extract their comments and writes the comments into a file for inspection.

The program provides several useful features such as the ability to search a directory hierarchy for files. Those pieces are useful, but I’ve covered them in other posts and they’re relatively straightforward, so I won’t describe them here. Download the example and take a look at how those pieces work.
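For readers who don’t have the download handy, here is a minimal sketch of one way to search a directory hierarchy. It is not the example’s actual code. The FileFinder class, the FindFiles method, and the choice to skip *.Designer.cs files are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileFinder
{
    // Hypothetical helper (not from the downloadable example).
    // Return the full paths of all files below root that match the pattern,
    // for example "*.cs", skipping designer files that Visual Studio generates.
    public static List<string> FindFiles(string root, string pattern)
    {
        return Directory
            .EnumerateFiles(root, pattern, SearchOption.AllDirectories)
            // Assumption: generated designer files are not worth spell-checking.
            .Where(path => !path.EndsWith(".Designer.cs",
                StringComparison.OrdinalIgnoreCase))
            .OrderBy(path => path, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

A program like this one could add the returned paths to a CheckedListBox so the user can pick which files to process. Note that Directory.EnumerateFiles throws an exception if it reaches a directory that the user isn’t allowed to read, so real code may need to handle that case.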

This post focuses on the following code, which executes when you select some files and click Create Document.

// Extract all of the files' comments.
private void btnCreateFile_Click(object sender, EventArgs e)
{
    // Get the file name.
    OutputFile = txtFilename.Text.Trim();
    if (OutputFile.Length == 0)
    {
        MessageBox.Show("Please enter an output file name");
        return;
    }
    Cursor = Cursors.WaitCursor;
    lblNumProcessed.Text = "";
    lblNumProcessed.Visible = true;

    // Loop through the selected files and get their comments.
    int i = 0;
    HashSet<string> comments = new HashSet<string>();
    char[] separators = { '\r', '\n' };
    foreach (string input_file in clbFiles.CheckedItems)
    {
        lblNumProcessed.Text = i.ToString() + " files processed";
        lblNumProcessed.Refresh();
        i++;

        // Get this file's comments.
        string file_comments = ExtractComments(input_file);

        // Split the file's comments by line and
        // add new ones to the HashSet.
        foreach (string comment in
            file_comments.Split(separators,
                StringSplitOptions.RemoveEmptyEntries))
        {
            // HashSet.Add ignores values that are already present,
            // so no separate Contains check is needed.
            comments.Add(comment);
        }
    }

    // Sort the comments.
    string[] lines = comments.ToArray();
    Array.Sort(lines);

    // Write the comments into the output file.
    File.WriteAllLines(OutputFile, lines);

    Cursor = Cursors.Default;
    lblNumProcessed.Visible = false;
    btnOpen.Enabled = true;
    MessageBox.Show("Saved " + lines.Length + " comments");
}

This code first verifies that you entered an output file name and exits if you didn’t.

The code then creates a HashSet to contain the comments. Many of the examples include tools built in previous examples, and any common code contains the same comments. This example only keeps one copy of each comment. Later, if I find a problem in a comment, I use the program described in the post Find files and replace text in them in C# to fix all copies of the comment.
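The following sketch shows how a HashSet discards repeated comments. The DedupSketch class and CountDuplicates method are hypothetical names used only for this illustration. The sketch relies on the fact that HashSet.Add returns false, and leaves the set unchanged, when the value is already present.

```csharp
using System;
using System.Collections.Generic;

public static class DedupSketch
{
    // Hypothetical helper: add each comment to the set of unique comments
    // and return the number of duplicates that were skipped.
    public static int CountDuplicates(
        IEnumerable<string> comments, HashSet<string> unique)
    {
        int duplicates = 0;
        foreach (string comment in comments)
        {
            // Add returns false if the set already holds this exact string.
            if (!unique.Add(comment)) duplicates++;
        }
        return duplicates;
    }
}
```

By default a HashSet of strings compares values exactly, including case and white space. That is probably what you want for spell-checking, because "// Get the file name." and "// get the file name." are kept as separate entries and both get checked.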

The code then loops through the files that are selected in the clbFiles CheckedListBox control and calls the ExtractComments method described in the post Extract comments from a C# file in C# to get each file’s comments. The code separates the comments and loops through them, adding them to the HashSet if they are not already present.
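The real ExtractComments method is described in the other post and is not reproduced here. To give a rough idea of what such a method might do, the following sketch is a simplified, hypothetical stand-in. It takes the source text rather than a file name, and it makes some loud assumptions: it handles only // and /* */ comments, and it skips ordinary string and character literals but does not understand verbatim (@"...") or raw strings.

```csharp
using System;
using System.Text;

public static class CommentSketch
{
    // Hypothetical stand-in for ExtractComments (not the author's method).
    // Return the // and /* */ comments found in C# source text,
    // each followed by a newline.
    public static string ExtractCommentsSketch(string source)
    {
        var result = new StringBuilder();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (c == '"' || c == '\'')
            {
                // Skip a string or character literal so that text such as
                // "http://..." is not mistaken for a comment.
                char quote = c;
                i++;
                while (i < source.Length && source[i] != quote)
                {
                    if (source[i] == '\\') i++;   // Skip the escaped character.
                    i++;
                }
                i++;                              // Skip the closing quote.
            }
            else if (c == '/' && i + 1 < source.Length && source[i + 1] == '/')
            {
                // Single-line comment: it runs to the end of the line.
                int end = source.IndexOfAny(new[] { '\r', '\n' }, i);
                if (end < 0) end = source.Length;
                result.AppendLine(source.Substring(i, end - i));
                i = end;
            }
            else if (c == '/' && i + 1 < source.Length && source[i + 1] == '*')
            {
                // Block comment: it runs to the closing */
                // (or to the end of the text if it is unterminated).
                int end = source.IndexOf("*/", i + 2, StringComparison.Ordinal);
                end = (end < 0) ? source.Length : end + 2;
                result.AppendLine(source.Substring(i, end - i));
                i = end;
            }
            else
            {
                i++;
            }
        }
        return result.ToString();
    }
}
```

A caller could pass File.ReadAllText(input_file) to this method and then split the result on '\r' and '\n', as the event handler does. A multi-line block comment then becomes several separate entries, one per line.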

After it finishes processing all of the files, the program pulls the comments out of the HashSet into an array and calls Array.Sort to sort the comments. It finishes by writing the comments into the output file.
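One detail worth knowing is that Array.Sort on a string array uses a culture-sensitive comparison by default, so the order can vary with the computer’s regional settings. If you want the output file to come out in the same order everywhere, one option is to pass an ordinal comparer. The following is a minimal sketch of that variation. The CommentWriter class and its method names are hypothetical, not part of the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CommentWriter
{
    // Hypothetical helper: copy the comments into an array and sort them
    // by character code so the order does not depend on the current culture.
    public static string[] SortedLines(IEnumerable<string> comments)
    {
        string[] lines = comments.ToArray();
        Array.Sort(lines, StringComparer.Ordinal);
        return lines;
    }

    // Hypothetical helper: write the sorted comments to the output file,
    // one per line, and return the number of lines written.
    public static int WriteSorted(IEnumerable<string> comments, string outputFile)
    {
        string[] lines = SortedLines(comments);
        File.WriteAllLines(outputFile, lines);
        return lines.Length;
    }
}
```

With an ordinal comparison, uppercase letters sort before lowercase ones, so "// A" comes before "// a". Either ordering works for spell-checking. The ordinal version simply makes it easier to compare output files produced on different machines.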

After I run the program, I open the output file in Microsoft Word and use its highlighting features to look for misspelled words.

This system still isn’t perfect. For example, variable names that appear in the comments look misspelled to Word. The comments alone also can’t tell me whether the variable name NumEmployees should really be spelled numEmployees, but the system does let me catch the most obvious errors.






About RodStephens

Rod Stephens is a software consultant and author who has written more than 30 books and 250 magazine articles covering C#, Visual Basic, Visual Basic for Applications, Delphi, and Java.
