Title: Extract comments from a C# file in C#
The idea behind this example is to extract comments from a C# code file so you can spellcheck them. This is a fairly simple, manual approach. Some commercial products may be able to spellcheck comments and strings for you.
This process is somewhat complicated by:
- End-of-line comments that start with //
- Multiline comments that come between /* and */
- Strings
Each of these can contain the delimiters of the others. For example, a string can contain the characters // without beginning an end-of-line comment. Similarly, a multiline comment can contain a quote without starting a string.
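As a small illustration of these cases, the following sketch shows a string that contains //, a comment that contains quotes, and a string that contains comment delimiters. The class and constant names are hypothetical and exist only for this illustration.

```csharp
public static class OverlapExamples
{
    // The // inside this string literal does not begin a comment.
    public const string Url = "http://www.example.com";

    /* The "quoted" words in this comment do not begin a string. */
    public const string Marker = "/* still part of a string */";
}
```

A comment extractor run on this snippet should report only the two real comments and ignore the contents of both strings.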
The following ExtractComments method reads a C# file and returns a string containing its comments.
// Return a file's comments.
// (File.ReadAllText requires a using System.IO directive.)
private string ExtractComments(string filename)
{
    // Get the file's contents.
    string all_text = File.ReadAllText(filename);
    // Get rid of \" escape sequences.
    all_text = all_text.Replace("\\\"", "");
    // Process the file.
    string comments = "";
    while (all_text.Length > 0)
    {
        // Find the next string or comment.
        int string_pos = all_text.IndexOf("\"");
        int end_line_pos = all_text.IndexOf("//");
        int multi_line_pos = all_text.IndexOf("/*");
        // If there are none of these, we're done.
        if ((string_pos < 0) &&
            (end_line_pos < 0) &&
            (multi_line_pos < 0)) break;
        // Treat a missing item as lying past the end of the text.
        if (string_pos < 0) string_pos = all_text.Length;
        if (end_line_pos < 0) end_line_pos = all_text.Length;
        if (multi_line_pos < 0) multi_line_pos = all_text.Length;
        // See which comes first.
        if ((string_pos < end_line_pos) &&
            (string_pos < multi_line_pos))
        {
            // String.
            // Find its end.
            int end_pos = all_text.IndexOf("\"", string_pos + 1);
            // Discard everything through the end of the string.
            if (end_pos < 0)
            {
                all_text = "";
            }
            else
            {
                all_text = all_text.Substring(end_pos + 1);
            }
        }
        else if (end_line_pos < multi_line_pos)
        {
            // End-of-line comment.
            // Find its end. Search for '\n' so that both
            // \r\n and \n line endings work.
            int end_pos =
                all_text.IndexOf('\n', end_line_pos + 2);
            // Extract the comment.
            if (end_pos < 0)
            {
                comments +=
                    all_text.Substring(end_line_pos) + "\r\n";
                all_text = "";
            }
            else
            {
                // Trim the \r that precedes \n in Windows files.
                comments += all_text.Substring(
                    end_line_pos, end_pos - end_line_pos).TrimEnd('\r') + "\r\n";
                all_text = all_text.Substring(end_pos + 1);
            }
        }
        else
        {
            // Multiline comment.
            // Find its end.
            int end_pos = all_text.IndexOf(
                "*/", multi_line_pos + 2);
            // Extract the comment.
            if (end_pos < 0)
            {
                comments +=
                    all_text.Substring(multi_line_pos) + "\r\n";
                all_text = "";
            }
            else
            {
                comments += all_text.Substring(multi_line_pos,
                    end_pos - multi_line_pos + 2) + "\r\n";
                all_text = all_text.Substring(end_pos + 2);
            }
        }
    }
    return comments;
}
The method starts by reading the file and removing any \" escape sequences. These are a hassle because an escaped " character inside a string would otherwise look like the string's closing quote, so the program simply eliminates them before it searches for strings.
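To see what the escape-removal step does to the text, here is a minimal sketch that applies the same Replace call to two sample lines. The class, method, and constant names are hypothetical. The second sample is an assumption about a case the shortcut does not handle: a string that ends in an escaped backslash.

```csharp
public static class EscapeRemovalDemo
{
    // Mirrors the method's first step: delete every \" escape sequence.
    public static string RemoveEscapedQuotes(string source)
    {
        return source.Replace("\\\"", "");
    }

    // Source text: string s = "She said \"hi\" // here"; // greeting
    public const string Typical =
        "string s = \"She said \\\"hi\\\" // here\"; // greeting";

    // Source text: string p = "C:\\"; // root folder
    // Here the \" match is the second half of \\ plus the closing quote.
    public const string Tricky =
        "string p = \"C:\\\\\"; // root folder";
}
```

For Typical, the result keeps both of the string's real quotes, so the // inside the string stays hidden and the trailing comment can be found. For Tricky, the string's closing quote is removed along with the backslash before it, so the string appears unterminated and the trailing comment would be discarded with it. Verbatim strings such as @"C:\" likely need similar care.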
The program then enters a loop that continues until the remaining text is empty or contains no more strings or comments. In each pass, the program finds the start of the next quoted string, end-of-line comment, and multiline comment, and handles whichever comes first. For a string, it skips to the closing quote and discards the text. For a comment, it reads to the end of the line or to the closing */, adds the comment to the result string, and removes everything through the end of the comment from the remaining text.
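The "which comes first" decision at the top of the loop can be isolated in a small helper, sketched below. FirstConstruct is a hypothetical name that is not part of the example program, and the StringComparison.Ordinal arguments are an addition; the method above uses the default IndexOf overloads.

```csharp
using System;

public static class NextConstructDemo
{
    // Hypothetical helper: names whichever construct starts first in text.
    public static string FirstConstruct(string text)
    {
        int stringPos = text.IndexOf("\"", StringComparison.Ordinal);
        int endLinePos = text.IndexOf("//", StringComparison.Ordinal);
        int multiLinePos = text.IndexOf("/*", StringComparison.Ordinal);

        // Nothing left to process.
        if (stringPos < 0 && endLinePos < 0 && multiLinePos < 0) return "none";

        // Treat a missing construct as lying past the end of the text.
        if (stringPos < 0) stringPos = text.Length;
        if (endLinePos < 0) endLinePos = text.Length;
        if (multiLinePos < 0) multiLinePos = text.Length;

        if (stringPos < endLinePos && stringPos < multiLinePos) return "string";
        if (endLinePos < multiLinePos) return "end-of-line comment";
        return "multiline comment";
    }
}
```

Because only the earliest delimiter is acted on, a // that appears after an opening quote or an opening /* is skipped as part of that string or comment rather than treated as a comment of its own.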
Download the example to experiment with it and to see additional details.