List unique words in a text file in C#


This example uses regular expressions and LINQ to list the unique words contained in a text file in C#.

When you enter the name of a file and click List Words, the following code executes.

// List the words in the file.
// Requires: using System.IO; using System.Linq;
// using System.Text.RegularExpressions;
private void btnListWords_Click(object sender, EventArgs e)
{
    // Get the file's text.
    string txt = File.ReadAllText(txtFile.Text);

    // Use regular expressions to replace characters
    // that are not letters or numbers with spaces.
    Regex reg_exp = new Regex("[^a-zA-Z0-9]");
    txt = reg_exp.Replace(txt, " ");

    // Split the text into words.
    string[] words = txt.Split(
        new char[] { ' ' },
        StringSplitOptions.RemoveEmptyEntries);

    // Use LINQ to get the unique words.
    var word_query =
        (from string word in words
         orderby word select word).Distinct();
    
    // Display the result.
    string[] result = word_query.ToArray();
    lstWords.DataSource = result;
    lblSummary.Text = result.Length + " words";
}

The code first uses File.ReadAllText to copy the file’s text into a string.

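Next the code uses regular expressions to replace non-letter and non-number characters with spaces. It uses the pattern [^a-zA-Z0-9]. Inside the brackets, the leading ^ means “not the following characters.” The a-zA-Z0-9 part means any lowercase or uppercase ASCII letter or a digit. The code uses the Regex object’s Replace method to replace each character that matches the pattern with a space character. Because only ASCII letters and digits are kept, apostrophes and accented letters are replaced too, so a word such as don’t becomes the two words don and t.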
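
To see what the pattern does on its own, here is a minimal console sketch. The class name RegexDemo, the method name StripNonAlphanumerics, and the sample sentence are made up for illustration; only the pattern and the Regex replacement come from the example above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // Replace every character that is not an ASCII letter or digit
    // with a single space. (Hypothetical helper for illustration.)
    public static string StripNonAlphanumerics(string text)
    {
        return Regex.Replace(text, "[^a-zA-Z0-9]", " ");
    }

    public static void Main()
    {
        // Punctuation and the original spaces all come out as spaces,
        // so runs of several spaces can appear in the result.
        string cleaned = StripNonAlphanumerics("Hello, world! It's 2024.");
        Console.WriteLine("[" + cleaned + "]");
    }
}
```

Each matching character is replaced one for one, so a comma followed by a space turns into two spaces. That is why the next step needs to discard empty entries.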

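The code then uses Split to break the text into an array of words. The StringSplitOptions.RemoveEmptyEntries option discards the empty strings that would otherwise appear wherever the text contains two or more spaces in a row. Duplicate words are still in the array at this point.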
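
The Split step can be tried in isolation with a sketch like the following. The class name SplitDemo, the method name SplitWords, and the sample string are assumptions for illustration. RemoveEmptyEntries drops only the empty strings produced by consecutive spaces; repeated words are still present in the resulting array.

```csharp
using System;

public static class SplitDemo
{
    // Split on spaces and drop the empty strings produced by
    // consecutive spaces. (Hypothetical helper for illustration.)
    public static string[] SplitWords(string text)
    {
        return text.Split(
            new char[] { ' ' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] words = SplitWords("the  cat   the hat ");

        // Four entries: the repeated word "the" appears twice.
        Console.WriteLine(words.Length);
        Console.WriteLine(string.Join("|", words));
    }
}
```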

The code uses LINQ to select all of the words from the array and sort them. It then uses the Distinct method to remove duplicates. The comparison is case-sensitive, so The and the count as two different words.
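
The sketch below runs the same query on a small array and adds one possible variation that ignores case. The class and method names are hypothetical, and the variation is not part of the original example. It lowercases each word first and sorts after calling Distinct, which avoids relying on Distinct to keep the sorted order.

```csharp
using System;
using System.Linq;

public static class UniqueWordsDemo
{
    // Same query as the example: sort, then remove duplicates.
    // "The" and "the" are kept as separate entries.
    public static string[] UniqueWords(string[] words)
    {
        var wordQuery =
            (from string word in words
             orderby word select word).Distinct();
        return wordQuery.ToArray();
    }

    // Variation (an assumption, not the original method): treat words
    // that differ only by case as the same word.
    public static string[] UniqueWordsIgnoreCase(string[] words)
    {
        return words
            .Select(w => w.ToLowerInvariant())
            .Distinct()
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToArray();
    }

    public static void Main()
    {
        string[] words = { "the", "The", "cat", "the" };
        Console.WriteLine(UniqueWords(words).Length + " words");
        Console.WriteLine(string.Join(", ", UniqueWordsIgnoreCase(words)));
    }
}
```

Which behavior is right depends on what you want to count. For a vocabulary list the case-insensitive version is probably closer to what a reader expects.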

Finally, the code displays the unique words in a ListBox and displays the number of unique words in a Label.

