List the unique words in a Microsoft Word file in C#


The example List unique words in a text file in C# shows how to list the unique words in a text file. This example shows how to list the unique words in a Microsoft Word file.

Before you start, add a reference to the Microsoft Word 12.0 Object Library (or whichever version is installed on your system). Then add the following using directive to make the Word namespace easier to work with. The Word = part makes Word an alias for Microsoft.Office.Interop.Word, so you can write Word.Document instead of Microsoft.Office.Interop.Word.Document.

using Word = Microsoft.Office.Interop.Word;

The following code shows how the program gets the words from a Word file.

// Read the text contents of a Word file.
private string GrabWordFileWords(string file_name)
{
    // Get the Word application object. Word.Application (rather than
    // Word.ApplicationClass) also compiles when interop types are embedded.
    Word._Application word_app = new Word.Application();

    // Hide Word (set this to true if you want to watch it work).
    word_app.Visible = false;

    object save_changes = false;
    object missing = System.Reflection.Missing.Value;
    try
    {
        // Open the file read-only.
        object filename = file_name;
        object confirm_conversions = false;
        object read_only = true;
        object add_to_recent_files = false;
        object format = 0;  // wdOpenFormatAuto

        Word._Document word_doc =
            word_app.Documents.Open(ref filename,
                ref confirm_conversions,
                ref read_only, ref add_to_recent_files,
                ref missing, ref missing, ref missing, ref missing,
                ref missing, ref format, ref missing, ref missing,
                ref missing, ref missing, ref missing, ref missing);

        // Get the document's text.
        string result = word_doc.Content.Text;

        // Close the document without saving or prompting.
        word_doc.Close(ref save_changes, ref missing, ref missing);

        // Return the result.
        return result;
    }
    finally
    {
        // Close the Word server even if opening or reading the file
        // failed, so no hidden WINWORD.EXE process is left running.
        word_app.Quit(ref save_changes, ref missing, ref missing);
    }
}

The code first creates a Word application server. It sets the server’s Visible property to false so it doesn’t appear, but you can change that if you like.

Next the program opens the desired Word document in read-only mode. It then uses the document's Content.Text property to get the file's text.
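Note that the string returned by Content.Text is not quite plain text. Word marks paragraph ends with '\r', manual line breaks with '\v', page breaks with '\f', and the end of each table cell with '\r' followed by '\a'. If the word-splitting code only treats ordinary whitespace and punctuation as separators, a '\a' can glue two cell values into one "word." The following is a minimal sketch of one way to handle that; the WordTextCleaner class and its Clean method are hypothetical names, not part of the original example or the Word object model.

```csharp
using System;
using System.Text;

// Hypothetical helper (not from the original example).
public static class WordTextCleaner
{
    // Replace every control character in Word's Content.Text with a space.
    // This covers '\r' (paragraph mark), '\a' (end-of-cell marker),
    // '\v' (manual line break), '\f' (page break), and tabs.
    public static string Clean(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(char.IsControl(c) ? ' ' : c);
        }
        return sb.ToString();
    }
}
```

You would call it on the result of GrabWordFileWords before splitting the text into words.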

The method finishes by closing the file and the Word server, and returning the file's text. The rest of the program, which splits that text into words and lists the unique ones, is similar to the code the previous example uses to process text files. See that example for details.
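The previous example's word-splitting code is not reproduced here, so the following is only a hypothetical sketch of that step, not the author's implementation. It assumes a "word" is a run of letters, digits, or underscores, optionally joined by straight or curly apostrophes (Word often turns ' into ’), and that case should be ignored. The UniqueWordLister class and GetUniqueWords method are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original example).
public static class UniqueWordLister
{
    // Assumed definition of a word: \w characters, optionally joined by
    // an apostrophe (' or \u2019), so "don't" counts as one word.
    private static readonly Regex WordPattern =
        new Regex(@"\w+(?:['\u2019]\w+)*");

    // Return the distinct words in the text, lowercased and sorted.
    public static List<string> GetUniqueWords(string text)
    {
        return WordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.ToLowerInvariant())
            .Distinct()
            .OrderBy(w => w, StringComparer.Ordinal)
            .ToList();
    }
}
```

For example, UniqueWordLister.GetUniqueWords(GrabWordFileWords(file_name)) would produce the sorted list of unique words in the document. Because the regular expression treats any non-word character as a separator, Word's control characters do not need special handling here.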