Suggest autocomplete words in C#

[autocomplete]

This example shows one way that a program can suggest words for the user.

When I type something on my phone, it displays a list of possible words for autocomplete below the editing area. For example, if I type “gping” it suggests “going” as a possible word that I might be trying to type.

Edit Distance

I don’t know what algorithm the phone uses to suggest words, but this example shows one possible approach. This program uses an “edit distance” algorithm to decide how similar the word you have typed so far is from those in a dictionary. It then suggests the words that are most similar to what you have typed.

To completely understand the program, you need to understand the edit distance algorithm. I wrote an article about this algorithm and you can find it in the DevX article Turn a Parsnip into a Turnip with Edit Distance Algorithms.

You can read the article on DevX so I won’t repeat it here, but the general idea is to build an edit graph that shows the best way to convert one word into another word by inserting and removing letters. The last node in the graph gives the “distance” from the start word to the end word.

When you type into this example’s upper text box, the following event handler executes to display the best matches.

// Find good guesses for this word.
private void txtWord_TextChanged(object sender, EventArgs e)
{
    lstGuesses.Items.Clear();
    string word = txtWord.Text;
    if (word.Length == 0) return;

    // Find the best matches.
    string[] words;
    int[] values;
    FindBestMatches(word, 10, out words, out values);

    // Display the best matches.
    for (int i = 0; i < words.Length; i++)
    {
        lstGuesses.Items.Add(values[i].ToString() +
            '\t' + words[i]);
    }
}

This code clears the program’s ListBox of matches. It then gets the text that you typed and, if that text is blank, returns.

If the text is not blank, the code calls the FindBestMatches method to get arrays containing the best matches and their edit distance values. The results are sorted in increasing order of edit distance value.

The event handler finishes by displaying the edit distance values and the words in the program’s ListBox so you can see them.

All of the hard work is done in the following FindBestMatches method.

// Find the best matches for the typed word.
private void FindBestMatches(string word, int num_matches,
    out string[] words, out int[] values)
{
    // Find words that start with the same letter.
    string start_char = word.Substring(0, 1).ToUpper();
    int start_index = Array.BinarySearch(Words, start_char);
    List<string> match_words = new List<string>();
    List<int> match_values = new List<int>();
    for (int i = start_index + 1; i < Words.Length; i++)
    {
        // Get the next word and make sure it starts
        // with the same letter.
        string test_word = Words[i];
        if (test_word.Substring(0, 1).ToUpper() != start_char)
            break;

        // Consider the next word up to the length
        // of the typed word.
        int max_length = Math.Min(test_word.Length, word.Length);
        string short_word = test_word.Substring(0, max_length);

        // Build the edit graph.
        Node[,] nodes = MakeEditGraph(word, short_word);

        // List the distance.
        int x = nodes.GetUpperBound(0);
        int y = nodes.GetUpperBound(1);
        match_words.Add(test_word);
        match_values.Add(nodes[x, y].distance);
    }

    // Sort the matches by distance, smallest distance first.
    string[] match_words_array = match_words.ToArray();
    int[] match_values_array = match_values.ToArray();
    Array.Sort(match_values_array, match_words_array);

    // Return the desired number of matches.
    int max = Math.Min(num_matches, match_values_array.Length);
    words = new string[max];
    Array.Copy(match_words_array, words, max);
    values = new int[max];
    Array.Copy(match_values_array, values, max);
}

The Words array contains a list of 14,596 words in sorted order. The list includes each letter of the alphabet. For example, the section of words starting with the letter “a” begins with the value “A” to make finding those words easier.

The FindBestMatches method gets the first character in the target word and converts it to upper case. It then uses the Array.BinarySearch method to locate that letter in the word list. (Note that BinarySearch only works if the array is sorted, which it is.) For example, if you have entered “bea,” then the method searches for the word “B.”

Next, the code loops through the array examining words that start with this letter. If you typed “bea,” then it loops over words that start with “b.”

For each word that starts with the given letter, the program truncates the word to the length that you have typed so far. If you typed “bea,” and the program is considering the word “barter,” it truncates that test word to “bar.”

Next, the program builds an edit graph that transforms what you typed into the test word (for example, “bea” into “bar”). It adds the entire test word (barter) and its edit distance to the match_words and match_values lists.

After it has processed all of the words that start with the given letter, the method must return those that have the smallest edit distance values. To do that, it converts the match_words and match_values lists into arrays and uses Array.Sort to sort them. It uses match_words as the array to sort and uses match_values as an array of keys to use while sorting. The Array.Sort method sorts match_values in increasing order and arranges match_words so its values correspond to the numbers in match_values.

Finally, the FindBestMatches method uses Array.Copy to copy the desired number of matches and their edit distances into the words and values output arrays for return to the calling code.

TextBox AutoComplete

The TextBox control also provides some autocomplete features, which the example demonstrates in its bottom text box. When the program starts, the following Form_Load event handler loads the Words array and prepares the bottom text box for autocomplete.

// The dictionary.
private string[] Words;

// Load the dictionary.
private void Form1_Load(object sender, EventArgs e)
{
    Words = File.ReadAllLines("Words.txt");
    txtWord.Text = "absi";

    // Make an autocomplete TextBox.
    AutoCompleteStringCollection word_source =
        new AutoCompleteStringCollection();
    word_source.AddRange(Words);
    txtAuto.AutoCompleteSource = AutoCompleteSource.CustomSource;
    txtAuto.AutoCompleteCustomSource = word_source;
    txtAuto.AutoCompleteMode = AutoCompleteMode.Suggest;
}

The event handler uses the File class’s ReadAllLines method to load the file Words.txt into the Words array. It then enters the test word fragment “absi” in the form’s upper text box.

Next, the code creates an AutoCompleteStringCollection object to hold the text box’s auto completion words. This object behaves mostly like a string collection. (My guess is that internally it uses a trie, also known as a radix tree or prefix tree, to make searching for words easier. For more information on tries, see the Wikipedia article trie.)

The code uses the collection’s AddRange method to add the words in the Words array to it. It then sets the text box’s AutoCompletionSource property to CustomSource, its AutoCompletionCustomSource to the string collection, and its AutoCompleteMode property to Suggest.

After that, the text box works automatically. When you type the beginning of a word, the control displays dropdown list of words that begin with what you have typed. You can use then click on a word or the up and down arrows and press Enter to select a word to place in the text box.

Notes

As an exercise, you might try using LINQ to implement at least part of the FindBestMatches method. It might not work well for selecting words that start with the correct letter (it would loop through every entry in the word list), but it might make pulling out the best matches easier.

This method is not what my phone does but it does produce some reasonable suggestions. There are several ways that it might be improved. For example, this method considers all character exchanges equally likely. If you type “whst” the code considers the possibilities “whit” and “what” equally likely. If you look on the phone’s keypad, however, you’ll see that the s and a keys are right next to each other, so I probably tried to type “what” and just hit the wrong key. You could modify the edit distance algorithm to give preference to swapping letters that were close to each other on the keypad. (I may do that at some point.)

You could also probably improve the algorithm by considering the grammar and meaning of the sentence being typed. For example, if you type “Are you going ton,” then the final word is more likely to be “tonight” than “tonnage.” This is a much bigger topic, so I won’t pursue it.

This example also doesn’t include contractions in its word list. If you type “whats,” then you probably mean “what’s” and adding that to the word list (with some special rules for adding apostrophes) would give you a better result.

Finally, this method assumes that you typed the first letter of the word correctly but that may not always give the best solution. If you type “gello,” then “hello” is probably a better match than “gallop.” (I may work on this issue a bit later, too.)

As you can see, there are several ways you can improve this program, but it does give reasonable suggestions in many cases.


Download Example   Follow me on Twitter   RSS feed   Donate




This entry was posted in algorithms, strings and tagged , , , , , , , , , , , , , , . Bookmark the permalink.

5 Responses to Suggest autocomplete words in C#

  1. Pingback: Improve autocomplete suggestion in C# - C# HelperC# Helper

  2. Orel Gabay says:

    Love this blog!
    Great stuff every time.Thanks for that.
    Can you please attach the FileStringEditDistance example in the link above?

    • RodStephens says:

      DevX owns that article so I can’t repost it. Follow the link to see it.

      I just clicked on it and the first time it landed me on a generic DevX page, but the second time I clicked it I landed on the article. If you can’t get to it after a few tries, let me know.

      • Orel Gabay says:

        I can see the article but can’t find the download example link inside the website

        • RodStephens says:

          A few years ago, DevX was bought by another company and they lost all of the downloads. You can download the example posted at C# Helper to get the basic algorithm. If you want the code for one of my DevX articles, email me and I’ll send it to you.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.