Add LINQ to autocomplete in C#

[autocomplete]

This example adds LINQ to the example Improve autocomplete suggestion in C#. It adds LINQ in two places: when it loads the list of words and when the program searches for the best matches that begin with a particular letter.

The following code shows how the older version of the program loads its word list.

// Non-LINQ method.
foreach (string line in lines)
{
    if (line.Length > 1)
    {
        // Get the first letter.
        int index = (int)line.ToUpper()[0] - (int)'A';
        Words[index].Add(line);
    }
}

This code loops through the words in the lines array. For each word it gets the word’s first letter and uses it to decide which list in the Words array of lists should hold that word.

The following code shows the LINQ version.

// LINQ method.
var word_groups =
    from string line in lines
    group line by line.ToUpper()[0] into g
    select new { Letter = g.Key, Words = g };
foreach (var g in word_groups)
{
    int index = (int)g.Letter - (int)'A';
    Words[index] = g.Words.ToList();
}

This code defines a query that selects the words in the lines array and groups them by their first letters. For each group, the program uses the group’s first letter to calculate the index where the group’s words should be stored in the Words list and then stores the group there converting it into a List.

LINQ is often not as fast as other methods. Its strength lies in the fact that it’s easy to write and often produces code that is easier to read than code that you write to perform similar actions. In this case, however, I don’t think the LINQ is easier to write or understand. The LINQ group command has a confusing syntax and the result is rather confusing. For this simple example that selects all of the items, LINQ doesn’t add much and actually makes the code longer, so I wouldn’t use it here.

The other use of LINQ in this example, however, is more helpful. The following code shows how the previous version of the program finds the best matches for a word.

// Find the best matches for the typed word.
private void FindBestMatches(string word, int num_matches,
    out string[] words, out int[] values)
{
    // Get the word list for the word's first letter.
    int index = (int)word.ToUpper()[0] - (int)'A';

    // Find words that start with the same letter.
    List<string> match_words = new List<string>();
    List<int> match_values = new List<int>();
    foreach (string test_word in Words[index])
    {
        // Consider the next word up to
        // the length of the typed word.
        int max_length = Math.Min(test_word.Length, word.Length);
        string short_word = test_word.Substring(0, max_length);

        // Build the edit graph.
        Node[,] nodes = MakeEditGraph(word, short_word);

        // List the distance.
        int x = nodes.GetUpperBound(0);
        int y = nodes.GetUpperBound(1);
        match_words.Add(test_word);
        match_values.Add(nodes[x, y].distance);
    }

    // Sort the matches by distance, smallest distance first.
    string[] match_words_array = match_words.ToArray();
    int[] match_values_array = match_values.ToArray();
    Array.Sort(match_values_array, match_words_array);

    // Return the desired number of matches.
    int max = Math.Min(num_matches, match_values_array.Length);
    words = new string[max];
    Array.Copy(match_words_array, words, max);
    values = new int[max];
    Array.Copy(match_values_array, values, max);
}

The code loops through the list of words that have the same initial letter as the target word. For each test word in the list, the code creates an edit graph representing the transformation from the word to the test word and saves the word and its edit distance. After the loop finishes, the program sorts the words and their edit distances and returns at most the desired number of matches.

A few hints that LINQ might be useful here include the fact that the code searches over lists (words with matching first letters), sorts some arrays (the words and their edit distances), and returns at most a certain number of items (the words and their edit distances again).

This code performs a somewhat involved calculation to find the edit distance for a word. While you may be able to work that calculation into LINQ code, LINQ statements are a lot easier to understand if you extract that calculation into a separate method. The new example does that with the following GetEditDistance method.

// Return the edit distance from target_word to test_word.
private int GetEditDistance(string target_word, string test_word)
{
    // Make test_word the same length as target_word.
    int max_length = Math.Min(test_word.Length, target_word.Length);
    test_word = test_word.Substring(0, max_length);

    // Build the edit graph.
    Node[,] nodes = MakeEditGraph(target_word, test_word);

    // Return the distance.
    int x = nodes.GetUpperBound(0);
    int y = nodes.GetUpperBound(1);
    return nodes[x, y].distance;
}

This method performs the same steps used by the previous examples to calculate a test word’s edit distance. It then returns that distance.

The following code shows how the new example uses this GetEditDistance method and LINQ to find the best matches for a word.

// Find the best matches for the typed word.
private string[] FindBestMatches(string target_word,
    int num_matches)
{
    // Get the word list for the word's first letter.
    int index = (int)target_word.ToUpper()[0] - (int)'A';

    // Use LINQ to select the test words' edit distance.
    var word_distance_query =
        from string test_word in Words[index]
        orderby GetEditDistance(target_word, test_word)
        select GetEditDistance(target_word, test_word) +
            " " + test_word;

    // Return up to the desired number of matches.
    return word_distance_query.Take(num_matches).ToArray();
}

This code builds a LINQ query that searches the Words list that contains the words beginning with the target word’s first letter. It orders the results by the edit distances between those words and the target word. Finally, it selects the edit distances and the corresponding words.

Having built the query, the code invokes it’s Take method to take at most the desired number of matches. It calls the result’s ToArray method to convert the result into an array and returns it.

This example has one other minor change. The previous version kept the test words and their edit distances in two separate arrays so it could sort the arrays by edit distance. It then needed to combine the values to create strings showing the words and their edit distances. The TextChanged event handler that calls FindBestMatches must then assemble the edit distances and words as shown in the following code.

// Find good guesses for this word.
private void txtWord_TextChanged(object sender, EventArgs e)
{
    lstGuesses.Items.Clear();
    string word = txtWord.Text;
    if (word.Length == 0) return;

    // Find the best matches.
    string[] words;
    int[] values;
    FindBestMatches(word, 10, out words, out values);

    // Display the best matches.
    for (int i = 0; i < words.Length; i++)
    {
        lstGuesses.Items.Add(values[i].ToString() +
            '\t' + words[i]);
    }
}

The new example’s LINQ code sorts the results, builds the result strings, and returns at most a desired number of items all in one step. This not only simplifies the FindBestMatches method, but it also simplifies the text box’s TextChanged event handler that calls it as shown in the following code.

// Find good guesses for this word.
private void txtWord_TextChanged(object sender, EventArgs e)
{
    lstGuesses.DataSource = null;
    string word = txtWord.Text;
    if (word.Length == 0) return;

    // Find the best matches.
    lstGuesses.DataSource = FindBestMatches(word, 100);
}

This use of LINQ is worth the effort. It makes the FindBestMatches method easier to understand and makes the TextChanged event handler that calls it simpler, too. It probably reduces performance, but this application is fast enough that the user won’t notice.


Download Example   Follow me on Twitter   RSS feed   Donate




This entry was posted in algorithms, LINQ, strings and tagged , , , , , , , , , , , , , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.