Title: Use LINQ to select words of certain lengths from a file in C#
This example uses LINQ to read a file, remove unwanted characters, select words of a specified length, and save the result in a new file.
Recently I needed a big word list, so I searched around for public domain dictionaries. I found one that was close to what I needed in the file 6of12.txt in the 12Dicts package available here. That file has several problems that make it not quite perfect for my use:
- It contains words that are too short and too long for my purposes.
- It includes non-alphabetic characters at the end of some words to give extra information about them.
- Some words contain embedded non-alphabetic characters as in A-bomb and bric-a-brac.
The following code shows how the program processes the file.
// Requires: using System.IO; using System.Linq;
// using System.Text.RegularExpressions;

// Select words whose lengths are between the given minimum and maximum.
private void btnSelect_Click(object sender, EventArgs e)
{
    // Remove non-alphabetic characters at the ends of words.
    Regex end_regex = new Regex("[^a-zA-Z]+$");
    string[] all_lines = File.ReadAllLines("6of12.txt");
    var end_query =
        from string word in all_lines
        select end_regex.Replace(word, "");

    // Remove words that still contain non-alphabetic characters.
    Regex middle_regex = new Regex("[^a-zA-Z]");
    var middle_query =
        from string word in end_query
        where !middle_regex.IsMatch(word)
        select word;

    // Make a query to select words of the desired length.
    int min_length = (int)nudMinLength.Value;
    int max_length = (int)nudMaxLength.Value;
    var length_query =
        from string word in middle_query
        where (word.Length >= min_length) &&
              (word.Length <= max_length)
        select word;

    // Write the selected words into a new file.
    string[] selected_lines = length_query.ToArray();
    File.WriteAllLines("Words.txt", selected_lines);
    MessageBox.Show("Selected " + selected_lines.Length +
        " words out of " + all_lines.Length + ".");
}
The code starts by using a LINQ query to remove non-alphabetic characters from the ends of words.
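As a minimal sketch of this first step, the following console program applies the same pattern to a few made-up entries. The sample words and their trailing markers are hypothetical (they are not copied from 6of12.txt), and the TrimEndsDemo class and TrimEnds method are names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class TrimEndsDemo
{
    // Remove any run of non-letters from the end of each word.
    public static IEnumerable<string> TrimEnds(IEnumerable<string> words)
    {
        // Same pattern as in the example: one or more non-letters at the end.
        Regex end_regex = new Regex("[^a-zA-Z]+$");
        return words.Select(word => end_regex.Replace(word, ""));
    }

    static void Main()
    {
        // Hypothetical sample entries; the trailing markers are made up.
        string[] lines = { "apple", "zebra:", "A-bomb", "cat&+" };

        Console.WriteLine(string.Join(", ", TrimEnds(lines)));
        // Prints: apple, zebra, A-bomb, cat
    }
}
```

Note that A-bomb passes through unchanged because its hyphen is not at the end of the word.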
It then uses a second LINQ query to select only words that now contain no non-alphabetic characters. (That eliminates A-bomb and bric-a-brac.)
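The filtering step can be sketched on its own as shown below. This is an illustration only: the FilterDemo class, the LettersOnly method, and the sample words are assumptions made for the sketch, and it uses LINQ method syntax rather than the query syntax used in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class FilterDemo
{
    // Keep only words made up entirely of the letters a-z and A-Z.
    public static IEnumerable<string> LettersOnly(IEnumerable<string> words)
    {
        // Matches any single character that is not a letter.
        Regex middle_regex = new Regex("[^a-zA-Z]");
        return words.Where(word => !middle_regex.IsMatch(word));
    }

    static void Main()
    {
        string[] words = { "apple", "A-bomb", "bric-a-brac", "zebra" };

        Console.WriteLine(string.Join(", ", LettersOnly(words)));
        // Prints: apple, zebra
    }
}
```

One thing to keep in mind: an entry that is empty after its trailing characters are removed contains no non-letters, so it passes this filter. It is only discarded later by the length check, and only if the minimum length is at least 1.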
Next, a third LINQ query selects words whose lengths fall between the minimum and maximum values entered by the user.
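Here is a small sketch of the length test in isolation. The LengthDemo class, the WithLength method, and the hard-coded limits of 3 and 5 are assumptions for this illustration; in the example the limits come from the nudMinLength and nudMaxLength controls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class LengthDemo
{
    // Select words whose lengths are between min_length and max_length, inclusive.
    public static IEnumerable<string> WithLength(
        IEnumerable<string> words, int min_length, int max_length)
    {
        return
            from word in words
            where (word.Length >= min_length) &&
                  (word.Length <= max_length)
            select word;
    }

    static void Main()
    {
        string[] words = { "a", "cat", "zebra", "dictionary" };

        Console.WriteLine(string.Join(", ", WithLength(words, 3, 5)));
        // Prints: cat, zebra
    }
}
```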
Finally, the code calls the final query's ToArray method to convert the results into an array of words. It then uses File.WriteAllLines to write the words into a new file named Words.txt.
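It may help to remember that LINQ queries like these use deferred execution: defining a query does no filtering, and the work happens only when something enumerates the results, such as the call to ToArray. The following sketch demonstrates this with a counter. The DeferredDemo class, the Evaluations counter, and the BuildQuery method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    // Counts how many times the filter has examined a word.
    public static int Evaluations = 0;

    // Build a query that counts each word it examines.
    public static IEnumerable<string> BuildQuery(IEnumerable<string> words)
    {
        return words.Where(word =>
        {
            Evaluations++;
            return word.Length >= 3;
        });
    }

    static void Main()
    {
        string[] words = { "a", "cat", "zebra" };

        // Defining the query does not examine any words.
        var query = BuildQuery(words);
        Console.WriteLine(Evaluations);     // Prints: 0

        // ToArray enumerates the query, so the filter runs now.
        string[] selected = query.ToArray();
        Console.WriteLine(Evaluations);     // Prints: 3
        Console.WriteLine(selected.Length); // Prints: 2
    }
}
```

Because the results are stored in an array, reading the array's Length afterward (as the example does for its message box) does not run the queries a second time.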
The code finishes by displaying the number of words in the new and original files.
Download the example to experiment with it and to see additional details.