Test different methods that compare directories to see which files they have in common in C#

The example Compare directories to see which files they have in common in C# uses Directory.GetFiles to get the files in two directories. It sorts them and compares the two sorted lists to see which files are in the first directory, the second, or both. See that example for details about how it works.

Kuru posted a comment saying LINQ would be easier and more readable. This example compares the original code and two LINQ approaches.

The first LINQ approach uses LINQ to select the files from the directories and sort them. It then loops through the sorted lists as before to see which files are in which directory. The following code shows how this approach uses LINQ to select the files.

// Use LINQ twice to compare the files in each directory.
private void Compare_LinqTwice(string dir1, string dir2)
{
    // Get sorted lists of files in the directories.
    DirectoryInfo dir1_info = new DirectoryInfo(dir1);
    var dir1_query =
        from FileInfo file_info in dir1_info.GetFiles()
        orderby file_info.Name
        select file_info.Name;
    string[] file_names1 = dir1_query.ToArray();

    DirectoryInfo dir2_info = new DirectoryInfo(dir2);
    var dir2_query =
        from FileInfo file_info in dir2_info.GetFiles()
        orderby file_info.Name
        select file_info.Name;
    string[] file_names2 = dir2_query.ToArray();

    // Compare.
    ...
}

This code creates a DirectoryInfo object for the first directory. A LINQ query uses that object’s GetFiles method to get an array of FileInfo objects representing the directory’s files. It orders the results by the FileInfo objects’ names and selects those names. The program then calls the query’s ToArray method to copy the results into an array of strings.

The code repeats those steps to get a sorted array containing the second directory’s file names. It then compares the two arrays as the previous example did.
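The comparison step is elided in the listing above, so here is a minimal sketch of how two sorted arrays can be walked in a single pass. It is not the code from the earlier example. The class name SortedCompareSketch and the three output lists are hypothetical, and the sketch assumes both arrays were sorted with an ordinal comparison. If the names were sorted another way (for example with the default culture-sensitive orderby), the comparison inside the loop would need to match that ordering.

```csharp
using System;
using System.Collections.Generic;

public static class SortedCompareSketch
{
    // Walk two arrays that are sorted in ordinal order and
    // classify each name as in the first only, second only, or both.
    public static void Compare(string[] names1, string[] names2,
        List<string> only1, List<string> only2, List<string> both)
    {
        int i = 0, j = 0;
        while (i < names1.Length && j < names2.Length)
        {
            // Assumption: ordinal comparison matches the sort order.
            int cmp = string.CompareOrdinal(names1[i], names2[j]);
            if (cmp < 0) only1.Add(names1[i++]);        // Only in the first.
            else if (cmp > 0) only2.Add(names2[j++]);   // Only in the second.
            else { both.Add(names1[i]); i++; j++; }     // In both.
        }

        // Whatever remains in one array is missing from the other.
        while (i < names1.Length) only1.Add(names1[i++]);
        while (j < names2.Length) only2.Add(names2[j++]);
    }
}
```

Because each array index only moves forward, this kind of loop examines each name once after the sort.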

The following code shows the third approach, the one suggested by Kuru.

// Use LINQ queries (where clauses with Contains) to compare the files in each directory.
private void Compare_LinqJoins(string dir1, string dir2)
{
    // Get lists of files in the directories. (Sorting is not needed here.)
    DirectoryInfo dir1_info = new DirectoryInfo(dir1);
    var dir1_query =
        from FileInfo file_info in dir1_info.GetFiles()
        //orderby file_info.Name
        select file_info.Name;
    string[] file_names1 = dir1_query.ToArray();

    DirectoryInfo dir2_info = new DirectoryInfo(dir2);
    var dir2_query =
        from FileInfo file_info in dir2_info.GetFiles()
        //orderby file_info.Name
        select file_info.Name;
    string[] file_names2 = dir2_query.ToArray();

    // Compare.
    var dir1_only_query =
        from string file_name in file_names1
        where (!file_names2.Contains(file_name))
        select file_name;
    List<string> dir1_only = dir1_only_query.ToList();

    var dir2_only_query =
        from string file_name in file_names2
        where (!file_names1.Contains(file_name))
        select file_name;
    List<string> dir2_only = dir2_only_query.ToList();

    var both_query =
        from string file_name in file_names1
        where (file_names2.Contains(file_name))
        select file_name;
    List<string> both = both_query.ToList();
}

This method uses DirectoryInfo objects and LINQ to get arrays containing the directories’ file names as before, except that it does not sort them. Instead of looping through the arrays of names, it uses three LINQ queries to select files that are only in the first directory, only in the second directory, or in both directories.

All three methods store their results in Lists rather than displaying the results in a DataGridView like the original example did. The program runs 10 trials of each method to get times big enough to be meaningful.
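The timing code is not shown in this post, so the following is only a sketch of how repeated trials might be timed with a Stopwatch. The class name TimingSketch and the method name TimeTrials are hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Run the action numTrials times and return the total elapsed time.
    public static TimeSpan TimeTrials(Action action, int numTrials)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int trial = 0; trial < numTrials; trial++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Inside the form, a call might look like TimingSketch.TimeTrials(() => Compare_LinqTwice(dir1, dir2), 10), with one such call per method being compared.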

If you look closely at the picture at the top of the post, you’ll see that the original method was the fastest. The second approach, which uses LINQ to select and sort the files, took about 50% longer.

The third approach, which uses LINQ to decide which files are in which directories, took almost 10 times as long. It is certainly simpler, and you may find it easier to read than the first approach, but it’s a lot slower.
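One likely reason for the slowdown is that Contains on an array is a linear search, so each of the three queries does work roughly proportional to the product of the two directory sizes. The following sketch, which was not part of the timing tests in this post, shows one way to keep the three-query style while looking names up in a HashSet instead. The class name HashSetCompareSketch is hypothetical, and the sets use the default case-sensitive string comparison, which matches what Contains does on a string array.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HashSetCompareSketch
{
    // Classify names using hash set lookups instead of array searches.
    public static void Compare(string[] names1, string[] names2,
        out List<string> only1, out List<string> only2, out List<string> both)
    {
        // Build the lookup sets once, before running the queries.
        var set1 = new HashSet<string>(names1);
        var set2 = new HashSet<string>(names2);

        only1 = names1.Where(name => !set2.Contains(name)).ToList();
        only2 = names2.Where(name => !set1.Contains(name)).ToList();
        both = names1.Where(name => set2.Contains(name)).ToList();
    }
}
```

Whether this closes the gap with the original sorted-loop method would need to be measured. It is offered only as a variation to try.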

(I also tried using the third approach but making the final three LINQ queries select data from the first two queries instead of from the arrays of file names. That was MUCH slower. I got tired of waiting after a couple of minutes and stopped the program.)
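A plausible contributor to that slowdown is LINQ’s deferred execution: a query that has not been copied into an array or list runs again every time something enumerates it, including every Contains call. The following sketch illustrates the effect by counting how often a projection runs. The class and method names are hypothetical, and the sketch does not reproduce the timing from this post.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Count how many times the projection runs when the
    // sequence is searched numSearches times.
    public static int CountProjections(string[] names, int numSearches,
        bool materialize)
    {
        int projections = 0;
        IEnumerable<string> query =
            names.Select(name => { projections++; return name; });

        // ToArray runs the query once and stores the results.
        if (materialize) query = query.ToArray();

        for (int i = 0; i < numSearches; i++)
        {
            // The name is never found, so every search examines all items.
            query.Contains("no such file");
        }
        return projections;
    }
}
```

With three names and four searches, the deferred version runs the projection 12 times, while the materialized version runs it only 3 times.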




About RodStephens

Rod Stephens is a software consultant and author who has written more than 30 books and 250 magazine articles covering C#, Visual Basic, Visual Basic for Applications, Delphi, and Java.
This entry was posted in directories, files.
