Title: Convert doc files into docx files in C#
This example shows how to write a C# program to automatically convert doc files into docx files.
In a previous version of Windows, whenever I double-clicked a .doc file, I got this error message:
Error
The operating system is not presently configured to run this application.
This has been fixed in recent versions of Windows, but every time that happened I wondered why Microsoft decided to make Word not understand .doc files by default. Perhaps it was to "encourage" people to move to .docx files.
The newer .docx format stores a document as a zipped set of XML files (the Office Open XML format), which usually produces smaller files than the older binary .doc format, so it makes sense to convert doc files to docx files anyway.
This program converts all of the .doc files in a directory into .docx files. The program includes a reference to Microsoft.Office.Interop.Word 12.0.0.0 to allow it to manipulate Microsoft Word files. To add such a reference, open the Project menu and select Add Reference. On the .NET tab, select the most recent version of the Microsoft.Office.Interop.Word entry and click OK.
To make the classes defined in that library easier to use, the program includes the following using directive:
using Word = Microsoft.Office.Interop.Word;
This allows the program to use the prefix Word to indicate classes in the library.
If you enter or select a directory path and click the Convert button, the following code executes.
// Convert the files in the directory.
private void btnConvert_Click(object sender, EventArgs e)
{
    // Open the Word server.
    Word._Application word_app = new Word.ApplicationClass();

    // Make a couple of objects used in method calls.
    object missing = System.Reflection.Missing.Value;
    object save_changes = false;

    // Loop through the files.
    int num_converted = 0;
    lstFiles.Items.Clear();
    DirectoryInfo dir_info = new DirectoryInfo(txtDirectory.Text);
    foreach (FileInfo file_info in dir_info.GetFiles("*.doc"))
    {
        // Skip .docx files.
        if (file_info.Extension.ToLower() == ".docx") continue;

        // Get the converted file's name.
        int name_length =
            file_info.FullName.Length - file_info.Extension.Length;
        string new_filename =
            file_info.FullName.Substring(0, name_length) + ".docx";

        // See if this file has already been converted.
        if (File.Exists(new_filename))
        {
            lstFiles.Items.Add("Skipped " + file_info.Name);
        }
        else
        {
            lstFiles.Items.Add("Converted " + file_info.Name);
            num_converted++;

            // Open the file.
            object filename = file_info.FullName;
            object confirm_conversions = false;
            object read_only = true;
            object add_to_recent_files = false;
            object format = 0;
            Word._Document word_doc =
                word_app.Documents.Open(ref filename,
                    ref confirm_conversions, ref read_only,
                    ref add_to_recent_files, ref missing,
                    ref missing, ref missing, ref missing,
                    ref missing, ref format, ref missing,
                    ref missing, ref missing, ref missing,
                    ref missing, ref missing);

            // Save as a .docx file.
            filename = new_filename;
            object file_format =
                Word.WdSaveFormat.wdFormatDocumentDefault;
            word_doc.SaveAs(ref filename, ref file_format,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missing,
                ref missing, ref missing);

            // Close the document without prompting.
            word_doc.Close(ref save_changes, ref missing,
                ref missing);
        }
    }

    // Close the word application.
    word_app.Quit(ref save_changes, ref missing, ref missing);

    int num_skipped = lstFiles.Items.Count - num_converted;
    MessageBox.Show("Converted " + num_converted.ToString() +
        " files.\n" +
        "Skipped " + num_skipped.ToString() + " files.",
        "Done", MessageBoxButtons.OK, MessageBoxIcon.Information);
}
This code creates a Word application object to work with Word. It then defines a couple of variables that it can pass to Word methods. Those methods take almost every parameter by reference (with the ref keyword), so the values must be stored in variables that you can then pass by reference. The variables must be declared with the type object because an argument passed with ref must exactly match the parameter's declared type. For example, you cannot pass a bool variable where the method expects a ref object.
Probably the most important and confusing value is System.Reflection.Missing.Value. If you want to omit a parameter to one of the Word methods, you cannot simply pass null into the method. Instead you must pass this special value by reference.
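If your project uses C# 4.0 or later, the compiler lets you leave out the ref keyword and the placeholder arguments when you call COM interop methods, and you can name just the arguments you care about. The following is a minimal sketch of the same open, save, and close steps under that assumption. The ConvertOne helper and its parameter names are hypothetical (they are not part of the example program), and I have not tried this against every version of the interop assembly.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class NamedArgumentSketch
{
    // Hypothetical helper: converts one file using named arguments.
    // Assumes C# 4.0 or later and a reference to
    // Microsoft.Office.Interop.Word.
    public static void ConvertOne(Word._Application word_app,
        string doc_path, string docx_path)
    {
        // Any optional argument that is not named here is supplied
        // by the compiler as a "missing" placeholder.
        Word._Document word_doc = word_app.Documents.Open(
            FileName: doc_path,
            ConfirmConversions: false,
            ReadOnly: true,
            AddToRecentFiles: false);

        // Save in Word's default format (.docx in Word 2007 and later).
        word_doc.SaveAs(
            FileName: docx_path,
            FileFormat: Word.WdSaveFormat.wdFormatDocumentDefault);

        // Close the document without prompting.
        word_doc.Close(SaveChanges: false);
    }
}
```

The trade-off is that this form will not compile with older C# compilers, so the long ref version is the safer choice if you need to support them.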
Having created the two values, the code creates a DirectoryInfo object representing the directory you entered in the text box and uses its GetFiles method to enumerate the files matching the pattern *.doc in the directory.
If a file's extension is .docx, the program uses a continue statement to skip the file and continue the loop.
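The .docx check matters because, on the .NET Framework, a pattern with a three-character extension such as *.doc also matches files whose extensions merely begin with those characters, so .docx files show up in the results. Note that a .docm file would also match the pattern and would not be caught by the .docx test. If you want only true .doc files, one option is to compare the extension exactly. This is a small sketch of that idea; the GetDocFiles helper is hypothetical and is not part of the example program.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DocFileFinder
{
    // Hypothetical helper: return only the files whose extension is
    // exactly ".doc" (ignoring case), so .docx and .docm files are
    // left out no matter how the wildcard matching behaves.
    public static List<FileInfo> GetDocFiles(string directory)
    {
        List<FileInfo> result = new List<FileInfo>();
        DirectoryInfo dir_info = new DirectoryInfo(directory);
        foreach (FileInfo file_info in dir_info.GetFiles())
        {
            if (string.Equals(file_info.Extension, ".doc",
                StringComparison.OrdinalIgnoreCase))
            {
                result.Add(file_info);
            }
        }
        return result;
    }
}
```

With a helper like this, the loop could iterate over GetDocFiles(txtDirectory.Text) and would not need the continue statement.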
The code then calculates the file's name with the extension changed to .docx. If that file already exists, the program adds a message to the form's ListBox saying it skipped the file. If the .docx file does not exist, the program adds a message saying it converted the file and then starts doing the real work.
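The program builds the new name with Substring, but the Path class can do the same job in one call. Here is a short sketch of the name calculation and the "already converted" test using Path.ChangeExtension. The class and method names are hypothetical and only illustrate the idea.

```csharp
using System.IO;

static class DocxNaming
{
    // Hypothetical helper: build the target path by replacing the
    // file's extension with ".docx".
    public static string GetDocxPath(string doc_path)
    {
        return Path.ChangeExtension(doc_path, ".docx");
    }

    // Return true if the converted file already exists, which means
    // the conversion can be skipped.
    public static bool AlreadyConverted(string doc_path)
    {
        return File.Exists(GetDocxPath(doc_path));
    }
}
```

For example, GetDocxPath("report.doc") returns "report.docx". Only the final extension is replaced, so a name such as my.report.doc becomes my.report.docx.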
The code uses the Word application's Documents collection's Open method to open the .doc file. (Notice all of the missing parameters passed with the ref keyword.) The code then simply calls the document's SaveAs method to make it re-save the document in the default format, which is the .docx format. The code then closes the document.
After it has processed all of the files, the program closes the Word application server and displays a message telling how many files it converted and skipped.
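One thing to watch for: if Word throws an exception while opening or saving a file (for example, because a file is corrupt), the program never reaches the Quit call and a hidden copy of Word can be left running. A try/finally block is one way to guard against that. The following is only a sketch; the WithWord helper is hypothetical, and it creates the application with new Word.Application(), which I am assuming is acceptable here because it also compiles when the project embeds the interop types.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

static class WordSessionSketch
{
    // Hypothetical helper: run some work against a Word instance and
    // always shut that instance down afterwards, even if the work
    // throws an exception.
    public static void WithWord(Action<Word._Application> work)
    {
        Word._Application word_app = new Word.Application();
        try
        {
            work(word_app);
        }
        finally
        {
            // Close Word without saving changes or prompting.
            object save_changes = false;
            object missing = System.Reflection.Missing.Value;
            word_app.Quit(ref save_changes, ref missing, ref missing);
        }
    }
}
```

The conversion loop would then go inside the delegate passed to WithWord, and any exception would still reach the caller after Word has been closed.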
To be safe, this program does not delete the old .doc files, although you could easily add that feature. (I would probably put a CheckBox on the form so the user can indicate whether the program should delete the files.)
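If you do add a delete option, it is worth confirming that the converted file really exists before removing the original. Here is one cautious way to sketch it. The helper name and the delete_originals flag are hypothetical; in the program the flag would come from something like a CheckBox's Checked property.

```csharp
using System.IO;

static class OriginalCleanup
{
    // Hypothetical helper: delete the .doc file only if the caller
    // asked for it and the converted .docx file exists and is not
    // empty. Returns true if the original file was deleted.
    public static bool DeleteOriginalIfConverted(
        string doc_path, string docx_path, bool delete_originals)
    {
        if (!delete_originals) return false;

        FileInfo docx_info = new FileInfo(docx_path);
        if (!docx_info.Exists || docx_info.Length == 0) return false;

        File.Delete(doc_path);
        return true;
    }
}
```

The program could call this right after closing each document, passing file_info.FullName and new_filename.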
Download the example to experiment with it and to see additional details.