Title: Convert docx files to md files with C# and Microsoft Word
Lately I've been working on some projects that need to be written in Markdown. I've found it's much faster to write the Markdown in Microsoft Word, applying styles to make the different pieces look correct, and then save the final result in a text file with a .md extension. Unfortunately, Microsoft Word "helpfully" adds a .txt extension, so I get file names like sample.md.txt and then need to rename them.
Renaming the files doesn't take long, but it's annoying, so I wrote this program to save docx files in text format with a .md extension.
The top of the main program file includes this using directive.
using Word = Microsoft.Office.Interop.Word;
To use this library, right-click References in Solution Explorer and select Add Reference. On the COM tab, select Microsoft Word 14.0 Object Library (or whatever version is installed on your system). Version 14.0 is Word 2010, the first version that provides the SaveAs2 method used later, so you need that version or a newer one.
When you invoke the File menu's Open command (or press Ctrl+O), the following code executes.
private void mnuFileOpenFiles_Click(object sender, EventArgs e)
{
    if (ofdFile.ShowDialog() == DialogResult.OK)
    {
        foreach (string filename in ofdFile.FileNames)
        {
            // See if this file is already in the list.
            bool found_it = false;
            foreach (FileData file_data in clbFiles.Items)
            {
                if (file_data.FileInfo.FullName == filename)
                {
                    found_it = true;
                    break;
                }
            }
            // Add the file if it's not already there.
            if (!found_it) clbFiles.Items.Add(new FileData(filename));
        }
        // Check all of the files in the list.
        clbFiles.CheckAll();
    }
}
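This code displays a file open dialog so you can select one or more docx files. (The dialog's Multiselect property must be true to allow more than one selection.) If you select files and click Open, the program loops through the names of the selected files. For each file, it loops through the items in the checked list box clbFiles to make sure that file is not already in the list. If the file isn't already there, the code adds it. Finally, the code calls CheckAll to check every file in the list. CheckedListBox has no built-in CheckAll method; it's a helper defined in the downloadable example.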
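The duplicate test compares full names with ==, which is case-sensitive, while Windows file names are not, so C:\Docs\a.docx and c:\docs\A.docx could in principle both end up in the list. Here's a minimal sketch of a case-insensitive alternative. It assumes you move the test into a hypothetical helper, FileListHelper.NewFiles, that works with plain strings so it doesn't need the form. The HashSet with StringComparer.OrdinalIgnoreCase is a substitute for the nested loop, not part of the original example; it also avoids rescanning the whole list for every selected file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the original example.
public static class FileListHelper
{
    // Returns the selected file names that are not already in the list.
    // Names are compared case-insensitively, as Windows compares paths,
    // and duplicates within the selection itself are skipped too.
    public static List<string> NewFiles(
        IEnumerable<string> existing, IEnumerable<string> selected)
    {
        var seen = new HashSet<string>(existing, StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (string filename in selected)
        {
            // HashSet.Add returns false if an equal name is already present.
            if (seen.Add(filename)) result.Add(filename);
        }
        return result;
    }
}
```

In mnuFileOpenFiles_Click you could build the existing list from each item's FileInfo.FullName, pass it along with ofdFile.FileNames, and add a new FileData object for each name the helper returns.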
The program stores file information in the following FileData class.
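// FileInfo is defined in the System.IO namespace.
public class FileData
{
    public FileInfo FileInfo;
    public FileData(string filename)
    {
        FileInfo = new FileInfo(filename);
    }
    // Display only the file's name, not its full path.
    public override string ToString()
    {
        return FileInfo.Name;
    }
}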
The checked list box (like the regular list box, combo box, and other list controls) calls each item's ToString method to decide what to display. The FileInfo class is handy for storing information about files, but its ToString method returns the path that was used to create it. Because the open file dialog returns full paths, that's the file's full name, which is too long to look nice in the checked list box.
The FileData class simply holds a FileInfo object and overrides ToString to return the FileInfo's Name property, which is the file's name without its directory (for example, sample.docx).
When you click the Go button, the following code executes.
private void btnGo_Click(object sender, EventArgs e)
{
    // Get the Word application object.
    Word._Application word_app = new Word.Application();
    // Make Word visible (optional).
    //word_app.Visible = true;
    try
    {
        // Process the files.
        foreach (FileData file_data in clbFiles.CheckedItems)
        {
            ProcessFile(word_app, file_data.FileInfo.FullName);
        }
    }
    finally
    {
        // Close down Word, even if a file could not be processed.
        object save_changes = false;
        object missing = Type.Missing;
        word_app.Quit(ref save_changes, ref missing, ref missing);
    }
    MessageBox.Show("Done");
}
This code creates a new Word application server. Uncomment the statement that sets Visible to true if you want to make the server visible. (That's mostly useful for debugging.)
Next, the code loops through the files that are checked in the checked list box and calls the ProcessFile method (described shortly) for each file.
After it has processed the files, the code closes the Word application server and displays the message "Done." The call to Quit sits in a finally block so Word shuts down even if ProcessFile throws an exception; otherwise a hidden copy of Word could keep running in the background.
The following code shows the ProcessFile method.
private void ProcessFile(Word._Application word_app, string filename)
{
    // Open the Word document read-only.
    object missing = Type.Missing;
    object yes = true;
    object no = false;
    object filename_obj = filename;
    Word._Document word_doc = word_app.Documents.Open(
        ref filename_obj, ref missing, ref yes, ref no,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing);
    // Change only the final extension to .md (Path is in System.IO).
    string new_filename = Path.ChangeExtension(filename, ".md");
    object new_filename_obj = new_filename;
    // Save the document as plain text.
    object txt_format = Word.WdSaveFormat.wdFormatText;
    word_doc.SaveAs2(ref new_filename_obj, ref txt_format,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing);
    // Close the document without saving changes.
    object save_changes = false;
    word_doc.Close(ref save_changes, ref missing, ref missing);
}
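This method starts by creating object variables to hold the values it passes to Word. Many Word library methods take their parameters as ref object, so you can't simply pass them the value false. Instead you must set an object variable to false and then pass that variable by reference. (Starting with C# 4, the compiler lets you omit ref and skip optional arguments when calling COM methods, but the explicit form shown here works with any version.)
Having created those values, the program uses them to open the Word file. The third and fourth arguments (yes and no) open the document read-only and keep it off Word's list of recently used files. Passing Type.Missing for the other parameters makes Open use their default values.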
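Next, the program changes the file's extension from .docx to .md to compose the new file name. It then calls the Word document object's SaveAs2 method with the wdFormatText format to save the document as plain text. Because the program supplies the complete file name, Word doesn't tack on a .txt extension the way the Save As dialog does. Finally, ProcessFile closes the Word document without saving changes.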
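Building the output name with Path.ChangeExtension, rather than String.Replace, changes only the final extension and doesn't care about its case. String.Replace is case-sensitive and rewrites every ".docx" in the path, so a file such as REPORT.DOCX would keep its name and SaveAs2 would target the original document itself. The sketch below uses two hypothetical helpers, MarkdownName and ReplaceName, to compare the approaches; neither is part of the original example.

```csharp
using System;
using System.IO;

// Hypothetical helpers for comparing the two ways of building the name.
public static class MarkdownNames
{
    // Replaces only the last extension, whatever its case.
    public static string MarkdownName(string docxPath)
    {
        return Path.ChangeExtension(docxPath, ".md");
    }

    // The String.Replace approach: case-sensitive, and it also changes
    // any ".docx" that appears earlier in the path.
    public static string ReplaceName(string docxPath)
    {
        return docxPath.Replace(".docx", ".md");
    }
}
```

For example, with a folder named notes.docx.files holding REPORT.DOCX, MarkdownName produces notes.docx.files plus REPORT.md, while ReplaceName renames the folder part to notes.md.files and leaves REPORT.DOCX unchanged.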
Download the example to experiment with it and to see additional details.