Title: Remove non-printable ASCII characters from a string in C#
The following TrimNonAscii extension method removes every character that is not a printable ASCII character from a string.
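// Requires: using System.Text.RegularExpressions;
// Like any extension method, this must be declared inside a static class.
public static string TrimNonAscii(this string value)
{
    // Match runs of one or more characters outside space (" ") through "~".
    string pattern = "[^ -~]+";
    Regex reg_exp = new Regex(pattern);

    // Replace each run with an empty string.
    return reg_exp.Replace(value, "");
}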
In ASCII, the printable characters lie between space (" ", code 32) and "~" (code 126), inclusive. The pattern [^ -~]+ matches any run of one or more characters outside that range: the square brackets define a character class, the ^ negates it, " -~" is the range, and + means "one or more." The code uses the pattern to create a Regex object, calls the object's Replace method to replace each match with an empty string, and returns the resulting string.
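To make the range test concrete, here is a small sketch that puts the method in a static class (the class name TextHelpers is hypothetical) and adds a hypothetical loop-based version, TrimNonAsciiLoop, that applies the same space-through-~ check one character at a time without a regular expression. As an assumption beyond the article's code, the regex version caches its Regex in a static field instead of building a new one on every call.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical container class: C# extension methods must be declared
// in a non-generic, non-nested static class.
public static class TextHelpers
{
    // Same pattern as the article: one or more characters outside " " (32) .. "~" (126).
    // Cached in a static field (an assumption, not the article's code).
    private static readonly Regex NonPrintableAscii = new Regex("[^ -~]+");

    // Regex-based version: remove every run of non-printable-ASCII characters.
    public static string TrimNonAscii(this string value)
    {
        return NonPrintableAscii.Replace(value, "");
    }

    // Hypothetical loop-based equivalent: keep only characters in ' '..'~'.
    public static string TrimNonAsciiLoop(this string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (c >= ' ' && c <= '~')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, "Hello\tWorld\r\n".TrimNonAscii() returns "HelloWorld", and TrimNonAsciiLoop gives the same result. The loop makes the range check explicit; the regex version is shorter and lets the pattern do the work.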
Note that this method also removes many useful Unicode characters such as £, Æ, and ♥, as well as entire scripts such as Cyrillic and Kanji. It's mostly useful for standard English text.
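If you need to keep characters like £, Æ, ♥, and non-Latin scripts, one alternative (a different technique from the article's, offered as a sketch) is to remove only control characters. In .NET regular expressions, \p{Cc} matches the Unicode "Other, Control" category (U+0000 through U+001F and U+007F through U+009F), which is the same set that char.IsControl reports. The class and method names below are hypothetical.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative only.
public static class UnicodeTextHelpers
{
    // \p{Cc} = Unicode control characters, so printable non-ASCII text survives.
    private static readonly Regex ControlChars = new Regex(@"\p{Cc}+");

    // Regex variant: remove runs of control characters.
    public static string RemoveControlChars(this string value)
    {
        return ControlChars.Replace(value, "");
    }

    // Loop variant: char.IsControl tests the same Unicode category.
    public static string RemoveControlCharsLoop(this string value)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (!char.IsControl(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this approach, "£5 Æ ♥\tПривет\r\n" becomes "£5 Æ ♥Привет". Be aware that it leaves other invisible characters, such as the zero-width space (category Cf), in place.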
I don't know of a simple way to remove only the unwanted Unicode characters in bulk. You would probably need to make a table of the characters that you do or do not want to keep and then either loop through the string checking each character against the table or use a Regex object to remove the ones you don't want.
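The table-based approach described above might look something like the following sketch. KeepOnly and RemoveAll are hypothetical names: the first loops through the string and keeps only characters found in an allowed set, and the second builds a Regex from a table of unwanted characters. Each character is passed through Regex.Escape and joined with "|" rather than placed inside square brackets, because Regex.Escape does not escape "]" or "-", which have special meaning inside a character class. Both helpers work on individual char values, so characters outside the Basic Multilingual Plane (such as many emoji, which take two chars) can't be listed this way.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative only.
public static class CharTableFilters
{
    // Keep only the characters that appear in the allowed table.
    public static string KeepOnly(this string value, ISet<char> allowed)
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (allowed.Contains(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    // Remove every character that appears in the unwanted table, using a
    // Regex built as an alternation of escaped single characters.
    public static string RemoveAll(this string value, IEnumerable<char> unwanted)
    {
        string pattern = string.Join("|", unwanted.Select(c => Regex.Escape(c.ToString())));
        if (pattern.Length == 0)
        {
            return value; // Empty table: nothing to remove.
        }
        return Regex.Replace(value, pattern, "");
    }
}
```

For example, "Æther ♥".RemoveAll("Æ♥") returns "ther ", and KeepOnly with a set containing the lowercase letters, space, £, and ♥ turns "I ♥ caf£s!" into " ♥ caf£s". A HashSet<char> lookup is fast, so the loop version is usually the simpler choice when the table is large.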
Download the example to experiment with it and to see additional details.