Remove non-printable ASCII characters from a string in C#


The following TrimNonAscii extension method removes the non-printable ASCII characters from a string.

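// Requires "using System.Text.RegularExpressions;" and a static class.
public static string TrimNonAscii(this string value)
{
    // Match runs of characters outside the printable range, space through ~.
    string pattern = "[^ -~]+";
    Regex reg_exp = new Regex(pattern);
    return reg_exp.Replace(value, "");
}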

In ASCII, the printable characters lie between space (“ ”, code 32) and “~” (code 126). The code builds a regular expression that matches any run of one or more characters outside that range. It uses the expression to create a Regex object and then calls the object’s Replace method to replace each match with an empty string. The method then returns the resulting string.
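To make the behavior concrete, here is a minimal usage sketch. The class names StringExtensions and Demo and the sample input are made up for illustration. The method uses the static Regex.Replace overload for brevity but applies the same pattern described above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    // Same pattern as in the post: remove anything outside space..~.
    public static string TrimNonAscii(this string value)
    {
        return Regex.Replace(value, "[^ -~]+", "");
    }
}

public static class Demo
{
    public static void Main()
    {
        // Tab, bell (\a), and CR/LF are control characters; £ is not ASCII.
        string input = "Hello,\tWorld!\a\r\n Price: £5";
        Console.WriteLine(input.TrimNonAscii());
        // Prints: Hello,World! Price: 5
    }
}
```

Note that the tab and the line break are removed along with the bell character, so words that were separated only by those characters run together.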

Note that this method also removes many useful non-ASCII characters such as £, Æ, and ♥, as well as entire scripts such as Cyrillic and kanji. It’s mostly useful for standard English text.

I don’t know of a simple way to remove Unicode characters in bulk. You would probably need to make a table of characters that you do or do not want to include and then either loop through the string looking for them or use a Regex object to remove the ones you don’t want.
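One possible starting point for such a filter, offered as a sketch rather than a complete answer, is the Unicode general category support in .NET regular expressions. The helper name TrimNonPrintable and the choice of categories are assumptions: this version removes only control characters (\p{Cc}) and format characters (\p{Cf}), which may be too much or too little for a given application. For example, it also strips tabs and line breaks, and format characters such as joiners can be meaningful in some scripts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnicodeStringExtensions
{
    // Hypothetical helper, not part of the original example.
    // \p{Cc} = control characters, \p{Cf} = invisible format characters.
    private static readonly Regex NonPrintable =
        new Regex(@"[\p{Cc}\p{Cf}]+");

    public static string TrimNonPrintable(this string value)
    {
        return NonPrintable.Replace(value, "");
    }
}

public static class UnicodeDemo
{
    public static void Main()
    {
        // \u0007 is BEL (a control character);
        // \u200E is LEFT-TO-RIGHT MARK (a format character).
        string input = "£5\u0007 for Æ\u200E and ♥";
        string cleaned = input.TrimNonPrintable();

        // Compare instead of printing the text, because console
        // encodings may not display £, Æ, or ♥ correctly.
        Console.WriteLine(cleaned == "£5 for Æ and ♥");  // True
    }
}
```

Unlike TrimNonAscii, this keeps £, Æ, and ♥ because they belong to symbol and letter categories.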


9 Responses to Remove non-printable ASCII characters from a string in C#

  1. balu says:


  2. Marcone says:

    Thanks, this was the only code that resolved my problem.

  3. Joseph says:

    This works great, but it seems to remove Unicode characters, which I don’t want. I only want to remove non-printable characters, and the following character I have is being removed:

  4. Rod Stephens says:

    What character code is that? I only see a sort of underscore.

    Unfortunately Unicode defines hundreds of non-printable characters such as control characters and formatting characters, and many are kind of scattered around instead of having nice contiguous values. So I don’t know how to filter them out.

    There’s also the issue of which font you are using. For example, you may not have a font for a particular locale installed. In that case text in that font might appear as non-printable on your system.

    Anyway, I don’t know if there’s a simple solution to this issue. This example sort of does this, although it’s not trivial:

  5. andre says:

    thanks works great

  6. Adrian says:

    The Regular Expression in the code uses the “*” (asterisk) quantifier. It means zero-or-more repetitions of the preceding thing. For this character removal task it would be better to use the “+” (plus) quantifier which means one or more repetitions.

    For the given replacement string (i.e. the empty string) the final result is the same. But, having the zero-or-more means that the replacement string will be inserted between every pair of input characters. This is easily seen by changing the return expression to be:

    reg_exp.Replace(value, "=");

    My experiments show the code using “[^ -~]*” runs significantly slower than the version using “[^ -~]+”. However I suspect that for most programs this speed difference is not important. It would only matter for programs that do large amounts of processing with regular expressions.
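    The difference between the two quantifiers can be reproduced with a short sketch. The class name QuantifierDemo and the input string are made up for illustration, and the sketch shows only the matching behavior, not the speed difference.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    public static void Main()
    {
        // Every character here is printable, so nothing should be replaced.
        string input = "abc";

        // "*" also matches the empty string at every position.
        Console.WriteLine(Regex.Replace(input, "[^ -~]*", "="));  // =a=b=c=

        // "+" needs at least one non-printable character to match.
        Console.WriteLine(Regex.Replace(input, "[^ -~]+", "="));  // abc
    }
}
```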

  7. Daniel Pinski says:

    Your regex only keeps the values 32 – 126, so all char values > 126 will be filtered out. You can do this instead:

    string pattern = $"[{(char)0}-{(char)31}{(char)127}]+";

    This removes all characters in the range 0 – 31, the control characters. It additionally removes the DEL character (127). This ensures your Unicode characters stay intact. The one drawback is that it does not filter out the non-printable Unicode characters.

    Note: The “+” in your regex is not needed; the function will work the same without it. I don’t know if there are any speed advantages to that, however.

    -Daniel Pinski

    • RodStephens says:

      The goal was to only keep printable ASCII characters, but your version looks good if you also want other Unicode characters.

      I think I looked around at one time to see if there was a way to remove non-printable Unicode characters and I vaguely remember reading that there was no easy way to do that. I suppose you could print onto a bitmap and see if any of the pixels were changed.
