Remove non-digits or non-letters from a string in C#

Sometimes you want to extract only the digits, letters, or some other group of characters from a string. You could loop through the string and examine each character individually, but there’s a much easier way.

The Regex class, in the System.Text.RegularExpressions namespace, provides a static Replace method that replaces every substring matching a pattern with a new value. The following code uses that method to replace non-letters and non-digits with the empty string “”.

// Requires: using System.Text.RegularExpressions;
private void btnReplace_Click(object sender, EventArgs e)
{
    // Display only letters.
    txtLetters.Text =
        Regex.Replace(txtString.Text, "[^a-zA-Z]", "");

    // Display only digits.
    txtDigits.Text =
        Regex.Replace(txtString.Text, "[^0-9]", "");
}

The key is the pattern passed to Replace. Consider the first pattern, [^a-zA-Z]. The brackets define a character class: a set of characters, any one of which can match. Here the class contains the ranges a-z and A-Z. The ^ symbol at the beginning of the class means “not,” so the pattern matches any single character that is not in the range a-z or A-Z. In other words, it matches anything other than an unaccented ASCII letter, so accented letters such as é are removed too. The final parameter replaces each matched character with “”, so the result contains only letters.

The second call to Replace works the same way, using the pattern [^0-9] to remove non-digits.
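
The same two calls can be tried outside a Windows Forms project. The following is a minimal console sketch, not part of the downloadable example: the RemoveCharactersDemo class, the LettersOnly and DigitsOnly helpers, and the sample string are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RemoveCharactersDemo
{
    // Remove every character that is not an ASCII letter (a-z or A-Z).
    public static string LettersOnly(string text) =>
        Regex.Replace(text, "[^a-zA-Z]", "");

    // Remove every character that is not an ASCII digit (0-9).
    public static string DigitsOnly(string text) =>
        Regex.Replace(text, "[^0-9]", "");

    public static void Main()
    {
        string input = "Order #42: 3 apples, 7 pears!";

        Console.WriteLine(LettersOnly(input)); // Orderapplespears
        Console.WriteLine(DigitsOnly(input));  // 4237
    }
}
```

Wrapping each pattern in a small helper keeps the pattern in one place, which makes it easier to swap in a different character class later.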



4 Responses to Remove non-digits or non-letters from a string in C#

  1. Protiguous says:

    I think the examples might be switched..

    • RodStephens says:

      I think it’s correct. The caret ^ in the regular expression means “not” so, for example, [^0-9] means to match everything that is *not* a digit. It then replaces those characters with an empty string to remove everything that isn’t a digit.

      • Jorge Cordero says:

        Actually ^ denotes the beginning of the pattern and $ denotes the end of the pattern. That can be easily tested in Visual Studio Code.

        • RodStephens says:

          Actually it’s not quite as simple as that. Outside a character class, ^ and $ match the start and end of the unit of interest. If you’re processing a single string, they match the beginning and end of that string. In .NET, the RegexOptions.Multiline option makes them also match at the start and end of each line.

          However, at the beginning of a character class like […], ^ means not.

          You’re right, however, that this can be easily tested in Visual Studio Code. The example in this post works, for instance. Feel free to download it and experiment with it.
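
The two meanings of ^ discussed in the comments above can be checked with a short console program. This is a minimal sketch, not code from the original post or its comments: the CaretDemo class, its helper methods, and the sample strings are hypothetical names chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretDemo
{
    // Outside brackets, ^ is an anchor: the digits must sit at the start
    // of the input (or of a line, when RegexOptions.Multiline is set).
    public static string MaskLeadingDigits(
        string text, RegexOptions options = RegexOptions.None) =>
        Regex.Replace(text, "^[0-9]+", "#", options);

    // Inside brackets, a leading ^ negates the class: every character
    // that is not a digit is removed.
    public static string DigitsOnly(string text) =>
        Regex.Replace(text, "[^0-9]", "");

    public static void Main()
    {
        string input = "42 is 42";
        Console.WriteLine(MaskLeadingDigits(input)); // # is 42
        Console.WriteLine(DigitsOnly(input));        // 4242

        // Two lines of text: only the Multiline call touches the second line.
        string lines = "1a\n2b";
        string firstOnly = MaskLeadingDigits(lines);                     // "#a\n2b"
        string everyLine = MaskLeadingDigits(lines, RegexOptions.Multiline); // "#a\n#b"
        Console.WriteLine(firstOnly);
        Console.WriteLine(everyLine);
    }
}
```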
