Title: Understand string equality testing in C#
To really understand this example, you need to know about string interning, equality testing, and operator overloading.
Interning
First, interning. To save space, .NET keeps an intern pool that holds a single instance of each distinct string literal in the program. If the program contains a string literal, that literal is placed in the intern pool. If another string literal has the same value, it refers to the same instance within the intern pool.
Note that this happens automatically only for literal values defined at design time. If the program builds a string at run time, it is not automatically placed in the intern pool because checking the pool for every new string might slow the program down.
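The following console sketch is one way to see interning in action. The class name InternDemo and the helper BuildAtRunTime are made up for this illustration. The string.Intern method is a standard .NET method that returns the pooled instance for a given value.

```csharp
using System;

static class InternDemo
{
    // Hypothetical helper: copies the characters into a brand new string
    // object at run time so the compiler cannot treat it as a literal.
    public static string BuildAtRunTime(string source)
    {
        return new string(source.ToCharArray());
    }

    static void Main()
    {
        string literal1 = "ABCDEFGHIJ";
        string literal2 = "ABCDEFGHIJ";
        string built = BuildAtRunTime(literal1);

        // Both literals refer to the single interned instance.
        Console.WriteLine(object.ReferenceEquals(literal1, literal2)); // True

        // The run-time string has the same characters but is a separate object.
        Console.WriteLine(object.ReferenceEquals(literal1, built)); // False

        // string.Intern returns the pooled instance that has the same value.
        Console.WriteLine(object.ReferenceEquals(literal1, string.Intern(built))); // True
    }
}
```

The object.ReferenceEquals method always tests reference equality, so it is a handy way to check whether two variables refer to the same string object.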
Equality Testing
Next, equality testing. There are two kinds of equality: reference equality and value equality. In reference equality, two variables are compared to see if they point to the same object. In value equality, two variables are compared to see if the things they point to have the same values.
For example, suppose you have two different Person objects that both contain the same data. They are equal using value equality because they represent the same values. They are different using reference equality because they are two separate Person objects that just happen to contain the same values.
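Here is a minimal sketch of that idea. The Person class below is hypothetical, with made-up Name and Age properties, and it overrides Equals so that two Person objects with the same data are considered equal by value.

```csharp
using System;

// Hypothetical Person class used only for illustration.
sealed class Person
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Value equality: two Person objects are equal if their data matches.
    public override bool Equals(object obj)
    {
        return obj is Person other && Name == other.Name && Age == other.Age;
    }

    // Objects that are equal by value must return the same hash code.
    public override int GetHashCode()
    {
        return (Name ?? "").GetHashCode() ^ Age.GetHashCode();
    }
}

static class PersonDemo
{
    static void Main()
    {
        Person p1 = new Person("Ann", 30);
        Person p2 = new Person("Ann", 30);
        Person p3 = p1;

        Console.WriteLine(p1.Equals(p2));                  // True: same values
        Console.WriteLine(object.ReferenceEquals(p1, p2)); // False: two separate objects
        Console.WriteLine(object.ReferenceEquals(p1, p3)); // True: same object
    }
}
```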
Operator Overloading
Finally, operator overloading. Classes can overload operators such as == so they perform some specific action. The string class overloads == to make it call the Equals method. That method compares the contents of two strings.
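The object class does not overload ==, so comparing two object variables with == tests reference equality. Operators are not virtual. The compiler decides which version of == to call from the declared types of the variables, not from the run-time types of the objects they refer to.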
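To see the same behavior outside of the string class, here is a small sketch that uses a made-up Label class. It overloads == to forward to Equals, similar to what the text describes for string. The class and its Text property are assumptions for this illustration only.

```csharp
using System;

// Hypothetical class used only for illustration.
sealed class Label
{
    public string Text { get; }

    public Label(string text)
    {
        Text = text;
    }

    // Value equality based on the Text property.
    public override bool Equals(object obj)
    {
        return obj is Label other && Text == other.Text;
    }

    public override int GetHashCode()
    {
        return (Text ?? "").GetHashCode();
    }

    // The overloaded operators forward to Equals.
    // The static object.Equals method handles null arguments safely.
    public static bool operator ==(Label a, Label b)
    {
        return object.Equals(a, b);
    }

    public static bool operator !=(Label a, Label b)
    {
        return !(a == b);
    }
}

static class LabelDemo
{
    static void Main()
    {
        Label x = new Label("hi");
        Label y = new Label("hi");
        object ox = x;
        object oy = y;

        Console.WriteLine(x == y);        // True: uses Label's overloaded ==
        Console.WriteLine(ox == oy);      // False: object's == tests references
        Console.WriteLine(ox.Equals(oy)); // True: Equals is virtual
    }
}
```

The variables ox and oy refer to the same two Label objects as x and y. Only the declared type of the variables changed, and that is enough to change which == the compiler picks.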
The Example
With that as background, you can understand the example. When the program runs, it executes the following code.
// Two equal strings created at design time.
string A = "ABCDEFGHIJ";
string B = "ABCDEFGHIJ";
bool a_eq_b = A == B;
bool a_equals_b = A.Equals(B);
txtAeqB.Text = a_eq_b.ToString();
txtAequalsB.Text = a_equals_b.ToString();
// Two equal strings created at
// design time but stored as objects.
object C = A;
object D = B;
bool c_eq_d = C == D;
bool c_equals_d = C.Equals(D);
txtCeqD.Text = c_eq_d.ToString();
txtCequalsD.Text = c_equals_d.ToString();
// Two equal strings created at run time.
string E = A.Substring(2, 4);
string F = A.Substring(2, 4);
bool e_eq_f = E == F;
bool e_equals_f = E.Equals(F);
txtEeqF.Text = e_eq_f.ToString();
txtEequalsF.Text = e_equals_f.ToString();
// Two equal strings created at
// run time but stored as objects.
object G = E;
object H = F;
bool g_eq_h = G == H;
bool g_equals_h = G.Equals(H);
txtGeqH.Text = g_eq_h.ToString();
txtGequalsH.Text = g_equals_h.ToString();
The code first creates two strings A and B. Because they are literals, they are placed in the intern pool.
Next the code tests A == B. The string class overloads == to use the Equals method, so the test A == B invokes that method, which compares the strings' contents and returns true.
When the code tests A.Equals(B), it calls the Equals method directly, so again the result is true.
Now the program creates object variables C and D, makes them refer to A and B, and then checks C == D. Other discussions I've seen of this on the internet gloss over this test but it's probably the strangest. The test C == D uses reference equality because the two variables are objects. But because A and B were interned, they refer to the same location in the intern pool. That means C and D also refer to the same location in the intern pool and therefore the reference equality test C == D returns true.
Next the program creates two strings at run time by taking a substring of the value A. Because these strings only exist at run time, they are not literals so they are not interned. They refer to different string objects that happen to contain the same values.
The test E == F uses string variables and the string class overloads == to use Equals, so that test returns true.
The test E.Equals(F) also uses Equals so it returns true.
Next the code creates two object variables G and H and sets them equal to E and F. When it tests G == H, the two variables refer to different objects. The == operator tests reference equality for object variables so this test returns false.
Finally the program tests G.Equals(H). The Equals method is virtual, so the code calls the string object's version of the method even though the variable G is a non-specific object. The string version of the method compares the string values so it returns true.
Yes, it's complicated. If you don't see why each of the tests returns what it does, you should read the explanation again.
The Moral
So what's the moral of the story? Normally if you treat strings like strings, you don't need to worry about this. You can use == to keep your code easier to read.
However, if you save a string value in an object variable, you need to use Equals to test equality. Some people use Equals all of the time so they don't need to worry about the difference. That seems a bit silly because a typical program will use string variables a lot but will rarely store a string in an object variable. You also can't test A.Equals(B) if A is null because the call throws a NullReferenceException, while A == B handles null values safely.
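If you do want to use Equals and one of the values might be null, one option is the static string.Equals or object.Equals method, both of which accept null arguments. The sketch below assumes a console program, and the class and helper names are made up for this illustration.

```csharp
using System;

static class NullSafeDemo
{
    // Hypothetical helper: reports whether calling a.Equals(b) throws.
    public static bool InstanceEqualsThrows(string a, string b)
    {
        try
        {
            a.Equals(b);
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        string a = null;
        string b = "ABCD";

        // These three tests all accept null values.
        Console.WriteLine(a == b);              // False
        Console.WriteLine(string.Equals(a, b)); // False
        Console.WriteLine(object.Equals(a, b)); // False

        // Calling an instance method on a null reference throws.
        Console.WriteLine(InstanceEqualsThrows(a, b)); // True
    }
}
```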
One place where using Equals makes more sense is a generic method that might need to work with strings, as in the following code.
private void Test<T>(T a, T b) where T : class
{
    if (a == b)
    {
        ...
    }
}
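The constraint where T : class means the type T must be a reference type, and string is a reference type. Inside the method, the compiler knows only that a and b are some kind of reference type, so it cannot use the string class's overloaded == operator and the test a == b uses reference equality.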
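If you pass string literals into this method, it works as expected because equal literals refer to the same interned object. However, if you pass strings created at run time into the method, it will not notice that the strings have the same value because they are separate objects.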
As is usually the case, you can avoid the problem most of the time if you use common sense. Use strings when you can and don't make generic methods overly general. If you need to pass objects into a generic method as in this case, use Equals instead of ==. Hopefully if you do run into this weird situation, you'll remember this post and be able to figure out what's happening.
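The following sketch compares the two approaches in a generic method. The method names SameByOperator and SameByEquals are made up for this illustration, and calling the static object.Equals method is just one reasonable way to use Equals here.

```csharp
using System;

static class GenericDemo
{
    // With only the class constraint, == compares references.
    public static bool SameByOperator<T>(T a, T b) where T : class
    {
        return a == b;
    }

    // The static object.Equals method handles nulls and then
    // calls the virtual Equals method, so strings compare by value.
    public static bool SameByEquals<T>(T a, T b) where T : class
    {
        return object.Equals(a, b);
    }

    static void Main()
    {
        string literal1 = "ABCD";
        string literal2 = "ABCD";

        // Built at run time, so these are two separate string objects.
        string built1 = new string("ABCD".ToCharArray());
        string built2 = new string("ABCD".ToCharArray());

        Console.WriteLine(SameByOperator(literal1, literal2)); // True: same interned object
        Console.WriteLine(SameByOperator(built1, built2));     // False: different objects
        Console.WriteLine(SameByEquals(built1, built2));       // True: same value
    }
}
```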
Download the example to experiment with it and to see additional details.