Java String Comparisons
Today I thought I would take some time to outline basic String comparison in Java. Comparing Strings is a common source of confusion for newcomers to the language. Take, for example, the following piece of code, which compares two String literals and two String objects created with the new operator. Let’s look at how this actually works and what makes it confusing.
```java
public class Test {
    public static void main(String[] args) {
        String a = "abc";             // String literal
        String b = "abc";             // the same literal again
        String c = new String("abc"); // String object created with new
        String d = new String("abc"); // another String object created with new
        System.out.println(a == b);   // true
        System.out.println(c == d);   // false
    }
}
```
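First, let’s look at the bytecode the compiler generates for a String literal (you can view it by running javap -c Test on the compiled class). It’s fairly straightforward for anyone who has seen assembly before: ldc loads a reference to the String “abc” from the class’s constant pool onto the operand stack, and astore_1 stores that reference in local variable 1, which is a.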
Instructions used to generate a String literal:
```
0: ldc #2; //String abc
2: astore_1
```
Creating a String with the new operator, on the other hand, takes more instructions. The new instruction allocates an uninitialized String object and pushes a reference to it. dup duplicates that reference, and ldc pushes the pooled “abc” literal as the argument. invokespecial then calls the constructor (&lt;init&gt;) on the new object, consuming one copy of the reference. Finally, astore_3 stores the remaining reference in local variable 3, which is c. In Java terms we are simply creating a new object and invoking its constructor with the argument “abc”.
Instructions used to generate new String(“abc”):
```
6: new #3; //class java/lang/String
9: dup
10: ldc #2; //String abc
12: invokespecial #4; //Method java/lang/String."<init>":(Ljava/lang/String;)V
15: astore_3
```
So the question remains: why does a == b return true while c == d returns false? First, let’s look at the == operator itself. When applied to objects such as Strings, == compares references. If two references point to the same object in memory, the comparison returns true. Otherwise it returns false, even if both Strings contain exactly the same characters.
In other words, == never looks at the char data. So if == only compares references, how can a == b be true?
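To see that == depends only on identity, here is a small sketch. The class name ReferenceEquality is illustrative. Two variables holding the same reference compare equal under ==, even when the object was created with new. Two separately allocated objects with identical characters do not.

```java
public class ReferenceEquality {
    public static void main(String[] args) {
        String c = new String("abc"); // a new String object
        String alias = c;             // alias holds the very same reference as c
        String d = new String("abc"); // a second, separate object with the same characters

        System.out.println(c == alias); // true: both variables point to one object
        System.out.println(c == d);     // false: two distinct objects
    }
}
```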
The answer lies in the way Java handles literals. The Java platform maintains an internal pool of string literals and compile-time constants, and each distinct character sequence appears in that pool exactly once.
Every String literal or constant with the same characters therefore resolves to the same pooled object, so comparing them with == always returns true. We could write the literal “abc” a million times, and every occurrence would refer to the single pooled String “abc”.
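Pooling also covers compile-time constant expressions. The following is a minimal sketch with illustrative class and variable names. Concatenating two literals, or concatenating a final String constant with a literal, is folded by the compiler into the pooled “abc”. Concatenation performed at run time on a non-final variable produces a new object instead.

```java
public class LiteralPool {
    // A constant variable: final, String-typed, initialized with a constant expression.
    static final String PREFIX = "ab";

    public static void main(String[] args) {
        String a = "abc";
        String b = "abc";
        String folded = "ab" + "c";         // constant expression, folded by the compiler
        String fromConstant = PREFIX + "c"; // also a constant expression
        String part = "ab";                 // not final, so not a constant variable
        String runtime = part + "c";        // concatenated at run time into a new String

        System.out.println(a == b);            // true: same pooled literal
        System.out.println(a == folded);       // true: folded constant is pooled
        System.out.println(a == fromConstant); // true: folded constant is pooled
        System.out.println(a == runtime);      // false: new object built at run time
    }
}
```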
c and d, on the other hand, are created with the new operator, which always allocates a fresh String object on the heap instead of returning the pooled instance. The literal “abc” passed to the constructor is still pooled; new simply creates a copy of it. Each object has its own location in memory and therefore its own reference.
As a result, comparing c and d with == will always return false.
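The following short sketch contrasts the two cases. The class name is illustrative. It also uses the standard String.intern() method, which returns the pooled instance with the same characters. Interning c therefore gives back the very object that a refers to.

```java
public class NewVersusPool {
    public static void main(String[] args) {
        String a = "abc";             // pooled literal
        String c = new String("abc"); // new always allocates a fresh object
        String d = new String("abc");

        System.out.println(c == d);          // false: two distinct objects
        System.out.println(c == a);          // false: c is not the pooled instance
        System.out.println(c.intern() == a); // true: intern() returns the pooled "abc"
    }
}
```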
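To compare the characters stored in two Strings instead of their references, we use one of two methods. The equals method returns true when both Strings contain the same sequence of characters. The compareTo method returns 0 when they are equal, or a negative or positive value that reflects their lexicographic order.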
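Here is a minimal sketch of value-based comparison, using an illustrative class name. It also shows equalsIgnoreCase and java.util.Objects.equals. These are standard library methods not covered above. Objects.equals handles a null argument safely instead of throwing a NullPointerException.

```java
import java.util.Objects;

public class ValueComparison {
    public static void main(String[] args) {
        String a = "abc";
        String c = new String("abc");
        String other = "abd";
        String missing = null;

        System.out.println(a == c);                    // false: different references
        System.out.println(a.equals(c));               // true: same characters
        System.out.println(a.compareTo(c));            // 0: equal strings
        System.out.println(a.compareTo(other) < 0);    // true: "abc" sorts before "abd"
        System.out.println("ABC".equalsIgnoreCase(a)); // true: case-insensitive match
        System.out.println(Objects.equals(missing, a)); // false, no NullPointerException
    }
}
```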
I hope this clears up some of the mystery behind how Java handles Strings and improves your understanding of String comparison. Next time we’ll take a closer look at natural-language text comparison in Java.