Misunderstandings of HashSet (hopefully) put to rest

I continue to see misunderstandings of what a set is. A set is a collection that stores a single set of values. You can have “1,2,3,4,5” but not “1,1,2,3” because in the latter, there are multiple “1”s. If you have any implementation of set (HashSet, TreeSet) and you try to add a value that is already in that set, you will end up with the same thing you had before (i.e. you have “1,2,3” in a set, you try to add 1, you still only have “1,2,3”).

Today on StackOverflow I got into a discussion with another developer who insisted that since a HashMap is the class that is actually backing a HashSet, that you could store multiple values (even the same instance) in it simultaneously. I have also seen misunderstandings of sets professionally. One scenario was when I found some code that had to sort some search results for a schedule of classes (in the sense of classroom and teacher). Apparently the developer who wrote this sort method decided the easy way to do this was to take the Map currently storing the results, put its values in a TreeSet (because when you add items to a TreeSet, it sorts them as it goes), then read them back out. Only problem is, there were duplicate “values” in these search results, because they were for the same class given on the same day, but at different times. Thus, when the sort method was ran, any extra results for the same class later on the same day were discarded. Yikes! The simple answer, and the real easy way to do the sort was to call Collections.sort() on the map.

To illustrate usage of a HashSet, I whipped up this code example. You should run it, view the results, and go over this code to see what is happening if you want to understand HashSet.

A lot of developers get hung up on either what the equals() method does or what the hashCode() method does, which is not as important as what both do in conjunction with each other. As this example illustrates, the simple way to determine if an object will be added to aHashSet is to answer the question, will equals between my object and all existing objects in the HashSet return false AND will hashCode be unique? If the answer is yes to both questions then that object will be added to (become an additional object in) the HashSet. A good review of the contract for implementing hashCode and equals is available.

The bottom line is that you cannot store two objects with the same hashCode value in the HashSet.

Bonus:

The question on Stack Overflow that prompted this post was made with an incorrectly implemented equals method and a test case very similar to the first part (above the dashed line) of this code sample. In the equals method of this object, it basically always returned false. Even if two objects were equal, they would claim to not be equal. So then the test code attempted to add two of the same instance of this object to the HashSet. Surprisingly, only one instance was stored. The reason is that even though the two instances were “not equal” according to their equals method, the internal code for HashMap (which does back HashSet) tested for reference equality, and, finding that the objects had the same reference, proceeded along without storing the duplicate.

Leave a Reply

Your email address will not be published. Required fields are marked *