Friday, April 2, 2010

TreeSet Gotcha

Sometimes API design mistakes can make you go d'oh! In the back of my mind I knew about the weird workings of Java's TreeSet, but today I was bitten by the consequences anyway.

The basic problem is that TreeSet implements a Set in terms of compare() or compareTo(), instead of the usual equals() method (as documented in the TreeSet JavaDoc). In strict terms this makes TreeSet violate the Set interface with regard to the Liskov substitution principle.

A quick example to illustrate the problem. Suppose you have a class holding two pieces of information. You want the objects of the class to have a natural order based on just a single piece of information, so you implement Comparable like this:
public class Foo implements Comparable<Foo> {
 
 private Object data;
 private int num;

 public Foo(Object data, int num) {
  this.data = data;
  this.num = num;
 }
 
 @Override
 public boolean equals(Object obj) {
  if (obj instanceof Foo) {
   Foo other = (Foo) obj;
   return this.data.equals(other.data) && this.num == other.num;
  }
  return false;
 }
 
 @Override
 public int hashCode() {
  return data.hashCode() + num;
 }

 @Override
 public int compareTo(Foo other) {
  return this.num - other.num;
 }
}
It's natural to assume that compareTo() only influences the order of the objects, but does not influence their equality. However, TreeSet uses compareTo() to test for object equality, which means the following test fails unexpectedly:
@Test
public void usageInASetWorksAsExpected() {
 Set<Foo> set = new TreeSet<Foo>();
 set.add(new Foo("bar", 1));
 set.add(new Foo("baz", 1));
 set.add(new Foo("bar", 1));
 set.add(new Foo("bar", 2));
 Assert.assertEquals(3, set.size()); // actual set size will be 2!
}
D'oh!