Saturday, 16 February 2013

SortedSet + Joda DateTime == danger

It's been quite a long time since I wrote something on this blog... Two things occurred that made me do this.
Firstly, I'm going to speak at Java Developer's Conference in Cairo and at Booster conference in Bergen next month, so I want to have some content to link to from my slides ;)
Secondly, last week I ran into a really weird situation. In fact, it was an endless loop.
Yep.
It was in a rather critical place of our app, and it happened on a semi-production environment, so it was quite embarrassing. What's more, the code had been working before, it had been untouched for about half a year, and it had pretty good test coverage. It looked more or less like this (I've left some stuff out, so now it looks too complex for its task):

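import org.joda.time.DateTime
import scala.collection.immutable.SortedSet

// `date` comes from the code left out of this excerpt; it is compared against milliseconds.
def findDates(dates: SortedSet[DateTime], a: List[DateTime]): (SortedSet[DateTime], List[DateTime]) =
  if (dates.isEmpty || dates.head.getMillis < date) {
    (dates, a)
  } else {
    // Move the earliest date from the set to the end of the accumulator.
    findDates(dates - dates.head, a :+ dates.head)
  }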

Just a simple tail recursion, so how can it loop endlessly? It turns out it can. For some specific data, dates - dates.head == dates, so the set never shrinks and the recursion never reaches its base case.
Why? The reason is that DateTime's natural ordering (compareTo) is not consistent with equals. If you look at the Comparable javadoc, it says:
It is strongly recommended (though not required) that natural orderings be consistent with equals. This is so because sorted sets (and sorted maps) without explicit comparators behave "strangely" when they are used with elements (or keys) whose natural ordering is inconsistent with equals. In particular, such a sorted set (or sorted map) violates the general contract for set (or map), which is defined in terms of the equals method.
What does this mean? That you should only use sorted collections for classes whose ordering agrees with equality: a.compareTo(b) == 0 exactly when a.equals(b) == true.
And in Joda's DateTime javadoc you can read:
Compares this object with the specified object for ascending millisecond instant order. This ordering is inconsistent with equals, as it ignores the Chronology.
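To make that warning concrete, here is a minimal sketch (not taken from our codebase) of two DateTime values that compare as equal but are not equals. It assumes joda-time is on the classpath. The names utc, offset and byInstant are made up for illustration, and the explicit Ordering is my assumption about how a DateTime ends up in a Scala sorted set.

```scala
import org.joda.time.{DateTime, DateTimeZone}
import scala.collection.immutable.TreeSet

// The same instant (epoch millisecond 0) seen in two different time zones.
val utc    = new DateTime(0L, DateTimeZone.UTC)
val offset = new DateTime(0L, DateTimeZone.forOffsetHours(1))

// compareTo looks only at the millisecond instant...
val sameOrder = utc.compareTo(offset) == 0   // true
// ...while equals also compares the Chronology, which includes the zone.
val sameValue = utc == offset                // false

// Assumption: an explicit Ordering that mirrors DateTime.compareTo.
val byInstant: Ordering[DateTime] = Ordering.by((d: DateTime) => d.getMillis)

// A sorted set deduplicates by the ordering, a plain set by equals/hashCode.
val sortedSize = TreeSet(utc, offset)(byInstant).size   // 1
val plainSize  = Set(utc, offset).size                  // 2
```

The two collections disagree about whether these are one element or two, which is exactly the "strange" behaviour the Comparable javadoc warns about. This sketch only shows the inconsistency; it does not reproduce the endless loop itself.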
And it turns out that this was our case: in our data there were dates that were equal with respect to milliseconds, but in different time zones. What's more, not every pair of such dates leads to disaster. They have to cause some mess in the underlying red-black tree... The solution was to introduce a wrapper (we actually used one anyway) that defines comparison consistent with equality...
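I'm not showing our actual wrapper here, but one possible shape of such a wrapper could look like the sketch below. InstantKey is a hypothetical name, and treating two dates as equal whenever their milliseconds match is an assumption; you could just as well make the comparison stricter instead of making equality looser. It assumes joda-time is on the classpath.

```scala
import org.joda.time.{DateTime, DateTimeZone}
import scala.collection.immutable.TreeSet

// Hypothetical wrapper: ordering, equals and hashCode all look only at the instant,
// so compare(a, b) == 0 exactly when a == b.
final class InstantKey(val dateTime: DateTime) extends Ordered[InstantKey] {
  def compare(that: InstantKey): Int =
    java.lang.Long.compare(dateTime.getMillis, that.dateTime.getMillis)

  override def equals(other: Any): Boolean = other match {
    case that: InstantKey => dateTime.getMillis == that.dateTime.getMillis
    case _                => false
  }

  override def hashCode: Int = dateTime.getMillis.hashCode
}

// The same instant in two zones now counts as one element everywhere.
val k1 = new InstantKey(new DateTime(0L, DateTimeZone.UTC))
val k2 = new InstantKey(new DateTime(0L, DateTimeZone.forOffsetHours(1)))

val consistent = (k1.compare(k2) == 0) == (k1 == k2)   // true
val sortedKeys = TreeSet(k1, k2).size                  // 1
val plainKeys  = Set(k1, k2).size                      // 1
```

With a key like this, sorted and unsorted collections agree on what counts as "the same element", so removing the head of a sorted set behaves the way you would expect.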