Thursday, October 25, 2012

More mud: SimpleDateFormat and the Gregorian calendar

I spent another few hours wading through mud at work the other day: SimpleDateFormat-related mud this time around. You would assume "yyyy/MM/dd HH:mm z" to be a pretty solid date format for representing a timestamp with minute precision:
  • yyyy/MM/dd: the date, e.g. "2012/10/23"
  • HH:mm: the time in 24-hour format, e.g. "18:32"
  • z: the timezone, e.g. "CET"
This date format can be used to store dates in a text file (or any other textual format for that matter, e.g. XML). Let's go one step further and always use UTC as the time zone for the dates stored in our text file. That avoids all confusion when parsing the file again: all dates are expressed in UTC. Here's a bit of code that does what we need: it takes an input Date object, formats it as a UTC date string, parses that string again, and verifies that the input and output are the same.
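import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Place the statements below in a method that declares "throws ParseException".
Date input = new Date(-62135773200000L); // "0001/01/01 00:00 CET"

// Use one UTC-based formatter for both formatting and parsing
SimpleDateFormat utc = new SimpleDateFormat("yyyy/MM/dd HH:mm z");
utc.setTimeZone(TimeZone.getTimeZone("UTC"));

String str = utc.format(input);

Date output = utc.parse(str);

if (input.equals(output)) {
    System.out.println("Equal!");
} else {
    System.out.println(input + " != " + output);
}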
For most dates this would print Equal!. However, I used a special date in the code above: midnight on January 1st of the year 1 expressed in CET. The output on my machine is:
Sat Jan 01 00:00:00 CET 1 != Sun Jan 01 00:00:00 CET 2
What!? Year 1 became 2? Let's look at the string produced by the date formatting:
0001/12/31 23:00 UTC
The time becomes 23:00 UTC because CET is one hour ahead of UTC. To understand why the date jumped to December 31st of what looks like the year 1, you have to realize that the Gregorian calendar used by SimpleDateFormat does not have a year 0. This means the timeline looks like this (BC is Before Christ, AD is Anno Domini, the era indicator in SimpleDateFormat terms):
..., 2 BC, 1 BC, 1 AD, 2 AD, ...

Our input, January 1st of the year 1, is in the AD era. Moving from midnight CET to 23:00 UTC lands us on the previous day: December 31st of the year 1 BC. However, our date format does not encode the era (BC / AD), so when the string is parsed, year 1 is assumed to be AD. After the UTC to CET adjustment, that leads to the rather surprising result that year 1 AD becomes year 2 AD.
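
To see the era flip for yourself, you can ask a GregorianCalendar for the year and era of the same instant in two time zones. This is only a small sketch: the class name EraCheck and the helper describe are made up for illustration, and the fixed-offset zone "GMT+01:00" is an assumption that stands in for CET so that the result does not depend on the time zone data installed on your machine.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

class EraCheck {
    // Returns the year and era ("1 BC", "1 AD", ...) of an instant
    // as seen in the given time zone.
    static String describe(long epochMillis, String zoneId) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        cal.setTimeInMillis(epochMillis);
        String era = cal.get(Calendar.ERA) == GregorianCalendar.BC ? "BC" : "AD";
        return cal.get(Calendar.YEAR) + " " + era;
    }

    public static void main(String[] args) {
        long instant = -62135773200000L; // the input date from the example above

        // One hour ahead of UTC: midnight on January 1st
        System.out.println("GMT+01:00: " + describe(instant, "GMT+01:00")); // GMT+01:00: 1 AD

        // In UTC the same instant falls on the previous day, in the previous era
        System.out.println("UTC: " + describe(instant, "UTC")); // UTC: 1 BC
    }
}
```

Both lines report year 1; only the era tells them apart, and the era is exactly what the date format pattern leaves out.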

Luckily fixing the problem is easier than understanding it! You simply need to add the era indicator to the date format pattern: "yyyy/MM/dd HH:mm z G", and the above code will work as expected.
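
As a quick check, here is a sketch that runs the same round trip with both patterns. The class name EraRoundTrip and the helper survivesRoundTrip are hypothetical names of my own, and I pin the locale to Locale.ENGLISH (an assumption, not something the original code did) so that the era is always written as "BC" or "AD".

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class EraRoundTrip {
    // Formats the date in UTC with the given pattern, parses the result back,
    // and reports whether the original instant survived the round trip.
    static boolean survivesRoundTrip(Date input, String pattern) {
        SimpleDateFormat utc = new SimpleDateFormat(pattern, Locale.ENGLISH);
        utc.setTimeZone(TimeZone.getTimeZone("UTC"));
        String str = utc.format(input);
        try {
            return input.equals(utc.parse(str));
        } catch (ParseException e) {
            // Not expected here: we parse a string this formatter just produced
            throw new IllegalStateException("Could not parse " + str, e);
        }
    }

    public static void main(String[] args) {
        Date input = new Date(-62135773200000L); // "0001/01/01 00:00 CET"

        // Without the era, 1 BC is read back as 1 AD
        System.out.println(survivesRoundTrip(input, "yyyy/MM/dd HH:mm z"));   // false

        // With the era indicator G, the string carries "BC" and the instant survives
        System.out.println(survivesRoundTrip(input, "yyyy/MM/dd HH:mm z G")); // true
    }
}
```

Keep in mind that adding G changes the stored text, so files written with the old pattern still need to be read with the old pattern.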

PS: This problem popped up during an XStream version upgrade. Check XSTR-556 and XSTR-711 in the XStream JIRA for more information.