Erwin Vervaet

Thursday, May 13, 2010

Python default argument madness

I've been learning Python recently. I'm just getting started, so I'm running into all the stupid beginner problems, for instance the insane behaviour of default argument values. Take this example:

from datetime import datetime
import time

class Weird:

    def __init__(self, dt=datetime.now()):  # default is evaluated once, when the def runs
        self.dt = dt

w1 = Weird()
time.sleep(1)
w2 = Weird()
w3 = Weird(datetime.now())

print(w1.dt == w2.dt)  # True
print(w1.dt == w3.dt)  # False

The datetime in the w1 and w2 objects will actually be the same, since the default argument value is evaluated only once, when the method definition is executed, and not on each call! What!?!? For any default that should be computed at call time, this essentially renders default arguments useless, since you always end up writing the following:

class Weird:

    def __init__(self, dt=None):
        self.dt = dt or datetime.now()

I don't understand why you would design a default argument value feature in your language like this. Doing it this way certainly violates the principle of least astonishment!

Saturday, May 1, 2010

What is good code?

As a project tech lead I do a lot of code review, so the "What is good code?" question comes up a lot. Of course there are many things that factor into the "good code" equation such as efficiency, correctness and elegance. For me the most important property of good code is that it should adhere to the principle of least astonishment. In other words: good code is code that does what you expect it to do in the way you expect it to be done.

Following this principle has important benefits when it comes to reading, understanding and maintaining a piece of code. It also typically improves the correctness of the code: you tend to get code that obviously has no deficiencies, instead of no obvious deficiencies (as Tony Hoare said).

I was happy to learn that I'm in good company when it comes to associating the principle of least astonishment with good code. Both Uncle Bob (in his excellent Clean Code book) and Peter Seibel (in his very interesting Coders at Work) ask several famous programmers about good code. A lot of the interviewees directly or indirectly mention the principle of least astonishment (for instance Ward Cunningham, Joe Armstrong and Simon Peyton Jones).

Friday, April 2, 2010

TreeSet Gotcha

Sometimes API design mistakes can make you go d'oh! In the back of my mind I knew about the weird workings of Java's TreeSet, but today I was bitten by the consequences anyway.

The basic problem is that TreeSet implements a Set in terms of compare() or compareTo(), instead of the usual equals() method (as documented in the TreeSet JavaDoc). In strict terms this makes TreeSet violate the contract of the Set interface, and with it the Liskov substitution principle, whenever the ordering is not consistent with equals().

A quick example to illustrate the problem. Suppose you have a class holding two pieces of information. You want the objects of the class to have a natural order based on just a single piece of information, so you implement Comparable like this:

public class Foo implements Comparable<Foo> {
 
 private Object data;
 private int num;

 public Foo(Object data, int num) {
  this.data = data;
  this.num = num;
 }
 
 @Override
 public boolean equals(Object obj) {
  if (obj instanceof Foo) {
   Foo other = (Foo) obj;
   return this.data.equals(other.data) && this.num == other.num;
  }
  return false;
 }
 
 @Override
 public int hashCode() {
  return data.hashCode() + num;
 }

 @Override
 public int compareTo(Foo other) {
  return Integer.compare(this.num, other.num); // avoids the int overflow that this.num - other.num can hit
 }
}

It's natural to assume that compareTo() only influences the order of the objects, but does not influence their equality. However, TreeSet uses compareTo() to test for object equality, which means the following test fails unexpectedly:

@Test
public void usageInASetWorksAsExpected() {
 Set<Foo> set = new TreeSet<Foo>();
 set.add(new Foo("bar", 1));
 set.add(new Foo("baz", 1));
 set.add(new Foo("bar", 1));
 set.add(new Foo("bar", 2));
 Assert.assertEquals(3, set.size()); // actual set size will be 2!
}

D'oh!
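One way out, as a minimal sketch: make compareTo() consistent with equals() by breaking ties on the second field. This assumes data is a String (the original Foo holds a plain Object, which has no natural order), and the ConsistentFooDemo and distinctCount() names are made up for the illustration.

```java
import java.util.Set;
import java.util.TreeSet;

class ConsistentFooDemo {

    // Hypothetical variant of Foo: data is a String so it can serve as a tie-breaker.
    static final class Foo implements Comparable<Foo> {

        private final String data;
        private final int num;

        Foo(String data, int num) {
            this.data = data;
            this.num = num;
        }

        @Override
        public boolean equals(Object obj) {
            if (obj instanceof Foo) {
                Foo other = (Foo) obj;
                return this.data.equals(other.data) && this.num == other.num;
            }
            return false;
        }

        @Override
        public int hashCode() {
            return data.hashCode() + num;
        }

        @Override
        public int compareTo(Foo other) {
            // Order by num first; fall back to data so that
            // compareTo() returns 0 only when equals() returns true.
            int byNum = Integer.compare(this.num, other.num);
            return byNum != 0 ? byNum : this.data.compareTo(other.data);
        }
    }

    // Adds the same four objects as the failing test and reports the set size.
    static int distinctCount() {
        Set<Foo> set = new TreeSet<Foo>();
        set.add(new Foo("bar", 1));
        set.add(new Foo("baz", 1));
        set.add(new Foo("bar", 1)); // true duplicate, dropped
        set.add(new Foo("bar", 2));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount()); // prints 3
    }
}
```

With this ordering the TreeSet keeps ("bar", 1), ("baz", 1) and ("bar", 2) apart and only drops the real duplicate. If the order really has to depend on num alone, the alternative is to keep the objects in a HashSet and sort them explicitly when needed.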

Tuesday, March 30, 2010

Mac Fanboys in Java Land

Could somebody explain to me why lots of Java (and Ruby, Python, ...) programmers seem to think Macs are the best thing since sliced bread? I'll be the first to admit that Apple hardware is excellent and that they have a UI flair that is unrivaled by any other company in the IT industry today.

However, as a programmer I don't care too much about dumbing it all down to a level where my grandmother can use it. I tried the Mac thing and sure it's great for some casual usage, but a number of things just annoy me when I try to get real work done:

First of all: why does Apple treat Java as a second-class citizen?
Why can't I resize my windows on the left side?
Why can't Apple just use a standard PC AZERTY keyboard layout (used in Belgium where I live)? If they did, I would at least be able to find my curly braces and square brackets, and pipe symbol for that matter!
Why does OS X need to deviate from pretty much every common Unix convention like the FHS, making things like the hosts file end up in /private/etc/hosts?
Why is it that I need to copy the entire Eclipse installation just to be able to run two workspaces at the same time?
Why is Apple incapable of putting a standard DVI or VGA adaptor on a MacBook?

And the list goes on! (For those wondering: I currently run Ubuntu on a Dell XPS laptop: no installation or configuration hassles, and usage freedom to boot!)

Friday, March 5, 2010

MQ's MQRFH2 Header and JMS

I've known about the MQRFH2 header used by WebSphere MQ for a long time. I was also aware of the problems this header can cause when exchanging messages between JMS and non-JMS based clients, and how to suppress use of the header by using targetClient=1 (MQJMS_CLIENT_NONJMS_MQ).

What I did not know before today, however, is that if a JMS application receives a message not containing the MQRFH2 header, some basic JMS message properties like JMSDestination will not be available! This makes some sense since WebSphere MQ uses the MQRFH2 header to carry JMS-specific data associated with the message. If you omit MQRFH2, only the properties that fit in the default MQMD header will be transmitted; the rest will be lost. (The WebSphere MQ documentation covers this in some detail.)

Still, for something basic like JMSDestination this came as a bit of a surprise since this property is set by the JMS service provider when sending the message and you expect it to be available when receiving that message, for instance to be able to do failover processing and send the message back to its original destination at some later time.

Saturday, February 13, 2010

Source Code Management Friendly Design

Good software design, especially proper modularization, brings many benefits such as single points of change, encapsulation and a reduced event horizon, just to name a few. This is clearly a Good Thing at the source code level, but it also has important source code management (SCM) implications. Let's look at an example.

Suppose your code needs to process some data in several different ways. When I say processing, think of things like manipulating the data, reacting to it, and so on. The simplest thing that could possibly work would be something like this:

public class DataProcessingEngine {

 public void process(Data data) {
  processOneWay(data);
  processAnotherWay(data);
 }
 
 private void processOneWay(Data data) {
  // ...
 }
 
 private void processAnotherWay(Data data) {
  // ...
 }
}

This code has many issues. Purely at the source code level, we have a class that has too many responsibilities. At the SCM level, this class could become a merging nightmare. Imagine multiple development branches each adding new ways to process the data. Merging these branches back onto the trunk will almost certainly result in merge conflicts.

Of course we can do a lot better. Let's factor out the different ways to process the data into separate DataProcessors:

public interface DataProcessor {
 void process(Data data);
}

Each different way of processing the data would have its own DataProcessor implementation. The DataProcessingEngine now becomes:

public class DataProcessingEngine {
 
 private List<DataProcessor> dataProcessors;
 
 public DataProcessingEngine(List<DataProcessor> dataProcessors) {
  this.dataProcessors = dataProcessors;
 }

 public void process(Data data) {
  for (DataProcessor dataProcessor : dataProcessors) {
   dataProcessor.process(data);
  }
 }
}

Of course we still need to configure the DataProcessingEngine with the appropriate DataProcessors somewhere:

public class DataProcessingEngineFactory {

 public static DataProcessingEngine create() {
  List<DataProcessor> dataProcessors = new ArrayList<DataProcessor>();
  dataProcessors.add(new OneWayDataProcessor());
  dataProcessors.add(new AnotherWayDataProcessor());
  return new DataProcessingEngine(dataProcessors);
 }
}

You could also use something like Spring to do this for you. In this case you would end up with bean definitions equivalent to the code above. At the code level this refactoring has pretty much solved the problem. We now have a few small classes, each with its own responsibility. However, at the SCM level, the DataProcessingEngineFactory class (or equivalent alternative configuration) still sits in a single file that every branch has to touch, so the merge conflicts remain.

To solve this problem, we have to make the system a bit more dynamic. If the DataProcessingEngineFactory could automagically detect all available DataProcessor implementations, adding a new way of processing the data would be as simple as adding a new DataProcessor to the classpath. As a result, there would be no need to change the DataProcessingEngineFactory every time, and no more merge conflicts!

In Java you have quite a few options to implement such a dynamic discovery mechanism:

The service provider framework can automatically detect service implementations defined in META-INF/services.
Spring's ResourcePatternResolver can be used for this kind of thing.
Maybe you can even use the annotation processing tool to do something like this?

Using the Java 6 ServiceLoader, the DataProcessingEngineFactory would end up looking something like this:

public class DataProcessingEngineFactory {

 public static DataProcessingEngine create() {
  List<DataProcessor> dataProcessors = new ArrayList<DataProcessor>();
  for (DataProcessor dataProcessor : ServiceLoader.load(DataProcessor.class)) {
   dataProcessors.add(dataProcessor);
  }
  return new DataProcessingEngine(dataProcessors);
 }
}
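For the ServiceLoader to find anything, each implementation has to be listed in a provider-configuration file under META-INF/services, named after the fully qualified name of the interface. The following is only a sketch of that mechanism in a single file, under a few assumptions: the ServiceLoaderDemo class and its discoverAndRun() method are hypothetical, process() takes and returns a String purely so the demo has something to show, and the configuration file is written to a temporary directory at run time, whereas in a real project it would ship inside the jar of each DataProcessor implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLoaderDemo {

    // Simplified stand-in for the DataProcessor interface (assumption: String in, String out).
    public interface DataProcessor {
        String process(String data);
    }

    // Providers must be public and have a public no-argument constructor.
    public static class OneWayDataProcessor implements DataProcessor {
        @Override
        public String process(String data) {
            return "one way: " + data;
        }
    }

    public static class AnotherWayDataProcessor implements DataProcessor {
        @Override
        public String process(String data) {
            return "another way: " + data;
        }
    }

    // Writes a provider-configuration file, then lets ServiceLoader discover the providers.
    static List<String> discoverAndRun(String data) {
        try {
            Path root = Files.createTempDirectory("providers");
            Path services = Files.createDirectories(root.resolve("META-INF").resolve("services"));

            // File name = binary name of the service interface,
            // content = one provider class name per line.
            String config = OneWayDataProcessor.class.getName() + "\n"
                    + AnotherWayDataProcessor.class.getName() + "\n";
            Files.write(services.resolve(DataProcessor.class.getName()),
                    config.getBytes(StandardCharsets.UTF_8));

            List<String> results = new ArrayList<String>();
            // Put the temporary directory on a class path of its own so the file is visible.
            try (URLClassLoader loader = new URLClassLoader(
                    new URL[] { root.toUri().toURL() },
                    ServiceLoaderDemo.class.getClassLoader())) {
                for (DataProcessor processor : ServiceLoader.load(DataProcessor.class, loader)) {
                    results.add(processor.process(data));
                }
            }
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(discoverAndRun("x"));
    }
}
```

The point for source code management is that each new DataProcessor brings its own small configuration file in its own jar, so no shared file has to be edited when a branch adds a new way of processing the data.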

It's interesting to note that annotation based configuration systems, which are all the rage, typically don't hold all configuration information in a single location, which helps ease your source code management as I've shown. XML based configuration is more problematic in this respect.

Tuesday, February 2, 2010

OSGi Dependency Resolution is NP-Complete

Apparently, the resolution problem in OSGi is NP-complete, as are similar dependency resolution problems like apt package installation on Debian Linux systems. If you think about it, it makes sense, but who would have thought!

I still remember learning about P and NP problems, and the famous P = NP question at university, and being really intrigued by all of it. Lance Fortnow provides an excellent overview of the P = NP problem in Communications of the ACM. It turns out that this is no longer a computer science problem, but one of the fundamental questions in all of science. Fascinating stuff! If only I could find a simple proof :-).