Saturday, February 13, 2010

Source Code Management Friendly Design

Good software design, especially proper modularization, brings many benefits such as single points of change, encapsulation and a reduced event horizon, just to name a few. This is clearly a Good Thing at the source code level, but it also has important source code management (SCM) implications. Let's look at an example.

Suppose your code needs to process some data in several different ways. By processing I mean things like manipulating the data, reacting to it, and so on. The simplest thing that could possibly work would be something like this:
public class DataProcessingEngine {

 public void process(Data data) {
  processOneWay(data);
  processAnotherWay(data);
 }
 
 private void processOneWay(Data data) {
  // ...
 }
 
 private void processAnotherWay(Data data) {
  // ...
 }
}
This code has several issues. Purely at the source code level, we have a class with too many responsibilities. At the SCM level, this class could become a merging nightmare. Imagine multiple development branches each adding new ways to process the data: every branch edits the same process method in the same file. Merging these branches back onto the trunk will almost certainly result in merge conflicts.

Of course we can do a lot better. Let's factor out the different ways to process the data into separate DataProcessors:
public interface DataProcessor {
 void process(Data data);
}
Each different way of processing the data would have its own DataProcessor implementation. The DataProcessingEngine now becomes:
import java.util.List;

public class DataProcessingEngine {
 
 private List<DataProcessor> dataProcessors;
 
 public DataProcessingEngine(List<DataProcessor> dataProcessors) {
  this.dataProcessors = dataProcessors;
 }

 public void process(Data data) {
  for (DataProcessor dataProcessor : dataProcessors) {
   dataProcessor.process(data);
  }
 }
}
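As an illustration, here is a minimal sketch of two DataProcessor implementations wired into the engine by hand. The Data class and the processor bodies are placeholders I'm assuming for the example (each processor simply records that it ran), and EngineDemo is a hypothetical driver class; only the DataProcessor interface and the engine follow the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver: wires two processors into the engine and runs them.
class EngineDemo {
  static List<String> run() {
    DataProcessingEngine engine = new DataProcessingEngine(
        Arrays.<DataProcessor>asList(new OneWayDataProcessor(), new AnotherWayDataProcessor()));
    Data data = new Data();
    engine.process(data);
    return data.log;
  }

  public static void main(String[] args) {
    System.out.println(run()); // prints [one way, another way]
  }
}

// Placeholder for the real data type (an assumption for this sketch).
class Data {
  final List<String> log = new ArrayList<String>();
}

interface DataProcessor {
  void process(Data data);
}

// Each way of processing lives in its own class, and therefore in its own file.
class OneWayDataProcessor implements DataProcessor {
  public void process(Data data) {
    data.log.add("one way");
  }
}

class AnotherWayDataProcessor implements DataProcessor {
  public void process(Data data) {
    data.log.add("another way");
  }
}

class DataProcessingEngine {
  private final List<DataProcessor> dataProcessors;

  DataProcessingEngine(List<DataProcessor> dataProcessors) {
    this.dataProcessors = dataProcessors;
  }

  // Hands the data to every configured processor, in list order.
  void process(Data data) {
    for (DataProcessor dataProcessor : dataProcessors) {
      dataProcessor.process(data);
    }
  }
}
```

A branch that adds a third way of processing would add one new class in one new file, leaving the engine itself untouched.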
Of course we still need to configure the DataProcessingEngine with the appropriate DataProcessors somewhere:
import java.util.ArrayList;
import java.util.List;

public class DataProcessingEngineFactory {

 public static DataProcessingEngine create() {
  List<DataProcessor> dataProcessors = new ArrayList<DataProcessor>();
  dataProcessors.add(new OneWayDataProcessor());
  dataProcessors.add(new AnotherWayDataProcessor());
  return new DataProcessingEngine(dataProcessors);
 }
}
You could also use something like Spring to do this for you, in which case you would end up with bean definitions equivalent to the code above. At the code level this refactoring has pretty much solved the problem: we now have a few small classes, each with its own responsibility. At the SCM level, however, the DataProcessingEngineFactory class (or the equivalent configuration) is still a single file that every branch has to edit, so the merge conflicts remain.

To solve this problem, we have to make the system a bit more dynamic. If the DataProcessingEngineFactory could automagically detect all available DataProcessor implementations, adding a new way of processing the data would be as simple as adding a new DataProcessor to the classpath. As a result, there would be no need to change the DataProcessingEngineFactory every time, and no more merge conflicts!

In Java you have quite a few options to implement such a dynamic discovery mechanism.

Using the Java 6 ServiceLoader, the DataProcessingEngineFactory would end up looking something like this:
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class DataProcessingEngineFactory {

 public static DataProcessingEngine create() {
  List<DataProcessor> dataProcessors = new ArrayList<DataProcessor>();
  for (DataProcessor dataProcessor : ServiceLoader.load(DataProcessor.class)) {
   dataProcessors.add(dataProcessor);
  }
  return new DataProcessingEngine(dataProcessors);
 }
}
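One detail this relies on: ServiceLoader only finds implementations that are listed in a provider-configuration file on the classpath, named META-INF/services/ followed by the fully qualified name of the interface. The sketch below shows that file in comments and assumes, purely for illustration, that everything lives in the default package; DiscoveryDemo and the empty Data class are hypothetical placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Assumed layout for this sketch (everything in the default package):
//
//   META-INF/services/DataProcessor      <- file named after the interface
//     OneWayDataProcessor                <- one implementation class per line
//     AnotherWayDataProcessor
//
// With a package, the file name and its entries use fully qualified names,
// e.g. META-INF/services/com.example.DataProcessor (hypothetical package).
// Each listed class must be public and have a public no-argument constructor.
class DiscoveryDemo {
  // Collects every DataProcessor registered on the classpath.
  static List<DataProcessor> discover() {
    List<DataProcessor> found = new ArrayList<DataProcessor>();
    for (DataProcessor dataProcessor : ServiceLoader.load(DataProcessor.class)) {
      found.add(dataProcessor);
    }
    return found;
  }

  public static void main(String[] args) {
    // Without any provider-configuration file this reports 0 processors:
    // a missing registration does not fail, it just finds nothing.
    System.out.println("Discovered " + discover().size() + " processor(s)");
  }
}

// Placeholder types (assumptions for this sketch).
class Data {
}

interface DataProcessor {
  void process(Data data);
}
```

From an SCM point of view this works best when each processor ships in its own module or jar with its own provider-configuration file. If all processors live in one module, that single registration file becomes the shared edit point again, although one-class-per-line additions tend to merge more easily than code.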

It's interesting to note that annotation-based configuration systems, which are all the rage, typically don't hold all configuration information in a single location, which helps ease your source code management as I've shown. XML-based configuration is more problematic in this respect.

Tuesday, February 2, 2010

OSGi Dependency Resolution is NP-Complete

Apparently, the resolution problem in OSGi is NP-complete, as are similar dependency resolution problems like apt package installation on Debian Linux systems. If you think about it, it makes sense (the resolver has to pick one mutually compatible combination out of many candidate versions), but who would have thought!
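
To get a feel for why, here is a toy model of the problem. It is my own simplification, not how OSGi or apt actually resolve anything: every bundle must be installed in exactly one version, and each version lists which versions of other bundles it accepts. The class names, bundle names and version numbers are all made up for the example. Checking a proposed combination is cheap, but the naive search below tries combinations one by one, and their number grows exponentially with the number of bundles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ResolverDemo {
  // One installable version of a bundle, plus the versions it accepts of other bundles.
  static class Candidate {
    final String bundle;
    final int version;
    final Map<String, Set<Integer>> accepted = new HashMap<String, Set<Integer>>();

    Candidate(String bundle, int version) {
      this.bundle = bundle;
      this.version = version;
    }

    Candidate needs(String other, Integer... versions) {
      accepted.put(other, new HashSet<Integer>(Arrays.asList(versions)));
      return this;
    }
  }

  // Brute force: try every combination of one version per bundle.
  static Map<String, Integer> resolve(Map<String, List<Candidate>> candidates) {
    List<String> bundles = new ArrayList<String>(candidates.keySet());
    return search(bundles, 0, candidates, new LinkedHashMap<String, Candidate>());
  }

  private static Map<String, Integer> search(List<String> bundles, int index,
      Map<String, List<Candidate>> candidates, Map<String, Candidate> chosen) {
    if (index == bundles.size()) {
      if (!consistent(chosen)) {
        return null;
      }
      Map<String, Integer> result = new LinkedHashMap<String, Integer>();
      for (Candidate c : chosen.values()) {
        result.put(c.bundle, c.version);
      }
      return result;
    }
    String bundle = bundles.get(index);
    for (Candidate c : candidates.get(bundle)) {
      chosen.put(bundle, c);
      Map<String, Integer> result = search(bundles, index + 1, candidates, chosen);
      if (result != null) {
        return result;
      }
    }
    chosen.remove(bundle); // backtrack: no version of this bundle worked
    return null;
  }

  // Verifying a complete assignment only takes one pass over the requirements.
  private static boolean consistent(Map<String, Candidate> chosen) {
    for (Candidate c : chosen.values()) {
      for (Map.Entry<String, Set<Integer>> need : c.accepted.entrySet()) {
        Candidate dependency = chosen.get(need.getKey());
        if (dependency == null || !need.getValue().contains(dependency.version)) {
          return false;
        }
      }
    }
    return true;
  }

  // Made-up example: app 1 wants log 1, but lib 2 insists on log 2.
  static Map<String, Integer> example() {
    Map<String, List<Candidate>> candidates = new LinkedHashMap<String, List<Candidate>>();
    candidates.put("app", Arrays.asList(
        new Candidate("app", 1).needs("lib", 2).needs("log", 1),
        new Candidate("app", 2).needs("lib", 2).needs("log", 2)));
    candidates.put("lib", Arrays.asList(
        new Candidate("lib", 1),
        new Candidate("lib", 2).needs("log", 2)));
    candidates.put("log", Arrays.asList(new Candidate("log", 1), new Candidate("log", 2)));
    return resolve(candidates);
  }

  public static void main(String[] args) {
    System.out.println(example());
  }
}
```

In this example the only consistent combination is version 2 of all three bundles, and the search finds it only after rejecting every combination that includes app 1. Real resolvers are far smarter than this brute force, but they are up against the same kind of search space.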

I still remember learning about P and NP problems and the famous P = NP question at university, and being really intrigued by all of it. Lance Fortnow provides an excellent overview of the P = NP problem in Communications of the ACM. It turns out that this is no longer just a computer science problem, but one of the fundamental questions in all of science. Fascinating stuff! If only I could find a simple proof :-).