Original URL: http://www.theregister.co.uk/2006/06/27/getting_together_time/

Building cohesion into programming

Time is of the essence

By Kevlin Henney

Posted in Developer, 27th June 2006 14:10 GMT

There are many reasons for getting cohesive.

The principle of locality, which is normally considered with respect to locality of reference when using resources such as memory, can also apply to the organisation of APIs and classes and the partitioning of packages, components and header files. This criterion for cohesion is based on the idea that things that are used together belong together.

Popular as the practice appears to be, it makes little sense to bundle all of your exception classes into a single package named exceptions or all of your compile-time constants in a single header file named "constants.h".

The cohesion is coincidental and doesn't reflect how the code is used or what it means. If using a particular class can result in a specific exception, why is the exception not defined close to the class? If you need a particular constant, such as a service name, why should that also bring in unrelated constants, such as a default buffer length?

It turns out that there is another criterion that can be used to arrive at the same conclusion in this case: stability. Or, put another way, put things together that change together. The change in question is not runtime change but development-time change: the change that code endures over the software lifecycle.

The other side of the partitioning coin from cohesion is coupling, and stability also applies here: a unit (function, class, header, package, layer, etc) should ideally depend on units that are more, not less, stable than itself. Put simply: prefer to build on solid ground.
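One widely used way to put a number on "more stable" is Robert C. Martin's instability metric, I = Ce / (Ca + Ce), where Ce counts a unit's outgoing dependencies and Ca its incoming ones. I runs from 0 (much depends on the unit, and it depends on little) to 1 (nothing depends on it, and it depends on others). Note that this measures structural resistance to change, not observed churn. The Java sketch below works over a hypothetical graph of made-up units: it computes I for each unit and flags any dependency that points towards a less stable unit.

```java
import java.util.*;

// Minimal sketch; unit names and dependencies are hypothetical.
class StabilityCheck {
    // Instability I = Ce / (Ca + Ce): Ce = outgoing dependencies,
    // Ca = incoming dependencies. An isolated unit is treated as I = 0.
    static Map<String, Double> instability(Map<String, Set<String>> deps) {
        Map<String, Integer> incoming = new HashMap<>();
        Set<String> units = new TreeSet<>(deps.keySet());
        for (Set<String> targets : deps.values()) {
            for (String target : targets) {
                units.add(target);
                incoming.merge(target, 1, Integer::sum);
            }
        }
        Map<String, Double> result = new TreeMap<>();
        for (String unit : units) {
            int ce = deps.getOrDefault(unit, Set.of()).size();
            int ca = incoming.getOrDefault(unit, 0);
            result.put(unit, (ca + ce) == 0 ? 0.0 : (double) ce / (ca + ce));
        }
        return result;
    }

    // A dependency breaks the rule if its target is less stable
    // (has a higher I) than the unit that depends on it.
    static List<String> violations(Map<String, Set<String>> deps) {
        Map<String, Double> i = instability(deps);
        List<String> found = new ArrayList<>();
        for (String from : new TreeSet<>(deps.keySet())) {
            for (String to : new TreeSet<>(deps.get(from))) {
                if (i.get(to) > i.get(from)) {
                    found.add(from + " -> " + to);
                }
            }
        }
        return found;
    }

    static Map<String, Set<String>> sampleGraph() {
        return Map.of(
            "app", Set.of("core"),
            "billing", Set.of("core"),
            "core", Set.of("reports"),          // core leans on something volatile
            "reports", Set.of("pdf", "csv"));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> deps = sampleGraph();
        System.out.println(instability(deps));
        System.out.println(violations(deps)); // prints [core -> reports]
    }
}
```

Here core (I = 1/3) depends on reports (I = 2/3), which has more outgoing dependencies and so is more likely to be pulled around by them; that is the one dependency flagged. Treating an isolated unit as stable is a choice made for this sketch.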

In the case of the exceptions package, all the feature packages whose classes need to throw exceptions that are defined in the exceptions package depend on the exceptions package, as do the users of the feature packages. This is likely to make the exceptions package one of the most, if not the most, heavily depended upon packages in a system.

Unfortunately, it is also likely to be one of the least stable: any new exception for a feature package will affect the exceptions package, as will the addition of any new feature package that needs new exception types. This conceptual instability can manifest itself concretely if the compiled classes for lower layers are deployed separately from those that are higher up, for example by placing the code for application features in one JAR file and the code for so-called utility classes, such as exceptions, in another. Adding an exception for a single feature then means rebuilding and redeploying the shared utility JAR as well.

The same problem exists for a "constants.h" header, but the churn problem shows up sooner during the compilation–link cycle: every time the header file is modified, a rebuild is triggered, regardless of whether or not an including source file depends on the constant in question. Changing a default buffer length will still cause a rebuild for files whose only interest is in a service name. So the lack of cohesion, from the perspective of common use, makes changes more likely: it reinforces the lack of cohesion from the perspective of stability.
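The rebuild cost can be modelled directly. The sketch below is a simplified model that assumes a build which recompiles any source file whose included headers have changed. The file and constant names are hypothetical, and Java is used only to model the C build. It compares which files are recompiled when the default buffer length changes in a catch-all "constants.h" and when it changes in per-feature headers.

```java
import java.util.*;

// Minimal sketch with hypothetical file and constant names. It assumes a
// build that recompiles a source file whenever any header it includes changes.
class RebuildFanOut {
    // Which sources must be recompiled when the header defining a constant changes.
    static Set<String> rebuilt(Map<String, Set<String>> includes,
                               Map<String, String> headerOf,
                               String changedConstant) {
        String header = headerOf.get(changedConstant);
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : includes.entrySet()) {
            if (e.getValue().contains(header)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // One catch-all "constants.h" included everywhere.
    static final Map<String, String> MONOLITHIC = Map.of(
        "SERVICE_NAME", "constants.h",
        "DEFAULT_BUFFER_LENGTH", "constants.h");
    static final Map<String, Set<String>> MONOLITHIC_INCLUDES = Map.of(
        "client.c", Set.of("constants.h"),
        "registry.c", Set.of("constants.h"),
        "buffer.c", Set.of("constants.h"));

    // The same constants split by feature; each source includes only what it uses.
    static final Map<String, String> SPLIT = Map.of(
        "SERVICE_NAME", "service.h",
        "DEFAULT_BUFFER_LENGTH", "buffer.h");
    static final Map<String, Set<String>> SPLIT_INCLUDES = Map.of(
        "client.c", Set.of("service.h"),
        "registry.c", Set.of("service.h"),
        "buffer.c", Set.of("buffer.h"));

    public static void main(String[] args) {
        String changed = "DEFAULT_BUFFER_LENGTH";
        System.out.println(rebuilt(MONOLITHIC_INCLUDES, MONOLITHIC, changed)); // [buffer.c, client.c, registry.c]
        System.out.println(rebuilt(SPLIT_INCLUDES, SPLIT, changed));           // [buffer.c]
    }
}
```

With the catch-all header, all three sources are recompiled, including the two that only care about the service name. With the split headers, only buffer.c is.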

The solution in both cases is to split up the package and the header and relocate their constituent parts according to the features they relate to.
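As a sketch of what the relocation looks like, the hypothetical class below keeps its exception and its constant alongside the code that uses them. In a real Java codebase they would more likely be separate top-level types in the same feature package. Nesting is used here only to keep the example in a single file.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch; all names are hypothetical. The exception and the constant
// live with the feature that uses them, not in shared "exceptions" or
// "constants" units. Nesting keeps the example in one file.
class ServiceRegistry {
    // Constant defined next to the only code that needs it.
    static final String DEFAULT_SERVICE_NAME = "orders";

    // Exception defined beside the class whose methods throw it.
    static class ServiceNotFoundException extends RuntimeException {
        ServiceNotFoundException(String name) {
            super("no such service: " + name);
        }
    }

    private final Map<String, String> endpoints = new HashMap<>();

    void register(String name, String endpoint) {
        endpoints.put(name, endpoint);
    }

    String lookup(String name) {
        String endpoint = endpoints.get(name);
        if (endpoint == null) {
            throw new ServiceNotFoundException(name);
        }
        return endpoint;
    }

    public static void main(String[] args) {
        ServiceRegistry registry = new ServiceRegistry();
        registry.register(DEFAULT_SERVICE_NAME, "tcp://localhost:9000");
        System.out.println(registry.lookup(DEFAULT_SERVICE_NAME));
        try {
            registry.lookup("billing");
        } catch (ServiceNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

By the same reasoning, a default buffer length would sit with whichever class manages buffers, so changing it never touches code that only needs the service name.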

In the case of the constants, there is one more refinement that can further reduce coupling and isolate change... but we'll discuss that another day.

The advice to take away is that organising with respect to rate of change is a form of cohesion that can improve the stability of individual abstractions and stabilise the dependencies between them (which is why these ideas have been termed the Stable Abstractions Principle and the Stable Dependencies Principle).

What happens if you apply these principles consistently and in the large? Above the level of individual classes and packages, across a whole system, the result is a layered architecture whose layers are related by rate of change (the Shearing Layers pattern describes this in detail, relating it to a similar observation about building architecture). Rate of change often aligns with other layering criteria, such as layering of kinds of abstractions, separation of technologies, grouping of developer skills, organisational structure, etc, allowing the same design to be reached and supported by different lines of reasoning.

So, how do you design a system so that it is organised with respect to rate of change? The empirical answer is that you observe the change and respond accordingly, using refactoring as the means by which you let volatile elements bubble up and stable elements sink through the layers. Iterative development lifecycles offer a useful cyclic timeline against which stability can be assessed. Reports on relative stability can be made against source code version history. From a micro-process perspective, Test-Driven Development also offers useful feedback.
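A relative-stability report of that kind can be very simple. The sketch below assumes the history has already been extracted as one list of changed file paths per commit, for example from `git log --name-only`. The sample history, paths and package names are invented for illustration. It counts how many commits touched each package and lists the most volatile first.

```java
import java.util.*;
import java.util.stream.Collectors;

// Minimal sketch: ranks packages by how many commits touched them.
// The sample history is hypothetical.
class ChurnReport {
    // Package = everything before the last '/' in a path ("" for the root).
    static String packageOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "" : path.substring(0, slash);
    }

    // For each package, the number of commits that changed it
    // (a commit touching several files in one package counts once).
    static Map<String, Integer> commitsPerPackage(List<List<String>> commits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> changedPaths : commits) {
            Set<String> touched = new HashSet<>();
            for (String path : changedPaths) {
                touched.add(packageOf(path));
            }
            for (String pkg : touched) {
                counts.merge(pkg, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Most volatile first; ties broken alphabetically for repeatable output.
    static List<String> mostVolatileFirst(Map<String, Integer> counts) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                    .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .map(e -> e.getKey() + " " + e.getValue())
            .collect(Collectors.toList());
    }

    static final List<List<String>> SAMPLE_HISTORY = List.of(
        List.of("app/Checkout.java", "app/Cart.java"),
        List.of("app/Checkout.java", "exceptions/PaymentDeclined.java"),
        List.of("exceptions/OutOfStock.java", "inventory/Stock.java"),
        List.of("app/Cart.java"),
        List.of("exceptions/AddressInvalid.java", "shipping/Address.java"));

    public static void main(String[] args) {
        // prints [app 3, exceptions 3, inventory 1, shipping 1]
        System.out.println(mostVolatileFirst(commitsPerPackage(SAMPLE_HISTORY)));
    }
}
```

In this made-up history, the catch-all exceptions package changes as often as the application layer that sits on top of it. That is the mismatch described earlier for a heavily depended-upon package.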

It is also possible to be proactive in trying to establish an architecture based on stability. There are a number of established design practices that promote partitioning styles that are already in tune with this idea. For example, separating the things that change from the things that do not is a recurring theme in many descriptions of polymorphism.

The Gang-of-Four's advice to "program to an interface, not an implementation" encourages a style of class hierarchy design that keeps the root of a hierarchy as stable as possible. Having only a pure interface at the root, rather than a mix of interface and implementation, insulates the root (and everything that depends on it) from the instability that would otherwise arise from changes to implementation at that level.
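The sketch below illustrates the idea with hypothetical names. It has a pure interface at the root, two implementations beneath it, and a client that depends only on the root. The discounted implementation could be added, or the fixed one rewritten, without touching the interface or the client.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch; all names are hypothetical. The root of the hierarchy is
// a pure interface with no implementation, so it has nothing to change when
// pricing logic changes.
interface PriceSource {
    long priceInPence(String sku);
}

// Implementation details sit below the root and can change freely.
final class FixedPriceSource implements PriceSource {
    private final Map<String, Long> prices;

    FixedPriceSource(Map<String, Long> prices) {
        this.prices = new HashMap<>(prices); // defensive copy
    }

    @Override
    public long priceInPence(String sku) {
        Long price = prices.get(sku);
        if (price == null) {
            throw new IllegalArgumentException("unknown sku: " + sku);
        }
        return price;
    }
}

// A second implementation, added without touching the interface.
final class DiscountedPriceSource implements PriceSource {
    private final PriceSource base;
    private final int percentOff;

    DiscountedPriceSource(PriceSource base, int percentOff) {
        this.base = base;
        this.percentOff = percentOff;
    }

    @Override
    public long priceInPence(String sku) {
        // Integer arithmetic: rounds down to the nearest penny.
        return base.priceInPence(sku) * (100 - percentOff) / 100;
    }
}

// Client code depends only on the stable root, never on an implementation.
final class Basket {
    static long total(PriceSource prices, List<String> skus) {
        long sum = 0;
        for (String sku : skus) {
            sum += prices.priceInPence(sku);
        }
        return sum;
    }

    public static void main(String[] args) {
        PriceSource fixed = new FixedPriceSource(Map.of("tea", 250L, "milk", 90L));
        PriceSource sale = new DiscountedPriceSource(fixed, 10);
        List<String> skus = List.of("tea", "milk");
        System.out.println(total(fixed, skus) + " " + total(sale, skus)); // 340 306
    }
}
```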

This OO approach of separating interface from implementation is normally motivated as an extension of information hiding, which in turn is normally associated with the concept of modular design.

Although David Parnas was not responsible for coming up with the concept of the module, he was responsible for promoting the concept and encouraging a reasoned approach to modularity. If we look closely at the original motivation for information hiding, we find that we come full circle back to the notion of designing in terms of stability. The whole point of introducing separations and boundaries was to deal with "difficult design decisions or design decisions which are likely to change", partitioning so that "each module is then designed to hide such a decision from the others".
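A small illustration of Parnas's criterion can be built on one assumption: that the storage representation of a name index is a decision "likely to change". The hypothetical class below keeps its representation private, so replacing the sorted list with, say, a hash set would not affect any caller. No interface or hierarchy is needed for this; encapsulation alone hides the decision.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Minimal sketch of information hiding; the class is hypothetical. The
// module's secret is how names are stored: here a sorted list searched with
// binary search, which callers cannot see and so cannot come to depend on.
final class NameIndex {
    private final List<String> sorted = new ArrayList<>(); // the hidden decision

    void add(String name) {
        int pos = Collections.binarySearch(sorted, name);
        if (pos < 0) {
            sorted.add(-pos - 1, name); // insert at the reported insertion point
        }
    }

    boolean contains(String name) {
        return Collections.binarySearch(sorted, name) >= 0;
    }

    int size() {
        return sorted.size();
    }

    public static void main(String[] args) {
        NameIndex index = new NameIndex();
        index.add("parnas");
        index.add("dijkstra");
        index.add("parnas"); // duplicate ignored
        System.out.println(index.contains("dijkstra") + " " + index.size()); // true 2
    }
}
```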

So, what then is cohesion all about? It's all about time. Build times, stability over time, and the time taken to understand a piece of code are all related. ®