Original URL: http://www.theregister.co.uk/2006/08/31/cohesion_coupling/

Up with cohesion, down with coupling

We all love cohesion and hate coupling, don't we?

By Kevlin Henney

Posted in Developer, 31st August 2006 14:56 GMT

The standard advice on cohesion and coupling is that a design should strive to maximise cohesion and minimise coupling. This is a fine mantra, but, as is so often the case, without a good understanding of what is really intended, it either becomes misguidance or is perceived as academic and irrelevant.

A simple characterisation is that coupling is the degree of interconnectedness of parts in a system and cohesion is the degree of intraconnectedness of those parts.

Getting cohesive

You want components (in the general sense of the word rather than the more specialised component-based development sense) to be focused and crisply defined rather than diluted with multiple unrelated responsibilities.

For example, the root of an inheritance hierarchy should normally present an interface and represent a responsibility that all subclasses in the hierarchy can fulfil meaningfully. You should try to avoid having optional features in the root interface of the hierarchy that are only relevant to some subclasses but not others: this makes both using and writing classes in the hierarchy awkward.

Hierarchy users cannot confidently program to an interface without either testing for features or risking some kind of "feature not supported" error.

For class implementers, if a superclass or base interface offers features that don't make sense for a particular subclass, either they have to make up some kind of fictional behaviour that sort of makes sense or they have to raise a "feature not supported" error. Either way, it's hard working in and around such hierarchies.
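To make the awkwardness concrete, here is a minimal sketch in Python. The Stream, BufferStream and ConsoleStream classes and the rewind_and_read function are hypothetical names invented for illustration; the point is only that an optional seek feature in the root interface forces implementers to fail and users to test.

```python
class Stream:
    """Hypothetical root interface with an optional feature (seek)."""

    def read(self) -> str:
        raise NotImplementedError

    def seek(self, position: int) -> None:
        # Optional feature: only meaningful for some subclasses.
        raise NotImplementedError("feature not supported")


class BufferStream(Stream):
    """Can honour every feature of the root interface."""

    def __init__(self, data: str) -> None:
        self._data = data
        self._position = 0

    def read(self) -> str:
        result = self._data[self._position:]
        self._position = len(self._data)
        return result

    def seek(self, position: int) -> None:
        self._position = position


class ConsoleStream(Stream):
    """Cannot seek, so it inherits the "feature not supported" error."""

    def __init__(self, lines: list) -> None:
        self._lines = list(lines)

    def read(self) -> str:
        return self._lines.pop(0) if self._lines else ""


def rewind_and_read(stream: Stream) -> str:
    """User code cannot rely on the interface alone: it has to feature test."""
    try:
        stream.seek(0)
    except NotImplementedError:
        return "<cannot rewind>"
    return stream.read()
```

One way out, under the same assumptions, would be to move seek into a narrower interface that only seekable streams implement, so the root promises nothing a subclass cannot deliver.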

As an aside, you may already recognise this particular example as the principle of substitution (the Liskov Substitution Principle, or LSP, to be precise) for class hierarchies. It is commonly taught in its somewhat abridged sound-bite form: only use inheritance for "is-a" relationships.

It is interesting to note that, rather than being a principle separate from "maximise cohesion", LSP arises naturally from maximising cohesion, so it can be considered a specific application of that advice in the context of type hierarchies.

Anyway, what about coupling? You want components to be loosely coupled rather than tightly coupled. If they are tightly coupled, the internal structure of the resulting code base is intertwined, subtle, difficult to comprehend, hard to change, and many other costly and awkward etceteras.

For example, having one part of the code base depend on an incidental data representation decision in another part of the code base is a pain when you need to change the representation, even if the usage is stable. Hence, the common recommendation to keep data representation decisions private is a guideline that reduces coupling.
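As a rough illustration, the following Python sketch keeps the representation behind a small set of operations. ListInbox, CounterInbox and describe are hypothetical names, not anything from a real library; the two classes stand for "before" and "after" a change of representation.

```python
from collections import Counter


class ListInbox:
    """First representation: messages kept in a list."""

    def __init__(self) -> None:
        self._messages = []

    def add(self, message: str) -> None:
        self._messages.append(message)

    def count(self) -> int:
        return len(self._messages)

    def contains(self, message: str) -> bool:
        return message in self._messages


class CounterInbox:
    """Changed representation: a tally per distinct message."""

    def __init__(self) -> None:
        self._counts = Counter()

    def add(self, message: str) -> None:
        self._counts[message] += 1

    def count(self) -> int:
        return sum(self._counts.values())

    def contains(self, message: str) -> bool:
        return self._counts[message] > 0


def describe(inbox) -> str:
    """Caller coupled only to the stable usage, not to the data structure."""
    return f"{inbox.count()} message(s)"
```

Because describe never touches the underlying list or tally, swapping one class for the other leaves it untouched. Had it reached into the list directly, the change of representation would have rippled outwards.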

However, don't get too carried away: low coupling does not mean no coupling. The goal is reduction rather than elimination of coupling: a system with no coupling is, by definition, not a system.

There are many forms of coupling, and some are explicit whereas others are implicit. For example, the inheritance relationship expressed with extends in Java and the file dependency expressed using #include in C or C++ are dependencies that are directly declared and visible in code.

But don't assume that the only form of coupling is the explicit stuff. Many years ago I encountered a team that did not appreciate this implicit aspect of coupling, and drew completely the wrong conclusions about what to do with a C++ header file that contained an enum with all the error codes for a whole program. Bundling all the error codes for a whole system in one place smacks of coincidental cohesion and couples unrelated parts of a system to a relatively unstable component.

There are many different and reasonable ways to cut this dependency knot, should you choose to do so. Hopefully you'll agree that replacing the enum with an int and replacing named constants with magic numbers to represent different errors is not one of them. The reasoning was that because they had eliminated the type and all those pesky named constants, there was no longer any need to have a header file, which meant, by definition, that different parts of the program were no longer coupled to a common header. Hey presto! Or not.

Instead, they had a program littered with magic numbers that were implicitly coupled to one another by ad hoc usage and convention and they had thrown away the ability of the compiler to perform any checking: "OK everyone, team meeting, listen in. Please try to be consistent in your use of the number three to indicate file writing errors (it was two last week, but two is now for file reading errors), 17 to indicate dropped connections...".
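For contrast, here is a small sketch of what the named constants were buying. It is written in Python rather than the C++ of the anecdote, so the checking happens at run time rather than in a compiler, and the ErrorCode and describe names are assumptions made for illustration. The numeric values are the ones quoted in the meeting above.

```python
from enum import Enum


class ErrorCode(Enum):
    """Hypothetical error codes: the shared convention, made explicit."""

    FILE_READ = 2
    FILE_WRITE = 3
    DROPPED_CONNECTION = 17


def describe(code: ErrorCode) -> str:
    """Accept only declared error codes, never a bare magic number."""
    if not isinstance(code, ErrorCode):
        raise TypeError(f"expected an ErrorCode, got {code!r}")
    return code.name.replace("_", " ").lower()
```

With the type in place, describe(ErrorCode.FILE_WRITE) is accepted and describe(3) is rejected. With bare integers, nothing distinguishes this week's three from last week's two.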

The point about the header file is that it made a conceptual coupling that existed within the program explicit and, therefore, visible. Brushing it under the carpet did not reduce the coupling, it just made it harder to see and deal with. So, taken by itself, the header file was highlighting the problem rather than necessarily being the problem. That said, given some of the other design decisions the team took and the code I saw, perhaps the real problem was that they had access to keyboards. But I digress.

Now, armed with a reasonable and reasoned understanding of what's meant by maximise cohesion and minimise coupling, what is the relationship between coupling and cohesion?

The received wisdom is quite simple: when one goes up, the other goes down; tight coupling and weak cohesion go together, as do loose coupling and strong cohesion. Most of the time that correlation holds, but there are a number of well-known design examples that are not quite as straightforward.

Consider, first of all, the Composite design pattern. This design allows code to treat individual objects and groups of objects the same way through a common interface. Instead of having duplicate code where individuals and groups are handled in similar ways, code only has to be written once. Instead of explicit selection based on a runtime type check, such as instanceof in Java or dynamic_cast in C++, polymorphic dispatch determines the right code to execute. So, instead of coupling to the specific types of individual objects and of groups of objects, usage code is coupled only to the interface.

From the perspective of usage code, lower coupling is typically a consequence of using a Composite. What about cohesion? Here the case is not so clear cut. Looking at the common interface implemented by classes for individual objects and for groups, it appears that the common operations are not always, well, common.

In most Composite implementations the common interface provides a way to traverse groups of objects, whether using Iterator objects or via callbacks from Enumeration Methods, optionally expressed in terms of Visitor to separate callbacks on different object types.

This is all very well for groups of objects, but what does it mean to traverse an individual object? Not much. You have to invent a behaviour, such as doing nothing, returning null, returning a Null Object, throwing an exception, or whatever is appropriate for the language and iteration style of choice. In other words, to achieve lower coupling, some cohesion has also been traded in.
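A minimal Composite sketch in Python shows both sides of the trade. Component, File and Directory are hypothetical names chosen for illustration, and "return an empty traversal" is just one of the invented behaviours listed above.

```python
from typing import Iterator


class Component:
    """Common interface for individual objects and groups."""

    def size(self) -> int:
        raise NotImplementedError

    def children(self) -> Iterator["Component"]:
        raise NotImplementedError


class File(Component):
    """An individual object."""

    def __init__(self, name: str, size: int) -> None:
        self._name = name
        self._size = size

    def size(self) -> int:
        return self._size

    def children(self) -> Iterator[Component]:
        # Invented behaviour: an individual has nothing to traverse.
        return iter(())


class Directory(Component):
    """A group of components."""

    def __init__(self, name: str, members: list) -> None:
        self._name = name
        self._members = list(members)

    def size(self) -> int:
        # Polymorphic dispatch: no type check on the members.
        return sum(member.size() for member in self._members)

    def children(self) -> Iterator[Component]:
        return iter(self._members)
```

Usage code calls size on any Component without caring which kind it holds, which is the lower coupling. The children method on File is the cohesion that was traded in: it exists only because the interface demands it.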

And speaking of iteration, the common-or-garden Iterator pattern offers another example of a trade-off between coupling and cohesion.

A collection that holds its elements but doesn't allow you to traverse them is unlikely to prove popular. There are many ways to offer traversal, but if the caller needs to be able to know the position of elements in some way there are essentially only three general designs that keep the collection's internal representation hidden from the caller.

First, it is possible to use an index into the collection. The advantage of this is that it appears simple and there is no coupling to the collection's data structure. The main disadvantage is that it is only practical if indexing is a constant-time operation, which it won't be for linked data structures.
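The cost is easy to see in a sketch. The IndexedLinkedList class below is hypothetical, and its links_followed counter is instrumentation added purely for this illustration: each call to at walks from the head, so a full pass by index follows 0 + 1 + 2 + ... links rather than one link per element.

```python
class IndexedLinkedList:
    """Hypothetical singly linked list that offers traversal by index only."""

    class _Node:
        def __init__(self, value, next_node):
            self.value = value
            self.next = next_node

    def __init__(self, items=()):
        self._head = None
        self._length = 0
        self.links_followed = 0  # instrumentation for this sketch only
        for item in reversed(list(items)):
            self._head = IndexedLinkedList._Node(item, self._head)
            self._length += 1

    def __len__(self):
        return self._length

    def at(self, index):
        """Return the element at index by walking from the head."""
        if not 0 <= index < self._length:
            raise IndexError(index)
        node = self._head
        for _ in range(index):
            node = node.next
            self.links_followed += 1
        return node.value
```

The caller sees nothing of the nodes, so the representation stays hidden, but the work grows quadratically with the length of the list.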

The second approach is to internalise a cursor within the collection. This cursor can be moved efficiently to and fro by the caller without revealing the internal data structure. However, it can't support more than one pass at a time — which fails in interesting ways for nested traversals — and for collections that are supposed to be read-only you have the additional question of whether or not you consider the cursor to be part of the collection's state (it turns out that you have problems either way).
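One of those interesting failures can be sketched as follows. CursorList and all_pairs are hypothetical names; the function tries to pair every element with every element using the collection's single built-in cursor.

```python
class CursorList:
    """Hypothetical collection with one internal cursor."""

    def __init__(self, items):
        self._items = list(items)
        self._cursor = 0

    def reset(self):
        self._cursor = 0

    def has_current(self):
        return self._cursor < len(self._items)

    def current(self):
        return self._items[self._cursor]

    def advance(self):
        self._cursor += 1


def all_pairs(collection):
    """Attempt a nested traversal with only one cursor to share."""
    pairs = []
    collection.reset()
    while collection.has_current():
        outer = collection.current()
        collection.reset()  # the inner pass takes over the only cursor
        while collection.has_current():
            pairs.append((outer, collection.current()))
            collection.advance()
        collection.advance()  # the outer pass resumes from the wrong place
    return pairs
```

For a two-element collection the caller might expect four pairs. This sketch yields only the two that start with the first element, because the inner pass leaves the shared cursor at the end and the outer pass never sees the second element.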

And the third option is to introduce an Iterator object.

An Iterator object represents a clean separation of concerns, resulting in two kinds of objects, each with clear responsibilities: objects whose responsibility is to iterate, with an interface to match, and objects whose responsibility is to collect, with an interface to match. The trade-off here is that although the collection's data structure is not revealed to the world, it is revealed to the Iterator, which is closely coupled to it.
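A short Python sketch of that arrangement, using the language's own iteration protocol (__iter__ and __next__). The LinkedList and LinkedListIterator classes are hypothetical names invented for illustration.

```python
class LinkedList:
    """Responsibility: to collect."""

    class _Node:
        def __init__(self, value, next_node):
            self.value = value
            self.next = next_node

    def __init__(self, items=()):
        self._head = None
        for item in reversed(list(items)):
            self._head = LinkedList._Node(item, self._head)

    def __iter__(self):
        # Each traversal gets its own independent iterator.
        return LinkedListIterator(self._head)


class LinkedListIterator:
    """Responsibility: to iterate. Knows about nodes, so it is closely
    coupled to the representation of LinkedList."""

    def __init__(self, node):
        self._node = node

    def __iter__(self):
        return self

    def __next__(self):
        if self._node is None:
            raise StopIteration
        value = self._node.value
        self._node = self._node.next
        return value
```

Callers see only values, and nested traversals work because each pass holds its own position. The price is that any change to the node structure has to be made in both classes.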