Original URL: http://www.theregister.co.uk/2005/12/11/cohesive_code_packages/

Let's get cohesive

Biting into sound principles

By Kevlin Henney

Posted in Developer, 11th December 2005 11:19 GMT

Comment It is entirely possible to write applications as a monolithic slab of undifferentiated code. Indeed, for some this appears to approach an art form, with a stream-of-consciousness code style not dissimilar to Jack Kerouac's spontaneous, amphetamine-fuelled writing style.

Caffeine is the more likely drug of choice in programming, and the abandonment of form might be seen as rebellion against the discipline (perhaps, tyranny!) of packages, files, classes, subroutines, table normalisation, testability, teams, sustainable development, and so on. But writing code in this state is not where most of us want to be and, unlike Jack Kerouac's writing, there's little to enjoy in such code, except perhaps a shallow flush of schadenfreude when the code is somebody else's problem.

Most of us prefer to partition code into smaller units — packages, files, etc — each of which is intended to be more easily comprehensible and easier to work with than the monolithic alternative.

This is a human consideration: a concern with the developmental quality of code rather than its runtime behaviour. Although the "principle of locality" is important in the design of processor caching and virtual memory systems, it does not (yet) confuse or upset machines when programmers choose to define APIs of a thousand functions in a single header file, or when code is deployed in DLLs with apparently arbitrary contents (Graham Lea has reported Microsoft doing this). But the fact that an approach is possible doesn't make it effective or exemplary.

This is where cohesion enters the picture. Cohesion is in some sense about "things sticking together". The oft-stated principle here is to "maximise cohesion". Of course, maximising cohesion does not imply just throwing loads of stuff together to see if it sticks — down that path lies the urban sprawl of <windows.h>, which spans neighbourhoods of, at best, coincidentally related features. There has to be a reason that things are either put together or separated, and, preferably, a sound and useful reason. In other words, cohere coherently. It's this coherence that makes a modular partition easier to work with (read, write, discover, test, evolve, explain, etc).

Sound-bite principles

On its own, saying "maximise cohesion" isn't enough to inspire or educate programmers, whatever their level of experience. This sound bite needs further clarification, some good examples to back it up, and a few counterexamples as lighthouses to warn of known trouble spots.

But useful as counterexamples are, the problem with learning by example is the disproportionate representation of counterexamples in published APIs, and not all of them are recognised as such. Keeping in mind that people are most influenced by what they see around them, a typical programmer's perception of cohesion, and of what counts as good cohesion, is more likely to be shaped by the APIs he or she uses all the time than by sound-bite principles.

The meaning of "maximise cohesion" is often clarified as "be or do one thing well". This clearly distinguishes between the relative cohesiveness of realloc and free, both found in the standard C library. free clearly does one thing well: it deallocates the result of a previous memory allocation.

By contrast, depending on how you call it, realloc behaves like an allocator (malloc)... or a deallocator (free)... or as a reallocator of memory (which is what you'd expect, given the name). This surprisingly inclusive portfolio of behaviour inspired Steve Maguire to dub it the "one function memory manager".
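One way to see the three roles is to give each its own name. The following is only a sketch: buffer_create, buffer_resize and buffer_destroy are hypothetical wrappers invented for illustration, not part of any library. It also assumes a detail the text above does not go into: the original C standard defined realloc(p, 0) as freeing p, but, as far as I understand the later revisions, that form is no longer portable, so the deallocating role is delegated to free here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wrappers: each one names a single role that realloc can play. */

/* Role 1, allocator: realloc(NULL, size) is specified to behave like malloc(size). */
void *buffer_create(size_t size)
{
    return realloc(NULL, size);
}

/* Role 2, reallocator: contents are preserved up to the smaller of the old and
   new sizes. On failure NULL is returned and the original block is untouched.
   The precondition keeps this wrapper out of the other two roles. */
void *buffer_resize(void *buffer, size_t new_size)
{
    assert(buffer != NULL && new_size > 0);
    return realloc(buffer, new_size);
}

/* Role 3, deallocator: historically reachable as realloc(buffer, 0); this
   sketch assumes that form is not portable and calls free instead. */
void buffer_destroy(void *buffer)
{
    free(buffer);
}
```

A caller that only ever releases memory now depends on a function that can only release memory, which is the "do one thing well" reading of cohesion applied at the smallest scale.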

On the other hand, it is all too easy to weasel-word your way past a simply worded sound-bite recommendation: if the goal is to "be the Windows OS API", then <windows.h> could be said to meet this goal pretty well, and could therefore be considered "cohesive". However, it aggregates a slop bucket of features that are generally unrelated and unlikely to always be used together — COM, memory management, file I/O, DDE, multithreading, windowing, etc. There is no simple sense in which <windows.h> could be said to cohere, but the unintentional lesson that a programmer may take away from working with it is to "put everything in your application/subsystem/library in the same place".

To sharpen our concept of cohesion a little more, we can draw on another criterion, which is that of common use: if you're going to use one feature of a cohesive module, you should be just as likely to use another. In other words, move things together that are used together and separate those that are not.

This is not a question of whether an application as a whole would use a set of features together, but of whether one of its parts would. You may also recognise this co-dependency approach by another name: normalisation. An application may use both threading and file I/O functionality, but it is not inevitable that a class in that application that depends on threading will also depend on file I/O. Threading and file I/O stand out as obvious and separate candidates for common use, and so each represents a separate cohesive concept.

By contrast, a not uncommon habit on C projects is to put all the typedefs in one header file ("typedefs.h") or all the constants in a header file ("constants.h"). While it is clear that all the features in the header file have something in common, it is not clear that this is actually useful except in the most trivial sense.
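The difference between grouping by kind and grouping by use can be sketched in a single file. Everything named here is hypothetical (customer_name, job_queue, their capacities and the header names in the comments); the point is only that each constant and type sits beside the functions that need it, so a client of one feature never has to see the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Grouped by kind: a hypothetical "constants.h" plus "typedefs.h" would hold
   NAME_CAPACITY, QUEUE_CAPACITY, customer_name and job_queue side by side,
   and every client would see all four, whichever feature it actually used. */

/* Grouped by use: what a hypothetical "customer_name.h" would contain. */
enum { NAME_CAPACITY = 32 };

typedef struct customer_name {
    char text[NAME_CAPACITY];
} customer_name;

/* Copies at most NAME_CAPACITY - 1 characters and always terminates the text. */
void customer_name_set(customer_name *name, const char *text)
{
    assert(name != NULL && text != NULL);
    strncpy(name->text, text, NAME_CAPACITY - 1);
    name->text[NAME_CAPACITY - 1] = '\0';
}

/* Grouped by use: what a hypothetical "job_queue.h" would contain. */
enum { QUEUE_CAPACITY = 4 };

typedef struct job_queue {
    int    jobs[QUEUE_CAPACITY];
    size_t count;
} job_queue;

/* Returns 1 if the job was queued, 0 if the queue is already full. */
int job_queue_push(job_queue *queue, int job)
{
    assert(queue != NULL);
    if (queue->count == QUEUE_CAPACITY)
        return 0;
    queue->jobs[queue->count++] = job;
    return 1;
}
```

Code that only queues jobs would include the second header alone; a change to NAME_CAPACITY would then leave it untouched, which is the practical payoff of the common-use criterion.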

This question of common use (also documented, a little misleadingly, as the Common Reuse Principle) highlights another consideration as well: the criteria used for separating or combining parts should be visible and positively defined.

Consider, by way of counterexample, the java.util package, which appears to contain as ragtag an assortment of unrelated classes as you are ever likely to find. What is the common theme that binds the collection classes together in the same package as, for instance, the calendar facilities? It isn't that they are utilities, because this theme cannot be applied consistently or meaningfully — everything in software is ultimately a utility to something else.

The implication of the name util fails to explain just why the many collection classes are merged in a package of unrelated utilities that clearly do not measure up against a yardstick of common use; but the BigInteger and BigDecimal utility classes get a package — albeit inaccurately named — pretty much to themselves.

By way of contrast, the criterion used for the partitioning of the System.Collections namespace in the .NET library is clear. The criterion used to define the content of java.util is a somewhat accidental one that can be summarised as "it contains utility classes that are not defined in other packages".

Yes, it's some kind of design reasoning, but one that's difficult to employ constructively. But this is not (just) a rant about things named "util" or similar. This problem of arbitrary cohesion is found in libraries and applications that many programmers commonly work with. The presence or absence of a particular feature in such libraries and classes is more a matter of lottery than of reasoning. For example, the C library's <stdlib.h> header can be characterised as "holding all the standard library features that are not already defined in other standard headers (except for the couple that are)".

A lot of learning is from example. Lessons are absorbed, often passively and unconsciously, from our environment. For programmers, this environment includes the standard APIs they work with, which is why we should value a good understanding of their developmental strengths and weaknesses, and focus on more than just their functional behaviour. If it has sufficient functionality, it is always possible to work with uncohesive code. But effective design is more than just affording the basic "possibility of use": ease of use, errors in use, cost of use, delight in use, and so on, are all part of the picture. ®