Original URL: http://www.theregister.co.uk/2007/07/28/what_are_your_units/

One programmer's unit test is another's integration test

Word games

By Kevlin Henney

Posted in Developer, 28th July 2007 08:02 GMT

The question of what units you are working with is one that will at one time or other have plagued anyone who studied a science or a branch of physical engineering.

Teachers go to great lengths to make sure students remember to specify their units. It is not enough to say that the answer is 42. Forty-two what? 42 metres? 42 electronvolts? 42 furlongs per fortnight? Without a clear understanding of what units are involved, certain results and claims can be meaningless, misleading or simply expensive.

And so it is with software testing.

When it comes to programmer testing (the slightly ambiguous term that is used to describe the act of programmers testing, rather than the act of testing programmers) there are normally three categories that questions fall into: why, how, and what.

The question of why programmers should test their own code should not really be a question that needs asking, but some folk seem to think that all forms and levels of testing should be handled only by people whose sole role is testing. This view is founded on various misconceptions about the roles, nature, and economics of software development. It is nothing to do with agile, fragile, or any other kind of development - it is simply about professional responsibility.

The question of how to test covers a range of questions from the use of automated testing to how testing relates to other development activities, which results in a distinction between test-driven development and development-driven testing. However, the question of what to test is largely independent of this question of approach, and is the question we're going to focus on here.

There are many things that can be tested about a software system, so what should be tested? There is a surprisingly simple answer to this question: anything that is considered significant, directly or indirectly, for the successful development and acceptance of the software. If performance is a critical feature of a given application, there should be tests for performance.

If scalability matters, there should be tests, and the equipment to run them, dedicated to that cause. If usability is supposed to be the unique selling point of a shrink-wrapped application, an empirical approach founded on usability testing should be employed rather than the usual usability conjecture. If code quality matters, there should be tests that are code-centric rather than system-centric, as well as static analysis and peer review. And so on.

These different kinds of tests differ in their degree of automation, their scale, and the roles responsible for them, but they all share a common aim. Put simply, a test is a way of demonstrating that something is important. It shows you care.

Conversely, the absence of a particular kind of test or measure indicates that a particular aspect is not seen as critical. For applications that are not performance critical, having a battery of performance tests would not offer a particularly useful return on investment.

We can also use this way of thinking about tests as a way of deconstructing a project's actual priorities, as distinct from its advertised priorities. If someone on a project states that meeting customer requirements is the most important thing, but there are no tests defined with respect to the requirements, then meeting customer requirements is not actually as important as they would like you to believe.

Returning to the question of programmer testing and what to test, if we believe that the quality of class interfaces, class implementations, class and object relationships, and so on matters, we should demonstrate this care and attention with appropriate tests. This care and attention is normally described in terms of unit tests.

However, there is a (not so) small matter of terminology we ought to clear up. In spite of the relative maturity of testing as a discipline, there is no single or standard accepted meaning of the term unit test. It is very much a Humpty Dumpty term. But this is not to say that all definitions are arbitrary and that any definition will do. When it comes to unit test, what distinguishes one definition from another is its utility. If a definition does not offer us something we can work with and use constructively, then it is not a particularly useful definition.

One circular definition of unit test is any test that can be written using something that claims to be a unit-testing framework. Such a definition is often implicitly assumed by many users of the xUnit family of frameworks – it's got unit in the name, so it must be a unit test, right? This means that a unit can be pretty much anything, including a whole system.
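By way of illustration, here is a sketch in Java using JUnit, one of the xUnit family. The test class, the URL and the endpoint are invented for the purpose, but the shape will be familiar: nothing in the framework prevents a so-called unit test from exercising a whole deployed system over the network.

import static org.junit.Assert.assertEquals;

import java.net.HttpURLConnection;
import java.net.URL;

import org.junit.Test;

// Written with a unit-testing framework, but exercising an entire deployed
// system over the network: by the circular definition, this is a "unit test".
public class WholeSystemAsAUnitTest {

    @Test
    public void orderServiceRespondsToItsHealthCheck() throws Exception {
        // The host, port and path are placeholders for some running system.
        URL url = new URL("http://localhost:8080/orders/health");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("GET");

        // A failure here could mean a coding defect, a stopped server or a
        // flaky network: nothing about the framework keeps the scope small.
        assertEquals(200, connection.getResponseCode());
    }
}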

While we can say that this is not an unreasonable interpretation of the word unit as an English word, it does not give us a particularly useful definition of unit testing to work with in software. It singularly fails to distinguish code-focused tests from external system-level tests, for example.

Kent Beck tends to treat any code-focused test that is smaller than the whole system as a unit test. This at least distinguishes whole-system tests of the software from internal tests of the code, but it admits coarser "units" than many people would intuitively expect a unit test to cover.

At the other end of the scale, there are classification schemes that identify many different levels of test granularity. The problem with too many levels is that not only is there even less consensus about the terms, but the distinctions are, in practice, either not useful or not consistent.

For example, one scheme for classifying tests differentiates unit testing, component testing, integration testing and system testing. This may or may not be useful depending on what we mean by unit versus what we mean by component. If a unit is taken to be a function or a class in strict isolation from other classes, and component simply means a combination of units, we have a definition, albeit not a very useful one. There are very few classes that are not built in terms of other classes, so everything becomes a component test and the idea of a unit test is irrelevant and impractical, except for the most trivial classes.

If, on the other hand, a component is taken to be a DLL or other well-defined and deployable unit of executable code, as is commonly meant in the context of component-based development, then that clearly distinguishes between units, which are classes (and may be built from other classes that are also unit tested), and components, which contain those units and can be deployed independently.

While this definition makes sense for a typical .NET or COM project, it doesn't necessarily make sense for other platforms and systems. So, although there are conventions that locally give concrete meaning to these terms, there's enough variation across development platforms and application architectures to mean that the fine distinctions are perhaps too fine for us to use this scheme more generally. The variety of meanings associated with unit test is rivalled only by the diversity of meanings component has gathered in the context of software development.

Of course, in the middle of all these word games, there is an obvious and pragmatic question waiting to be answered: so long as we're testing, does it even matter what we call our tests or what kinds of tests they are? Inasmuch as the test is more important than the unit, it doesn't matter. On the other hand, if we care about communication and the broader implications of our testing styles, it does matter. Without a consistent vocabulary, it's easy to talk at cross purposes: one programmer's unit test could be another's integration test. The meaning of the terms has implications not only for testing but, more importantly, implications for design. A different definition can suggest and encourage a different style of design.

In other words, what we mean by unit test can have architectural implications. If we are looking for loosely-coupled code, a definition of test granularity should ideally help us to achieve this goal.

Here is a simple definition that I have used for a couple of years that seems to offer a simple litmus test for whether or not something is a unit test, and which has some other useful consequences to boot: A unit test is a test of behaviour (normally functional) whose success or failure is wholly determined by the correctness of the test and the correctness of the unit under test.

Consequently, the unit under test has no external dependencies across boundaries of trust or control, which could introduce other sources of failure. Such external dependencies include networks, databases, files, registries, clocks and user interfaces. In other words, the unit in question can be isolated at a level below the whole system and its interactions with the environment. By implication, an integration test is one that does not fully satisfy this definition, where reasons for failure could be environmental, but it is below the level of the whole system.
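To make the litmus test concrete, here is a hedged sketch in Java with JUnit; the Vat class, the rate and the audit file path are invented for the example. The first test's outcome depends only on the test and the unit under test, whereas the second could fail for purely environmental reasons, which by this definition makes it an integration test rather than a unit test.

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.junit.Test;

public class LitmusTest {

    // A deliberately tiny, hypothetical unit under test: a pure calculation
    // with no dependencies beyond its arguments.
    static final class Vat {
        static double atStandardRate(double netPrice) {
            return netPrice * 1.175; // 17.5 per cent, the UK rate at the time
        }
    }

    // A unit test by the definition above: success or failure is wholly
    // determined by the correctness of this test and of Vat.
    @Test
    public void vatIsAddedToNetPrice() {
        assertEquals(117.5, Vat.atStandardRate(100.0), 0.0001);
    }

    // Not a unit test by the same definition: the file system and the system
    // clock are external dependencies, so a failure here could be
    // environmental rather than a defect in the code or in the test.
    @Test
    public void auditLogHasBeenWrittenRecently() {
        File audit = new File("/var/log/audit.log");
        assertTrue(audit.exists());
        assertTrue(System.currentTimeMillis() - audit.lastModified() < 60 * 1000);
    }
}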

This brief definition is also largely compatible with the definition offered by Michael Feathers. The architectural implications for both views are pretty much the same: the more loosely-coupled the code of a system is, the more of it can be described and checked in terms of unit tests. An architect cannot reasonably support the claim that a system is loosely-coupled when it is not unit testable.

Many of the "unit tests" being written in projects today are not unit tests by the definitions we've homed in on, but they probably should be. The tests are consequently harder to write and slower to run – they touch the file system, network, etc.

In other words, unit testability offers feedback on the quality of coupling in a system. However, being able to write unit tests against such code does not come for free, because the design needs to be changed.

Techniques such as interface extraction and dependency inversion have an important role to play.
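As a sketch of what that can look like, here is a hedged example in Java with JUnit; the Clock interface, the Promotion class and the stubbed times are all invented for illustration. Rather than reaching out to the system clock directly, the class under test depends on an extracted interface, and the test supplies its own implementation, so the test's outcome no longer depends on the environment.

import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertTrue;

import org.junit.Test;

public class PromotionTest {

    // The extracted interface: the point at which the dependency chain on
    // the real system clock is broken.
    interface Clock {
        long currentTimeMillis();
    }

    // The class under test depends on the abstraction, not on the system
    // clock; a production implementation of Clock would simply delegate to
    // System.currentTimeMillis().
    static final class Promotion {
        private final Clock clock;
        private final long endsAtMillis;

        Promotion(Clock clock, long endsAtMillis) {
            this.clock = clock;
            this.endsAtMillis = endsAtMillis;
        }

        boolean isActive() {
            return clock.currentTimeMillis() < endsAtMillis;
        }
    }

    // The test stubs the clock, so success or failure depends only on the
    // test and on Promotion: a unit test by the definition given earlier.
    @Test
    public void promotionExpiresOnceTheClockPassesItsDeadline() {
        Promotion current = new Promotion(() -> 1000L, 2000L);
        assertTrue(current.isActive());

        Promotion expired = new Promotion(() -> 3000L, 2000L);
        assertFalse(expired.isActive());
    }
}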

It is not that the coupling in such systems is necessarily messy – for example, filled with dependency cycles between packages – it is that even in a conventionally layered system there's more transitive coupling across layers than is strictly necessary.

Although the code in one class may depend only on the public interface of another class, the code base itself depends on the other class's private section, and on that class's own dependencies in turn, until the dependency horizon is reached. Introduce a pure interface at the right point and the chain of dependencies is broken. This reduction in coupling simplifies many things. For testing, it becomes possible to mock or stub, as appropriate, the dependencies that cross boundaries of trust and control without undue effort. ®