Related topics
  • ,
  • ,
  • ,

Green Bar of Shangri-La

On unit testing and code coverage

head shot of Matt Stephens smilingThe Agile Iconoclast How often have you encountered variations on this comment: “I’ve made a small change to the code; now I run the tests… green bar!

Great, my change didn’t introduce any new bugs.” While a comprehensive unit test suite can increase your confidence that you’re not introducing any new bugs, it isn’t a guarantee; and thus, that confidence is potentially misplaced.

As far back as 1984 and beyond, it was widely recognised that software is too complex to be guaranteed error-free:

“If the objective of testing were to prove that a program is free of bugs, then not only would testing be practically impossible, but it would also be theoretically impossible.” - Boris Beizer, Software System Testing and Quality Assurance (International Thomson Publishing, 1984).

Even NASA, which places a huge emphasis on creating zero-defect code using cleanroom software engineering, get it catastrophically wrong from time to time.

Complete code coverage is simply an unrealistic expectation for testing. Code coverage doesn't just mean that every public method has a unit test – there are plenty of open source code coverage tools for that – it means that all possible states, paths and permutations are covered and tested for. The sheer number of permutations that a reasonably complex program may run through is virtually infinite; and expecting a test harness to cover all eventualities would be about as silly as trying to run the London Marathon dressed as a gorilla. (As if!)

Automated unit tests do get you closer to that unattainable goal, but, to paraphrase Douglas Adams, “any finite number compared with infinity is as near to nothing as makes no odds.” (Adams was of course arguing that the population of the universe is effectively zero, and therefore anyone you meet is merely the product of a deranged imagination – although he never states whose deranged imagination).

Like many other Java developers, I’m an avid JUnit user. Whether you see JUnit as a design tool or a regression testing tool, it adds a certain level of rigour to your development. However, it isn’t the holy grail of testing by any means; and it’s easy to fall into the trap of seeing JUnit’s green bar as meaning “all’s well” (as in, “this program is free of bugs”) instead of “the finite junctures of software behaviour that I’m testing for have passed the relatively small number of tests that I’ve written”, which is what the green bar really means.

In other words, a unit test framework doesn’t provide a formal proof that your program is bug-free: this is probably why XPers operate on warm fuzzy feelings that their software is correct, rather than unattainable proofs.

This isn’t so much of an issue if unit tests are used merely for what they’re intended for: as a programmer’s lightweight safety net. However, if unit testing is relied upon as a method of proving that your software is defect-free, then the approach is flawed. If you’re using a test-driven, evolutionary design approach, your reliance on unit tests is much higher than with other approaches to design: it’s supposedly okay to go ahead and make changes to the system because the unit tests will catch the errors for you.

The green bar is like a drug: it lures you in. When you’ve put the effort into creating a comprehensive test suite, you’ll desperately want to believe that the green bar means “that change I just made is guaranteed not to have introduced any bugs.” Don’t be suckered into its green-eyed charms.

Unit tests catch only the bugs that you have anticipated; or those that you’ve discovered, fixed and written a new test for (i.e. a regression testing framework). Not all code can be unit tested, at least not easily, e.g. asynchronous/event-driven or multi-threaded code. Unit tests by their nature are fine-grained, testing for minutiae on individual method calls. But an effectively atomic operation (inputprocessresult) may be at a broader level, incorporating methods across multiple classes and even across multiple threads. A change that breaks code at this level may still result in a “green bar” because, within a single-threaded test environment, the individual functions didn’t fail.

This isn't a case against unit tests by any means: they’re an important part of any software developer’s armoury; and the existence of well-written unit tests is a good sign that the team is taking their development seriously. But this article could be considered to be a cautionary note on over-reliance on unit testing, or on the mistaken belief that unit tests can prove that a system is bug-free. ®

Sponsored: Today’s most dangerous security threats