Bayesian analysis, by Pan Pantziarka
Some of the hardest questions to answer in development are about whether testing is "finished": Have we done enough testing, where should we concentrate testing effort, and when do we release the software?
There are usually countless pressures influencing these decisions – with enormous penalties in terms of loss of prestige as well as financial consequences if the decisions are badly wrong – and yet very often we depend on "gut feel" for an answer.
Even when software is passed through a formal testing process, the question of when to stop testing is not an easy one to answer. Does the fact that a component or module has had a lot of defects picked up (and corrected) during testing tell us more about the quality of the component or the efficacy of the tests?
Given the reality that we can never get the resources required to test as much as we would want, and, just as importantly, that the testing process is itself imperfect, is there anything better than intuition to help developers gauge when software is ready to roll?
One of the things that would help is an objective model of the quality of a package at any given phase of the development lifecycle. Such a model could then be used to accurately predict the number of defects that remain to be discovered at any stage. It then becomes possible to base the "when do we release" decision on something other than gut instinct.
This is precisely the task that Paul Krause of the University of Surrey set out to tackle at the Philips Software Centre (PSC) with Martin Neil and Norman Fenton of Agena Ltd.
Using Bayesian Networks, they have developed a general model of the development processes at PSC, which has been applied to a number of different software projects (see the detailed research paper here, together with the references therein). Similar work has also been done at Motorola Research Labs in Basingstoke and at QinetiQ.
Bayesian Networks, also known as Bayesian Belief Networks or graphical probabilistic models, are ideal for tasks of this kind. They are a technique for representing causal relationships between events and utilising probability theory to reason about these events in the light of available evidence.
Set of nodes
A Bayesian Network consists of a set of nodes which represent the events of interest, and directed arrows which represent the influence of one event on another. Each node may take on a range of values or states – a node which represents a thermostat, for example, may have states corresponding to "hot" or "cold", or it could represent different temperature ranges or even a continuous temperature scale.
Probabilities are assigned to each node's states, corresponding to the belief that the node takes on each of those values. Where a node is influenced by other nodes (i.e. it has inputs from other nodes), it is necessary to compute the conditional probability that it takes on a given state, based on the states of those causal nodes.
Bayes' Theorem is used to simplify the calculation of these conditional probabilities. When a node takes on a given state – for example, a thermostat with only two states reads "hot" – the probability for that state is set to one and the probability for the "cold" state is set to zero. This information is propagated through the network, updating the other nodes to which it is connected and resulting in a new set of "beliefs" about the domain being modelled.
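This update step can be sketched in a few lines of Python. The sketch below is a minimal, hand-rolled illustration (not taken from any of the tools mentioned in this article): a hidden "weather" node influences the thermostat node from the example above, and observing the thermostat reading "hot" revises the belief about the weather via Bayes' theorem. All the probability values are made up for illustration.

```python
# Minimal two-node Bayesian network: Weather -> Thermostat.
# All numbers are illustrative assumptions, not real data.

# Prior belief about the hidden weather state.
p_weather = {"warm": 0.5, "cold": 0.5}

# Conditional probability table: P(thermostat reading | weather).
p_reading = {
    "warm": {"hot": 0.9, "cold": 0.1},
    "cold": {"hot": 0.2, "cold": 0.8},
}

def update(prior, cpt, observed):
    """Bayes' theorem: posterior over the cause, given an observed reading."""
    unnorm = {w: prior[w] * cpt[w][observed] for w in prior}
    z = sum(unnorm.values())          # total probability of the evidence
    return {w: v / z for w, v in unnorm.items()}

# Evidence arrives: the thermostat reads "hot" (probability of that state
# is effectively set to one, and the belief propagates to the weather node).
posterior = update(p_weather, p_reading, "hot")
print(posterior)  # belief in "warm" rises from 0.5 to about 0.82
```

In a real network the same propagation runs across many connected nodes, which is what tools like AgenaRisk automate; the two-node case above is just the smallest instance of the idea.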
Bayesian Networks can be used in a number of ways. Firstly, the structure of the network and the various probabilities mean that it is possible to use them for predictive purposes. In other words, one can say that given this structure and these facts, event x has y chance of occurring. Alternatively, the same network can be used to explain that event x took place because of the influence of events y and z. Reasoning can move in either direction between causes and effects.
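Both directions can be shown on one tiny model in the spirit of this article's subject matter. The sketch below is purely illustrative (it is not the Philips/Agena model, and the numbers are invented): a hidden "component quality" node influences whether many defects turn up in test. The predictive direction asks how likely a heavy defect count is before testing; the diagnostic direction asks, once many defects *have* been found, what that implies about the component.

```python
# Hypothetical two-node model: ComponentQuality -> DefectsFoundInTest.
# All probabilities are invented for illustration.

p_quality = {"good": 0.7, "poor": 0.3}     # assumed prior, e.g. from history

# P(many defects found in test | component quality).
p_many = {"good": 0.1, "poor": 0.6}

# Predictive direction: chance of seeing many defects, before testing starts.
p_many_defects = sum(p_quality[q] * p_many[q] for q in p_quality)
print(p_many_defects)     # 0.25

# Diagnostic direction: many defects WERE found - revise belief about quality.
posterior_poor = p_quality["poor"] * p_many["poor"] / p_many_defects
print(posterior_poor)     # 0.72 - belief in "poor" jumps from 0.30 to 0.72
```

The same arithmetic, run over a network that also models test efficacy, is what lets a tool distinguish "the component is poor" from "the tests are thorough" when defect counts are high – the question posed at the start of this article.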
Applying these principles to software development at Philips, the team created, and linked, Bayesian Networks for every stage in the lifecycle – from specification through to design and coding, unit test and integration. Using an approach pioneered in previous research projects, the sub-networks for each phase were constructed from a set of templates, leading to an approach that Fenton and Neil dubbed object-oriented Bayesian Networks.
The end result was called AID (Assess, Improve, Decide). The model takes in data about the type of product (number and scale of components, experience of the developers, etc.) and other data relevant to each phase of the lifecycle, and is able to deliver an estimate of the number of defects at any point in the process. The network was validated by using historical data from a number of projects and comparing estimated defects with those actually found.
The results have been very encouraging and the AID tool is being further developed so it can be used in a production environment. One other property of Bayesian Networks is that techniques for "theory revision" – or learning from experience – exist, so that data from each project can be used to refine and improve the network.
Many of the lessons learned from the work at Philips – such as dynamic discretisation of probability intervals – have been incorporated into AgenaRisk, a tool which can be used to build software defect risk models. While we are a long way from having such Bayesian models available as Eclipse or Visual Studio add-ins, the work is progressing in the right direction, and once the results start to trickle out from research labs and into the wild, perhaps the answers to those hard questions won't seem so shrouded in doubt after all.
Next page: Formal methods, by David Norfolk
Well, I've written about Formal Methods before. And yes, we only scratched the surface - but a) I thought that treating one company in reasonable (for some value of reasonable) depth was better than the usual whirlwind tour of companies most people haven't heard of, with a bare para on each one; and b) this article was already about as long as could work online.
But I'm happy to cover more formal approaches and in more detail, if those involved can put their case forward as well as, say, Praxis can (I didn't find Praxis through press channels, I was originally impressed by hearing Rod Chapman speaking at a BCS meeting). And, of course, if enough Reg Dev readers express an interest...
BTW, I do realise that Microsoft has now discovered formal methods (I did mention it) but I'm waiting for the weekly bugfix stream to dry up before I get too enthusiastic.
And I'll check out that URL - thanks.
Ahead of the game?
Well, probably CbyC and Bayesian analysis are still ahead of the game. There are certainly development shops where even test automation would be a radical innovation. And possibly, quality in consumer software isn't that important (except that the games industry has to take bugs seriously - and silly bugs in wordprocessors and spreadsheets can really spoil your day).
But I think it's the "attitude" I'd complain about - not that of "QA Helper", who obviously understands the issues and has made a reasoned choice (not necessarily one that I'd agree with, but I could be wrong) as to what is appropriate in particular environments; but the attitude that says "software has bugs, always has, always will; so let's not worry about it".
Software doesn't have to have bugs in it; so if it does, hopefully someone has done a proper risk assessment for defect removal vs. delivery; but perhaps someone just decided that the customer wouldn't mind, or that the customer prefers low cost to quality. Or perhaps the developers are incompetent – or, more likely, untrained and/or badly led.
Unrealistic? Well, possibly, but I mostly remember people regretting not getting it right - and stuff delivered too quickly taking even longer to deliver in the end. I also remember Toyota's success with "Lean" and how competitive British Leyland was in comparison.
Remember (and I do mention cultural issues) that there is some evidence that removing defects with formal methods is CHEAPER than doing it by testing (although you can't replace testing entirely). But perhaps this is only true in fields where quality matters – not testing anything is cheapest of all, if someone thinks that production failures don't cost anything.
Nice article, shame it barely scratched the surface
It's nice to see mathematical approaches being mentioned in the mainstream e-press at last. What a shame that the piece on formal methods concentrated on just one company's system and failed to touch on the many other techniques being used.
SPARK may be great for embedded aerospace systems, but if you are developing a large commercial application, you probably want at least basic OO features, like inheritance and polymorphism. Enter Verified Design-by-Contract, ESC/Java, Perfect Developer and Microsoft's Spec# (yes, even Microsoft is looking at FM these days).
Or maybe you want to prove that your software can't get locked up in states from which there is no way out. Try a model checking tool - choose from FDR, Spin, SMV, SLAM... (that last one is from Microsoft again - maybe they're on to something!).
QA Helper argues that writing a formal specification is expensive and time-consuming. Sure it is - but the formal spec is shorter than the code, and quicker to write. The trick is to use the spec to greatly shorten the coding time. We generate most of our code directly from the specification.
A lot more info can be found in the directory of formal methods maintained by Jonathan Bowen of London South Bank University – http://www.afm.sbu.ac.uk/.