How can you possibly test modern software fully?
Pairing up for fun and profit
The common assumption about software testing is that "more is better", and that testing every possible state and combination of variables guarantees you will find all the bugs.
In the real world, however, there is not enough time or enough testers to test every combination of every variable. Not all bugs will be found, making quality assurance a risk management discipline. How can you validate that your product is ready to ship within reasonable time and cost parameters? In other words, how can you manage the risk of not testing everything? One solution is to use structured testing methodologies, supported by proper tools, which help you quantifiably manage this risk.
Practically speaking, the role of quality assurance is to reduce the risk of these bugs ending up in the final product. Software complexity puts a huge burden on QA teams, which are typically much smaller than the development teams writing the software (it's even worse if there isn't a QA team and developers take on the role part time). It is also very easy for one developer to write a small amount of code that requires a significant amount of testing to ensure it functions properly in all situations.
For example, suppose you have to test a dialog box with three drop-down lists to see if any of the combinations cause the program component to fail. The first list has five options, the second has eight options, and the third has three (see Figure 1).
To determine all the possible combinations, you can create a matrix (Figure 2).
As you continue adding combinations, you discover that 120 test cases are required to cover all the possible combinations. You can also determine the number of combinations by multiplying the number of values available in each option (5 x 8 x 3 = 120). If each test takes around two minutes to perform, you are faced with about 4 hours of testing on a simple dialog box. What if you need to test 100 dialog boxes? What if some dialog boxes contain 15 options instead of three?
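The arithmetic above can be checked with a short Python sketch. The option names are placeholders invented for illustration, since the article gives only the counts (five, eight and three); the two-minute figure comes from the text.

```python
from itertools import product
from math import prod

# Hypothetical option values: only the list sizes (5, 8, 3) come from the article.
list_sizes = [5, 8, 3]
options = [
    [f"list{i + 1}_opt{j + 1}" for j in range(size)]
    for i, size in enumerate(list_sizes)
]

# Exhaustive coverage means one test case per element of the Cartesian product.
all_cases = list(product(*options))
total_cases = len(all_cases)

# The same number falls out of multiplying the list sizes together.
assert total_cases == prod(list_sizes)

MINUTES_PER_TEST = 2  # estimate used in the article
total_hours = total_cases * MINUTES_PER_TEST / 60

print(total_cases, total_hours)  # 120 4.0
```

Changing `list_sizes` to model a dialog box with 15 options shows how quickly the product grows.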
Now take the concept of complete coverage a step further and consider environmental variables such as operating system, database, and hardware components. How do you ensure you find a bug that occurs only when the application is running on Windows XP and is using MySQL without testing all the possible OS and database combinations?
These examples demonstrate how quickly complete coverage becomes unmanageable. Luckily, you can find most bugs without testing all the combinations. The simplest bugs are single-mode faults, which occur when one option causes a problem regardless of the other settings. For example, a printout is always smeared when you choose the duplex option in the print dialog box regardless of the printer or the other selected options.
Another type of bug occurs only when two options are combined: for example, the printout is smeared only when duplex is selected and the printer is a model 394. These are called double-mode faults. Finally, multi-mode faults, which occur when three or more settings produce the bug, are the types of problems that make complete test coverage seem necessary.
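As a rough illustration of the difference between the first two fault types, the sketch below models the smeared-printout example as two predicates. Everything except the duplex option and the "model 394" printer is a made-up assumption: the other printer models, the quality settings and the function names are hypothetical.

```python
from itertools import product

# Hypothetical settings; only "duplex" and "model 394" appear in the article.
DUPLEX = [True, False]
PRINTERS = ["model 394", "model 200", "model 500"]
QUALITY = ["draft", "normal"]

def single_mode_fault(duplex, printer, quality):
    """Smears whenever duplex is on, whatever the other settings are."""
    return duplex

def double_mode_fault(duplex, printer, quality):
    """Smears only when duplex is on AND the printer is a model 394."""
    return duplex and printer == "model 394"

cases = list(product(DUPLEX, PRINTERS, QUALITY))
single_hits = [c for c in cases if single_mode_fault(*c)]
double_hits = [c for c in cases if double_mode_fault(*c)]

print(len(cases), len(single_hits), len(double_hits))  # 12 6 2
```

In this toy model, half of all test cases expose the single-mode fault, while the double-mode fault shows up only in cases that happen to combine both triggering values.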
However, complete coverage is usually not necessary. A study by Telcordia Technologies found that "most field faults were caused by either incorrect single values or by an interaction of pairs of values" (Cohen, et al. 1996). Another study of the software in medical devices showed that only three of the 109 failures resulted from the combination of more than two conditions (Wallace, 2000).
If you have limited time and resources, you want to find the most common bugs and those that present the highest risk. Suppose the printer error only occurs when the operating system is Windows, the print option is set to duplex, the print quality is draft, and the collate option is not selected. Is it worth your time to find that bug? Does the bug present a big enough risk to the user or application that it will even require a software fix?
Except in the rare cases where life and death are at stake, you can achieve a statistically acceptable level of quality by testing less than 100 per cent of the combinations. One approach to doing this is called pair-wise or all-pairs testing.
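To make the idea concrete, here is a minimal sketch of one way to build an all-pairs test suite for the three-list dialog box. It uses a simple greedy strategy chosen for brevity, not the algorithm of any particular tool, and the function names and option values are hypothetical.

```python
from itertools import combinations, product

def all_pairs(params):
    """Every pair of values from two different parameters that must be covered."""
    pairs = set()
    for (i, values_i), (j, values_j) in combinations(enumerate(params), 2):
        for x, y in product(values_i, values_j):
            pairs.add(((i, x), (j, y)))
    return pairs

def pairs_in(case):
    """The value pairs exercised by a single test case."""
    return set(combinations(enumerate(case), 2))

def greedy_pairwise(params):
    """Repeatedly pick the full combination that covers the most uncovered pairs."""
    uncovered = all_pairs(params)
    candidates = list(product(*params))
    suite = []
    while uncovered:
        best = max(candidates, key=lambda c: len(pairs_in(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_in(best)
    return suite

# Hypothetical values for the three drop-down lists (5, 8 and 3 options).
params = [
    [f"A{n}" for n in range(1, 6)],
    [f"B{n}" for n in range(1, 9)],
    [f"C{n}" for n in range(1, 4)],
]

suite = greedy_pairwise(params)
covered = set().union(*(pairs_in(case) for case in suite))
assert covered == all_pairs(params)  # every pair appears in at least one case

print(len(suite), "pairwise cases versus", len(list(product(*params))), "exhaustive cases")
```

No pairwise suite for this dialog box can have fewer than 40 cases, because each of the 5 x 8 = 40 pairings of the first two lists needs a case of its own. That is still a third of the 120 exhaustive cases, and the saving grows as more parameters are added. Scanning every candidate on each pass is fine at this size but would not scale to large configuration spaces.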