Red rag, meet bull: The software resilience gamble
This topic really got you going
New research alert: You, the fine Reg readers, recently regaled us with the gory details of your application failures - and it ain't too pretty. It turns out that a large majority of you find business is disrupted by app failure way too often.
Of the 1200+ readers who took part in the research, a whopping 84 per cent said their business suffered disruption caused by application failure at least once a quarter, with 33 per cent complaining of the same once a month and 24 per cent once a week.
But not every application failure is catastrophic, and it’s important to look at impact as well as frequency. The majority of failures simply result in a degree of user inconvenience. However, incidents with more serious consequences occur more frequently than most people might imagine. For example, one in five organisations confess to suffering tangible business damage from application failure on at least a quarterly basis.
Size of business has little impact on frequency, though different availability hotspots do exist. It was interesting to see that email availability is the bugbear for smaller businesses. That’s not funny if, as is the case at Freeform, email is a crucial lubricant to doing business.
So from a business perspective, it makes for dismal reading. No business would consciously sign up to this level of application failure.
Judging by the number and length of open comments we received from you, the work being caused in this area fosters a fair degree of frustration among IT professionals.
So how is this burden on the IT department being generated?
The first and obvious culprit is ‘stuff just fails occasionally’. Is resiliency, then, a well-worn topic within the software development lifecycle? Nope. It doesn’t get a look-in in most organisations when scoping and budgets are laid out. Essentially, ‘insurance’ (think of things like automatic failover) isn’t given much thought until after things have fallen over.
What else is adding to the frustration? We picked up a lot of anecdotal evidence that operational IT feels largely ignored during the software development lifecycle. So it ends up managing applications which are not ‘designed for operations’ and which represent a risk, both to the company and to its own workload.
At the coal face, we got the impression that the SLAs in play in many organisations aren’t worth the PDF they’re written on. Neither is the monitoring. Why else would 76 per cent of respondents tell us they don’t get enough warning of problems?
On the upside, the data showed very clearly that minimising exposure to failure can be achieved through a combination of good process and appropriate technology to provide system resiliency and/or rapid recovery. Three things can all have a significant impact on reducing the frequency of disruption due to system failure: a structured approach to defining and specifying application software projects that includes input from the right people (ie you lot); consideration of resiliency and availability early in the project lifecycle; and explicit investment in appropriate fault-tolerance and recovery solutions.
Sadly, there’s a significant gap between where most organisations are and their ideal position. The evidence, however, points to a need for some practical but fundamental changes which IT can drive to help businesses take the gamble out of software resilience.
Get your mitts on the full report right here.
As usual, feedback very welcome. ®