New and inventive code is transforming your business – and bringing with it new and inventive ways for things to fail

Here's how to ensure biz continuity at your workplace


Opinion Businesses are becoming increasingly digitalised, with operations and customer experiences relying on data and devices being online all the time.

Application architectures and practices are evolving around high-availability and low-touch administration to allow rapid change, while also serving a variety of platforms.

Increasingly, this is putting tier-one business applications, such as enterprise resource planning and collaboration, on the frontline. These tier-one apps are the lifeblood of digital business: of real-time manufacturing, of decision taking, of customer interaction.

So, OK, you’ve digitalised your operation, but what if your system goes down or data is lost? Remember WannaCry in 2017: critical hospital systems locked down by a global ransomware outbreak? Ransomware is on the rise, and WannaCry became the most highly publicised example, but there are plenty of others. The US hospital that paid $60,000 to end a ransomware infection, the UK local authority unable to function following an attack, charities also targeted. And there are many more similar examples that aren’t public enough to make the press.

No less real are outages caused by a range of other factors that include human error and natural disaster, such as extreme weather.

These examples point to the key problem facing organisations moving to a digital-centric way of working: when the applications or the data suddenly become unavailable to the operation, the risk that the business will stop functioning becomes a real and frightening one.

No matter how many separate data centres you have, nor how many diversely routed connections you use to hang them together, you are in trouble if data cannot be retrieved and operations cannot continue. The only defence is to have policies, procedures and infrastructure in place that can help you recover immediately. Indeed, more data centres can actually complicate things: the more layers of infrastructure you employ, and the larger your number of applications and data sources, the greater the challenge of delivering a recovery policy that is not just suitable but also simple to administer and affordable.

So what are the rules of business continuity for this digitalised world? To an extent, the new rules are a continuation of the best of existing practices. Digitalisation might be changing business, but many of the traditional concepts of business continuity still apply – particularly the non-technical ones. Your core continuity team must run the crisis process and your divisions must agree to provide staff for the “war room” – the unit at your organisation that coordinates the response to any major emergency.

There are other considerations, though.

The 'R' word

Risk assessments are, and always have been, the core element of business continuity – and that remains true, only with greater emphasis. The key is to conduct regular risk assessments: while the likelihood of a technology failure may not change all that much over time, the potential impact of an issue is likely to grow with the level of reliance you place on the systems.

If you replace more and more manual or paper systems with automated electronic ones, but fail to reevaluate the risk, your mitigating actions – which include the business continuity plan – will be out of date and most likely inadequate. Greater reliance on technology equals a greater need for risk evaluation.
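
The point about impact growing while likelihood stays flat can be sketched with a simple likelihood-times-impact score. This is a minimal illustration, not a prescribed method: the 1-to-5 scales, the `Risk` class, the `needs_review` helper and its threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (business stops)

    def score(self) -> int:
        # Classic risk-matrix score: likelihood multiplied by impact.
        return self.likelihood * self.impact


def needs_review(previous: Risk, current: Risk, threshold: int = 5) -> bool:
    # Flag a risk whose score has risen by at least `threshold`
    # since the last assessment (the threshold is an assumed value).
    return current.score() - previous.score() >= threshold


# Same failure likelihood, but the paper fallback has been retired,
# so an outage now has a far greater impact on the business.
before = Risk("order system outage", likelihood=2, impact=2)
after = Risk("order system outage", likelihood=2, impact=5)

print(before.score(), after.score(), needs_review(before, after))
```

Nothing about the technology changed between the two assessments; only the reliance placed on it did, and that alone is enough to push the risk back onto the review list.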

Reliance on suppliers

Service-level agreements are important, and if they don’t keep pace with your growing reliance on suppliers then you’re going backwards. Regular reviews of SLAs are therefore needed to ensure they match requirements. With my ISO 27001 hat on, I can’t help referring to sections like A.15.2.1, which says: organizations shall regularly monitor, review and audit supplier service delivery. For the average company this is one of the most crucial aspects of information security, but it applies equally across the entire spectrum of system provision.

The other 'R' word

The more reliance that’s placed on digital systems, the more one must consider the resilience of the design of those systems. Perhaps a dual-server setup was fine when you ran half of your business on a hypervisor platform, but putting more eggs into the same number of baskets requires that those baskets are bulletproof: they must either recover from failure quickly or avoid it completely. You might, therefore, need to revisit the fundamentals: upgrade to modern infrastructure using all-flash storage, or increase the throughput and the volume of traffic that your network is capable of processing.

Assume the technology could die

It should be clear by now that one of the possibilities in a disaster is that some or all of the technology systems may well be unavailable – perhaps for an extended period. And this may not be because there’s been some disaster that wiped out both your primary and secondary systems: in my experience, a far more common occurrence is for one side of the operation to keel over but for the switch-over of services to the secondary element to fail. You’ll therefore need business continuity capable of responding to this kind of unexpected scenario.
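
One way to catch a broken switch-over before a real disaster does is to rehearse it. The sketch below is purely illustrative: `failover_drill`, `switch_over` and `probe_secondary` are hypothetical names, and the two callables stand in for whatever mechanism your own environment uses to move services and to check that the secondary is actually serving.

```python
from typing import Callable


def failover_drill(
    switch_over: Callable[[], None],
    probe_secondary: Callable[[], bool],
) -> str:
    # Step 1: attempt the switch-over and report if the mechanism itself breaks.
    try:
        switch_over()
    except Exception as exc:
        return f"switch-over failed: {exc}"
    # Step 2: a switch-over that "worked" is only useful if the
    # secondary is really answering requests afterwards.
    if not probe_secondary():
        return "switch-over ran but secondary is not serving"
    return "ok"


# Stand-in stubs for the drill; real ones would call your own tooling.
def broken_switch() -> None:
    raise RuntimeError("DNS update rejected")


def working_switch() -> None:
    pass


print(failover_drill(broken_switch, lambda: True))
print(failover_drill(working_switch, lambda: False))
print(failover_drill(working_switch, lambda: True))
```

The drill separates the two failure modes described above: the switch-over mechanism breaking, and the switch-over appearing to succeed while the secondary is not in fact usable.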

Keep an eye on the new ways for things to go wrong

The development of new and inventive code and algorithms is helping the world in its quest for digital transformation. The downside, though, is that it brings new and inventive ways for things to go wrong thanks to greater complexity, new and inventive hacks, and more. The rule, then, is to beware: expect that the unwanted will take place when new applications and algorithms hit your infrastructure, and begin to put in place measures to protect against potential exposure.

Risk treatment

Systems are the muscles and sinews of your business, and so the development of those systems has to take place in the context of the business. You must, therefore, take into account the risk requirements of your business as a whole to build the kind of business continuity system that suits it. For each risk you identify you need to decide how you’re going to deal with it and document the approach for your business continuity strategy.

It’s this concept of risk that leads us on to the area where the rules of business continuity are really changing in the wake of digitalisation.

As said, digitalisation means the impact of a failure is becoming greater, so it’s no longer sufficient to build and deploy systems with only the technical or only the business considerations in mind: both must be considered together, holistically.

How many system managers think about real-world business continuity when designing a new data centre? Of course they consider factors such as resilience and failover, and they might have a grasp of concepts such as Recovery Point Objectives. These are indeed elements of business continuity, but they and the assumptions that make up the business continuity plan only come into effect once the finished product – the data centre – is running and an actual problem occurs. That is when the design meets the real world, and where the practicalities can trip up planning by exposing assumptions made purely in technology or business silos without holistic consideration.
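
A Recovery Point Objective is the maximum amount of data loss, measured in time, that the business says it can tolerate. As a minimal sketch of turning that planning assumption into something checked in operations, the function below compares the age of the last good copy of the data against the objective. The function name, the timestamps and the four-hour figure are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def rpo_breached(last_good_copy: datetime, now: datetime, rpo: timedelta) -> bool:
    # If the newest recoverable copy is older than the RPO, a failure
    # right now would lose more data than the business agreed to accept.
    return now - last_good_copy > rpo


# Hypothetical figures: a four-hour RPO agreed with the business.
rpo = timedelta(hours=4)
now = datetime(2020, 1, 15, 12, 0)

recent_copy = datetime(2020, 1, 15, 9, 30)  # 2.5 hours old
stale_copy = datetime(2020, 1, 15, 6, 0)    # 6 hours old

print(rpo_breached(recent_copy, now, rpo))
print(rpo_breached(stale_copy, now, rpo))
```

Run on a schedule against real backup or replication timestamps, a check like this tests the assumption continuously instead of leaving it to be discovered during an actual incident.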

In the digitalised business the concepts of business continuity don’t change much – but where and how they are implemented does. We need to think through how a failure scenario would play out right up front, when we’re designing systems. The operators of the systems need to be conscious of new threats and feed them into the business-continuity world.

In this world, centralised monitoring of a large and digitalised infrastructure means quick response. Only proper centralised management, easily accessible from anywhere – including out-of-band capabilities – lets you access kit that’s the wrong side of a failed WAN link. Configuring devices over the network from a central image gets you back up and running when you’ve replaced the kit in the data room that burned down. And considering the business continuity requirements up front and never losing sight of them in operations maximises your chances of minimising the impact.

DevOps for the Business Continuity world

If you’re starting to think: “Hang about, that sounds like DevOps” – a philosophy that’s making inroads in software development – then you should be, because that’s where we are.

Much like in the world of software, where there is increasingly no cut-off between those who develop code and those in operations who take over once the developers have finished building, business continuity can no longer be hatched in one part of an organisation and handed to another for implementation or action. Business continuity can no longer be a post-go-live concept.

It’s a core requirement that needs to form part of the entire lifecycle of systems – from initiation, through development, to improvement. In this world there’s no single silver-bullet product or service that delivers business continuity; rather, it’s a culture. And that culture comes not just from within – from your planning and preparations – but also from without, from your technology suppliers: you must pick the vendors that fit your needs and the business-continuity culture you’re building.

If you don’t, then your business-continuity strategy won’t just stand still, it’ll be moving backwards as the world moves on around it – and that’s bad for business. Your business. ®



Biting the hand that feeds IT © 1998–2020