Whoops, my cloud's just gone titsup. Now what?
No email? No CRM? No Daily Mail iPad edition? You need a plan
“We apologise for the disruption. We have identified the cause and are working to restore the service as quickly as possible.”
Attempting to log onto your cloud service and being faced with a message like that is guaranteed to strike fear into the heart of anybody who has entrusted all or just part of their company's CRM, email or A.N. Other piece of critical IT infrastructure to some hosted or software-as-a-service provider.
And yet that is precisely the message that greeted Adobe customers logging on to its Creative Cloud service back in May. Adobe blamed "database maintenance activity" for causing a massive outage of the service, which went down for more than a day. The outage even prevented the interactive edition of the Daily Mail, Daily Mail Plus, from appearing.
The Daily Mail is one of Britain's best-selling newspapers and its cousin online an astonishingly successful website, both owned by the success-hungry Associated Newspapers for whom failure is not an option. It delivered Daily Mail Plus for Amazon, Android and iPad devices in February 2013 precisely to emulate the success of the printed edition and the website version, and it did so not without a little cooing.
Associated had used Adobe's publishing suite, with the tablet edition published using Creative Cloud: Adobe's suite for graphic design, video and photography, web development and cloud services, tailor-made for media types. Accounts are managed through Adobe online log-ins, with the whole pile running on, yup, another cloud: Amazon's AWS.
The result was described as Daily Mail "re-imagined" on the tablet.
Just over a year later, however, when the Adobe log-ins and the online publishing suite froze, Associated was left red-faced as readers got locked out of their accounts and advertisers went bereft of an audience.
Associated is a high-profile victim. Outages have taken down lesser names more often. With each outage of Salesforce or Office 365, the companies' user forums quickly become clogged with painful stories of entire organisations without email or customer data for whole days at a stretch. Productivity is lost along with business as companies grope in the dark or rediscover what life was like before the browser.
Despite service providers pushing the reliability of their services, outages are a very likely reality for those using cloud services. With household name players including Adobe, Facebook, Microsoft, Salesforce and eBay all suffering “events” in recent months, experts warn that the chance of being affected by an outage is something you need to plan for, particularly as adoption of cloud services gains traction.
With cloud computing accepted by many as a natural evolution of IT service delivery and poised for explosive growth over the next two years, lawyers are ramping up their focus on the contractual considerations thrown up by the possibility of outages.
Richard Nicolas, an IT partner at law firm Browne Jacobson who also lectures on contract and IT law, warns that as adoption of the cloud moves from toe-dipping in relatively peripheral applications such as CRM to more mission-critical systems, planning your outage survival strategy isn’t something that can be left to chance.
It's an outrage: Adobe's Creative Cloud outage swallowed Daily Mail for tablets
“With the public cloud, your protection in the event of an outage is minimal," Nicolas told The Reg. "But they’re usually quite cheap services. You really get what you pay for. There may be SLAs but often they contain a lot of get-out clauses, including force majeure, which can be expanded to cover a wide range of issues that make it very difficult to complain about an outage."
Prevention is, as the saying goes, better than cure. Performing due diligence on suppliers is vital to taking as much risk as possible out of the equation. Cloud providers are increasingly following the example set by Microsoft with its Office 365 service and publishing quarterly stats on their availability. It’s a useful guide. Bear in mind an SLA of 99.99 per cent still allows about 52 minutes of downtime in a year.
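To see where that figure comes from, here is a minimal sketch of the arithmetic, assuming a 365-day year and an SLA measured over the whole year. The function name is made up for illustration and is not any provider's API.

```python
# Minutes in a non-leap year: 365 days x 24 hours x 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes a year a service may be down while still meeting the SLA figure."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (100 - availability_percent) / 100


# Compare a few commonly quoted availability levels.
for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes a year")
```

At 99.99 per cent the allowance works out at roughly 52.56 minutes a year; drop to 99.9 per cent and it grows tenfold, to nearly nine hours. Many SLAs are measured monthly rather than yearly, so check the measurement window before relying on numbers like these.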
As the industry matures, experts predict that service will increasingly become a differentiator for forward-thinking cloud providers struggling to compete on price. We’re likely to see some suppliers beefing up their SLAs as they grapple to instil greater confidence among customers by allowing them to protect against the financial risks associated with the loss of IT services.
And about time too: 80 per cent of companies perceive current SLAs as insufficient for their needs and say they fail to address the risks of moving applications into the cloud and managing them there, according to research from Compuware. Nearly three-quarters said they believe their cloud providers could be hiding problems at an infrastructure or platform level that impact on the performance of applications.
For customers faced with an array of cloud service options, investing time and money in product evaluation will pay off, according to Compuware's director of cloud solutions Michael Allen.
“Set up your own benchmark with three or four different Content Delivery Network [CDN] providers - but don’t put users on it," Allen said. "Apply scale to the app and get the geographic coverage as well. Allow a month for testing. The longer you evaluate, the more accurate a picture of performance and availability you will get.”
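As a rough illustration of what such a benchmark might record, the sketch below summarises test requests per provider into an availability percentage and a median response time. It is not Compuware's method: the `Probe` record, the provider labels and the `summarise` function are all hypothetical, and the step of actually sending requests from different locations is left out.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Probe:
    provider: str      # hypothetical provider label, e.g. "cdn-a"
    ok: bool           # whether the test request succeeded
    latency_ms: float  # response time of the request in milliseconds


def summarise(probes):
    """Return {provider: (availability_percent, median_latency_ms)}.

    The median covers successful requests only and is None when a
    provider had no successful requests at all.
    """
    by_provider = {}
    for probe in probes:
        by_provider.setdefault(probe.provider, []).append(probe)

    summary = {}
    for name, items in by_provider.items():
        good = [p.latency_ms for p in items if p.ok]
        availability = 100 * len(good) / len(items)
        summary[name] = (availability, median(good) if good else None)
    return summary
```

Fed a month of probes from several regions, a summary like this gives a like-for-like comparison between providers. As Allen notes, the longer the evaluation runs, the more the figures can be trusted.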
When it comes to surviving the worst, having a business continuity and disaster recovery plan is key. The most resilient business continuity plans will involve multiple availability zones and backups, ideally through secondary providers.
Connectivity improvements driven by fibre optic rollouts will offer companies a long-awaited lower-cost alternative to a leased line for backup purposes. While investing in two leased lines from different suppliers might sound like a failsafe option, bear in mind they may be carried on the same route, and a digger through the pipe won’t leave you much in the way of options.
BT offers diverse fibre carried on two different routes but the cost-prohibitive nature of the service means it’s only really an option for the businesses with the deepest pockets.
People, not process, will help get you through
Bear in mind, too, that a manual workaround may be perfectly adequate for the duration of the outage. Paul Burns, national technical director at IT services company TSG, told The Reg he believes that we’ve become so reliant on IT we tend to forget that an automated failover to something else may not necessarily be the best solution.
A disaster recovery plan is only the start: communicating it, educating staff and running regular simulations (every six to 12 months, depending on how critical the service is to your business) are all essential, too. “Often SLAs only kick in when you put in the call to the service provider, so it’s about knowing how to contact the service provider and making sure you have concrete evidence about the fault,” Allen said.
“Knowing when to flick the switch to your back-up is one of the most difficult decisions to make, particularly as you’re unlikely to get much reliable information from the cloud provider about when things are likely to be back up and running,” Burns added.
The skill-sets needed to manage business in the cloud are a potential stumbling block: not only will you need to understand the reputational risk of services not being available but you'll also have to manage the commercial and service delivery aspects of the cloud. Bloor Research reckons you need what it calls a collaborative approach to resolving problems. Bloor Research associate analyst Kevin Borley said: “It’s no good saying: 'OK Google, have all my apps’ without updating and rethinking your disaster recovery.”
Companies also need to do some corporate soul-searching to work out if the cost of guaranteeing resilience balances the delivery requirements of that service. Customers should consider how quickly they need to be back up and running so as not to negatively impact the business.
Needless to say, the implications of downtime can be serious – not just in terms of direct costs to the business, but also the financial knock-on effect of services being unavailable. According to the Technology Business Management Council, which develops best practices for measuring the cost of IT including cloud services, understanding cost measurement is essential not just to proper contingency planning but to gaining increased leverage in negotiations over compensation if things do go wrong.
No customer data? Better check the Salesforce admin portal
“You need a system in place that is transparent about the cost and the performance of a cloud service, so if there’s an outage customers get an appropriate amount of compensation,” Council founder and president Chris Pick said.
Few agree that compensation is an avenue worth pursuing. While promises of compensation by suppliers may give you reassurance about the reliability of their service, in reality it’s all rather theoretical, legal expert Nicolas warns. Public cloud providers will have an absolute exclusion for all types of loss and a cap on liability that is usually very low and normally relates to the amount of time the service is unavailable.
Adobe pledged to consider compensating customers on a "case-by-case basis", but it's not known which customers were refunded or how much they received.
For public cloud services, typically you’ll be entitled to one day of payment back for every hour the service was down to a maximum of 30 days. Those using hybrid and private cloud services might be able to negotiate a better deal in terms of the limits of liability.
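A back-of-the-envelope sketch of that typical credit rule shows how modest the sums are. It assumes a flat monthly fee, a 30-day billing month and part-hours credited pro rata; none of those details come from a real contract, and the function name is invented for illustration.

```python
def service_credit(monthly_fee: float, hours_down: float,
                   days_in_month: int = 30, cap_days: int = 30) -> float:
    """Credit under a 'one day of fees per hour of downtime' rule, capped.

    Assumptions for illustration only: fees accrue evenly per day and
    fractions of an hour earn a matching fraction of a day's fee.
    """
    if hours_down < 0:
        raise ValueError("hours_down cannot be negative")
    credited_days = min(hours_down, cap_days)  # one day per hour, up to the cap
    return monthly_fee / days_in_month * credited_days


# A service costing 300 a month that is down for 26 hours earns 26 days of fees.
print(service_credit(300, 26))
```

Under these assumptions a customer paying 300 a month gets 260 back after 26 hours offline, and never more than the 300 monthly fee however long the outage drags on. The credit is tied to the subscription price, not to what the downtime cost the business.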
If there are no exclusions in the contract, you could potentially recover direct losses from the breach which may include a certain amount of loss of profit, compensation for lost staff productivity, and even reimbursement of compensation you’ve had to pay your customers. “Even then it’s very unlikely that a supplier would agree to cover your loss of data,” Nicolas warns.
And while SLAs may compensate paltry sums for loss of access to the service, the cost of reputational damage and the intangible loss of business could reach into millions of pounds.
Outages undoubtedly inflict untold misery on those customers left floundering.
Still, service providers are working on their reliability and the biggest names in the game (Amazon, Apple, eBay, Facebook, Google, Microsoft and Yahoo!) invested more than $27bn in data centres in 2013, a 30 per cent increase over the previous year, according to 451 Research.
Also, use of cloud computing services is expected to grow as customers turn off their own servers in exchange for something they pay for as a service.
With that in mind, it's worth thinking of cloud services as if they were utility providers. You probably wouldn’t even try to sue an electricity supplier if you had a power cut. But you would expect the power to come back on and, if it was that critical to you, you’d have laid on some backup options in case of an outage. ®