Feeds

Amazon fine print limits potential credits for cloud outage

Rackspace CTO talks EC2 failure

  • alert
  • submit to reddit

Next gen security for virtualised datacentres

Amazon's EC2 contract promises its infrastructure cloud will provide 99.95 per cent "uptime" over the course of a year. But that doesn't mean the company will dish out credits in the wake of the outage that affected some users for as many as four days, if not more.

Though the EC2 service level agreement says users will be eligible to receive credits if the service doesn't meet a 99.95 per cent "annual uptime percentage" within a particular geographical region, this only applies to users who have spread their applications across multiple "availability zones" – subsections of Amazon's regional services designed not to fail at the same time.

The outage did hit multiple zones in EC2's East Region – served up from at least one facility in Northern Virginia – but it appears that multiple zones were affected for only about three hours.

Amazon has yet provide details about the outage, and many third-party commentators have failed to realize that the service level agreement is more complex that it seems. The availability zone setup continues to cause confusion, in part because people don't actually read SLAs, but also because Amazon has yet to describe how the zones are designed and how they operate.

At 1:41am Pacific time on Thursday, Amazon said with a post to its status page that it was investigating connectivity issues with its Elastic Compute Cloud (EC2) service, which provides on-demand access to processing power across the net. According to one status message, the problem began with a "network event" that caused the service to re-mirror a large number of Elastic Block Storage volumes in the East Region. Elastic Block Storage provides storage that's independent of particular server instances on EC2.

Amazon divides EC2 into multiple geographic regions, and some regions – including the East Region – are divided into multiple "availability zones". Amazon has always said that these zones are protected from each other's outages. "Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones," the company's website reads. But the East Region outage spread across multiple zones.

Some felt that Amazon had broken its promise over availability zones. But the particulars of the service-level agreement add a new twist to this discussion. "'Annual Uptime Percentage' is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of 'Region Unavailable'," the agreement reads. "'Region Unavailable'...means that more than one Availability Zone in which you are running an instance, within the same Region, is 'Unavailable' to you."

According to Amazon's status messages, multiple availability zones experienced problems for about three hours on Thursday, then the problem was isolated in the zone where it began. John Engates, the chief technology officier at Rackspace, which operates a cloud service similar to Amazon's, believes Amazon is unlikely to provide many credits in the wake of the outage.

"More than one availability would have to go down for you to receive a credit, and you have to be down for a considerable about of time," Engates told us during a conversation at this week's OpenStack design summit in Santa Clara, California. "I really doubt they're pay a lot on credits."

Rackspace's Cloud Servers service does not provide a setup analogous to Amazon's availability zones. The Rackspace service-level agreement guarantees uptime for particular components within each service region, including its network, its data center infrastructure, and individual hosts. The company operates separate data centers in Texas, Chicago, and London.

Judging from Amazon's status messages, Engates says, he believes that Amazon's outage spread across multiple availability zones because the company was using availability zones to mirror Elastic Block Storage data for other zones. "Rather than replicating data within a zone, I think they were replicating between zones," he said. "And it seems that when they had a failure in one zone, traffic waterfalled into the other zones. It's like if there was a fire in a hotel. We would have to evacuate to the hotel across the street, and there may not be enough room in the hotel across the street for everyone to get a room."

It appears that the outage affected only those who were using Amazon's Elastic Block Storage service.

Engates says that Amazon's cloud service and its service-level agreement is set up in such as way that users must ensure redundancy across zones – if not across entire regions. "You have to think about how to allocate your application across multiple resources to maximize that SLA," he said. "Those that did so – NetFlix is one example of a big customer – did not experienced the same kind of outages as people who were very localized. You could put some of the blame on Amazon, but some of the blame on the customer."

Yes, multiple zones were hit by the outage. But Amazon does not promise 100 per cent availability. The company has said, however, that it is unable to restore EBS volumes for some customers. About 0.07 per cent of EBS volumes in the East Region, a status message indicates, "will not be fully recoverable". ®

5 things you didn’t know about cloud backup

More from The Register

next story
The Return of BSOD: Does ANYONE trust Microsoft patches?
Sysadmins, you're either fighting fires or seen as incompetents now
Oracle reveals 32-core, 10 BEEELLION-transistor SPARC M7
New chip scales to 1024 cores, 8192 threads 64 TB RAM, at speeds over 3.6GHz
Microsoft: Azure isn't ready for biz-critical apps … yet
Microsoft will move its own IT to the cloud to avoid $200m server bill
Docker kicks KVM's butt in IBM tests
Big Blue finds containers are speedy, but may not have much room to improve
US regulators OK sale of IBM's x86 server biz to Lenovo
Now all that remains is for gov't offices to ban the boxes
Gartner's Special Report: Should you believe the hype?
Enough hot air to carry a balloon to the Moon
Flash could be CHEAPER than SAS DISK? Come off it, NetApp
Stats analysis reckons we'll hit that point in just three years
Dell The Man shrieks: 'We've got a Bitcoin order, we've got a Bitcoin order'
$50k of PowerEdge servers? That'll be 85 coins in digi-dosh
prev story

Whitepapers

Endpoint data privacy in the cloud is easier than you think
Innovations in encryption and storage resolve issues of data privacy and key requirements for companies to look for in a solution.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Top 8 considerations to enable and simplify mobility
In this whitepaper learn how to successfully add mobile capabilities simply and cost effectively.
Solving today's distributed Big Data backup challenges
Enable IT efficiency and allow a firm to access and reuse corporate information for competitive advantage, ultimately changing business outcomes.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.