Amazon fine print limits potential credits for cloud outage

Rackspace CTO talks EC2 failure


Amazon's EC2 contract promises its infrastructure cloud will provide 99.95 per cent "uptime" over the course of a year. But that doesn't mean the company will dish out credits in the wake of the outage that affected some users for as many as four days, if not more.

Though the EC2 service level agreement says users will be eligible to receive credits if the service doesn't meet a 99.95 per cent "annual uptime percentage" within a particular geographical region, this only applies to users who have spread their applications across multiple "availability zones" – subsections of Amazon's regional services designed not to fail at the same time.

The outage did hit multiple zones in EC2's East Region – served up from at least one facility in Northern Virginia – but it appears that multiple zones were affected for only about three hours.

Amazon has yet to provide details about the outage, and many third-party commentators have failed to realize that the service level agreement is more complex than it seems. The availability zone setup continues to cause confusion, in part because people don't actually read SLAs, but also because Amazon has yet to describe how the zones are designed and how they operate.

At 1:41am Pacific time on Thursday, Amazon said in a post on its status page that it was investigating connectivity issues with its Elastic Compute Cloud (EC2) service, which provides on-demand access to processing power across the net. According to one status message, the problem began with a "network event" that caused the service to re-mirror a large number of Elastic Block Storage volumes in the East Region. Elastic Block Storage provides storage that's independent of particular server instances on EC2.

Amazon divides EC2 into multiple geographic regions, and some regions – including the East Region – are divided into multiple "availability zones". Amazon has always said that these zones are protected from each other's outages. "Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones," the company's website reads. But the East Region outage spread across multiple zones.

Some felt that Amazon had broken its promise over availability zones. But the particulars of the service-level agreement add a new twist to this discussion. "'Annual Uptime Percentage' is calculated by subtracting from 100% the percentage of 5 minute periods during the Service Year in which Amazon EC2 was in the state of 'Region Unavailable'," the agreement reads. "'Region Unavailable'...means that more than one Availability Zone in which you are running an instance, within the same Region, is 'Unavailable' to you."
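To see why a roughly three-hour multi-zone outage may not trigger credits, the calculation quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Amazon's own accounting: the 365-day service year, the function names, and the zone labels are assumptions made for the example, and the three-hour figure is the approximate duration reported in Amazon's status messages.

```python
PERIOD_MINUTES = 5          # the SLA counts 5-minute periods
SERVICE_YEAR_DAYS = 365     # assumption: a 365-day service year
SLA_THRESHOLD = 99.95       # per cent, as quoted in the agreement


def region_unavailable(zones_in_use, unavailable_zones):
    """True when more than one zone in which the customer runs an
    instance is unavailable (the quoted 'Region Unavailable' test)."""
    return len(set(zones_in_use) & set(unavailable_zones)) > 1


def annual_uptime_percentage(unavailable_periods, total_periods):
    """100% minus the percentage of periods spent 'Region Unavailable'."""
    return 100.0 - 100.0 * unavailable_periods / total_periods


total_periods = SERVICE_YEAR_DAYS * 24 * 60 // PERIOD_MINUTES
outage_periods = 3 * 60 // PERIOD_MINUTES  # about three hours across zones

uptime = annual_uptime_percentage(outage_periods, total_periods)
print(f"Annual uptime: {uptime:.3f}%")
print("Below SLA threshold:", uptime < SLA_THRESHOLD)

# Hypothetical zone labels: a customer running in a single zone never
# meets the 'Region Unavailable' test, however long that zone is down.
print(region_unavailable(["zone-a"], ["zone-a", "zone-b"]))
print(region_unavailable(["zone-a", "zone-b"], ["zone-a", "zone-b"]))
```

Under these assumptions, three hours of regional unavailability leaves the annual figure above 99.95 per cent, and a customer in only one zone accrues no "Region Unavailable" time at all.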

According to Amazon's status messages, multiple availability zones experienced problems for about three hours on Thursday, then the problem was isolated in the zone where it began. John Engates, the chief technology officer at Rackspace, which operates a cloud service similar to Amazon's, believes Amazon is unlikely to provide many credits in the wake of the outage.

"More than one availability [zone] would have to go down for you to receive a credit, and you have to be down for a considerable amount of time," Engates told us during a conversation at this week's OpenStack design summit in Santa Clara, California. "I really doubt they'll pay a lot on credits."

Rackspace's Cloud Servers service does not provide a setup analogous to Amazon's availability zones. The Rackspace service-level agreement guarantees uptime for particular components within each service region, including its network, its data center infrastructure, and individual hosts. The company operates separate data centers in Texas, Chicago, and London.

Judging from Amazon's status messages, Engates says, he believes that Amazon's outage spread across multiple availability zones because the company was using availability zones to mirror Elastic Block Storage data for other zones. "Rather than replicating data within a zone, I think they were replicating between zones," he said. "And it seems that when they had a failure in one zone, traffic waterfalled into the other zones. It's like if there was a fire in a hotel. We would have to evacuate to the hotel across the street, and there may not be enough room in the hotel across the street for everyone to get a room."

It appears that the outage affected only those who were using Amazon's Elastic Block Storage service.

Engates says that Amazon's cloud service and its service-level agreement are set up in such a way that users must ensure redundancy across zones – if not across entire regions. "You have to think about how to allocate your application across multiple resources to maximize that SLA," he said. "Those that did so – Netflix is one example of a big customer – did not experience the same kind of outages as people who were very localized. You could put some of the blame on Amazon, but some of the blame on the customer."

Yes, multiple zones were hit by the outage. But Amazon does not promise 100 per cent availability. The company has said, however, that it is unable to restore EBS volumes for some customers. About 0.07 per cent of EBS volumes in the East Region, a status message indicates, "will not be fully recoverable". ®
