Feeds

Amazon cloud fell from sky after botched network upgrade

'Catholic penance' awards 10 days of credit

Secure remote control for conventional and virtual desktops

Clouds as dominos

This caused a kind of domino effect. The EBS cluster couldn't handle API requests to create new volumes, and as these requests backed up in a queue, it couldn't handle API requests from other availability zones. At 2:40 am, engineers disabled all requests to create new volumes in the affected availability zone, and ten minutes later, the company said, requests from other zones were operating normally.

But then EBS nodes in the affected zone started failing, and at about 5:40 am, this again caused problems in other zones. Amazon said that within about 3 hours, engineers began to lower error rates and latencies in those other zones and that by 12:04 pm, they had isolated the problem in the original zone. For about 11 hours that morning, users were also unable to launch new EBS-backed EC2 instances in the affected zone.

Just after noon, about 13 per cent of EBS volumes in the original zone remained "stuck" and EBS APIs remained disabled. By 12:30pm on April 22 (the next day), all but 2.2 per cent of EBS volumes were restored. By 2 pm on April 24, all but 0.07 per cent was restored, and these, Amazon said, won't be restored. The company did not explain why.

The outage also affected Amazon's Relational Database Service (RBS), as RBS relies on EBS for storage.

Amazon said it will automatically provide customers with 10 days of credit to equal to 100 per cent of their usage of EBS volumes, EC2 instances and RDS database instances that were running in the affected availability zone at the time of the outage. It did not mention credits for services operating in the other availability zones.

The company did say that availability zones are physically separate from each other, but did not elaborate. It's unclear whether they're in separate data centers. In the post mortem, the company also said it intends to improve the design of the availability zones so that an EBS outage like this cannot spread from one zone to another.

Amazon also promises to expose additional APIs that will allow customers to more easily determine whether their instances are affected by an outage. This move was applauded by FathomDB's Santa Barbara, but he believes that the world should consider alternatives to Amazon, which pioneered the infrastructure cloud market and controls the largest market share.

"Amazon has been open about admitting to failure, and has promised to expose more of the private APIs so that customers and partners can be better able to help themselves in future outages, without relying on AWS to do so," he said.

"This is reassuring, though I believe that in the long term customers will be looking at other cloud operators and technologies, for redundancy and different philosophies in terms of timely and open customer communication, but also in terms of relying on well-understood technologies and on the broader community of engineering talent rather than just those at AWS." ®

Internet Security Threat Report 2014

More from The Register

next story
The cloud that goes puff: Seagate Central home NAS woes
4TB of home storage is great, until you wake up to a dead device
Azure TITSUP caused by INFINITE LOOP
Fat fingered geo-block kept Aussies in the dark
You think the CLOUD's insecure? It's BETTER than UK.GOV's DATA CENTRES
We don't even know where some of them ARE – Maude
Intel offers ingenious piece of 10TB 3D NAND chippery
The race for next generation flash capacity now on
Want to STUFF Facebook with blatant ADVERTISING? Fine! But you must PAY
Pony up or push off, Zuck tells social marketeers
Oi, Europe! Tell US feds to GTFO of our servers, say Microsoft and pals
By writing a really angry letter about how it's harming our cloud business, ta
SAVE ME, NASA system builder, from my DEAD WORKSTATION
Anal-retentive hardware nerd in paws-on workstation crisis
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Getting started with customer-focused identity management
Learn why identity is a fundamental requirement to digital growth, and how without it there is no way to identify and engage customers in a meaningful way.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Choosing a cloud hosting partner with confidence
Download Choosing a Cloud Hosting Provider with Confidence to learn more about cloud computing - the new opportunities and new security challenges.
Intelligent flash storage arrays
Tegile Intelligent Storage Arrays with IntelliFlash helps IT boost storage utilization and effciency while delivering unmatched storage savings and performance.