Amazon cloud fell from sky after botched network upgrade

'Catholic penance' awards 10 days of credit

Clouds as dominoes

This caused a kind of domino effect. The EBS cluster couldn't handle API requests to create new volumes, and as these requests backed up in a queue, it couldn't handle API requests from other availability zones. At 2:40 am, engineers disabled all requests to create new volumes in the affected availability zone, and ten minutes later, the company said, requests from other zones were operating normally.

But then EBS nodes in the affected zone started failing, and at about 5:40 am this again caused problems in other zones. Amazon said that within about three hours engineers began to lower error rates and latencies in those other zones, and that by 12:04 pm they had isolated the problem in the original zone. For about 11 hours that morning, users were also unable to launch new EBS-backed EC2 instances in the affected zone.

Just after noon, about 13 per cent of EBS volumes in the original zone remained "stuck" and EBS APIs remained disabled. By 12:30 pm on April 22 (the next day), all but 2.2 per cent of EBS volumes had been restored. By 2 pm on April 24, all but 0.07 per cent had been restored, and these, Amazon said, won't be restored. The company did not explain why.

The outage also affected Amazon's Relational Database Service (RDS), as RDS relies on EBS for storage.

Amazon said it will automatically provide customers with a 10-day credit equal to 100 per cent of their usage of EBS volumes, EC2 instances and RDS database instances that were running in the affected availability zone at the time of the outage. It did not mention credits for services operating in the other availability zones.

The company did say that availability zones are physically separate from each other, but did not elaborate. It's unclear whether they're in separate data centers. In the post-mortem, the company also said it intends to improve the design of the availability zones so that an EBS outage like this cannot spread from one zone to another.

Amazon also promises to expose additional APIs that will allow customers to more easily determine whether their instances are affected by an outage. This move was applauded by FathomDB's Santa Barbara, but he believes that the world should consider alternatives to Amazon, which pioneered the infrastructure cloud market and controls the largest market share.

"Amazon has been open about admitting to failure, and has promised to expose more of the private APIs so that customers and partners can be better able to help themselves in future outages, without relying on AWS to do so," he said.

"This is reassuring, though I believe that in the long term customers will be looking at other cloud operators and technologies, for redundancy and different philosophies in terms of timely and open customer communication, but also in terms of relying on well-understood technologies and on the broader community of engineering talent rather than just those at AWS." ®
