Feeds

Amazon cloud fell from sky after botched network upgrade

'Catholic penance' awards 10 days of credit

Internet Security Threat Report 2014

Clouds as dominos

This caused a kind of domino effect. The EBS cluster couldn't handle API requests to create new volumes, and as these requests backed up in a queue, it couldn't handle API requests from other availability zones. At 2:40 am, engineers disabled all requests to create new volumes in the affected availability zone, and ten minutes later, the company said, requests from other zones were operating normally.

But then EBS nodes in the affected zone started failing, and at about 5:40 am, this again caused problems in other zones. Amazon said that within about 3 hours, engineers began to lower error rates and latencies in those other zones and that by 12:04 pm, they had isolated the problem in the original zone. For about 11 hours that morning, users were also unable to launch new EBS-backed EC2 instances in the affected zone.

Just after noon, about 13 per cent of EBS volumes in the original zone remained "stuck" and EBS APIs remained disabled. By 12:30pm on April 22 (the next day), all but 2.2 per cent of EBS volumes were restored. By 2 pm on April 24, all but 0.07 per cent was restored, and these, Amazon said, won't be restored. The company did not explain why.

The outage also affected Amazon's Relational Database Service (RBS), as RBS relies on EBS for storage.

Amazon said it will automatically provide customers with 10 days of credit to equal to 100 per cent of their usage of EBS volumes, EC2 instances and RDS database instances that were running in the affected availability zone at the time of the outage. It did not mention credits for services operating in the other availability zones.

The company did say that availability zones are physically separate from each other, but did not elaborate. It's unclear whether they're in separate data centers. In the post mortem, the company also said it intends to improve the design of the availability zones so that an EBS outage like this cannot spread from one zone to another.

Amazon also promises to expose additional APIs that will allow customers to more easily determine whether their instances are affected by an outage. This move was applauded by FathomDB's Santa Barbara, but he believes that the world should consider alternatives to Amazon, which pioneered the infrastructure cloud market and controls the largest market share.

"Amazon has been open about admitting to failure, and has promised to expose more of the private APIs so that customers and partners can be better able to help themselves in future outages, without relying on AWS to do so," he said.

"This is reassuring, though I believe that in the long term customers will be looking at other cloud operators and technologies, for redundancy and different philosophies in terms of timely and open customer communication, but also in terms of relying on well-understood technologies and on the broader community of engineering talent rather than just those at AWS." ®

Internet Security Threat Report 2014

More from The Register

next story
Docker's app containers are coming to Windows Server, says Microsoft
MS chases app deployment speeds already enjoyed by Linux devs
IBM storage revenues sink: 'We are disappointed,' says CEO
Time to put the storage biz up for sale?
'Hmm, why CAN'T I run a water pipe through that rack of media servers?'
Leaving Las Vegas for Armenia kludging and Dubai dune bashing
Facebook slurps 'paste sites' for STOLEN passwords, sprinkles on hash and salt
Zuck's ad empire DOESN'T see details in plain text. Phew!
Windows 10: Forget Cloudobile, put Security and Privacy First
But - dammit - It would be insane to say 'don't collect, because NSA'
Symantec backs out of Backup Exec: Plans to can appliance in Jan
Will still provide support to existing customers
VMware's tool to harden virtual networks: a spreadsheet
NSX security guide lands in intriguing format
prev story

Whitepapers

Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Cloud and hybrid-cloud data protection for VMware
Learn how quick and easy it is to configure backups and perform restores for VMware environments.
Three 1TB solid state scorchers up for grabs
Big SSDs can be expensive but think big and think free because you could be the lucky winner of one of three 1TB Samsung SSD 840 EVO drives that we’re giving away worth over £300 apiece.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.