Amazon cloud fell from sky after botched network upgrade

'Catholic penance' awards 10 days of credit

Fri 29 Apr 2011 // 18:42 UTC

Clouds as dominos

This caused a kind of domino effect. The EBS cluster couldn't handle API requests to create new volumes, and as these requests backed up in a queue, it couldn't handle API requests from other availability zones. At 2:40 am, engineers disabled all requests to create new volumes in the affected availability zone, and ten minutes later, the company said, requests from other zones were operating normally.

But then EBS nodes in the affected zone started failing, and at about 5:40 am, this again caused problems in other zones. Amazon said that within about 3 hours, engineers began to lower error rates and latencies in those other zones and that by 12:04 pm, they had isolated the problem in the original zone. For about 11 hours that morning, users were also unable to launch new EBS-backed EC2 instances in the affected zone.

Just after noon, about 13 per cent of EBS volumes in the original zone remained "stuck" and EBS APIs remained disabled. By 12:30 pm on April 22 (the next day), all but 2.2 per cent of EBS volumes had been restored. By 2 pm on April 24, all but 0.07 per cent had been restored, and those remaining volumes, Amazon said, won't be recovered. The company did not explain why.

The outage also affected Amazon's Relational Database Service (RDS), as RDS relies on EBS for storage.

Amazon said it will automatically provide customers with 10 days of credit equal to 100 per cent of their usage of EBS volumes, EC2 instances and RDS database instances that were running in the affected availability zone at the time of the outage. It did not mention credits for services operating in the other availability zones.

The company did say that availability zones are physically separate from each other, but did not elaborate. It's unclear whether they're in separate data centers. In the post mortem, the company also said it intends to improve the design of the availability zones so that an EBS outage like this cannot spread from one zone to another.

Amazon also promises to expose additional APIs that will allow customers to more easily determine whether their instances are affected by an outage. This move was applauded by FathomDB's Santa Barbara, but he believes that the world should consider alternatives to Amazon, which pioneered the infrastructure cloud market and controls the largest market share.

"Amazon has been open about admitting to failure, and has promised to expose more of the private APIs so that customers and partners can be better able to help themselves in future outages, without relying on AWS to do so," he said.

"This is reassuring, though I believe that in the long term customers will be looking at other cloud operators and technologies, for redundancy and different philosophies in terms of timely and open customer communication, but also in terms of relying on well-understood technologies and on the broader community of engineering talent rather than just those at AWS." ®
