AWS stops some EC2 servers without warning

‘Retirement’ notifications not entirely accurate or reliable

If you’re thinking about heading to the cloud for über-reliability and an environment in which anything that happens to hardware is someone else’s problem, think again: Amazon Web Services sometimes replaces the hardware virtual servers run on and switches those servers off without elegant or accurate notifications of what’s about to happen.

AWS calls this ‘Instance retirement’ and does it when the physical server an Elastic Compute Cloud (EC2) instance runs on ‘degrades’ and is in danger of hardware failure. That is very useful indeed, but also a little worrying, as the cloud company does not always retire instances elegantly.

Social marketing analysis firm awe.sm recently blogged about the problem, describing the symptoms as follows:

“Virtual hardware doesn’t last as long as real hardware. Our average observed lifetime for a virtual machine on EC2 over the last 3 years has been about 200 days. After that, the chances of it being ‘retired’ rise hugely. And Amazon’s ‘retirement’ process is unpredictable: sometimes they’ll notify you ten days in advance that a box is going to be shut down; sometimes the retirement notification email arrives 2 hours after the box has already failed.”

A trawl through AWS’ support forums suggests that the company isn’t switching off servers without notifications every day, but threads pop up quite regularly in which users complain about servers disappearing.

One such thread (which we shan’t link to because it includes some personal details) complains that “One of our EC2 instance[s] hung and retired an hour before receiving notification from AWS.”

Such an incident would not be especially painful if the instance in question used Elastic Block Store (EBS) for its root device, as users with that arrangement need only stop and restart the instance for it to resume operations on new hardware. That takes mere minutes, so if the hang didn't interrupt important operations the disruption would be slight.

But users whose instances boot from an instance-store-backed Amazon Machine Image (AMI) have a harder task before them. AWS emails on the topic say “If your instance's root device is an instance store, it will be terminated after the retirement date. We recommend that you launch a replacement instance from your most recent AMI and migrate all necessary data to the replacement instance before this time.” It’s also possible to convert an instance-store-backed AMI into an EBS-backed one, but that’s a bit of a chore.
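
The choice described above hinges on where the instance’s root device lives. As a minimal sketch, assuming the instance description has the shape of one entry from EC2’s DescribeInstances response (whose `RootDeviceType` field is either `"ebs"` or `"instance-store"`), the decision could be expressed as follows; the function name and the returned action labels are hypothetical.

```python
from typing import Any, Dict


def retirement_action(instance: Dict[str, Any]) -> str:
    """Suggest a response to a retirement notice (hypothetical helper).

    `instance` is assumed to look like one entry of the "Instances" list in a
    DescribeInstances response; only the "RootDeviceType" field is read.
    """
    root = instance.get("RootDeviceType")
    if root == "ebs":
        # Stopping and then starting lets EC2 place the instance on new
        # hardware; a plain reboot keeps it on the same physical host.
        return "stop-start"
    if root == "instance-store":
        # Instance-store data does not survive termination, so the data has to
        # be migrated to a replacement launched from a recent AMI.
        return "relaunch-from-ami"
    raise ValueError(f"unknown root device type: {root!r}")


print(retirement_action({"InstanceId": "i-0example", "RootDeviceType": "ebs"}))
```

With boto3, the input dictionary could come from one of the `Instances` entries returned by `ec2.describe_instances()`; the sketch deliberately stops short of calling `stop_instances` or `start_instances` itself.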

The need to restore from backups or convert to EBS is taking some users by surprise because they don’t have backups at all, as this thread shows.

Of course, those without backups have only themselves to blame. The AWS console also warns EC2 users of imminent retirement, so again there’s an element of personal responsibility to consider here ... although the fact that some of AWS’ retirement notifications seem not to be timely is bothersome.
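
The console is not the only place these notices surface; a script can look for them too. The sketch below assumes a dictionary shaped like the result of EC2’s DescribeInstanceStatus call (for example, boto3’s `describe_instance_status(IncludeAllInstances=True)`), in which scheduled events carry a `Code` such as `"instance-retirement"` and a `NotBefore` timestamp. The function name and the sample instance IDs and dates are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple


def pending_retirements(response: Dict[str, Any]) -> List[Tuple[str, datetime]]:
    """Return (instance_id, not_before) pairs for retirement events.

    `response` is assumed to follow the DescribeInstanceStatus shape: an
    "InstanceStatuses" list whose entries may carry an "Events" list.
    """
    found = []
    for status in response.get("InstanceStatuses", []):
        for event in status.get("Events", []):
            if event.get("Code") == "instance-retirement":
                found.append((status["InstanceId"], event["NotBefore"]))
    # Earliest deadline first, so the most urgent instance is handled first.
    return sorted(found, key=lambda pair: pair[1])


# Hypothetical sample data in the assumed response shape.
sample = {
    "InstanceStatuses": [
        {"InstanceId": "i-aaa", "Events": [
            {"Code": "instance-retirement",
             "NotBefore": datetime(2013, 5, 20, tzinfo=timezone.utc)}]},
        {"InstanceId": "i-bbb", "Events": [
            {"Code": "system-reboot",
             "NotBefore": datetime(2013, 5, 10, tzinfo=timezone.utc)}]},
        {"InstanceId": "i-ccc", "Events": [
            {"Code": "instance-retirement",
             "NotBefore": datetime(2013, 5, 12, tzinfo=timezone.utc)}]},
        {"InstanceId": "i-ddd"},
    ]
}
for instance_id, deadline in pending_retirements(sample):
    print(instance_id, deadline.isoformat())
```

Polling like this only surfaces notices AWS has actually posted, so, given the late notifications described above, it complements backups rather than replacing them.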

It’s also worth noting that the retirement process isn’t well documented – we could find only seven mentions of it in the AWS support database and it doesn’t get a mention in the EC2 FAQ, although the terms and conditions for AWS go out of their way to point out that the service can experience interruptions.

None of which means that AWS is placing users in peril. But the fact that cloud servers can be halted without prior notification and in ways that require a fair bit of work to repair is surely something to take into account when considering just what the cloud means for your operations. ®
