Lightning strikes cloud: Amazon, MS downed
Down and still out in Dublin
Microsoft has been left reeling again after another BPOS crash, but at least on this occasion it was not alone: Amazon's EC2 web services were also downed by the same act of God in Europe.
A bolt of lightning struck a transformer at a power utility provider in Dublin, causing an explosion that took down the back-up systems for the region last night.
Amazon admitted to having issues at 7pm last night and told users via its service health dashboard that under such circumstances, a power cut would usually be "seamlessly picked up by backup generators".
"The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronises the backup generator plant, disabling some of them," it stated.
Power sources had to be "phase synchronised" before being brought online to load, and this had to be done manually, delaying the resumption of services in Amazon's Elastic Compute Cloud and Relational Database Service.
"Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored. Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process," said Amazon.
Amazon said it was installing extra capacity onsite and from other data centres, but added: "While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed."
One source commented that an outage of such a magnitude "can't be good for the S3 (Simple Storage Service) Cloud setup, if they can't keep their own site online".
Ironically, Microsoft told BPOS users via Twitter that "Europe data centre power issues was affecting services", just after tweeting about ramp-up tips for Office 365, the latest cloud service that replaces BPOS.
Redmond has come under criticism for the number of service interruptions to BPOS, and this latest incident will cause a few more blushes. That said, Microsoft recently admitted power outages in Office 365 "will happen".
It confirmed to El Reg today that the "widespread power outage in Dublin caused connectivity issues for European BPOS customers" for near on seven hours.
"Throughout the incident, we updated our customers regularly on the issue via our normal communication channels," Microsoft stated.
Angela Eager, research director at TechMarketView, said the incident will rightly lead to questions about the "viability of the cloud as a delivery platform" but added that outages were not a sign that the cloud does not work.
"Outages will not go away. The onus is on service providers to react quickly in terms of providing status updates – using social media where its own service has failed – and getting everything online again.
"There is also a need to address business contingency on behalf of customers through the use of backup and mirrored facilities. That costs but it is a necessary cost and underlines the need for a web of alliances between application providers and cloud service and infrastructure providers to allow switching in the event of a failure." ®