Windows Azure Compute cloud goes TITSUP PLANET-WIDE

Looks like a distributed system, breaks like a single tenant

Microsoft's Windows Azure cloud was hit by a worldwide partial compute outage today, calling into question how effectively Redmond has partitioned its service.

The problems emerged at 2.35AM UTC, and were still ongoing as of 10.20PM UTC the same day, according to the company's service dashboard.

"Manual actions to perform Swap Deployment operations on Cloud Services may error, which will then restrict Service Management functions," the company said.

Every single Azure region, each a geographically distant and independent set of data centers, was affected. For posterity, those were: West US, West Europe, Southeast Asia, South Central US, North Europe, North Central US, East Asia, and East US.

"We are taking all necessary steps to mitigate this incident for the affected hosted services as soon as possible. Further updates will be published within 2 hours to keep you apprised of the situation. We apologize for any inconvenience this causes our customers," the company wrote at 10PM UTC.

Swap Deployment operations let developers initiate a virtual IP address swap between staging and production environments for services. Swap Deployment is an asynchronous operation that interacts with an Azure management service. Though Swap Deployment is not a main component of the IaaS cloud, an outage would be irritating for some heavy users, and a global outage is likely to damage confidence in Microsoft's ability to manage services at scale.
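For readers unfamiliar with the mechanism, the idea of an asynchronous swap can be sketched as a toy model in Python. This is an illustration only, not Microsoft's API: the `CloudService` class, its methods, the `management_ok` flag and the status names are all hypothetical. It shows a swap request being accepted immediately, with the staging and production slots changing places only if the management step later succeeds.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Hypothetical status names, chosen for illustration only.
    IN_PROGRESS = "InProgress"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


@dataclass
class CloudService:
    """Toy model of a service with a production slot and a staging slot."""
    production: str  # deployment currently behind the production virtual IP
    staging: str     # deployment currently behind the staging address
    operations: dict = field(default_factory=dict)
    _next_id: int = 0

    def request_swap(self) -> int:
        """Accept the swap request and return an operation id right away."""
        self._next_id += 1
        self.operations[self._next_id] = Status.IN_PROGRESS
        return self._next_id

    def complete(self, op_id: int, management_ok: bool = True) -> Status:
        """Finish a pending swap; slots change only if management succeeds."""
        if self.operations[op_id] is not Status.IN_PROGRESS:
            return self.operations[op_id]
        if management_ok:
            self.production, self.staging = self.staging, self.production
            self.operations[op_id] = Status.SUCCEEDED
        else:
            self.operations[op_id] = Status.FAILED
        return self.operations[op_id]
```

In this sketch a failed management step leaves the slots exactly as they were, so the caller is left with a failed operation and no new code in production.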

Dashboard dashed ... a global failure is the absolute worst thing that can happen to a cloud

Alongside the global failure of a sub-component of Compute, the Azure cloud's Website feature also reported a global problem with "FTP data access", which began at 7PM UTC. That suggests a cascading fail stemming from some part of the problem that downed Swap Deployment.

The antithesis of cloud computing is a problem cropping up that affects all regions simultaneously, and yet this marks the second time in under a year that Microsoft has had a concurrent global fail.

Last time we had a Blue Sky of Death, it was due to a lapsed security certificate which downed all worldwide Windows Azure storage services. This time a much more minor component of the cloud has gone down, but the fact that it has failed globally is a severe indictment of whatever partitioning policies Microsoft may have put in place. ®
