
Microsoft's own code should prevent an Azure SSL fail: So what went wrong?

Cloud service fell over despite cert automation in Server 2012


Sysadmin blog

Server 2012 is the Microsoft operating system that, in my opinion, makes cloud computing a reality. As far as I am concerned, it is as big a leap over Server 2008 R2 as that OS was over Server 2003. With it you can build anything from a small cluster to a service as big as Microsoft's own Azure platform.

Which is why I am completely baffled as to how it is possible that Azure was knocked offline by last week's SSL cock-up.

Let me start by saying that I have the utmost sympathy, and respect, for the poor bastards working behind the scenes to fix this particularly embarrassing incident. I'm not too proud to admit that I have done exactly the same thing; like Microsoft, I've accidentally let an HTTPS certificate lapse more than once.

I could throw up excuses such as the ever-infamous "I was too busy". I could even hand-wave at Apache's maddening certificate management (which makes it easy to miss a node) or at RapidSSL's long delays in verifying certs.

I could make those excuses, but I won't; none of them is valid. I screwed up because I was lazy, and any users trying to reach Outlook Web App late at night last Christmas (and the Christmas before) were terribly inconvenienced for nearly six hours. What bothers me about this snafu is that Microsoft doesn't even get to try those excuses. Not only can Microsoft sign its own damned certs, but Server 2012 makes the whole process so simple that web administrators will weep.

Microsoft has code to save itself from this sort of blunder

One of the features buried in the release notes for Server 2012 is Centralized SSL Certificate (CSC) management. You can run a farm of up to 10,000 IIS web server nodes off a single CSC server: each node can be configured to pull its certificates automatically from that one server, which gives you a reasonably simple interface from which to direct a whole symphony of re-validation.
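The appeal of a central store is that checking expiry becomes one query instead of a tour of every node. Below is a minimal Python sketch of that idea under a toy in-memory model. `Cert`, `CertStore` and its methods are hypothetical names invented for illustration, not the IIS API; as far as I know, the real feature reads per-host .pfx files from a shared folder.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Cert:
    hostname: str
    not_after: datetime  # moment the certificate stops being valid


class CertStore:
    """Hypothetical central store that every web node reads from."""

    def __init__(self) -> None:
        self._certs: dict[str, Cert] = {}

    def put(self, cert: Cert) -> None:
        # Replacing a cert here updates every node that looks it up.
        self._certs[cert.hostname] = cert

    def lookup(self, hostname: str) -> Cert:
        # A node serving `hostname` asks the store instead of keeping its own copy.
        return self._certs[hostname]

    def expiring_within(self, now: datetime, window: timedelta) -> list[str]:
        # One pass covers the whole farm, including certs that have already lapsed.
        return sorted(h for h, c in self._certs.items() if c.not_after - now <= window)


now = datetime(2013, 1, 1)
store = CertStore()
store.put(Cert("a.example.net", now + timedelta(days=3)))
store.put(Cert("b.example.net", now + timedelta(days=90)))
print(store.expiring_within(now, timedelta(days=14)))  # ['a.example.net']
```

The point of the design is that no node can be "missed", because no node holds a certificate the store doesn't know about.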

Considering that everything in Microsoft's new cloudy world is PowerShell-scriptable, you can even stagger renewals so that no single certificate expiry can tank everything. Microsoft doesn't have to worry about licensing its own kit, so how exactly did this happen?
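Staggering is mostly a scheduling problem. The sketch below, written in Python rather than PowerShell, spreads renewal dates evenly across a window so that expiries never cluster on one day. `stagger_renewals` is a hypothetical helper, and it assumes there are no more hosts than days in the window; with more hosts than that, some will share a day.

```python
from __future__ import annotations

from datetime import date, timedelta


def stagger_renewals(hostnames: list[str], start: date, window_days: int) -> dict[str, date]:
    """Spread renewal dates evenly over `window_days`, starting at `start`."""
    if not hostnames:
        return {}
    step = window_days / len(hostnames)  # fractional spacing between renewals
    return {
        host: start + timedelta(days=int(i * step))
        for i, host in enumerate(sorted(hostnames))  # sorted for a stable schedule
    }


schedule = stagger_renewals(
    ["d.example.net", "a.example.net", "c.example.net", "b.example.net"],
    date(2013, 1, 1),
    28,
)
for host, day in schedule.items():
    print(host, day)
```

With four hosts and a 28-day window, renewals land a week apart, so a botched renewal hits one host rather than all of them.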

Even if it was a certificate further up the chain, rather than one on the end nodes, that expired, why wasn't the CSC server auto-renewing it from elsewhere? Redmond can sign its own certs, so between CSC and Server 2012's more traditional certificate manager you could have a great big circle jerk of servers auto-renewing in an endless frolic of crypto-hedonism.
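An upstream certificate matters so much because a chain is only valid until its earliest-expiring link lapses: a healthy leaf certificate is no protection against an expired intermediate. Here is a minimal Python sketch with hypothetical `Link`, `effective_expiry` and `weakest_link` names; a real tool would read the dates from the certificates themselves rather than from hand-built objects.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Link:
    subject: str  # e.g. leaf, intermediate or root
    not_after: datetime


def effective_expiry(chain: list[Link]) -> datetime:
    """The whole chain stops validating when its earliest link expires."""
    return min(link.not_after for link in chain)


def weakest_link(chain: list[Link]) -> str:
    # The certificate that needs renewing first, wherever it sits in the chain.
    return min(chain, key=lambda link: link.not_after).subject


now = datetime(2013, 1, 1)
chain = [
    Link("leaf", now + timedelta(days=300)),
    Link("intermediate", now + timedelta(days=10)),
    Link("root", now + timedelta(days=3000)),
]
print(weakest_link(chain))  # intermediate
```

Monitoring only the leaf in this example would report 300 comfortable days while the chain has just ten.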

So let's set that aside for the moment and assume that, for whatever reason, someone somewhere decided it was vitally important to update a certificate in the chain manually. What could have stopped them? Maybe mail to the person responsible was caught by the data centre edge blacklist that Office 365 users can't control. Nah; you'd think the cert guy would have an internal staff list telling him where to send a bottle of scotch to make sure the people trying to email him actually can.

Still working on the assumption that an expired cert was at fault: last I checked, Microsoft had some money lying around, so if it needed the certificate verified by an external entity, it should have been able to pay the bill. Laziness? I doubt it. Surely Microsoft pays its systems administrators enough to actually care about their jobs. It is highly unlikely that this came down to any one person failing to pull the trigger on the update.

That leaves me with two remaining possibilities. The first: Microsoft isn't using its own rather excellent technology to handle these certs. I'm not certain what underpins Azure; does it run on Server 2012? Bing.com does. Even if Azure isn't using off-the-shelf Windows Server, there would be a delicious irony if Microsoft, enthusiastic player of the constant, cacophonous drumbeat of "upgrade for your own good", had failed to take advantage of technology it invented to solve this exact problem.

I find it hard to believe that Microsoft doesn't have a version of CSC for its Azure infrastructure, which leaves me with only one solid hypothesis about Azure's outage. I believe Microsoft is coming face to face with the fact that when pretty much all automation relies on scripting, in PowerShell or anything else, a simple change to one line of code in one script can topple the mightiest cloud. Even one built on a foundation as solid as Server 2012.

I have a lot of respect for the systems administrators running Azure. That's a big, complicated job with an enormous amount of pressure. Right now they are probably being emotionally flayed alive, and I don't envy them the next few weeks. I would, however, like to offer a suggestion to Microsoft, and especially to its script-all-the-things-happy server division: pick up the phone and call Luke Kanies over at Puppet Labs.

Ask him nicely for an education on why enforced states are better than scripts. Learn from those who have solved the problem of leaving the reputation of their flagship cloud service hanging on a single forgotten semicolon. ®
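Roughly, the difference is this: a script performs steps in order and trusts that they still make sense, while a state-based tool declares what should be true and, on every run, fixes only what differs. The toy Python sketch below illustrates that convergence loop for certificates. It is not Puppet's language or API; `DesiredCert`, `converge` and the fixed `lifetime` are hypothetical, and "renewing" here just means moving an expiry date forward.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DesiredCert:
    hostname: str
    min_remaining: timedelta  # renew whenever less validity than this is left


def converge(actual: dict[str, datetime], desired: list[DesiredCert],
             now: datetime, lifetime: timedelta) -> list[str]:
    """Bring `actual` (hostname -> expiry) in line with `desired`; return actions taken."""
    actions = []
    for want in desired:
        current = actual.get(want.hostname)
        if current is None:
            actual[want.hostname] = now + lifetime  # missing entirely: install
            actions.append(f"install {want.hostname}")
        elif current - now < want.min_remaining:
            actual[want.hostname] = now + lifetime  # too close to expiry: renew
            actions.append(f"renew {want.hostname}")
    return actions  # empty when the system already matches the declaration


now = datetime(2013, 1, 1)
actual = {"www.example.net": now + timedelta(days=2)}
desired = [
    DesiredCert("www.example.net", timedelta(days=30)),
    DesiredCert("api.example.net", timedelta(days=30)),
]
print(converge(actual, desired, now, timedelta(days=365)))  # ['renew www.example.net', 'install api.example.net']
print(converge(actual, desired, now, timedelta(days=365)))  # []
```

Because `converge` is idempotent, running it on a schedule is harmless when nothing is wrong, and a missed or mangled step gets corrected on the next pass instead of waiting for someone to notice.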
