
Microsoft's own code should prevent an Azure SSL fail: So what went wrong?

Cloud service fell over despite cert automation in Server 2012


Sysadmin blog

Server 2012 is the Microsoft operating system that, in my opinion, makes cloud computing a reality. As far as I am concerned, it is as big a leap over Server 2008 R2 as that OS was over Server 2003. With it you can build anything from a small cluster to a service as big as Microsoft's own Azure platform.

Which is why I am completely baffled as to how it is possible that Azure was knocked offline by last week's SSL cock-up.

Let me start out by saying that I have the utmost sympathy – and respect – for the poor bastards working behind the scenes to fix this particular embarrassing incident. I'm not too proud to admit that I have done the exact same thing; like Microsoft, I've accidentally let an HTTPS certificate lapse more than once.

I could throw up excuses such as the ever-infamous "I was too busy". I could even hand-wave at Apache's maddening certificate management (which makes it easy to miss a node) or RapidSSL's long delays in verifying the certs.

I could make those excuses, but I won't; none of them are valid. I screwed up because I was lazy, and any users trying to access Outlook Web App late at night last Christmas (and the one before) were terribly inconvenienced for nearly six hours. The bit that bothers me about this snafu is that Microsoft doesn't even get to try those excuses. Not only can Microsoft sign its own damned certs, Server 2012 makes this whole process so simple that web administrators will weep.

Microsoft has code to save itself from this sort of blunder

One of the features buried inside the release notes for Server 2012 is Centralized SSL Certificate (CSC) management. You can run a farm of up to 10,000 IIS web server nodes off a single CSC server. Each node can be directed to contact that server automatically to receive its certs, and the server gives you a reasonably simple interface from which to direct a symphony of re-validation.

Considering everything in Microsoft's new cloudy world is PowerShell scriptable, you can even stagger renewals so that no single certificate expiration can tank everything. Microsoft doesn't have to worry about licensing its own kit, so how exactly did this happen?
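To make the staggering idea concrete, here is a minimal sketch in Python rather than PowerShell. It is not how Azure or CSC schedules anything; the function name `stagger_renewals`, the host names and the 28-day window are all made up for illustration. It simply spreads renewal dates evenly across a window so that no two nodes fall due on the same day.

```python
from datetime import date, timedelta


def stagger_renewals(hostnames, start, window_days):
    """Spread renewal dates evenly across a window of days.

    Hypothetical helper for illustration only: each host gets its own
    renewal date so that one missed renewal cannot affect every node.
    """
    if not hostnames:
        return {}
    # Gap between consecutive renewals, in days (may be fractional).
    step = window_days / len(hostnames)
    return {
        host: start + timedelta(days=int(i * step))
        for i, host in enumerate(sorted(hostnames))
    }


# Four made-up nodes renewed across a four-week window.
schedule = stagger_renewals(
    ["web01", "web02", "web03", "web04"], date(2013, 3, 1), 28
)
for host, when in schedule.items():
    print(host, when.isoformat())
```

With four hosts and a 28-day window the renewals land a week apart, so a single botched renewal takes out one node rather than the whole farm.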

Even if it was the cryptographic certificate upstream from the end nodes that expired, why wasn't the CSC server auto-renewing from elsewhere? Since Redmond can sign its own certs, between CSC and Server 2012's more traditional certificate manager you could have a great big circle jerk with servers auto-renewing in an endless frolic of crypto-hedonism.

So let's set this aside for the moment and assume that for whatever reason someone somewhere decided that it was vitally important to manually update a certificate along the chain. What could have prevented them from doing so? Maybe it was the data centre edge blacklist that Office 365 users can't control. Nah; you'd think that the cert guy would have an internal staff list that would tell him where to send the bottle of scotch to make sure that the people who try to send him email actually can.

Still working on the assumption that an expired cert was at fault, last I checked, Microsoft had some money lying around, so if it was getting the certificate verified by an external entity it should have been possible to pay the bill. Laziness? I doubt it. Surely Microsoft pays its systems administrators enough to actually care about their job. It is highly unlikely to be the fault of any one person not pulling the trigger on the update.

That leaves me with two remaining possibilities. The first: Microsoft isn't using its own rather excellent technology to handle these certs. I'm not fully sure of the underpinnings of Azure; does it run on Server 2012? Bing.com does. Even if Azure isn't using off-the-shelf Windows Server, there would be a delicious irony if Microsoft – enthusiastic player of the constant, cacophonous drumbeat of "upgrade for your own good" – had failed to take advantage of technology it invented to solve this exact problem.

I find it hard to buy that Microsoft doesn't have a version of CSC for its Azure infrastructure, leaving me with only one solid hypothesis about Azure's outage. I believe Microsoft is coming face to face with the fact that when pretty much all automation relies on scripting - using PowerShell or otherwise - a simple change to one line of code in one script can topple the mightiest cloud. Even one built on a foundation as solid as Server 2012.

I have a lot of respect for the systems administrators running Azure. That's a big, complicated job with an enormous amount of pressure. Right now, they are probably getting emotionally flayed alive - I won't envy them for the next few weeks. I would, however, like to offer a suggestion to Microsoft - especially the script-all-the-things happy server division. Pick up the phone and call Luke Kanies over at PuppetLabs.

Ask him nicely for an education on why enforced states are better than scripts. Learn from those who have solved the problem of leaving the reputation of their flagship cloud service hanging on a single forgotten semicolon. ®
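For readers who have not met the distinction, the difference between a script and an enforced state can be sketched in a few lines of Python. This is not Puppet's implementation or anything Microsoft runs; `MIN_DAYS_LEFT`, `reconcile`, `fake_renew` and the node names are hypothetical. The point is that the code declares what must be true (every cert has at least 30 days left) and can be run repeatedly without harm, instead of being a one-shot list of steps.

```python
from datetime import date, timedelta

# Hypothetical policy: every certificate must have at least this many days left.
MIN_DAYS_LEFT = 30


def reconcile(nodes, today, renew):
    """Bring every node to the desired state and report what changed.

    `nodes` maps a node name to its certificate expiry date. Nodes that
    already satisfy the policy are left alone, so repeated runs are safe.
    """
    changed = []
    for name, expires in sorted(nodes.items()):
        if (expires - today).days < MIN_DAYS_LEFT:
            nodes[name] = renew(name, today)
            changed.append(name)
    return changed


def fake_renew(name, today):
    """Stand-in for a real renewal call: pretend a one-year cert was issued."""
    return today + timedelta(days=365)


today = date(2013, 3, 1)
nodes = {"node-a": date(2013, 3, 1), "node-b": date(2014, 1, 1)}

first = reconcile(nodes, today, fake_renew)   # only node-a is out of policy
second = reconcile(nodes, today, fake_renew)  # nothing left to fix
print(first, second)
```

Running it twice is the whole trick: the first pass fixes whatever has drifted, and every later pass is a no-op until something drifts again.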
