Why did Visual Studio Marketplace go down in the Great Azure TITSUP? Ask Azure DevOps

Failover is not an option

The team behind Microsoft's Visual Studio Marketplace has issued an explanation as to why it also took the day off after Azure's weather-based wobble.

In a commendable act of openness, the software giant laid out what went wrong with its Marketplace in a blog post that points a shaky finger at Azure DevOps for the outage back on 4 September.

Visual Studio Marketplace serves up extensions for Visual Studio and Azure DevOps. Want a bit of vim in your life or fancy plugging the Python Extensions Pack into Visual Studio Code? The odds are that you'll be using the Marketplace. Unless you were trying to get to it between 0945 UTC on 4 September and 0430 UTC on 5 September, of course.

Even with the Azure region recovered, Microsoft reckons it wasn't until 1000 UTC on 5 September, more than a day after the trouble began, that things got back to normal. The delay was caused by high load as developers tried to reconnect, which pushed the back-end DB past 95 per cent CPU utilisation. Engineers throttled things back until the database could be scaled out. Ouch.

There was one bright spot for developers unable to browse, download or update extensions (or for the unfortunate publishers, who were also locked out) – already acquired extensions still functioned, and the content delivery network used by the Marketplace service meant that even extensions with a dependency on the service continued to work.

Why didn't Marketplace simply fail over to another region? Well, that was down to Azure DevOps infrastructure. Though the team admitted that, yes, the mystery errors and timeouts seen by developers could have been a bit more helpful.

For its part, Azure DevOps has already had its long, dark teatime of the soul, facing up to the fact that it has quite the single point of failure in the form of the South Central US region. As with other Azure services, failover wasn't an option (as Gene Kranz has never said) for fear of losing precious customer data.

As for why Azure DevOps and its dependencies, such as Marketplace, aren't using Azure Availability Zones, there was a bit of shuffling of feet and a muttering that the functionality wasn't generally available until March this year.

So, the plan for Marketplace and DevOps is to shift over to Availability Zones, which should stay up unless an entire region goes down. Unsurprisingly, the South Central US region does not, as yet, offer Availability Zones. Storage will see asynchronous writes to both primary and secondary regions and Azure SQL will get active geo-replication.

Microsoft also has the hugely distributed Cosmos DB at its disposal, but has no plans to take advantage of its freshly announced 99.999 per cent read and write availability for Azure DevOps just yet. ®




Biting the hand that feeds IT © 1998–2018