Original URL: http://www.theregister.co.uk/2012/07/27/twitter_outage_explained/

Twitter titsup: Our failover was actually just FAIL ALL OVER

Double trouble after data centre double-whammy

By Kelly Fiveash

Posted in Servers, 27th July 2012 11:48 GMT

Twitter fell offline last night for several hours because - the company has now confirmed - redundancy in the micro-blogging site's data centres failed to kick in.

The result was a catastrophic system collapse, Twitter's engineering veep Mazen Rawashdeh explained:

"The cause of today’s outage came from within our data centers. Data centers are designed to be redundant: when one system fails (as everything does at one time or another), a parallel system takes over. What was noteworthy about today’s outage was the coincidental failure of two parallel systems at nearly the same time."

The company is now "aggressively" investigating what Rawashdeh described as an "infrastructural double-whammy" to find out what went wrong with its failover system and to prevent it happening in the future.

"On behalf of our infrastructure team, we apologise deeply for the interruption you had today. Now - back to making the service even better and more stable than ever," the exec added.

The outage was the second to hit Twitter in a little over a month. Last time, the culprit was what the company called a "cascading bug". ®