Google says it's sorry for Monday's hours-long Gmail delays

Dual networking outage won't happen again, honest

Google apologized on Tuesday for a networking glitch that prevented emails from reaching many Gmail users' accounts for two hours or more.

"The message delivery delays were triggered by a dual network failure," Gmail site reliability engineer Sabrina Farmer wrote in a blog post. "This is a very rare event in which two separate, redundant network paths both stop working at the same time."

Email delivery broke down at around 5:54am Pacific time on Monday, Farmer said, and the online ad giant didn't get on top of the problem until around 1pm the same day. The full message backlog wasn't cleared until around 4pm.

The service interruption only affected around 29 per cent of messages passing through Gmail, Farmer said, and of those, the typical message was only delayed by 2.6 seconds. But some messages were left hanging much longer, and in the worst cases – about 1.5 per cent of the total – they were delayed for more than two hours. In addition, some users who tried to download large attachments from their Gmail accounts experienced errors.

A contrite Farmer expressed Google's regrets. "We realize that our users rely on Gmail to be always available and always fast, and for several hours we didn't deliver," she wrote.

But to be fair, she said, even several hours' worth of spotty networking had negligible impact on Gmail's overall uptime stats. "Gmail remains well above 99.9% available," she wrote, "and we intend to keep it that way!"

Users weren't locked out of their Gmail accounts during the incident, and they were able to read email that had already been delivered and even send new messages of their own.

Gmail has certainly dealt with worse. In 2012, a misconfigured sync server triggered a system-wide Gmail outage and caused the Chrome browser to spontaneously crash at the same time. And just last month, all of Google's services mysteriously vanished from the net at once, causing not just Gmail but 40 per cent of internet traffic to go dark for a few minutes.

Still, Farmer says Gmail will be updating its network capacity and adjusting its infrastructure so that mail delivery will be "more resilient," even in the event of a dual network failure. What's more, the Chocolate Factory will rejigger its internal practices so that the next time something like this happens, its engineering teams will be quicker to respond.

Shame on you, Google. Shame on you. ®
