Google says it's sorry for Monday's hours-long Gmail delays

Dual networking outage won't happen again, honest

Google apologized on Tuesday for a networking glitch that kept emails from reaching many Gmail users' accounts for two hours or longer.

"The message delivery delays were triggered by a dual network failure," Gmail site reliability engineer Sabrina Farmer wrote in a blog post. "This is a very rare event in which two separate, redundant network paths both stop working at the same time."

Email delivery broke down at around 5:54am Pacific time on Monday, Farmer said, and the online ad giant didn't get on top of the problem until around 1pm the same day. The full message backlog wasn't cleared until around 4pm.

The service interruption affected only around 29 per cent of messages passing through Gmail, Farmer said, and of those, the typical message was delayed by just 2.6 seconds. But some messages were left hanging much longer, and in the worst cases – about 1.5 per cent of the total – they were delayed for more than two hours. In addition, some users who tried to download large attachments from their Gmail accounts experienced errors.

A contrite Farmer expressed Google's regrets. "We realize that our users rely on Gmail to be always available and always fast, and for several hours we didn't deliver," she wrote.

But to be fair, she said, even several hours' worth of spotty networking had negligible impact on Gmail's overall uptime stats. "Gmail remains well above 99.9% available," she wrote, "and we intend to keep it that way!"

Users weren't locked out of their Gmail accounts during the incident, and they were able to read email that had already been delivered and even send new messages of their own.

Gmail has certainly dealt with worse. In 2012, a misconfigured sync server triggered a system-wide Gmail outage and caused the Chrome browser to spontaneously crash at the same time. And just last month, all of Google's services mysteriously vanished from the net at once, causing not just Gmail but 40 per cent of internet traffic to go dark for a few minutes.

Still, Farmer says Gmail will be updating its network capacity and adjusting its infrastructure so that mail delivery will be "more resilient," even in the event of a dual network failure. What's more, the Chocolate Factory will rejigger its internal practices so that the next time something like this happens, its engineering teams will be quicker to respond.

Shame on you, Google. Shame on you. ®
