Google says it's sorry for Monday's hours-long Gmail delays
Dual networking outage won't happen again, honest
Google apologized on Tuesday for a networking glitch that kept emails from reaching many Gmail users' accounts for two hours or more.
"The message delivery delays were triggered by a dual network failure," Gmail site reliability engineer Sabrina Farmer wrote in a blog post. "This is a very rare event in which two separate, redundant network paths both stop working at the same time."
Email delivery broke down at around 5:54am Pacific time on Monday, Farmer said, and the online ad giant didn't get on top of the problem until around 1pm the same day. The full message backlog wasn't cleared until around 4pm.
The service interruption affected only around 29 per cent of messages passing through Gmail, Farmer said, and of those, the typical message was delayed by just 2.6 seconds. But some messages were left hanging much longer, and in the worst cases – about 1.5 per cent of the total – they were delayed for more than two hours. In addition, some users who tried to download large attachments from their Gmail accounts experienced errors.
A contrite Farmer expressed Google's regrets. "We realize that our users rely on Gmail to be always available and always fast, and for several hours we didn't deliver," she wrote.
But to be fair, she said, even several hours' worth of spotty networking had negligible impact on Gmail's overall uptime stats. "Gmail remains well above 99.9% available," she wrote, "and we intend to keep it that way!"
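For a rough sense of why the uptime figure barely moves, here is a back-of-the-envelope sketch in Python. It assumes availability is measured as the share of hours in a 365-day year that were incident-free, and that this was the only incident in that period. Farmer's post, as quoted here, doesn't say how Google actually computes the number, so the `availability` function and the weighting by affected messages are illustrative assumptions, not Google's method.

```python
# Back-of-the-envelope availability estimate (illustrative assumptions only).
HOURS_PER_YEAR = 365 * 24  # assumed measurement window: one 365-day year


def availability(downtime_hours, period_hours=HOURS_PER_YEAR):
    """Fraction of the period that was not counted as downtime."""
    return 1.0 - downtime_hours / period_hours


# From the article: trouble began around 5:54am and was contained around 1pm.
incident_hours = 7 + 6 / 60

# From the article: around 29 per cent of messages were affected.
affected_fraction = 0.29

# Pessimistic reading: treat the whole incident window as total downtime.
worst_case = availability(incident_hours)

# Softer reading (an assumption): weight downtime by the share of messages hit.
weighted = availability(incident_hours * affected_fraction)

# Downtime a 99.9 per cent yearly target would tolerate, in hours.
budget_hours = HOURS_PER_YEAR * (1 - 0.999)

print(f"worst case: {worst_case:.4%}")
print(f"weighted:   {weighted:.4%}")
print(f"99.9% budget: {budget_hours:.2f} hours per year")
```

Under these assumptions, even the pessimistic reading, which counts the full seven-odd hours as a total outage, stays a hair above 99.9 per cent.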
Users weren't locked out of their Gmail accounts during the incident, and they were able to read email that had already been delivered and even send new messages of their own.
Gmail has certainly dealt with worse. In 2012, a misconfigured sync server triggered a system-wide Gmail outage and caused the Chrome browser to spontaneously crash at the same time. And just last month, all of Google's services mysteriously vanished from the net at once, causing not just Gmail but 40 per cent of internet traffic to go dark for a few minutes.
Still, Farmer says Gmail will be updating its network capacity and adjusting its infrastructure so that mail delivery will be "more resilient," even in the event of a dual network failure. What's more, the Chocolate Factory will rejigger its internal practices so that the next time something like this happens, its engineering teams will be quicker to respond.
Shame on you, Google. Shame on you. ®