Google Cloud rolls back changes after 18-hour load balancer brownout
VMs across US, Europe and Asia all unable to “connect to backends”
Google Cloud's load balancers have suffered a lengthy connectivity problem.
First reported at 00:52 Pacific time on August 30th, the incident is still unresolved as of 19:18.
Google's struggled with this one. At 06:00 the company said it had “determined the infrastructure component responsible for the issue and mitigation work is currently underway.”
But at 07:00 the message changed to “Our previous actions did not resolve the issue. We are pursuing alternative solutions.”
At 08:30 the message changed to “We have identified the event that triggers this issue and are rolling back a configuration change to mitigate this issue.” Half an hour later, that change was implemented and Google started mopping up with “further measures to completely resolve the issue.”
That fix meant that no new instances would have the problem, but instances running when the problem struck were still stricken. Google then advised that users should do the following:
Create a new TargetPool. Add the affected VMs in a region to the new TargetPool. Wait for the VMs to start working in their existing load balancer configuration. Delete the new TargetPool. DO NOT delete the existing load balancer config, including the old target pool. It is not necessary to create a new ForwardingRule.
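For readers who would rather script that than click through it, here is a minimal sketch of how the steps might map onto gcloud commands. It is not Google's published procedure: the function name, the pool name and the VM names are hypothetical, and it assumes every affected VM sits in a single zone of the region. It only builds the commands and does not run them, because the "wait" step in the middle needs a human (or a health check) to confirm the VMs are serving again before the temporary pool is deleted.

```python
from typing import List


def workaround_commands(temp_pool: str, region: str, zone: str,
                        instances: List[str]) -> List[List[str]]:
    """Build, but do not run, gcloud commands for the temporary-pool workaround.

    Assumption: all affected VMs are in one zone. VMs spread across several
    zones would need one add-instances command per zone.
    """
    if not instances:
        raise ValueError("at least one affected VM is required")

    # Step 1: create a new, temporary target pool in the affected region.
    create = ["gcloud", "compute", "target-pools", "create", temp_pool,
              "--region", region]

    # Step 2: add the affected VMs to the temporary pool.
    add = ["gcloud", "compute", "target-pools", "add-instances", temp_pool,
           "--region", region,
           "--instances", ",".join(instances),
           "--instances-zone", zone]

    # Step 3 (manual): wait until the VMs work in their EXISTING load
    # balancer configuration. Nothing here automates that check.

    # Step 4: delete only the temporary pool. The existing load balancer
    # config, including the old target pool, is never touched.
    delete = ["gcloud", "compute", "target-pools", "delete", temp_pool,
              "--region", region, "--quiet"]

    return [create, add, delete]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in workaround_commands("tmp-fix-pool", "us-central1",
                                   "us-central1-a", ["vm-1", "vm-2"]):
        print(" ".join(cmd))
```

Printing the commands rather than executing them is deliberate: it lets an operator review each one and pause between the second and third before anything is deleted.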
Which is just the kind of thing cloud users pay not to have to do in a hurry. And also not clear enough for some users, because half an hour later Google re-wrote the instructions “with better formatting.”
At the time of writing, Google says the issue “should be resolved for all regions except for < 10% of affected Network Load Balancers in us-central1.”
Google's all but admitted an update of its own making was the source of this mess, making it likely this is another case of the company breaking its own cloud, as happened in April 2016, July 2016, August 2016 and September 2016.
Time to reset the “Days since last self-inflicted cloud crash” counter to zero, guys. ®