Google fingers wetware, yet again, as the reason for a cloudy outage
Users didn't get all the cloud they paid for, thanks to a botched algorithm tweak
“People,” sang Nick Cave, “They ain't no good.” And Google's learning that applies double to clouds, after again finding human intervention caused an outage.
The problem, titled “Quotas were reset to default values for some Customers,” lasted most of Tuesday's business hours on the US West Coast and meant “7.8% of Google Compute Engine projects had reduced quotas.”
Which meant that you couldn't get all the cloudy power you'd paid for. Or as Google puts it, “If reduced quota was applied to your project and your usage reached this reduced quota you would have been unable to create new resources during this incident.” And your workloads would therefore have had to get by with fewer resources than you'd designed them to require. Which would not have been pleasant or made for a great user experience.
To understand the cause of the outage, go find a file marked “Road to hell, paved with good intentions,” because Google says “In order to maximize ease of use for Google Compute Engine customers, in some cases we automatically raise resource quotas” and “... then provide exclusions to ensure that no quotas previously raised are reduced.”
Those increases and exclusions are handled by an algorithm, which Google “tunes” from time to time.
You can guess the rest, but we'll quote it anyway: “This incident occurred when one such change was made but a bug in the aforementioned exclusion process allowed some projects to have their quotas reduced.”
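For the curious, the process Google describes, auto-raised quotas protected by an exclusion list during retuning, has a simple shape. Here's a minimal Python sketch of that shape (all names, numbers, and structure are invented for illustration; this is not Google's actual code), showing how skipping the exclusion check reproduces the reported symptom:

```python
# Hypothetical illustration of the quota-retuning logic Google describes.
# All identifiers and values are invented for this sketch.

def retune_quotas(projects, new_defaults, exclusions):
    """Apply retuned default quotas, but never reduce a quota for a
    project on the exclusion list (i.e. one previously auto-raised)."""
    updated = {}
    for project_id, current_quota in projects.items():
        proposed = new_defaults.get(project_id, current_quota)
        if proposed < current_quota and project_id in exclusions:
            # Previously raised quota: keep it, never reduce.
            updated[project_id] = current_quota
        else:
            updated[project_id] = proposed
    return updated


def buggy_retune_quotas(projects, new_defaults, exclusions):
    """The same retune with a bug of the kind Google reports: the
    exclusion check is skipped, so raised quotas get reset too."""
    return {pid: new_defaults.get(pid, quota)
            for pid, quota in projects.items()}


# proj-a had its quota previously auto-raised from 24 to 96 CPUs.
projects = {"proj-a": 96, "proj-b": 24}
exclusions = {"proj-a"}
new_defaults = {"proj-a": 24, "proj-b": 24}  # a quota "tune" back to defaults

print(retune_quotas(projects, new_defaults, exclusions))
# proj-a keeps its raised quota of 96

print(buggy_retune_quotas(projects, new_defaults, exclusions))
# proj-a drops back to 24 -- the incident in miniature
```

The fix, presumably, is as small as the bug: consult the exclusion list before applying any reduction.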
In other words, someone cocked up the algorithm, its implementation, or both.
Which is good news inasmuch as at least a human was the source of the problem. And bad news that the mistake slipped through testing. And probably a pain in the posterior if you're a Google Cloud user.