Bungled storage upgrade led to Google cloud brownout
Credentials copied to new storage, but software looked for the old storage
Google's 'fessed up to another bungle that browned-out its cloud.
The incident in question meant Google App Engine applications “ received errors when issuing authenticated calls to Google APIs over a period of 17 hours and 3 minutes.”
The cause, Google says was that its “... engineers have recently carried out a migration of the Google Accounts system to a new storage backend, which included copying API authentication service credentials data and redirecting API calls to the new backend.”
“To complete this migration, credentials were scheduled to be deleted from the previous storage backend.”
You can guess what came next, namely “a software bug” that meant “the API authentication service continued to look up some credentials, including those used by Google App Engine service accounts, in the old storage backend. As these credentials were progressively deleted, their corresponding service accounts could no longer be authenticated.”
Google's again promised to review its processes to make sure things like this don't happen again, and again said “Our customers rely on us to provide a superior service and we regret we did not live up to expectations in this case.”
Nor in the case when a single case-sensitive variable name took down some of the company's cloud. Or the many other brownouts and errors the company's reported this year.
Google does publish rather more detailed error reports than its rivals, often when a small portion of its customers are affected. While doing so is laudable, it also reveals simple mistakes are more often the problem rather than lightning strikes. Incoming cloud czarina Diane Greene has an interesting gig ahead of her. ®