Google code cloud punts on-demand embarrassment

Mountain View's Sarah Palin moment

Fail and You

Last week, users of Google App Engine - Google's application hosting platform - discovered a new feature in the product: downtime. App Engine was offline for roughly six hours, and for much of that time, even the status page that tells users about downtime was unavailable. Now that's a strong way to send a message.

As a reminder: Google App Engine is Google's response to Amazon Web Services. Amazon has set up a scheme where customers can have full access to virtual computers and can also pay for scalability-included services like S3 for storage and a messaging system. It's a fair balance between automatic scaling and control. Google, on the other hand, offers developers a Python and Java API to its database back-end and absolutely zero control over the machines on which your application is running.

App Engine developers must go through the effort to contort their program to Google's data storage mechanism, which in some cases can be a far cry from SQL. The benefit to this is that you don't have to worry about scalability, ever. Allegedly. It's sort of like how a heroin addiction means that you don't have to worry about reality, ever.

As with anything that flies through a cloud, Google App Engine can suffer a double flame-out and crash to the ground, killing hundreds and swearing a large subset of the population off of air travel for quite some time. Google has paying customers for App Engine, and maybe Wonka doesn't quite understand this, but when people pay you for a service, they expect a certain amount of transparency and honesty.

Watching Google's response to the App Engine downtime reminded me of the cruel 2008 US Vice Presidential debates, where everyone watching just wanted to pull Sarah Palin aside and say "Sweetie, this is a grown-up event. You need to use your big-girl words now." Google's explanation for six hours of downtime was basically, "Shit got ill."

The meat of Google's postmortem on the failure was this, a message posted to the App Engine e-mail group: "There was a serious issue in one of App Engine's datacenters with GFS, Google's low level storage system. GFS underlies Bigtable, which in turn underlies App Engine's Datastore. GFS also provides storage for our application serving infrastructure, so GFS unavailability caused problems for Datastore reads and writes, as well as application serving."
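The dependency chain in that message can be sketched in a few lines. This is only an illustration built from the quoted text: the `DEPENDS_ON` map and the `affected_by` helper are hypothetical names, not anything Google published, and the real systems are certainly more involved.

```python
# Hypothetical dependency map, taken only from the quoted postmortem:
# GFS underlies Bigtable, Bigtable underlies Datastore, and GFS also
# backs the application serving infrastructure.
DEPENDS_ON = {
    "Bigtable": ["GFS"],
    "Datastore": ["Bigtable"],
    "Application serving": ["GFS"],
}


def affected_by(failed, depends_on=DEPENDS_ON):
    """Return every component that directly or indirectly depends on `failed`."""
    affected = set()
    changed = True
    # Keep sweeping the map until no new component gets marked as affected.
    while changed:
        changed = False
        for component, deps in depends_on.items():
            if component in affected:
                continue
            if any(dep == failed or dep in affected for dep in deps):
                affected.add(component)
                changed = True
    return sorted(affected)


print(affected_by("GFS"))
```

Under these assumptions, a GFS failure takes out every other component in the map, which matches the message's account of Datastore and application serving going down together.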

Let's say that you were tasked with maintaining the computing platform for your company's web services. After six hours of service outage, your supervisor asked you for an explanation of what happened, and you follow Google's lead. You say, "There was a serious issue with one or more of our computers." Ass, meet curb.

Almost a year ago, Amazon's S3 storage service suffered roughly eight hours of downtime. Amazon's postmortem on the failure included details about specific bugs in their message passing system, and how their wonderfully scalable system could also scale errors quite wonderfully. Amazon identified an oversight in their own code with respect to error checking, so as a customer, you could be sure that somebody is on that shit.

Google, on the other hand, doesn't feel like telling anyone exactly why GFS failed. Was it a bug in the code? Was it a traffic issue? Did Augustus Gloop fall into the chocolate river?

As for preventing such failures in the future, Google is equally tight-lipped. Their postmortem only says this: "The team has been actively working on a solution in the medium-term that would allow us to switchover data centers immediately without consistency problems."

Um, fantastic. When will that be deployed? Why specifically could you not fail over in this case? Do you realize that you're being paid for this? By comparison, Amazon outlined four changes they made to their system, in both code and monitoring, to prevent that type of failure from happening again.

One could argue over the necessity of up-time for the types of apps that appear on App Engine. After all, the world only needs so many RSS readers and Twitter clones. But this highlights the greater risk of hosting your applications in the - and oh it pains me to say this, but for the sake of brevity - cloud. Every time there is downtime like this, be it Google App Engine or Amazon Web Services or Microsoft Azure, tech pundits all tell us that it's not ready for prime time. What the fuck does that even mean? My guess: "My editor wanted me to cover this story and I lack the originality to make any meaningful contribution."

I'll go out on a limb here. Hosting production services on platforms like App Engine is never a good idea. It may be fine for some toy application or a web service that you never plan to make any money from, but when your livelihood depends on it, will you really trust the business to a company whose failure response is the technical version of "whoooa, sorry bro, my bad"? Clearly, you can draw a line when it comes to outsourcing. But for serious business, if you can't put your hands on the metal - or order someone else to put their hands on the metal - then you're due for an embarrassment. And it's nobody's fault but your own.

Google's sell really appeals to the engineers, but I hope that the decision makers can see through the bullshit. Automatic scalability? Really? Or did you guys drop too much capital expenditure on machines and have to come up with a way to make a return on that investment? Maybe it's less of a tell than I think it is, but the App Engine main product page has a prominent link to the terms of service at the top, and no link or contact information for support. Google's introverted population certainly knows that it's easier and cheaper to legalese your way out of a customer's problem than it is to hire a person to pick up the phone.

Man, supporting real people is way harder than selling ads. ®

Ted Dziuba is a co-founder at Milo.com. You can read his regular Reg column, Fail and You, every other Monday.
