Google code cloud punts on-demand embarrassment

Mountain View's Sarah Palin moment


Fail and You

Last week, users of Google App Engine - Google's application hosting platform - discovered a new feature in the product: downtime. App Engine was offline for roughly six hours, and for much of that time, even the status page which tells users about downtime was unavailable. Now that's a strong way to send a message.

As a reminder: Google App Engine is Google's response to Amazon Web Services. Amazon has set up a scheme where customers can have full access to virtual computers and can also pay for scalability-included services like S3 for storage and a messaging system. It's a fair balance between automatic scaling and control. Google, on the other hand, offers developers a Python and Java API to its database back-end and absolutely zero control over the machines on which your application is running.

App Engine developers must go through the effort of contorting their programs to fit Google's data storage mechanism, which in some cases can be a far cry from SQL. The benefit of this is that you don't have to worry about scalability, ever. Allegedly. It's sort of like how a heroin addiction means that you don't have to worry about reality, ever.

As with anything that flies through a cloud, Google App Engine can suffer a double flame-out and crash to the ground, killing hundreds and swearing a large subset of the population off of air travel for quite some time. Google has paying customers for App Engine, and maybe Wonka doesn't quite understand this, but when people pay you for a service, they expect a certain amount of transparency and honesty.

Watching Google's response to the App Engine downtime reminded me of the cruel 2008 US Vice Presidential debate, where everyone watching just wanted to pull Sarah Palin aside and say "Sweetie, this is a grown-up event. You need to use your big-girl words now." Google's explanation for six hours of downtime was basically, "Shit got ill."

The meat of Google's postmortem on the failure was this, a message posted to the App Engine e-mail group: "There was a serious issue in one of App Engine's datacenters with GFS, Google's low level storage system. GFS underlies Bigtable, which in turn underlies App Engine's Datastore. GFS also provides storage for our application serving infrastructure, so GFS unavailability caused problems for Datastore reads and writes, as well as application serving."
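For anyone who wants that chain spelled out, here is a rough sketch of the dependencies exactly as the quote describes them, and nothing more. The `DEPENDS_ON` map and the `affected_by` helper are hypothetical names made up for illustration; this is not Google's tooling, and it assumes no dependencies beyond the ones named in the message.

```python
# Hypothetical dependency map, built only from Google's quoted description:
# GFS underlies Bigtable, Bigtable underlies Datastore, and GFS also
# backs the application serving infrastructure.
DEPENDS_ON = {
    "Bigtable": ["GFS"],
    "Datastore": ["Bigtable"],
    "application serving": ["GFS"],
}


def affected_by(failed):
    """Return every component that directly or indirectly depends on `failed`."""
    hit = set()
    frontier = [failed]
    while frontier:
        current = frontier.pop()
        for component, deps in DEPENDS_ON.items():
            if current in deps and component not in hit:
                hit.add(component)
                frontier.append(component)
    return hit


print(sorted(affected_by("GFS")))
```

Knock out GFS and everything else in the map goes with it, which is about all the postmortem tells you.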

Let's say that you were tasked with maintaining the computing platform for your company's web services. After six hours of service outage, your supervisor asked you for an explanation of what happened, and you follow Google's lead. You say, "There was a serious issue with one or more of our computers." Ass, meet curb.

Almost a year ago, Amazon's S3 storage service suffered roughly eight hours of downtime. Amazon's postmortem on the failure included details about specific bugs in their message passing system, and how their wonderfully scalable system could also scale errors quite wonderfully. Amazon identified an oversight in their own code with respect to error checking, so as a customer, you could be sure that somebody was on that shit.

Google, on the other hand, doesn't feel like telling anyone exactly why GFS failed. Was it a bug in the code? Was it a traffic issue? Did Augustus Gloop fall into the chocolate river?

As for preventing such failures in the future, Google is equally tight-lipped. Their postmortem only says this: "The team has been actively working on a solution in the medium-term that would allow us to switchover data centers immediately without consistency problems."

Um, fantastic. When will that be deployed? Why specifically could you not fail over in this case? Do you realize that you're being paid for this? By comparison, Amazon outlined four changes they made to their system, in both code and monitoring, to prevent that type of failure from happening again.

One could argue over the necessity of up-time for the types of apps that appear on App Engine. After all, the world only needs so many RSS readers and Twitter clones. But this highlights the greater risk of hosting your applications in the - and oh it pains me to say this, but for the sake of brevity - cloud. Every time there is downtime like this, be it Google App Engine or Amazon Web Services or Microsoft Azure, tech pundits all tell us that it's not ready for prime time. What the fuck does that even mean? My guess: "My editor wanted me to cover this story and I lack the originality to make any meaningful contribution."

I'll go out on a limb here. Hosting production services on platforms like App Engine is never a good idea. It may be fine for some toy application or a web service that you never plan to make any money from, but when your livelihood depends on it, will you really trust the business to a company whose failure response is the technical version of "whoooa, sorry bro, my bad"? Clearly, you can draw a line when it comes to outsourcing. But for serious business, if you can't put your hands on the metal - or order someone else to put their hands on the metal - then you're due for an embarrassment. And it's nobody's fault but your own.

Google's sell really appeals to the engineers, but I hope that the decision makers can see through the bullshit. Automatic scalability? Really? Or did you guys drop too much capital expenditure on machines and have to come up with a way to make a return on that investment? Maybe it's less of a tell than I think it is, but the App Engine main product page has a prominent link to the terms of service at the top, and no link or contact information for support. Google's introverted population certainly knows that it's easier and cheaper to legalese your way out of a customer's problem than it is to hire a person to pick up the phone.

Man, supporting real people is way harder than selling ads. ®

Ted Dziuba is a co-founder at Milo.com. You can read his regular Reg column, Fail and You, every other Monday.
