Google code cloud punts on-demand embarrassment

Mountain View's Sarah Palin moment

Fail and You

Last week, users of Google App Engine - Google's application hosting platform - discovered a new feature in the product: downtime. App Engine was offline for roughly six hours, and for much of that time, even the status page that tells users about downtime was unavailable. Now that's a strong way to send a message.

As a reminder: Google App Engine is Google's response to Amazon Web Services. Amazon has set up a scheme where customers can have full access to virtual computers and can also pay for scalability-included services like S3 for storage and a messaging system. It's a fair balance between automatic scaling and control. Google, on the other hand, offers developers a Python and Java API to its database back-end and absolutely zero control over the machines on which your application is running.

App Engine developers must go through the effort to contort their program to Google's data storage mechanism, which in some cases can be a far cry from SQL. The benefit to this is that you don't have to worry about scalability, ever. Allegedly. It's sort of like how a heroin addiction means that you don't have to worry about reality, ever.

As with anything that flies through a cloud, Google App Engine can suffer a double flame-out and crash to the ground, killing hundreds and swearing a large subset of the population off of air travel for quite some time. Google has paying customers for App Engine, and maybe Wonka doesn't quite understand this, but when people pay you for a service, they expect a certain amount of transparency and honesty.

Watching Google's response to the App Engine downtime reminded me of the cruel 2008 US Vice Presidential debates, where everyone watching just wanted to pull Sarah Palin aside and say "Sweetie, this is a grown-up event. You need to use your big-girl words now." Google's explanation for six hours of downtime was basically, "Shit got ill."

The meat of Google's postmortem on the failure was this, a message posted to the App Engine e-mail group: "There was a serious issue in one of App Engine's datacenters with GFS, Google's low level storage system. GFS underlies Bigtable, which in turn underlies App Engine's Datastore. GFS also provides storage for our application serving infrastructure, so GFS unavailability caused problems for Datastore reads and writes, as well as application serving."

Let's say that you are tasked with maintaining the computing platform for your company's web services. After six hours of service outage, your supervisor asks you for an explanation of what happened, and you follow Google's lead. You say, "There was a serious issue with one or more of our computers." Ass, meet curb.

Almost a year ago, Amazon's S3 storage service suffered roughly eight hours of downtime. Amazon's postmortem on the failure included details about specific bugs in their message passing system, and how their wonderfully scalable system could also scale errors quite wonderfully. Amazon identified an oversight in their own code with respect to error checking, so as a customer, you could be sure that somebody is on that shit.

Google, on the other hand, doesn't feel like telling anyone exactly why GFS failed. Was it a bug in the code? Was it a traffic issue? Did Augustus Gloop fall into the chocolate river?

As for preventing future failures of this kind, Google is equally tight-lipped. Their postmortem only says this: "The team has been actively working on a solution in the medium-term that would allow us to switchover data centers immediately without consistency problems."

Um, fantastic. When will that be deployed? Why specifically could you not fail over in this case? Do you realize that you're being paid for this? By comparison, Amazon outlined four changes they made to their system, in both code and monitoring, to prevent that type of failure from happening again.

One could argue over the necessity of up-time for the types of apps that appear on App Engine. After all, the world only needs so many RSS readers and Twitter clones. But this highlights the greater risk of hosting your applications in the - and oh it pains me to say this, but for the sake of brevity - cloud. Every time there is downtime like this, be it Google App Engine or Amazon Web Services or Microsoft Azure, tech pundits all tell us that it's not ready for prime time. What the fuck does that even mean? My guess: "My editor wanted me to cover this story and I lack the originality to make any meaningful contribution."

I'll go out on a limb here. Hosting production services on platforms like App Engine is never a good idea. It may be fine for some toy application or a web service that you never plan to make any money from, but when your livelihood depends on it, will you really trust the business to a company whose failure response is the technical version of "whoooa, sorry bro, my bad"? Clearly, you can draw a line when it comes to outsourcing. But for serious business, if you can't put your hands on the metal - or order someone else to put their hands on the metal - then you're due for an embarrassment. And it's nobody's fault but your own.

Google's sell really appeals to the engineers, but I hope that the decision makers can see through the bullshit. Automatic scalability? Really? Or did you guys drop too much capital expenditure on machines and have to come up with a way to make a return on that investment? Maybe it's less of a tell than I think it is, but the App Engine main product page has a prominent link to the terms of service at the top, and no link or contact information for support. Google's introverted population certainly knows that it's easier and cheaper to legalese your way out of a customer's problem than it is to hire a person to pick up the phone.

Man, supporting real people is way harder than selling ads. ®

Ted Dziuba is a co-founder at Milo.com. You can read his regular Reg column, Fail and You, every other Monday.
