The Register® — Biting the hand that feeds IT

Google code cloud punts on-demand embarrassment

Mountain View's Sarah Palin moment

Fail and You

Last week, users of Google App Engine - Google's application hosting platform - discovered a new feature in the product: downtime. App Engine was offline for roughly six hours, and for much of that time, even the status page which tells users about downtime was unavailable. Now that's a strong way to send a message.

As a reminder: Google App Engine is Google's response to Amazon Web Services. Amazon has set up a scheme where customers can have full access to virtual computers and can also pay for scalability-included services like S3 for storage and a messaging system. It's a fair balance between automatic scaling and control. Google, on the other hand, offers developers a Python and Java API to its database back-end and absolutely zero control over the machines on which your application is running.

App Engine developers must go through the effort to contort their program to Google's data storage mechanism, which in some cases can be a far cry from SQL. The benefit to this is that you don't have to worry about scalability, ever. Allegedly. It's sort of like how a heroin addiction means that you don't have to worry about reality, ever.

As with anything that flies through a cloud, Google App Engine can suffer a double flame-out and crash to the ground, killing hundreds and swearing a large subset of the population off of air travel for quite some time. Google has paying customers for App Engine, and maybe Wonka doesn't quite understand this, but when people pay you for a service, they expect a certain amount of transparency and honesty.

Watching Google's response to the App Engine downtime reminded me of the cruel 2008 US Vice Presidential debates, where everyone watching just wanted to pull Sarah Palin aside and say "Sweetie, this is a grown-up event. You need to use your big-girl words now." Google's explanation for six hours of downtime was basically, "Shit got ill."

The meat of Google's postmortem on the failure was this, a message posted to the App Engine e-mail group: "There was a serious issue in one of App Engine's datacenters with GFS, Google's low level storage system. GFS underlies Bigtable, which in turn underlies App Engine's Datastore. GFS also provides storage for our application serving infrastructure, so GFS unavailability caused problems for Datastore reads and writes, as well as application serving."

Let's say that you were tasked with maintaining the computing platform for your company's web services. After six hours of service outage, your supervisor asks you for an explanation of what happened, and you follow Google's lead. You say, "There was a serious issue with one or more of our computers." Ass, meet curb.

Almost a year ago, Amazon's S3 storage service suffered roughly eight hours of downtime. Amazon's postmortem on the failure included details about specific bugs in their message passing system, and how their wonderfully scalable system could also scale errors quite wonderfully. Amazon identified an oversight in their own code with respect to error checking, so as a customer, you could be sure that somebody is on that shit.

Google, on the other hand, doesn't feel like telling anyone exactly why GFS failed. Was it a bug in the code? Was it a traffic issue? Did Augustus Gloop fall into the chocolate river?

As for preventing similar failures in the future, Google is equally tight-lipped. Their postmortem only says this: "The team has been actively working on a solution in the medium-term that would allow us to switchover data centers immediately without consistency problems."

Um, fantastic. When will that be deployed? Why specifically could you not fail over in this case? Do you realize that you're being paid for this? By comparison, Amazon outlined four changes they made to their system, in both code and monitoring, to prevent that type of failure from happening again.

One could argue over the necessity of up-time for the types of apps that appear on App Engine. After all, the world only needs so many RSS readers and Twitter clones. But this highlights the greater risk of hosting your applications in the - and oh it pains me to say this, but for the sake of brevity - cloud. Every time there is downtime like this, be it Google App Engine or Amazon Web Services or Microsoft Azure, tech pundits all tell us that it's not ready for prime time. What the fuck does that even mean? My guess: "My editor wanted me to cover this story and I lack the originality to make any meaningful contribution."

I'll go out on a limb here. Hosting production services on platforms like App Engine is never a good idea. It may be fine for some toy application or a web service that you never plan to make any money from, but when your livelihood depends on it, will you really trust the business to a company whose failure response is the technical version of "whoooa, sorry bro, my bad?" Clearly, you can draw a line when it comes to outsourcing. But for serious business, if you can't put your hands on the metal - or order someone else to put their hands on the metal - then you're due for an embarrassment. And it's nobody's fault but your own.

Google's sell really appeals to the engineers, but I hope that the decision makers can see through the bullshit. Automatic scalability? Really? Or did you guys drop too much capital expenditure on machines and have to come up with a way to make a return on that investment? Maybe it's less of a tell than I think it is, but the App Engine main product page has a prominent link to the terms of service at the top, and no link or contact information for support. Google's introverted population certainly knows that it's easier and cheaper to legalese your way out of a customer's problem than it is to hire a person to pick up the phone.

Man, supporting real people is way harder than selling ads. ®

Ted Dziuba is a co-founder at Milo.com. You can read his regular Reg column, Fail and You, every other Monday.

Latest Comments

A fail and you I liked. Shurely shome mishtake?

Day-umn, Ted, there's been more abuse aimed at someone else's negative comment than at your article. A first for a F&Y post, surely?

Actually, I usually think Fail and You is a vehicle for your self-righteous drivel, but I liked/agreed with this one:

1. It made sense, and;

2. your swear box only has a couple of quid in it.

I don't know if that means you consider you missed your target audience? :)

CBG

Best Dziubaism Ever.

@Kanhef: Hello, you must be new to the world of software development. The customers *do* need to know. Everyone gives some details out about failures. That's how the customers gauge reliability of a company. A company that slaps an eternal beta onto their product names and is always tight-lipped about failures or snafus is not a trustworthy company. I understand that this may go against the fanboi credo if you are one, but my advice to you is - never form your own company.

@Apocalypse Later

Ok, I'll bite, even though everyone else seems to have beaten me to it. You keep putting liberalism and socialism into the same boat. Classic Liberalism is a philosophy that believes in small government, individual freedoms, and free markets. Socialism is about significant state control, hence large government, and enforced equality for all (flatter wage structures etc.). So socialism is very different to liberalism. Your conflation of the 2 concepts shows a deep lack of understanding, and makes you into the biggest fail I've seen on these forums in a long time.
