App Engine: Google's deepest secrets as a service
The software scales. But will the Google rulebook?
Er, where's MySQL?
For the average developer, one of the biggest challenges is writing for App Engine's Datastore API. On App Engine, you can't write to the file system. You have to use the Datastore API, the Memcache API, or other services provided by Google, and the Datastore API – based on BigTable – uses a data model vastly different from the relational databases most coders are used to.
"It takes some wrestling from a design standpoint, because it tends to want you to design your back end in a de-normalized sort of way," says Will Merydith, who's using App Engine to build a gaming platform known as SuperKablamo. "Unlike a MySQL environment, where you can do all these really interesting joins, you can't really do that with Datastore."
But you can scale. And with the rise of open source non-relational databases such as MongoDB and HBase, the latter modeled on Google's BigTable research paper, many developers are already adjusting to the non-relational mindset. "Two or three years ago, people said: 'Moving off relational? That's a pretty high hurdle. I don't think that's going to happen'. But now it's definitely happening," says Dwight Merriman, the CEO and cofounder of 10gen, MongoDB's chief steward.
"There is a pretty big hurdle to clear," he says. "Everybody learned relational in school and there's a lot of tooling around it that already exists and there's a lot of legacy code that uses it. ... But what has helped people get over the hurdle is the scaling imperative. Computer architectures are changing, and in the cloud we need horizontal scaling. Really, there's no choice."
Sean Lynch believes that if you've used MongoDB or HBase, you'll take to the Datastore API rather easily. "While it is close to a couple of competitors out there in terms of functionality, it's not identical, so there are things that would be specific to App Engine when you're using something like the Datastore," he says. "But it wouldn't be a big conceptual leap if you were moving from MongoDB or HBase to App Engine. The model should feel very much the same."
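To make Merydith's point about de-normalized design concrete, here is a minimal sketch in plain Python. It does not use the Datastore API at all: the dictionaries stand in for stored entities, and every name in it (players, scores, leaderboard_normalized, and so on) is hypothetical. It only illustrates the trade-off of copying data into each entity when the store won't join for you.

```python
# Hypothetical in-memory stand-in for an entity store. This is NOT the
# App Engine Datastore API, only an illustration of the modelling trade-off.
players = {
    "p1": {"name": "Ada"},
    "p2": {"name": "Grace"},
}

# Normalized: each score refers to a player by key, as a relational row would.
scores_normalized = [
    {"player_key": "p1", "points": 120},
    {"player_key": "p2", "points": 95},
]

# De-normalized: the player's name is copied into every score entity.
scores_denormalized = [
    {"player_key": "p1", "player_name": "Ada", "points": 120},
    {"player_key": "p2", "player_name": "Grace", "points": 95},
]


def leaderboard_normalized(scores, players):
    """Without a join, the application does one extra lookup per score."""
    ordered = sorted(scores, key=lambda s: s["points"], reverse=True)
    return [(players[s["player_key"]]["name"], s["points"]) for s in ordered]


def leaderboard_denormalized(scores):
    """Everything needed to render a row is already on the score entity."""
    ordered = sorted(scores, key=lambda s: s["points"], reverse=True)
    return [(s["player_name"], s["points"]) for s in ordered]


def rename_player(player_key, new_name, players, scores):
    """The price of de-normalizing: a rename must rewrite every copy."""
    players[player_key]["name"] = new_name
    for s in scores:
        if s["player_key"] == player_key:
            s["player_name"] = new_name
```

Both leaderboard functions return the same rows. The de-normalized version gets there without the per-row lookup, at the cost of duplicated data that has to be kept in sync by hand.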
But even if you're steeped in this sort of non-relational data model, App Engine takes some getting used to. You can't throw any old code onto App Engine. You can use Python and many languages that compile to Java byte code, including Java, JRuby, Jython, Scala, and Groovy. And now there's an SDK for Go. But that's it. And you can only use certain Python libraries and only certain Java byte code languages, libraries, and frameworks.
What's more, App Engine restricts you to a tight sandbox. Your application can only access other machines on the net through App Engine's URL fetch, email, and XMPP services. Other machines can only connect to your application via HTTP requests on the standard ports. And application code only runs in response to a web request, a queued task, or a scheduled task. An app must return a response to a web request within 30 seconds, while tasks (queued or scheduled) have ten minutes to complete.
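One common way to live with deadlines like these is to work in small steps and stop before the budget runs out, leaving the remainder for a follow-up task. The sketch below shows that pattern in plain Python rather than with any App Engine service. The two deadline constants come from the figures above; the safety margin, the function name process_until_deadline, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import deque

# Limits quoted in the article.
WEB_REQUEST_DEADLINE_S = 30
TASK_DEADLINE_S = 10 * 60

# Assumption of this sketch: stop a few seconds early to leave room to respond.
SAFETY_MARGIN_S = 5


def process_until_deadline(items, handle, deadline_s, clock=time.monotonic):
    """Handle items one at a time until the time budget is nearly spent.

    Returns the items that were not reached, so the caller can pass them
    to a later task instead of overrunning the deadline.
    """
    started = clock()
    pending = deque(items)
    while pending:
        # Check the budget before starting each item, not after.
        if clock() - started >= deadline_s - SAFETY_MARGIN_S:
            break
        handle(pending.popleft())
    return list(pending)
```

A web request handler would call this with WEB_REQUEST_DEADLINE_S and enqueue whatever comes back; a queued task would use TASK_DEADLINE_S. The check only runs between items, so a single item that takes longer than the margin can still overrun.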
These restrictions are in place, Lynch says, so that applications can scale, but also for security reasons. "There are some limitations around how long the code will run, and you can't use sockets. If you're used to just dropping in some code to call some socket somewhere, that may not work," he says. "But this is a balancing act. Do you give as much flexibility as possible or do you bake in security from the get-go? It's much easier to build in the security from the base layer and build from there, than it is to try and add the security layer later.
"The other nice thing is that we've been able to scale up over the last couple of years with a very high confidence that things aren't affecting each other when they're scaling. When they're moving around, there are not ways you could touch someone else's memory."
He adds that, as this confidence builds, Google is "peeling back" some restrictions. With App Engine version 1.4.3, it added the ability to do concurrent requests with Java applications (concurrency for Python is on the way). And in version 1.5.0, it introduced "Backends" – long-running, high-memory instances. These have a 24-hour request deadline, and they can use between 128MB and 1GB of memory and a proportional amount of CPU power.
Previously, says developer Jeff Schnitzer, he couldn't run an in-memory index on the service. "You have these little instances that are limited to about 100MB of RAM, and you don't really have a lot of control over how many of them are, and you can't individually address them. For a whole variety of reasons, you just don't have access to large quantities of RAM," he explains. But with Backends, such an index is possible.
The added catch is that – at least for now – Backends are too expensive for Schnitzer's purposes. He has already moved his in-memory index to Amazon. For Schnitzer, at least, App Engine's limitations ensure that applications often spill over into other clouds – or local servers. "I think this is something that anyone who spends a lot of time with App Engine realizes," he says. "App Engine isn't really a complete set of tools. You have to have parts of an app in App Engine and parts in other places."