Original URL: http://www.theregister.co.uk/2011/06/08/google_software_infrastructure_dubbed_obsolete_by_ex_employee/

Ex-Google engineer dubs Goofrastructure 'truly obsolete'

MapReduce and BigTable branded 'ancient, creaking dinosaurs'

By Cade Metz

Posted in Cloud, 8th June 2011 18:53 GMT

A former Google engineer who worked on a library at the heart of "nearly every Java server at Google" has dubbed the company's much-ballyhooed backend software "well and truly obsolete".

In a blog post published earlier this week, Dhanji R. Prasanna announced that he had resigned from the company, and though he praised Google in many ways, he made a point of saying that the company's famously distributed back-end is behind the times.

"Here is something you may have heard but never quite believed before: Google's vaunted scalable software infrastructure is obsolete," he wrote. "Don't get me wrong, their hardware and datacenters are the best in the world, and as far as I know, nobody is close to matching it. But the software stack on top of it is 10 years old, aging and designed for building search engines and crawlers. And it is well and truly obsolete."

As a member of the Google Wave team, Prasanna helped build the search and indexing pipelines for the ill-fated effort to reinvent communication on the web, but he also worked on Guice, a library "at the heart of nearly every single Java server at Google".

Prasanna did not immediately respond to a request to discuss his post. But he goes on to describe Google's Protocol Buffers, BigTable distributed database, and MapReduce distributed number-crunching platform as "ancient, creaking dinosaurs", compared with outside open source projects like MessagePack, JSON, and Hadoop, which is based on the ideas behind Google's MapReduce and distributed file system.

Google has previously acknowledged some shortcomings in the likes of MapReduce. But Prasanna went so far as to say that newer Google infrastructure projects such as Megastore, as well as developer tools such as Google Web Toolkit and Closure, were "sluggish, overengineered Leviathans" compared with projects like MongoDB and jQuery. He complained that Google's new projects are "designed by engineers in a vacuum, rather than by developers who have need of tools."

Google is famously secretive about its back-end software infrastructure. It has published research papers on platforms such as the Google File System, Google MapReduce, and BigTable, but it otherwise says very little about how these platforms are used within the company. And, yes, the platforms are closed source.

On the public mailing list for Google App Engine – an online service that lets you run your own applications atop Google's infrastructure – Google developer programs engineer Ikai Lan took issue with at least some of Prasanna's post.

"The bit about Hadoop, for instance, raised a lot of eyebrows amongst Googlers who have extensive use of both (new hires with a few years Hadoop experience)," he said. "I'd also disagree that we are not rebuilding things. In fact, Google has the opposite problem of other technology companies: instead of 'don't touch it, it works!', we err on the side of 'it can be better, we should improve it - mid flight!'"

Prasanna did not actually say that Google has failed to rebuild its platforms. At one point, he specifically mentioned Megastore, a real-time, high-replication layer built atop BigTable. But he did imply that efforts to rebuild at Google are slow.

"In the short time I've been outside Google I've created entire apps in Java in the space of a single workday," he said. "I've gotten prototypes off the ground, shown it to people, or deployed them with hardly any barriers." This, however, would seem to describe the experience of leaving any large corporation.

Google downs shot of espresso

Last year, in an interview with the Association for Computing Machinery (ACM), a Google engineer acknowledged that GFS was unsuited to low-latency, real-time applications like YouTube and Gmail, and said that Google was working to build a new version of the file system.

Googler Matt Cutts later told The Register that this "GFS 2" was part of the company's new search infrastructure codenamed Caffeine.

Several months after that, at the launch of Google's Instant search interface, Eisar Lipkovitz, a senior director of engineering at the company, told us that within the company, GFS 2 is known as "Colossus", and that the new infrastructure moves the company's search indexing system off MapReduce and onto BigTable.

A few weeks later, Google published a paper on Colossus and a new distributed data processing system known as Percolator. But according to Lipkovitz, these platforms were built specifically for search and may or may not be applied to other Google services.

For years, database guru Mike Stonebraker has criticized MapReduce and GFS, and Lipkovitz told us that Google has made "similar observations". MapReduce, he told us, is not suited to calculations that need to occur in near real time.

Google has also said that the single-master design of GFS is a major limitation. "A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving," said Google's Sean Quinlan in his interview with the ACM. Colossus does not have this limitation.

At the moment, the open source version of Hadoop is burdened with single points of failure. But Facebook is running a version that eliminates these limitations.

In a recent conversation with The Register, Dwight Merriman, the CEO of 10gen, the company behind the open source MongoDB distributed database, argued that MongoDB is superior to BigTable because it uses a document-oriented data model rather than a tabular one.

"Today, 95 per cent of the code we're writing is in an object-oriented language," he said. "We're to the point where object-oriented programming is ubiquitous enough, having a database that works well with that sort of thing is important."

He said that Megastore is an improvement on BigTable, but that it doesn't change the database's fundamental tabular setup, and he added that most of the improvements provided by Megastore are already a part of MongoDB.

Google's coding culture

In his blog post, Prasanna was equally critical of Google's coding culture. But, he says, this was a function of the company's size. "The nature of a large company like Google is such that they reward consistent, focused performance in one area. This sounds good on the surface, but if you're a hacker at heart like me, it's really the death knell for your career.

"It means that staking out a territory and defending it is far more important than doing what it takes to get a project to its goal," he said. "Engineers who simply staked out one component in the codebase, and rejected patches so they could maintain complete control over design and implementation details had much greater rewards."

Prasanna says that he voices these opinions without bitterness. And his post does have a rather even-handed tone. In the past month or two, he says, eight of his colleagues who worked on Google Wave have left the company. Which is hardly surprising. A year after unveiling Google Wave, Google killed development on the project.

Lars Rasmussen – who designed the original Google Maps with his brother Jens before running the Google Wave project – has now defected to Facebook. ®