Hadoop - Why is Google juicing Yahoo! search?

Inside the Mountain View mind

It's the Google equivalent of the everlasting gobstopper. And for some reason, the Mountain View Chocolate Factory has encouraged a knockoff industry among its Slugworthian rivals.

Considering the code of secrecy that typically envelops Google's internal operations, you have to wonder why the company helped foster the birth and ongoing development of Hadoop, the open-source incarnation of the new-age grid-computing platform that underpins its vast online infrastructure. Hadoop now drives at least a portion of Yahoo!'s search engine, and it runs Powerset, the basis for Microsoft's next-generation search extravaganza.

According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the much-discussed Hadoop startup Cloudera - any advantages Hadoop bestows on Google's chief rivals are outweighed by the long-term benefits shoveled back into the Chocolate Factory. Famously, Hadoop is an educational tool for the next generation of Google Oompa Loompas, and in theory its widespread adoption will eventually shove more stuff through Google's own search engine - meaning Google can serve ads and make more money.

But, it seems, the old Google arrogance is also at play. In sharing its distributed-computing genius with the rest of the world, Bisciglia says, Google "showed the world that they were right."

In 2003 and 2004, Google published a pair of research papers describing its distributed file system, known as GFS, and its software framework for distributed data-crunching, known as MapReduce. And in short order, an independent developer named Doug Cutting launched an open-source project based on the two papers. He called it Hadoop after his son's yellow stuffed elephant.

By early 2006, Yahoo! was toying with the project, and the Google rival soon put Cutting on the payroll, slowly rolling Hadoop into its back-end infrastructure. The open-source platform powers the new Yahoo! Search Webmap, a mega-app that builds a database of all known web pages - complete with all the metadata needed to, shall we say, understand them. According to Yahoo! Grid Computing Pooh-Bah Eric Baldeschwieler, the fledgling app draws its map 33 per cent faster than the company's previous system - on the same hardware.

Facebook has embraced Hadoop in similar fashion. Amazon is offering the platform as a web service over its AWS virtual data center. And even Microsoft is feeding off the project's open-sourciness, thanks to its recent purchase of Powerset.

But in a very different way, Hadoop has also become a valuable tool for Google itself.
