Hadoop - Why is Google juicing Yahoo! search?

Inside the Mountain View mind

It's the Google equivalent of the everlasting gobstopper. And for some reason, the Mountain View Chocolate Factory has encouraged a knockoff industry among its Slugworthian rivals.

Considering the code of secrecy that typically envelops Google's internal operations, you have to wonder why the company helped foster the birth and ongoing development of Hadoop, the open-source incarnation of the new-age grid-computing platform that underpins its vast online infrastructure. Hadoop now drives at least a portion of Yahoo!'s search engine, and it underpins Powerset, the basis for Microsoft's next-generation search extravaganza.

According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the much-discussed Hadoop startup Cloudera - any advantages Hadoop bestows on Google's chief rivals are outweighed by the long-term benefits shoveled back into the Chocolate Factory. Famously, Hadoop is an educational tool for the next generation of Google Oompa Loompas, and in theory its widespread adoption will eventually shove more stuff through Google's own search engine - meaning Google can serve more ads and make more money.

But, it seems, the old Google arrogance is also at play. In sharing its distributed-computing genius with the rest of the world, Bisciglia says, Google "showed the world that they were right."

In 2003 and 2004, Google published a pair of research papers describing its distributed file system, known as GFS, and its software framework for distributed data-crunching, known as MapReduce. In short order, an independent developer named Doug Cutting launched an open-source project based on the two papers. He called it Hadoop after his son's yellow stuffed elephant.

By early 2006, Yahoo! was toying with the project, and the Google rival soon put Cutting on the payroll, slowly rolling Hadoop into its back-end infrastructure. The open-source platform powers the new Yahoo! Search Webmap, a mega-app that builds a database of all known web pages - complete with all the metadata needed to, shall we say, understand them. According to Yahoo! Grid Computing Pooh-Bah Eric Baldeschwieler, the fledgling app draws its map 33 per cent faster than the company's previous system - on the same hardware.

Facebook has embraced Hadoop in similar fashion. Amazon is offering the platform as a web service over its AWS virtual data center. And even Microsoft is feeding off the project's open-sourciness, thanks to its recent purchase of Powerset.

But in a very different way, Hadoop has also become a valuable tool for Google itself.
