Hadoop - Why is Google juicing Yahoo! search?

Inside the Mountain View mind

It's the Google equivalent of the everlasting gobstopper. And for some reason, the Mountain View Chocolate Factory has encouraged a knockoff industry among its Slugworthian rivals.

Considering the code of secrecy that typically envelops Google's internal operations, you have to wonder why the company helped foster the birth and ongoing development of Hadoop, the open-source incarnation of the new-age grid-computing platform that underpins its vast online infrastructure. Hadoop now drives at least a portion of Yahoo!'s search engine, and it runs Powerset, the basis for Microsoft's next-generation search extravaganza.

According to Christophe Bisciglia - the former Google engineer who recently jumped ship for the much-discussed Hadoop startup Cloudera - any advantages Hadoop bestows on Google's chief rivals are outweighed by the long-term benefits shoveled back into the Chocolate Factory. Famously, Hadoop is an educational tool for the next generation of Google Oompa Loompas, and in theory its widespread adoption will eventually shove more stuff through Google's own search engine - meaning Google can serve ads and make more money.

But, it seems, the old Google arrogance is also at play. In sharing its distributed-computing genius with the rest of the world, Bisciglia says, Google "showed the world that they were right."

In 2003 and 2004, respectively, Google published a pair of research papers describing its distributed file system, known as GFS, and its software framework for distributed data-crunching, known as MapReduce. And in short order, an independent developer named Doug Cutting launched an open-source project based on the two papers. He called it Hadoop after his son's yellow stuffed elephant.

By early 2006, Yahoo! was toying with the project, and the Google rival soon put Cutting on the payroll, slowly rolling Hadoop into its back-end infrastructure. The open-source platform powers the new Yahoo! Search Webmap, a mega-app that builds a database of all known web pages – complete with all the metadata needed to, shall we say, understand them. According to Yahoo! Grid Computing Pooh-Bah Eric Baldeschwieler, the fledgling app draws its map 33 per cent faster than the company's previous system - on the same hardware.

Facebook has embraced Hadoop in similar fashion. Amazon is offering the platform as a web service over its AWS virtual data center. And even Microsoft is feeding off the project's open-sourciness, thanks to its recent purchase of Powerset.

But in a very different way, Hadoop has also become a valuable tool for Google itself.
