
Microsoft pact holds gun to Yahoo!'s stuffed elephant

Just when Yahoo! was relevant again...

Updated: You didn't shed a tear over the death of Yahoo!'s independent search engine? That may change.

As the two companies finally ended the epic gestation period of their inevitable web search pact, Yahoo! and Microsoft announced that Bing - Redmond's fledgling "decision engine" - will be "the exclusive algorithmic search and paid search platform for Yahoo! sites." And though the two Google chasers made it clear that Yahoo! will continue to use its own technologies in other areas of its business, you have to wonder what the pact means for the future of Hadoop, the open-source grid platform that had finally restored Yahoo!'s mojo.

Yahoo! is the largest contributor to the increasingly popular Apache project, supplying more than 70 per cent of all patches, and it employs the project's founder, Nutch crawler creator Doug Cutting. But in signing its pact with Microsoft, the company appears to have agreed to bury its largest Hadoop application: the Yahoo! Search Webmap.

The Webmap - which provides the Yahoo! search engine with a database of all known web pages, complete with all the necessary metadata - has also been described (by Yahoo!) as the world's largest Hadoop application. And though Hadoop powers other portions of Yahoo!, it's unclear whether the company will put as much time and money into moving the platform forward. Yahoo! has not responded to our requests for comment. Nor has Microsoft.

Redmond told CNET that it's "open" to merging Bing with Yahoo!'s SearchMonkey platform, a misguided effort to expose the company's search results to third-party developers. But although Bing's "reference vertical" uses Hadoop - thanks to the acquisition of semantic search startup Powerset - it seems unlikely that Redmond would embrace Hadoop on Bing proper. Indeed, Powerset's general manager has told us that nearly a year after the startup's acquisition, Microsoft has made no plans to do so.

Even if it did, that's beside the point. The point here is that Yahoo! - Hadoop's godfather - is giving up the crown jewel in its Hadoop empire.

Inspired by Google-published research papers describing Mountain View's proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a cluster of distributed machines. Doug Cutting originally developed the platform for use with Nutch, naming it after his son's stuffed elephant. But in 2006 he was hired by Yahoo!, and by the beginning of last year Hadoop had made its way onto Yahoo! production systems.

Webmap is the big example. But Yahoo! does use Hadoop for various other tasks. The platform now powers the real-time automated algorithms that select news stories for the Yahoo! home page. And in some cases it's used to optimize ads - i.e. to match content with relevant advertising.

Presumably, Hadoop will continue to drive these non-search tools. But does that mean Yahoo! will continue to put its considerable weight behind the project's development?

Christophe Bisciglia is confident that Yahoo!'s commitment will remain. "Hadoop isn't just about search," says Bisciglia, one of the minds behind Cloudera, a Silicon Valley startup offering a commercialized version of Hadoop. "Over the coming months, we will likely see Yahoo! shift resources towards the advertising and content businesses, but Hadoop plays a critical role there as well, so even if the clients for Hadoop change a bit, I don't see the overall investment from Y! decreasing.

"The expensive part of operating a search business is the hardware itself - not the development team working on Hadoop. If anything, this will better position their Hadoop team to attack challenges that have more impact on Yahoo!'s bottom line."

Granted, Bisciglia has a certain interest in Yahoo! maintaining its Hadoop efforts. But let's hope he's right. The destruction of Yahoo!'s search engine comes just as Hadoop is taking off. It underpins Facebook's backend infrastructure. It's offered up from Amazon's Web Services cloud. And last month's Hadoop Summit - driven by, yes, Yahoo! - attracted more than 700 developers from around the globe.

What's more, Hadoop had finally made Yahoo! relevant again. Yes, the project was inspired by work done at Google. But whereas Google has kept GFS and MapReduce largely hidden behind the walls of the Mountain View Chocolate Factory, Yahoo! has embraced this new-age distributed computing paradigm as an open source project, inspiring countless other developers and web outfits along the way. And at least until Google says otherwise, the open-source incarnation of MapReduce is outperforming the original.

After years as a frivolous headline that few actually bothered to click on, Yahoo! has finally found its mojo. What a shame it would be if Microsoft took it away. ®

Update

In a blog post on Thursday morning, published after this story went live, Yahoo!'s VP of Hadoop development Eric Baldeschwieler reaffirmed the company's commitment to the project. "Don't Panic!," he wrote. "We are as committed as ever to building a world class open source Cloud Computing infrastructure and Apache Hadoop remains our solution for batch computing. Hadoop is used to solve many, many internet scale problems beyond search at Yahoo. Today's deal only improves Yahoo's ability to invest in Hadoop.

"Yahoo is buzzing with more energy and bigger plans than ever before. The Hadoop team is running to keep up with our internal customers' demands for ever larger, faster and better clusters. We are all looking forward to working with you, the wider Hadoop community, to build the better Hadoop that we all want."
