Microsoft pact holds gun to Yahoo!'s stuffed elephant

Just when Yahoo! was relevant again...

Updated: You didn't shed a tear over the death of Yahoo!'s independent search engine? That may change.

As the two companies finally ended the epic gestation period for their inevitable web search pact, Yahoo! and Microsoft announced that Bing - Redmond's fledgling "decision engine" search engine - will be "the exclusive algorithmic search and paid search platform for Yahoo! sites." And though the two Google chasers made it clear that Yahoo! will continue to use its own technologies to drive other areas of its business, you have to wonder what the pact means for the future of Hadoop, the open-source grid platform that had finally restored Yahoo!'s mojo.

Yahoo! is the largest contributor to the increasingly popular Apache project, contributing more than 70 per cent of all patches, and it employs the project's founder, Nutch-crawler-creator Doug Cutting. But in signing its pact with Microsoft, it would appear that the company has agreed to bury its largest Hadoop application: the Yahoo! Search Webmap.

The Webmap - which provides the Yahoo! search engine with a database of all known web pages, complete with all the necessary metadata - has also been described (by Yahoo!) as the world's largest Hadoop application. And though Hadoop powers other portions of Yahoo!, it's unclear whether the company will put as much time and money into moving the platform forward. Yahoo! has not responded to our requests for comment. Nor has Microsoft.

Redmond told Cnet that it's "open" to merging Bing with Yahoo!'s Searchmonkey platform, a misguided effort to expose the company's search results to third-party developers. But although Bing's "reference vertical" uses Hadoop - thanks to the acquisition of semantic search startup Powerset - it seems unlikely that Redmond would embrace Hadoop on Bing proper. Indeed, Powerset's general manager has told us that nearly a year after the startup's acquisition, Microsoft has made no plans to do so.

Even if it did, that's beside the point. The point here is that Yahoo! - Hadoop's godfather - is giving up the crown jewel in its Hadoop empire.

Inspired by Google-published research papers describing Mountain View’s proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a network of distributed machines. Doug Cutting originally developed the platform for use with Nutch, naming it after his son's stuffed elephant. But in 2006, he was hired by Yahoo!, and by the beginning of last year Hadoop had made its way onto Yahoo! production systems.

Webmap is the big example. But Yahoo! does use Hadoop for various other tasks. The platform now powers the real-time automated algorithms that select news stories for the Yahoo! home page. And in some cases it's used to optimize ads - i.e. to match content with relevant advertising.

Presumably, Hadoop will continue to drive these non-search tools. But does that mean Yahoo! will continue to put its considerable weight behind the project's continued development?

Christophe Bisciglia is confident that Yahoo!'s commitment will remain. "Hadoop isn't just about search," says Bisciglia, one of the minds behind Cloudera, a Silicon Valley startup offering a commercialized version of Hadoop. "Over the coming months, we will likely see Yahoo! shift resources towards the advertising and content businesses, but Hadoop plays a critical role there as well, so even if the clients for Hadoop change a bit, I don't see the overall investment from Y! decreasing.

"The expensive part of operating a search business is the hardware itself - not the development team working on Hadoop. If anything, this will better position their Hadoop team to attack challenges that have more impact on Yahoo!'s bottom line."

Granted, Bisciglia has a certain interest in Yahoo! maintaining its Hadoop efforts. But let's hope he's right. The destruction of Yahoo!'s search engine comes just as Hadoop is taking off. It underpins Facebook's backend infrastructure. It's offered up from Amazon's Web Services cloud. And last month's Hadoop Summit - driven by, yes, Yahoo! - attracted more than 700 developers from around the globe.

What's more, Hadoop had finally made Yahoo! relevant again. Yes, the project was inspired by work done at Google. But whereas Google has kept GFS and MapReduce largely hidden behind the walls of the Mountain View Chocolate Factory, Yahoo! has embraced this new-age distributed computing paradigm as an open source project, inspiring countless other developers and web outfits along the way. And at least until Google says otherwise, the open-source incarnation of MapReduce is outperforming the original.

After years as a frivolous headline that few actually bothered to click on, Yahoo! has finally found its mojo. What a shame it would be if Microsoft took it away. ®

Update

With a blog post Thursday morning, after this story was published, Hadoop development VP Eric Baldeschwieler has reaffirmed Yahoo!'s commitment to the project. "Don't Panic!," he wrote. "We are as committed as ever to building a world class open source Cloud Computing infrastructure and Apache Hadoop remains our solution for batch computing. Hadoop is used to solve many, many internet scale problems beyond search at Yahoo. Today's deal only improves Yahoo's ability to invest in Hadoop.

"Yahoo is buzzing with more energy and bigger plans than ever before. The Hadoop team is running to keep up with our internal customers demands for ever larger, faster and better clusters. We are all looking forward to working with you, the wider Hadoop community, to build the better Hadoop that we all want."
