Microsoft pact holds gun to Yahoo!'s stuffed elephant

Just when Yahoo! was relevant again...

Updated You didn't shed a tear over the death of Yahoo!'s independent search engine? That may change.

As the two companies finally ended the epic gestation period for their inevitable web search pact, Yahoo! and Microsoft announced that Bing - Redmond's fledgling "decision engine" search engine - will be "the exclusive algorithmic search and paid search platform for Yahoo! sites." And though the two Google chasers made it clear that Yahoo! will continue to use its own technologies to drive other areas of its business, you have to wonder what the pact means for the future of Hadoop, the open-source grid platform that had finally restored Yahoo!'s mojo.

Yahoo! is the largest contributor to the increasingly popular Apache project, contributing more than 70 per cent of all patches, and it employs the project's founder, Nutch-crawler-creator Doug Cutting. But in signing its pact with Microsoft, it would appear that the company has agreed to bury its largest Hadoop application: the Yahoo! Search Webmap.

The Webmap - which provides the Yahoo! search engine with a database of all known web pages, complete with all the necessary metadata - has also been described (by Yahoo!) as the world's largest Hadoop application. And though Hadoop powers other portions of Yahoo!, it's unclear whether the company will put as much time and money into moving the platform forward. Yahoo! has not responded to our requests for comment. Nor has Microsoft.

Redmond told Cnet that it's "open" to merging Bing with Yahoo!'s Searchmonkey platform, a misguided effort to expose the company's search results to third-party developers. But although Bing's "reference vertical" uses Hadoop - thanks to the acquisition of semantic search startup Powerset - it seems unlikely that Redmond would embrace Hadoop on Bing proper. Indeed, Powerset's general manager has told us that nearly a year after the startup's acquisition, Microsoft has made no plans to do so.

Even if it did, that's beside the point. The point here is that Yahoo! - Hadoop's godfather - is giving up the crown jewel in its Hadoop empire.

Inspired by Google-published research papers describing Mountain View’s proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a network of distributed machines. Doug Cutting originally developed the platform for use with Nutch, naming it after his son's stuffed elephant. But in 2006, he was hired by Yahoo!, and by the beginning of last year Hadoop had made its way onto Yahoo! production systems.

Webmap is the big example. But Yahoo! does use Hadoop for various other tasks. The platform now powers the real-time automated algorithms that select news stories for the Yahoo! home page. And in some cases it's used to optimize ads - i.e. to match content with relevant advertising.

Presumably, Hadoop will continue to drive these non-search tools. But does that mean Yahoo! will continue to put its considerable weight behind the project's continued development?

Christophe Bisciglia is confident that Yahoo!'s commitment will remain. "Hadoop isn't just about search," says Bisciglia, one of the minds behind Cloudera, a Silicon Valley startup offering a commercialized version of Hadoop. "Over the coming months, we will likely see Yahoo! shift resources towards the advertising and content businesses, but Hadoop plays a critical role there as well, so even if the clients for Hadoop change a bit, I don't see the overall investment from Y! decreasing.

"The expensive part of operating a search business is the hardware itself - not the development team working on Hadoop. If anything, this will better position their Hadoop team to attack challenges that have more impact on Yahoo!'s bottom line."

Granted, Bisciglia has a certain interest in Yahoo! maintaining its Hadoop efforts. But let's hope he's right. The destruction of Yahoo!'s search engine comes just as Hadoop is taking off. It underpins Facebook's backend infrastructure. It's offered up from Amazon's Web Services cloud. And last month's Hadoop Summit - driven by, yes, Yahoo! - attracted more than 700 developers from around the globe.

What's more, Hadoop had finally made Yahoo! relevant again. Yes, the project was inspired by work done at Google. But whereas Google has kept GFS and MapReduce largely hidden behind the walls of the Mountain View Chocolate Factory, Yahoo! has embraced this new-age distributed computing paradigm as an open source project, inspiring countless other developers and web outfits along the way. And at least until Google says otherwise, the open-source incarnation of MapReduce is outperforming the original.

After years as a frivolous headline that few actually bothered to click on, Yahoo! has finally found its mojo. What a shame it would be if Microsoft took it away. ®

Update

With a blog post Thursday morning, after this story was published, Hadoop development VP Eric Baldeschwieler has reaffirmed Yahoo!'s commitment to the project. "Don't Panic!," he wrote. "We are as committed as ever to building a world class open source Cloud Computing infrastructure and Apache Hadoop remains our solution for batch computing. Hadoop is used to solve many, many internet scale problems beyond search at Yahoo. Today's deal only improves Yahoo's ability to invest in Hadoop.

"Yahoo is buzzing with more energy and bigger plans than ever before. The Hadoop team is running to keep up with our internal customers demands for ever larger, faster and better clusters. We are all looking forward to working with you, the wider Hadoop community, to build the better Hadoop that we all want."
