Yahoo! exposes very own stuffed elephant code

Distributed data-crunching distro

Hadoop Summit Yahoo! has released its own Hadoop distro, an internet-scale distributed data-crunching platform based on the Apache open-source project that underpins several of the web’s highest-profile sites, including Yahoo!, Facebook, and - amusingly - Microsoft’s Bing.

Inspired by Google-published research papers describing Mountain View’s proprietary software infrastructure, Hadoop is the brainchild of open-source guru Doug Cutting, the Nutch crawler founder who’s now on the Yahoo! payroll.

Yahoo! has used Hadoop code on its production infrastructure for more than a year now, and after calls from the ever-growing Hadoop community, the company is opening up its internal implementation of the project.

"We’ve put a lot of investment on our testing and deployment," Yahooligan Eric Baldeschwieler said Wednesday at the Yahoo!-sponsored Hadoop Summit in Santa Clara, California. "We’re going to take that work that we put into it and put it out on the web."

The new release - known as the Yahoo! Distribution of Hadoop - is not a commercially supported distro. "We’re not getting into a new business," Baldeschwieler explained. Yahoo! is leaving that business to Cloudera, the Silicon Valley startup that unveiled a commercial Hadoop distro this spring.

According to Baldeschwieler, Yahoo! will release code identical to that tested and deployed on the company’s internal machines. "The source code release will be exactly like we use on Yahoo! clusters," he said. And he expects Yahoo!-tweaked code will be released three to six months after general release of the Apache project code it's based on.

Yahoo! will not restrict access to the code, which will be available from the Yahoo! developer network. The company will merely require an agreement before downloading. The first release will be Hadoop version 0.20, which is now under alpha test inside the company.

Yahoo! contributes about 72 per cent of all Apache Hadoop patches. And it now uses Hadoop code to crunch data for myriad Yahoo! services, including its search index and the automated system that chooses news stories for its homepage.

Cutting cooked up Hadoop in 2004, naming the project after his son’s yellow stuffed elephant. Along with Yahoo!, one of its early users was Powerset, the semantic search engine recently acquired by Microsoft. Powerset now drives at least a small portion of Redmond’s latest Google challenge, which it insists on calling Bing. ®
