Yahoo! cuddles Google's bastard grid-child

Gets 'butt kicking' from Microsoft

Stuffed Elephant Summit: Sometime after the New Year, Yahoo! flipped the switch on what it calls the world’s largest Hadoop application, using the much-hyped open-source grid computing platform to tackle a task no smaller than the web itself.

Known as Yahoo! Search Webmap, this Hadoopified mega-app provides the world’s second most popular search engine with a database of all known web pages – complete with all the metadata needed to understand those pages. Yes, Yahoo! has crunched such data for years now, but thanks to Hadoop – an Apache project that mimics the GFS and MapReduce grid technologies developed at Google – Webmap can deliver the goods significantly faster than the company’s old-school setup.

"When building our web index, one of the things we do is build a graph of all the links on the web. We start with all the web pages we know of. We extract links and other metadata. And then we aggregate up a big system-wide view of the Web," Yahoo! Grid Computing Pooh-Bah Eric Baldeschwieler told The Reg at the Yahoo!-sponsored Hadoop developer’s summit in Santa Clara, California. "With Webmap, we can do this 33 per cent faster on the same hardware."

According to Baldeschwieler, this far exceeded expectations. "The previous system – which we built in 2000 – was all C++. And with the new system we moved to Java," he explained. "The belief was that moving to Java would slow everything and that we would pay a penalty in moving to the new framework.

"When it’s running perfectly, the old system does outperform the new one. But of course hardware fails and there are all sorts of scenarios under which the old system doesn’t perform perfectly. Hadoop gives us much more flexibility. It’s built around the idea of running commodity hardware that runs all the time."

Hadoop is the bastard brainchild of Google and a man named Doug Cutting. Back in 2004, while developing Nutch, an open source search engine, Cutting realized that his engine wouldn't purr unless it was juiced with some sort of distributed computing platform. And for reasons unknown, Google had just published a pair of research papers that detailed GFS, its distributed file system, and MapReduce, a means of pooling processing power.

So Cutting and his open-source pals went to work on a project that would duplicate Google’s technologies – and maybe even (cough) improve them. He dubbed the project Hadoop after his son’s yellow stuffed elephant.
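For readers who haven't met MapReduce, the link-graph job Baldeschwieler describes (start with the known pages, extract the links, aggregate a system-wide view) maps neatly onto the model. What follows is a minimal single-machine sketch in plain Python, not Yahoo!'s code and not the Hadoop API: the function names, the toy pages and the crude href regex are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Crude link extractor, an assumption for this sketch; real crawlers parse HTML properly.
HREF = re.compile(r'href="([^"]+)"')


def map_page(url, html):
    """Map step: emit one (target, source) pair per outgoing link on a page."""
    for target in HREF.findall(html):
        yield target, url


def reduce_links(pairs):
    """Reduce step: group sources by target to get each page's inbound links."""
    inbound = defaultdict(set)
    for target, source in pairs:
        inbound[target].add(source)
    return {target: sorted(sources) for target, sources in inbound.items()}


def build_link_graph(pages):
    """Run the map step over every known page, then aggregate with the reduce step."""
    pairs = (pair for url, html in pages.items() for pair in map_page(url, html))
    return reduce_links(pairs)


# Hypothetical three-page web.
pages = {
    "a.example": '<a href="b.example">b</a> <a href="c.example">c</a>',
    "b.example": '<a href="c.example">c</a>',
    "c.example": "",
}

graph = build_link_graph(pages)
print(graph)
```

On a real grid the map calls run in parallel across many machines and the framework handles the grouping between the two steps, which is why a dead node costs a retry of one task rather than the whole job.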

By early 2006, Yahoo! was flirting with the project, and the company soon gave Cutting a job. At the time, Hadoop and Nutch ran on just 20 nodes, indexing about 100m web pages. Two years later, Hadoop and Yahoo! Search Webmap run on 10,000 processor cores, indexing, um, many more web pages. "I can’t say exactly how many," Cutting told the Stuffed Elephant Summit. "Let’s just say it’s far in excess of 100 million."

But Yahoo! isn’t the only one that’s fallen for Hadoop. IBM Research turned up at the summit to show off JAQL, a language suited to building JSON (JavaScript Object Notation) apps atop Hadoop. Amazon, a summit co-sponsor, discussed the benefits of running Hadoop on its EC2 web services. And more than 350 developers turned up to listen – though Yahoo! had originally expected fewer than 100.

Baldeschwieler also pointed out that 28 separate developers have trumpeted their Hadoop clusters on the official Hadoop wiki. "And that’s just a small fraction of the people using it," he said. "I would guess 100s of organizations have adopted the platform. There’s definitely a lot of interest – and a lot of discussion."

Microsoft is not one of those organizations. Redmond’s research arm is building its own grid-computing platform, Dryad. And Dryad is not open source. But that didn’t stop the company from attending a conference dedicated to all things Hadoop.

Microsoft’s Michael Isard used his half hour to trumpet DryadLINQ, a programming language that mirrors IBM’s JAQL and a Yahoo!-led open source initiative called Pig. Except that it doesn’t run on Hadoop. It runs on Dryad.

At least one developer was mighty impressed with the presentation. But he still wondered whether DryadLINQ was already irrelevant. "I think you’re kicking everyone’s butt. You’re already working on a higher level of abstraction than anyone else," he told Isard. "But since you’re proprietary technology, we’ll have to wait and see how effective you’ll be."

Of course, Google’s grid computing technologies are also proprietary. But that's a different matter. You can debate the merits of Hadoop and Dryad all you want – but they're both playing catch-up. ®
