Feeds

Hadoop takes Big Data beyond Java

Stuffed elephant mates with Python

Designing a Defense for Mobile Applications

From Nutch to Hadoop

He mimicked GFS and MapReduce to break up large chunks of data into small pieces and search them quickly across thousands of servers, building an implementation using open source. Again, it worked - to a point. "We could do demos on 20 machines and actually get some work done, but it wasn't ready to scale to thousands of machine and it wasn't horribly reliable," Cutting said. "This reliability thing was really hard work."

It was then Yahoo! that stepped in, offering the engineers and servers needed to iron out the problems. But Yahoo! had found another use for Hadoop: to quickly analyze huge piles of data distributed in silos of servers and web properties. With Yahoo!'s vice president of Hadoop software development Eric Baldeschwieler, Cutting split out the distributed computing part of Nutch and put it into Hadoop.

Cutting said researchers in Yahoo! wanted to get access to lots of data sets for things like ads served and web server loads. "If you were a researcher in Yahoo! asking how to make ads more relevant, you didn't have all the data in one place," he said. "They started pulling data together in one place to get some early users - and they loved it."

Suddenly, Yahoo! was quickly analyzing ever-changing data on its pages to making updates in hours that had previously taken weeks, and it was shuffling ads around to follow the latest click traffic.

"What it's all about is getting people a handle on running computation on terabytes of data and getting an answer back in a small amount of time reliably," Cutting said.

With Yahoo! focused on solving cluster security, Cutting is still pushing Hadoop forward and trying to crack the problem of breaking changes. Also he wants to make take Hadoop a step further attracting non-Java developers. He's tackling both through the Avro project.

Beyond Java

Avro is a format for data interchange intended to let applications call and process data after the application has been updated or changed. Also, the goal is for applications to be written for Hadoop in languages other than Java and to let Hadoop support native MapReduce and HDFS clients in languages like Python, C, and C++.

Meanwhile, Cutting has followed other open sourcers by joining a company that's trying to sell support and services to customers using his pet technology. He joined Cloudera in August 2009. Despite Hadoop's use at some of the largest sites online, Cutting believes Hadoop is good if you're running just 20 node clusters and that it's easier than running a database server to crunch huge piles of data. Cloudera customers include NetFlix and Samsung.

And if you don't want to run Hadoop yourself, you can deploy on cloud providers like Amazon and Rackspace that are running Hadoop. "It's a little harder than spread-sheet programming but there are tools that are making it simpler," Cutting re-assured us. "The whole goal is to make it fairly simple from the outside and keep the complexity inside."

Cutting may never have planned for where Hadoop is today, but he's not letting delays to version 1.0 obstruct its future either.®

Boost IT visibility and business value

More from The Register

next story
HIDDEN packet sniffer spy tech in MILLIONS of iPhones, iPads – expert
Don't panic though – Apple's backdoor is not wide open to all, guru tells us
Chrome browser has been DRAINING PC batteries for YEARS
Google is only now fixing ancient, energy-sapping bug
Do YOU work at Microsoft? Um. Are you SURE about that?
Nokia and marketing types first to get the bullet, says report
Microsoft takes on Chromebook with low-cost Windows laptops
Redmond's chief salesman: We're taking 'hard' decisions
Cheer up, Nokia fans. It can start making mobes again in 18 months
The real winner of the Nokia sale is *drumroll* ... Nokia
EU dons gloves, pokes Google's deals with Android mobe makers
El Reg cops a squint at investigatory letters
Big Blue Apple: IBM to sell iPads, iPhones to enterprises
iOS/2 gear loaded with apps for big biz ... uh oh BlackBerry
prev story

Whitepapers

Reducing security risks from open source software
Follow a few strategies and your organization can gain the full benefits of open source and the cloud without compromising the security of your applications.
Consolidation: The Foundation for IT Business Transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.
Application security programs and practises
Follow a few strategies and your organization can gain the full benefits of open source and the cloud without compromising the security of your applications.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Consolidation: the foundation for IT and business transformation
In this whitepaper learn how effective consolidation of IT and business resources can enable multiple, meaningful business benefits.