Hadoop takes Big Data beyond Java

Stuffed elephant mates with Python

From Nutch to Hadoop

He mimicked GFS and MapReduce to break up large chunks of data into small pieces and search them quickly across thousands of servers, building an implementation using open source. Again, it worked - to a point. "We could do demos on 20 machines and actually get some work done, but it wasn't ready to scale to thousands of machines and it wasn't horribly reliable," Cutting said. "This reliability thing was really hard work."

It was then that Yahoo! stepped in, offering the engineers and servers needed to iron out the problems. But Yahoo! had found another use for Hadoop: quickly analyzing huge piles of data scattered across silos of servers and web properties. Working with Eric Baldeschwieler, Yahoo!'s vice president of Hadoop software development, Cutting split out the distributed computing part of Nutch and put it into Hadoop.

Cutting said researchers in Yahoo! wanted to get access to lots of data sets for things like ads served and web server loads. "If you were a researcher in Yahoo! asking how to make ads more relevant, you didn't have all the data in one place," he said. "They started pulling data together in one place to get some early users - and they loved it."

Suddenly, Yahoo! was quickly analyzing ever-changing data on its pages to make updates in hours that had previously taken weeks, and it was shuffling ads around to follow the latest click traffic.

"What it's all about is getting people a handle on running computation on terabytes of data and getting an answer back in a small amount of time reliably," Cutting said.

With Yahoo! focused on solving cluster security, Cutting is still pushing Hadoop forward and trying to crack the problem of breaking changes. He also wants to take Hadoop a step further by attracting non-Java developers. He's tackling both through the Avro project.

Beyond Java

Avro is a data interchange format intended to let applications keep calling and processing data after an application has been updated or changed. The goal is also for applications to be written for Hadoop in languages other than Java, and for Hadoop to support native MapReduce and HDFS clients in languages like Python, C, and C++.
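To give a rough feel for what surviving an update means in practice, here is a minimal sketch in plain Python of the kind of schema resolution Avro performs: data written under an old schema is read under a newer one, with fields the reader no longer knows about dropped and new fields filled in from defaults. The field names, the sample record, and the resolve() helper are all hypothetical and exist only for illustration; this is not the Avro library's API.

```python
# Illustrative only: a plain-Python imitation of Avro-style schema resolution.
# WRITER_FIELDS, READER_FIELDS, and resolve() are hypothetical names,
# not part of the Avro library.

# Schema the old version of the application used when it wrote the data.
WRITER_FIELDS = [
    {"name": "url"},
    {"name": "clicks"},
    {"name": "legacy_id"},  # dropped in the newer application
]

# Schema the updated application expects when it reads the data.
READER_FIELDS = [
    {"name": "url"},
    {"name": "clicks"},
    {"name": "region", "default": "unknown"},  # added later, so it needs a default
]


def resolve(record, writer_fields, reader_fields):
    """Return the record as the reader's schema sees it."""
    written = {field["name"] for field in writer_fields}
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            # Field exists in both schemas: copy the stored value.
            result[name] = record[name]
        elif "default" in field:
            # Field is new to the reader: fall back to its declared default.
            result[name] = field["default"]
        else:
            # New field without a default: the old data cannot be read.
            raise ValueError(f"no value or default for field {name!r}")
    # Fields only the writer knew about (legacy_id) are simply ignored.
    return result


old_record = {"url": "/home", "clicks": 42, "legacy_id": 7}
new_view = resolve(old_record, WRITER_FIELDS, READER_FIELDS)
print(new_view)
```

Because the rules depend only on the two schemas and not on the language that wrote the data, the same approach lends itself to clients written outside Java.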

Meanwhile, Cutting has followed other open sourcers by joining a company that's trying to sell support and services to customers using his pet technology. He joined Cloudera in August 2009. Despite Hadoop's use at some of the largest sites online, Cutting believes Hadoop is a good fit even if you're running clusters of just 20 nodes, and that it's easier than running a database server to crunch huge piles of data. Cloudera customers include Netflix and Samsung.

And if you don't want to run Hadoop yourself, you can deploy on cloud providers like Amazon and Rackspace that are running Hadoop. "It's a little harder than spreadsheet programming but there are tools that are making it simpler," Cutting reassured us. "The whole goal is to make it fairly simple from the outside and keep the complexity inside."

Cutting may never have planned for where Hadoop is today, but he's not letting delays to version 1.0 obstruct its future either.®
