
Hadoop takes Big Data beyond Java

Stuffed elephant mates with Python

Mon 17 May 2010 // 21:12 UTC

From Nutch to Hadoop

He mimicked Google's GFS and MapReduce to break large chunks of data into small pieces and search them quickly across thousands of servers, building an open source implementation. Again, it worked, up to a point. "We could do demos on 20 machines and actually get some work done, but it wasn't ready to scale to thousands of machines and it wasn't horribly reliable," Cutting said. "This reliability thing was really hard work."
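The split-and-recombine idea behind MapReduce is usually illustrated with word counting. The following is a minimal single-process sketch in Python, assuming the input "chunks" are plain strings. The function names are hypothetical, and this is not Hadoop's own API. On a real cluster each chunk would be mapped on a different machine, and the framework would perform the shuffle over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_chunk(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group the intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def word_count(chunks: List[str]) -> Dict[str, int]:
    # Hypothetical driver: every phase runs in this one process, whereas
    # a cluster would spread the map and reduce calls across machines.
    intermediate = [pair for chunk in chunks for pair in map_chunk(chunk)]
    grouped = shuffle(intermediate)
    return dict(reduce_group(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the elephant", "the stuffed elephant"]))
    # prints {'the': 2, 'elephant': 2, 'stuffed': 1}
```

Because each map call sees only its own chunk and each reduce call sees only one key, the work can be divided among as many machines as there are chunks and keys. That property is what let the approach scale past a single server.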

It was then that Yahoo! stepped in, offering the engineers and servers needed to iron out the problems. But Yahoo! had also found another use for Hadoop: quickly analyzing huge piles of data spread across silos of servers and web properties. Working with Eric Baldeschwieler, Yahoo!'s vice president of Hadoop software development, Cutting split the distributed computing part out of Nutch and put it into Hadoop.

Cutting said researchers at Yahoo! wanted access to lots of data sets, covering things such as ads served and web server loads. "If you were a researcher in Yahoo! asking how to make ads more relevant, you didn't have all the data in one place," he said. "They started pulling data together in one place to get some early users - and they loved it."

Suddenly, Yahoo! was analyzing ever-changing data on its pages and making updates in hours that had previously taken weeks, and it was shuffling ads around to follow the latest click traffic.

"What it's all about is getting people a handle on running computation on terabytes of data and getting an answer back in a small amount of time reliably," Cutting said.

With Yahoo! focused on solving cluster security, Cutting is still pushing Hadoop forward and trying to crack the problem of breaking changes, where updates stop existing applications and data from working with new versions. He also wants to take Hadoop a step further by attracting non-Java developers. He's tackling both through the Avro project.

Beyond Java

Avro is a data interchange format intended to let applications exchange and process data even after one of them has been updated or changed. Another goal is to let developers write Hadoop applications in languages other than Java, and to let Hadoop support native MapReduce and HDFS clients in languages such as Python, C, and C++.
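Avro copes with changing applications through schema resolution: data is written with one schema and read with another, and record fields are matched by name. The sketch below is a hypothetical, heavily simplified Python illustration of three of those rules, assuming flat records stored as dicts. It does not use the real Avro library or its JSON schema syntax. The schemas and field names are invented for the example.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schemas: each is a list of field descriptions,
# optionally carrying a default value.
WRITER_SCHEMA: List[Dict[str, Any]] = [
    {"name": "user_id"},
    {"name": "ad_id"},
    {"name": "legacy_flag"},           # the newer reader no longer wants this
]

READER_SCHEMA: List[Dict[str, Any]] = [
    {"name": "user_id"},
    {"name": "ad_id"},
    {"name": "clicks", "default": 0},  # added after the data was written
]


def resolve(record: Dict[str, Any],
            writer: List[Dict[str, Any]],
            reader: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Project a record written with `writer` onto the `reader` schema."""
    written = {field["name"] for field in writer}
    result: Dict[str, Any] = {}
    for field in reader:
        name = field["name"]
        if name in written:
            result[name] = record[name]       # known to both sides: copy it
        elif "default" in field:
            result[name] = field["default"]   # new field: use reader's default
        else:
            raise ValueError(f"no value or default for field {name!r}")
    # Fields only the writer knew about (here, legacy_flag) are dropped.
    return result


if __name__ == "__main__":
    old_record = {"user_id": 7, "ad_id": 42, "legacy_flag": True}
    print(resolve(old_record, WRITER_SCHEMA, READER_SCHEMA))
    # prints {'user_id': 7, 'ad_id': 42, 'clicks': 0}
```

The rules mirror the idea described above. A field the reader expects but the writer never wrote falls back to the reader's default. A field only the writer knew about is ignored. A new field with no default is an error. Together these let an updated application keep reading data produced before the update.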

Meanwhile, Cutting has followed other open sourcers by joining a company that's trying to sell support and services to customers using his pet technology: he joined Cloudera in August 2009. Although Hadoop runs at some of the largest sites online, Cutting believes it is worthwhile even on clusters of just 20 nodes, and that it is easier than running a database server to crunch huge piles of data. Cloudera customers include Netflix and Samsung.

And if you don't want to run Hadoop yourself, you can deploy it on cloud providers such as Amazon and Rackspace that already run Hadoop. "It's a little harder than spreadsheet programming but there are tools that are making it simpler," Cutting reassured us. "The whole goal is to make it fairly simple from the outside and keep the complexity inside."

Cutting may never have planned for where Hadoop is today, but he's not letting delays to version 1.0 obstruct its future either.®
