'Grid computing Red Hat' out-Amazons Amazon

Cloudera in the, yes, cloud

Hadoop Summit In its mission to bring to world+dog the joys of Hadoop - that open-source grid-computing platform based on Google arrogance - Cloudera has out-Amazoned Amazon.

Today, the star-studded Hadoop startup told the world that its commercial stuffed-elephant distro can now be run on Amazon's Elastic Compute Cloud (EC2) in tandem with so-called Elastic Block Store (EBS) storage volumes. EBS volumes are mounted directly onto EC2 server instances.

This means you can run ongoing Hadoop jobs - starting them and stopping them whenever you like - without moving data back and forth between the local EC2 disks and Amazon's Simple Storage Service (S3). "Instead of using local disks, you can use EBS volumes," Cloudera man Christophe Bisciglia said today at the annual Hadoop Summit in Santa Clara, California.

"What's key about this is that your data is persistent. Currently, if you bring up a Hadoop cluster on Amazon and then bring it down, your [Hadoop File System] instance goes away. S3 can mitigate this, but then you have to round-trip between S3 and Hadoop every time you run a job.

"This is a way to turn your clusters on and off and keep them persistent and bring the full power of Hadoop."

Cloudera also says that its EBS integration improves Hadoop performance on the Amazon cloud by allowing more disks per server. EC2 provides a limited number of local disks for each instance.

Named for a yellow stuffed elephant, Hadoop mimics Google's MapReduce framework, mapping epic data-crunching tasks across a sea of machines - i.e. splitting them into tiny sub-tasks - before reducing the results into one master calculation. You can run it in your own data centers - as Yahoo!, Facebook, and many others do - or you can run it on Amazon's cloud. Or, for that matter, another infrastructure cloud.
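For readers who want the map-then-reduce idea in concrete terms, here is a minimal single-process sketch in Python. It is a toy word count, not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are invented for illustration, and a real cluster would spread the chunks across many machines rather than loop over them in one process.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    """Run the toy pipeline: map each chunk, group by word, reduce each group."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a slice of input handed to a separate machine.
    chunks = ["hadoop runs on ec2", "hadoop runs on ebs"]
    print(word_count(chunks))
```

Each chunk is mapped independently, which is why the work can be split into small sub-tasks; only the reduce step needs to see every value for a given key.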

Amazon's cloud offers its own Hadoop implementation as a service. It's called Amazon Elastic MapReduce. But it doesn't dovetail with EBS.

Bisciglia called Cloudera's EBS integration "a beta."

Cloudera also announced that its commercial distro - think of Cloudera as Hadoop's Red Hat - now includes the latest versions of Hive and Pig, two languages for coding atop Hadoop: Hive 0.3 and Pig 0.2. The distro is available at cloudera.com/hadoop.

And the company has released beta packages of Hadoop version 0.20. "Twenty is going to be a really important release - it's going to include both sets of APIs, both the new and the old ones," Bisciglia said. ®
