The Register® — Biting the hand that feeds IT

Hadoop goes 'open core' with Cloudera Enterprise

All-star startup fattens stuffed elephant

Hadoop Summit: Cloudera – the commercial Hadoop outfit – has unveiled its first for-pay product: Cloudera Enterprise, an augmented version of the open source distributed data crunching platform designed specifically for production environments.

Cloudera Enterprise – announced today at the Hadoop Summit in Santa Clara, California – beefs up Hadoop with several proprietary management, monitoring, and administration tools, and it's sold on a subscription basis, priced according to the size of your Hadoop cluster. The all-star Silicon Valley startup has adopted an "open core" model, enhancing an open source Hadoop core with additional software that carries a price tag.

"We've been in the market with customers for coming up on two years now, supporting Hadoop in real enterprise production environments," Cloudera CEO Mike Olson tells The Register. "We've learned a lot about how customers use [Hadoop], what it does well, and what makes it difficult to deploy and operate. As a result of all of that activity."

The additional proprietary tools include integration with LDAP directory servers for user authentication and access control; dashboards for controlling and managing the flow of data into Hadoop clusters; and a user interface for cluster management and administration. Buyers also receive maintenance updates and support.

The heart of this new enterprise product is the company's open source Hadoop distro – Cloudera’s Distribution for Hadoop (CDH) – which just graduated to version 3. Also announced today, CDH consists of Apache Hadoop and eight additional open source projects.

These include Hive (a SQL-like query language developed at Facebook), Pig (a lower-level language developed by Yahoo!), HBase (a distributed database developed by the now Microsoft-owned Powerset), Sqoop (a MySQL connector built by Cloudera), Oozie (the Hadoop workflow system), and ZooKeeper (a means of juggling distributed services from a central location), as well as two new projects just opened up by Cloudera under an Apache license: Flume and HUE.

Flume is Cloudera's data loading infrastructure, while HUE – short for Hadoop User Interface – is the web-based Hadoop GUI formerly known as the Cloudera Desktop. HUE provides a graphical user interface for creating and submitting jobs on a Hadoop cluster, monitoring the cluster's health, and browsing stored data. Typically, clusters are managed via the command line.

Based on Google’s proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a network of distributed machines. Named for the yellow stuffed elephant belonging to the son of project founder Doug Cutting, the platform underpins online services operated by everyone from Yahoo! and Facebook to Microsoft.

Hadoop mimics GFS, Google's distributed file system, and MapReduce, Mountain View's distributed number-crunching platform. Google published research papers on these infrastructure technologies in 2003 and 2004, and Doug Cutting used the papers to build a platform that would back Nutch, his open source web crawler. Hadoop was open sourced at Apache, and it was bootstrapped by Yahoo!, which hired Cutting in 2006, before he left for Cloudera.

The platform consists of the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.
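
For readers new to the MapReduce model, the idea can be sketched in a few lines of plain Python. This is only an illustration of the programming model on a single machine, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real cluster would spread the same steps across many nodes reading from HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input records, standing in for files stored in a distributed file system.
    print(word_count(["big data", "big elephant"]))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can in principle be run in parallel on separate machines, which is what makes the model suit large clusters.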

Previously, Cloudera's only proprietary product was the free Cloudera Desktop, which has now been open sourced as HUE. The company offered its own Hadoop distro and various other open source tools in tandem with support, training, and certification services. But Olson and company say they've long been planning to add a subscription revenue stream.

"For the first time we're able to go to market with the stance that if you're using just HDFS and MapReduce, you're not getting the volume you should be out of Hadoop," says Cloudera co-founder Jeff Hammerbacher, who worked on Hadoop at Facebook. "At Facebook, HDFS and MapReduce provided an excellent starting point for building infrastructure to manage and extract value from datasets, but we had a large variety of tools surrounding those two." ®
