Hadoop goes 'open core' with Cloudera Enterprise

All-star startup fattens stuffed elephant

Hadoop Summit Cloudera – the commercial Hadoop outfit – has unveiled its first for-pay product: Cloudera Enterprise, an augmented version of the open source distributed data crunching platform designed specifically for production environments.

Cloudera Enterprise – announced today at the Hadoop Summit in Santa Clara, California – beefs up Hadoop with several proprietary management, monitoring, and administration tools, and it's sold on a subscription basis, priced according to the size of your Hadoop cluster. The all-star Silicon Valley startup has adopted an "open core" model, enhancing an open source Hadoop core with additional software that carries a price tag.

"We've been in the market with customers for coming up on two years now, supporting Hadoop in real enterprise production environments," Cloudera CEO Mike Olson tells The Register. "We've learned a lot about how customers use [Hadoop], what it does well, and what makes it difficult to deploy and operate. As a result of all of that activity..."

The proprietary tools include integration with LDAP directory servers for user authentication and access control; dashboards for controlling and managing the flow of data into Hadoop clusters; and a user interface for cluster management and administration. Buyers also receive maintenance updates and support.

The heart of this new enterprise product is the company's open source Hadoop distro – Cloudera’s Distribution for Hadoop (CDH) – which just graduated to version 3. CDH3, also announced today, consists of Apache Hadoop and eight additional open source projects.

These include Hive (a SQL-like query language developed at Facebook), Pig (a lower-level language developed by Yahoo!), HBase (a distributed database developed by the now Microsoft-owned Powerset), Sqoop (a Cloudera-built tool for importing data from relational databases such as MySQL), Oozie (the Hadoop workflow system), and ZooKeeper (a means of juggling distributed services from a central location), as well as two new projects just opened up by Cloudera under an Apache license: Flume and HUE.

Flume is Cloudera's data loading infrastructure, while HUE – short for Hadoop User Interface – is the web-based Hadoop GUI formerly known as the Cloudera Desktop. HUE provides a graphical user interface for creating and submitting jobs on a Hadoop cluster, monitoring the cluster's health, and browsing stored data. Typically, clusters are managed via the command line.

Based on Google’s proprietary software infrastructure, Hadoop is a means of crunching epic amounts of data across a network of distributed machines. Named for the yellow stuffed elephant belonging to the son of project founder Doug Cutting, the platform underpins online services operated by everyone from Yahoo! and Facebook to Microsoft.

Hadoop mimics GFS, Google's distributed file system, and MapReduce, Mountain View's distributed number-crunching platform. In 2003 and 2004, Google published a pair of research papers on these infrastructure technologies, and Doug Cutting used the papers to build a platform that would back Nutch, his open source web crawler. Hadoop was open sourced at Apache and bootstrapped by Yahoo!, which hired Cutting in 2006; he later left for Cloudera.

The platform consists of the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.
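For readers new to the model, here is a minimal single-process Python sketch of the classic MapReduce word count: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. It is an illustration under stated assumptions, not Hadoop code: the function names are hypothetical, everything runs in memory on one machine, and nothing here touches HDFS or Hadoop's own Java API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper (hypothetical name): emit (word, 1) for every word in one input record
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group every value emitted for the same key
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer (hypothetical name): combine all values for one key into a single result
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a real cluster the input would sit in HDFS blocks and the map calls
    # would run in parallel on many machines; here they run sequentially.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Reduce keys in sorted order, mirroring the sorted keys a reducer receives
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the yellow elephant", "the stuffed elephant"]
    print(word_count(docs))  # {'elephant': 2, 'stuffed': 1, 'the': 2, 'yellow': 1}
```

The split into three pure functions is the point: because each map call sees only its own record and each reduce call sees only one key's values, a framework like Hadoop can farm both phases out across machines.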

Previously, Cloudera's only proprietary product was the free Cloudera Desktop, which has now been open sourced as HUE. The company offered its own Hadoop distro and various other open source tools in tandem with support, training, and certification services. But Olson and company say they've long planned to add a subscription revenue stream.

"For the first time we're able to go to market with the stance that if you're using just HDFS and MapReduce, you're not getting the volume you should be out of Hadoop," says Cloudera co-founder Jeff Hammerbacher, who worked on Hadoop at Facebook. "At Facebook, HDFS and MapReduce provided an excellent starting point for building infrastructure to manage and extract value from datasets, but we had a large variety of tools surrounding those two." ®
