The Register® — Biting the hand that feeds IT

Cloudera gets proactive with Hadoop management

Lets loose freebie control freak

Updated Cloudera might have been the first company to try becoming the Red Hat for stuffed elephants, but with MapR, Hortonworks, IBM, Oracle, DataStax, and EMC all trying to commercialize Hadoop, Cloudera has to keep on its toes and perhaps even balance on a ball.

That's because the underlying Hadoop data muncher is an open source project, so any company that wants to make money on the big data wave has to add value above and beyond the core Hadoop stack. Cloudera thinks it has an edge in managing Hadoop clusters, and believes further that it will extend its lead with a new control freak for its Hadoop distro, called, appropriately enough, Cloudera Manager.

Like other open source companies founded in recent years, Cloudera embraces an "open core" distribution model. That means it wraps up the key open source elements of a particular project – in this case the Hadoop MapReduce application, its underlying Hadoop Distributed File System (HDFS), and a bunch of other things – distributes the bundle for free, and offers commercial-grade support for the stack. But those engaging in open core distribution also peddle closed source add-ons for the open source tools, usually under a perpetual or subscription license with an annual support contract. These add-ons typically offer finer-grained management, more scalability, connectors to third party products, and such.

With Cloudera, the open core is the Cloudera Distribution of Apache Hadoop (abbreviated CDH with a version number) and the extended product is called Cloudera Enterprise (also with a version number). The current open source distro from Cloudera is CDH3, which debuted in April, was updated in June, and is due for another update next year, Charles Zedlewski, vice president of products at Cloudera, tells El Reg. Tracking alongside this open source distro is Cloudera Enterprise, which is being upgraded to the 3.7 release level with the addition of a slew of proactive management tools.

In addition, with today's announcement, Cloudera is breaking its Hadoop control freak free of the Cloudera Enterprise stack. It is offering a freebie version of the tool alongside the full Cloudera Manager, which includes functionality that used to be part of the Cloudera Management Suite console bundled only with Cloudera Enterprise.

The upshot is that Cloudera now has both camps covered: companies doing proofs of concept get the combination of its CDH3 distro and Cloudera Manager Free Edition, while production customers who want the full-on Cloudera Manager can link it into Cloudera Enterprise with the 3.7 release.

Cloudera Manager can gather and scan Hadoop logs from the servers in the cluster to look for weird stuff and can even do proactive checking for HDFS and its increasingly popular column-oriented database overlay, HBase. The Hadoop control freak can also send alerts to cluster managers when nodes or services are running slowly or starting to fail; this alerting system has hooks into popular IT management frameworks for consolidating alerts to sysadmins.
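To make the log-scanning idea concrete, here is a minimal Python sketch of the general technique. It is not Cloudera Manager's code, which is closed source. It assumes log lines in a log4j-style layout (timestamp, level, logger name, message), and the function names, the `max_errors` threshold, and the sample lines are all invented for illustration.

```python
import re
from collections import Counter

# Assumed line layout: "<date> <time>,<millis> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<msg>.*)$"
)


def count_problems(lines):
    """Count WARN, ERROR and FATAL lines; skip anything that does not parse."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("level") in {"WARN", "ERROR", "FATAL"}:
            counts[match.group("level")] += 1
    return counts


def nodes_to_alert(logs_by_node, max_errors=1):
    """Return sorted node names whose ERROR plus FATAL count exceeds max_errors."""
    flagged = []
    for node, lines in logs_by_node.items():
        counts = count_problems(lines)
        if counts["ERROR"] + counts["FATAL"] > max_errors:
            flagged.append(node)
    return sorted(flagged)


# Hypothetical sample data, not real cluster output.
sample = {
    "node01": [
        "2011-12-06 10:15:32,123 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: heartbeat ok",
    ],
    "node02": [
        "2011-12-06 10:15:33,001 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
        "2011-12-06 10:15:34,250 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: disk failure",
        "2011-12-06 10:15:35,000 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: slow block report",
    ],
}

print(nodes_to_alert(sample))
```

A real tool would stream logs from every server and feed the flagged nodes into an alerting framework; the sketch only shows the parse, count, and threshold steps.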

Cloudera Manager also has a feature called global time control, which correlates logs, system changes, configuration, running jobs, and other aspects of the Hadoop cluster to help admins figure out what went wrong when something inevitably does (as is the case with all complex systems). All of this information is stored in a MySQL database with near-realtime access.
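The correlation idea can be sketched in a few lines of Python: merge several time-ordered event streams and keep whatever falls inside a window around the incident. This illustrates the general technique only, not how Cloudera's feature is implemented. The `events_near` function, the five-minute default window, and the sample events are assumptions made up for the example.

```python
import heapq
from datetime import datetime, timedelta


def events_near(incident, streams, window=timedelta(minutes=5)):
    """Merge time-sorted event streams and keep events within `window` of the incident.

    Each event is a (timestamp, source, description) tuple, and each stream
    must already be sorted by timestamp.
    """
    merged = heapq.merge(*streams, key=lambda event: event[0])
    return [event for event in merged if abs(event[0] - incident) <= window]


# Hypothetical sample streams, one per kind of cluster activity.
logs = [
    (datetime(2011, 12, 6, 10, 14), "log", "DataNode slow block report"),
    (datetime(2011, 12, 6, 10, 40), "log", "DataNode heartbeat ok"),
]
config = [
    (datetime(2011, 12, 6, 10, 12), "config", "replication setting changed"),
]
jobs = [
    (datetime(2011, 12, 6, 9, 0), "job", "nightly job started"),
    (datetime(2011, 12, 6, 10, 16), "job", "nightly job failed"),
]

incident = datetime(2011, 12, 6, 10, 15)
for timestamp, source, description in events_near(incident, [logs, config, jobs]):
    print(timestamp.isoformat(), source, description)
```

With the sample data, the config change, the slow block report, and the job failure all land in the window, in time order, which is the kind of side-by-side view an admin needs when tracing a failure.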

For more sophisticated diagnoses, Cloudera Manager now has a snapshotting feature that can do a core dump on the system state of nodes in the cluster over a span of minutes to an hour. The snapshot captures versions of systems and software stacks, settings, logs, and any changes occurring on the system, then packages all this data into a file and sends it off to a sysadmin or to Cloudera for debugging and tuning. The time scale on the snapshot is adjustable, but the intent is to keep the file size down in the megabytes so it can be tagged to a specific event in the cluster that needs some work.

Cloudera Manager has all the bells and whistles and is intended for production Hadoop clusters, while Cloudera Manager Free Edition is intended for customers who do not yet need alerting, roll-backs, log search, event management, or proactive health management on their clusters. The free edition, available as a download here, is not open source, and only scales up to 50 nodes in a Hadoop cluster.

By the way, the full-on Cloudera Manager 3.7 will work on either the CDH3 or Cloudera Enterprise versions of the stack available from Cloudera, since Cloudera Enterprise is based on the exact same code-set as CDH3. Zedlewski says that the new management tool has been tested on clusters with more than 1,000 nodes and running 10,000 to 15,000 processes.

Cloudera did not originally provide pricing for its Cloudera Enterprise stack or support contracts, but said that the stack is priced on a per-node annual subscription with Cloudera Manager having a per-user annual subscription. But the day after this story ran, the company reconsidered this position and said that it charges $4,000 per node per year for a subscription to Cloudera Enterprise. This is, by the way, precisely the same fee that MapR Technologies is charging for its M5 Hadoop stack and what reseller EMC/Greenplum is charging for its Greenplum HD rebranding of M5.
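For a sense of scale, the per-node figure can be multiplied out for a few cluster sizes. This is simple list-price arithmetic, assuming no volume discounts. It leaves out the per-user Cloudera Manager subscription, for which no figure was given, and the cluster sizes are arbitrary examples.

```python
PRICE_PER_NODE_PER_YEAR = 4_000  # US dollars, the Cloudera Enterprise figure quoted above


def annual_subscription(nodes):
    """Return the yearly list price in dollars for a cluster of `nodes` machines."""
    if nodes < 0:
        raise ValueError("node count cannot be negative")
    return nodes * PRICE_PER_NODE_PER_YEAR


# Arbitrary example cluster sizes.
for nodes in (50, 100, 1000):
    print(f"{nodes:>5} nodes: ${annual_subscription(nodes):,} per year")
```

At that rate a 100-node cluster runs to $400,000 a year and a 1,000-node cluster to $4m.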

Cloudera also does not divulge how many customers it has, but Zedlewski tells El Reg that it has more than 100 customers who in turn have many hundreds of clusters with tens of thousands of nodes running its commercial-grade Cloudera Enterprise. The company is not willing to guess publicly how many CDH3-based clusters there might be out there in the world. ®
