Dell floats Cloud Hadoop clusters

Stuffed elephant rides PowerEdge iron

Dell might not own a lot of systems or middleware software a la IBM, Oracle, and Hewlett-Packard, but it wants to sell configured stacks tuned for specific work just like its rivals in the systems racket.

The company already peddles pre-configured Ubuntu/Eucalyptus and OpenStack systems for building and managing private clouds, and now, it's expanding out into big data munching with a pre-built Hadoop cluster based on the software stack from Cloudera.

Hadoop is a set of programs that mimic the functioning of Google's MapReduce data muncher and its related Google File System. Cloudera, which launched in March 2009, is the first of a handful of organizations providing commercial Hadoop distributions.

Cloudera is a big contributor to the open source Apache Hadoop project, along with Yahoo!, where the project was originally bootstrapped. IBM, which could have partnered with Cloudera, chose instead to create its own Hadoop distribution, called InfoSphere BigInsights, in May 2010. MapR kicked out its own commercially supported Hadoop in May of this year, and Hortonworks, a spinout directly from Yahoo!, joined the fray in June 2011.

Cloudera offers its own open source distro, the Cloudera Distribution for Hadoop (CDH), and an enterprise version, Cloudera Enterprise, that has extra goodies in it for which it can charge money. These extra goodies are not open sourced.

For its Dell-Cloudera stack, Dell is using the Cloudera Enterprise CDH3 version, which can run atop Red Hat Enterprise Linux 5 and 6, its CentOS 5 clone, SUSE Linux Enterprise Server 11, and Ubuntu Server 10.04 LTS and 10.10. Dell is at the moment putting RHEL in the Dell-Cloudera reference architecture, but you are allowed to tweak this and still order a single stack preconfigured. CDH3 Update 1 was just released on July 22.

The Cloudera stack includes the core Apache Hadoop, which comprises MapReduce and the distributed file system, as well as the companion Apache Hive (SQL-like query), Pig (a high-level programming language for Hadoop), HBase (a column-oriented distributed data store modeled after Google's BigTable), and ZooKeeper (a configuration server for clusters). The stack also includes Cloudera's SCM Express, a management server for configuring Hadoop clusters.
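For readers who have not met MapReduce before, the programming model at the heart of that stack can be sketched in a few lines of plain Python. This is an illustration of the map, shuffle, and reduce steps only, run in a single process; it is not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real cluster does this across machines; here it is one dictionary.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big iron"]))
```

On a real cluster the map and reduce functions run in parallel on the nodes that hold the data, and the framework handles the shuffle between them.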

"There are hundreds of settings for a Hadoop configuration, and this makes specific setting recommendations based on the hardware and software that customers are deploying on," Ed Albanese, head of business development at Cloudera, tells El Reg. The settings for deploying on a small Hadoop cluster can be radically different from those needed on a mid-sized or large cluster, and SCM Express makes it possible for companies to get the right settings without having to be Hadoop experts like Yahoo!

Block diagram of the Dell-Cloudera Hadoop stack

Dell is also tossing in its Crowbar tool, which it created for OpenStack-based private clouds running on its PowerEdge-C servers, which were announced last week. Crowbar works in concert with SCM Express to handle BIOS configuration, RAID array setup, network setup, operating system deployment, and the provisioning of Hadoop software on the bare-metal iron from Dell.

John Igoe, executive director of cloud software solutions at Dell, says that for a typical company that is not all that familiar with Hadoop, it can take days to weeks to manually configure a Hadoop cluster with 20, 30, or 40 nodes. But with the combination of the Cloudera Enterprise software and Crowbar, companies can go from a bare-metal rack to running Hadoop data-crunching jobs in less than a day.

Speed to deployment is probably not as important as ease of use, however. "I think there is a substantial opportunity for Hadoop, but we are still in the product's infancy," says Igoe. "Current Hadoop customers have a very deep software bench. But other companies that want to use Hadoop don't have these skills, and they are looking for us to give them those skills."

Stacking it all up

Dell's reference architecture recommends that customers choose the company's PowerEdge C2100 energy-efficient rack servers, which are optimized to cram the most computing performance into the smallest space with the least amount of extraneous hardware. This is the same iron that Dell used for its Ubuntu Enterprise Cloud (UEC) pre-fabbed private clouds that came out in March, but is different from the PowerEdge C6100 cookie-sheet servers that are being used in the preconfigured OpenStack clouds launched last week.

Server layout in the Dell-Cloudera Hadoop stack

The Dell-Cloudera stack is built under the assumption that companies will start out small but grow fast. It consists of six PowerEdge C2100 server nodes and six (yes, six) PowerConnect 6248 48-port Gigabit Ethernet switches. There are two Hadoop master nodes (sometimes called name nodes) that manage the Hadoop Distributed File System (HDFS) and MapReduce task distribution across the cluster. These master nodes can also run ZooKeeper. The edge node, of which there is only one in the base configuration, runs Pig and Hive and is the interface between users and the cluster. The slave nodes, of which there are three, run MapReduce and HDFS, store data on local disk drives, and chew on it as instructed by the master nodes.
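The base configuration described above can be jotted down as a small data structure to check that the node counts add up. This is a rough sketch, not anything Dell or Cloudera ships: the `NodeRole` class, the `BASE_CONFIG` tuple, and the helper functions are hypothetical names, and the service labels simply paraphrase the roles given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRole:
    """One role in the cluster, with how many servers fill it."""
    name: str
    count: int
    services: tuple


# Node counts and roles as described in the article's base configuration.
BASE_CONFIG = (
    NodeRole("master", 2, ("HDFS management", "MapReduce task distribution", "ZooKeeper")),
    NodeRole("edge", 1, ("Pig", "Hive")),
    NodeRole("slave", 3, ("MapReduce", "HDFS storage")),
)


def total_nodes(config):
    """Sum the server count across every role."""
    return sum(role.count for role in config)


def nodes_running(config, service):
    """Count the servers whose role includes the given service label."""
    return sum(role.count for role in config if service in role.services)


if __name__ == "__main__":
    print("total nodes:", total_nodes(BASE_CONFIG))
    print("nodes running Pig:", nodes_running(BASE_CONFIG, "Pig"))
```

Two masters, one edge node, and three slaves account for all six PowerEdge C2100 servers in the starter rack.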
