EMC Greenplum Hadoop elephant straddles Cisco iron

Cah. Took them long enough

Well, that took long enough. Cisco Systems and the Greenplum big data unit of server partner EMC have finally gotten together and put the Greenplum wares on Cisco's Unified Computing System servers.

In a blog posting, Raghunath Nambiar, an architect at Cisco's Server Access and Virtualization Technology Group, reveals that the two partners in the Virtual Computing Environment Company have circled back and are now offering pre-configured Hadoop stacks that marry Cisco's C-Series rack servers with Greenplum's eponymous Greenplum MR Hadoop distribution.

Greenplum doesn't like to talk about the hardware its data warehousing and Hadoop clusters run upon, mainly because EMC, as an independent disk array maker and the owner of server virtualization juggernaut VMware, has to position itself as Switzerland in the server racket. Before it was acquired by EMC in July 2010 for an undisclosed sum, Greenplum had run its heavily customized implementation of the PostgreSQL database, which was parallelized and juiced to run data warehouse clusters, on Sun Fire x86 servers from Sun Microsystems. This was a good choice at the time, given the large amount of disk capacity that Sun had crammed onto its Opteron and Xeon servers, but a bad choice in the long term because database rival Oracle ate Sun. In the wake of the Sun acquisition, Greenplum has certified its code to run on Dell, Hewlett-Packard, and Huawei Technologies x86 servers and OEMs this iron from those companies, depending on what customers want.

EMC did not, interestingly enough, plunk the Greenplum Modular Data Computing Appliance data warehouse or its Hadoop appliance, which is actually based on a rebadged Hadoop stack from MapR Technologies, on the Vblock server-storage clusters it cooked up with Cisco to chase server virtualization and private cloud business in data centers, and now virtual desktops as well. While the B Series blade servers in the UCS family may not be suitable for Greenplum workloads, the C Series rack servers could certainly have been configured in a Vblock by EMC and Cisco to run this Greenplum code, but they were not.

Part of the problem was that Hadoop doesn't use external storage, so there would be no EMC iron in such a Vblock. It is very likely that EMC and Cisco were waiting for Cisco to get a little more traction in the server racket – Cisco's server business now has more than 10,000 customers and a $1bn annual revenue run rate that will probably nearly double in the next year – before committing the Greenplum wares to the UCS platform.

According to Nambiar, the fully integrated Cisco-EMC stack takes Cisco's UCS C Series rack servers and its UCS 6200 converged 10GE switches, which act as fabric interconnects, and configures the Greenplum MR Hadoop distro to run on the boxes. (This Hadoop distro is MapR's M5 Hadoop distribution with the names changed.) The setups start at a single rack and can be expanded to cover multiple racks. The UCS 6200 switch links into UCS 2200 fabric extenders, and according to the reference architecture (PDF), the UCS C210 M2 server is the workhorse that Cisco and EMC have chosen to run Hadoop. The C210 M2 server, announced in March 2010, is a two-socket box that uses Intel's six-core Xeon 5600 processors, and it will no doubt be replaced by a new machine using Intel's "Sandy Bridge-EP" Xeon E5 chips. The C210 M2 can support up to 192GB of DDR3 main memory and has room for 16 2.5-inch disk drives and one or two RAID disk controllers.
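For a sense of how the building blocks add up as racks are bolted on, here is a minimal back-of-the-envelope sketch in Python. The per-node numbers match the single-rack configuration detailed below; the multi-rack roll-up is purely illustrative and not a configuration blessed by Cisco or EMC.

# Per-rack building blocks from the Cisco-Greenplum reference architecture;
# the multi-rack roll-up below is an illustrative sketch, not vendor guidance.

NODES_PER_RACK = 16      # C210 M2 servers per rack
CORES_PER_NODE = 12      # two six-core Xeon X5670 sockets
RAM_GB_PER_NODE = 96     # as configured (the C210 M2 tops out at 192GB)
DISK_TB_PER_NODE = 16    # 16 x 1TB 2.5-inch drives

def rack_rollup(racks):
    """Aggregate nodes, cores, memory, and raw disk for a given rack count."""
    nodes = racks * NODES_PER_RACK
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "ram_tb": round(nodes * RAM_GB_PER_NODE / 1024, 1),
        "raw_disk_tb": nodes * DISK_TB_PER_NODE,
    }

for racks in (1, 2, 4):
    print(racks, "rack(s):", rack_rollup(racks))

A single rack works out to 16 nodes, 192 cores, and 256TB of raw disk, which squares with the figures quoted below.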

[Figure: The Cisco UCS-Greenplum Hadoop stack]

In a single-rack configuration, the Greenplum MR-UCS stack has two 48-port UCS 6248UP fabric interconnects and two 2232PP 10GE fabric extenders. These link down into 16 of the C210 M2 servers, each with 96GB of main memory, 16 1TB disk drives, an LSI MegaRAID 9261-8i disk controller, and a Cisco UCS P81E virtual interface card that presents two 10GE ports to the fabric extenders. Cisco is dropping in six-core Xeon X5670 processors, which run at 2.93GHz. Each rack has 192 cores, 256TB of raw storage capacity, and up to 350TB of usable Hadoop capacity with three-way data replication across the nodes and data compression turned on. The nodes are configured with Red Hat Enterprise Linux Standard Edition.
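Those capacity numbers are easy to sanity-check with a bit of arithmetic. In the hedged sketch below, the three-way replication factor comes straight from the configuration, while the roughly 4x compression ratio is an assumption picked to show how 256TB of raw disk stretches to about 350TB of usable space; the real ratio depends entirely on the data being stored.

# Sanity check of the quoted per-rack capacity; the compression ratio is an
# assumption for illustration, not a figure published by Cisco or EMC.

RAW_TB_PER_RACK = 256       # 16 nodes x 16TB of local disk each
REPLICATION_FACTOR = 3      # three-way data replication across the nodes
ASSUMED_COMPRESSION = 4.1   # hypothetical average compression ratio

before_compression = RAW_TB_PER_RACK / REPLICATION_FACTOR   # about 85TB
usable_tb = before_compression * ASSUMED_COMPRESSION        # about 350TB

print("Usable Hadoop capacity per rack: roughly %dTB" % usable_tb)

Swap in a different compression ratio and the usable figure moves accordingly, which is presumably why it is quoted as "up to" 350TB. ®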
