EMC Greenplum Hadoop elephant straddles Cisco iron

Cah. Took them long enough

Well, that took long enough. Cisco Systems and the Greenplum big data unit of server partner EMC have finally gotten together and put the Greenplum wares on Cisco's Unified Computing System servers.

In a blog posting, Raghunath Nambiar, an architect at Cisco's Server Access and Virtualization Technology Group, reveals that the two partners in the Virtual Computing Environment Company have circled back and are now offering pre-configured Hadoop stacks that marry Cisco's C-Series rack servers and Greenplum's eponymous Greenplum MR Hadoop distribution.

Greenplum doesn't like to talk about the hardware its data warehousing and Hadoop clusters run upon, mainly because EMC, as an independent disk array maker and the owner of server virtualization juggernaut VMware, has to position itself as Switzerland in the server racket. Before it was acquired by EMC in July 2010 for an undisclosed sum, Greenplum had run its heavily customized implementation of the PostgreSQL database, which was parallelized and juiced to run data warehouse clusters, on Sun Fire x86 servers from Sun Microsystems. This was a good choice at the time, given the large amount of disk capacity that Sun had crammed onto its Opteron and Xeon servers, but a bad choice in the long term because database rival Oracle ate Sun. In the wake of the Sun acquisition, Greenplum has certified its code to run on Dell, Hewlett-Packard, and Huawei Technologies x86 servers and OEMs this iron from those companies, depending on what customers want.

EMC did not, interestingly enough, plunk the Greenplum Modular Data Computing Appliance data warehouse or its Hadoop appliance, which is actually based on a rebadged Hadoop stack from MapR Technologies, on the Vblock server-storage clusters it cooked up with Cisco to chase server virtualization and private cloud business in data centers and now virtual desktops. While the B Series blade servers in the UCS family may not be suitable for Greenplum workloads, the C Series rack servers could certainly be configured in a Vblock by EMC and Cisco to run this Greenplum code, but were not.

Part of the problem was that Hadoop doesn't use external storage, so there would be no EMC iron in such a Vblock. It is very likely that EMC and Cisco were waiting for Cisco to get a little more traction in the server racket – Cisco's server business now has more than 10,000 customers and a $1bn annual revenue run rate that will probably nearly double in the next year – before committing the Greenplum wares to the UCS platform.

According to Nambiar, the fully integrated Cisco-EMC stack takes Cisco's UCS C Series rack servers and its UCS 6200 converged server-storage 10GE switches and fabric interconnects and configures up the Greenplum MR Hadoop distro to run on the boxes. (This Hadoop distro is MapR's M5 Hadoop distribution with the names changed.) The setups start at a single rack and can be expanded to cover multiple racks. The UCS 6200 switch links into UCS 2200 fabric extenders, and according to the reference architecture (PDF), the UCS C210 M2 server is the workhorse that Cisco and EMC have chosen to run Hadoop. The C210 M2 server was announced in March 2010 and is a two-socket box that uses Intel's six-core Xeon 5600 processors and will no doubt be replaced by a new machine using Intel's "Sandy Bridge-EP" Xeon E5 chip. The C210 M2 can support up to 192GB of DDR3 main memory and has room for 16 2.5-inch disk drives and one or two RAID disk controllers.

Figure: The Cisco UCS-Greenplum Hadoop stack

In a single-rack configuration, the Greenplum MR-UCS stack has two 48-port UCS 6248UP fabric interconnects and two 2232PP 10GE fabric extenders. These link down into 16 of the C210 M2 servers, which have 96GB of main memory and 16 1TB disk drives, an LSI MegaRAID 9261-8i disk controller, and a Cisco UCS P81F virtual interface card that presents two 10GE ports to the fabric extenders. Cisco is dropping in the six-core Xeon X5670 processors, which run at 2.93GHz. Each rack has 192 cores, 256TB of raw storage capacity, and up to 350TB of usable Hadoop capacity with three-way data replication across the nodes and data compression turned on. The nodes are configured with Red Hat Enterprise Linux Standard Edition. ®
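The per-rack figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: the class and field names are made up for this example, and it assumes every one of the 16 disks in each node holds Hadoop data, with no allowance for operating system, RAID, or filesystem overhead. The implied compression ratio is a back-of-the-envelope inference from the 350TB figure, not a number Cisco or EMC has published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    """Single-rack figures taken from the article; names are illustrative."""
    servers: int = 16            # C210 M2 nodes per rack
    sockets_per_server: int = 2  # the C210 M2 is a two-socket box
    cores_per_socket: int = 6    # six-core Xeon X5670
    disks_per_server: int = 16   # 2.5-inch drive bays, all populated
    disk_tb: float = 1.0         # 1TB per drive
    replication: int = 3         # three-way data replication

    @property
    def cores(self) -> int:
        return self.servers * self.sockets_per_server * self.cores_per_socket

    @property
    def raw_tb(self) -> float:
        return self.servers * self.disks_per_server * self.disk_tb

    @property
    def replicated_tb(self) -> float:
        # Space left for unique data once every block is stored `replication` times.
        # Assumption: no OS, RAID, or filesystem overhead is subtracted.
        return self.raw_tb / self.replication

    def implied_compression(self, usable_tb: float) -> float:
        # Compression ratio needed to reach a quoted usable capacity.
        return usable_tb / self.replicated_tb


rack = RackConfig()
print(f"cores per rack:        {rack.cores}")
print(f"raw capacity:          {rack.raw_tb:.0f}TB")
print(f"after 3x replication:  {rack.replicated_tb:.1f}TB")
print(f"implied compression:   {rack.implied_compression(350):.1f}x")
```

Under these assumptions, the 192-core and 256TB figures fall straight out of the node counts, while the "up to 350TB" usable figure only works if compression squeezes the data roughly four to one.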
