Feeds

VMware teaches Serengeti big-data virt new Hadoop tricks

Probably shuffling off to Pivotal soon

HP ProLiant Gen8: Integrated lifecycle automation

It comes as no surprise that VMware wants companies to run everything virtually rather than on bare metal, and for several years it has pushed the idea of virtualizing the Hadoop stack to make it run better and easier to manage. The tool it created to do that, called Project Serengeti, now has some feature tweaks to try to entice more big data cluster builders give it a whirl.

With Serengeti 0.8.0, released Tuesday, the open source tool for virtualizing Hadoop now supports a number of new Hadoop releases plus adds features to make it easier to set up HBase data warehouses on top of Hadoop.

The update to Seregenti was announced in a blog post by Richard McDougall, principal engineer in the office of the CTO at the virtualization giant. "Most big-data environments consist of a mix of workloads," McDougall explains. "Serengeti's mission is to enable as many of the big-data family of workloads into the same theme park, all running on a common shared platform."

By virtualizing clusters you can run various parts of the big-data munching tools on shared hardware, dialing up virtual machines running each workload as needed, and dialing them back so other workloads can play.

It's all about elastic scaling, for which you pay a virtualization performance tax. For many workloads, as servers have been crammed to the gills with cores, this overhead has been acceptable.

VMware wants to layer big data tools on top of its ESXi server virtualization

VMware wants to layer big data tools on top of its ESXi server virtualization

Most companies probably don't think about their Hadoop clusters in this manner, and very likely do think about them as performing very specific functions. They're more worried about the turnaround time for batch jobs and queries and how other applications are dependent on the results of that work, and they don't want to pay a performance overhead for virtualization.

But VMware is going to keep plugging away at the idea that virtualization will allow for mixed-mode use of server clusters for all kinds of big-data jobs. So will the Pivotal group once Serengeti passes along with the Cloud Foundry platform cloud and EMC's Greenplum data warehouse and Hadoop distribution over to the Pivotal spinoff sometime later this year.

With the Serengeti 0.8.0 release, Cloudera's CDH4 and MapR Technologies' M5 Hadoop distributions are now supported running inside of virtual machine containers. The open source Apache 1.0 distribution was already supported, as was EMC's Greenplum HD 1.2., Cloudera CDH3, and Hortonworks Data Platform 1.0.

With the CHD4 release, Serengeti is aware that you can use the HDFS1 or HDFS2 file systems, and is also aware of the federated NameNode support that Cloudera has built into its Hadoop distro and knows how to configure these options.

And with MapR distros, Serengeti is similarly aware of the container location database (CLDB) used in the NFS-alike file system that MapR uses instead of HDFS, and is also in the know about the FileServer, JobTracker, and TaskTracker elements of the MapR stack, and how to package these up into virty machines and scale out their performance by replicating copies.

If you are looking to set up an HBase data warehouse, as you can see in the Serengeti 0.8.0 release notes, the VMware tool can create an HBase cluster, with an underlying HDFS file system and linked to the MapReduce data-muncher and the Thrift and RESTful APIs that are used to control HBase.

Serengeti also knows how to configure active and hot standby replicants of the HMaster nodes for the data warehouse, and can scale out HBase RegionalServers once the data warehouse is set up atop HDFS. HBase can be deployed in a virtualized manner by Serengeti on top of the Apache Hadoop. Cloudera, Hortonworks, or Greenplum distros – but not MapR distros, for some reason.

You can download the virtual machine appliance stuffed with Serengeti 0.8.0 here at the VMware site, and it doesn't cost anything to use. ®

Reducing security risks from open source software

More from The Register

next story
Sysadmin Day 2014: Quick, there's still time to get the beers in
He walked over the broken glass, killed the thugs... and er... reconnected the cables*
SHOCK and AWS: The fall of Amazon's deflationary cloud
Just as Jeff Bezos did to books and CDs, Amazon's rivals are now doing to it
Amazon Reveals One Weird Trick: A Loss On Almost $20bn In Sales
Investors really hate it: Share price plunge as growth SLOWS in key AWS division
US judge: YES, cops or feds so can slurp an ENTIRE Gmail account
Crooks don't have folders labelled 'drug records', opines NY beak
Auntie remains MYSTIFIED by that weekend BBC iPlayer and website outage
Still doing 'forensics' on the caching layer – Beeb digi wonk
BlackBerry: Toss the server, mate... BES is in the CLOUD now
BlackBerry Enterprise Services takes aim at SMEs - but there's a catch
The triumph of VVOL: Everyone's jumping into bed with VMware
'Bandwagon'? Yes, we're on it and so what, say big dogs
Carbon tax repeal won't see data centre operators cut prices
Rackspace says electricity isn't a major cost, Equinix promises 'no levy'
prev story

Whitepapers

Designing a Defense for Mobile Applications
Learn about the various considerations for defending mobile applications - from the application architecture itself to the myriad testing technologies.
Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Top 8 considerations to enable and simplify mobility
In this whitepaper learn how to successfully add mobile capabilities simply and cost effectively.
Seven Steps to Software Security
Seven practical steps you can begin to take today to secure your applications and prevent the damages a successful cyber-attack can cause.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.