VMware teaches Serengeti big-data virt new Hadoop tricks

Probably shuffling off to Pivotal soon

It comes as no surprise that VMware wants companies to run everything virtually rather than on bare metal, and for several years it has pushed the idea of virtualizing the Hadoop stack to make it run better and be easier to manage. The tool it created to do that, called Project Serengeti, now has some feature tweaks to try to entice more big data cluster builders to give it a whirl.

With Serengeti 0.8.0, released Tuesday, the open source tool for virtualizing Hadoop now supports a number of new Hadoop releases plus adds features to make it easier to set up HBase data warehouses on top of Hadoop.

The update to Serengeti was announced in a blog post by Richard McDougall, principal engineer in the office of the CTO at the virtualization giant. "Most big-data environments consist of a mix of workloads," McDougall explains. "Serengeti's mission is to enable as many of the big-data family of workloads into the same theme park, all running on a common shared platform."

By virtualizing clusters you can run various parts of the big-data munching tools on shared hardware, dialing up virtual machines running each workload as needed, and dialing them back so other workloads can play.

It's all about elastic scaling, for which you pay a virtualization performance tax. For many workloads, as servers have been crammed to the gills with cores, this overhead has been acceptable.

VMware wants to layer big data tools on top of its ESXi server virtualization

Most companies probably don't think about their Hadoop clusters in this manner, and very likely do think about them as performing very specific functions. They're more worried about the turnaround time for batch jobs and queries and how other applications are dependent on the results of that work, and they don't want to pay a performance overhead for virtualization.

But VMware is going to keep plugging away at the idea that virtualization will allow for mixed-mode use of server clusters for all kinds of big-data jobs. So will the Pivotal group, once Serengeti moves over to the Pivotal spinoff sometime later this year, along with the Cloud Foundry platform cloud and EMC's Greenplum data warehouse and Hadoop distribution.

With the Serengeti 0.8.0 release, Cloudera's CDH4 and MapR Technologies' M5 Hadoop distributions are now supported running inside of virtual machine containers. The open source Apache 1.0 distribution was already supported, as were EMC's Greenplum HD 1.2, Cloudera CDH3, and Hortonworks Data Platform 1.0.

With the CDH4 release, Serengeti is aware that you can use the HDFS1 or HDFS2 file systems. It is also aware of the federated NameNode support that Cloudera has built into its Hadoop distro, and knows how to configure these options.

And with MapR distros, Serengeti is similarly aware of the container location database (CLDB) used in the NFS-alike file system that MapR uses instead of HDFS, and is also in the know about the FileServer, JobTracker, and TaskTracker elements of the MapR stack, and how to package these up into virty machines and scale out their performance by replicating copies.

If you are looking to set up an HBase data warehouse, as you can see in the Serengeti 0.8.0 release notes, the VMware tool can create an HBase cluster, with an underlying HDFS file system and linked to the MapReduce data-muncher and the Thrift and RESTful APIs that are used to control HBase.

Serengeti also knows how to configure active and hot standby replicants of the HMaster nodes for the data warehouse, and can scale out HBase RegionServers once the data warehouse is set up atop HDFS. HBase can be deployed in a virtualized manner by Serengeti on top of the Apache Hadoop, Cloudera, Hortonworks, or Greenplum distros – but not MapR distros, for some reason.
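
For those keeping score, the support matrix described above can be jotted down as a small lookup. This is only a sketch of what the article reports: the names `SUPPORTED_DISTROS`, `HBASE_CAPABLE`, and `can_deploy_hbase` are made up for illustration and are not part of Serengeti's own command line or API, and "HDP 1.0" is shorthand here for Hortonworks Data Platform 1.0.

```python
# Illustrative only: these names are hypothetical, not Serengeti's CLI or API.
# Hadoop distros that Serengeti 0.8.0 can run inside virtual machines,
# as listed in the article (CDH4 and MapR M5 are the new arrivals).
SUPPORTED_DISTROS = {
    "apache": ["1.0"],
    "greenplum": ["HD 1.2"],
    "cloudera": ["CDH3", "CDH4"],
    "hortonworks": ["HDP 1.0"],
    "mapr": ["M5"],
}

# Vendors on which the article says Serengeti can deploy HBase.
# MapR is deliberately absent.
HBASE_CAPABLE = {"apache", "cloudera", "hortonworks", "greenplum"}


def can_deploy_hbase(vendor: str) -> bool:
    """Return True if the article lists HBase deployment for this vendor."""
    key = vendor.strip().lower()
    if key not in SUPPORTED_DISTROS:
        raise ValueError(f"unknown distro vendor: {vendor!r}")
    return key in HBASE_CAPABLE
```

Calling `can_deploy_hbase("Cloudera")` returns `True`, while `can_deploy_hbase("MapR")` returns `False`, matching the one gap noted above.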

You can download the virtual machine appliance stuffed with Serengeti 0.8.0 here at the VMware site, and it doesn't cost anything to use. ®
