Skytap fires up Hadoop data-chewer as cloud crash-test dummy

Eager to work with and compete against Amazon – someday

Hadoop might be a popular tool for munching on unstructured data, but setting up and tuning the software requires a lot more expertise than many people have and it takes a lot of time, too. That makes it a perfect piece of software to put on a cloud, provided you can either generate your data there to begin with or pipe it over there once you gin it up.

If you want to run Hadoop on a cloud, you can either buy raw server and storage capacity on Amazon's EC2 cloud and set it up yourself or you can use the Elastic MapReduce service on Amazon's cloud. Another alternative is to spin up Hadoop services on Microsoft's Azure cloud, which is in tech preview and which is expected to be commercially available soon.

And starting today, if you are feeling lazy or just experimenting, you have another option. Skytap, the other cloud backed by Amazon founder Jeff Bezos, is preconfiguring Hadoop to run on its test and development cloud.

Skytap was founded in 2006 with the name Illumita (the same year that AWS debuted) and launched its cloud and new name in 2008. The company runs its Skytap Cloud in a Savvis-managed co-location service outside of Seattle. Skytap built its own cloud rather than running on Amazon Web Services because, as the company has explained in the past, the snapshotting and storage features on the Amazon cloud are too slow for test and dev environments even if they are suitable for production environments.

The company's homegrown cloud control freak is not available commercially so you can run it in your own data center, but it is OEMed by services giant CSC, which slaps its brand on it and runs a test and development cloud service for its customers out of its own Chicago data center. The Skytap Cloud uses VMware's ESXi hypervisor to partition capacity.

Skytap has several hundred customers. It does not disclose the size of its cloud, but does say that since it went live, more than 1.9 million virtual machines have been launched on the service, up from 1 million last April.

In the August 2011 release of the Skytap service, the company added self-service cloud orchestration and hub and spoke network configurations, and in the April 2012 release it added a new management interface - and also made it puke out reports to get beancounters off your back about the cost of services.

With today's announcement, Skytap has created server templates for the CDH4 Hadoop distribution from Cloudera, which debuted back in June 2012. Specifically, Skytap is running the Cloudera Enterprise Free edition of CDH4, which is enabled to scale up to 50 nodes (physical or virtual), plus the Cloudera Manager graphical Hadoop management tool.

Setting up Hadoop on the Skytap cloud is as simple as 1, 2, 3

The Cloudera CDH4 setup on the Skytap Cloud puts three virtualized servers together into a baby cluster. One template sets up the NameNode and other Hadoop management tools (Hive, Oozie, and ZooKeeper) on a virtual machine with two virtual CPUs and 2GB of memory and 40GB of virtual storage.

The other template creates a base compute node for the Hadoop cluster and its underlying Hadoop Distributed File System (HDFS). This Hadoop compute image has one virtual CPU, 1GB of virtual memory, and 40GB of disk capacity. The base cluster has one management node and two compute nodes, and you can add up to 48 more virtual nodes without invoking a license fee and support contract from Cloudera. You can also scale up the CPU, memory, and storage capacity on these images as your workload requires.
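To put rough numbers on that baby cluster, here is a minimal Python sketch that tots up virtual CPUs, memory, and disk for one management node plus some number of compute nodes, using the template sizes quoted above. The names `NodeTemplate`, `cluster_totals`, and `FREE_COMPUTE_NODE_LIMIT` are made up for illustration and are not part of any Skytap or Cloudera API. The sketch also assumes the free-edition cap applies to compute nodes only, which is how the two-plus-48 arithmetic works out; the article does not say whether the management node counts toward it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeTemplate:
    """Size of one virtual machine image (hypothetical helper type)."""
    vcpus: int
    memory_gb: int
    storage_gb: int


# Template sizes as quoted in the article.
MANAGEMENT = NodeTemplate(vcpus=2, memory_gb=2, storage_gb=40)
COMPUTE = NodeTemplate(vcpus=1, memory_gb=1, storage_gb=40)

# Assumption: the 50-node free-edition cap counts compute nodes only
# (2 in the base cluster plus up to 48 more).
FREE_COMPUTE_NODE_LIMIT = 50


def cluster_totals(compute_nodes: int) -> dict:
    """Aggregate resources for one management node plus N compute nodes."""
    if compute_nodes < 1:
        raise ValueError("a cluster needs at least one compute node")
    return {
        "nodes": 1 + compute_nodes,
        "vcpus": MANAGEMENT.vcpus + compute_nodes * COMPUTE.vcpus,
        "memory_gb": MANAGEMENT.memory_gb + compute_nodes * COMPUTE.memory_gb,
        "storage_gb": MANAGEMENT.storage_gb + compute_nodes * COMPUTE.storage_gb,
        "within_free_limit": compute_nodes <= FREE_COMPUTE_NODE_LIMIT,
    }


if __name__ == "__main__":
    print(cluster_totals(2))   # the base three-VM cluster
    print(cluster_totals(50))  # the largest cluster under the assumed cap
```

At the default template sizes, the base cluster comes to four virtual CPUs, 4GB of memory, and 120GB of disk. Since the images can be scaled up, treat these totals as a floor rather than a fixed spec.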

All of the virtual nodes in the Hadoop cluster run Canonical's Ubuntu Server 12.04 LTS release, and Skytap's own SmartClient provides root access with a command line to all of the virtual Hadoop nodes.

If you want to develop and test applications on a Hadoop cluster that is larger than 50 nodes, or if you want to run workloads in production on Skytap, you have to get your own support contract from Cloudera. And if you don't need all of the nodes up and running all the time, you can turn off the server nodes and just keep their data on storage (which you have to pay for, of course) and then fire them back up when you need them. You can't do this with a real Hadoop cluster, of course.

Single cloud from a single vendor? That's so yesterday...

The biggest crime you can commit with a physical cluster is to not have work for it to do, and even if you turn it off, it is still un-utilized capital. That's why Amazon doesn't believe in private clouds and only believes in shared public clouds. (Well, except when it comes to its own data centers for running its online retail business, of course.)

Brett Goodwin, vice president of marketing at Skytap, says that the company is not making a commitment to offer templates for Hadoop distributions from MapR Technologies or Hortonworks, the two other big commercial disties that sell supported versions of the open source Apache Hadoop stack.

"Cloudera was the obvious choice because they are the number one distribution for Hadoop," says Goodwin. "We're going to roll this out and get a sense from customers [about] what else they might want."

Microsoft Azure has tapped Hortonworks for its Hadoop variant, and Amazon peddles the Elastic MapReduce service - running either kosher open source Apache Hadoop or the M3 or M5 variants of Hadoop from MapR. The latter, M5, can make HDFS look like a Network File System share that NFS clients can mount.

The obvious thing for Skytap to do is to help programmers create dev and test environments for Hadoop and then do some kind of conversion of their apps so they can run either on raw EC2 clouds or on the Elastic MapReduce service over at Amazon Web Services. When El Reg brought this up, Goodwin did a little hemming and hawing, and while not making a commitment to officially run production workloads on Skytap, the company admits that some customers already do that and in the long run, this will be officially sanctioned for Hadoop and other workloads.

"Today, where we are seeing demand is for test and development, proof of concepts, and training," explains Goodwin. "We do believe that customers will want to move applications into production, and it will be important for Skytap to provide production environments. But make no mistake. We expect a multi-cloud, federated world. We don't think an application has to be run on a single cloud from a single vendor."

Skytap does anticipate that companies will create and test their Hadoop applications on its eponymous cloud and then run the real workloads on internal physical Hadoop clusters. In some cases, says Goodwin, customers will want to burst from their internal Hadoop clusters out to the Skytap cloud if they are running shy on internal capacity. This cloudbursting is enabled through Skytap's AutoNetworks multi-VPN networking software.

Of course, Bezos could always have Amazon Web Services buy Skytap and keep all of its goodies for itself. This is not just possible but probable if Skytap has better technology for DevOps than Amazon has created in-house for its AWS cloud. Such an acquisition could be relatively expensive, with Skytap having raised $23m in three rounds of venture funding, including cash from Bezos Expeditions as well as Ignition Partners, Madrona Venture Group, Washington Research Group, and OpenView Venture Partners. No matter what, you can bet that Bezos doesn't want Skytap to fall into enemy hands – and AWS has a lot of enemies out there these days. ®
