Original URL: http://www.theregister.co.uk/2010/02/11/redhat_cloud_forum_projects/

Red Hat projects to seed cloudy IT

Raining support money (someday)

By Timothy Prickett Morgan

Posted in Developer, 11th February 2010 07:02 GMT

Let's get one thing straight. We don't like the term cloud computing any more than you do.

Of course, Richard Stallman doesn't like when we call it Linux rather than GNU/Linux. He's gotta live with Linux. And, well, we've gotta live with cloud computing. It's not going away.

Commercial Linux and middleware distributor Red Hat is, like other platform providers, trying to get money from IT departments that buy software. But Red Hat can't say that. For one thing, the company doesn't sell software licenses, because its open source business is built on selling support subscriptions instead, and for another, it's too boring to just come out and say that.

In this day and age, making a sale seems to mean convincing IT execs that your company has a full stack of virtualized servers and storage that can be shaped into pie-in-the-sky, super-low-cost, easier-to-administer-and-program cloudy infrastructure. I say this mainly because of the way most IT vendors talk, not because of the results they get.

Debating whether or not cloud computing is a good term - or whether or not this is a revolution or evolution - is pointless. Though it can be fun. Utility computing is a much better term for what we are moving towards as far as I am concerned, and anyone who has been paying attention to IT since it was called data processing knows that so-called cloud computing is just an evolution. Brian Stevens, chief technology officer and vice president of engineering at Red Hat, said as much as he kicked off the second Open Cloud Computing Forum.

"It's really about the users, about allowing end users to stand up applications quickly," Stevens said. And yes, "standing up" is new IT lingo for wrapping applications in virtual machines and getting them running on virtualized hardware. "It is really a model that supports on demand, not just in provisioning, but in un-provisioning. Cloud is all the rage, but it is just a natural evolution of where computing is heading with or without the name cloud."

Yes, he said un-provisioning.

The problem with all of this cloud speak, explained Stevens, is that IT departments might be lulled into thinking that this stuff already works well enough to be safe for enterprise-class deployments. To be sure, 80 to 90 per cent of the clouds out there use open source tools, but no two clouds seem to be built the same way, and the standards are still evolving and competing.

"There's a lot of great demoware out there," Stevens said of cloudy infrastructure, "but most customers are in the virtualization phase." He meant they're not quite ready for the very fluid kind of IT that cloudy infrastructure implies. And the good thing for Red Hat - which wants to make money selling support for open source infrastructure software - is that there is time yet to get a complete cloud stack together and wrap it all up, much as Red Hat did with the Linux kernel and a bunch of other useful bits to create Advanced Server Linux in May 2002.

Red Hat today trotted out some interesting projects that are part of its cloudy efforts. Many of them have their origins in the JBoss middleware stack, interestingly enough, since JBoss had to wrestle with some of the same provisioning, scalability, and storage issues that cloud infrastructure does.

The first project that Red Hat discussed at the Open Cloud Computing Forum was called BoxGrinder, which is a tool being cooked up by Red Hat to "grind out" server configurations for the multitude of virtualization fabrics out there.

According to Bob McWhirter, chief architect of cloud computing for JBoss middleware, BoxGrinder got its start because Red Hat needed to be able to crank out cloud-ready images of JBoss projects. While McWhirter contributes to these projects, BoxGrinder is part of a suite of products called StormGrind that are being managed by Marek Goldmann.

McWhirter has plenty of projects of his own as the founder of Codehaus, a repository for open source projects; Drools, a rules engine; Groovy, a programming language for the Java platform; and TorqueBox, an application platform for Ruby that runs atop JBoss.

BoxGrinder hooks into RPM repositories and allows programmers or system administrators to quickly grab and package up the front end, application server, and database tiers of an application stack inside virtualized appliances and dispatch them as a connected package for deployment on virtual infrastructures.

Right now, BoxGrinder can create servers based on Fedora Linux and will support Red Hat Enterprise Linux in the future (my guess is when RHEL 6 debuts sometime around the middle of this year). BoxGrinder doesn't just pull the RPMs out of the repository. It also lets you allocate processors, memory, and disk to the virtual machine for each part of the stack with just a couple of lines, and even use so-called Just Enough Operating System (JEOS) skinnied-down Linuxes instead of the full Fedora or RHEL stack.
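To make the idea of per-tier resource allocation concrete, here is a minimal Python sketch of a three-tier portfolio. It is not BoxGrinder's actual definition format or API: the `Appliance` class, its fields, the minimum sizes and the package names are all hypothetical, chosen only to show how CPU, memory and disk might be declared per tier and totted up for the whole stack.

```python
from dataclasses import dataclass, field

# Hypothetical model for illustration only, not BoxGrinder's real appliance format.
@dataclass
class Appliance:
    """One tier of an n-tier stack, with the virtual hardware it asks for."""
    name: str
    os: str                      # e.g. a full "fedora" build or a skinny "jeos" one
    cpus: int
    memory_mb: int
    disk_gb: int
    packages: list[str] = field(default_factory=list)  # RPMs to pull from the repository

    def __post_init__(self):
        # Reject allocations too small to boot (thresholds are assumed).
        if self.cpus < 1 or self.memory_mb < 64 or self.disk_gb < 1:
            raise ValueError(f"{self.name}: resource allocation too small")

def portfolio_totals(tiers):
    """Sum the virtual hardware the whole stack will request from the hypervisor."""
    return {
        "cpus": sum(t.cpus for t in tiers),
        "memory_mb": sum(t.memory_mb for t in tiers),
        "disk_gb": sum(t.disk_gb for t in tiers),
    }

portfolio = [
    Appliance("front-end", "jeos", cpus=1, memory_mb=256, disk_gb=2, packages=["httpd"]),
    Appliance("app-server", "fedora", cpus=2, memory_mb=1024, disk_gb=4,
              packages=["java-1.6.0-openjdk"]),
    Appliance("database", "fedora", cpus=2, memory_mb=2048, disk_gb=20,
              packages=["postgresql-server"]),
]
print(portfolio_totals(portfolio))  # {'cpus': 5, 'memory_mb': 3328, 'disk_gb': 26}
```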

What is BoxGrinding?

BoxGrinder can create collections of virtualized n-tier software stacks for a number of different virtualized targets, including Red Hat's own KVM hypervisor as well as several variations of the Xen hypervisor, plus VMware's ESX Server, Oracle's VirtualBox, and Amazon's EC2 public cloud. Appliances can be created in 32-bit or 64-bit mode.

These n-tier stacks are called portfolios in the JBoss StormGrind lingo. McWhirter said that building a raw image takes about four minutes for KVM and about six minutes for ESX Server. BoxGrinder exposes its appliance-building commands as a network-accessible service through a REST interface. The REST features will allow virtualized infrastructure to be designated as capacity for a build farm, so you can even tell cloudy infrastructure to build cloudy infrastructure. A BoxGrinder Studio edition is in development to add a Web-based graphical user interface to the tool.
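Those build times hint at why a build farm is attractive. The Python sketch below assigns image builds for a three-tier portfolio to a hypothetical pool of build hosts using a simple longest-job-first greedy heuristic. Only the four- and six-minute figures come from McWhirter; the host count, the job list and the scheduling strategy are assumptions, and none of this reflects how BoxGrinder's REST service actually distributes work.

```python
import heapq

# Approximate per-image build times quoted by McWhirter, in minutes.
# Only KVM and ESX Server figures were given, so other targets are left out.
BUILD_MINUTES = {"kvm": 4, "esx": 6}

def schedule(jobs, nodes):
    """Greedily assign (appliance, target) builds to build hosts, longest first.

    Returns (makespan_minutes, assignments), where assignments maps host id -> jobs.
    """
    # Placing the longest builds first keeps big jobs from piling up on one host.
    ordered = sorted(jobs, key=lambda j: BUILD_MINUTES[j[1]], reverse=True)
    heap = [(0, n) for n in range(nodes)]          # (minutes queued, host id)
    assignments = {n: [] for n in range(nodes)}
    for job in ordered:
        load, node = heapq.heappop(heap)           # currently least-loaded host
        assignments[node].append(job)
        heapq.heappush(heap, (load + BUILD_MINUTES[job[1]], node))
    return max(load for load, _ in heap), assignments

tiers = ["front-end", "app-server", "database"]
jobs = [(tier, target) for tier in tiers for target in BUILD_MINUTES]
makespan, plan = schedule(jobs, nodes=2)
print(makespan)  # 16
```

With two hosts the six builds finish in about 16 minutes rather than the 30 they would take back to back. Because every host's load is a sum of fours and sixes (always even), an exact 15/15 split is impossible, so 16 minutes is the best any assignment can do here.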

Red Hat is also cooking up a bunch of file systems to support cloudy computing, and Jeff Darcy, the principal software engineer working on Red Hat's Cloud Filesystem, said that developers and system architects in the cloud era had to "get used to polyglot persistence," by which he meant a variety of different ways of storing data for applications: relational databases for account information, so-called NoSQL databases for metadata and logs, file systems or archival storage for large objects, and in-memory data grids for faster access to information.
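One way to picture polyglot persistence is as a routing table from kinds of data to kinds of store. This Python sketch simply encodes Darcy's examples; the data-kind labels and the `route` helper are hypothetical, and a real application would wire each entry to an actual database or storage client.

```python
from enum import Enum

class Store(Enum):
    RELATIONAL = "relational database"
    NOSQL = "NoSQL database"
    FILE_OR_ARCHIVE = "file system or archival storage"
    DATA_GRID = "in-memory data grid"

# Mapping follows Darcy's examples; the data-kind labels are illustrative.
ROUTES = {
    "account": Store.RELATIONAL,        # account information
    "metadata": Store.NOSQL,
    "log": Store.NOSQL,
    "large_object": Store.FILE_OR_ARCHIVE,
    "hot_data": Store.DATA_GRID,        # data that needs fast access
}

def route(kind: str) -> Store:
    """Pick the kind of store an application should use for this data."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"no store configured for data of kind {kind!r}") from None

print(route("log").value)  # NoSQL database
```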

Darcy was a bit vague about what Red Hat's Cloud Filesystem would look like, but he said that it would be based on an existing parallel file system and that it would be developed in three phases. The first phase is to take a parallel file system and make it scale better, give it better security, and add support for multi-tenancy. In the second phase, the Cloud Filesystem will be extended for wide area networks, and in the final phase, it will get interfaces for desktops and laptops so these machines can store their data on cloud-based infrastructure.

No word on when this Cloud Filesystem will debut, but Darcy said "hopefully we will be able to produce something in fairly short order." It doesn't look like Red Hat will be using the Voldemort key-value store from LinkedIn, the Cassandra data store from Facebook (now part of Apache), or MongoDB from 10gen, but Darcy walked attendees through the issues each of these data stores faces and the compromises their designers made as they addressed application issues on clouds.

Manik Surtani, principal software engineer at Red Hat, is steering a related project called Infinispan, which has come into being at Red Hat because "databases and clouds don't like each other." Clouds are inherently stateless and ephemeral animals, Surtani explained, and scalability is key.

The trouble is, the very things that make databases trustworthy repositories of information limit their scalability and create single points of failure in cloudy infrastructure. And so what happens? People put MySQL on EC2 and maybe they use Amazon's Elastic Block Store (EBS), or maybe they make snapshots of MySQL databases to the S3 service. "These are, for lack of a better word, hacks," Surtani said.

So instead of a database hack on the cloud, Red Hat wants to cook up an in-memory data grid, which it is calling Infinispan. Memory, of course, is several orders of magnitude faster than disk, and as far as Surtani is concerned that makes it better than Google's App Engine Datastore or the open source Hadoop disk-based grid storage. Disk access is inherently serial, while memory access can be done in parallel across many nodes on virtualized server infrastructure.

Infinispan looks like a tweaked version of JBoss Cache, the tool for clustering and caching JBoss middleware, and that's because it shares some of the same features and code. But Infinispan has a set of new APIs that make it useful for more than middleware and are intended to turn it into a generic data store, and Surtani said it really is mostly new code. Data residing in Infinispan is organized in Map-like structures, as opposed to the tree structure used in JBoss Cache, and it is optimized for multicore processors and faster remote calls between systems.

Infinispan borrows plenty of features from JBoss Cache, including JTA transactions, JMX reporting, MVCC locking, and query and indexing. But Infinispan adds hash-based data distribution: each piece of data is replicated to a limited number of nodes across the memory in the cloud for the sake of resiliency and disaster recovery, rather than copied zillions of times, and because those copies live in memory, they are very, very fast.
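Hash-based distribution of this kind is commonly built on consistent hashing. The Python sketch below is a generic illustration, not Infinispan's code: each key is hashed onto a ring and stored on the next `num_owners` distinct nodes clockwise, so a cluster of eight nodes keeps two copies of each entry rather than eight. The owner count of two, the 64 virtual points per node and the MD5-based hash are assumptions made for the example.

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    # Stable 64-bit hash; Python's built-in hash() for str is salted per process.
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

class HashRing:
    """Generic consistent-hash ring that places each key on num_owners nodes."""

    def __init__(self, nodes, num_owners=2, vnodes=64):
        self.num_owners = num_owners
        self._distinct = len(set(nodes))
        # Each physical node gets several points on the ring to even out load.
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def owners(self, key: str):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        want = min(self.num_owners, self._distinct)
        start = bisect.bisect(self._points, _hash(key))
        result = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == want:
                    break
        return result

ring = HashRing([f"node{i}" for i in range(8)], num_owners=2)
print(ring.owners("user:42"))  # two distinct nodes out of the eight
```

Because each node owns many small arcs of the ring, adding a node only moves the keys on the arcs the newcomer takes over, and every other key stays exactly where it was.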

Infinispan will not just support Java, but also applications written in C, C++, C#, and other languages thanks to support for the Memcached caching protocol and a new two-way binary protocol called HotRod. Infinispan also has a distributed execution environment. "You can access MapReduce-style work in a very simple way," said Surtani.
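Memcached support matters because the memcached text protocol is simple enough that nearly every language already has a client for it. As a rough Python sketch, these helpers build the standard `set` and `get` commands and parse a `get` reply. They follow the memcached text protocol only; they say nothing about Infinispan's own server implementation, and the HotRod binary protocol is not modeled here.

```python
def _check_key(key: str) -> bytes:
    # Memcached text-protocol keys: at most 250 bytes, no whitespace or control chars.
    raw = key.encode()
    if not raw or len(raw) > 250 or any(b < 33 or b == 127 for b in raw):
        raise ValueError(f"invalid memcached key: {key!r}")
    return raw

def encode_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Build 'set <key> <flags> <exptime> <bytes>\\r\\n<data>\\r\\n'."""
    header = b"set %s %d %d %d\r\n" % (_check_key(key), flags, exptime, len(value))
    return header + value + b"\r\n"

def encode_get(key: str) -> bytes:
    return b"get " + _check_key(key) + b"\r\n"

def parse_get_response(resp: bytes) -> dict[str, bytes]:
    """Parse zero or more 'VALUE <key> <flags> <bytes>' items followed by 'END'."""
    items, pos = {}, 0
    while True:
        eol = resp.index(b"\r\n", pos)
        line = resp[pos:eol].decode()
        if line == "END":
            return items
        _, key, _flags, nbytes = line.split()[:4]   # an optional CAS field is ignored
        start = eol + 2
        items[key] = resp[start:start + int(nbytes)]
        pos = start + int(nbytes) + 2               # skip the data and its trailing \r\n

print(encode_set("greeting", b"hello"))  # b'set greeting 0 0 5\r\nhello\r\n'
```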

The in-memory data grid can also spill over data into file systems or databases if it starts to run out of room in memory, or dump data into Amazon S3 or Rackspace vCloud clouds if you want.
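The spill-over behaviour can be sketched as an LRU cache that, instead of discarding its least recently used entries when memory fills up, writes them to a slower backing store and reads them back on demand. This Python version is a simplified illustration under assumptions: a plain dict stands in for the file system, database or cloud store, and capacity is counted in entries rather than bytes.

```python
from collections import OrderedDict

class SpillingCache:
    """LRU in-memory cache that spills evicted entries to a backing store.

    `backing` can be any dict-like object; a plain dict is the stand-in here.
    """

    def __init__(self, capacity: int, backing=None):
        self.capacity = capacity
        self.memory = OrderedDict()          # ordered least to most recently used
        self.backing = {} if backing is None else backing

    def put(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        self.backing.pop(key, None)          # memory now holds the freshest copy
        while len(self.memory) > self.capacity:
            old_key, old_value = self.memory.popitem(last=False)  # least recently used
            self.backing[old_key] = old_value                     # spill, don't drop

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)
            return self.memory[key]
        if key in self.backing:
            value = self.backing.pop(key)    # pull it back into memory
            self.put(key, value)
            return value
        raise KeyError(key)

cache = SpillingCache(capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(k, v)
print(cache.backing)  # {'a': 1}
```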

Surtani said that Infinispan was "sexy" because it had "transparent horizontal scalability" and was "elastic in both directions." With fast, low-latency data access and the ability to address a very large data heap for Java and other applications, Surtani was excited about Infinispan, and added that it "was free and didn't suck."

That pretty much sums up the open source movement, now doesn't it? Right up to the moment when that voluntary tech support bill shows up.

Infinispan is getting close to being released to the community for wider testing and distribution. Red Hat has been testing it on server clusters with 20 or 30 nodes, but the project has finally got access to a 1,000-node cluster to see how far Infinispan can scale and to tune it for performance.

If you don't have much work this week, you can watch all 14 of the forum's sessions online. ®