Red Hat projects to seed cloudy IT
Raining support money (someday)
What is BoxGrinding?
BoxGrinder can create collections of virtualized n-tier software stacks for a number of different virtualized targets, including Red Hat's own KVM hypervisor as well as the several variations of the Xen hypervisor plus VMware's ESX Server, Oracle's VirtualBox, and Amazon's EC2 public cloud. Appliances can be created in 32-bit or 64-bit mode.
These n-tier stacks are called portfolios in the JBoss StormGrind lingo. McWhirter said that you can build a raw image with the commands in about four minutes for KVM and that it takes about six minutes for ESX Server. BoxGrinder exposes its appliance building commands as a network-accessible service through REST commands. The REST features will allow virtualized infrastructure to be designated as capacity for a build farm, so you can even tell cloudy infrastructure to build cloudy infrastructure. A BoxGrinder Studio edition is in development to add a Web-based graphical user interface to the tool.
Red Hat is also cooking up a bunch of file systems to support cloudy computing, and Jeff Darcy, the principal software engineer working on a Red Hat's Cloud Filesystem, said that developers and system architects in the cloud era had to "get used to polyglot persistence," by which he meant a variety of different means of storing data for applications: relational databases for account information, so-called NoSQL databases for metadata and logs, file systems, or archival storage for large objects and in-memory data grids for faster access to information.
Darcy was a bit vague about what Red Hat's Cloud Filesystem would look like, but he said that it would be based on an existing parallel file system and that it would be developed in three phases. The first step was to take a parallel file system and make it scale better, have better security, and support multi-tenancy. In the second phase, the Cloud Filesystem project will be extended for wide area networks, and in the final phase, the tool will have interfaces for desktops and laptops so these machines can store their data on cloud-based infrastructure.
No word on when this Cloud Filesystem will debut, but Darcy said "hopefully we will be able to produce something in fairly short order." It doesn't look like Red Hat will be using the Voldemort file system from LinkedIn, the Cassandra file system from Facebook (now part of Apache), or the MongoDB from 10gen, but Darcy walked attendees through the issues each one of these file systems have and the compromises their designers made as they addressed application issues on clouds.
Manik Surtani, principal software engineer at Red Hat, is steering a related project called Infinispan, which has come into being at Red Hat because "databases and clouds don't like each other." Clouds are inherently stateless and ephemeral animals, Surtani explained, and scalability is key.
The trouble is, the very things that make databases trustable repositories of information limit their scalability and create single points of failure in cloudy infrastructure. And so what happens? People put MySQL on EC2 and maybe they use Amazon's Elastic Block Storage (EBS) or maybe they make snapshots of MySQL databases to the S3 services. "These are, for lack of a better word, hacks," Surtani said.
So instead of a database hack on the cloud, Red Hat wants to cook up an in-memory data grid, which it is calling Infinispan. Memory, of course, is several orders of magnitude faster than disk access, and that makes it better than Google's DataStore or the open source Hadoop disk-based grid storage as far as Surtani is concerned. Disk access is inherently serial, while memory access can be done in parallel across many nodes on virtualized server infrastructure.
Infinispan looks like a tweaked version of JBoss Cache, the tool for clustering and caching JBoss middleware, and that's because it has some of the same features and code. But Infinispan has a set of new APIs that make it useful for more than middleware and that intend to transform it into a generic data store, and Surtani said it really is mostly new code. Data residing in Infinispan is organized in Map-like structures as opposed to the tree structure used in JBoss Cache, and it us optimized for multicore processors and faster remote calls between systems.
Infinispan borrows plenty of features from JBoss Cache, including JTA transactions, JMX reporting, MVCC locking, and query and indexing. But Infinispan has a hash-based data distribution methodology, which means there are replicated data sets across the memory in the cloud for the sake of resiliency and disaster recovery, but not zillions of copies, and moreover, these are done in memory and are therefore very, very fast.
Infinispan will not just support Java, but also applications written in C, C++, C#, and other languages thanks to support of the Memcached caching protocol and a new two-way binary protocol called HotRod. Infinispan also as a distributed execution environment. "You can access MapReduce-style work in a very simple way," said Surtani.
The in-memory data grid can also spill over data into file systems or databases if it starts to run out of room in memory, or dump data into Amazon S3 or Rackspace vCloud clouds if you want.
Surtani said that Infinispan was "sexy" because it had "transparent horizontal scalability" and was "elastic in both directions." With fast, low-latency data access and the ability to address a very large data heap for Java and other applications, Surtani was excited about Infinispan, and added that it "was free and didn't suck."
That pretty much sums up the open source movement, now doesn't it? Right up to the moment when that voluntary tech support bill shows up.
Infinispan is getting close to being launched into the community for wider testing and distribution. Red Hat has been testing it on server clusters with 20 or 30 nodes, but the project finally got access to a cluster with 1,000 nodes to see how far Infinispan can scale and tweak it for performance.
If you don't have much work this week, you can watch the 14 sessions for the forum here. ®
Sponsored: Hyper-scale data management