NASA drops Ubuntu's Koala food for (real) open source
Open core is not open source: a cautionary tale
NASA is dropping Eucalyptus from its Nebula infrastructure cloud not only because its engineers believe the open source platform can't achieve the sort of scale they require, but also because it isn't entirely open source.
NASA chief technology officer Chris Kemp tells The Reg that as his engineers attempted to contribute additional Eucalyptus code to improve its ability to scale, they were unable to do so because some of the platform's code is open and some isn't. Their attempted contributions conflicted with code that was only available in a partially closed version of the platform maintained by Eucalyptus Systems Inc., the commercial outfit run by the project's founders.
Instead, Kemp's team built their own compute engine and fabric controller from scratch. The new platform, dubbed Nova, has been open sourced under the Apache 2.0 license and is now part of the OpenStack project announced today by Rackspace.
OpenStack, Kemp says, is an effort to create a Linux-like ecosystem for so-called infrastructure clouds, which are online services that provide on-demand access to compute power and storage capable of scaling as needed. NASA is intent on building Nebula from open technologies not only to avoid the dreaded "vendor lock-in" but also to save a few dollars for the American taxpayer. "Nebula is designed to be both massively scalable and incredibly cheap," Kemp says. "You cannot certify commercial software in Nebula. We're not even going to think about that."
Eucalyptus, a platform that attempts to mimic Amazon's infrastructure cloud inside private data centers, was created as an open source project at the University of California, Santa Barbara. In the spring of 2009, the founders took the project commercial with the formation of Eucalyptus Systems. The company eventually adopted what CEO Marten Mickos, the former boss of MySQL, calls an "open core" model. With open core, an open code base is joined by a commercial product that includes proprietary software.
With Mickos installed at the helm of Eucalyptus Systems and the software platform finding its way into Mark Shuttleworth's Ubuntu Linux distro (that's why he called it Karmic Koala), Eucalyptus has won a fair amount of open source cred over the past two years. But some open source zealots have questioned whether an open core outfit is worthy of the open source name.
NASA, along with Eli Lilly, was one of the company's flagship customers. The word is that Nebula will eventually power websites and other services across not only NASA but the entire federal government, and though Eucalyptus repeatedly told us that an announcement related to this expansion across the government was on the way, it never actually arrived.
The company was not available to discuss NASA's switch from Eucalyptus to Nova.
NASA is working to build an infrastructure cloud that spans one million physical machines and 60 million servers — an enormous scale, to be sure — and according to Chris Kemp, Eucalyptus just wasn't designed to reach such levels. "With the architecture of the open source code in Eucalyptus, by our team's analysis, you couldn't get close to that," he says.
"Part of that is that Eucalyptus is a product. It's not a framework. If you need a faster queuing service or a faster database engine, all of that stuff is integrated directly into the Eucalyptus product. You can't pull it apart and replace the queuing engine or networking engine with other systems that are designed to scale better...When you're doing something at exascale, a product designed to work for hundreds of thousands of servers just doesn't work."
Though Eucalyptus was built to mimic the Amazon Web Services cloud (it uses the same APIs), the project's founders never claimed it would scale to Amazon-like levels. In fact, when the company was formed, CTO and University of California, Santa Barbara professor Rich Wolski said it wouldn't, though he indicated that at least part of this involved limitations beyond the software itself.
"One of the misconceptions about Eucalyptus is that it is able to allow an org to compete with Amazon," Wolski told us. "The Amazon AWS [Amazon Web Services] cloud is far more than a collection of software components. It operates on a gigantic scale, multiple time zones, multiple data centers, human resources that must be committed to maintain it so it can operate at that scale. It's not likely — maybe even impossible — that you're going to download something from the internet that is going to be able to operate at that scale. Eucalyptus can’t really be used for that purpose."
In any event, the platform wasn't suited to NASA's needs. And when NASA engineers attempted to contribute additional code to improve its scaling, this didn't work either. "I don't think the average user of Eucalyptus is going to build multiple global data centers on 10 Gigabit networks and manage exa-byte file systems," Kemp says. "[Eucalyptus'] target customer is different, and we're not their target customer. We were constantly working around issues and patching things that they didn't want to include back into their code base because it wasn't ideal for their target customer."
As Kemp points out, NASA recently launched a project with Microsoft Research that provides a real-time, high-resolution view of Mars. It draws on over 15,000 tera-pixel photos that are then mosaic-ed into a half billion PNG images via the Nebula cloud. Indeed, the average corporate user isn't likely to require that sort of power.
"NASA, to a certain extent, exists in a class of its own," Kemp says. "We have the fastest peaceful supercomputer here, and certainly, the largest Intel-based supercomputer on the planet, with about 100,000 cores. I think we're up to about a petaflop. We can generate petabytes of data in hours."
And so, about six months ago, NASA engineers went to work on Nova. Pieces of Nebula are still running Eucalyptus, but the entire cloud will eventually move to the new platform. "Nebula is pursuing Nova in the future," Kemp says. "We're doing no further R&D on Eucalyptus...We will continue to run [Eucalyptus] for a few of the projects that are on it. But we're putting all of our investment in Nova. The future roadmap of Nebula is Nova as a compute engine and fabric controller." Kemp adds, however, that Eucalyptus may be used elsewhere at NASA.
In open sourcing Nova under an Apache license and adding it to the OpenStack project, NASA hopes to both foster development from outside coders and create a thriving "ecosystem" of infrastructure cloud platforms. "Now that Nova is included in OpenStack, we are anticipating a much larger community of developers running code that we'll continue to consume in the Nebula project," Kemp says.
OpenStack also includes code open sourced by Rackspace. The hosting provider has donated the compute engine that drives its Cloud Servers service and the on-demand storage platform behind its Cloud Files service. Some Rackspace code will be mixed with Nova, and Rackspace intends to use this, well, super Nova on its production services as well. Meanwhile, NASA will adopt Rackspace's Cloud Files software.
"In the Linux world, there are a lot of different kernels," Kemp says. "I think that the OpenStack will create cloud kernels, if you will. We will take a lot of the code. We will evangelize the development of that code. And we'll add some special sauce to it for Nebula. Rackspace will check the same code out and optimize it for their service, and that'll become a kernel. The exciting part of this project is we're creating a Linux-type ecosystem at that high level of abstraction at the data center level for the cloud."
The OpenStack project already has the backing of several other big cloud names, including Cloud.com, Cloudkick, Opscode, and RightScale, as well as hardware names such as Intel, AMD, and Dell. Last week, Rackspace gathered the lot at its Austin, Texas headquarters to discuss the project.
Where does this leave Eucalyptus? Kemp makes it clear that he believes the platform still has its uses in the enterprise. But as Rich Wolski said, it's not a project suited to Amazon-like scale — for many reasons. "There's a difference between open core and open source," says Alex Polvi, the CEO of Cloudkick, an outfit offering a service for overseeing virtual servers running on Amazon-like public clouds as well as in private data centers.
"NASA and Rackspace have no business benefit directly from the code itself...OpenStack is a true open source project, where all the features will be given away for free because this benefits everyone." ®