Will open source storage make the hyperscale dream real?
One small Ceph, or one giant leap?
Open-source software has become an important presence in many areas of IT, and now, as storage increasingly becomes software-defined storage, it is storage's turn. The darling of the open-source storage movement – though it is by no means the only viable and popular option – is Ceph.
A unified storage platform originally developed for a PhD dissertation in California, Ceph has become one of the most popular, if not the most popular, storage layers for OpenStack deployments. With OpenStack leading the cloud computing charge, Ceph is benefiting considerably.
It has also been helped by gaining significant vendor backing along the way. While many users are happy to take the software and rely on community support, of which there is plenty, many others require paid professional support for their open-source projects. Ceph acquired this in spades in 2014, when Linux vendor Red Hat bought Inktank Storage, the company that Ceph's developer Sage Weil had founded to provide commercial services and support for the project.
As well as Red Hat, which then brought Ceph development in house, the platform acquired yet more professional backing when it was adopted by several hardware and systems suppliers – most notably Fujitsu, but there are other, smaller vendors too – to power their hyperscale storage appliances.
So what is driving Ceph's adoption among serious-minded organisations, and why Ceph in particular, when there are many other free or open-source storage platforms with hyperscale ambitions to choose from, such as Gluster, Lustre, MogileFS, Skylable, and OpenStack's own Swift, Cinder and Manila?
Picking up an inktank
Partly, of course, it is because Ceph is software-defined storage, as mentioned above, and because it is both open-source and enterprise-grade. But the other vital aspect is that it is unified storage, providing object, block and file services from a single underlying storage system. By comparison, OpenStack has chosen to implement the three separately, while most of the other open-source hyperscale options offer only object storage.
“Ceph competes with Swift for cloud object storage, but that's just one of the Ceph use-cases – and usually we are up against the proprietary folks,” says Sage Weil, Ceph's inventor, now with Red Hat as Ceph principal architect. “Cinder is just a broker API that presents a generic interface for accessing block storage – it doesn't provide any storage itself, so it's an enabler and not a competitor.”
Of course, there are good reasons why the developers of those other platforms have mostly gone the object route, most notably that object storage maps well onto the demands of hyperscale. In essence, hyperscale storage involves storing vast quantities of information, often petabytes and beyond, on systems that can grow rapidly, efficiently and indefinitely.
Hyperscale differs from traditional enterprise storage in several ways, most obviously its sheer scale, but also its application loads. Typically, hyperscale will serve more users with fewer applications, whereas enterprise storage supports more applications but fewer users.
Hyperscale storage also tends to be software-defined, using automation to minimise the amount of administration and other human involvement needed. It is modular and scale-out, so it can be expanded almost indefinitely by adding nodes to a cluster, and it is optimised for maximum raw capacity and minimum cost per petabyte on commodity hardware.
Oh, and just to top it off, these systems have finally moved beyond frankly outdated technologies such as RAID. Disk capacities have grown so much faster than rebuild speeds that a failed RAID group can no longer be rebuilt quickly enough to stay safe and reliable: the longer the rebuild, the longer the array is exposed to a second drive failure. Instead, Ceph, for instance, stripes individual files across multiple nodes for higher throughput and replicates them, which also makes it fault-tolerant and self-healing. In addition, it replicates frequently accessed objects across more nodes to provide an element of load balancing.
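To make the contrast with RAID concrete, here is a minimal sketch of hash-based replica placement and recovery. It is not Ceph's code: Ceph maps objects to placement groups and then uses its own CRUSH algorithm, which also handles device weights and failure domains such as hosts and racks. This sketch swaps in the simpler rendezvous (highest-random-weight) hashing and assumes three replicas; the node and object names are hypothetical. The idea it illustrates is that when a node dies, the lost copies are rebuilt by reading from many surviving nodes at once, rather than by funnelling a whole rebuild onto a single spare disk.

```python
from __future__ import annotations

import hashlib

REPLICAS = 3  # assumed replication factor for this sketch


def score(obj_id: str, node: str) -> int:
    """Deterministic pseudo-random weight for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj_id}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_id: str, nodes: list[str], replicas: int = REPLICAS) -> list[str]:
    """Return the `replicas` highest-scoring nodes that hold copies of obj_id.

    Any client can recompute this, so no central lookup table is needed.
    """
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested replica count")
    ranked = sorted(nodes, key=lambda n: score(obj_id, n), reverse=True)
    return ranked[:replicas]


def recovery_plan(objects: list[str], nodes: list[str], failed: str,
                  replicas: int = REPLICAS) -> dict[str, tuple[str, str]]:
    """Map each object that lost a copy on `failed` to (source, target).

    source: a surviving node that still holds a replica to copy from
    target: the node that takes over the lost replica
    """
    survivors = [n for n in nodes if n != failed]
    plan: dict[str, tuple[str, str]] = {}
    for obj in objects:
        before = place(obj, nodes, replicas)
        if failed not in before:
            continue  # this object kept all its copies, so nothing moves
        after = place(obj, survivors, replicas)
        target = next(n for n in after if n not in before)
        source = next(n for n in before if n != failed)
        plan[obj] = (source, target)
    return plan


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(10)]     # hypothetical cluster
    objects = [f"obj{i}" for i in range(1000)]  # hypothetical object IDs
    plan = recovery_plan(objects, nodes, failed="node3")
    sources = {s for s, _ in plan.values()}
    targets = {t for _, t in plan.values()}
    print(f"{len(plan)} objects re-replicated, reading from "
          f"{len(sources)} nodes and writing to {len(targets)} nodes")
```

Because placement is recomputed rather than looked up, only objects that actually had a copy on the failed node move, and both the reads and the writes for recovery are spread across the surviving nodes.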
The challenge with all of this, as Evaluator Group senior analyst Eric Slack points out, is the disconnect between the “hyperscale dream” and what's really feasible for the average organisation. “People read about what Amazon, Facebook and Google are doing and say 'That's cool, we should be doing this,'” he says. “Then reality sets in. Those hyperscalers have armies of smart people writing software, and they buy thousands of units. Enterprises don't.”
One solution is to take the hyperconverged route, but this generally means vendor lock-in, Slack argues. The alternative for companies facing the same issues as the hyperscalers, but at a smaller scale and without the resources, is to take an appliance approach, he says. He points to a host of companies bundling hyperscale software-defined storage with hardware, services and support, for example HP with the StoreVirtual technology that it acquired when it bought LeftHand Networks.
“The vendors all understand that while they would love to replace traditional infrastructure with hyperconverged, it's not going to happen,” he says. “Companies are moving away from big boxes towards purpose-built stuff. They aren't doing consolidation projects, they're saying 'Gosh, if we can stand up a private cloud, all the applications we put on that are going to be easier to run.'”
For those convinced by the open-source-with-commercial-support story, this is where Ceph comes in. “Ceph is our general multipurpose and object storage,” says Nick Gerasimatos, cloud development director at FICO, a US predictive analytics company that helps companies manage risk and fight fraud. “We also use SolidFire all-flash arrays – when we get a new application, we try it on Ceph first, load-test it and so on, and if it works we leave it there.”
Gerasimatos says that moving to an OpenStack and Red Hat Ceph-based cloud has helped FICO reduce time to market by 50 per cent and lower costs by 30 per cent, compared with its previous legacy infrastructure. He adds that it has also helped the company transform into a software- and platform-as-a-service (SaaS/PaaS) provider, where customers can both use FICO's tools and applications and build them into their own applications and services.
“FICO's legacy environments were very much EMC, NetApp and so on,” he adds. “We didn't like that it's closed source, we like to be able to adjust the values, etc. Plus if you want to switch vendors you have to do a complete migration. With Ceph we just deploy more nodes and move the data over.”