I know for certain what software-defined storage is. It's the new black
Traditional storage sales are shrinking, so move on
I’m not going to re-define software-defined. Instead, I’d like to look around and try to make sense of the different interpretations and architectural designs that make this claim.
I have intentionally left out some solutions because, to me, they are not software-defined, while others I may simply have forgotten or am not aware of. So please leave a comment if you have something to add.
Where did software-defined come from?
Well, the term was introduced in the networking space first (thanks to OpenFlow), but I’d say that “software-defined compute” (aka virtualisation in this case) came much earlier than that.
In fact, if you look at the first definition coined for software-defined networking, it was all about separating the control plane from the data plane. In other words, something similar to what happens at the compute layer, where you have a controller (e.g. vCenter) capable of defining and managing all the components and policies that rule the infrastructure, and the hypervisor (e.g. ESXi), which is simply the executor on standard x86 hardware.
But then, as always, things changed. On one side we had Markitectures (just marketing taking advantage of the buzzword of the moment) and on the other, different engineering approaches for the same problem. So everything has become more complicated, just as it has with the networking guys.
And for storage it’s even more complicated. In fact (excuse the oversimplification), while networking is about transporting data and compute is about working on data, storage is usually about data persistence, consistency, reliability and durability… all characteristics at odds with the practically stateless nature of the other infrastructure components, which makes it harder to separate the control and data layers.
But let’s talk about software-defined storage
Since we don’t have a single definition of software-defined storage (SDS), let me work with examples to describe different architectures.
One of my favourite categories of SDS is the one that most resembles the original definition from networking. From my point of view, Primary Data is the most interesting example. I met them a couple of weeks ago at Storage Field Day 10 (SFD10), and there is a video that explains their architecture.
Primary Data really looks like Nicira (now VMware NSX) and promises similar benefits, but for storage: an out-of-band controller and pNFS-based components, available for different OSes, that manage all the data movements. The controller also includes a smart policy engine that lets you associate SLAs with individual data volumes and helps automate a vast number of tasks.
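To make the out-of-band idea concrete, here is a minimal, hypothetical sketch. It is not Primary Data’s API: the `Controller`, `DataStore`, `Sla` and `Client` names, the latency figures and the "cheapest store that meets the SLA" rule are all assumptions for illustration. The only borrowed concept is the pNFS pattern, where a client asks a metadata server for a layout and then performs I/O directly against the data servers, so the controller never sits in the data path.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sla:
    name: str
    max_latency_ms: float  # illustrative SLA target, not a real product setting


@dataclass
class DataStore:
    name: str
    latency_ms: float  # assumed latency of this back-end storage system
    blocks: dict = field(default_factory=dict)  # data plane: where bytes actually live


class Controller:
    """Hypothetical out-of-band control plane: it holds policy and layout
    metadata but never sits in the I/O path."""

    def __init__(self, stores):
        self.stores = stores
        self.layouts = {}  # volume name -> DataStore chosen by the policy engine
        self.layout_requests = 0

    def provision(self, volume, sla):
        # Toy policy engine: choose the slowest (assumed cheapest) store that
        # still meets the volume's SLA.
        eligible = [s for s in self.stores if s.latency_ms <= sla.max_latency_ms]
        if not eligible:
            raise ValueError(f"no store can meet SLA {sla.name!r}")
        self.layouts[volume] = max(eligible, key=lambda s: s.latency_ms)
        return self.layouts[volume]

    def get_layout(self, volume):
        # Analogous to a pNFS layout request: tell the client where data lives.
        self.layout_requests += 1
        return self.layouts[volume]


class Client:
    def __init__(self, controller):
        self.controller = controller
        self.cache = {}

    def _layout(self, volume):
        # Ask the controller once, then cache; later I/O bypasses it entirely.
        if volume not in self.cache:
            self.cache[volume] = self.controller.get_layout(volume)
        return self.cache[volume]

    def write(self, volume, offset, data):
        self._layout(volume).blocks[(volume, offset)] = data

    def read(self, volume, offset):
        return self._layout(volume).blocks[(volume, offset)]


flash = DataStore("flash-array", latency_ms=0.5)
nas = DataStore("legacy-nas", latency_ms=5.0)
ctl = Controller([flash, nas])
ctl.provision("db01", Sla("gold", max_latency_ms=1.0))        # lands on flash-array
ctl.provision("archive", Sla("bronze", max_latency_ms=10.0))  # lands on legacy-nas

client = Client(ctl)
client.write("db01", 0, b"hello")
print(client.read("db01", 0), ctl.layout_requests)  # b'hello' 1
```

The controller is consulted once per volume; every subsequent read and write goes straight to the chosen data store, which is the separation of control and data planes in its simplest form.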
Unfortunately, despite Primary Data’s marketing videos, the product has still not reached general availability and, at the moment, it is likely to interest only large customers with several different storage systems who want to normalise their infrastructure. In any case, this is a product you should keep an eye on.
Another software-defined storage category that I can think of includes modern scale-out distributed storage systems. The list of products in this category is very long, but they all share a set of common characteristics:
- Use of commodity x86 hardware
- Scale-out, shared-nothing design
- Strong API-based management interface
Borrowing again from Storage Field Day 10, Datera, Hedvig and Cloudian are all good examples of this category. Even though their solutions differ in features and scope (ranging from object storage to container, OpenStack and VMware data stores) the basics are very similar. In fact, the separation between control and data planes can be found here as well, even if they can co-exist on the same hardware.
In this category, I find solutions like Datera very compelling. These are new, highly specialised products for containers and cloud storage, with an impressive policy/provisioning engine. Data is clearly stored independently of the presentation and management/control layers. To put it more precisely: once a data volume is provisioned, its data path is always a function of the cluster layout, and the policy applied to the volume isn’t fixed and predetermined as it is in a traditional storage implementation. Even fault management works differently from a traditional system: the volume lays itself out again to meet its policy goals.
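Datera’s actual placement algorithm isn’t described here, so the sketch below swaps in a well-known stand-in, rendezvous (highest-random-weight) hashing, purely to illustrate two ideas from the paragraph: the data layout is a function of the cluster layout plus the volume’s policy, and after a failure the volume re-lays itself out to meet that policy again. The function names, node names and the replica-count policy are hypothetical.

```python
import hashlib


def score(volume: str, node: str) -> int:
    # Deterministic pseudo-random weight for a (volume, node) pair.
    digest = hashlib.sha256(f"{volume}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(volume: str, nodes: list[str], replicas: int) -> list[str]:
    """Rendezvous (highest-random-weight) hashing: the layout is a pure
    function of the volume name, the live nodes and the replica policy."""
    if replicas > len(nodes):
        raise ValueError("the policy cannot be met with the current cluster layout")
    return sorted(nodes, key=lambda n: score(volume, n), reverse=True)[:replicas]


def relayout(volume: str, old: list[str], live: list[str], replicas: int):
    """Recompute placement after a change and report where new copies go."""
    new = place(volume, live, replicas)
    return new, [n for n in new if n not in old]


cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
layout = place("vol-42", cluster, replicas=3)
failed = layout[0]  # simulate losing one of the nodes holding a replica
live = [n for n in cluster if n != failed]
new_layout, new_copies = relayout("vol-42", layout, live, replicas=3)
print(layout, "->", new_layout, "rebuild on", new_copies)
```

Because the relative ranking of the surviving nodes doesn’t change, the two untouched replicas stay where they are and only one new copy has to be rebuilt, with no administrator deciding where it goes.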
Sounds complicated? It took me a while to understand it. If you want to know more, I strongly suggest you watch this demo from SFD10.
Coho Data deserves a special mention in this category. It may be the only member of a data-path virtualisation sub-category, although other vendors (Hedvig, with its proxy, for example) are working on alternative approaches to further virtualise data access. Coho has an interesting SDS implementation that leverages SDN at the front end. This could seem complicated, but it brings several practical advantages, allowing Coho to virtualise both the data path and the transport layer.
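Coho’s implementation details aren’t covered here, so treat the following as a loose, hypothetical illustration of what virtualising the transport layer can mean: clients talk to a single virtual address, and a controller decides, through per-client flow rules, which back-end node actually serves each client. The `FrontEnd` class, the addresses, the node names and the least-loaded assignment rule are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FrontEnd:
    """Hypothetical SDN-style front end: every client targets one virtual
    address, and per-client flow rules decide which back-end node serves it."""
    virtual_ip: str
    backends: list[str]
    rules: dict[str, str] = field(default_factory=dict)  # client -> back end

    def route(self, client: str) -> str:
        # On first contact, install a rule pointing at the least-loaded back end.
        if client not in self.rules:
            load = {b: 0 for b in self.backends}
            for b in self.rules.values():
                load[b] += 1
            self.rules[client] = min(self.backends, key=lambda b: load[b])
        return self.rules[client]

    def migrate(self, client: str, backend: str) -> None:
        # Moving a client is only a rule rewrite; it keeps using virtual_ip.
        if backend not in self.backends:
            raise ValueError(f"unknown back end {backend!r}")
        self.rules[client] = backend


front = FrontEnd("10.0.0.100", ["micro-1", "micro-2"])
print(front.route("10.0.1.5"), front.route("10.0.1.6"))  # micro-1 micro-2
front.migrate("10.0.1.5", "micro-2")  # e.g. to rebalance load
print(front.route("10.0.1.5"))  # micro-2
```

The client-facing address never changes, so the storage side is free to move clients between nodes by rewriting network rules rather than remounting anything.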
Last, but not least, I’d like to include an open source solution: Ceph. It is another great example of SDS, very similar to what I’ve already described. It has been maturing quickly lately, thanks to the efforts (and investments) of Red Hat, and the latest release is quite impressive, as is the near-term roadmap. It’s no wonder it tops the charts when it comes to OpenStack and container storage.
One thing that I think is important for this category of software-defined storage is that the model really works only at scale. In smaller configurations, with very few nodes, the constraints imposed by the cluster layout leave little freedom for policy management, which in practice makes it almost impossible to separate the various components. But these kinds of systems were born to manage large numbers of clients (containers or VMs) and huge capacities without the burden of traditional storage management.
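A back-of-the-envelope calculation shows why scale matters. Assuming a simple policy of three replicas with one copy per node (real systems add racks, erasure coding and other constraints, which this ignores), the number of valid layouts for a volume is the binomial coefficient C(n, 3), and after a node failure the volume can only be re-protected if at least three nodes survive.

```python
from math import comb


def placement_options(nodes: int, replicas: int) -> int:
    # Distinct node sets that can hold one volume's replicas,
    # assuming one copy per node (the node is the failure domain).
    return comb(nodes, replicas)


def can_reprotect(nodes: int, replicas: int, failed: int = 1) -> bool:
    # After losing `failed` nodes, is there still room to restore every copy?
    return nodes - failed >= replicas


for n in (3, 4, 8, 32):
    print(n, placement_options(n, 3), can_reprotect(n, 3))
# 3 1 False
# 4 4 True
# 8 56 True
# 32 4960 True
```

With three nodes there is exactly one place the data can live and nowhere to rebuild after a failure; with a few dozen nodes there are thousands of possible layouts, which is the room a policy engine needs to work in.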
Another interesting category is, of course, hyper-converged/VSA-based storage. This can be considered a sort of sub-category of what I have described above. It works in a similar way, but it has a specific purpose (serving VMs) and it’s highly integrated with the hypervisor. Examples of this category are everywhere, starting with Nutanix and going down to VMware's VSAN.
For many end users, this is the quintessence of SDS. In this case, the end user is usually a general-purpose sysadmin who carves out data volumes (or VMDKs) directly from a pool of shared resources. The system does all the heavy lifting, meeting the data protection or performance policies or SLAs that have been requested. The primary goal of this type of product is to make storage transparent, combining efficiency with ease of use.
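The following hypothetical sketch shows what "carving out a volume from a shared pool" looks like from the sysadmin’s side: they ask for a size and a protection policy, and the system works out the raw capacity and host requirements. The `Pool` and `Policy` names, the capacities and the plain mirroring rule (surviving f host failures needs f + 1 full copies on different hosts) are illustrative assumptions; real hyper-converged products have their own, stricter rules about host counts and protection schemes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical protection policy: survive `failures` host failures by mirroring.
    name: str
    failures: int


class Pool:
    """Shared capacity pool built from the local disks of every host."""

    def __init__(self, hosts: int, gb_per_host: int):
        self.hosts = hosts
        self.raw_gb = hosts * gb_per_host
        self.used_gb = 0
        self.volumes = {}  # volume name -> (size in GB, policy)

    def carve(self, name: str, size_gb: int, policy: Policy) -> int:
        copies = policy.failures + 1  # mirroring keeps f + 1 full copies
        if copies > self.hosts:
            raise ValueError(f"{policy.name} needs at least {copies} hosts")
        raw = size_gb * copies
        if self.used_gb + raw > self.raw_gb:
            raise ValueError("not enough free capacity in the pool")
        self.used_gb += raw
        self.volumes[name] = (size_gb, policy)
        return raw  # raw capacity consumed on the pool


pool = Pool(hosts=4, gb_per_host=1000)
print(pool.carve("web01.vmdk", 100, Policy("mirror-1", failures=1)))  # 200
print(pool.carve("db01.vmdk", 200, Policy("mirror-2", failures=2)))   # 600
print(pool.raw_gb - pool.used_gb)  # 3200
```

The sysadmin never chooses disks or hosts; the policy alone determines how much of the pool a volume consumes and whether the cluster can honour it.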
Closing the circle
Have I left any categories out? As I mentioned, if I have, please leave a comment detailing the category you have in mind, why you think it is SDS, and where you see the separation between the control and data planes. If you are just thinking about software that you can install on a server, that’s not enough.
SDS is becoming the new black, and the trend is clearly visible everywhere, with end users of all sizes. Traditional storage sales have been falling quite drastically for a while now, and keeping an eye on how the market is evolving is a must, especially when the growth of your infrastructure forces you to improve the TB/sysadmin ratio. ®