Objects! Aaah-ah ... the savior of software-defined storage?
Performance, capacity, management and more reasons why they're great
Comment Software-defined storage (SDS) is one of those terms that has been readily hijacked by vendors over the past few years.
The term developed from the adoption of software-defined networking (SDN), used to define the separation of control and data traffic in the networking world, which provides the abstraction needed to deliver more efficient network management and to virtualise network functionality.
Where SDN was reasonably easy to define, SDS has been less clear. The SDS Wikipedia page has far less detail than the page for SDN, and offers only a vague definition of what the characteristics of SDS should be.
I’ve attempted to add my own definition and discussed the subject at TECHunplugged in Austin, Texas, earlier this year (see slide 13 in this deck).
Part of the problem with finding an adequate definition is that data storage has two components: a persistent side for storing and recalling data, and a transmission side that covers how data passes from host to external storage. SDN, by contrast, only has to worry about data in transit, so it has fewer concerns around performance and throughput as far as an individual host is concerned. To add to the confusion, storage is moving back into the server with hyper-converged solutions, making it more difficult to come up with a consistent definition.
Object storage as a bridgehead
Looking at how object storage has developed over the last six or seven years, we’ve seen many entrants come to the market that are purely software-based. NooBaa is one of the newest start-ups in this market, launching at VMworld in August 2016. Scality, Cleversafe, Caringo, Cloudian, OpenIO and Ceph are also purely software-based (even if they are resold with hardware by vendors).
There are few vendors with products that are hardware-focused (notably DDN, although it has a software offering too).
So why has object storage been more of a natural fit for SDS? Here are some thoughts:
- Performance – object stores are less dependent on the performance and latency of each individual I/O. Storing and retrieving objects is more about data throughput than latency, which is much easier to achieve in a scale-out model. Data can be scattered over many nodes, so any individual failure has less impact on the overall performance of an individual request. There aren’t usually many tiers of storage in object stores, so data can be widely distributed across nodes without direct concern for individual I/O performance.
- Capacity – object stores are designed for very large capacity and that by nature implies commodity hardware. No one wants to pay standard block-based vendor pricing for object storage systems. The economics of the data access profile and the data itself mean much of the data may be inactive and not justify expensive storage.
- Management – object stores are almost exclusively driven using web-based protocols (HTTP/S and REST) and managed with web GUIs. This nicely suits the SDS management definitions.
- Improvements in technology – the evolution of server components (processors, memory, bus speeds) means object stores can be built reliably from commodity components and implement fault tolerance at the disk and/or node level. Processor improvements mean functions like compression and erasure coding can be achieved with standard x86 CPUs rather than dedicated hardware.
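To make the "data can be scattered over many nodes" point concrete, here is a minimal sketch of one way a software-only object store might decide where copies of an object live. It uses rendezvous (highest-random-weight) hashing, which is an assumption chosen for illustration and not a description of how any product named above works. The node names, object keys and the three-copy default are hypothetical.

```python
import hashlib

def score(node: str, key: str) -> int:
    """Deterministic pseudo-random weight for a (node, object key) pair."""
    digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

def place(key: str, nodes: list[str], copies: int = 3) -> list[str]:
    """Pick the `copies` nodes with the highest weight for this key."""
    ranked = sorted(nodes, key=lambda n: score(n, key), reverse=True)
    return ranked[:copies]

# Hypothetical eight-node cluster holding 1,000 objects.
nodes = [f"node-{i}" for i in range(8)]
keys = [f"object-{i}" for i in range(1000)]
before = {k: place(k, nodes) for k in keys}

# Simulate the loss of a single node and recompute placement.
failed = "node-3"
survivors = [n for n in nodes if n != failed]
after = {k: place(k, survivors) for k in keys}

# Only objects that had a copy on the failed node need a new home;
# every other object keeps exactly the same placement.
affected = sum(1 for k in keys if failed in before[k])
print(f"{affected} of {len(keys)} objects had a copy on {failed}")
```

In this sketch each object still has two intact copies after the failure, and objects that never touched the failed node are not moved at all, which is the property that lets a scale-out design tolerate node loss on commodity hardware.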
SDS for all
Looking at wider storage usage, file and block-based SDS solutions already exist, but perhaps don’t have the same adoption rates as object storage. Unfortunately, there aren’t any figures to corroborate this, apart from looking at how the hyper-converged market has grown, with Gartner predicting a market size of $2 billion this year and $5 billion by 2019. SDS underpins hyper-converged solutions and many SDS vendors have pivoted to cover hyper-convergence in their offerings. So SDS is on the cusp of widespread adoption.
The architect’s view
There’s an assumption that hyper-converged solutions will subsume the traditional storage market. However, I think we’re likely to see a big uptake in SDS. Dedicated hardware will (eventually) become as niche as the mainframe, but won’t totally disappear. The more interesting trend will be how pricing changes for storage. The existing model of $/GB charged differently by tier is hard to justify (and police) for a pure software solution, so pricing is likely to become a flat rate per node, per GB and/or per feature. Essentially, SDS pricing will move to align with that in the public cloud. ®