Secondary storage, the missed opportunity for object storage
The backend just ain't important any more
Comment I've had a lot of conversations lately with vendors and end users about secondary storage. The scenario is changing quickly, with new products targeting this space and more end users adopting it in different forms.
Object storage crossed my radar a few years ago, and even though its potential remains huge, radical changes are occurring which are altering the landscape as a consequence of a new class of smarter solutions.
"Flash & Trash," remember? Trash here doesn't mean rubbish in landfill, but trash as a resource, as in the recycling process.
I don't want to repeat myself here (and this article could help to set up the background), but secondary storage is a huge opportunity for the future: roughly 80 to 85 per cent of installed capacity and 50 per cent of overall storage revenues.
For the end user, its advantages include a lower $/GB, scalability, and many other features designed to lower TCO while readying organisations of any size for big data and IoT.
Object storage has failed expectations
Many object storage vendors have made the same mistakes in the past. They focused on the basic technology without looking at customer needs. These vendors spent most of their resources on pure object storage, good for a few massive deployments and next-generation cloud apps, but not for traditional large- and medium-sized organisations.
It is clear to everyone that the number of hyper-scale end users is very limited. Trying to sell big infrastructures to a few customers is not going to last long, especially if the user is big enough to design and build its own infrastructure components.
If, a few years ago, selling in the range of 0.5 to 10PB was only for a handful of vendors, now there are plenty of products capable of doing well in that area. This is no longer a problem of scaling in capacity, which is taken for granted; end users look at other aspects like flexibility, ease of use, security, data services, analytics, performance and integration with the rest of the infrastructure. Users want the horizontal infrastructure I've described in the past, which is capable of serving many different needs concurrently. Most, if not all, vendors are now fully aware of this, but not all have a product ready to meet this need.
There are exceptions. For example, not everyone is aware that 50 per cent of Scality revenues already come from scale-out NAS (same back end, just different protocols exposed). Others (HDS and DDN) have built an ecosystem around their object storage. But if your business is based on a single product, which is quickly becoming a feature, and you are not thinking about real customer needs, then I tend to think your chances of having a prosperous future are quite low.
After the last round of acquisitions, and with HPE investing $10m in Scality while waiting to understand what to do, it is clear that object storage is just a component (or a back-end technology) that's flowing into something bigger, more interesting and more aligned with real user needs.
Secondary storage (three examples)
Without always referring to the usual suspects, I'd like to bring your attention to three different examples to explain what I mean: Caringo, Ctera and Ceph. [Quick disclaimer: I did a speech at Ctera SKO in January. And I've written a paper for Caringo in the past.]
Take Caringo first. I don't even know if you can consider it a startup anymore (founded in 2005 and with more than 500 customers). It started with object storage very early, but the latest add-ons to its product are the ones that are triggering my interest.
It is working on two fronts. On one side you have FileFly, which is a software component you install on your Windows file servers. The idea is that FileFly, thanks to a good centralised policy engine, leverages Swarm, Caringo's object store, as a single huge repository for the entire organisation, leaving only a cache or stubs on local servers.
I'm over-simplifying here, and there are several use cases, but let me start from the easiest scenario. In this case, not only is the server protected and optimised (if the server is connected to a SAN, for example, its real footprint on primary storage becomes minimal), but you obtain a form of DR too – especially if the server is in a remote office.
The second part is more interesting. Now, Swarm provides a search portal which can perform (and save) sophisticated queries on every piece of object metadata saved in the cluster. It becomes easy to search the entire file domain, even for the most distributed organisations. I like the idea but I'd like to see it improved with full content searches in the future.
Swarm is becoming more and more capable of serving different applications, with HDFS and NFS support for example. It's not unique but, again, the object store is just the back end that makes it possible to expose multiple protocols and ready-to-use solutions in a software-defined storage fashion.
Ceph is another example that is worth a mention. At the beginning, the goal of an open source, state-of-the-art, scale-out, unified, software-defined storage system seemed just too much to achieve, and its biggest problem was exactly this: visionary but too immature at the same time.
Then Red Hat acquired Inktank and things drastically changed. A small group of enthusiasts has become a large, vibrant community. I hadn't paid much attention to Ceph for some time, but after a short chat with Red Hat last week I did my homework, and the quantity and quality of code submissions on the official repositories are amazing.
For example, I think Ceph is currently a strong option for OpenStack installations and is deployed by organisations of every size. One of its biggest advantages is that anyone with a minimum of Linux skills can deploy it for free, and grow from there.
The ecosystem around it is also interesting. For example, I like a solution developed by a small company I met a few months ago, Outpace.IO, which is also working on interfacing single disks with a small CPU/RAM card to run OSDs directly on the disk – a compelling idea.
Ceph has great (and unexplored) potential in many different areas, even though it is still immature for some use cases. The big difference between Ceph and the majority of object storage systems on the market today is that Ceph has been designed to be multi-protocol.
Ctera is not a storage system, but it is one of the few storage products capable of delivering what private cloud storage promises, and end users love it. To be fair, Ctera can work on any kind of back end from AWS S3 to on-premises NAS, but realistically, I think that object storage is its perfect match. The first time I had the opportunity to look into its products was at its start, with a remote backup/NAS solution with cloud back end. Now it has a complete end-to-end line of products delivering enterprise Sync & Share, remote and cross-cloud data protection.
My point here is that the back end is virtually irrelevant (Ctera also provides concurrent multi-cloud support and cloud-to-cloud migration capabilities). Ctera has done a great job in supporting all the possible back ends and now almost all object storage vendors rely on it to provide this kind of service. Ctera is the solution, while the object store is just a repository. Good luck, object storage vendors: what is your differentiator here?
Closing the circle
Now that enterprises are seriously looking at large-scale storage repositories and object stores, object storage is becoming less relevant – it's now just an access method. Once again, real-world enterprises are more interested in ready-to-use solutions and not just in the enabling technology. In this case, Ctera, Ceph and Caringo are examples of products/vendors that are going in the right direction.
Most of the vendors have finally understood what is happening, but for some of them it's too late. For those that have finally discovered that the world is not made of exabyte-scale deployments but of thousands of petabyte-scale ones, scaling down while maintaining a balanced architecture is too much of a problem (especially now that we have 10TB disk drives). Others are working to become the horizontal platform I've mentioned many times but, again, their products were not designed with this in mind, and now it's hard to catch up and become credible in the eyes of the user.
Exceptions exist, of course, but look around – they are very few in number. ®