We pick storage brains: Has object storage endgame started?
Too many companies, too few projects... right?
Interview IBM buying Cleversafe could mark the start of the endgame for independent object storage suppliers. We talked to Philippe Nicolas, who was Scality’s Director of Product Strategy until earlier this year and is now a storage industry advisor. We asked him questions about the state of the object market and its suppliers.
His replies have been edited for brevity.
El Reg: Is object storage consolidation starting? Or has a slow consolidation been taking place over many years?
Philippe Nicolas: There are probably too many companies for too few multi-PB projects, meaning that not all companies are going to survive. The consolidation started slowly with the Bycast acquisition by NetApp in 2010, then nothing really happened for a time. Red Hat then acquired Inktank, developer of Ceph, for $175M in 2014, HGST swallowed Amplidata, and now IBM has bought Cleversafe. I thought a while ago that the first acquisition move would trigger a wave of acquisitions. I think it has started with the Red Hat acquisitions and we should see more of them.
El Reg: What role has Amazon's S3 played in the development of object storage?
Philippe Nicolas: Clearly Amazon has acted as the enabler for object storage, and everyone is referring to Amazon at least for online storage services and the API. Amazon demonstrated that object storage is a highly scalable answer to large projects and pretty attractive in terms of price. Other solutions, even if they started before Amazon, were very low-profile at that time.
Many vendors referred to the famous Amazon Dynamo paper* that serves as reference and positioning; it was published in 2007. It's even more compelling today with Amazon's extensions to other storage classes, such as Amazon S3 Standard – Infrequent Access (aka Standard – IA), and of course, Amazon Glacier. This is obviously the de-facto standard. For an object storage vendor, it's a must. Not supporting Amazon's S3 API is considered a serious drawback.
El Reg: What role will OpenStack Swift play in the object storage market?
Philippe Nicolas: OpenStack Swift participates in the education of the market and has made some huge progress in terms of features and capabilities since its inception. The community is very active and the general dynamism of the object storage market is a result of this big community pressure.
It's a good point, especially with all vendors behind it, but many of them, and even some users, got disappointed by the product's capability and its scalability.
OpenStack Swift suffers from lack of maturity. In fact, the product has real scalability limitations, but could be deployed for reasonable-size projects. It is a real act of democracy in favor of the object storage wave, and based on open source.
But other open source products exist such as OpenIO, Red Hat with Inktank Ceph, Sheepdog, SwiftStack (very close to Swift), and soon Minio, to name just a few. So OpenStack won't have things its own way.
El Reg: Why has the SNIA's CDMI (Cloud Data Management Interface) been a crashing failure?
Philippe Nicolas: The SNIA CDMI was a great initiative but has suffered from lack of big vendor adoption and promotion. It's always difficult for an established player to accept a standard that ultimately means that it could be potentially replaced by a new emerging player.
So CDMI was essentially promoted and supported by small emerging players who thought that CDMI would fix the standard API problem. But several small vendors didn't join in, like Amplidata, Caringo, Cleversafe, or Cloudian.
In fact, today we have 3 standards: Amazon S3 as a de-facto "market" standard, OpenStack Swift as a community standard promoted by a large group of companies, and SNIA CDMI as the "industry standard." The order for adoption and market interest is obvious to everyone: S3, Swift, then CDMI. CDMI was a superb idea, identifying a real need, but market conditions changed and it lost support.
El Reg: Does IBM's acquisition of Cleversafe represent a turning point for the object storage industry and, if so, what does this mean?
Philippe Nicolas: IBM finally recognized the big hole in their product line [with nothing] as robust and scalable as Cleversafe. IBM picked the best product – no doubt, Cleversafe is a pioneer in that category, invented a ton of things and always served as the reference for all vendors.
It was pretty strange that none of the big server vendors had a strong object storage product: look at IBM, HP, Dell, or Oracle ... Are they really blind to what the market needs? Until now ... only the large independent storage vendors have had an answer, like EMC, NetApp, HDS, or DDN.
This is also because object storage is a different approach, perfectly aligned with the Software-Defined Storage movement: you do storage with servers – commodity ones most of the time – so vendors have to understand and accept that. It's no longer about selling arrays or filers from a storage business unit, but now involves a deep link with the server division. It has created some friction in many companies.
The story is different now for IBM: they can offer a very robust cloud storage service and address enterprise and on-premises vertical projects. It's the same for Dell with the strong EMC portfolio; these two acquisitions should eventually accelerate the decision for the remaining big players.
For Oracle, it's a bit different: they got Sun with StorageTek in the basket several years ago and acquired Nirvanix IP, with a real culture of independent software. So they're in a good position to address market needs. This shows in their battle against the giant cloud providers: they just announced two cloud storage services at their OpenWorld conference, an Archive one based on tape to compete against Amazon Glacier or Google Nearline, and a Storage service like Amazon S3 or Google Standard storage, with a solution compatible with OpenStack Swift but not based on Swift.
El Reg: What are the possible exit strategies for the remaining VC-funded object storage start-ups now that NetApp has Bycast, HGST bought Amplidata, HDS bought Archivas, and IBM bought Cleversafe?
Philippe Nicolas: I'm really convinced that the clock starts now for Caringo, Cleversafe, Cloudian, Compuverde, Scality, and potentially even for Basho. There are more object storage vendors than potential buyers like HP, Cisco, or Seagate.
We all know that none of the object storage vendors will become a storage giant. They really need to find an exit rapidly, as IBM and Dell have some good options now. It reminds me of the dedupe movement a few years ago with Avamar, Data Domain, and Ocarina. Think about the famous debate about whether dedupe is a product or a feature – there's an analogy here.
Imagine a situation where HP, Cisco, and potentially Seagate acquired someone; they will be able to compete against other large vendors. What then would happen to the small remaining independent players? They will have no real choice except "verticalization" of their product, as all general deals will be fought for by the big players. If not, they will be out of the game sooner or later or become a zombie company. For that reason we'll soon count losers.
El Reg: How would you describe EMC's object storage developments and why hasn't EMC bought an object storage startup?
Philippe Nicolas: EMC has everything internally – the finance, the talented development team, and several products. It acquired Filepool, the Belgian pioneer of CAS, in 2001 and they quickly grew to own the CAS market. EMC anticipated a need and delivered the right solution at that time.
Then it developed different object flavours: Atmos, then an object access layer on top of Isilon, ViPR, and ECS to name a few solutions. It's probably too wide a product set, not very specific and very difficult to articulate by a hardware vendor when object storage is seen as a software technology.
EMC is still a hardware vendor who develops software for their systems. The difference between acting as an ISV (Independent Software Vendor) and developing software for your own product line resides in the capability for the first to be deployed on different hardware.
The perfect image here is Veritas Software and backup vendors. EMC didn't need any object storage solution from the outside; it was more an internal and positioning challenge. Atmos was pretty good, ECS is excellent; EMC had to learn from other vendors' strategies.
The best example here is DDN, which offers an end-to-end solution from primary to secondary to object storage. These products cohabit very well and you still have the option to penetrate accounts with each product separately in DDN's storage hierarchy.
El Reg: How do Ceph and Gluster relate to object storage, and will they succeed?
Philippe Nicolas: Ceph and Gluster were very attractive approaches and participated in the commodity acceleration of the market. Their successes were based on their open source nature, the capability for integrators to build services around them, and for users to directly own the product.
... It was all about community, knowledge, and open source.
Both of them got acquired by Red Hat, who faced a different dilemma from these two players: Red Hat wanted to create real differentiators against other Linux distributions and they believe storage is one of these. This is the reason why Gluster and Ceph were much more visible before, when they were independent, than since they've belonged to the Red Hat portfolio. It gives them stability and reduces some potential risks for any large customer to adopt these two products, but at the same time they became less prominent. If you ask object storage vendors, they'll say they see and meet less Ceph than before the acquisition.
El Reg: Why hasn't open source object storage become mainstream?
Philippe Nicolas: It's important to keep in mind that object storage solutions are essentially for capacity-oriented projects and remote access, not for primary or demanding environments where you see more block and file-based products.
Becoming mainstream is also associated with becoming a de-facto standard due to the number of projects, deployment, etc. Object storage is limited in terms of usage: it is not generic, not for everyone, every application, and every workload. Many vendors forget that and think they can address other needs by just adding a NAS head to compete against dedicated file server players, and they hit the wall.
El Reg: Why hasn't IBM been active in the object storage market?
Philippe Nicolas: IBM didn't address projects where object storage was a need, and they also found some alternatives such as GPFS, now Spectrum Scale.
Multiple pressures have attacked IBM: software-defined storage, Open Source, OpenStack, and Cloud providers all contribute to shaking IBM up and revealing a need for it to react. Storage is more than just hardware; it can be a cloud service, or software deployed on commodity servers, for example. Wow, it was a shock for IBM but you can't beat the market. I think a few key items participated in its realization: OpenStack, the SoftLayer acquisition, Ceph and Gluster being picked up by Red Hat, and also the pressure from giant cloud providers such as Amazon, Google, and Microsoft.
IBM has had deep and serious gaps without any solutions for several years. In object storage they were naked, so selecting Cleversafe was a fantastic move.
El Reg: Are large-scale file systems destined to be NAS heads on top of a base object storage system layer?
Philippe Nicolas: Your question is about access methods; file and object access are two different things. File access requires the application to be close enough to the storage server due to protocol latency, whereas remote access relies on http-type logic.
Vendors made progress with WAN Optimization/Acceleration methods and of course File Access Gateways, but you don't access geographically distant file servers with file sharing protocols. So you need this special additional element that translates and converts files to http verbs. What you refer to is a possibility with local NAS heads with one or multiple geo-dispersed object storage back-ends.
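To illustrate the kind of translation such a gateway performs, here is a minimal Python sketch. It is not any vendor's implementation: the operation names, the `to_http_request` helper, the endpoint, and the bucket/key URL layout (loosely modeled on S3-style APIs) are all assumptions made for the example.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical mapping from file-style operations to HTTP verbs.
FILE_OP_TO_VERB = {
    "write": "PUT",      # store the whole file as one object
    "read": "GET",       # fetch the object's content
    "stat": "HEAD",      # fetch metadata only
    "unlink": "DELETE",  # remove the object
}


@dataclass(frozen=True)
class HttpRequest:
    verb: str
    url: str


def to_http_request(
    op: str,
    path: str,
    endpoint: str = "https://objects.example.com",  # assumed endpoint
    bucket: str = "nas-export",                      # assumed bucket name
) -> HttpRequest:
    """Translate one file operation on `path` into an object-style request."""
    if op not in FILE_OP_TO_VERB:
        raise ValueError(f"unsupported file operation: {op}")
    # The file path becomes the object key; "/" is kept, other characters escaped.
    key = quote(path.lstrip("/"))
    return HttpRequest(FILE_OP_TO_VERB[op], f"{endpoint}/{bucket}/{key}")


request = to_http_request("read", "/projects/report.pdf")
print(request.verb, request.url)
```

A real gateway also has to handle partial writes, renames, locking, and caching, which have no direct equivalent among whole-object verbs; the sketch only shows the basic mapping.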
El Reg: Will IBM's GPFS make progress against object storage competition?
Philippe Nicolas: This question is interesting but as IBM will be able to offer the Cleversafe solution, I hope they will arbitrate this challenge internally and position each of these differently, but potentially linked together like DDN does with its end-to-end approach.
El Reg: Will direct-addressed, key:value disk drives, like Seagate's Kinetic, provide faster object stores?
Philippe Nicolas: Object storage is pretty complex, with tons of indirection that impacts performance. The fact that you can delegate some work to sub-systems, object copies for instance, clearly has some advantages. You can build very, very large storage repositories and still maintain this notion of independence for performance and failure isolation. It's a really good innovation promoted by several vendors; I hope it won't finish up like OSD (Object-based Storage Devices) from a few years ago.
El Reg: Will they provide more disk capacity to the object store than ordinary disk drives, since there is no file system layer, with its overhead, between the object and the disk?
Philippe Nicolas: Outside of Kinetic, you can find object storage solutions with or without disk file systems on the drives. For instance DDN and Caringo use disk in raw mode, and Cloudian uses a file system on each disk. In both cases, this mode is completely hidden and doesn't impact the behavior of the system. Both of them have their rationale. For sure, vendors who use raw devices have better capacity usage, but at the same time they have to replace the file system and manage data placement.
Now with Kinetic, the idea is a bit similar to the raw approach, but the logic is controlled by the device itself. The disk controller has an embedded logic that frees servers and provides additional benefits. Space could be one of them, but this is not the motivation; it provides an elementary intelligent brick that provides better global service and processing close to the location of the data, and we can expect a better scalability factor with this approach.
So some vendors will have more work to do to support these drives. Outside of the system, nothing changes and users and applications don't see any modifications.
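The idea of a drive addressed by key rather than by block or file can be sketched in a few lines of Python. This is an illustration only and not the Kinetic API: the `KeyValueDrive` and `ObjectStore` classes, their method names, and the hash-based placement are hypothetical stand-ins.

```python
import hashlib


class KeyValueDrive:
    """Hypothetical stand-in for a drive that stores values by key,
    with no file system between the caller and the media."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blocks: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: bytes) -> bytes:
        return self._blocks[key]

    def delete(self, key: bytes) -> None:
        self._blocks.pop(key, None)

    def __len__(self) -> int:
        return len(self._blocks)


class ObjectStore:
    """Spreads objects over several key-value drives by hashing the name."""

    def __init__(self, drives: list[KeyValueDrive]) -> None:
        if not drives:
            raise ValueError("at least one drive is required")
        self._drives = list(drives)

    def _drive_for(self, key: bytes) -> KeyValueDrive:
        # Assumed placement policy: first 8 bytes of SHA-256, modulo drive count.
        digest = hashlib.sha256(key).digest()
        return self._drives[int.from_bytes(digest[:8], "big") % len(self._drives)]

    def put_object(self, name: str, data: bytes) -> None:
        key = name.encode("utf-8")
        self._drive_for(key).put(key, data)

    def get_object(self, name: str) -> bytes:
        key = name.encode("utf-8")
        return self._drive_for(key).get(key)


drives = [KeyValueDrive(f"drive-{i}") for i in range(4)]
store = ObjectStore(drives)
store.put_object("photos/cat.jpg", b"\xff\xd8 example bytes")
print(store.get_object("photos/cat.jpg"))
```

Callers of `ObjectStore` see only object names and data, which is the point made above: the change in how drives are addressed stays inside the system.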
El Reg: What is the future for object storage?
Philippe Nicolas: My thoughts are fourfold. The first is about a full software stack perfectly represented by the Software-Defined Storage approach if we want to build a storage service from a commodity server farm.
The second is really that object storage will be more and more just an access protocol and not an architecture design. In fact, IT people don't really pay attention to the internal design but rather to the application's behavior they deliver to users.
So http interfaces are key, whatever the internal structure is; it means that object storage will be replaced by object protocols or interfaces. When you use Amazon, you don't care about the back-end architecture: your applications interact with the access layer – Amazon's S3 API – and it's the key integration component.
I believe object storage will become a feature and in that way it will join the commodity movement promoted by large vendors.
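The "protocol, not architecture" point can be shown with a small Python sketch, assuming a deliberately tiny put/get interface. The names `ObjectInterface`, `FlatBackend`, `ChunkedBackend`, and `round_trip` are invented for the example and do not correspond to the S3 API or to any product.

```python
from typing import Protocol


class ObjectInterface(Protocol):
    """What the application sees: an access protocol, nothing more."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class FlatBackend:
    """Keeps each object as a single value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


class ChunkedBackend:
    """Splits each object into fixed-size chunks internally."""

    def __init__(self, chunk_size: int = 4) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self._chunk_size = chunk_size
        self._chunks: dict[str, list[bytes]] = {}

    def put(self, key: str, data: bytes) -> None:
        size = self._chunk_size
        self._chunks[key] = [data[i:i + size] for i in range(0, len(data), size)]

    def get(self, key: str) -> bytes:
        return b"".join(self._chunks[key])


def round_trip(store: ObjectInterface, key: str, data: bytes) -> bytes:
    """Application code: written once against the interface only."""
    store.put(key, data)
    return store.get(key)


for backend in (FlatBackend(), ChunkedBackend()):
    print(type(backend).__name__, round_trip(backend, "logs/2015.txt", b"hello object"))
```

The application function never learns which internal design it is talking to; only the interface matters to it.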
Third, object storage is more dedicated to secondary storage needs even if we can see some implementations in other use cases, but pretty limited ones.
Fourth, and this is more a philosophical comment, you can influence the market but the market is always right. ®