EMC's ViPR is great ... for other vendors, at least
Amitabh Srivastava talks to our bod about object storage
Storagebod recently had the chance to chat to Amitabh Srivastava, head of EMC’s Advanced Software division and one of the principal architects of ViPR.
Srivastava is not a storage guy - in fact, his previous role with Microsoft sticks him firmly in the compute/server camp - but he credited his experience in building out the Azure Cloud offering with helping him appreciate the problems that storage and data face going forward.
Building a dynamic and agile storage environment is hard and it’s not a solved problem yet. Storage, and more importantly the data it holds, has gravity: or, as I like to think of it, long-term persistence. Compute resource can be scaled up and down; data rarely has the idea of scaling down and generally hangs around. Data Analytics just means that our end-users are going to hug data for longer. Thus you’ve got this heavy and growing thing … it’s not agile but there needs to be some way of making it appear more agile.
You can easily move compute workloads and it’s relatively simple to change your network configuration to reflect these movements, but moving large quantities of data around is no trivial matter – well, at speed anyway.
Enterprise storage gets even more convoluted
Large enterprise storage environments are heterogeneous environments. Dual supplier strategies are common; sometimes to keep vendors honest, but often there is an acceptance that the different arrays have different capabilities and use-cases. Three or four years ago, I thought we were heading towards general purpose storage arrays.
We now have more niche and siloed capabilities than ever before. Driven by developments in all-flash arrays, commodity hardware and new business requirements, the environment is becoming more complex.
Storage teams need a way of managing these heterogeneous environments in a common and converged manner.
And while everyone is trying to do things better, cheaper and faster, operational budgets remain pretty flat and headcounts are frozen or shrinking. Anecdotally, from talking to my peers, it seems that arrays are hanging around longer and refresh cycles have lengthened somewhat.
EMC’s ViPR is an attempt to solve some of these problems.
Can you lay a new access protocol on top of already existing and persistent data? Can you make it so that you don’t have to migrate many petabytes of data to enable a new protocol? And can you ensure that your existing applications and new applications can use the same data without a massive rewrite?
The access protocol in this case is object - and for some people object storage is a religion. “All storage should be object, why the hell do you want some kind of translation layer,” they say. Unfortunately, life is never that simple.
If you have a lot of legacy applications running and generating useful data, you probably want to protect your investment and continue to run those applications but you might want to mine that data using newer applications. This is heresy to many but reflects today’s reality. If you were starting with a green-field, all your data might live in an object store, but migrating a large existing estate to an object store is just not realistic as a short term proposition.
ViPR enables your existing file-storage to be accessible as both file and object. Srivastava also mentioned block, but I struggle with seeing how you would be able to treat a raw block device as an object in any meaningful manner. Perhaps that’s a future conversation.
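To make the file-and-object idea a little more concrete, here is a minimal sketch in Python. It is emphatically not ViPR's API: the `ObjectView` class, its method names and the sample key are all hypothetical, invented purely for illustration. The only point it makes is that an object-style interface (keys, whole-object reads, attached metadata) can sit on top of files that stay exactly where they are.

```python
import os
import tempfile


class ObjectView:
    """Hypothetical object-style facade over files that already exist on disk."""

    def __init__(self, root):
        self.root = os.path.abspath(root)
        self._metadata = {}  # key -> dict, held beside the data; files are untouched

    def _path(self, key):
        # Map an object key such as "edit/clip1.bin" to a path under the root.
        path = os.path.normpath(os.path.join(self.root, key))
        if not path.startswith(self.root + os.sep):
            raise ValueError(f"key {key!r} escapes the storage root")
        return path

    def get(self, key):
        # Whole-object read, served straight from the existing file.
        with open(self._path(key), "rb") as fh:
            return fh.read()

    def tag(self, key, **metadata):
        self._path(key)  # validate the key before recording metadata
        self._metadata.setdefault(key, {}).update(metadata)

    def head(self, key):
        # Size comes from the file system; everything else is attached metadata.
        return {"size": os.path.getsize(self._path(key)),
                **self._metadata.get(key, {})}

    def keys(self):
        found = []
        for folder, _dirs, files in os.walk(self.root):
            for name in files:
                full = os.path.join(folder, name)
                found.append(os.path.relpath(full, self.root).replace(os.sep, "/"))
        return sorted(found)


def demo():
    # An "existing" file written by ordinary file I/O, as a legacy app would.
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "edit"))
    clip = os.path.join(root, "edit", "clip1.bin")
    with open(clip, "wb") as fh:
        fh.write(b"frame-data")
    store = ObjectView(root)
    store.tag("edit/clip1.bin", title="Promo", status="ready-for-playout")
    return store, clip
```

In this toy version the legacy application keeps writing through the file path while a newer one reads the same bytes by key. Everything that makes the real problem hard (locking, consistency, performance, durable metadata) is deliberately left out.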
But in the world of media and entertainment, I could see this capability being useful; in fact, I can see it enabling some workflows to work more efficiently, so an asset can be acquired and edited in the traditional manner. Then we could move into play-out as an object with rich metadata, but without moving the asset around the storage environment.
Srivastava also discussed the possibilities of being able to HDFS your existing storage, for example allowing analytics to be carried out on data-in-place without moving it. I can see this being appealing, but challenges around performance, locking and the like become – well, challenging.
Moving to an era where data persists but is accessible in appropriate ways without copying, ingesting and simply buying more and more storage is very appealing. I don’t believe that there will ever be one true protocol; multi-protocol access to your data is key. Even in a world where everything becomes objects, there will almost certainly be competing APIs and command-sets.
The more real part of ViPR - when I say real, I mean it is the piece I can see a huge need for today - is the abstraction of the control-plane and making it look and work the same for all the arrays that you manage. Yet, after the abomination that is Control Center, can we trust EMC to make storage management easy, consistent and scalable?
Srivastava has heard all the stories about Control Center, so let’s hope he’s learned from our pain... The jury doesn’t even really have any hard evidence to go on yet, but the vision makes sense.
The sting in the tail
EMC have committed to openness around ViPR as well. I asked: what if someone implements your APIs and makes a better ViPR than ViPR?
Amitabh was remarkably relaxed about that. They aren’t going to mess about with APIs for competitive advantage and if someone does a better job than them, then they deserve to win. They obviously believe that they are the best if they're doing that; if we move to a pluggable and modular storage architecture, where it is easy to drop in replacements without disruption, they'd better be the best.
A whole ecosystem could be built around ViPR. EMC believe that if they get it right, it could be the on-ramp for many developers to build tools around it. They are actively looking for developers and start-ups to work with ViPR.
Instead of writing tools to manage a specific array it should be possible to write tools that manage all of the storage in the data centre. Obviously this is reliant on either EMC or other storage vendors implementing the plug-ins to enable ViPR to manage a specific array.
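As a rough illustration of what a pluggable control plane might look like, here is a minimal Python sketch. None of it is taken from ViPR: `ArrayDriver`, `InMemoryDriver`, `ControlPlane` and the "most free space" placement rule are hypothetical names and choices, made up to show the shape of the idea. Each vendor supplies a plug-in behind one common contract, and tools talk only to the control plane.

```python
from abc import ABC, abstractmethod


class ArrayDriver(ABC):
    """Hypothetical plug-in contract that each array vendor would implement."""

    @abstractmethod
    def create_volume(self, name, size_gb):
        """Create a volume and return a vendor-specific identifier."""

    @abstractmethod
    def free_gb(self):
        """Return the remaining capacity in GB."""


class InMemoryDriver(ArrayDriver):
    """Stand-in for a real vendor plug-in; it only tracks capacity in memory."""

    def __init__(self, vendor, capacity_gb):
        self.vendor = vendor
        self.capacity_gb = capacity_gb
        self.volumes = {}

    def create_volume(self, name, size_gb):
        if size_gb > self.free_gb():
            raise RuntimeError(f"{self.vendor}: not enough capacity for {name}")
        self.volumes[name] = size_gb
        return f"{self.vendor}:{name}"

    def free_gb(self):
        return self.capacity_gb - sum(self.volumes.values())


class ControlPlane:
    """One management surface over every registered array."""

    def __init__(self):
        self._drivers = {}

    def register(self, array_id, driver):
        self._drivers[array_id] = driver

    def provision(self, name, size_gb):
        # Illustrative policy only: pick the array with the most free space.
        candidates = [(drv.free_gb(), array_id)
                      for array_id, drv in self._drivers.items()
                      if drv.free_gb() >= size_gb]
        if not candidates:
            raise RuntimeError("no array has enough free capacity")
        _, array_id = max(candidates)
        return array_id, self._drivers[array_id].create_volume(name, size_gb)

    def report(self):
        # The same capacity view for every array, whoever built it.
        return {array_id: drv.free_gb() for array_id, drv in self._drivers.items()}
```

A tool written against `ControlPlane` never needs to know which vendor sits underneath. The catch, as noted above, is that somebody still has to write and maintain each driver.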
Will the other storage vendors enable ViPR to manage their arrays, thus increasing the value of ViPR? Or will it be left to EMC to do it? Well, at launch, NetApp is already there. I didn’t have time to drill into which versions of OnTap, however, and this is where life could get tricky; the ViPR-control layer will need to keep up with the releases from the various vendors.
But as more and more storage vendors are looking at how their storage integrates with the various virtualisation stacks, consistent and early publication of their control functionality becomes key. EMC can use this as enablement for ViPR.
If I were a start-up, for example, ViPR could enable me to fast-track the management capability of my new device. I could concentrate on the storage functionality and capability of the device and not on the peripheral management functionality.
So it’s all pretty interesting stuff, but it’s certainly not a foregone conclusion that this will succeed - and it relies on other vendors coming to play. We need the tools that will enable us to manage storage at scale, keeping our operational costs down and not having to rip and replace.
A snake's nest of competing efforts
How will the other vendors react? I have a horrible suspicion that we’ll just end up with a mess of competing attempts and it will come down to the vendor who ships the widest range of support for third-party devices. Before you dismiss this as just another attempt from EMC to own your storage infrastructure, if a software vendor had shipped/announced something similar, would you dismiss it quite so quickly? ViPR’s biggest strength and weakness is ... EMC.
EMC has to prove its commitment to openness and that may mean that in the short term it will do things that seriously assist its competitors at some cost to its business. I think that it needs to almost treat ViPR like it did VMware; at one point, it was almost more common to see a VMware and NetApp joint pitch than one involving EMC.
Oh, and EMC also has to ship a GA product. And probably turn a tanker around. And win hearts and minds, show that it has changed…
Finally, let’s forget about Software Defined Anything, let’s forget about trying to redefine existing terms. It doesn’t have to be called anything... we are just looking for BSM&C (better storage management and capability). Hang your hat on that… ®