Speedy storage server sales stumps sysadmin scribe: Who buys this?
Our man Trevor is left with more questions than answers
Storage speed-up: Is it really worth it?
The other big question I have concerns centralised storage acceleration (CSA) products. My company does some work with Proximal Data, so I'm immersed in this world on a regular basis. There are innumerable other products out there, but they all boil down to "we will make centralised storage suck less."
Considering how many SANs and NASes are sold every single day, and how many of the things are deployed around the world, the CSA world's claim of "making your centralised storage go faster" seems pretty enticing. Buying new centralised storage is expensive, and only the newer (and most expensive) arrays come with the native ability to "insert SSD, watch it go faster." CSA gear isn't going away anytime soon.
The economics of some CSAs, however, fail to make sense to me.
The simple read caches (VMware's Flash Read Cache, Proximal Data's Autocache and so forth) I understand the value of. These make sense to me because they are (A) dirt simple and (B) cheap.
Simple is very important. Add flash, enable software, go faster. There is nothing to "design". There are no architectural considerations. You don't even need to install the CSA software on every host: the more hosts that you bung SSDs into, the faster your centralised storage is for all hosts*.
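A back-of-the-envelope sketch makes the footnoted claim easier to see. It assumes eight hosts, a read-heavy workload, a 75% hit rate on cached hosts and an array that throttles every host proportionally once it is oversubscribed. All of those figures, and the throttling rule itself, are hypothetical simplifications rather than how any particular product or array behaves.

```python
# Back-of-the-envelope model: host-side read caches offloading a shared array.
# Every number here is hypothetical, and the proportional-throttling rule is a
# simplifying assumption, not how any particular array schedules I/O.

def array_demand(hosts, cached_hosts, read_iops, write_iops, hit_rate):
    """Total IOPS that reach the centralised array.

    Cached hosts serve hit_rate of their reads from local SSD; their misses
    and all writes still go to the array. Uncached hosts send everything.
    """
    uncached = hosts - cached_hosts
    per_uncached = read_iops + write_iops
    per_cached = read_iops * (1 - hit_rate) + write_iops
    return uncached * per_uncached + cached_hosts * per_cached


def uncached_host_iops(array_limit, demand, per_host_demand):
    """IOPS an uncached host actually gets if an oversubscribed array
    throttles every host's request stream proportionally (assumption)."""
    return per_host_demand * min(1.0, array_limit / demand)


ARRAY_LIMIT = 50_000  # hypothetical IOPS ceiling of the array
WORKLOAD = dict(hosts=8, read_iops=6_000, write_iops=2_000, hit_rate=0.75)
PER_HOST = WORKLOAD["read_iops"] + WORKLOAD["write_iops"]

for cached in (0, 4):
    demand = array_demand(cached_hosts=cached, **WORKLOAD)
    served = uncached_host_iops(ARRAY_LIMIT, demand, PER_HOST)
    print(f"{cached} cached hosts: array demand {demand:,.0f} IOPS, "
          f"each uncached host gets {served:,.0f} IOPS")
```

With these invented inputs, caching on four hosts cuts the array's demand from 64,000 to 46,000 IOPS, which fits under the 50,000 IOPS ceiling, so each of the four uncached hosts goes from 6,250 IOPS to its full 8,000.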
What's more important, however, is "cheap". This is because every dollar you spend on your read caching solution – be that buying SSDs or the software – is a dollar you're not putting into something else. Like, say, faster centralised storage in the first place.
CSAs of any variety only make economic sense if the cost of deploying them is lower than the cost of upgrading your centralised storage to achieve the same benefit.
This gets even messier in that A (simple) influences B (cheap). The costs that need to be considered are not merely the capital costs of the server flash and the software to make it go, but the operational costs of managing and maintaining the thing.
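To make that rule concrete, here is a minimal sketch that adds ongoing operational cost (admin hours plus extra kit such as NICs and switch ports) to the up-front price of each option over a fixed period. It assumes every option delivers the performance gain you actually need, which is the premise of the comparison; all prices, hours and the hourly rate are hypothetical placeholders for your own quotes.

```python
# Minimal total-cost-of-ownership comparison: CSA vs. upgrading the array.
# All prices, hours and rates below are hypothetical placeholders.
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    capex: float                      # up-front spend: flash, licences, new array
    admin_hours_per_year: float       # tuning, monitoring, replacing failed parts
    other_opex_per_year: float = 0.0  # support, extra NICs/switch ports, etc.

    def tco(self, years: int, hourly_rate: float) -> float:
        """Capital cost plus operational cost over the given period."""
        yearly = self.admin_hours_per_year * hourly_rate + self.other_opex_per_year
        return self.capex + yearly * years


YEARS, RATE = 3, 75.0  # hypothetical horizon and loaded admin cost per hour

options = [
    Option("simple read cache", capex=20_000, admin_hours_per_year=5),
    Option("complex write-caching CSA", capex=45_000,
           admin_hours_per_year=80, other_opex_per_year=4_000),
    Option("upgrade the array", capex=60_000, admin_hours_per_year=10),
]

# Cheapest to own over the period comes first.
for opt in sorted(options, key=lambda o: o.tco(YEARS, RATE)):
    print(f"{opt.name:28} {opt.tco(YEARS, RATE):>10,.0f}")
```

With these made-up numbers the complex CSA is cheaper to buy than the array upgrade but more expensive to own over three years, which is the A-influences-B effect in miniature.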
Time spent tuning your system isn't necessarily time spent wisely
A properly implemented CSA has virtually no operational overhead. Install it, turn it on, and never think about it again. It should automatically make the best use of the flash you feed it. No tinkering with settings, no per-VM allocation; it just works.
Every button you need to push to make your CSA useful is time you could be spending doing something more important, and time is money.
Now, consider that when you walk away from the simple read caches, you layer on complexity. Instead of "push button, receive bacon", you now need to worry about network considerations for replication traffic between nodes. That means additional NICs, switch ports, cabling, network isolation and so forth. Or, if you run your replication traffic on an existing network, you get to add that network to your list of things to monitor, and to fret about what's eating all your throughput.
If a flash disk dies in a simple read cache setup, it sucks a little for the server that is no longer "going faster," but you're not really losing any essential functionality. In some of the more alarming CSAs out there, you now have to worry about getting those SSDs replaced ASAP; otherwise you risk some of the data in the write cache not having a redundant copy**.
This leads me to the actual question that has been bugging me about CSAs. What is the value of a centralised storage acceleration product that has all the complexity, risk and design considerations of a full-blown server SAN but doesn't actually store your data within its mesh?
If, for all that complexity and worry, the CSA product merely accelerates centralised storage I already have, why, in an era of VSANs, would I buy it? Especially if the CSA software in question costs the same as (or more than) the server SANs available on the market?
Doesn't it make more sense to just build a server SAN, use the flash cache and data locality features that are part of the server SAN to ensure all your data goes faster, and task your traditional centralised SAN with something else? Or skip the complexity altogether and use a simple read cache? Or how about just buying a faster SAN in the first place?
None of my tests show a $/IOPS benefit in choosing a CSA product over just using a server SAN***. There can be a $/TB benefit, but the low cost of software-only server SANs, such as Maxta, turns those into edge cases.
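Both metrics are simply total cost divided by delivered IOPS or by usable capacity. The sketch below shows that arithmetic for two configurations; the names, costs, IOPS and capacities are invented for illustration and are not results from the tests described here.

```python
# $/IOPS and $/TB for competing storage configurations.
# The configurations and every figure below are invented for illustration.
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    total_cost: float  # everything you pay for: hardware, software, opex
    iops: float        # sustained IOPS measured with your own workload
    usable_tb: float   # usable capacity after redundancy overhead

    @property
    def dollars_per_iops(self) -> float:
        return self.total_cost / self.iops

    @property
    def dollars_per_tb(self) -> float:
        return self.total_cost / self.usable_tb


configs = [
    Config("existing array + CSA", total_cost=90_000, iops=150_000, usable_tb=60),
    Config("software-only server SAN", total_cost=80_000, iops=200_000, usable_tb=40),
]

for c in configs:
    print(f"{c.name:26} ${c.dollars_per_iops:.2f}/IOPS  ${c.dollars_per_tb:,.0f}/TB")
```

These invented inputs give $0.60/IOPS and $1,500/TB for the CSA option versus $0.40/IOPS and $2,000/TB for the server SAN, showing how the two metrics can rank the same pair of options in opposite orders.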
So why do the more complicated – and expensive – CSA products out there keep selling?
I don't have an answer to this one. Either I'm missing something pretty fundamental about the economics of this all – the possibility of which is why questions like this bug the hell out of me – or there are a bunch of companies out there making non-optimal purchasing decisions. I'd dearly love to know which it is. ®
* Because reads on hosts with the read cache enabled are served from the local SSD, more IOPS are left available at the centralised storage device for all hosts. I don't really want to get into the semantics of it in this article, but it's a real thing. If four out of eight hosts in a cluster have read cache, you will notice the other four speed up.
** Thankfully, not all of the write-caching CSAs are this badly designed. Some can resize the cache allocated to each VM on the fly, ensuring reconvergence is still possible with whatever flash remains available to the mesh.
*** OK, that's a small lie. Some CSAs turn a chunk of RAM into cache in addition to flash. If you tune your benchmarks just right, they'll consistently come out ahead of server SANs.