Big Data is getting too damn big - and nobody is helping to fix this
See that nettle? Time to pop your gardening gloves on, chaps
Storagebod As vendors race to be better, faster and to differentiate themselves in an already busy marketplace, the real needs of storage teams, and those of the storage consumer, can be left unmet. At times it is as if the various vendors are building dragsters, calling them family saloons and hoping that nobody notices. The problems that I wrote about when I started out blogging still seem mostly unsolved.
Storage management at scale is still problematic; it is still extremely hard to find a toolset that will allow a busy team to assess health, performance, supportability and capacity at a glance. Too many teams still use spreadsheets and manually maintained records to manage their storage.
Tools which allow end-to-end management of an infrastructure from rust to silicon and all parts in-between still don’t exist or if they do, they come with large price-tags which invariably do not have a real ROI or a realistic implementation strategy.
As we build more silos in the storage-infrastructure, getting a view of the whole estate is harder now than ever. Multi-vendor management tools are in general lacking in capability with many vendors using subtle changes to inflict damage on the competing management tools.
Data mobility across tiers where those tiers are spread across multiple vendors is hard; applications are generally not currently architected to encapsulate this functionality in their non-functional specifications. And many vendors don’t want you to be able to move data between their devices and competitors' ones - for obvious reasons.
But surely the most blinkered flash start-up must realise that this needs to be addressed; it is going to be an unusual company which will put all of its data onto flash.
Of course this is not just a problem for the start-ups but it could be a major barrier for adoption and is one of the hardest hurdles to overcome.
Although we have scale-out and scale-up solutions, scaling is a problem. Yes, we can scale to what appears to be almost limitless size these days but the process of scaling brings problems. Adding additional capacity is relatively simple; rebalancing performance to effectively use that capacity is not so easy. If you don’t rebalance, you risk hotspots and even under-utilisation.
It requires careful planning and timing even with tools; it means understanding the underlying performance characteristics and requirements of your applications. And with some of the newer architectures that are storing metadata and de-duping, this appears to be a challenge to vendors. Ask vendors why they are limited to a number of nodes; there will be sheepish shuffling of feet, and alternative methods of federating a number of arrays into one logical entity will quickly come into play.
And then mobility between arrays becomes an issue to be addressed.
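To make the rebalancing point concrete, here is a minimal sketch in Python. It assumes a hypothetical four-node cluster described only by used and total capacity in TB; the node names, figures and function names are invented for illustration. It looks at capacity alone, whereas a real rebalance would also have to weigh I/O load and application requirements, which is exactly the hard part.

```python
def utilisation(nodes):
    """Return each node's utilisation as a used/capacity fraction."""
    return {name: used / cap for name, (used, cap) in nodes.items()}


def rebalance(nodes):
    """Spread data so every node sits at the same utilisation.

    Capacity-only model (an assumption): ignores I/O hotspots entirely.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    target = total_used / total_cap
    return {name: (target * cap, cap) for name, (_, cap) in nodes.items()}


def data_moved(before, after):
    """Data that has to migrate: the sum of increases on receiving nodes."""
    return sum(max(0.0, after[n][0] - before[n][0]) for n in after)


# Hypothetical cluster: three busy nodes plus one freshly added, empty node.
# Values are (TB used, TB capacity).
cluster = {
    "node1": (80.0, 100.0),
    "node2": (90.0, 100.0),
    "node3": (70.0, 100.0),
    "node4": (0.0, 100.0),
}

balanced = rebalance(cluster)
print(utilisation(cluster))   # node4 sits idle while the others run hot
print(utilisation(balanced))  # every node at the same fraction
print(data_moved(cluster, balanced), "TB to migrate")
```

Adding node4 took one line; evening out the cluster means migrating a large slice of the estate while the applications are still running on it.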
As arrays get larger, more workloads get consolidated onto a single array - and without the ability to isolate workloads or guarantee performance, the risk of bad and noisy neighbours increases. Few vendors have yet grasped the nettle of QoS and still fewer developers actually understand what their performance characteristics and requirements are.
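One common way of keeping a noisy neighbour in check is a per-workload rate limit. The sketch below uses a token bucket, which is a generic technique and not a description of any particular vendor's QoS; the class name, parameters and limits are all assumptions made for illustration.

```python
class TokenBucket:
    """Cap a workload at a sustained IOPS rate with a small burst allowance."""

    def __init__(self, rate_iops, burst):
        self.rate = rate_iops      # tokens (I/Os) replenished per second
        self.capacity = burst      # maximum tokens that can accumulate
        self.tokens = float(burst)
        self.last = 0.0            # timestamp of the previous request

    def allow(self, now):
        """Return True if an I/O arriving at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical noisy workload limited to 2 IOPS with a burst of 2.
noisy = TokenBucket(rate_iops=2, burst=2)
decisions = [noisy.allow(0.0), noisy.allow(0.0), noisy.allow(0.0), noisy.allow(0.5)]
print(decisions)
```

The limiter is the easy bit. Picking the right rate for each workload requires someone to know what that workload actually needs, which is where most of us come unstuck.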
Despite all efforts to curtail this, we store ever larger amounts of data. We need an industry-wide initiative to look at how we can better curate and manage data. And yet if we solve the problems above, the growth issue will simply get worse ... as we reduce the friction and the management overhead, we’ll simply consume more and more.
Perhaps the vendors should be concentrating on making it harder and even more expensive to store data. It might be the only way to slow down the inexorable demand for ever more storage. Still, that’s not really in their interest.
Sometimes one does wonder why all these problems persist ... ®