Big data fitness plan: What's the deal with DX?
Shaping up for transformation
Over the last few months people have stopped saying “digitalization” or “digital transformation” and started abbreviating it to “DX.” The industry is full of abbreviations and ephemeral jargon, and the most irritating part of this latest addition, for those of us on the receiving end of the phrase “DX”, is that the people using it often don’t know what “DX” means.
Actually, it’s all about, as one definition that I’ve read puts it: “The reworking of the products, processes and strategies within an organization by leveraging current technologies”. Doing stuff with technology instead of by hand, in other words. And as far as technology goes, we’re talking of (as Bill Schmarzo puts it): “Electronic, scientific, data-driven, quantified, instrumented, measured, mathematic, calculated and/or automated [capabilities].”
Data-driven processing is a particular challenge for organisations that have been around for a while. If your company has only been around for a few years you may well have built the infrastructure – including the storage layer – from the ground up, which means it’s probably quite well organised.
Companies that have been around for 20 or 30 years, however, are generally not so lucky: given that SAN technologies such as Fibre Channel have only been around since the mid-1990s, data stores prior to this tended to be application- or department-specific and held in distributed silos of storage.
Even organisations that sit somewhere in the middle, age-wise, are likely to have some level of separation and disconnection in their data storage layer, which makes it tricky to work effectively in a DX world. And don’t think for a moment that there aren’t many of these: there are plenty of markets full of organisations that sweat their kit until it’s ready to keel over (the public sector’s a common example here), and loads that hang on to data so they can mine every last seam (financial services, research houses, and so on).
Why is this? Simple. First, you wouldn’t have all this data if, when you collected it, there wasn’t a purpose for it. Next, if all this data is useful then it’s likely that you can get more value by processing it together than you can by leaving it separate (that is: the whole will usually be more than the sum of the parts).
Data’s an asset, just like your kit
One idea that’s being increasingly recognised is that data is a business asset in itself – hence the rise in data-driven enterprises. After all, why rely on experience and instinct when you have terabytes of historical data that you can look to for facts?
True, past behaviour isn’t necessarily an indicator of anything in the future: but a lot of the time it is, so make the most of the facts you have to hand. I’ve been a convert since 15 years ago, when I consulted for a travel company that went to extraordinary lengths to use quantitative data in its marketing: when the manual and digital systems were run in parallel, the latter showed accuracy improvements of up to 43 per cent over the former.
Do I carry on with distributed data?
To use your data effectively, though, you need your systems to be able to access it. When it lives in silos, you could conceivably configure each of the silos to serve its storage over the network via NFS or CIFS but – depending on how you architect and implement things – this setup could be slow and unwieldy to manage. And yes, you could go a step further and implement a virtualisation layer that lets you present the remote drives as networked low-level devices via (say) iSCSI, but again it becomes a monster to manage.
Bite the migration bullet
If you have legacy silo storage, it’s really time to migrate to a more up-to-date centralised approach. Unless you have ridiculously antique equipment, the chances are that it’s able to connect to networked storage in some way – iSCSI, NFS or CIFS. Hence there should be no issue with shifting the data to a central SAN. And if you’re thinking: “Hang on, he said that NFS and CIFS are slow”: when they’re hosted on a legacy machine with an under-par spec they are, but on a high-speed SAN the rest of the modern high-speed kit will see decent performance.
On-prem or somewhere else?
Well, it’s up to you – because it’s so easy to run hybrid storage these days. And I get the distinct impression – which I’ll admit surprises me a little – that a big chunk of it will be in the cloud. For instance, IDC reckons that by the end of 2020, 75 per cent of organisations: “Will have core cloud API strategies as part of their DX architectures.”
What should you do? Well, whatever fits your DX intention, because alongside all the new technology – that is, idiotically fast Solid-State Disk (SSD) – the technologies that we’d consider “legacy” are all still available, just in modern form. Vendors of traditional spinning hard disk drives (HDD) continue to develop the mechanical technology, to keep up with the ever-faster drive connections that are in turn being developed to keep up with demand for faster data transfer.
Similarly, even though many of us have moved to Ethernet-based SANs, Fibre Channel is still getting faster over time. Need high-speed random access? Dig down the back of the sofa and invest in lumps of SSD. Need reasonably fast transfer but loads of hosts connected simultaneously to the same data? That’ll be SSD again. Fast transfers but more linear access? On-premises HDD may be the option for you. Big volume without the need to employ extra people to caress it or spend big money on private data centre cabinets and power? Cloud storage.
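If it helps to see those rules of thumb written down, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the Workload fields, the tier labels and the "unclassified" fallback are hypothetical names, and the rules are simply the questions above checked in the order they were asked. Treat it as a conversation starter rather than a sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of what an application needs from storage."""
    name: str
    random_access: bool = False   # lots of small, scattered reads and writes
    many_hosts: bool = False      # many hosts hitting the same data at once
    fast_transfer: bool = False   # needs reasonably high throughput
    bulk_low_touch: bool = False  # big volume, minimal hands-on management


def suggest_tier(w: Workload) -> str:
    """Apply the rules of thumb above, in the order they appear."""
    # High-speed random access, or fast transfer shared by many hosts: SSD.
    if w.random_access or (w.fast_transfer and w.many_hosts):
        return "ssd"
    # Fast but more linear access: on-premises spinning disk may do.
    if w.fast_transfer:
        return "on-prem-hdd"
    # Big volume without extra staff, cabinets or power: cloud.
    if w.bulk_low_touch:
        return "cloud"
    # Assumption: nothing matched, so flag it for a human to look at.
    return "unclassified"


if __name__ == "__main__":
    examples = [
        Workload("order-database", random_access=True),
        Workload("shared-analytics", fast_transfer=True, many_hosts=True),
        Workload("video-export", fast_transfer=True),
        Workload("archive", bulk_low_touch=True),
    ]
    for w in examples:
        print(f"{w.name}: {suggest_tier(w)}")
```

Real decisions will also hinge on cost, capacity and latency figures that a handful of yes/no flags can’t capture, so use the output as a first cut only.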
Burst into tiers ..
You should tier your storage so that each type is married to the applications that need it. As I mentioned earlier, the hardware that lets you present multiple lumps of infrastructure – both internal and cloud – via a unified management interface is now well established. With SSD in particular you can tune the Quality of Service of each presented volume. And again, there are wise people out there who think that the only way is up for hybrid setups: for example, Gartner expects that by 2020, cloud, hosting and traditional infrastructure services will come in more or less at par in terms of spending.
Do I need to dump the legacy kit?
For this one you have to be careful to consider how you define “legacy”. If you consider it to be the creaking, antique hardware with heavily outdated and underpowered specifications, which you’ve been trying to write off for years, the answer is yes – you’ll want to eliminate that because it’s just not accessible.
But no: if it’s networked storage, you shouldn’t chuck it out. If it’s still useful then continue to make use of it until that’s no longer the case – but do what you can to make it integrate as closely as possible with your other storage, and particularly try to use a single-pane-of-glass management platform to make it as easy as possible to monitor and tweak it as required.
To sum up, then: the priority is to eliminate distributed silos of data. Bring everything together – by all means present it as different volumes but serve them through a single presentation mechanism – so that your processing engines can work on any or all of the data at once and get the benefits of aggregation.
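To make the “any or all of the data at once” point concrete, here is a small, hedged sketch. It assumes the volumes are presented as subdirectories under one root and that the data happens to sit in CSV files with a shared key column. The function name, the column names and the directory layout are all made up for illustration and use only Python’s standard library.

```python
import csv
from collections import Counter
from pathlib import Path


def total_by_key(root: Path, key: str, value: str) -> Counter:
    """Sum the `value` column per `key` across every CSV file under `root`.

    Each subdirectory of `root` stands in for one presented volume.
    """
    totals: Counter = Counter()
    for csv_path in sorted(root.rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Skip rows (or whole files) that lack either column.
                if row.get(key) is None or row.get(value) is None:
                    continue
                totals[row[key]] += float(row[value])
    return totals
```

Called as, say, total_by_key(Path("/mnt/storage"), "customer", "amount"), with that path and those column names again being hypothetical, one pass covers every volume beneath the root. With the data in separate silos, the same question would need one job per silo plus a manual merge at the end.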
That done, the next priority is how you present whatever storage you have – to make it manageable and accessible through something as close to a single interface as you can. If you achieve that, you give yourself the flexibility to aggregate multiple types and generations of storage and to replace back-end kit and shuffle data around to keep up with requirements.
Finally, though, remember that data is king. I’ll never forget being told, back in my university days in a Data Structures lecture, that you should start any development task by figuring out the data structures: once you have that, everything else drops into place pretty easily. So, if you haven’t done so already, take the mental step of thinking “data driven” instead of being driven by the storage architecture.
Understand what data you have, what form it’s in (or could be made to be in), what you want to do with it, and how easy a life you want whilst managing and wrangling it: the architecture will follow.