How to be certain about your data in an uncertain future
Be a speedboat, not an oil tanker. And don’t let the CEO read Forbes
Data-aware storage
Some companies are taking the idea of abstracting services from infrastructure to the next level, with data-aware storage. This makes the storage layer more aware of the nature of the data it is holding, providing administrators with still more flexibility. Last summer, analyst firm Taneja Group held a webinar about data-aware storage and highlighted three characteristics. First, it captures attributes of the data stored on the device, looking for contextual patterns to identify sensitive data, for example, or singling out files that meet certain parameters. The storage layer might know that a file is low-latency video, and understand where best to store it based on that information.
Data-aware storage must also have real-time analytics, so that it can provide useful information about how the data is being used, ideally by particular applications. If it senses that an application has increased its IOPS significantly, it might be able to use this information, along with its understanding of the data, to solve the problem early on. Finally, it must provide services to help manage that data, such as balancing quality of service across different workloads, and it should make these available programmatically via APIs, which again highlights the ‘infrastructure as software’ idea.
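To make the real-time analytics idea concrete, here is a minimal sketch in Python of how a storage layer might notice that an application's IOPS has jumped. It is not any vendor's implementation: the `IopsMonitor` class, the rolling-window baseline, the 2x threshold and the `billing-db` application name are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class IopsMonitor:
    """Tracks recent IOPS samples per application and flags sudden increases."""

    def __init__(self, window=5, spike_factor=2.0):
        self.window = window              # number of past samples forming the baseline
        self.spike_factor = spike_factor  # hypothetical threshold, e.g. 2x the baseline
        self.samples = {}                 # application name -> recent samples

    def record(self, app, iops):
        """Store a sample; return True if it is a spike against the baseline."""
        history = self.samples.setdefault(app, deque(maxlen=self.window))
        # Only judge a sample once a full window of history exists.
        is_spike = (
            len(history) == self.window
            and iops > self.spike_factor * mean(history)
        )
        history.append(iops)
        return is_spike


# Invented readings for a hypothetical application.
monitor = IopsMonitor(window=3, spike_factor=2.0)
readings = [1000, 1100, 900, 1050, 4000]
flags = [monitor.record("billing-db", r) for r in readings]
print(flags)
```

Only the last reading is flagged, because it is more than double the average of the three samples before it. A real system would combine a signal like this with what it knows about the data before acting.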
El Reg has discussed some companies in the data-aware storage layer space before, such as Primary Data, which launched its DataSphere product last year. The company uses a policy engine to manage data according to objectives set by the business (it calls them ‘service level objectives’). This theoretically makes it possible for IT admins to encode some of those new business requirements as they emerge and have the storage layer do all the work behind the scenes.
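The policy-engine approach can be sketched roughly as follows. This is not DataSphere's API, which the article does not describe: the `Tier` and `Objective` classes, the `place` function and all of the latency and cost figures are hypothetical, chosen only to show an objective being turned into a placement decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ms: float    # typical access latency (invented figure)
    cost_per_gb: float   # relative cost (invented figure)


@dataclass(frozen=True)
class Objective:
    max_latency_ms: float  # the business requirement an admin encodes


# Hypothetical storage tiers, loosely following the types named in the article.
TIERS = [
    Tier("direct-attached flash", 0.5, 1.00),
    Tier("fibre-connected SAN", 2.0, 0.40),
    Tier("NAS", 10.0, 0.10),
]


def place(objective, tiers=TIERS):
    """Return the cheapest tier that satisfies the objective, or None."""
    candidates = [t for t in tiers if t.latency_ms <= objective.max_latency_ms]
    return min(candidates, key=lambda t: t.cost_per_gb, default=None)


video = place(Objective(max_latency_ms=1.0))
archive = place(Objective(max_latency_ms=50.0))
print(video.name, "|", archive.name)
```

The admin states what the data needs; the engine works out where it goes. Changing the objective later changes the placement without anyone touching the underlying arrays.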
This flavour of data-aware storage is also being touted as a silo-busting technology because it enables data to be moved across different storage types, ranging from NAS to fibre-connected SANs and direct-attached storage.
Consider data formats
Making storage infrastructure aware of the data it’s storing enables IT administrators to set and change policies over time, and can also make migration easier. It’s a relatively new concept in data architectures, though, and given that companies are still struggling with marketing terms such as software-defined storage, adoption will be slow. In the meantime, there’s another layer of your data architecture to consider – the application storage format. For years, transactional applications have relied mostly on relational formats, but non-relational formats are on the rise.
NoSQL databases represent a way to store and reference data in non-relational formats. In the context of protecting your data architecture against future requirements, their primary benefit is flexibility.
Relational databases are rigid: their schemas are collections of tables whose columns must be decided up front. That makes it difficult to change data structures on the fly, said Bob Wiederhold, CEO of NoSQL firm Couchbase. Schemas may have to be rebuilt entirely when changes need to be made.
“In many companies, that can take six months,” he said. “With NoSQL, it’s schema-less.”
Instead of a pre-defined schema, NoSQL data formats allow inferred schemas, which simply acknowledge new attributes as they’re added to records, he explained. You want to store the name of a customer’s parakeet? Just start adding that key-value pair to records as you create them. No redesign of tables is necessary, because there are no tables. Various NoSQL approaches do this in different ways. Couchbase stores records in nested documents, typically using JSON to represent different fields and values.
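As a rough illustration of the parakeet example, the sketch below uses plain Python dictionaries standing in for JSON documents. It does not use Couchbase's SDK; the customer records and the `inferred_schema` helper are hypothetical, and real document stores infer and index schemas in their own ways.

```python
import json

# Existing customer documents: nested, and not all with the same fields.
customers = [
    {"id": 1, "name": "Ada", "address": {"city": "London"}},
    {"id": 2, "name": "Grace"},
]

# A new requirement arrives: just start writing the extra key-value pair.
customers.append({"id": 3, "name": "Linus", "parakeet_name": "Polly"})


def inferred_schema(documents):
    """Map each top-level key seen to the sorted list of value type names."""
    schema = {}
    for doc in documents:
        for key, value in doc.items():
            schema.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(types) for key, types in schema.items()}


schema = inferred_schema(customers)
print(json.dumps(schema, indent=2))
```

The older records are untouched and simply lack the new field, so any code reading `parakeet_name` has to cope with its absence. That is the trade-off for skipping the table redesign.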
Define first, join later
What this means is that companies don’t have to be as structured about their data models, said Jon Cooke, head of data science at GFT, which designs data architectures for financial services clients.
“What we’re helping banks do is create flexible data platforms that can respond to new events in the marketplace without having to re-architect the entire data model,” he said. “What you should be doing is defining your data elements, your business entities, in isolation without forming some sort of spaghetti diagram. Join them when you actually ask the business questions.”
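Cooke's "define first, join later" advice might look something like this in miniature. The entities, field names and the `total_by_segment` question are invented for illustration and are not GFT's design: customers and trades are defined independently, and the relationship is resolved only when a business question is asked.

```python
# Two business entities, each defined in isolation.
customers = {
    "c1": {"name": "Ada", "segment": "retail"},
    "c2": {"name": "Grace", "segment": "corporate"},
}

trades = [
    {"customer_id": "c1", "amount": 250.0},
    {"customer_id": "c2", "amount": 9000.0},
    {"customer_id": "c1", "amount": 100.0},
]


def total_by_segment(customers, trades):
    """Join trades to customers at query time and sum amounts per segment."""
    totals = {}
    for trade in trades:
        customer = customers.get(trade["customer_id"])
        if customer is None:
            continue  # skip trades with no matching customer record
        segment = customer["segment"]
        totals[segment] = totals.get(segment, 0.0) + trade["amount"]
    return totals


totals = total_by_segment(customers, trades)
print(totals)
```

A different question, such as trades per customer name, would need a different join but no change to how either entity is stored.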
NoSQL may not be right for everyone. Typically, it lacks the ACID transaction capabilities necessary for drawing together lots of separate systems in a single transaction.
“Increasingly we’re moving more towards that but today you wouldn’t run your financial transactions per se on this platform,” said Sean O’Dowd, global financial services director at MapR, which provides Hadoop solutions. But then, there are lots of things that don’t need to be ACID-compliant, in finance and everywhere else.
If your business requirements and demands on your data are rapidly evolving, or if your data volumes are rapidly growing, then it may be worth looking at NoSQL as one in a range of future-proofing tools for your data architecture.
Agility rests on the ability to rapidly reconfigure the storage and structure of data. To properly set the scene for that, companies should be exploring not just virtualization but also programmatic allocation of resources. Even with all that in place, you may not be able to pivot instantly to cope with rapidly evolving business conditions. Still, you’ll be moving in the right direction. ®