The intelligent data storage imperative

Information Lifecycle Management


Analysis Storing data intelligently has suddenly become a major imperative for companies, and how you handle replication and redundancy is becoming a critical factor. The reason that Information Lifecycle Management has become one of the focuses of the IT industry is that the amount of data we store is growing at an alarming pace. The world produces about five exabytes of new data per year (that's five million terabytes) and the rate of growth is about 30 per cent - at least that's what Hal Varian's researchers at the University of California, Berkeley tell us. Some of this is not data that many organizations store much of (video and audio), but a great deal of it is - website content and email being the main culprits.

To add to this we have the simple fact that regulation and compliance in the US are starting to demand that companies keep audit trails of changes to data (or at least to important data). Audit trailing is likely to multiply the amount of transactional data we store by a factor of two or more, and regulation and compliance in Europe are likely to follow suit with the same or similar demands. So the amount of data we store is going to continue to grow, and just storing it in an intelligent way is going to be problematic. There is, however, a further issue to consider: a fair amount of data is simply out of control. Hal Varian's team estimates that 80 per cent of stored data is replicated, or redundant, or both. This means that there are about five copies (on average) of every single chunk of data. We can readily acknowledge that there should be two copies (the real one and a back-up), and maybe an average of three, because data often needs to be distributed for further usage or for the sake of performance. But two of the average five copies are probably completely redundant.
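
To make the arithmetic concrete, here is a minimal sketch (in Python, not from the original column) that derives the five-copies figure and the roughly 40 per cent of reclaimable capacity from the 80 per cent estimate; the three "justified" copies are an assumption taken from the reasoning above.

```python
# Minimal sketch: derive average copies per chunk and reclaimable capacity
# from the Berkeley-style estimate that 80 per cent of stored data is a copy.
duplicated_share = 0.80                 # share of stored bytes that duplicate something else
unique_share = 1.0 - duplicated_share   # only 20 per cent of bytes are "originals"

# If 1 in 5 bytes is an original, each original exists in five copies on average.
avg_copies = 1.0 / unique_share         # -> 5.0

justified_copies = 3                    # primary + back-up + one distributed copy (assumption)
redundant_copies = avg_copies - justified_copies

reclaimable_share = redundant_copies / avg_copies
print(f"average copies per chunk: {avg_copies:.1f}")        # 5.0
print(f"reclaimable capacity:     {reclaimable_share:.0%}")  # 40%
```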

It isn't surprising really. It is common within organizations for data to be left lying on a disk somewhere because no one dares to delete it, even though everyone is reasonably convinced that it probably isn't required. Unfortunately there is usually no accurate record of why the data exists, although there is usually some way of knowing when it was created and when it was last accessed. Let's add another fact to the mix: 90 per cent of data on disk is seldom or never accessed again after 90 days; indeed, a good deal of it is never accessed after a week. The 90 per cent figure applies to all data, but data held in databases gets more usage for longer than data held in files and, particularly, data held in email systems. To summarize the situation: a good deal of data is redundant (around 40 per cent) and a good deal of data doesn't need to be on-line. So the intelligent archiving of data starts to become an imperative, because the cost of holding data in an archive is lower than the cost of holding it on disk. Unfortunately it isn't quite that simple, because the data that is accessed after 90 days is important data, and keeping it available for quick access matters.
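
A minimal sketch of what intelligent archiving means in practice, assuming last-access time is the deciding signal as described above (on filesystems mounted with noatime that signal is unreliable); the path and threshold are illustrative:

```python
# Sketch: flag files untouched for 90 days as archive candidates.
import os
import time

ARCHIVE_AFTER_DAYS = 90
ROOT = "/data"                       # hypothetical data volume

cutoff = time.time() - ARCHIVE_AFTER_DAYS * 24 * 3600
candidates = []

for dirpath, _dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        try:
            st = os.stat(path)
        except OSError:
            continue                 # file vanished or is unreadable
        if st.st_atime < cutoff:     # not accessed in the last 90 days
            candidates.append((path, st.st_size))

total_gb = sum(size for _path, size in candidates) / 1e9
print(f"{len(candidates)} files ({total_gb:.1f} GB) could move to the archive tier")
```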

Proactively analyzing the usage of data, so that its future usage pattern can be estimated accurately, is therefore important. To complicate the situation, there are different options for data storage. Physically there are: solid state disk, fast disk, capacity disk, optical disk, near-line tape, far-line tape and non-digital means of storage - the options becoming less expensive as the speed of retrieval falls. But unless you know how quickly data needs to be made available, it is not possible to organize the sensible flow of data from instantly available to an archived state. Back-ups are a natural part of this cycle too: backed-up data also migrates and needs to be stored for a specific speed of recovery. To further complicate the situation, the price of the technology is constantly changing. It is moving agreeably downwards, but the cost equation is still complex.
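
To illustrate that speed-of-retrieval decision, here is a small sketch that picks the cheapest storage tier able to meet a required retrieval time; the latency limits and per-gigabyte prices are invented for illustration and, as noted above, real figures change constantly.

```python
# Sketch: choose the cheapest tier that still meets the required retrieval time.
# (tier name, worst-case retrieval latency in seconds, assumed cost per GB per month in $)
TIERS = [
    ("solid state disk", 0.001,  1.50),
    ("fast disk",        0.010,  0.60),
    ("capacity disk",    0.050,  0.25),
    ("optical disk",     30.0,   0.10),
    ("near-line tape",   120.0,  0.04),
    ("far-line tape",    3600.0, 0.01),
]

def cheapest_tier(required_latency_s: float) -> str:
    """Return the least expensive tier that can still return the data in time."""
    eligible = [t for t in TIERS if t[1] <= required_latency_s]
    if not eligible:
        return TIERS[0][0]                       # nothing is fast enough: use the fastest
    return min(eligible, key=lambda t: t[2])[0]  # cheapest of the eligible tiers

print(cheapest_tier(0.02))    # -> fast disk
print(cheapest_tier(600.0))   # -> near-line tape
```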

The market for digital tape is gradually being eroded by disk, as disk is a far more reliable medium and its cost per gigabyte is in steep decline. But this needs to be balanced against the fact that most organizations store ever more data - indeed, data storage is usually the most expensive component of data center costs, despite the falling prices. The complexity of the situation suggests that the more automated the solution, the more practical it will be. Indeed, the ideal is to move towards a solution which monitors data growth and is able to predict what type of extra resource is required and when, optimizing a cost equation in the process. As I noted in the previous column on this topic, it all depends on an analysis of the data resource and the setting of policy in line with what is known. As it happens, no vendor has a complete out-of-the-box solution yet, although the likes of EMC, IBM, Hitachi, StorageTek and the rest are all moving in the direction of getting smart about the problem and treating storage as a "virtual" resource. Indeed, if you think in terms of Information Lifecycle Management it is easy to understand EMC's acquisition of Legato and Documentum.
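
The kind of prediction such an automated solution would make can be sketched in a few lines: given measured usage, installed capacity and a growth rate, estimate when the capacity runs out. The figures below are invented for illustration; the 30 per cent growth rate is the Berkeley estimate quoted earlier.

```python
# Sketch: predict when installed capacity will be exhausted at a given growth rate.
import math

used_tb = 120.0           # current usage (illustrative)
capacity_tb = 200.0       # installed capacity (illustrative)
annual_growth = 0.30      # 30 per cent a year

# used * (1 + g) ** years = capacity  =>  years = ln(capacity / used) / ln(1 + g)
years_until_full = math.log(capacity_tb / used_tb) / math.log(1.0 + annual_growth)
print(f"capacity exhausted in roughly {years_until_full * 12:.0f} months")  # ~23 months
```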

In my view, the Information Lifecycle Management problem will not be resolved by the storage vendors alone, but will ultimately involve the controlled versioning of all data and the attaching of a much richer set of metadata (using XML) to the data itself - so that data of any kind knows who created it, when, where, how and why, and also, perhaps, carries some indication of what its value actually is. This is really the domain of the database, although the database is still very far from being the natural store for all data. If you are getting the idea that you'll be hearing about Information Lifecycle Management for many years to come, then you're probably right. We're only at the beginning of its life cycle.
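
As a sketch of what that richer metadata might look like, here is a small example that wraps a piece of data in an XML record saying who created it, when, where, how and why, plus a rough statement of value; the element names and values are invented for illustration and are not any vendor's schema.

```python
# Sketch: an XML metadata record that travels with a piece of data.
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def describe(path: str, creator: str, purpose: str, value: str) -> str:
    record = ET.Element("dataRecord", attrib={"path": path})
    ET.SubElement(record, "createdBy").text = creator
    ET.SubElement(record, "createdAt").text = datetime.now(timezone.utc).isoformat()
    ET.SubElement(record, "createdWhere").text = "finance-db-01"    # illustrative host
    ET.SubElement(record, "createdHow").text = "nightly export"     # illustrative process
    ET.SubElement(record, "createdWhy").text = purpose
    ET.SubElement(record, "businessValue").text = value
    return ET.tostring(record, encoding="unicode")

print(describe("/data/q3-results.csv", "j.smith", "quarterly reporting", "high"))
```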

© IT-analysis.com

