Tech is the biggest problem facing archiving

Mountains of unreadable obsolete magnetic tapes!


Blocks and Files: Technology is the biggest problem facing archiving. Archives keep growing, and the amount of data to be kept threatens to overflow an archive installation. So, let's use LTO-6 tapes instead of LTO-5 ones because they hold more data in the same physical space: 2.5TB uncompressed against 1.5TB, or roughly twice as much once compression is counted.

That's logical, but there is an unwanted side effect: LTO-5 drives can read LTO-3, LTO-4 and LTO-5 tapes, whereas LTO-6 drives can read LTO-4, LTO-5 and LTO-6 tapes but not LTO-3. All the LTO-3 tape contents have to be migrated, ideally straight up to LTO-6 to minimise future migrations, because when LTO-7 comes along its drives won't be able to read LTO-4 tapes and all their content will have to be migrated in turn, and so on ad nauseam.
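The read-compatibility pattern described above can be sketched in a few lines of Python. This is only an illustration: the "two generations back" rule is generalised from the LTO-5, LTO-6 and LTO-7 examples in the text and is not guaranteed for every generation, and the function names are made up for this sketch.

```python
def readable_generations(drive_gen, read_back=2):
    """Generations a drive can read, assuming it reads `read_back`
    generations behind its own (an assumption drawn from the
    LTO-5/6/7 examples in the article)."""
    oldest = max(1, drive_gen - read_back)
    return list(range(oldest, drive_gen + 1))


def must_migrate(tape_gens, new_drive_gen, read_back=2):
    """Tape generations in the archive that the new drive cannot read."""
    readable = set(readable_generations(new_drive_gen, read_back))
    return sorted(g for g in set(tape_gens) if g not in readable)


for drive in (5, 6, 7):
    print(f"LTO-{drive} drive reads:", readable_generations(drive))

# An archive holding LTO-3, LTO-4 and LTO-5 tapes moves to LTO-6 drives.
print("Stranded by LTO-6 drives:", must_migrate([3, 4, 5], 6))
```

Running the same check with LTO-7 drives against an archive of LTO-4, LTO-5 and LTO-6 tapes strands the LTO-4 tapes, which is the treadmill the article describes.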

If their content isn't migrated, we can surely expect LTO-3 drive manufacture to cease, shortly followed by the withering away of LTO-3 drive support, break-and-fix skills and spare parts availability, followed in the fullness of time by LTO-5 support, and so it goes. Eventually it will be impossible to read an old tape format.

In future, a significant part of an archive tape library's job will almost inevitably be the automated migration of earlier tape formats to the newest ones to preserve content readability.

It would be great if the ability to read and write tapes could be divorced from the actual tape media. Note this problem doesn't exist so much with disk drives, because disk and drive are a unity. As long as the interface electronics and software exist (Fibre Channel, SAS or SATA) and as long as there is software that can interpret the data format on the drives … it's a different flavour of the same problem.

Newer versions of Word cannot always read documents produced with much older versions of Word. It also seems inevitable that, before long, some archive software will include plug-ins for old application and system software versions so that old data can be restored from an archive in human-readable format. Of course, there are only two ways to present data to the Mark 1 eyeball: as numbers, text and diagrams on a display of some sort, or as printed marks on paper.

Vatican Library: five centuries of stored paper, still readable by the Mark 1 eyeball (screenshot from an EMC video)

The screen version is effectively an analogue of the paper version, and it is paper that is the enduring archive medium. Stick that in front of the Mark 1 eyeball and the jolly old intra-cranial computational unit will do its job.

Advancing storage technology, including hardware, system software and application software, gets in the way of this. It would be better if the digital archive medium put as few steps as possible between the storage medium and what the eyeball needs to see, while still having the advantages of a digital medium's storage density.

That would tend to reduce the archive's exposure to the side effects of technology advances.

Royal Dutch Petroleum dock in the former East Indies (now Indonesia)

100-year archive

The problem is actually a very large one. Take the second-biggest company in the world in revenue terms: Shell, properly known as Royal Dutch Shell, which came into being 106 years ago. It has, in effect, a 100+ year archive consisting almost entirely (bar the last few years) of paper documents.

Let's imagine a 100-year tape archive. How would that work?

Its tape format would advance a generation a little more often than every three years. LTO-1 was announced in 2000 with a 100GB capacity. Now, 13 years later, we have LTO-6 with 2.5TB capacity: that's six format generations, or five transitions, in 13 years. Even delaying the archive's format transitions to every five years (instead of the two to three years that recent history shows us) would mean 20 tape format transitions in a century.
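The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted in the text (LTO-1 in 2000, LTO-6 thirteen years later, a 100-year archive); the variable and function names are illustrative.

```python
import math

first_year, latest_year = 2000, 2013
generations = 6                      # LTO-1 through LTO-6
transitions = generations - 1        # five steps between six generations

years_per_transition = (latest_year - first_year) / transitions


def transitions_in(span_years, interval_years):
    """Whole format transitions that fit into a span of years."""
    return math.floor(span_years / interval_years)


print(f"Historical pace: one transition every {years_per_transition:.1f} years")
print("Transitions in a century at that pace:",
      transitions_in(100, years_per_transition))
print("Transitions in a century at a relaxed five-year pace:",
      transitions_in(100, 5))
```

At the historical pace the count comes to 38 transitions in a century; the relaxed five-year pace gives the 20 quoted above.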

As the archive's capacity mounted up, the bulk of its work would increasingly consist of migrating the contents of old tapes to new ones. It would devote an ever smaller share of its resources to answering archive users' data access requests. The bulk of the cost of the archive would be spent internally, having it chase its own data migration tail, and its cost per user access would skyrocket.
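A toy model shows how quickly the migration work outgrows the ingest. It rests on loudly simplifying assumptions that are not in the article: data arrives at a constant rate, nothing is ever deleted, and the whole archive is recopied at every format transition (in practice a site might skip a generation where read compatibility allows). Capacity units are arbitrary.

```python
def migration_load(years=100, interval=5, ingest_per_year=1.0):
    """Return (total_ingested, total_migrated) in arbitrary capacity units.

    Assumptions (hypothetical, for illustration only): constant ingest,
    no deletion, and a full recopy of the archive at each transition.
    """
    stored = 0.0
    migrated = 0.0
    for year in range(1, years + 1):
        stored += ingest_per_year
        if year % interval == 0:
            migrated += stored   # entire archive rewritten to the new format
    return stored, migrated


ingested, migrated = migration_load()
print(f"Ingested over a century: {ingested:.0f} units")
print(f"Rewritten by migrations: {migrated:.0f} units")
print(f"Migration writes per unit ingested: {migrated / ingested:.1f}")
```

Under these assumptions the archive writes 1,050 units in migrations to hold 100 units of data, about ten times the original ingest, and the ratio keeps climbing the longer the archive lives.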

We are not even thinking yet about how Word 2113 would be able to read a Word 2013 format document; that sort of problem would probably have to be dealt with by a constant, ongoing migration of content formats as well.

In a word, this is nonsense.

Unless we reach a stage where archival technology becomes as stable as paper and printing have been for decades, centuries even, we cannot unquestioningly keep all the data we digitally collect. The oldest, least-wanted data will have to be let go. Unless there is a clear need to keep it, some kind of digital filtering mechanism will have to be used to find the least-wanted data and delete it.

The archive will have to be trawled by digital spider-bots: data killers, looking for useless data and destroying it to make space for wanted data.
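What such a data killer might look like, in miniature, is sketched below. Everything here is hypothetical: the article does not say how "useless" would be judged, so this sketch assumes a record is a deletion candidate when nobody has accessed it for a set number of years and no one has flagged a clear need to keep it. The `Record` fields, the `must_keep` flag and the 25-year threshold are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    name: str
    last_accessed: date
    must_keep: bool = False   # hypothetical flag: a clear need to retain


def select_for_deletion(records, today, max_idle_years=25):
    """Return records idle for longer than the threshold and not flagged
    as must-keep. The idle-time criterion is an assumption of this sketch."""
    cutoff_days = max_idle_years * 365
    return [
        r for r in records
        if not r.must_keep and (today - r.last_accessed).days > cutoff_days
    ]


archive = [
    Record("1980 survey notes", date(1980, 6, 1)),
    Record("2010 contract", date(2010, 3, 15)),
    Record("1970 title deed", date(1970, 1, 1), must_keep=True),
]

for record in select_for_deletion(archive, today=date(2013, 1, 1)):
    print("Delete:", record.name)
```

Only the 1980 survey notes are selected here: the contract is too recent and the title deed is protected by its flag.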

Somebody could make a business out of taking this old data and storing it in a kind of digital deep-freeze for potential re-activation. Maybe this could be on the Moon, in a nuclear-powered mega-flash-vault, with plenty of space to expand; there'll always be another crater ... but this is science fiction.

The real moral of this tale is that virtually no data is needed for ever. Big data bigots' mad claims notwithstanding, digital archives will have to be regularly cleared. Physical space runs out; digital space runs out; formats change; applications change; and preserving access to older and older data will become crushingly expensive.

Technological change might come up with a solution to this problem, but it's a problem created by that very process of technological change. Beware what you wish for. ®
