Tech is the biggest problem facing archiving

Mountains of unreadable obsolete magnetic tapes!


Blocks and Files

Technology is the biggest problem facing archiving. The amount of data to be kept grows ever bigger and threatens to overflow an archive installation. So, let's use LTO-6 tapes instead of LTO-5 ones because they hold about two-thirds more uncompressed data (2.5TB versus 1.5TB) in the same physical space.

That's logical but there is an unwanted side effect: LTO-5 drives can read LTO-3, LTO-4 and LTO-5 tapes. LTO-6 drives can read LTOs 4, 5 and 6 but not 3. So all the LTO-3 tape contents have to be migrated, and they may as well go straight up to LTO-6 to minimise future migrations, because when LTO-7 comes along its drives won't be able to read LTO-4 tapes and all their content will have to be migrated too, and so on, ad nauseam.
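The compatibility pattern above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the rule that a drive reads its own generation plus the two before it is an assumption generalised from the LTO-5 and LTO-6 cases in the text.

```python
from __future__ import annotations

# Assumption: a drive reads its own generation and the two before it,
# as the LTO-5 and LTO-6 examples in the text suggest.
READ_BACK = 2


def readable_generations(drive_gen: int) -> list[int]:
    """Return the tape generations a drive of generation drive_gen can read."""
    oldest = max(1, drive_gen - READ_BACK)
    return list(range(oldest, drive_gen + 1))


def must_migrate(held: set[int], new_drive_gen: int) -> list[int]:
    """Return the held tape generations a new drive generation cannot read."""
    readable = set(readable_generations(new_drive_gen))
    return sorted(g for g in held if g not in readable)


if __name__ == "__main__":
    print(readable_generations(5))      # [3, 4, 5]
    print(readable_generations(6))      # [4, 5, 6]
    print(must_migrate({3, 4, 5}, 6))   # [3]
    print(must_migrate({4, 5, 6}, 7))   # [4]
```

Every new drive generation strands the oldest tape generation still in the library, which is the treadmill the rest of this piece is about.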

If their content isn't migrated then we can surely expect LTO-3 drive manufacture to cease, shortly followed by the withering away of LTO-3 drive support, break-and-fix skills and spare parts availability, followed in the fullness of time by LTO-5 support etc., and so it goes. Eventually it will be impossible to read an old tape format.

A significant aspect of archive tape library functionality in the future will almost inevitably need to be the automated migration of earlier tape formats to the newest ones to preserve content readability.

It would be great if the ability to read and write tapes could be divorced from the actual tape media. Note this problem doesn't exist so much with disk drives, because disk and drive are a unity. As long as the interface electronics and software exist (Fibre Channel, SAS or SATA) and as long as there is software that can interpret the data format on the drives … it's a different flavour of the same problem.

Newer versions of Word cannot always read documents produced with much older versions of Word. It also seems inevitable that, before long, some archive software will include old application and system software version plug-ins so that old data can be restored from an archive in human-readable format. Of course, there are only two ways to present data to the Mk 1 eyeball: as numbers, text and diagrams on a display of some sort or as printed marks on paper.


Vatican Library; five centuries of stored paper, still readable by the Mark 1 Eyeball (screenshot)

The screen version is effectively an analogue of the paper version, and it is paper that is the enduring archive medium. Stick that in front of the Mark 1 eyeball and the jolly old intracranial computational unit will do its job.

Advancing storage technology, including hardware, system software and application software, gets in the way of this. It would be better if the digital archive medium put as few steps as possible between the actual storage medium and what the eyeball needs to see, while still having the advantages of a digital medium's storage density.

That would then tend to reduce the side-effect exposure to technology advances.


Royal Dutch Petroleum dock in the former East Indies (now Indonesia)

100 year archive

The problem is actually a very large one. Take the second-biggest company in the world in revenue terms: Shell, properly known as Royal Dutch Shell, which came into being 106 years ago. It has, in effect, a 100+ year archive consisting almost entirely (bar the last few years) of paper documents.

Let's imagine a 100-year tape archive. How would that work?

A little over every two years its tape format would advance a generation. LTO-1 was announced in 2000 with a 100GB capacity. Now, 13 years later, we have LTO-6 with 2.5TB capacity; that's six format generations over 13 years. Even if the archive delayed its format transitions to one every five years (instead of one every two years or so, as recent history shows us), that would still mean 20 tape format transitions in a century.
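The arithmetic is easy to check. Here is a small Python sketch using only the figures quoted above (six generations in 13 years, a 100-year archive, a five-year delayed cadence); the function names are invented for the illustration, and the assumption that the historical cadence would hold for a century is exactly that, an assumption.

```python
def years_per_generation(generations: int, years: float) -> float:
    """Average number of years between tape format generations."""
    return years / generations


def transitions(archive_years: float, interval_years: float) -> int:
    """Whole number of format transitions over the life of the archive."""
    return int(archive_years // interval_years)


if __name__ == "__main__":
    cadence = years_per_generation(6, 13)   # LTO-1 (2000) to LTO-6 (2013)
    print(f"{cadence:.2f}")                 # 2.17 years per generation
    print(transitions(100, 5))              # 20 at the delayed five-year pace
    print(transitions(100, cadence))        # 46 if the historical pace held
```

So the 20 transitions quoted above is the generous case; keeping pace with the format as it has actually evolved would mean more than twice as many.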

As the archive capacity mounted up, the bulk of the tape archive's work would increasingly consist of migrating the contents of old tapes to new ones. It would devote an ever smaller share of its resources to answering archive users' data access requests. The bulk of the cost of the archive would be spent internally, having it chase its own data migration tail, and its cost per user access would skyrocket.

We are not even thinking yet about how Word 2113 would be able to read a Word 2013 format document; that sort of problem would have to be dealt with possibly by a constant ongoing content format migration as well.

In a word, this is nonsense.

Unless we reach a stage where archival technology becomes as stable as paper and printing have been for decades, centuries even, we cannot unquestioningly keep all the data we digitally collect. The oldest, least-wanted data will have to be let go. Unless there is a clear need to keep it, some kind of digital filtering mechanism will have to be used to find the least-wanted data and delete it.

The archive will have to be trawled by digital spider-bots: data killers, looking for useless data and destroying it to make space for wanted data.

Somebody could make a business out of taking this old data and storing it in a kind of digital deep-freeze for potential re-activation. Maybe this could be on the Moon, in a nuclear-powered mega-flash-vault, with plenty of space to expand; there'll always be another crater ... but this is science fiction.

The real moral of this tale is that virtually no data is needed for ever. Big data bigots' mad claims notwithstanding, digital archives will have to be regularly cleared. Physical space runs out; digital space runs out; formats change; applications change; and preserving access to older and older data will become crushingly expensive.

Technological change might come up with a solution to this problem, but it's a problem created by that very process of technological change. Beware what you wish for. ®
