Tech is the biggest problem facing archiving

Mountains of unreadable obsolete magnetic tapes!

Blocks and Files

Technology is the biggest problem facing archiving. The amount of data to be kept grows ever bigger and threatens to overflow an archive installation. So, let's use LTO-6 tapes instead of LTO-5 ones because they hold much more data (2.5TB native against 1.5TB) in the same physical space.

That's logical but there is an unwanted side effect: LTO-5 drives can read LTO-3, LTO-4 and LTO-5 tapes, whereas LTO-6 drives can read LTO-4, LTO-5 and LTO-6 but not LTO-3. All the LTO-3 tape contents have to be migrated, and they may as well go straight up to LTO-6 to minimise future migrations, because when LTO-7 comes along its drives won't be able to read LTO-4 tapes and all their content will have to be migrated, etc., ad nauseam.
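The read-compatibility pattern described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the "own generation plus two back" rule is assumed from the LTO-5 and LTO-6 cases in the text. It does not necessarily hold for every later generation.

```python
def readable_generations(drive_gen, read_back=2):
    """Generations a drive can read, ASSUMING it reads its own
    generation plus `read_back` earlier ones (the pattern the
    article gives for LTO-5 and LTO-6)."""
    oldest = max(1, drive_gen - read_back)
    return list(range(oldest, drive_gen + 1))


def stranded_generations(archive_gens, new_drive_gen, read_back=2):
    """Tape generations held in the archive that the new drive
    cannot read, and which therefore need migrating first."""
    readable = set(readable_generations(new_drive_gen, read_back))
    return sorted(g for g in set(archive_gens) if g not in readable)


print(readable_generations(5))             # [3, 4, 5]
print(readable_generations(6))             # [4, 5, 6]
print(stranded_generations([3, 4, 5], 6))  # [3]
print(stranded_generations([4, 5, 6], 7))  # [4]
```

Each new drive generation strands the oldest format still on the shelves, which is the treadmill the paragraph describes.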

If their content isn't migrated then we can surely expect LTO-3 drive manufacture to cease, shortly followed by LTO-3 drive support, break-and-fix skills and spare parts availability withering away, followed in the fullness of time by LTO-5 support etc., and so it goes. Eventually it will be impossible to read an old tape format.

A significant aspect of archive tape library functionality in the future will almost inevitably need to be the automated migration of earlier tape formats to the newest ones to preserve content readability.

It would be great if the ability to read and write tapes could be divorced from the actual tape media. Note this problem doesn't exist so much with disk drives, because disk and drive are a unity. As long as the interface electronics and software exist (Fibre Channel or SAS or SATA) and as long as there is software that can interpret the data format on the drives … it's a different flavour of the same problem.

Newer versions of Word cannot always read documents produced with older versions of Word. It also seems inevitable that, before long, some archive software will include old application and system software version plug-ins so that old data can be restored from an archive in human-readable format. Of course, there are only two ways to present data to the Mk 1 eyeball: as numbers, text and diagrams on a display of some sort or as printed marks on paper.

Vatican Library; five centuries of stored paper, still readable by the Mark 1 Eyeball (screenshot)

The screen version is effectively an analogue of the paper version, and it is paper that is the enduring archive medium. Stick that in front of the Mark 1 eyeball and the jolly old intracranial computational unit will do its job.

Advancing storage technology, including hardware, system software and application software, gets in the way of this. It would be better if the digital archive medium contained as few steps between what the eyeball needs to see and the actual storage medium as possible, while still having the advantages of a digital medium's storage density.

That would then tend to reduce the side-effect exposure to technology advances.

Royal Dutch Petroleum dock in the former East Indies (now Indonesia)

100 year archive

The problem is actually a very large one. Take the second-biggest company in the world in revenue terms: Shell, properly known as Royal Dutch Shell, which came into being 106 years ago. It has, in effect, a 100+ year archive consisting almost entirely (bar the last few years) of paper documents.

Let's imagine a 100-year tape archive. How would that work?

Its tape format would advance a generation a little over every two years. LTO-1 was announced in 2000 with a 100GB capacity. Now, 13 years later, we have LTO-6 with 2.5TB capacity; that's six format generations over 13 years. Even delaying the format transitions for the archive to every five years (instead of every two years, as recent history shows us) would mean 20 tape format transitions in a century.
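The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted in the text (LTO-1 in 2000, six generations over 13 years, a 100-year archive); the function names are made up for illustration, and the cadence is computed the way the article does it, as years divided by generations.

```python
def years_per_generation(first_year, latest_year, generations):
    """Average gap between format generations, computed as the
    article does: elapsed years divided by number of generations."""
    return (latest_year - first_year) / generations


def transitions(lifetime_years, interval_years):
    """Whole format transitions that fit into the archive's lifetime."""
    return int(lifetime_years // interval_years)


cadence = years_per_generation(2000, 2013, 6)
print(round(cadence, 2))          # 2.17 years per generation
print(transitions(100, cadence))  # 46 transitions at the observed pace
print(transitions(100, 5))        # 20 transitions at a five-year pace
```

At the observed pace a century would see well over 40 transitions; the five-year figure of 20 is the generous case.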

As the archive capacity mounted up, the bulk of the tape archive's work would increasingly consist of migrating the contents of old tapes to new ones. It would devote ever fewer of its resources to responding to archive users' data access requests. The bulk of the cost of the archive would be spent internally, having it chase its own data migration tail, and its cost per user access would skyrocket.
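A toy model gives a feel for how the migration workload could outgrow the ingest workload. Everything in it is an assumption made for illustration, not a measurement: the archive takes in one unit of data a year, nothing is ever deleted, and every format transition rewrites the whole archive onto new media. The function name is hypothetical.

```python
def migration_load(lifetime_years, transition_interval, ingest_per_year=1.0):
    """Return (total data ingested, total data rewritten by migrations)
    under the simplifying assumptions stated above."""
    ingested = 0.0
    migrated = 0.0
    for year in range(1, lifetime_years + 1):
        ingested += ingest_per_year
        if year % transition_interval == 0:
            # Assumed: a transition copies everything held so far.
            migrated += ingested
    return ingested, migrated


ingested, migrated = migration_load(100, 5)
print(ingested, migrated)  # 100.0 1050.0
```

Under these assumptions, a century with a transition every five years means rewriting more than ten times as much data as was ever ingested.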

We are not even thinking yet about how Word 2113 would be able to read a Word 2013 format document; that sort of problem would have to be dealt with possibly by a constant ongoing content format migration as well.

In a word, this is nonsense.

Unless we reach a stage where archival technology becomes as stable as paper and printing had been for decades, centuries even, then we cannot, unquestioningly, keep all the data we digitally collect. The oldest, least-wanted data will have to be let go. Unless there is a clear need to keep it, some kind of digital filtering mechanism will have to be used to find the least-wanted data and delete it.

The archive will have to be trawled by digital spider-bots; data killers, looking for useless data and destroying it to make space for wanted data.

Somebody could make a business out of taking this old data and storing it in a kind of digital deep-freeze for potential re-activation. Maybe this could be on the Moon, in a nuclear-powered mega-flash-vault, with plenty of space to expand; there'll always be another crater ... but this is science fiction.

The real moral of this tale is that virtually no data is needed for ever. Big data bigots' mad claims notwithstanding, digital archives will have to be regularly cleared. Physical space runs out; digital space runs out; formats change; applications change; and preserving access to older and older data will become crushingly expensive.

Technological change might come up with a solution to this problem, but it's a problem created by that very process of technological change. Beware what you wish for. ®
