Tech is the biggest problem facing archiving

Mountains of unreadable obsolete magnetic tapes!

Blocks and Files: Technology is the biggest problem facing archiving. Archives grow bigger and bigger; the amount of data to be kept threatens to overflow any archive installation. So, let's use LTO-6 tapes instead of LTO-5 ones, because they hold roughly twice as much data in the same physical space.

That's logical, but there is an unwanted side effect: LTO-5 drives can read LTO-3, LTO-4 and LTO-5 tapes, while LTO-6 drives can read LTO-4, LTO-5 and LTO-6 tapes but not LTO-3. All the LTO-3 tape contents have to be migrated up to LTO-6 to minimise future migrations, because when LTO-7 comes along its drives won't be able to read LTO-4 tapes, and all their content will have to be migrated in turn, and so on, ad nauseam.
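To make the cliff edge concrete, here is a minimal Python sketch of that read-compatibility pattern, assuming the historical LTO convention holds (a generation-N drive reads generations N, N-1 and N-2 and nothing older); the function names and the sample library contents are invented for illustration.

```python
# Minimal sketch of the LTO read-compatibility pattern described above, on the
# assumption that a generation-N drive reads generations N, N-1 and N-2 only.
# Names and sample data are invented for illustration.

def readable_generations(drive_gen):
    """Tape generations a drive of generation `drive_gen` is assumed to read."""
    return {g for g in (drive_gen - 2, drive_gen - 1, drive_gen) if g >= 1}

def stranded_tapes(tape_gens, newest_drive_gen):
    """Tapes that become unreadable once only newest-generation drives remain."""
    readable = readable_generations(newest_drive_gen)
    return sorted(g for g in tape_gens if g not in readable)

if __name__ == "__main__":
    library = [3, 4, 5, 5, 6]            # generations of the tapes on the shelves
    print(readable_generations(6))       # {4, 5, 6}: the LTO-3 tapes are already stranded
    print(stranded_tapes(library, 7))    # [3, 4]: once LTO-7 drives take over, LTO-4 joins them
```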

If their content isn't migrated, then we can surely expect LTO-3 drive manufacture to cease shortly, followed by LTO-3 drive support, break-and-fix skills and spare-parts availability withering away, followed in the fullness of time by LTO-4 and then LTO-5 support, and so it goes. Eventually it will be impossible to read an old tape format.

A significant part of future archive tape library functionality will almost inevitably have to be the automated migration of earlier tape formats to the newest one, simply to preserve content readability.

It would be great if the ability to read and write tapes could be divorced from the actual tape media. Note that this problem doesn't exist to the same extent with disk drives, because disk and drive are a unity. As long as the interface electronics and software exist (Fibre Channel, SAS or SATA), and as long as there is software that can interpret the data format on the drives, the content remains readable … but that is really just a different flavour of the same problem.

Newer versions of Word cannot always read documents produced with much older versions of Word. It also seems inevitable that, before long, some archive software will include plug-ins for old application and system software versions so that old data can be restored from an archive in human-readable format. Of course, there are only two ways to present data to the Mark 1 eyeball: as numbers, text and diagrams on a display of some sort, or as printed marks on paper.

Vatican Library: five centuries of stored paper, still readable by the Mark 1 eyeball (still from an EMC Vatican Library video)

The screen version is effectively an analogue of the paper version, and it is paper that is the enduring archive medium. Stick that in front of the Mark 1 eyeball and the jolly old intra-cranial computational unit will do its job.

Advancing storage technology, including hardware, system software and application software, gets in the way of this. It would be better if the digital archive medium put as few steps as possible between the actual storage medium and what the eyeball needs to see, while still keeping the advantages of a digital medium's storage density.

That would then tend to reduce the side-effect exposure to technology advances.

Royal Dutch Petroleum dock in the former East Indies (now Indonesia)

100-year archive

The problem is actually a very large one. Take the second-biggest company in the world in revenue terms: Shell, properly known as Royal Dutch Shell, which came into being 106 years ago. It has, in effect, a 100+ year archive consisting almost entirely (bar the last few years) of paper documents.

Let's imagine a 100-year tape archive. How would that work?

Its tape format would advance a generation roughly every two-and-a-half years. LTO-1 was announced in 2000 with a 100GB capacity. Now, 13 years later, we have LTO-6 with a 2.5TB capacity; that's six format generations, or five transitions, in 13 years. Even slowing the archive's format transitions to one every five years (instead of one every two-and-a-half years or so, as recent history shows us) would still mean 20 tape format transitions in a century.
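For transparency, here is the back-of-the-envelope arithmetic behind those figures as a tiny Python sketch; the five-year cadence is, as above, a deliberately generous assumption rather than a prediction.

```python
# Back-of-the-envelope arithmetic behind the figures above (illustrative only).
YEARS_OBSERVED = 13        # LTO-1 (2000) to LTO-6 (2013)
TRANSITIONS_OBSERVED = 5   # LTO-1 -> LTO-6 is five generation jumps

print(f"Historical cadence: one transition every "
      f"{YEARS_OBSERVED / TRANSITIONS_OBSERVED:.1f} years")    # ~2.6 years

CENTURY_YEARS = 100
RELAXED_CADENCE_YEARS = 5  # the deliberately generous assumption used above
print(f"Transitions per century at the relaxed pace: "
      f"{CENTURY_YEARS // RELAXED_CADENCE_YEARS}")             # 20
```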

As the archive capacity mounted up, the bulk of the tape archive's work would increasingly consist of migrating the contents of old tapes to new ones. It would devote ever less of its resources to serving archive users' data access requests. The bulk of the cost of the archive would be spent internally, having it chase its own data-migration tail, and its cost per user access would skyrocket.
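One crude way to see that tail-chasing is to simulate it. The sketch below assumes a constant ingest rate, a format transition every five years and a rule that anything stored more than two generations behind the newest format is rewritten at each transition; all the figures are invented for illustration, yet even under these gentle assumptions the data migrated over a century works out at roughly three times the data ever ingested.

```python
# A crude simulation of the migration tail-chasing described above.
# Assumptions (invented for illustration): constant ingest, a format
# transition every five years, and mandatory rewriting of anything stored
# more than two generations behind the newest format.

INGEST_TB_PER_YEAR = 100
TRANSITION_EVERY_YEARS = 5
HORIZON_YEARS = 100
MAX_GENERATION_LAG = 2      # data is rewritten once its format is >2 generations old

holdings = {0: 0.0}         # TB held per format generation
current_gen = 0
ingested_total = migrated_total = 0.0

for year in range(1, HORIZON_YEARS + 1):
    holdings[current_gen] = holdings.get(current_gen, 0.0) + INGEST_TB_PER_YEAR
    ingested_total += INGEST_TB_PER_YEAR
    if year % TRANSITION_EVERY_YEARS == 0:      # a new tape generation arrives
        current_gen += 1
        stale = [g for g in holdings if current_gen - g > MAX_GENERATION_LAG]
        moved = sum(holdings.pop(g) for g in stale)
        holdings[current_gen] = holdings.get(current_gen, 0.0) + moved
        migrated_total += moved

print(f"Ingested over the century: {ingested_total:,.0f} TB")
print(f"Migrated over the century: {migrated_total:,.0f} TB")   # roughly 3x the ingest
```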

We are not even thinking yet about how Word 2113 would be able to read a Word 2013 format document; that sort of problem would have to be dealt with as well, possibly by constant, ongoing migration of content formats.

In a word, this is nonsense.

Unless we reach a stage where archival technology becomes as stable as paper and printing have been for decades, centuries even, we cannot unquestioningly keep all the data we digitally collect. The oldest, least-wanted data will have to be let go. Unless there is a clear need to keep it, some kind of digital filtering mechanism will have to be used to identify the least-wanted data and delete it.

The archive will have to be trawled by digital spider-bots: data killers, looking for useless data and destroying it to make space for wanted data.
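For what it's worth, here is a deliberately simple sketch of the sort of policy such a data killer might apply; the retention knobs (a maximum age, an idle period and a legal-hold flag) and the sample items are invented for illustration, and a real archive would need far richer rules.

```python
# A toy retention filter for the hypothetical "data killer" spider-bot.
# The policy thresholds and sample records are invented for illustration.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ArchiveItem:
    name: str
    created: datetime
    last_accessed: datetime
    legal_hold: bool = False

def is_expendable(item, now, max_age_years=30, idle_years=10):
    """True if the item looks old enough and idle enough to be deleted."""
    if item.legal_hold:
        return False
    too_old = now - item.created > timedelta(days=365 * max_age_years)
    untouched = now - item.last_accessed > timedelta(days=365 * idle_years)
    return too_old and untouched

now = datetime(2013, 9, 23)
archive = [
    ArchiveItem("quarterly_report_1975.doc", datetime(1975, 1, 1), datetime(1990, 6, 1)),
    ArchiveItem("well_survey_2010.dat", datetime(2010, 3, 1), datetime(2013, 1, 5)),
]
print([item.name for item in archive if is_expendable(item, now)])
# -> ['quarterly_report_1975.doc']: only the long-untouched item is flagged
```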

Somebody could make a business out of taking this old data and storing it in a kind of digital deep-freeze for potential re-activation. Maybe this could be on the Moon, in a nuclear-powered mega-flash-vault, with plenty of space to expand; there'll always be another crater ... but this is science fiction.

The real moral of this tale is that virtually no data is needed for ever. Big data bigots' mad claims notwithstanding, digital archives will have to be regularly cleared. Physical space runs out; digital space runs out; formats change; applications change; and preserving access to older and older data will become crushingly expensive.

Technological change might come up with a solution to this problem, but it's a problem created by that very process of technological change. Beware what you wish for. ®
