
British Library tracks rise and fall of file formats

Analysis of 2.5 billion online files suggests software obsolescence slowing

File formats and the software capable of reading them are living longer than previously thought, according to a British Library and UK Web Archive study.

Formats over Time: Exploring UK Web History (PDF, slides as PDF) considers 2.5 billion files that author Andrew N Jackson retrieved with the help of the Internet Archive and the Joint Information Systems Committee (JISC). All the files come from “the UK web domain” and date from between 1996 and 2010.

Jackson used Apache Tika and PRONOM's DROID tool to inspect the files and determine the format each one uses. Central to the research was Jeff Rothenberg's 1997 prediction that “Digital Information Lasts Forever – Or Five Years, Whichever Comes First.” Jackson is also keen on a rebuttal from David Rosenthal, whom he quotes as saying: “When challenged, proponents of [format migration strategies] have failed to identify even one format in wide use when Rothenberg [made that assertion] that has gone obsolete in the intervening decade and a half.”
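To give a flavour of what format identification involves, here is a minimal Python sketch that checks a file's leading bytes against a few well-known signatures. It is an illustration only: the `identify` function is hypothetical, it is not how Tika or DROID work internally, and it covers just a handful of formats.

```python
import re

# Hypothetical helper for illustration; not part of Apache Tika or DROID.
# PDF files start with a header of the form "%PDF-x.y".
PDF_HEADER = re.compile(rb"%PDF-(\d\.\d)")


def identify(data: bytes) -> str:
    """Return a rough format label based on the first bytes of a file."""
    head = data[:1024]
    match = PDF_HEADER.match(head)
    if match:
        # The header carries the version, e.g. "PDF 1.4".
        return "PDF " + match.group(1).decode("ascii")
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG"
    if head.startswith(b"GIF87a") or head.startswith(b"GIF89a"):
        return "GIF"
    # Very loose HTML check: look for an <html tag near the start.
    if b"<html" in head.lower():
        return "HTML"
    return "unknown"
```

A check this loose will misjudge plenty of real files, such as HTML fragments with no `<html` tag, which is one reason results from any identification tool need to be read with care.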

Jackson's take is that file formats seem to last rather longer than five years even if they don't survive forever.

“While there were just two active versions of HTML in 1996 (2.0 and 3.2), all six were still active in 2010,” he writes. “Similarly, there were three active versions of PDF in 1996 (1.0-1.2) and eleven different versions in 2010 (1.0-1.7, 1.7 Extension Level 3, A-1a and A-1b, with 1.2-1.6 dominant). In general, it appears that format versions, like formats, are quick to arise but slow to fade away.”

HTML versions found online in the UK between 1996 and 2010
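Counts of this kind can be produced by tallying the distinct versions seen in each year. The Python sketch below assumes that “active” simply means at least one file of that version was found in a given year, which may not match the study's own definition. The function name and the sample records are invented for illustration and are not figures from the study.

```python
from collections import defaultdict


def active_versions_by_year(records):
    """Map each year to the sorted list of distinct format versions seen in it.

    `records` is an iterable of (year, version_label) pairs, one per file.
    """
    seen = defaultdict(set)
    for year, version in records:
        seen[year].add(version)
    return {year: sorted(versions) for year, versions in sorted(seen.items())}


# Made-up records for illustration only; these are not data from the study.
sample = [
    (1996, "HTML 2.0"),
    (1996, "HTML 3.2"),
    (1996, "HTML 3.2"),
    (2010, "HTML 2.0"),
    (2010, "HTML 4.01"),
]
print(active_versions_by_year(sample))
```

A version that keeps turning up in later years, however rarely, stays on the list, which is the “slow to fade away” pattern Jackson describes.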

Jackson attributes formats' longevity to network effects, but also writes that he is uncomfortable drawing firm conclusions about software obsolescence, given that the sample is UK-centric and the tools used to analyse the data identify files imperfectly.

He nonetheless concludes:

“Our initial analysis supports Rosenthal's position; that most formats last much longer than five years, that network effects appear to stabilise formats, and that new formats appear at a modest, manageable rate.”

But he also warns that there are “a number of formats and versions that are fading from use, and these should be studied closely in order to understand the process of obsolescence.” ®
