The Web as historical record

Rosetta Stone or writing in the sand?

I spent some of last weekend researching an article about Service-Oriented Architecture (SOA). The term is used very widely in the industry today, and whoever uses it assumes that the reader understands it, and understands it in the same way as the writer. As we all know, all terms in our industry have a Humpty Dumptyness about them ("When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.").

Therefore, I decided to use the Internet to find a definition of SOA. This is when I discovered the power, weakness and fragility of the Internet as a historical record. Searching on Service Oriented Architecture - or SOA - obviously dredges up plenty of results, most of which do not define the term. Add the word "definition" to the search and you get a more usable list.

I then wanted to find the original definition, and discovered one of the weaknesses of most search engines and of the Web itself: you cannot sort by date. Most search engines do not offer that facility at all. ASK does allow it, but the results are of limited value because many entries on the Web do not have any usable date attached to them. This is a major problem for historians, like myself today, but will be a greater frustration to future historians looking back over a hundred years of the Web and not knowing when something was written.

The lack of a date on scraps of Web jotting is a shame but understandable. I was surprised, however, to find that significant documents, such as university dissertations, did not always have a date in either the viewable data or the metadata. Probably the most annoying case for my purposes is the lack of a date on vendor sales blurb: it is extremely frustrating when you cannot tell whether it has been superseded or is just old. This is a small plea to everyone who develops entries for the Web: put a date in the viewable text.

After some random surfing and investigation I found the original use of the term SOA, way back in 1996. As far as I could tell, it was not used much until the introduction of .NET in 2001, and did not become popular until an explosion from early 2003 onwards. 1996 is eight years ago, and in Web terms that is a long time. Much of what was written then is lost forever: it has been deleted because it was superseded, or archived as irrelevant, or the web site that hosted it has gone, a casualty of the dotcom bust. So I was quite lucky that the original was written by a surviving analyst firm.

This clearly demonstrates the fragility of the Web as a historical record. Up to now we have had multiple copies of a document physically dispersed, so the chance of some of them surviving has been high. I have always remembered standing in front of the Dead Sea Scrolls. Not only could I view them, but I could also decipher the characters and, to a limited degree, read and understand text that was written 2000 years ago. Compare that to a vanished website, where there is no evidence of what was written - not even an unreadable archive.

The archivists in the British Library, and similar institutions, are beginning to worry about this problem. If they do not succeed in finding a solution to the Web's ephemeral nature, we may become the generation that created more text than any before it, but left none to future generations.

© IT-Analysis.com