The Web as historical record

Rosetta Stone or writing in the sand?

I spent some of last weekend researching an article about Service-Oriented Architecture (SOA). The term is used very widely in the industry today, and whoever uses it assumes that the reader understands it, and understands it in the same way as the writer. As we all know, all terms in our industry have a Humpty Dumptyness about them ("When I use a word," Humpty Dumpty said, in rather a scornful tone, "it means just what I choose it to mean - neither more nor less.").

Therefore, I decided to use the Internet to find a definition of SOA. This is when I discovered the power, weakness and fragility of the Internet as a historical record. Searching on Service Oriented Architecture - or SOA - obviously dredges up plenty of results, most of which do not define the term. Add the word "definition" to the search and you get a more usable list.

I then wanted to find the original definition, and discovered one of the weaknesses of most search engines and of the Web itself: you cannot sort by date. Most search engines do not give you that facility at all. ASK does allow it, but the results are of limited value, as many of the entries on the Web do not have any usable date attached to them. This is a major problem for historians, like myself today, and it will be an even greater frustration to future historians looking back over a hundred years of the Web and not knowing when something was written.

Not having a date on scraps of Web jotting is a shame but understandable. I was surprised, though, to find that significant documents, such as university dissertations, did not always have a date in either the viewable text or the metadata. Probably the most annoying omission for my purposes is the lack of a date on vendor sales blurb: it is extremely frustrating when you cannot tell whether it has been superseded or is just old. So this is a small plea to everyone who develops entries for the Web: put a date in the viewable text.

After some random surfing and investigation, I found the original use of the term SOA, way back in 1996. As far as I could tell, it was not used much until the introduction of .NET in 2001, and did not become popular until an explosion of usage from early 2003 onwards. 1996 is eight years ago, and in Web terms that is a long time. Much of what was written then is lost forever: it has been deleted because it was superseded, or archived as irrelevant, or the web site that hosted it has gone, a casualty of the dotcom bust. So I was quite lucky that the original was written by a surviving analyst firm.

This clearly demonstrates the fragility of the Web as a historical record. Until now we have had multiple copies of a document physically dispersed, so the chance of some of them surviving has been high. I still remember standing in front of the Dead Sea Scrolls. Not only could I view them, but I could also decipher the characters and, to a limited degree, read and understand text that was written 2000 years ago. Compare that to a vanished website, where there is no evidence of what was written - not even an unreadable archive.

The archivists in the British Library, and similar institutions, are beginning to worry about this problem. If they do not succeed in finding a solution to the Web's ephemeral nature, we may become the generation that created more text than any before it, but left none to future generations.

© IT-Analysis.com

Related stories

Law seeks deposit of web sites with UK libraries
Public Records Office to preserve digital documents
Google restores Usenet archive, plans posting
