Archive.org suffers Fahrenheit 9/11 memory loss

Online fire extinguished

Opinion You don't often think about libraries in terms of strength. Few mayors tout the large sack of the local book depository or put it up against a massive skyscraper during PR stunts. Libraries are pretty passive creatures that receive some credit for the quantity of volumes they hold but not much credit these days for being powerful entities.

That is until you run across something like Archive.org. For where the Library of Congress exudes strength, Archive.org piddles weakness. The site is really a reminder of how not far the Internet has come and how strong some old traditions really are.

The supposed Internet archiving site is not a passive entity at all. It doesn't simply collect more and more data for the use of researchers as it claims. Instead, Archive.org actively engages in odd publicity stunts and actively pulls down information. What could be weaker than a media-hungry library with disappearing material?

On Wednesday, Archive.org put up a copy of Michael Moore's Fahrenheit 9/11 documentary for download. The site was apparently responding to an interview in which Moore said he didn't mind people downloading the movie as long as the sites offering it didn't profit from the action. So Archive.org flexed its freedom of information/culture muscle and boldly offered the movie in a variety of formats.

An intern here in The Register's Chicago office was ordered to test the download out. It worked. Our intern - Streaming Sally - used the FreeCache technology Archive.org recommended, and the download took about 3 hours. The movie came in a bit choppy but certainly watchable - so Sally said.

But just hours after putting up the movie, Archive.org pulled it down. In the movie's place was a note that read, "This is under copyright, and archive.org needs to pull it before any damage happens."

Think of this as a child fondling a can of spray paint but then stepping away from the school wall before "any damage happens." Or a seven-year-old contemplating a ten-yard run with scissors in hand and then putting the weapon down before "any damage happens." However you think about it, it's clear that there are children running Archive.org - the kind that play copyright gags while doing shots of Pepsi late into the night.

We know this because Archive.org has long had a childlike relationship with information. Our first indication of this came back in 2002. At that time, Intel had accidentally released the code-name of an upcoming project - Nehalem. One of Intel's engineers discussed the project in an interview conducted by Intel itself and posted on Intel's web site. Some schmuck of a reporter found the code-name and did a story on it.

Intel's PR machine then went into action. First, it removed the interview from its site. Then, it called Google to make sure no copies of the interview lurked in Google's cache. Then, it called Archive.org to remove any trace of the interview at all.

Libraries exist to preserve society’s cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it’s essential for them to extend those functions into the digital world.

Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages.

The Internet Archive is opening its collections to researchers, historians, and scholars. The Archive has no vested interest in the discoveries of the users of its collections, nor is it a grant-making organization.

This is pretty big talk for a toddler of a library. The Intel incident is by no means the first or only time Archive.org has pulled information at a vendor or user's request. Exactly how a vendor that of its own volition posts information in a public forum can then go back and claim it's proprietary is beyond us, and how a "library" can obey this request defies comprehension. We're not talking about Windows source code here, friends.

Beyond any of this, Archive.org does a poor job of recording sites - you know, the ones it doesn't erase. Response times are horrible, and more often than not only a few old examples of a site exist.

Without question, an Internet library raises tricky questions. How, for example, can you archive a libelous story when both the publisher and subject agree the original must be pulled? Not the best of situations. Still, we're pretty sure Archive.org is not the caliber of organization needed to clear up these serious matters.

The upshot of all this is that we desperately need a "real" Internet archive - one that doesn't pretend to be brave for a few hours as part of some information stunt and one that doesn't delete the very records it's supposed to keep. ®
