Archive.org suffers Fahrenheit 911 memory loss
Online fire extinguished
Opinion You don't often think about libraries in terms of strength. Few mayors tout the large sack of the local book depository or put it up against a massive skyscraper during PR stunts. Libraries are pretty passive creatures that receive some credit for the quantity of volumes they hold but not much credit these days for being powerful entities.
That is until you run across something like Archive.org. For where the Library of Congress exudes strength, Archive.org piddles weakness. The site is really a reminder of how not far the Internet has come and how strong some old traditions really are.
The supposed Internet archiving site is not a passive entity at all. It doesn't simply collect more and more data for the use of researchers as it claims. Instead, Archive.org actively engages in odd publicity stunts and actively pulls down information. What could be weaker than a media-hungry library with disappearing material?
On Wednesday, Archive.org put up a copy of Michael Moore's Fahrenheit 911 documentary for download. The site was apparently responding to an interview in which Moore said he didn't mind people downloading the movie as long as the sites offering it didn't profit from the action. So Archive.org flexed its freedom of information/culture muscle and boldly offered the movie in a variety of formats.
An intern here in The Register's Chicago office was ordered to test the download out. It worked. Our intern - Streaming Sally - used the FreeCache technology Archive.org recommended, and the download took about 3 hours. The movie came in a bit choppy but certainly watchable - so Sally said.
But just hours after putting up the movie, Archive.org pulled it down. In the movie's place was a note that read, "This is under copyright, and archive.org needs to pull it before any damage happens."
Think of this as a child fondling a can of spray paint but then stepping away from the school wall before "any damage happens." Or a seven-year-old contemplating a ten-yard run with scissors in hand and then putting the weapon down before "any damage happens." However you think about it, it's clear that there are children running Archive.org - the kind that play copyright gags while doing shots of Pepsi late into the night.
We know this because Archive.org has long had a childlike relationship with information. Our first indication of this came back in 2002. At that time, Intel had accidentally released the code-name of an upcoming project - Nehalem. One of Intel's engineers discussed the project in an interview conducted by Intel itself and posted on Intel's web site. Some schmuck of a reporter found the code-name and did a story on it.
Intel's PR machine then went into action. First, it removed the interview from its site. Then, it called Google to make sure no copies of the interview lurked in Google's cache. Then, it called Archive.org to remove any trace of the interview at all.
"Libraries exist to preserve society’s cultural artifacts and to provide access to them. If libraries are to continue to foster education and scholarship in this era of digital technology, it’s essential for them to extend those functions into the digital world.
"Open and free access to literature and other writings has long been considered essential to education and to the maintenance of an open society. Public and philanthropic enterprises have supported it through the ages.
"The Internet Archive is opening its collections to researchers, historians, and scholars. The Archive has no vested interest in the discoveries of the users of its collections, nor is it a grant-making organization."
This is pretty big talk for a toddler of a library. The Intel incident is by no means the first or only time Archive.org has pulled information at a vendor or user's request. Exactly how a vendor that posts information in a public forum of its own volition can then go back and claim it's proprietary is beyond us, and how a "library" can obey such a request defies comprehension. We're not talking about Windows source code here, friends.
Beyond any of this, Archive.org does a poor job of recording sites - you know, the ones it doesn't erase. Response times are horrible and, more often than not, only a few old examples of a site exist.
Without question, an Internet library raises tricky questions. How, for example, can you archive a libelous story when both the publisher and subject agree the original must be pulled? Not the best of situations. Still, we're pretty sure Archive.org is not the caliber of organization needed to clear up these serious matters.
The upshot of all this is that we desperately need a "real" Internet archive - one that doesn't pretend to be brave for a few hours as part of some information stunt and one that doesn't delete the very records it's supposed to keep. ®