Use of web archive was not hacking, says US court
Though bypassing its protection measures could be
Using the web archive The Wayback Machine did not constitute hacking when a law firm used it to view pages that their owner did not want seen, a US court has ruled.
The deliberate bypassing or evasion of the archive's protection measures could still be deemed hacking, though, said Judge Robert Kelly of the Eastern District of Pennsylvania. In this case, protection mechanisms put in place by the page owners had failed.
In a dispute over intellectual property, patient advocacy group Healthcare Advocates sued Health Advocate Inc. The company being sued was represented by law firm Harding Earley Follmer & Frailey.
Law firm Harding viewed a number of Healthcare Advocates' web pages on The Wayback Machine on 9 July 2003. On 7 or 8 July Healthcare Advocates' president, Kevin Flynn, had put a robots.txt file on its website which should have barred the Wayback Machine from accessing its pages. But lawyers at Harding were able to view the pages because of a malfunction at The Wayback Machine.
"Plaintiffs' expert, Gideon Lenkey, has testified that the Harding firm was able to view archived screenshots of Healthcare Advocates' website because the servers at Internet Archive were not respecting robots.txt files," said Kelly's ruling. "Mr Lenkey also testified that the Harding firm did not engage in 'hacking'."
Circumventing an electronic protective measure breaks federal law in the US, and Healthcare Advocates brought a lawsuit against Harding.
Kelly ruled, though, that because Healthcare Advocates' protections malfunctioned, there was no protection to break or bypass.
"When the Harding firm accessed Internet Archive’s database on 9 July, 2003, and 14 July, 2003, it was as though the protective measure was not present," he wrote. "Charles Riddle and Kimber Titus simply made requests through the Wayback Machine that were filled. They received the images they requested only because the servers processing the requests disregarded the robots.txt file present on Healthcare Advocates' website.
"As far as the Harding firm knew, no protective measures were in place in regard to the archived screenshots they were able to view. They could not avoid or bypass any protective measure, because nothing stood in the way of them viewing these screenshots. The Harding firm did not use alter code language to render the robots.txt file void like the defendant in Corley did with the encryption," said Kelly.
"They did not 'pick the lock' and avoid or bypass the protective measure, because there was no lock to pick. The facts show that the Harding firm received the archived images solely because of a malfunction in the servers processing the requests."
Healthcare Advocates also claimed that Harding had breached copyright law in their viewing and use of the web pages, but Kelly ruled that the law firm's activity constituted fair use of the material.
The company also claimed that the activity broke the Computer Fraud and Abuse Act, a claim Kelly also rejected.
Kelly granted summary judgment in Harding's favour. He said in his ruling: "It would be an absurd result if an attorney defending a client against charges of trademark and copyright infringement was not allowed to view and copy publicly available material, especially material that his client was alleged to have infringed."
The ruling said that in this case the placing of a robots.txt file, which is most often used to tell search engine "robots" which pages of a website should not be indexed, constituted a "technological measure" within the Digital Millennium Copyright Act (DMCA).
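For readers unfamiliar with the mechanism, a robots.txt file only works if the robot reading it chooses to obey it: the check is carried out by the crawler or archive itself, not by the website. The following is a minimal sketch of such a check using Python's standard urllib.robotparser module. The file contents, the user-agent names and the example.com URLs are assumptions made up for illustration; they are not taken from the ruling, and this is not a description of how the Internet Archive's servers actually work.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a well-behaved robot with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The archiving robot is asked to stay away from the whole site.
archiver_ok = is_allowed(ROBOTS_TXT, "ia_archiver", "http://example.com/index.html")

# Any other robot may fetch public pages but is asked to skip /private/.
other_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/index.html")
private_ok = is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://example.com/private/a.html")

print(archiver_ok, other_ok, private_ok)
```

A robot that never runs a check like this, or whose check fails, will be served the pages all the same, which is in effect what the court found had happened here.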
That ruling will have limited relevance in other cases, though. No court in the US has yet said that such a file constitutes a technological measure in every case, and Kelly warned against interpreting his specific ruling in that way.
"The only way to gain access would be for Healthcare Advocates to remove the robots.txt file from its website, and only the website owner can remove the robots.txt file. Thus, in this situation, the robots.txt file qualifies as a technological measure effectively controlling access to the archived copyrighted images of Healthcare Advocates," he said. "This finding should not be interpreted as a finding that a robots.txt file universally qualifies as a technological measure that controls access to copyrighted works under the DMCA."
Copyright © 2007, OUT-LAW.com
OUT-LAW.COM is part of international law firm Pinsent Masons.
A barn door in the middle of a field?
It seems that the "experts" make no distinction between robots.txt and .htaccess files!! While the latter does serve a function similar to a lock, the former is just a notice.
A real-life example for IT-illiterate lawyers: putting up a sign in a public building forbidding the taking of pictures (which is what web archives actually do) still does not put any technical obstacle in the way of the public using their bare eyes. What on Earth can prevent a determined counsel from ordering a few interns with browsers to "click on every hyperlink if it keeps you on that web site" and to dig through the results?
Copyrighting a book does forbid making illegal copies of it but AFAIK does not prevent reading it (if one is literate enough)!
robots.txt has no place in law, and there is no compulsion to follow its suggestions.
If one places a curtain over one's bedroom window to keep passing law enforcement from peeking in and arresting the occupant while in the act of some sort of illegal fetish, and the wind blows the curtain aside long enough for said officers to get a look, then it's perfectly legal (in the US) for the officers to then proceed to act under the "plain view" doctrine. The simple fact of hanging the curtain does not protect the criminal. While the officers are forbidden from moving the curtain, themselves, if an "accident" disables the curtain's protections, then there is technically no curtain ... and no protection.
Regardless of whether the defendants had included robots.txt (the curtain), IA grabbed their site and offered it to the plaintiffs without extraordinary effort.
If anything, this should have been about the rights of IA to permanently store copyrighted content without the permission of its copyright owners ... which brings us to Google's cache ...
So, they put the robots.txt file on "the 7th or 8th". And the defendant viewed the archived copy on the 9th. Who knows when they were archived? It could have been a month previous.
As many others have pointed out, there is no obligation to observe a robots.txt file.