Original URL: https://www.theregister.com/2007/08/08/litigation_data_retention/

Websites could be required to retain visitor info

Even if it would break their privacy policies

By Mark Rasch

Posted in Legal, 8th August 2007 10:26 GMT

A series of legal events means that companies that have no business reason to retain documents or records may be compelled to create and retain such records just so they can become available for discovery.

Companies routinely create, maintain and store electronic records. Some records are consciously created – like memoranda, letters, spreadsheets, and even e-mails and chat or instant message communications. Other records are created inadvertently, like metadata, log records, IP history records and the like. Some information is useful to the company, which wants to retain it; other information is of little use, merely takes up space, creates potential liability, and represents an unwarranted threat for attack or violation of privacy. The problem for most companies in developing or maintaining a document retention/destruction policy is identifying the documents and records they want to keep and effectively purging the ones they don't want. Some recent legal events have made the problem of document retention and destruction even more complicated.

A recent case involving file sharing site TorrentSpy illustrates the point. TorrentSpy's privacy policy is clear and concise. It states:

TorrentSpy.com is committed to protecting your privacy. TorrentSpy.com does not sell, trade or rent your personal information to other companies. TorrentSpy.com will not collect any personal information about you except when you specifically and knowingly provide such information.

Pretty straightforward, and not too dissimilar from thousands of other website privacy policies. Such privacy policies are considered to be legally binding contracts, and the United States Federal Trade Commission, and Privacy Commissioners in Europe, Asia and other places routinely hold companies to their promises – under threat of civil and criminal prosecution or fines.

The first problem with this privacy policy – like most privacy policies – is that it's not true. Whenever you visit a website, you "involuntarily" provide "personal" information to the site operator – things like the type of browser you are using, your IP address, the physical location of that IP address, your configuration settings, and what website you may have been referred from or to, among other things.

If you are engaging in malicious, unlawful, or otherwise "actionable" conduct, the website operator may certainly attempt to use this information to identify you and discern what you are doing – the essence of "personal information". Indeed, much of what we do as forensic investigators is to use this kind of information to find people.

While net-savvy individuals know that this information is being collected and utilized, the vast majority of individuals would not say that they "specifically and knowingly" provided that information to the website. This information frequently has economic value to the website operator as well. Knowing what site referred the user may result in payments from or to the referring site under "pay per click" agreements.

Aggregated personal information is useful for advertisers, and valuable to those who collect it. So it's not accurate to say that your website ONLY collects information that you voluntarily give them. A better approach to a privacy policy would include language similar to that used by, for example, Google, which specifically states:

Log information - When you use Google services, our servers automatically record information that your browser sends whenever you visit a website. These server logs may include information such as your web request, Internet Protocol address, browser type, browser language, the date and time of your request and one or more cookies that may uniquely identify your browser.

Some of this information is collected automatically as a consequence of delivering web content to the requestor. You would think that, in pursuance of its privacy policies, a company could choose not to collect or more accurately not to store or retain such information – after all, that's what they promised their customers, no?

There has long been an adage in the law that essentially states that "if it exists, it is discoverable". Now, as a result of a lawsuit involving TorrentSpy, the United States District Court for the Central District of California has essentially extended this logic to state that, "if it doesn't exist, we will require that it be created and stored so that it can become discoverable".

The case, Columbia Pictures v. Bunnell, arose when the movie studios wanted to find out the identity of people using TorrentSpy to download copyrighted works – personal information about TorrentSpy's users. TorrentSpy promised its users that it wouldn't collect such information, and had no legal obligation to do so. As the court noted:

In general, when a user clicks on a link to a page or a file on a website, the website's web server program receives from the user a request for the page or the file. The request includes the IP address of the user's computer, and the name of the requested page or file, among other things. Such information is copied into and stored in RAM.

RAM is a form of temporary storage that every computer uses to process data. Every user request for a page or file is stored by the web server program in RAM in this fashion. The web server interprets and processes that data, while it is stored in RAM, in order to respond to user requests.

The web server then satisfies the request by sending the requested file to the user. If the website's logging function is enabled, the web server copies the request into a log file, as well as the fact that the requested file was delivered. If the logging function is not enabled, the request is not retained.

In keeping with its stated contractual privacy policy, TorrentSpy did not enable the logging function, did not capture the information in RAM (or more accurately did not store it) and therefore alleged that it could not produce it in litigation.
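The distinction the court draws, between request data that exists only while the server is processing it and request data that is copied into a log, can be illustrated with a minimal sketch. This is not TorrentSpy's software or any real web server; the names `Request`, `handle_request` and `access_log`, and the log line format, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    ip: str    # IP address of the user's computer
    path: str  # name of the requested page or file


def handle_request(req: Request, log: Optional[List[str]] = None) -> str:
    """Serve a request; keep a record of it only if a log is supplied."""
    # While this function runs, the request details exist only in memory.
    body = f"contents of {req.path}"
    if log is not None:
        # Logging enabled: copy the request details into the log.
        log.append(f"{req.ip} GET {req.path} 200")
    # Once the function returns, nothing about the request is retained
    # unless it was copied into the log above.
    return body


access_log: List[str] = []
handle_request(Request("203.0.113.7", "/file.torrent"), log=access_log)  # logged
handle_request(Request("203.0.113.8", "/index.html"))                    # not logged
```

In this sketch both requests pass through memory in exactly the same way; the only difference is whether a copy is kept afterwards, which is the step the court's order would require a site to switch on.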

After TorrentSpy was sued, the question arose whether the information NOT regularly collected by TorrentSpy – the information in RAM – constituted Electronically Stored Information subject to both discovery and what is called a litigation hold. Under a litigation hold, once you become aware that information you may possess is relevant to ongoing or threatened litigation, you must suspend your document destruction policy and stop deleting that relevant information.

Electronically Stored Information is defined under the Federal Rules of Civil Procedure as "information that is fixed in a tangible form and to information that is stored in a medium from which it can be retrieved and examined".

The court rejected TorrentSpy's claims that the information in RAM was never "stored" since logging was never enabled, and that requiring TorrentSpy to enable logging amounted to requiring it to "create" records that didn't exist. Certainly, the information in RAM was – for a brief time – stored at least transitorily, just as streaming media (like a VoIP call, or videoconference) is stored on your computer for the brief interval it is being displayed.

Thus, the information is (1) electronic; (2) stored; and (3) relevant. The consequence of this is that not only is the information subject to discovery under the TorrentSpy precedent, but the entity must then suspend its document deletion policy, which in the case of TorrentSpy was to delete information in RAM that it never stored.

The potential consequences of this ruling (which is currently on appeal) are frightening. Whenever a company or other entity learns that information that it doesn't collect (or more accurately collects but doesn't store more than briefly) might be relevant to some litigation, it has to undertake affirmative efforts to start collecting and storing this information, in violation of its express privacy policy (creating potential FTC or privacy commission liability) for no purpose other than to create liability.

Thus, when you learn of the possibility of litigation, you may have to START storing streaming media, contents of VOIP calls, contents of videoconferences, webinars, chats, instant messages, logs, scans, or other electronic records that you never stored before.

The court also noted that companies "cannot insulate themselves from complying with their legal obligations to preserve and produce relevant information within their possession, custody or control and responsive to proper discovery requests, by reliance on a privacy policy -- the terms of which are entirely within [their] control". Thus, even if you SAY that the information won't be collected (stored) and you have no reason to collect (store) it, a court could mandate that you do so at your own expense.

ISPs, Portals and Telcos

A similar issue arises with respect to information held by Internet Service Providers (ISPs), web portals like Google, Yahoo and Microsoft, and telephone companies. These entities routinely collect massive volumes of data about their clients and customers – including things like search requests and results, IP history information, logon information, services utilized, date, time, source, destination, and duration of calls.

VoIP providers or ISPs may also store the contents of voice or video communications temporarily as a consequence of transmitting them over the packet network. Remember the adage – if it exists, it is discoverable.

Now there are legitimate reasons for companies to want to collect, store and use at least some of this information. There are business models based on the analysis of this information. Load balancing, billing, and even selling this information are all legitimate uses (provided that the consumer has some awareness that this is going on). What is important is that the provider – the telco, the ISP or the portal – decides what information is going to be collected, how it is going to be used, whether it is going to be stored (and for how long) and then communicates these facts to the consumer.

There has long been a debate over how long these entities will retain the records, and what they will do with them. The Department of Justice and the FBI have long been seeking authority to require ISPs, telcos and others to retain log data and other data at their own expense, "just in case" the information might later become relevant to some investigation.

European countries have also been engaged in the same dialogue. If the records are retained (even when there is no business reason for keeping them) the records become discoverable – by grand jury subpoena, FISA or Title III wiretap orders, National Security Letters, or by voluntary cooperation by the ISP or subject. They also become available in any other litigation – copyright infringement, defamation, or routine divorce cases.

Since the ISP or portal would generally be a third party with respect to the underlying litigation, they might not be mandated to create or permanently store log or other transitory information, but that is not entirely clear. What is clear is that the government wants companies that create electronic data to keep it "just in case".

Indeed, ABC News reported that the FBI, in a Department of Defense authorization bill, requested a grant of $5m to pay telephone companies to store information such as call records, and to develop a method of retrieving such information at the request of law enforcement. As reported by ABC News:

The $5m project would apparently pay private firms to store at least two years' worth of telephone and Internet activity by millions of Americans, few of whom would ever be considered a suspect in any terrorism, intelligence or criminal matter. The project would involve "the development of data storage and retrieval systems...for at least two years' worth of network calling records," according to an unclassified budget document posted to the FBI's Web site.

So instead of warehousing the records themselves (and with no legal authority to subpoena ALL records), the government is essentially issuing a document preservation request to the telephone companies, requesting that the records be kept by the telcos for two years, and agreeing to pay all or some of the cost of doing so.

Effectively, this makes the telephone companies into the warehouses for the government and for anybody with a subpoena. Note that there is nothing wrong with the phone companies keeping these records for their own business purposes, but now they will be keeping them presumably just in case.

The issue is not unique to telephone companies. Financial services companies, credit card companies, ISPs, web portals, VoIP providers, social networking sites, chat and IM providers all could be either compelled to retain records, or paid off to retain them just in case - even when their own privacy policy expressly forbids it.

Web portals like Google, Yahoo! and Microsoft learned the lesson of the adage that if records exist they will be subpoenaed when, in the context of defending Congress' anti-smut statute, the government subpoenaed (in a civil lawsuit) massive volumes of data about how people used these portals, what they searched for, and what was ultimately delivered.

As a result of this, and of the document retention requests by law enforcement and regulators, all of the major portals have voluntarily agreed to anonymize their records after a period of time – Yahoo! after 13 months, Google and Microsoft after 18 to 24 months.
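What "anonymize after a period of time" means in practice varies, and the portals' actual methods are not described here. As a rough sketch of one possible approach, the code below blanks the last octet of the IPv4 address on any log record older than a retention window. The record layout, the `RETENTION` value (about 13 months) and the function name `anonymize_old` are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical retention window of roughly 13 months.
RETENTION = timedelta(days=13 * 30)


def anonymize_old(records, now):
    """Return records with the IPv4 address truncated on entries older than RETENTION.

    Each record is a (timestamp, ip, query) tuple; this layout is assumed.
    """
    result = []
    for timestamp, ip, query in records:
        if now - timestamp > RETENTION:
            # Zero the final octet so the address no longer points to one host.
            ip = ".".join(ip.split(".")[:3] + ["0"])
        result.append((timestamp, ip, query))
    return result


records = [
    (datetime(2006, 1, 1), "198.51.100.23", "first query"),
    (datetime(2007, 7, 1), "198.51.100.24", "second query"),
]
cleaned = anonymize_old(records, now=datetime(2007, 8, 8))
```

Note that in this sketch the full address still exists for the whole retention window, and the query text is kept indefinitely, so the records remain available to anyone with a subpoena in the meantime.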

Ask.com went further, offering a service called AskEraser which it claims would allow for anonymous web surfing, and where "the company claims it will not retain the search histories of customers who opt in for the AskEraser".

Which brings us back to where we started. Just because you promise NOT to collect or retain records, doesn't mean that you won't be required to collect and maintain them. Even if you don't have technology readily available to capture data streaming through your network, if the information is stored there briefly, you may be required to capture it.

Sure, you can try anonymizing technologies, but these usually work by NOT LOGGING data, which as we learned with TorrentSpy doesn't always work. What we need is a commonsense approach to what really is a record that is stored by a company, as opposed to log data which COULD be stored by a company.

This article originally appeared in Security Focus.

Copyright © 2007, SecurityFocus