The Register® — Biting the hand that feeds IT

Feeds

Guidelines needed to protect anonymity

It's an information free for all

Regcast training : Hyper-V 3.0, VM high availability and disaster recovery

In early August, officials at America Online released information about searches being conducted by AOL members and users of the AOL search tool. This historical data was released onto the internet by several AOL officials to demonstrate how useful such data could be for tracking patterns, uses and interest of AOL members.

The data was anonymised, with members being assigned random ID numbers instead of userid's or names, and was only online for a few days.

The New York Times demonstrated, however, how easy it was to take that anonymised data, and with a few keystrokes, determine the identity of the searcher, and their personal interests, likes and dislikes – indeed to create a profile of users from this anonymized data.

The persons responsible for the "data breach" at AOL were fired – more for a public relations problem than anything else. The case demonstrates how any database, once collected, can be misused, and the significant lack of legal protection for similar information.

Personally identifiable

Privacy laws, both in the United States and abroad generally protect the collection, dissemination and use of "personally identifiable information" of various types and classes. This includes, for example things such as identifiable banking or financial information, personal health information, credit card or payment card information, and personal communications (for example, contents of emails).

Aggregated information on the other hand is not generally afforded the same level of protection. Thus, information about trends, overall internet use, health care utilisation, overall buying patterns, and the like is generally treated as the property of the institution that creates, collects, stores or collates this information.

If it is easy to convert the aggregate information into identifiable information, it may be afforded some level of protection, or may still be treated as identifiable information.

For many companies, there is a blurring of the lines between personal information (that is information about ME) and aggregate information. So, for example, Google collects information about every single thing I look for – every search request, the contents of everything delivered, what I click on, where I go from there.

It keeps both the aggregate information (how many people buy stuff off those ads on the side) and the personal information (tell me everything YOU have looked at this month). The aggregated information is analysed, processed, sold, and used by Google to increase advertising revenue, do load balancing – all kinds of things.

The same is true of ISPs and ecommerce sites. They collect and analyse massive amounts of information about even the most intimate details about you – who you chat with, who you email, what you read, what you post, and potentially even the source, destination and length of your VoIP calls.

Unless they have agreed not to in a Terms of Service agreement, there is virtually nothing preventing them from using this data, in an aggregated and "anonymous" fashion, and very little preventing them from using it otherwise.

Governments – particularly the US government – have taken advantage of this fact to attempt to obtain massive amounts of information. For example, during the course of litigation involving the government's efforts to prohibit materials that are "harmful to minors" the US government subpoenaed from the largest search companies (Yahoo!, MSN, and Google) massive amounts of such aggregate information.

When they got the cooperation of various telephone companies to turn over massive amounts of telephone calling records (non-content information) they apparently argued that such aggregated information (in that case not anonymised) was not entitled to legal protection.

The problem is, as The New York Times learned, it is relatively easy to convert this anonymised information into pointers to learn its source.

Agentless Backup is Not a Myth

More from The Register

 breaking news
Number of cops abusing Police National Computer access on the rise
Only a telegram from the Queen can get you off it
 breaking news
NSA PRISM snoop-gate: Won't someone think of the children, wails Apple
10,000 things probed, mostly about missing kids, Alzheimer patients, we're told
Flash flaw potentially makes every webcam or laptop a PEEPHOLE
But it's a Google problem - Chrome only, insists Adobe
 breaking news
NSA PRISM-gate: Relax, GCHQ spooks 'keep us safe', says Cameron
Whatever they are up to, it's all above board, we're told
 breaking news
Yahoo! joins! rivals! in! PRISM! data! request! admission!
Keep calm and carry on using American tech firms, folks
PRISM snitch claims NSA hacked Chinese targets since 2009
Snowden suddenly looks safer in Hong Kong after revelations
 breaking news
US chief spook: Look, we only want to spy on 6.66 BEELLLION of you
Americans assured they are not in the NSA's sights
Speech-to-text drives motorists to distraction
Will talking to you mean I crash into that car up ahead, Siri?
DHS warns of vulns in hospital medical equipment
Has your doctor's anasthesia machine been hacked?
 breaking news
'BadNews is malware' says outfit that found it
Google says code harmless but Lookout says code base is evolving