Original URL: https://www.theregister.com/2008/12/17/yahoo_anonymization_explained/

Yahoo! mocks Google Privacy Theatre

Less-nonsensical anonymization

By Cade Metz

Posted in Software, 17th December 2008 21:54 GMT

Analysis The privacy gap between Yahoo! and Google is greater than you think. It's not just that Yahoo! will anonymize user search data six months before Google anonymizes user search data. It's that Yahoo! anonymization is less nonsensical than Google anonymization.

Today, as we dutifully reported, Yahoo! said it would anonymize user search data within a mere 90 days (with exceptions for fraud, security, and legal obligations). It even agreed to extend this unprecedented policy to page views, page clicks, ad views, and ad clicks.

Of course, anonymization is a meaningless word. But it would seem that Yahoo!'s use of the term isn't nearly as misleading as Google's. When Yahoo! says it will anonymize log data, it intends to:

  • Delete the final octet of the user's IP address
  • Run the user's Yahoo! ID through a one-way secret hash and delete the last 50 per cent of the hashed identifier
  • Run the user's cookie identifiers through a one-way secret hash
  • Filter all personally identifiable information - such as credit card numbers, social security numbers, and non-popular names - from search queries
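For the curious, the first three steps might look something like the Python sketch below. It is only an illustration: Yahoo! has not published its hash function or key handling, so the HMAC-SHA256 construction, the `SECRET_KEY` value, and every function name here are assumptions. The query-filtering step is left out, since nothing has been said about how it works.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical secret; a real deployment would keep this out of source code.
SECRET_KEY = b"hypothetical-secret-key"


def drop_final_octet(ip: str) -> str:
    """Validate an IPv4 address and return it without its final octet."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(octets[:3])


def keyed_hash(value: str) -> str:
    """One-way keyed hash (HMAC-SHA256 is an assumption, not Yahoo!'s stated choice)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_user_id(user_id: str) -> str:
    """Hash the ID, then discard the last 50 per cent of the hashed identifier."""
    digest = keyed_hash(user_id)
    return digest[: len(digest) // 2]


def anonymize_cookie(cookie_id: str) -> str:
    """Hash the cookie identifier; the full digest is kept."""
    return keyed_hash(cookie_id)
```

Note that the hashed cookie is still a stable identifier: the same cookie always produces the same digest, so log entries remain linkable to each other even though the original value is gone.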

In its lust for targeted advertising and who knows what else, Yahoo! has stopped short of true anonymization: deleting IPs, IDs, and cookie info entirely. Recreating this data isn't beyond the realm of possibility. But at Google, recreation is trivial.

The Mountain View Chocolate Factory says it will - at some unspecified point in the future - anonymize user data after nine months. But it takes some additional liberties with the word "anonymize".

With its nine-month anonymization, Google intends to "change some of the bits" in the user IPs stored on its servers. But that's it. The plan would leave cookie data alone.

And that means IPs are easily restored.

Google may erase certain IP bits on your nine-month-old search queries, but those bits will remain intact on newer queries - and both sets of queries will carry the same cookie info. Recovering the missing bits on older data is a one-step process.
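That one step is a lookup on the cookie. The Python sketch below shows the idea under loudly stated assumptions: the record layout, the field names, and the notion that the "changed bits" are a zeroed final octet are all hypothetical, since Google has not said which bits it alters. It also assumes the user's IP has not changed between the old and the new queries.

```python
def restore_ips(old_records, new_records):
    """Fill in masked IPs on old records using newer records with the same cookie.

    Each record is a dict with hypothetical 'cookie' and 'ip' keys.
    Old records carry a masked IP (final octet zeroed, by assumption).
    """
    # Map each cookie to the most recent full IP seen with it.
    cookie_to_ip = {rec["cookie"]: rec["ip"] for rec in new_records}

    restored = []
    for rec in old_records:
        candidate = cookie_to_ip.get(rec["cookie"])
        # Accept the candidate only if it agrees with the bits that survived masking.
        if candidate and candidate.split(".")[:3] == rec["ip"].split(".")[:3]:
            restored.append({**rec, "ip": candidate})
        else:
            restored.append(dict(rec))
    return restored


old = [
    {"cookie": "abc", "ip": "198.51.100.0", "query": "old query"},
    {"cookie": "zzz", "ip": "203.0.113.0", "query": "another old query"},
]
new = [{"cookie": "abc", "ip": "198.51.100.23", "query": "new query"}]
result = restore_ips(old, new)
```

In this toy data, the record with cookie `abc` gets its full address back, while the record with cookie `zzz` stays masked because no newer entry shares its cookie.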

After 18 months, Google does alter cookie data - in some unspecified way. And the company argues that users have the power to scrub their own cookies before then. "We have focused on IP addresses, because we recognize that users cannot control IP addresses in logs," the company has told us. "On the other hand, users can control their cookies.

"When a user clears cookies, s/he will effectively break any link between the cleared cookie and our raw IP logs once those logs hit the 9-month anonymization point. Moreover, we are still continuing to focus on ways to help users exert better controls over their cookies."

Of course, most users don't even know what a cookie is.

Plus, Google has not said it will disassociate search queries from your Google ID - required for using Google services such as Gmail or Google Docs and Spreadsheets.

In September, Google also said it might tweak its nine-month policies. But today, in an email, the ad broker provided no update. At the moment, it's unclear when Google will even begin its nine-month IP doctoring.

But the company wants you to know it takes privacy very seriously. "We aim to strike the appropriate balance between protecting our users' privacy and offering them benefits of data retention, such as better security measures and new innovations," it said.

It did not mention advertising.

Yes, Yahoo! is balancing as well. But the wounded web portal has gone significantly further than Google to protect its users from hacks, subpoenas, and, yes, national security letters. The rub is that Yahoo! handles about 20 per cent of US search traffic - and Google commands 70. ®