Original URL: http://www.theregister.co.uk/2006/11/08/guilty_associations/

"I have nothing to hide" - or the Sainsbury's Lesson

Guilt by association

By Guy Kewney

Posted in Policy, 8th November 2006 10:56 GMT

Comment How frightened would you be if you were secretly planning to get pregnant, without telling your husband, and discovered that someone had written to him telling him about it? Or, put the other way, how would you feel if you discovered your wife was pregnant only when someone dropped you a letter?

And who would that person be? It would have to be someone like Robbie "Cracker" Coltrane, right? A deep profiler with the power of a hypnotist...? Or it would have to be a Government spook. They could do it, surely, the way the world's spooks monitor all of us: easy. They tap our phones, right?

Well, the thing to remember is that it really did happen. And no, if you thought it was phone tapping, you're wrong, and it's a misunderstanding I won't make any friends by clearing up. But you need to understand the basic principles of data mining to understand why the world of spooks and the world of search engines are about to overlap, and why you should be nervous about this.

The lesson here is one I call "The Sainsbury's Lesson" when doing presentations for technical audiences, because I was taught this by a data miner who worked for the giant British supermarket of that name.

The story, summarised, is that Sainsbury's was spending an absurd amount of money sending people promotional coupons, money-off special offers, and other junk mail to encourage them to swing by the Sainsbury's supermarket next time, rather than Waitrose or Safeway or Asda - and it was pretty hard to be sure it was actually doing any good.

The trouble was simple: they were sending girly shampoo promotions to households with six rugby-playing male students, or home improvement promotions to households with one elderly pensioner with osteoporosis, or bulk beer deals to households where they were all strictly teetotal. Not profitable stuff. And their IT staff heard about this and said: "But you don't have to do that!"

This goes back a bit before the days of Nectar, when Sainsbury's had its own loyalty card, plus it sold fuel out of its own petrol stations and ran a bank and a credit card. And the IT people said: "If you know what sort of things someone buys you can make a pretty good guess about what they may want to buy next."

The beauty of the system was that data mining requires no intellectual engagement. The Sainsbury's Lesson could be called the Amazon Lesson, or the Tivo Lesson for that matter. All you do is look for patterns. The more frequently you find a pattern, the better it is to guide you.

Take a commuter travelling from London to Reading. They can get into a car and drive. They can get into a taxi and be driven. They can take a bus. But most will get onto a train, with a choice of catching the Great Western service from Paddington, or the slower SouthWest service from Waterloo. If you want to catch them before they get to Reading, you aren't going to do it by chance.

But suppose you get information that says that they have passed through Ascot, Sunningdale, Virginia Water, and Egham? If you know the route, you know that the next station, going west, will be Staines. Now, it's perfectly possible that they aren't going to be at Staines next. They might get off the train at Egham, take a cab to the station beyond Staines, and rejoin the train there... but seriously, what are the chances of that?
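For readers who want to see how little machinery that guess needs, here is a minimal sketch in Python. It assumes you already hold a log of past journeys as ordered lists of stops. The journeys, the "Taxi" entry and the function names are all made up for illustration, and nothing here is claimed to be what any real system does. It simply counts which stop has followed which, then picks the most common successor.

```python
from collections import Counter, defaultdict


def build_transitions(journeys):
    """Count how often each stop was followed by each other stop."""
    transitions = defaultdict(Counter)
    for journey in journeys:
        # Pair every stop with the one that came straight after it.
        for here, following in zip(journey, journey[1:]):
            transitions[here][following] += 1
    return transitions


def predict_next(transitions, current_stop):
    """Return (most likely next stop, share of past journeys), or None."""
    counts = transitions.get(current_stop)
    if not counts:
        return None  # this stop has never been seen with a successor
    stop, seen = counts.most_common(1)[0]
    return stop, seen / sum(counts.values())


# Hypothetical journey log, invented for this example.
journeys = [
    ["Ascot", "Sunningdale", "Virginia Water", "Egham", "Staines"],
    ["Ascot", "Sunningdale", "Virginia Water", "Egham", "Staines"],
    ["Sunningdale", "Virginia Water", "Egham", "Staines"],
    ["Virginia Water", "Egham", "Taxi"],  # the rare one who hops in a cab
]

transitions = build_transitions(journeys)
prediction = predict_next(transitions, "Egham")
print(prediction)
```

With this invented log, three of the four travellers seen at Egham turned up at Staines next, so that is the guess. The cab rider is in the data, but is simply outvoted.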

The same patterns show that we have choices, sure, in our purchasing life...but that, in fact, most of us do follow the same paths others do.

So Sainsbury's was able to discard 95 per cent of the potential households in its area and say: "They aren't going that way. They might buy boot polish, but it's so unlikely as to be a complete waste of money putting a Kiwi voucher through the door."

On the other hand, the patterns might show quite clearly that 5,000 people bought something called "product A" and "product B" and "product C" and went on to buy "product D" three weeks later. So if you can dig out all the people who bought A, B and C but have not yet bought D, and mail them, you're clearly looking at a common pattern. So drop them a discount voucher for D!
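As a rough illustration only, the "bought A, B and C but not yet D" query might look like the Python sketch below. The card numbers, the baskets and the function names are hypothetical, and the sketch ignores timing (the "three weeks later" part), which a real project would have to handle. It is not a description of what Sainsbury's actually ran.

```python
def voucher_targets(baskets, antecedents, target):
    """Customers who bought every antecedent product but not the target."""
    wanted = set(antecedents)
    return sorted(
        customer
        for customer, items in baskets.items()
        if wanted <= items and target not in items
    )


def pattern_strength(baskets, antecedents, target):
    """Share of customers with all the antecedents who also bought the target."""
    wanted = set(antecedents)
    matching = [items for items in baskets.values() if wanted <= items]
    if not matching:
        return 0.0
    return sum(target in items for items in matching) / len(matching)


# Hypothetical loyalty-card purchase histories, invented for this example.
baskets = {
    "card-001": {"A", "B", "C", "D"},
    "card-002": {"A", "B", "C", "D"},
    "card-003": {"A", "B", "C"},  # fits the pattern, has not bought D yet
    "card-004": {"A", "B"},       # missing C, so not on the path
    "card-005": {"C", "D"},
}

targets = voucher_targets(baskets, ["A", "B", "C"], "D")
strength = pattern_strength(baskets, ["A", "B", "C"], "D")
print(targets, round(strength, 2))
```

Note that nothing in the code knows or cares what A, B, C or D actually are. That is the point: the query needs no understanding of the products, only of the pattern.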

And D turns out to be baby clothes, and the man of the household is vastly amused: "Darling, look at this! Those idiots at Sainsbury's are sending us baby clothes vouchers! As if we would want to start a family!" - and she says, turning pink and coughing: "Dear, I've been meaning to tell you, I've missed my last period..."

And this actually happened. As you can imagine, the angry letters poured in: "Who told you...!?" and the project was subtly modified.

That's how spooks work. They don't care what you're talking about until they know you're a threat. They work out whether you're a threat or not by looking for patterns (and in deference to my friends in Cheltenham, I won't go into much more detail) in your ordinary, everyday behaviour, which betrays who your friends are. Most of that information is stuff you'd never think to make secret.

And this is the important part - you'd never think to ask Parliament to say: "Protect me from this!"

Now, your credit card records are commercially confidential, but are they a State-protected secret? Your electricity bill payments, your bank statements, your phone call bills, your email address book...your Oyster card records for the Underground, your fuel purchases...is any of that prohibited for Government databases?

Why would you care?

Now, look at the success of Touching the Void. It languished on the remainder list for two years until Amazon started telling people who read a different book on mountain climbing: "We see that people who read what you're reading, enjoyed this!"

Or look at the way Tivo works. It matches the sort of TV you actually watch with what you record, and actually anticipates your interest in new series which you haven't even heard of yet, based on viewing patterns which it recognises.

Where am I going with this?

Imagine that you have an unfortunate trend in your thinking. Imagine that you find yourself following a train of thought which matches a sequence which has been seen before among enemies of the Government.

"Great!" you say. "This means that the enforcement people can actually anticipate terrorist acts? Spot potential terrorists before they even realise they're on the path to mayhem?"

No. Not terrorists, just enemies. Hostile or critical journalists, campaigning lobbyists, businessmen who are likely to sponsor rival parties, people who oppose the party leader's favourite idea of the year.

You, yourself, possibly don't realise where you're heading. They do. They can now take action, as appropriate.

Appropriate action? Well, they could, of course, target you for a hearts and minds educational campaign so that you see things from a better point of view. Or it could be hostility: arrange a Customs or VAT inspection, or start a smear campaign, or search aggressively for accounting irregularities.

All based on guilt by association. They may not know exactly what is in your heart. But they know who you associate with, and what other people are doing who went down that path. All that is an open book to someone with access to all your email, all your search queries, all the sites you visited, all the phone calls you made, all the books you bought or ordered from the library. "People who read this, also bought..."

You should go to Amazon and tweak its database. Give it a chance to work out which books you bought for Auntie Ena, which records you gave to your mother, which bits of hardware you got as a present for a kid, and which were things you wanted for you. And then rate them. You will find the next "we have recommendations for you...!" message spookily prescient.

And the search engines have all this, in spades. Exactly where it's going would be the subject of a $4,000 research paper. But you can bet that what they exploit today is a tiny fraction of what they can, potentially, dig out of the online data.

Of all the big search engine companies, I'd trust Google more than the rest. It has a great track record of telling the US Government to piss off when asked for data on users' search patterns, whereas the likes of MSN and Yahoo! not only provided the data, but didn't even complain about the request.

But does it matter? Frankly, the more I think about what this data mining makes possible, the more I realise it needs specific legislation: not just making it a crime to provide this sort of data to government, but forbidding governments to keep it.

I think that the UK and US governments in the last 10 years have shown, unambiguously, that any sort of protest, disagreement, or criticism of government is something they feel they can legitimately suppress.

If they can detect protest, disagreement, or criticism before the critic reaches the point of taking effective action, they can do so without anybody noticing. If they could have caught Brian Haw before he set up in the square in front of Parliament, how much less embarrassing it would have been!

Would they do something like that?

Frankly, anybody who can actually pass a law saying you can't open a newspaper with a critical article about the government outside Parliament, would - in my opinion - do just about anything. They believe that what they are doing is RIGHT and anybody who opposes them is BAD. Why would they not try to stop us?

The genie, unfortunately, is out of the bottle.

The data that makes humans predictable is widely available and easy to aggregate. If the data exists, and the technology is simple, then it will be used. If it becomes overtly illegal to let government have the data, then perhaps we could be safe...maybe.

But frankly, anybody who owns a credit card, bank account and a passport probably has enough data out there already to make them pretty predictable.

Don't worry about Google giving your geographical position away to corner shop coffee sellers. Worry about the governments, who will have tools of oppression that make simple Stalinesque brutality look like freedom, and who will make "pre-crime" a reality. ®