Phorm launches data pimping fight back
CEO Kent Ertegrul on spyware, bullshit and opting-out
Interview
A week is a long time in internets. Last Friday we all felt like we were shouting at the bins about Phorm and its deals with BT, Virgin Media, and Carphone Warehouse.
Now you can't move for stories about data pimping, about the massive change Phorm represents in people's relationship with their ISP, and about the new legal turf we're being dragged on to.
The advertising targeting firm has now launched an impressive rearguard action aimed at soothing the controversy, with CEO and former MIG joyride salesman Kent Ertegrul pimping himself out across any media outlet that takes any notice of what the online public cares about (see Bootnote). We met up with him and Phorm's top boffin Marc Burgess on Wednesday afternoon at Phorm HQ (just around the corner from El Reg's new digs).
We're told Ertegrul did his own video webcast last night, but it doesn't seem to have been archived, so in what follows you'll have to imagine his Bond villain mid-Atlantic accent for yourself. There are four pages to slog through, but we hope you'll agree it's worth it.
El Reg: Can you explain the history of Phorm and how you were linked to what security experts describe as spyware?
Kent Ertegrul: We started off creating a toolbar. The toolbar was kind of a social browsing concept. Wherever you browsed it would show you people who browsed to the same website. And then what you would do is click and chat with those people, so it was like social networking based on where you were browsing.
[Picture: Kent 'I am not the Prince of Darkness' Ertegrul, with friend (artist's impression)]
We concluded that the best way to monetise that was advertising. And because we were aware through the toolbar where people were browsing we started building an ad server that allowed you to show ads based on that.
That grew very well and then we saw an opportunity to take the ad technology that we built and bundle it with freeware applications. So that's how we got into the adware business - as opposed to the spyware business.
Things grew more and we went public, in fact we were the only public adware business, with shareholders like Fidelity and Morgan Stanley. Our [non-executive] chairman is the former chairman of Microsoft UK [David Dornan]. There's nothing shady.
But what happened was it became very clear to us that there was no distinction in people's minds between adware - which is legitimate - and spyware. So we did something unprecedented which was we turned around to our shareholders and we shut down all our revenues. We weren't sued, we weren't pressed by anyone, we just said "this is not consistent with the company's core objectives".
So PeopleOnPage was the original toolbar. When we took that public we were 121Media. When we decided to shut that down we became Phorm.
El Reg: Explain for our readers how Phorm's profiling system works.
Marc Burgess: What the profiler does is it first cleans the data. It's looking at two sets of information: the information in the request that's sent to the website and then information in the page that comes back.
From the request it pulls out the URL, and if that URL belongs to a well-known search engine such as Google or Yahoo! it'll also look for the search terms that are in the request.
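The request-side step MB describes can be sketched in a few lines of Python. This is a minimal illustration, not Phorm's code: it assumes the search terms sit in the `q` query parameter for Google and `p` for Yahoo!, and the host table and the `extract_request_info` helper are hypothetical.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping of search-engine hostnames to the query-string
# parameter that carries the search terms (an assumption for illustration).
SEARCH_ENGINES = {
    "www.google.com": "q",
    "www.google.co.uk": "q",
    "search.yahoo.com": "p",
}


def extract_request_info(url: str) -> tuple[str, list[str]]:
    """Return the URL and, for a recognised search engine, its search terms."""
    parts = urlsplit(url)
    param = SEARCH_ENGINES.get(parts.hostname or "")
    if param is None:
        # Not a recognised search engine: only the URL itself is kept.
        return url, []
    # parse_qs decodes "+" to spaces; split each value into single terms.
    values = parse_qs(parts.query).get(param, [])
    terms = [term for value in values for term in value.split()]
    return url, terms


url, terms = extract_request_info("https://www.google.com/search?q=cheap+flights+rome")
print(terms)  # ['cheap', 'flights', 'rome']
```

Any other site yields the URL with an empty list of terms, matching the "only for well-known search engines" rule.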
And then from the information returned by the website, the profiler looks at the content. The first thing it does is ignore several classes of information that could potentially be sensitive. So there are no form fields, no numbers, no email addresses (that is, anything containing an "@") and nothing containing a title like Mr or Mrs.
El Reg: Aren't you collecting the first three characters?
MB: Because of a peculiarity of the tokenisation, numbers three digits or shorter aren't collected anyway; they're too short, so there are no numbers at all. If you have a mixture of letters and numbers - a compound - that could potentially be collected.
El Reg: Say, for example, the start of a postcode?
KE: But as you'll see it's irrelevant anyway.
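The cleaning rules MB lists, plus the tokenisation quirk, can be sketched roughly as follows. Everything here is assumed: the four-character minimum token length is one reading of "too short", treating the word after a title as a name is a guess at "anything containing a title", and form fields are not modelled (that would need HTML parsing). `clean_tokens` is a hypothetical helper.

```python
import re

MIN_TOKEN_LENGTH = 4                  # assumption: three characters or fewer is "too short"
TITLES = {"mr", "mrs", "ms", "miss"}  # the interview names Mr and Mrs; the rest are assumed
NUMBER = re.compile(r"^\d[\d.,]*$")   # a token made only of digits (with , or .)


def clean_tokens(text: str) -> list[str]:
    """Split page text on whitespace and drop the sensitive classes described."""
    kept: list[str] = []
    skip_name = False
    for raw in text.split():
        token = raw.strip(".,;:!?()\"'")
        if skip_name:
            # Assumption: the word after a title is a name and is dropped too.
            skip_name = False
            continue
        if token.lower() in TITLES:
            skip_name = True
            continue
        if "@" in token or NUMBER.match(token):
            continue  # e-mail addresses and pure numbers
        if len(token) < MIN_TOKEN_LENGTH:
            continue  # too short to be collected, e.g. "250" or "1AA"
        kept.append(token)
    return kept


print(clean_tokens("Dear Mr. Smith, send 250 or 12345 to bob@example.com at SW1A 1AA by Friday."))
# ['Dear', 'send', 'SW1A', 'Friday']
```

Under these assumptions the first half of a postcode ("SW1A") survives as a letters-and-digits compound, which is the case El Reg raises; whether it matters depends on the later frequency step.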
MB: So we do this basic cleaning process, then we take the key words that have come from the page and eliminate "noise words" that have little intrinsic meaning. What we're left with is a clean version of the page's key words, from which we chart the ten most commonly occurring words.
This process largely eliminates personally identifiable information [PII] from the web page, because to get through, a piece of PII would have to slip past all of our filtering criteria and also appear repeatedly in the page.
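The noise-word removal and "top ten" chart can be sketched with Python's `collections.Counter`. The noise-word list is a tiny hypothetical stand-in, and `data_digest` is an illustrative name, not Phorm's.

```python
from collections import Counter

# A tiny, hypothetical noise-word list; a real system would use a far larger one.
NOISE_WORDS = {"the", "and", "with", "this", "that", "from", "your", "have"}


def data_digest(tokens: list[str], size: int = 10) -> list[str]:
    """Return the `size` most frequent non-noise keywords, most common first.

    Ties keep the order in which words were first seen, which
    Counter.most_common guarantees on Python 3.7+.
    """
    counts = Counter(t.lower() for t in tokens if t.lower() not in NOISE_WORDS)
    return [word for word, _ in counts.most_common(size)]


tokens = ["flights", "Rome", "this", "hotels", "flights",
          "Smith", "Rome", "flights", "deals", "this"]
print(data_digest(tokens, size=3))  # ['flights', 'rome', 'hotels']
```

With a digest size of three standing in for ten on this short example, the once-mentioned "Smith" does not make the cut. This is the repetition effect MB describes; on a page with fewer distinct key words than the limit, a single mention could still appear.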
The profiler takes this "data digest" and it passes it through the box we call the anonymiser and into the box we call the channel server. The channel server has got a database of advertising categories that we call channels - things like sport, health and beauty, travel, luxury cars, etc. The channels are global to the whole system [across ISP networks]. Via the Open Internet Exchange advertisers are able to specify the channels they want to target.
What the channels can contain is controlled. We don't have adult advertising, and there's no medical channel, no tobacco, no gambling. The channels are also designed so they always match a minimum number of unique users - 5,000. A channel has to be broad enough that it doesn't reduce to just one or two users.
As soon as that match has been made the data digest, which has only ever been in memory, is immediately deleted. It never goes to disk.
KE: This is the single most important piece of this because this is a big story but it's not the story that you think it is...
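The channel rules and the match-then-discard step can be sketched together. Only the excluded categories and the 5,000-user floor come from the interview. The `Channel` record, the keyword-overlap matching rule and the sample channels are assumptions, since Phorm didn't say how a digest is scored against a channel. The sketch shows data flow only: Python releases the list once no references remain but does not wipe the memory.

```python
from dataclasses import dataclass

# Rules quoted in the interview; the labels and data layout are assumptions.
BANNED_CATEGORIES = {"adult", "medical", "tobacco", "gambling"}
MIN_UNIQUE_USERS = 5_000


@dataclass(frozen=True)
class Channel:
    name: str
    category: str
    keywords: frozenset[str]
    unique_users: int  # hypothetical estimate of how many users the channel reaches


def is_allowed(channel: Channel) -> bool:
    """Reject banned categories and channels too narrow to hide individuals."""
    return (
        channel.category not in BANNED_CATEGORIES
        and channel.unique_users >= MIN_UNIQUE_USERS
    )


def match_channels(digest: list[str], channels: list[Channel]) -> set[str]:
    """Return names of allowed channels that share a keyword with the digest.

    Only channel names are returned; nothing here stores the digest or
    writes it to disk.
    """
    words = set(digest)
    return {c.name for c in channels if is_allowed(c) and words & c.keywords}


channels = [
    Channel("Travel", "travel", frozenset({"flights", "hotels"}), 120_000),
    Channel("Poker", "gambling", frozenset({"poker"}), 90_000),
    Channel("Yurts", "travel", frozenset({"yurt"}), 800),
]
digest = ["flights", "rome", "poker", "yurt"]
matched = match_channels(digest, channels)
del digest  # drop the only reference; CPython frees the list but does not wipe the bytes
print(matched)  # {'Travel'}
```

The gambling channel is excluded outright, and the yurt channel falls below the 5,000-user floor, so only the broad travel channel survives the match.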
[EDIT: We emailed Marc Burgess after the interview to ask what effect the system will have on the performance of your broadband. He replied: "The system is designed not to have any adverse impact on the connection performance, with no difference whether you are opted in or out."]