AVG scanner blasts internet with fake traffic
You're not that popular after all
Exclusive Early last month, webmasters here at The Reg noticed an unexpected spike in our site traffic. Suddenly, we had far more readers than ever before, and they were reading at a record clip. Visits actually doubled on certain landing pages, and more than a few ho-hum stories attracted an audience worthy of a Pulitzer Prize winner. Or so it seemed.
As it turns out, much of this traffic was driven by the new malware scanner from AVG Technologies.
Six months ago, AVG acquired Exploit Prevention Labs and its LinkScanner, a tool that automatically scans search engine results before you click on them. If you search Google, for instance, and ten results turn up, it visits all ten links to ensure they're malware free.
Then, in February, AVG paired LinkScanner with its anti-virus engine, which has about 70 million active users worldwide. The company estimates that 20 million machines have upgraded to its new security suite, AVG version 8, and this has already cooked up enough ghost clicks to skew traffic not only on The Reg but on any number of other sites as well.
Adam Beale, who runs a UK-based internet consultancy, says that across his small stable of clients, traffic has spiked as much as 80 per cent on some sites. And this is more than just an inconvenience. After all, sites live and die by their traffic numbers. And net resources aren't free.
"Although [the AVG LinkScanner] might be good for the security of users, it's a real pain for website owners and webmasters," Beale tells us, having blogged about this growing problem. "It's causing people to think their traffic is increasing, costing those who pay for bandwidth, and wasting disk space with large amounts of unnecessary lines in log files."
One of his clients, Beale says, normally pulls in 140GB of bandwidth a month, and for June, he predicts a 5 per cent jump.
When we spoke to AVG chief of research Roger Thompson earlier this week, he was unaware of these issues. But he defended the role of LinkScanner, which he designed while serving as CTO of Exploit Prevention Labs.
"There's so much hacking activity going on the web. The only way to really tell what's there is to go and have a look," he told us. "I don't want to sound flip about this, but if you want to make omelettes, you have to break some eggs."
But what about webmasters?
Webmasters deal with robot traffic and other rogue visits all the time. But this is a little different. In an effort to fool even the sneakiest malware exploits, LinkScanner does its best to imitate real user clicks - which means most webmasters are completely unaware of the problem.
At the moment, there is a way of filtering AVG traffic from log files. But it's unclear whether this method would bag legitimate traffic as well. And Thompson suggests that - in the name of high security - AVG may make changes that prevent such filtering.
That could destroy web analytics as we know it.
"A situation like this where there is in effect false traffic, where something is generating what is bogus data, leads to wrong budget decisions and marketing activities," says Barry Parshall, director of product management at WebTrends, a popular web analytics firm. "I completely get the value proposition [of LinkScanner], but it would be responsible of them to identify themselves, with agent code or whatever it might be, so legitimate businesses can serve their customers properly."
The AVG LinkScanner scans search results on Google, Yahoo!, and Microsoft's Live Search. And unlike similar technology from ScanSafe, it doesn't mask the user's IP address.
"In order to detect the really tricky - and by association, the most important - malicious content, we need to look just like a browser driven by a human being," Thompson told us. According to Thompson, nearly all web exploit toolkits track IP addresses, and they won't serve the same exploit twice to the same address.
Thus, when a scan turns up in a web site's log file, it looks an awful lot like a legitimate user visit. Thompson points out that AVG only scans the first page of results on sites like Google - unless the user clicks on subsequent pages. But clearly, with 20 million web surfers on board, even this is enough to wreak havoc with sites that so often pop to the top of the leading search engines.
In the discussion forums at Webmaster World, site gurus have pinpointed a specific user agent that betrays the anti-malware tool: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)." Hoping to fix their traffic numbers, they're filtering this agent from their log files. And at least one webmaster is serving up a dummy file each time the agent turns up. That way, he burns less bandwidth.
"I prefer to feed the same dummy file to most robotic tools," says the webmaster of TheSilhouettes.org. "There are thousands of them crawling around out there and they have voracious appetites. Only a handful are of any benefit to me, the rest are a nuisance and many are actually malicious."
Of course, this is the last thing Roger Thompson wants. He acknowledged that an AVG scan can be identified with that user agent. But he indicated this may change.
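For as long as the scanner keeps that user agent, the log filtering described in the forums could look something like the sketch below. It is an illustration only, not the method any particular webmaster uses. It assumes logs in the common Apache "combined" format, where the user agent is the last quoted field, and it assumes the trailing full stop in the quoted string is punctuation rather than part of the agent. The function names and sample log lines are made up for the example. Note that an exact match like this would also discard any genuine browser that happens to send the same string.

```python
import re

# User agent reported in the Webmaster World forums, as quoted in the article.
# Assumption: the full stop after the closing bracket is not part of it.
AVG_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Assumption: Apache "combined" log format, where the user agent is the
# last double-quoted field on each line.
LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')


def user_agent(line):
    """Return the last quoted field of a log line, or None if there is none."""
    match = LAST_QUOTED_FIELD.search(line)
    return match.group(1) if match else None


def split_log(lines):
    """Separate log lines into (kept, dropped) by exact user-agent match."""
    kept, dropped = [], []
    for line in lines:
        if user_agent(line) == AVG_AGENT:
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


# Hypothetical sample lines, invented for illustration.
sample = [
    '203.0.113.5 - - [20/Jun/2008:10:00:00 +0000] "GET /story.html HTTP/1.1" '
    '200 5120 "http://www.google.com/search?q=avg" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"',
    '203.0.113.9 - - [20/Jun/2008:10:00:02 +0000] "GET /story.html HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9) '
    'Gecko/2008052906 Firefox/3.0"',
]

kept, dropped = split_log(sample)
print(len(kept), "lines kept,", len(dropped), "lines dropped")
```

If the agent string changes, as Thompson suggests it might, a filter of this kind stops working without any warning.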
He also said AVG is interested in developing some other solution to webmasters' problems - but this will take a back seat to security. "Our primary responsibility is to provide the best possible protection for our users, but we can and will seek a programmatic solution," he said. "Given that we've only just been alerted to this situation, we're still researching it, and it's not clear what we'll do at this point."
If AVG does mask its user agent - and fails to provide another workaround - its ghost traffic will look exactly like real traffic. And then the web is in trouble. After all, 50 million AVG users have yet to upgrade.
The Reg has always believed in log file analysis. Alternative methods from companies like Comscore and Nielsen seem to underestimate traffic from those who surf on their daytime work machines. Plus, Comscore's software gives us the willies.
We always make an effort to filter robotic clicks from our files. And we use an outside organization, ABCe, to audit our numbers. But if AVG kills that user agent, even ABCe is powerless.
When we contacted ABCe, it was unaware of the problem too, and though it now acknowledges the issue, it's still mulling the solution. "[The AVG LinkScanner] can cause noticeable spikes in traffic levels logged by [a] site," reads its canned statement. "[We] will continue to review this and other technical areas."
And this is separate from the bandwidth problem. Extra bandwidth costs are negligible to The Reg. But as Adam Beale points out, this is a serious burden to smaller sites - and the AVG scanner is certainly hitting smaller sites.
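For a smaller site worried about the bandwidth bill, the dummy-file trick mentioned earlier might be sketched roughly as follows. This is a toy built on Python's standard WSGI tools, not the setup used by any webmaster quoted here. The names `real_site`, `dummy_for_scanner`, and `fetch` are hypothetical, as is the dummy page itself, and the agent string is the one reported in the forums with the trailing full stop assumed to be punctuation.

```python
from wsgiref.util import setup_testing_defaults

# User agent reported in the Webmaster World forums, as quoted in the article.
AVG_AGENT = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Hypothetical tiny page served in place of the real content.
DUMMY_BODY = b"<html><body></body></html>"


def real_site(environ, start_response):
    """Stand-in for the real site: returns a deliberately large page."""
    body = b"<html><body>" + b"full page content " * 200 + b"</body></html>"
    start_response("200 OK", [("Content-Type", "text/html"),
                              ("Content-Length", str(len(body)))])
    return [body]


def dummy_for_scanner(app):
    """Wrap a WSGI app so the matching agent gets the small dummy page."""
    def wrapper(environ, start_response):
        if environ.get("HTTP_USER_AGENT", "") == AVG_AGENT:
            start_response("200 OK", [("Content-Type", "text/html"),
                                      ("Content-Length", str(len(DUMMY_BODY)))])
            return [DUMMY_BODY]
        return app(environ, start_response)
    return wrapper


site = dummy_for_scanner(real_site)


def fetch(agent):
    """Call the wrapped app in-process with the given user agent."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = agent
    return b"".join(site(environ, lambda status, headers: None))


print(len(fetch(AVG_AGENT)), "bytes sent to the scanner")
print(len(fetch("Mozilla/5.0")), "bytes sent to everyone else")
```

The saving depends entirely on the scanner continuing to announce itself with that exact string.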
We can't help but think there's a showdown on the way. ®