The AVG LinkScanner scans search results on Google, Yahoo!, and Microsoft's Live Search. And unlike similar technology from ScanSafe, it doesn't mask the user's IP address.
"In order to detect the really tricky - and by association, the most important - malicious content, we need to look just like a browser driven by a human being," Thompson told us. According to Thompson, nearly all web exploit toolkits track IP addresses, and they won't serve the same exploit twice to the same address.
Thus, when a scan turns up in a web site's log file, it looks an awful lot like a legitimate user visit. Thompson points out that AVG only scans the first page of results on sites like Google - unless the user clicks on subsequent pages. But clearly, with 20 million web surfers on board, even this is enough to wreak havoc with sites that so often pop to the top of the leading search engines.
In the discussion forums at Webmaster World, site gurus have pinpointed a specific user agent that betrays the anti-malware tool: "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)." Hoping to fix their traffic numbers, they're filtering this agent from their log files. And at least one webmaster is serving up a dummy file each time the agent turns up. That way, he burns less bandwidth.
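Both workarounds come down to matching that one user agent string. The Python below is a minimal sketch of the idea, not any webmaster's actual setup: it assumes logs in the Apache "combined" format, where the user agent is the last quoted field, and the names `drop_linkscanner_hits`, `body_for`, and `DUMMY_BODY` are hypothetical. It also stops working the moment AVG changes the agent.

```python
import re

# User agent string reported in the Webmaster World forums.
AVG_LINKSCANNER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;1813)"

# Assumption: Apache "combined" log format, where the last quoted
# field on each line is the user agent.
_LAST_QUOTED_FIELD = re.compile(r'"([^"]*)"\s*$')

# Hypothetical placeholder: a tiny page served instead of the real content.
DUMMY_BODY = b"<html></html>"


def user_agent_of(log_line):
    """Return the final quoted field of a log line, or None if there is none."""
    match = _LAST_QUOTED_FIELD.search(log_line)
    return match.group(1) if match else None


def drop_linkscanner_hits(log_lines):
    """Keep only the log lines whose user agent is not the LinkScanner one."""
    return [line for line in log_lines
            if user_agent_of(line) != AVG_LINKSCANNER_UA]


def body_for(user_agent, real_body):
    """Pick the response body: the dummy for LinkScanner, the real page otherwise."""
    return DUMMY_BODY if user_agent == AVG_LINKSCANNER_UA else real_body
```

The match is exact rather than a substring test, so an ordinary IE6 visitor on Windows XP whose agent lacks the trailing ";1813" is still counted and still gets the real page.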
"I prefer to feed the same dummy file to most robotic tools," says the web master of TheSilhouettes.org. "There are thousands of them crawling around out there and they have voracious appetites. Only a handful are of any benefit to me, the rest are a nuisance and many are actually malicious."
Of course, this is the last thing Roger Thompson wants. He acknowledged that an AVG scan can be identified with that user agent. But he indicated this may change.
He also said AVG is interested in developing some other solution to webmasters' problems - but this will take a back seat to security. "Our primary responsibility is to provide the best possible protection for our users, but we can and will seek a programmatic solution," he said. "Given that we've only just been alerted to this situation, we're still researching it, and it's not clear what we'll do at this point."
If AVG does mask its user agent - and fails to provide another workaround - its ghost traffic will look exactly like real traffic. And then the web is in trouble. After all, 50 million AVG users have yet to upgrade.
The Reg has always believed in log file analysis. Alternative methods from companies like Comscore and Nielsen seem to underestimate traffic from those who surf on their daytime work machines. Plus, Comscore's software gives us the willies.
We always make an effort to filter robotic clicks from our files. And we use an outside organization, ABCe, to audit our numbers. But if AVG kills that user agent, even ABCe is powerless.
When we contacted ABCe, it was unaware of the problem too, and though it now acknowledges the issue, it's still mulling the solution. "[The AVG Linkscanner] can cause noticeable spikes in traffic levels logged by [a] site," reads its canned statement. "[We] will continue to review this and other technical areas."
And this is separate from the bandwidth problem. Extra bandwidth costs are negligible to The Reg. But as Adam Beale points out, this is a serious burden to smaller sites - and the AVG scanner is certainly hitting smaller sites.
We can't help but think there's a showdown on the way. ®
I quite agree with those of you who wrote that AVG is now too bloated, has too many false detections, and has been irresponsibly released with this terrible link scanning tech.
I've used AVG AV for years but sadly these changes in version 8 are just unacceptable. I'd go without AV protection at all before I'd run AVG8.
@ John A Thomson - We didn't need to "learn" what you had to say, it was obvious and thoroughly weighed by others who had enough sense to see the problems with linkscanner far outweigh the dubious benefits.
The obvious answer has already been mentioned: get rid of linkscanner and use a proxy if it's really that important. The issue of infection method over file identification is not relevant; that can be detected after the link is clicked and the content cached locally.
A 2000% increase in bandwidth use!
Having been trying to discover the source of high server loads and spiraling bandwidth use since 24th May, I finally tracked the issue down to this AVG scanner - it has caused a 2000% increase in daily traffic from my server on a reasonably small site. The site usually accounts for just 14GB/month but so far in June we're up to 300GB.
Even worse, the requests aren't to real pages and are all generating 404 errors - literally hundreds of thousands of them. I have had to turn off a custom 404 error page because of this to reduce what my Apache server has to do.
Fantastic. I didn't like AVG before this, but now I am going to actively tell my clients never to use it.
Re: Response from AVG
Well how about mailing back to you the report of bad sites and you can collect them and inform the website owner?
How about scanning as you download, rather than scan-ahead?