Original URL: https://www.theregister.com/2010/11/22/google_instant_previews_skew_web_analytics/

Google 'Instant Previews' hit Google Analytics with fake traffic

Real-time page fetch

By Cade Metz

Posted in Legal, 22nd November 2010 19:21 GMT

Update: This story has been updated with comment from Google, and it has been clarified to show that Google did tell webmasters when "Instant Previews" launched that it would be doing real-time fetches in some cases. We've also added stats concerning the fetches from The Register's site logs.

Google's new "Instant Previews" search tool is skewing traffic stats for sites using Google Analytics, creating page views before pages are actually viewed.

Rolled out across Google's search engine earlier this month, Instant Previews lets searchers, yes, preview sites before they visit them. Users click on a small icon that appears beside a search result, and this launches an image of the site in question on the right-hand side of Google's results page.

As Google pointed out when "Instant Previews" was launched, Google is – in some cases – fetching these previews in real time. Soon after the tool's launch, webmasters posting to Google's help forums noticed that these pre-fetches were skewing Google Analytics numbers. And as noticed by Search Engine Land, a Google employee later confirmed this with a post of his own.

The employee confirms that these real-time fetches are executing JavaScript used by Google Analytics, the company's own web analytics tool, and this is skewing traffic numbers. But he indicates that a fix is on the way. "We're working on a solution for this, to prevent Google Instant Preview on-demand fetches from executing Analytics JavaScript," the Google employee says. "I'm not sure about the timeframe, but I'll drop a note here when I have more to share. Thanks for your patience."

This same employee goes on to reiterate that the preview fetches use their own user agent, so webmasters can filter them out if they're using other analytics methods.

"It is my understanding that these page-views are currently only counted (the Google Analytics JavaScript executed) when we render the preview image on-demand (when a user chooses to view it and when we don't have one cached already)," he says. "Because of that, you may see a temporary change for that particular user-agent. The Analytics and Instant Previews teams are aware of this and looking into a solution.

"If you are using other website metrics tracking solutions, it might make sense to also filter that user-agent out."

The company has now posted a FAQ that details the user agent in question:

Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13

Asked to comment, Google told us: "Webmasters have the ability to control whether Instant Previews are counted as page views. This works in the same way they control how crawls by regular Googlebot count as page views. Instant Previews sometimes gets enough information from Google’s regular crawl. Occasionally, Google will need to refetch this information when the user needs it, and in these situations we will do so using the 'Google Web Preview' useragent. Webmasters can configure their sites to treat this useragent in the same way that they handle crawls by googlebot."
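As a rough illustration of the filtering Google describes, here is a minimal Python sketch for a site running its own page-view counting. It assumes each hit is a dictionary with a "user_agent" key; that shape and the function names are hypothetical, not part of any Google tool. The only detail taken from Google's FAQ is the "Google Web Preview" token in the user agent string.

```python
# Token that appears in the user agent Google lists in its FAQ.
PREVIEW_TOKEN = "Google Web Preview"


def is_instant_preview(user_agent):
    """Return True if the user agent string looks like an Instant Previews fetch."""
    # Case-insensitive substring match; a missing user agent is treated as "not a preview".
    return PREVIEW_TOKEN.lower() in (user_agent or "").lower()


def filter_page_views(hits):
    """Drop hits made by the preview fetcher.

    `hits` is assumed to be an iterable of dicts with a "user_agent" key
    (a hypothetical record shape chosen for this example).
    """
    return [hit for hit in hits if not is_instant_preview(hit.get("user_agent", ""))]
```

Matching on the token rather than the full string means the filter should keep working if the version numbers in the user agent change, though that is a design guess rather than anything Google has promised.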

The FAQ page explains that Google fetches previews in real time when it lacks a cached copy of the page previously collected by its crawl bots. "We mostly generate preview images based on content we’ve crawled with Googlebot," it says. "When we don’t have a cached preview image (which primarily happens when we can’t fetch the contents of important resources), we may choose to create a preview image on-the-fly based on a user’s request."

The company also says that because the preview fetches use a separate user agent, the previews may include data that webmasters have blocked the crawl bots from collecting. "As on-the-fly rendering is only done based on a user request (when a user activates previews), it’s possible that it will include embedded content which may be blocked from Googlebot using a robots.txt file."

The Google Analytics situation is reminiscent of the AVG Linkscanner, which started spewing fake traffic across the net in early- to mid-2008. In late February of that year, AVG paired its anti-virus engine with a real-time malware scanner that would vet search results before users clicked on them. If you searched Google, for instance, it would automatically visit each address that turned up on Google's results page.

According to the company, more than 20 million people had downloaded the new AVG 8 by late June 2008, and this caused a huge uptick in traffic on sites across the web. Under pressure from webmasters, the company soon disabled its real-time scanning.

But judging from site logs at The Register, the number of real-time preview fetches from Google is relatively small. Over the past 24 hours, we've had 1244 page requests for the user agent in question from a mere 60 unique IPs. ®
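For webmasters who want to run a similar tally on their own logs, the following Python sketch counts requests and unique IPs for the preview user agent. It assumes the access log uses the common Apache "combined" format, with the user agent as the final quoted field; the regular expression and the function name are illustrative, and this is not how The Register produced its figures.

```python
import re

# Assumed Apache "combined" log layout:
# ip ident user [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def preview_stats(lines, token="Google Web Preview"):
    """Return (request_count, unique_ip_count) for hits whose user agent contains `token`."""
    requests = 0
    ips = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        ip, user_agent = match.groups()
        if token in user_agent:
            requests += 1
            ips.add(ip)
    return requests, len(ips)
```

Passing an open log file works, since the function only needs an iterable of lines, for example `preview_stats(open("access.log"))` with a hypothetical file name.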