
Is Chrome really secretly stalking you across Google sites using per-install ID numbers? We reveal the truth

El Reg digs into claims by Kiwi browser maker that ad giant is not GDPR compliant

Updated Google is potentially facing a massive privacy and GDPR row over Chrome sending per-installation ID numbers to the mothership.

On Tuesday, Arnaud Granal, a software developer involved with a Chromium-based browser called Kiwi, challenged a Google engineer in a GitHub Issues post about the privacy implications of request header data that gets transmitted by Chrome. Granal called it a unique identifier and suggested it can be used, by Google at least, for tracking people across the web.

He and others argue this violates Europe's General Data Protection Regulation, because the identifier could be considered to be personally identifiable data.

Google did not respond to a request for comment, but its description of the header suggests it would argue otherwise.

When a browser wishes to fetch a web page from a server, it sends an HTTP request for that page, a request that contains a set of headers, which are key-value pairs separated by colons. These headers describe data relevant to the request. For example, sending the header accept: text/html tells the server what media types the browser will accept.
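To make the header format concrete, here is a minimal Python sketch that splits a raw request into its key-value pairs. The request text and the helper name parse_headers are illustrative assumptions, not anything taken from Chrome.

```python
def parse_headers(raw_request: str) -> dict[str, str]:
    """Split the header lines of a raw HTTP/1.1 request into a dict."""
    lines = raw_request.split("\r\n")
    headers = {}
    # Line 0 is the request line (method, path, version); the headers
    # follow until the first empty line.
    for line in lines[1:]:
        if not line:
            break
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them.
        headers[name.strip().lower()] = value.strip()
    return headers


# Hypothetical request, made up for illustration.
raw = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
print(parse_headers(raw))
# prints {'host': 'example.com', 'accept': 'text/html'}
```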

For years, since 2012 at least, Chrome has sent a header called X-client-data, formerly known as X-chrome-variations, to keep track of the field trials of in-development features active in a given browser. Google activates these randomly when the browser is first installed. Active trials are visible if you type chrome://version/ into Chrome's address bar. Under the label Variations, you're likely to see a long list of hexadecimal numbers similar to 202c099d-377be55a.

Referenced on line 32 of this Chromium source code file, the X-client-data header sends Google a list of field trials available to the Chrome user.

"This Chrome-Variations header (X-client-data) will not contain any personally identifiable information, and will only describe the state of the installation of Chrome itself, including active variations, as well as server-side experiments that may affect the installation," Google explains in a paper describing Chrome capabilities.

Google suggests the number of active variations for a given installation – if usage statistics and crash reports are disabled – is determined by a random seed number between 0 and 7,999, which amounts to roughly 13 bits of entropy.
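The 13-bit figure is simple arithmetic: a seed with 8,000 equally likely values carries log2(8,000) bits. A short Python check, assuming the seed is uniformly distributed (the article does not say so explicitly):

```python
import math

SEED_VALUES = 8000  # seeds 0 through 7,999, per Google's description

# Entropy in bits of a uniformly random choice among SEED_VALUES options.
bits = math.log2(SEED_VALUES)
print(f"{bits:.2f} bits, which fits in {math.ceil(bits)} whole bits")
# prints 12.97 bits, which fits in 13 whole bits
```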

Less entropy means browser fingerprinting becomes more difficult, and more entropy means the opposite. But usage statistics and crash reports are on by default, so most Chrome users operate under high entropy for this particular data point.

"If stats are on, then the ID is called 'High entropy ID' in the source-code, and 'determined by your IP address, operating system, Chrome version and other parameters,' and sticks to your installation," explained Granal in an email to The Register.

For example, if you visit YouTube using Chrome, the header might include a string like this:

X-client-data: CIS2yQEIprbJAZjBtskBCKmdygEI8J/KAQjLrsoBCL2wygEI97TKAQiVtcoBCO21ygEYq6TKARjWscoB
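For a sense of scale, the following Python sketch measures the sample value above. It assumes the value is standard Base64, which the article does not state. Note that payload size is only an upper bound on how much information a value can carry; it says nothing by itself about how many distinct values are actually in use.

```python
import base64

# Header value copied from the example above.
value = (
    "CIS2yQEIprbJAZjBtskBCKmdygEI8J/KAQjLrsoBCL2wygEI97TKAQiVtcoB"
    "CO21ygEYq6TKARjWscoB"
)

# Assumption: the value is standard Base64 (alphabet A-Z, a-z, 0-9, +, /).
payload = base64.b64decode(value)
print(len(value), "characters ->", len(payload), "bytes ->", len(payload) * 8, "bits")
# prints 80 characters -> 60 bytes -> 480 bits
```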

"With that long ID, hard to believe it's only 8,000 possibilities," observed Granal.

Chrome users can see this for themselves by opening up the browser's Developer Tools, selecting the Network tab and loading a Google property like YouTube or visiting https://ad.doubleclick.net/test. In the right-hand Developer Tools pane, various headers sent during the page load request should be visible, including X-client-data.

"When you install Google Chrome, your installation gets assigned a random number 0 and 7999 and this number is mixed with a number given by Google's servers ('seed'), depending on your country, your IP address, and other criteria that Google decides (it could be a random number between 0 and 10 billion as well, we'd never know)," explained Granal.

"This identifier is stored on your computer, and sent every time your Google Chrome communicates with Google *including* (and that makes a huge difference) DoubleClick services (ad targeting)."

According to Granal, this identifier is sent to, and can only be read by, youtube.com, google.com, doubleclick.net, googleadservices.com, and other Google-owned domains – except when in Incognito mode.

This issue has come up before. It was discussed in 2018. But it's relevant again because Google is in the midst of a broad revision of its web technologies, including its browser code, its extension platform, and web specifications to close privacy and security gaps while retaining the ability to deliver targeted ads.

One of the stated goals of Google's revisions is to reduce the effectiveness of browser fingerprinting – creating a unique identifier for internet users based on the technical capabilities of their browser. In fact, the Issues thread where Granal weighed in was about Google's plan to make the text string sent in the User-Agent header more generic (less entropy), so it's less useful for fingerprinting.

There's been some resistance among marketers to losing the ability to track people through fingerprinting. The GitHub discussion includes individuals affiliated with ad tech firms who worry that losing data for tracking will make it harder to police ad fraud and will magnify Google's data advantage.

In an email to The Register, Augustine Fou, a cybersecurity and ad fraud researcher who advises companies about online marketing, dismissed the idea that less fingerprinting means more ad fraud.

"The UA string was entirely useless in detecting fraud since the beginning because any bot worth its salt can copy and paste a legit UA string and pass that to any detection tech to get by it," he said. "So losing the UA string will not increase fraud, unless of course you assumed UA strings were useful to detect bots, which is not true."

But the existence of the X-client-data identifier, even if it's only readable by Google, makes it clear that Google is focused on protecting privacy against third parties, rather than against itself.

Lukasz Olejnik, a computer scientist, independent privacy researcher, and adviser, said in an email to The Register that while this feature has been around for a while, and is probably meant to help track technical problems, it raises potential issues.


"The ID is rather non-transparent, and its management by the user is far from easy," Olejnik said. "I would imagine that most users have no idea about this ID, what it does and when it is in use. A potentially problematic issue seems to be that the persistent ID is not reset when the user is clearing browser data. In this sense, it is a fingerprint."

"The risk in general is bounded by the fact that this ID is apparently only sent to sites controlled by a single organization," he added, referring to Google. "It is then up to the receiving party to make sure that processing of this data is done rightly, so either that users know about it, or that it is impossible to use the ID to single out individuals."

Fou observes that Google has users logged into a variety of services like Chrome, Gmail, Google Maps, Google Docs, and Android devices, to name a few, so it can already track you that way.

"So you can see having User-Agent strings on a damn browser is less than irrelevant to Google, because it can still ID everyone it wants (and it has Google Analytics, DoubleClick, Adsense, reCaptcha and other code on pretty much every site that matters)," he said. "So anyone who visits any site, Google can set its own first-party cookie to identify them."

There may also be a security vulnerability here. Granal points out that the Chromium source code only checks for a preset list of Google domains but doesn’t check the full domain name, so a malicious individual could buy a domain like youtube.vg and set up a website there to collect X-client-data header information, at least until the take-down notice arrives. ®
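The kind of gap Granal describes can be illustrated with a hypothetical Python sketch. This is not Chromium's code: the function names and both lists are assumptions made for illustration, with the domain names taken from those mentioned earlier in this article. A loose check that looks only at the name before the top-level domain accepts youtube.vg, while a strict check against full domains rejects it.

```python
# Hypothetical lists for illustration; not Chromium's actual data.
GOOGLE_NAMES = {"google", "youtube", "doubleclick", "googleadservices"}
GOOGLE_DOMAINS = {
    "google.com",
    "youtube.com",
    "doubleclick.net",
    "googleadservices.com",
}


def loose_match(host: str) -> bool:
    """Accept any host whose second-level label is a known name, whatever the TLD."""
    labels = host.lower().rstrip(".").split(".")
    return len(labels) >= 2 and labels[-2] in GOOGLE_NAMES


def strict_match(host: str) -> bool:
    """Accept only the listed domains and their subdomains."""
    host = host.lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in GOOGLE_DOMAINS)


for host in ("www.youtube.com", "youtube.vg", "notyoutube.com"):
    print(host, loose_match(host), strict_match(host))
# prints:
# www.youtube.com True True
# youtube.vg True False
# notyoutube.com False False
```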

Updated to add

In a statement to The Register after this story was filed, a Google spokesperson denied the web giant uses the X-client-data header for identifying or tracking individual users.

“The X-Client-Data header is used to help Chrome test new features before rolling them out to all users,” Google’s spokesperson said.

“The information included in this header reflects the variations, or new feature trials, in which an installation of Chrome is currently enrolled. This information helps us measure server-side metrics for large groups of installations; it is not used to identify or track individual users.”
