
FYI: Twitter's API still spews enough metadata to reveal exactly where you lived, worked

Old tweets betray sensitive data under new tools

Analysis: Researchers have demonstrated yet again that location metadata from Twitter posts can be used to infer private information like users' home addresses, workplaces, and sensitive locations they've visited.

Computer science boffins Kostas Drakonakis, Panagiotis Ilia, Sotiris Ioannidis, and Jason Polakis, affiliated with the Foundation for Research and Technology in Greece and the University of Illinois in the US, published their findings in a paper titled "Please Forget Where I Was Last Summer: The Privacy Risks of Public Location (Meta)Data," which is scheduled to be presented at the Network and Distributed System Security Symposium (NDSS) in February.

"We show that location metadata enables the inference of sensitive information that could be misused for a wide range of scenarios (eg: from a repressive regime de-anonymizing an activist’s account to an insurance company inferring a customer’s health issues, or a potential employer conducting a background check)," they claim in their paper.

The privacy risk associated with Twitter geolocation data was explored in academic research published in 2015. Since then, Twitter has given users more control over location data and has limited the precision of recorded coordinates. The company now disables precise location by default and requires users to opt in to share their location.

"Account holders choose to share their location when they Tweet," a Twitter spokesperson said in an email to The Register on Monday.

"Please note this is opt-in; we never attach location to a Tweet without the person's permission. If someone chooses to share their location in a Tweet, the location is also available via our APIs. Again, this is strictly when a person opts in."

Some progress, but not enough

But Twitter's changes haven't really mitigated the privacy risk, because the company continues to offer historic location data through its developer API. Versions of the Twitter mobile app for Android and iOS released before April 2015 automatically included precise GPS coordinates as metadata in tweets tagged with a low-precision location label.

"In the dataset we collected we found that tweets with coarse grained location labels (e.g., the name of a city) also have GPS coordinates in the metadata dating back to 2010," said Polakis. "After April 2015 tweets started appearing with coarse grained labels but without GPS coordinates in the metadata, indicating that around that time there was a change in Twitter's app."
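The distinction Polakis describes, a coarse label on the surface with precise coordinates underneath, can be checked mechanically. The sketch below assumes tweets arrive as dicts shaped like Twitter API v1.1 Tweet objects, where `place` holds a coarse label such as a city and `coordinates` holds a GeoJSON point in longitude, latitude order. The helper names, the sample tweets, and the 1 April 2015 cut-off are hypothetical illustrations, not anything taken from the paper or from LPAuditor.

```python
from datetime import datetime, timezone

# Assumed cut-off: the article only says the app change happened "around" April 2015.
APP_CHANGE = datetime(2015, 4, 1, tzinfo=timezone.utc)
# Twitter v1.1 created_at format, e.g. "Tue Mar 10 08:15:00 +0000 2015".
CREATED_AT_FMT = "%a %b %d %H:%M:%S %z %Y"


def has_hidden_gps(tweet: dict) -> bool:
    """Hypothetical check: a coarse place label AND a precise GeoJSON point."""
    coords = tweet.get("coordinates")
    return (
        tweet.get("place") is not None
        and isinstance(coords, dict)
        and coords.get("type") == "Point"
    )


def audit(tweets):
    """Yield (place name, (lat, lon), posted before cut-off?) for each flagged tweet."""
    for tweet in tweets:
        if not has_hidden_gps(tweet):
            continue
        created = datetime.strptime(tweet["created_at"], CREATED_AT_FMT)
        lon, lat = tweet["coordinates"]["coordinates"]  # GeoJSON order: lon, lat
        yield tweet["place"]["full_name"], (lat, lon), created < APP_CHANGE


# Hypothetical sample data; not from the researchers' dataset.
sample = [
    {
        "created_at": "Tue Mar 10 08:15:00 +0000 2015",
        "place": {"full_name": "Chicago, IL"},
        "coordinates": {"type": "Point", "coordinates": [-87.6298, 41.8781]},
    },
    {
        "created_at": "Mon Jun 01 12:00:00 +0000 2015",
        "place": {"full_name": "Chicago, IL"},
        "coordinates": None,
    },
]

for row in audit(sample):
    print(row)
```

Only the first sample tweet is flagged: it displays a city name but its metadata still carries a point precise enough to locate a building.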

For the researchers, the Twitter policy that allowed the inclusion of precise location data represents a privacy problem that should be addressed.

"This privacy violation is invisible to users, as the GPS coordinates are only contained in the metadata returned by the API and not visible through the Twitter website or app," the paper explains. "To make matters worse, this historical metadata currently remains publicly accessible through the API."

Location data presents businesses with a challenge: It's potentially so valuable for ad targeting that companies appear to be disinclined to discourage its disclosure and don't go to great lengths to explain how such data might be used. Last week, the Los Angeles City Attorney filed a lawsuit against IBM's weather company for failing to adequately disclose how it uses the location data harvested through its Weather Channel app.

For Twitter users, the problem is privacy. To outline possible risks, the paper describes how a user's negative statements about a doctor on Twitter allowed the individual to be placed at the office of a mental health professional. It also recounts a user complaining about blood testing in a tweet geo-tagged to a rehab center.

Some tools better left unshared

In the course of their work, the researchers developed and tested a location data auditing tool called LPAuditor to examine tweets for location metadata and infer sensitive personal information.

The tool, which relies on publicly accessible geolocation databases, will not be open sourced due to the potential for misuse, said Jason Polakis, assistant professor of computer science at the University of Illinois at Chicago and one of the paper's co-authors, in an email to The Register.

The software can pinpoint the locations associated with homes and workplaces much more accurately than previously demonstrated techniques.

"Our system is able to identify the home and workplace for 92.5 per cent and 55.6 per cent of the users respectively," the paper says.

That's 18.9 to 91.6 per cent more accurate for homes and 8.7 to 21.8 per cent more accurate for workplaces than has been demonstrated in the past, the researchers say.
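LPAuditor itself isn't being released, so what follows is only a crude, hypothetical illustration of the general idea behind home and workplace inference, not the researchers' method: snap geotagged posts to grid cells of roughly 100 m, treat the busiest cell overnight as home and the busiest cell during weekday office hours as work. The time windows, grid size, function names, and sample points are all assumptions, and timestamps are assumed to already be in the user's local time.

```python
from collections import Counter
from datetime import datetime


def grid_cell(lat, lon, digits=3):
    """Snap a point to a grid; 3 decimal places is roughly 100 m of latitude."""
    return (round(lat, digits), round(lon, digits))


def infer_home_work(points):
    """Hypothetical heuristic. points: iterable of (local datetime, lat, lon)."""
    night, office_hours = Counter(), Counter()
    for ts, lat, lon in points:
        cell = grid_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:  # assumed "at home" window
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:  # assumed weekday work window
            office_hours[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    # Pick the busiest office-hours cell that isn't also the home cell.
    work = next((c for c, _ in office_hours.most_common() if c != home), None)
    return home, work


# Hypothetical geotagged posts for a single user (local time).
points = [
    (datetime(2014, 3, 3, 23, 10), 41.87812, -87.62981),  # Monday, late evening
    (datetime(2014, 3, 4, 1, 5), 41.87809, -87.62978),    # Tuesday, small hours
    (datetime(2014, 3, 4, 10, 30), 41.88512, -87.62201),  # Tuesday, mid-morning
    (datetime(2014, 3, 5, 14, 0), 41.88498, -87.62189),   # Wednesday, afternoon
]

home, work = infer_home_work(points)
print("home:", home, "work:", work)
```

Rounding to a fixed grid can split one building across two cells, so a serious system would use proper spatial clustering; even so, a handful of precise points is enough for this naive version to separate where someone sleeps from where they work.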

Polakis and his colleagues found "71 per cent of users have tweeted from sensitive locations, 27.5 per cent of which can be placed there with high confidence based on the content of their tweets."
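The two figures in that quote compound. Reading the 27.5 per cent as a share of the 71 per cent of users who tweeted from sensitive locations (an interpretation of the quote, not a number stated separately in the paper), the fraction of all users who can be placed at a sensitive location with high confidence works out to roughly one in five:

```latex
0.71 \times 0.275 = 0.19525 \approx 19.5\%
```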

When users can choose whether location data gets published, there's a 94.6 per cent reduction in tweets tagged with GPS coordinates, according to the researchers. They argue such stats underscore the benefit of giving people control over location data. But location controls are not retroactive – developers presently have access to years of location data through the Twitter API.


Out of 290,162 users in the study's dataset, 87,114 posted geotagged tweets via the official Twitter and Foursquare apps. The researchers did not consider other third-party apps, which they said "may handle geolocation data differently as Twitter’s Geo Guidelines are neither mandatory nor enforceable."

Using the Twitter API, the researchers were able to find precise geolocation data for about 30 per cent of those in the user dataset. They say the Twitter policies that allowed such data to be published resulted in "an almost 15-fold increase in the number of users whose key locations are successfully identified by our system."

What's in the databases?

The fact that third parties may have collected this data and stored it without the explicit consent of Twitter users is troubling for Polakis.

"So much data is being collected and shared/sold to third parties without the users being explicitly aware of that (or able to prevent it)," he said. "And indeed it is problematic when users have no way to delete that data in third-party databases, even though the first party may offer such an option."

Cautioning that he's not a legal scholar, he nonetheless says that, given the research findings and the sensitive nature of what can be inferred from location data, legislation or more explicit oversight may make sense for such data.

"We hope to see a change in how major companies collect and share location data, and the adoption of more privacy-preserving approaches," he said.

"We also hope that our work can help educate users on the risks that they face when they share their location data (either explicitly or inadvertently) with web services or other users. Being aware of what someone could infer about you using that data can be a powerful incentive towards being more cautious during your online activities." ®
