40,000 Tinder pics scraped into big data service
Trove then disappears, as folks point out the privacy problem
Amid a storm of criticism, a set of facial images built by scraping the Tinder dating service has been pulled from Kaggle.
Developer Stuart Colianni had built the 40,000-strong set of “hoes” (the charming variable name* in his source code – more below in case that repo also dies) on the premise that facial datasets are generally too small to be useful.
The Kaggle page where he published the dataset now returns a 404.
The Register has asked Kaggle, whose terms and conditions forbid crawlers, to confirm the reason for the deletion.
At the GitHub page, Colianni attributes the removal to a request from Tinder.
In any jurisdiction with medium-strength privacy regulations, scraping and publishing the data without consent probably represents a breach.
For example, Australian privacy analyst Stephen Wilson of Lockstep told The Register that scraping a dating site is “an offence akin to theft by finding” (that is, if you find a suitcase stuffed with banknotes, you don't get to keep it, you have to try and find the owner).
Likewise, the popular hobby of inferring personally identifiable information from multiple datasets is a breach of privacy legislation in many countries.
Wilson notes that the word “public” almost never occurs in data privacy laws around the world. ®
*Bootnote: It's hard to accept the intentions as benign with code snippets like this:
    # Iterate through list of subjects
    for hoe in hoes:
        # Get the subject ID
        sid = hoe['_id']
        # Gets a list of pictures of the subject
        pictures = hoe['photos']
Keep it classy
We're all hoes, it seems.