Roses are red, violets are blue, fake-news-detecting AI is fake news, too
Humanity's bulls*** is too much for software
Analysis The viral spread of fake news and “alternative facts” has rocked Western politics. Oxford Dictionaries chose “post-truth” as its word of 2016, and when a society is scolded by a dictionary wielding a hyphenated word, you know you've collectively screwed up.
“The concept of post-truth has been in existence for the past decade, but Oxford Dictionaries has seen a spike in frequency this year in the context of the EU referendum in the United Kingdom and the presidential election in the United States. It has also become associated with a particular noun, in the phrase ‘post-truth politics’,” the Brit word wizards tutted.
Yes, there have always been dodgy facts on the internet, and in newspapers read daily by millions. However, misinformation toward the end of 2016 was spreading at an alarming rate, thanks to the greasy tubes of social networks, SEO-doped Macedonian teens, and electorates dying to soak up words that reinforced their political and world views.
Who do we turn to, to end this scourge? Artificial intelligence, right?
Trapped in a perpetual cycle of hype, machine intelligence has been heralded as the miracle cure for society’s woes: cancer, climate change, inequality, crime, you name it. Get a bunch of data, fire up the GPUs, and use deep learning. Voila!
Superintelligent machines needed, please apply here
Dean Pomerleau and Delip Rao, AI tech entrepreneurs, thought so when they tried to launch the Fake News Challenge (FNC). This is a contest that encourages AI researchers to invent algorithms that can filter out clickbait and fabrications from streams of news articles.
Initially, Pomerleau and Rao thought the winning software in their challenge would be able to detect and highlight baseless assertions all by itself with no human intervention. “I made a casual bet with my machine learning friends, and thought it’d be trivial to apply the same techniques used in spam filtering and detecting bogus websites for fake news,” Pomerleau told The Register. “I came into [the Fake News Challenge] naively.”
After chatting to more machine-learning experts and journalists, the pair realized identifying deceptive editorial copy was a murky business.
There are simple facts that can be easily verified – such as the height of the Statue of Liberty and the name of the UK Prime Minister. Then there are truths that are harder to prove, such as whether or not something was an accident, or if two leaders really were friends or had secretly fallen out. There are truths that require anonymous sources who need protecting, and there are truths that are covered up and officially denied.
It is difficult even for humans to assess what is real and what isn't, let alone machines: how many people fall for the Borowitz Report in the New Yorker every week, for example? Training machines to pick out complex truths from fiction would be an arduous task, considering there isn't a clean database with a complete list of verified facts.
The system would have to trawl through the entire internet to gain enough knowledge and wisdom to be able to label news as legit or made up. “It would need a very subtle understanding and reasoning of the world to arrive at a conclusion,” said Rao.
Zachary Lipton, a machine learning researcher at the University of California, San Diego, was highly critical of the first version of the contest. Building software to spit out a “boolean fakeness indicator” – a 1 or 0 for a true or false news article – and a confidence score for each URL would be “problematic,” Lipton wrote in a blog post.
Pomerleau and Rao have since changed their minds, and now believe a fully automated truth labelling system is “virtually impossible” with today's AI and natural language processing abilities. Building a supervised classifier able to tell right from wrong would take super intelligence or even artificial general intelligence, the duo told The Register.
The second version of the competition calls for code that can perform “stance detection” instead. Claims in headlines are tested against the contents of a story. You give the headline and the text beneath to an algorithm, and the output should be one of four categories:
- Agrees: The body text agrees with the headline.
- Disagrees: The body text disagrees with the headline.
- Discusses: The body text discusses the same topic as the headline, but does not take a position.
- Unrelated: The body text discusses a different topic than the headline.
This sorting will allow human fact checkers to identify stories that might hold the evidence needed to inspect the claims made, so they can judge the accuracy of information quickly.
The AI that can do this with the highest degree of accuracy is the winner.
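To make the task concrete, here is a rough sketch of what a stance detector's interface could look like. It is emphatically not the FNC's baseline or anyone's entry: the cue-word lists, the stopword list, the `overlap_threshold` value, and the function names `detect_stance` and `accuracy` are all our own assumptions, and a crude word-overlap heuristic stands in for the machine learning a real contestant would use. Scoring is shown as plain accuracy, which is also an assumption about how entries are judged.

```python
import re
from enum import Enum


class Stance(Enum):
    """The four output categories described for the contest."""
    AGREES = "agrees"
    DISAGREES = "disagrees"
    DISCUSSES = "discusses"
    UNRELATED = "unrelated"


# Hypothetical word lists, invented for this illustration only.
STOPWORDS = {"a", "an", "the", "of", "is", "are", "in", "on", "to", "and"}
REFUTING = {"not", "no", "false", "fake", "hoax", "denied", "denies", "debunked"}
HEDGING = {"reportedly", "allegedly", "claims", "claimed", "unconfirmed", "rumour"}


def tokens(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def detect_stance(headline, body, overlap_threshold=0.3):
    """Guess the stance of a body text toward a headline (toy heuristic)."""
    head = tokens(headline) - STOPWORDS
    text = tokens(body)
    if not head:
        return Stance.UNRELATED
    # Share of the headline's content words that also appear in the body.
    overlap = len(head & text) / len(head)
    if overlap < overlap_threshold:
        return Stance.UNRELATED
    # Only count cue words that the body adds beyond the headline itself.
    added = text - tokens(headline)
    if added & REFUTING:
        return Stance.DISAGREES
    if added & HEDGING:
        return Stance.DISCUSSES
    return Stance.AGREES


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


if __name__ == "__main__":
    headline = "Statue of Liberty is 93 metres tall"
    bodies = [
        "The Statue of Liberty stands 93 metres tall from ground to torch.",
        "Officials denied that the Statue of Liberty is 93 metres tall.",
        "The Statue of Liberty is reportedly 93 metres tall.",
        "Shares in chip makers rose sharply on Tuesday.",
    ]
    for body in bodies:
        print(detect_stance(headline, body).value, "<-", body)
```

A heuristic this blunt is trivially fooled by sarcasm, paraphrase, or a single stray "not", which is rather the point: even the scaled-back version of the challenge needs something considerably smarter than keyword matching.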
It’s important to note that the winning program won't solve the fake news problem, Lipton said. But it might help to lighten the load on fact checkers, or at least steer readers away from clickbait. “[It’s] better to start with [something] modest but concrete [rather] than magical and infeasible. I think [stance detection] is a strong move in the right direction. It’s also a good opportunity to identify a community of talented researchers committed to worthwhile causes,” he told us.
The number of teams registering for the FNC has shot up since a training dataset was released earlier this month. It’s gone from 72 to 206 coding crews in just under two weeks. A cash prize is on offer although the exact figure is yet to be confirmed, as Pomerleau and Rao are looking for sponsors willing to contribute financially.