AI sucks at stopping online trolls spewing toxic comments

It's easy for hate speech to slip past dumb machines

By Katyanna Quach


New research has shown just how bad AI is at dealing with online trolls.

Such systems struggle to automatically flag nudity and violence, don’t understand text well enough to shoot down fake news and aren’t effective at detecting abusive comments from trolls hiding behind their keyboards.

A group of researchers from Aalto University and the University of Padua found this out when they tested seven state-of-the-art models used to detect hate speech. All of them failed to recognize foul language when subtle changes were made, according to a paper [PDF] on arXiv.

Adversarial examples can be created automatically by using algorithms to misspell certain words, swap characters for numbers, add random spaces between words, or attach innocuous words such as 'love' to sentences.
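These perturbations can be sketched in a few lines of Python. The function names and exact substitution rules below are illustrative assumptions, not the researchers' actual attack code:

```python
import random

def insert_typo(word):
    """Misspell a word by swapping two adjacent characters."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def leetspeak(text):
    """Swap selected characters for look-alike digits."""
    return text.translate(str.maketrans({"e": "3", "a": "4", "o": "0", "i": "1"}))

def remove_word_boundaries(text):
    """Join all words together, hiding word boundaries."""
    return text.replace(" ", "")

def append_innocuous(text, word="love"):
    """Attach a benign word to nudge the classifier's score."""
    return f"{text} {word}"
```

Each transform leaves the text perfectly readable to a human while changing the token sequence a model sees.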

The models failed to pick up on these adversarial examples, which successfully evaded detection. The tricks wouldn't fool humans, but machine learning models are easily blindsided. They can't readily adapt to new information beyond what's been spoonfed to them during the training process.

“They perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech,” the paper’s abstract states.

Sniffing out toxic language is normally treated as a classification problem: does this sentence contain any swear words or racist and sexist slurs?

Google’s Perspective API calculates a score to determine whether text is hateful. But narrowing the task down to a simple classification problem means it can suffer from false positives: cases where a sentence contains offensive language but its overall meaning is harmless.
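A toy keyword-based scorer makes the false-positive failure mode concrete. This is a deliberately naive sketch, not how Perspective actually works, and the blocklist is hypothetical:

```python
OFFENSIVE_WORDS = {"idiot", "stupid"}  # hypothetical blocklist

def naive_toxicity_score(text):
    """Score text as the fraction of tokens found on the blocklist."""
    tokens = text.lower().split()
    hits = sum(token.strip(".,!?") in OFFENSIVE_WORDS for token in tokens)
    return hits / max(len(tokens), 1)

# A harmless, self-deprecating sentence still gets a nonzero toxicity score:
benign = "I felt like an idiot for forgetting my keys"
print(naive_toxicity_score(benign) > 0)  # True: a false positive
```

Any classifier that leans too heavily on surface-level word cues inherits this weakness, regardless of how the score is computed.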

Some false positive examples that show how brittle Google's Perspective model is. Image credit: Gröndahl et al.

The researchers were polite enough to replace a "common English curse word, marked with 'F' here, but [used] in [its] original form in the actual experiment." You get the idea.

“Attack effectiveness varied between models and datasets, but the performance of all seven hate speech classifiers was significantly decreased by most attacks,” according to the researchers.

The weakest models are ones that inspect sentences word-by-word, since tiny changes like adding spaces between words will slip by unnoticed. The ones that break down words by individual characters do slightly better at recognizing attacks.



“A significant difference between word- and character-based models was that the former were all completely broken by at least one attack, whereas the latter were never completely broken,” the team said.

Future research should focus on making models more robust to attacks, the researchers said. Developers should pay closer attention to the training dataset rather than the algorithms themselves, they argued.

“We therefore suggest that future work should focus on the datasets instead of the models. More work is needed to compare the linguistic features indicative of different kinds of hate speech (racism, sexism, personal attacks etc.), and the differences between hateful and merely offensive speech,” the paper concluded. ®
