Oh dear... AI models used to flag hate speech online are, er, racist against black people
Tweets written in African-American English slang more likely to be considered offensive
The internet is filled with trolls spewing hate speech, but machine learning algorithms can’t help us clean up the mess.
A paper from computer scientists at the University of Washington, Carnegie Mellon University, and the Allen Institute for Artificial Intelligence found that machines were more likely to flag tweets from black people than white people as offensive. It all boils down to the subtle differences in language. African-American English (AAE), often spoken in urban communities, is peppered with racial slang and profanities.
But even if they contain what appear to be offensive words, the message itself often isn’t abusive. For example, the tweet “I saw him yesterday” is scored as 6 per cent toxic, but it suddenly skyrockets to 95 per cent for the comment “I saw his ass yesterday”. The word ass may be crude, but when used in that context it’s not aggressive at all.
An example of how African-American English (AAE) is mistakenly classified as offensive compared to standard American English. Image credit: Sap et al.
“I wasn’t aware of the exact level of bias in Perspective API–the tool used to detect online hate speech–when searching for toxic language, but I expected to see some level of bias from previous work that examined how easily algorithms like AI chatter bots learn negative cultural stereotypes and associations,” said Saadia Gabriel, co-author of the paper and a PhD student at the University of Washington.
“Still, it’s always surprising and a little alarming to see how well these algorithms pick up on toxic patterns pertaining to race and gender when presented with large corpora of unfiltered data from the web.”
The researchers fed a total of 124,779 tweets, collected from two datasets, into Perspective API to see which were classified as toxic. Originally developed by Google and Jigsaw, an incubator company currently operating under Alphabet, the machine learning software is used by Twitter to flag any abusive comments.
The tool mistakenly classified 46 per cent of non-offensive tweets crafted in the style of AAE as inflammatory, compared to just nine per cent of tweets written in standard American English.
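For readers who want to see what a comparison like that involves, here is a minimal sketch of a per-dialect false positive rate: the share of tweets that humans judged non-offensive but a classifier flagged anyway. It is not the researchers' code. The function name, the 0.5 cut-off, and the handful of records are all made up for illustration and do not reproduce the study's figures.

```python
from collections import defaultdict


def false_positive_rates(records, threshold=0.5):
    """Per dialect, the share of non-offensive tweets flagged as toxic.

    Each record is (dialect, toxicity_score, is_offensive). The 0.5
    threshold is an assumption for this sketch, not the study's setting.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for dialect, score, is_offensive in records:
        if is_offensive:
            continue  # only non-offensive tweets can be false positives
        total[dialect] += 1
        if score >= threshold:
            flagged[dialect] += 1
    return {dialect: flagged[dialect] / total[dialect] for dialect in total}


# Hypothetical records, invented for illustration only:
# (dialect label, classifier toxicity score, human says it is offensive)
toy_records = [
    ("AAE", 0.95, False),
    ("AAE", 0.40, False),
    ("AAE", 0.80, True),
    ("SAE", 0.06, False),
    ("SAE", 0.70, False),
    ("SAE", 0.10, False),
    ("SAE", 0.20, False),
]

rates = false_positive_rates(toy_records)
print(rates)
```

A gap between the two rates on real, human-labelled data is the kind of disparity the paper reports. The choice of threshold matters: moving it changes how many harmless tweets get flagged in each group.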
"I think we have to be really careful about what technologies we implement in general, whether it's a platform where people can post whatever they want, or whether it's an algorithm that detects certain types of (potentially harmful) content. Platforms are under increasing pressure to delete harmful content, but currently these deletions are backfiring against minorities," Maarten Sap, first author of the paper and a PhD student at the University of Washington, told The Register.
Humans were also employed via the Amazon Mechanical Turk service to look at 1,351 tweets from the same dataset, and were asked to judge whether each comment was offensive to them or could be seen as offensive to anyone.
Just over half, about 55 per cent, were classified as “could be offensive to anyone”. That figure dropped to 44 per cent, however, when the workers were asked to consider the user’s race and their use of AAE.
“Our work serves as a reminder that hate speech and toxic language is highly subjective and contextual,” said Sap.
“We have to think about dialect, slang and in-group versus out-group, and we have to consider that slurs spoken by the out-group might actually be reclaimed language when spoken by the in-group.”
The study provides yet another reminder that AI models don’t understand the world enough to have common sense. Tools like Perspective API often fail when faced with subtle nuances in human language or even incorrect spellings.
Similar models employed by other social media platforms like Facebook to detect things like violence or pornography often don’t work for the same reason. And this is why these companies can’t rely on machines alone, and have to hire teams of human contractors to moderate questionable content.
Sap believes that removing the humans from content moderation isn't the way to go.
"We managed to reduce some of the bias by making workers more aware of the existence of African American English, and reminding them that certain seemingly obscene words could be harmless depending on who speaks them. Knowing how flawed humans are at this task, especially given the working conditions that some companies put their content moderators in, I certainly don't think humans are flawless in this capacity. However, I don't think removing them from the equation is necessarily the way to go either. I think a good collaborative human+AI setting is likely the best option, but only time will tell." ®