When you play this song backwards, you can hear Satan. Play it forwards, and it hijacks Siri, Alexa

Speech recognition systems seduced by masked messages

By Thomas Claburn in San Francisco


Computer science boffins affiliated with IBM and universities in China and the United States have devised a way to issue covert commands to voice-based AI software – like Apple Siri, Amazon Alexa, Google Assistant and Microsoft Cortana – by encoding them in popular songs.

They refer to these tweaked tunes, which issue mostly inaudible commands to speech recognition devices within earshot, as CommanderSongs.

In CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition, a paper distributed through preprint service arXiv, the ten authors involved in the project – Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A. Gunter – describe their technique for duping deep-learning models used to recognize speech with "adversarial perturbations."

Adversarial attacks are a way to deceive AI systems by altering input data to obtain desired results from a specific system. They've been explored extensively for images. For example, MIT students recently demonstrated that they could trick Google's image recognition system into labeling a turtle as a rifle.


Less work has been done with audio and speech recognition. The researchers say that while the pixels in an image can easily be altered to trip up algorithms without noticeable visual artifacts, it isn't obvious that audio attacks can pass under the radar in the same way, because alterations added to voices typically cannot be recognized by voice-controlled devices like Amazon Echo.

Last year, a different group of clever people proposed what they called DolphinAttack, to manipulate software-based voice recognition apps using sound outside the range of human hearing. That technique, however, can be mitigated by technology capable of suppressing ultrasound signals.

The CommanderSong researchers – from the State Key Laboratory of Information Security (SKLOIS), University of Chinese Academy of Sciences, Florida Institute of Technology, University of Illinois at Urbana-Champaign, IBM T. J. Watson Research Center, and Indiana University – say their technique has two differences: it does not rely on any other technology to hide the command, and it cannot be blocked by audio frequency filters.
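The filtering point can be illustrated with a rough sketch. The Python below is not taken from either paper: the function names, the 96kHz sample rate, the 20kHz cutoff and the two test tones are all assumptions chosen for illustration. It shows that a simple low-pass filter strips content above the audible band while leaving in-band sound, which is where a CommanderSong perturbation lives, untouched.

```python
import numpy as np

def low_pass(signal, sample_rate, cutoff_hz):
    """Brick-wall low-pass filter: zero every FFT bin above cutoff_hz."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0
    return np.fft.irfft(spectrum, n=len(signal))

def band_energy(signal, sample_rate, lo_hz, hi_hz):
    """Total spectral energy between lo_hz and hi_hz (inclusive)."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return power[(freqs >= lo_hz) & (freqs <= hi_hz)].sum()

# Hypothetical test signal: one second sampled at 96 kHz.
sample_rate = 96_000
t = np.arange(sample_rate) / sample_rate
in_band = 0.1 * np.sin(2 * np.pi * 1_000 * t)   # stands in for an audible-band tweak
ultrasonic = np.sin(2 * np.pi * 30_000 * t)     # stands in for an ultrasonic carrier
mixed = in_band + ultrasonic

filtered = low_pass(mixed, sample_rate, cutoff_hz=20_000)

ultra_before = band_energy(mixed, sample_rate, 25_000, 35_000)
ultra_after = band_energy(filtered, sample_rate, 25_000, 35_000)
audible_before = band_energy(mixed, sample_rate, 500, 1_500)
audible_after = band_energy(filtered, sample_rate, 500, 1_500)
```

Comparing ultra_after with ultra_before, and audible_after with audible_before, shows the asymmetry: the out-of-band tone is gone while the in-band one survives. A real ultrasonic attack and a real defence are more involved than two sine waves and a brick-wall filter.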

"Our idea to make a voice command unnoticeable is to integrate it in a song," they explain in their paper. "In this way, when the crafted song is played, the [speech recognition] system will decode and execute the injected command inside, while users are still enjoying the song as usual."

In a phone interview with The Register, Gunter, a computer science professor at the University of Illinois, said that while previous work has shown that garbled sounds can trigger voice recognition systems, a command masked in a song would be less noticeable because music is often present.

"It has a more practical attack vector," he said.

The researchers started with a randomly selected song and command track generated by a text-to-speech engine. They then decoded each audio file using the open-source Kaldi speech-recognition toolkit, and extracted the output of a deep neural network (DNN).

After identifying specific DNN outputs that represent the desired command, they manipulated the song and command audio using the gradient descent method, a machine learning optimization algorithm.
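As a toy illustration of that optimization step, the Python sketch below runs gradient descent on a perturbation added to some "song" features so that a stand-in model scores each frame closer to a target label. Everything in it is an assumption made for illustration: the linear-plus-softmax model standing in for Kaldi's DNN, the random features, the function names, the step size and the cap on the perturbation. It is not the researchers' code.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy(frames, weights, target_labels):
    """Mean cross-entropy between model outputs and the target labels."""
    probs = softmax(frames @ weights)
    rows = np.arange(len(target_labels))
    return -np.log(probs[rows, target_labels]).mean()

def craft_adversarial(song_frames, weights, target_labels,
                      step=0.02, iters=500, max_delta=0.5):
    """Gradient descent on a perturbation, clipped to +/- max_delta."""
    delta = np.zeros_like(song_frames)
    onehot = np.eye(weights.shape[1])[target_labels]
    for _ in range(iters):
        probs = softmax((song_frames + delta) @ weights)
        # Gradient of the per-frame cross-entropy with respect to the input.
        grad = (probs - onehot) @ weights.T
        delta -= step * grad
        # Keep the change small so the "song" stays close to the original.
        delta = np.clip(delta, -max_delta, max_delta)
    return song_frames + delta

# Hypothetical data: 8 frames of 20 features, 5 possible output labels.
rng = np.random.default_rng(0)
weights = rng.normal(size=(20, 5))
song = rng.normal(size=(8, 20))
target = np.array([0, 1, 2, 3, 4, 3, 2, 1])  # stands in for the command's DNN outputs

loss_before = cross_entropy(song, weights, target)
adversarial = craft_adversarial(song, weights, target)
loss_after = cross_entropy(adversarial, weights, target)
```

The clipping step is the trade-off in miniature: a tighter cap keeps the audio closer to the original, but leaves less room to steer the model towards the command.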

Chord cutters

In essence, they used their knowledge of the way the audio would be processed to ensure the speech recognition system would hear the command within the music.

The result is adversarial audio – songs containing a command interpretable by Kaldi code but unlikely to be noticed by a human listener.

The altered audio may be perceptible to a listener, but it's doubtful the added sound would be recognized as anything other than distortion.

"You mistake some of these signals as defects in the media," said Gunter, allowing that some songs masked the command better than others. "Some of the examples, they would make you grimace. Others are more subtle."

The researchers tested a variety of in-song commands delivered directly to Kaldi as audio recordings, such as: "Okay Google, read mail" and "Echo, open the front door." The success rate of these was 100 per cent.

They also tested in-song commands delivered audibly, where environmental noise can hinder recognition, including "Echo, ask Capital One to make a credit card payment" and "Okay Google, call one one zero one one nine one two zero."

As a stand-in for actual devices, the boffins used the Kaldi software listening to songs with embedded commands, delivered via a JBL clip2 portable speaker, TAKSTAR broadcast gear and an ASUS laptop, from a distance of 1.5 metres.

For the open air test, success rates varied from 60 per cent to 94 per cent.

Gunter said that to be certain the attack would work with, say, Amazon's Echo, you'd have to reverse engineer the Alexa speech recognition engine. But he said he knows colleagues working on that.

The researchers suggest that CommanderSongs could prompt voice-recognition devices to execute any command delivered over the air without anyone nearby noticing. And they say such attacks could be delivered through radio, TV or media players.

We already have the proof-of-concept for overt commands sent over the airwaves. In time, we may get a covert channel too.

"It's going to take continued work on it to get it to the point where it's less noticeable," said Gunter. ®
