When you play this song backwards, you can hear Satan. Play it forwards, and it hijacks Siri, Alexa

Speech recognition systems seduced by masked messages

By Thomas Claburn in San Francisco


Computer science boffins affiliated with IBM and universities in China and the United States have devised a way to issue covert commands to voice-based AI software – like Apple Siri, Amazon Alexa, Google Assistant and Microsoft Cortana – by encoding them in popular songs.

They refer to these tweaked tunes, which issue mostly inaudible commands to speech recognition devices within earshot, as CommanderSongs.

In CommanderSong: A Systematic Approach for Practical Adversarial Voice Recognition, a paper distributed through preprint service arXiv, the ten authors involved in the project – Xuejing Yuan, Yuxuan Chen, Yue Zhao, Yunhui Long, Xiaokang Liu, Kai Chen, Shengzhi Zhang, Heqing Huang, Xiaofeng Wang, and Carl A. Gunter – describe their technique for duping deep-learning models used to recognize speech with "adversarial perturbations."

Adversarial attacks are a way to deceive AI systems by altering input data to obtain desired results from a specific system. They've been explored extensively for images. For example, MIT students recently demonstrated that they could trick Google's image recognition system into labeling a turtle as a rifle.


Less work has been done with audio and speech recognition. The researchers say that whereas images offer an easy way to alter pixels and trip up algorithms without noticeable visual artifacts, it isn't obvious whether audio attacks can also pass under the radar, because alterations added to voices typically cannot be recognized by voice-controlled devices like Amazon Echo.

Last year, a different group of clever people proposed what they called DolphinAttack, to manipulate software-based voice recognition apps using sound outside the range of human hearing. That technique, however, can be mitigated by technology capable of suppressing ultrasound signals.

The CommanderSong researchers – from the State Key Laboratory of Information Security (SKLOIS), University of Chinese Academy of Sciences, Florida Institute of Technology, University of Illinois at Urbana-Champaign, IBM T. J. Watson Research Center, and Indiana University – say their technique has two differences: it does not rely on any other technology to hide the command, and it cannot be blocked by audio frequency filters.

"Our idea to make a voice command unnoticeable is to integrate it in a song," they explain in their paper. "In this way, when the crafted song is played, the [speech recognition] system will decode and execute the injected command inside, while users are still enjoying the song as usual."

In a phone interview with The Register, Gunter, a computer science professor at the University of Illinois, said that while previous work has shown garbled sounds can trigger voice recognition systems, masking the command in a song would be less noticeable because music is often present.

"It has a more practical attack vector," he said.

The researchers started with a randomly selected song and command track generated by a text-to-speech engine. They then decoded each audio file using the open-source Kaldi speech-recognition toolkit, and extracted the output of a deep neural network (DNN).

After identifying specific DNN outputs that represent the desired command, they manipulated the song and command audio using the gradient descent method, a machine learning optimization algorithm.
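For the curious, the flavor of that gradient descent step can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' code: the "recognizer" is a made-up linear layer with softmax rather than Kaldi's DNN, and every name here (`W`, `song`, `COMMAND`, `craft_perturbation`) is hypothetical. It nudges the input, step by step, toward whatever the model scores as the target class, while a small penalty keeps the added noise from growing without limit.

```python
import math

# Hypothetical toy "recognizer": 3 output classes over a 4-sample input.
# Class 0 stands in for "just music", class 2 for "the command". Not Kaldi.
W = [
    [0.9, 0.1, -0.2, 0.3],
    [0.2, 0.8, 0.1, -0.4],
    [-0.3, 0.2, 0.7, 0.5],
]
song = [2.0, 0.5, 0.1, 0.3]  # made-up "audio" the toy model hears as class 0
COMMAND = 2


def logits(weights, x):
    """Raw class scores: one dot product per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(z):
    """Turn scores into probabilities (shifted by the max for stability)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(weights, x):
    """Index of the class the toy model scores highest."""
    z = logits(weights, x)
    return max(range(len(z)), key=z.__getitem__)


def craft_perturbation(weights, audio, target, steps=2000, lr=0.5, penalty=0.01):
    """Gradient descent on a perturbation added to the audio.

    Minimizes cross-entropy toward `target` plus (penalty / 2) * ||delta||^2.
    """
    delta = [0.0] * len(audio)
    for _ in range(steps):
        x = [a + d for a, d in zip(audio, delta)]
        probs = softmax(logits(weights, x))
        # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(target).
        g_logits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        for i in range(len(delta)):
            # Chain rule back to the input, plus the gradient of the L2 penalty.
            g_input = sum(weights[k][i] * g_logits[k] for k in range(len(weights)))
            delta[i] -= lr * (g_input + penalty * delta[i])
    return delta


delta = craft_perturbation(W, song, COMMAND)
adversarial = [s + d for s, d in zip(song, delta)]
print(recognize(W, song), recognize(W, adversarial))
```

The real attack has a much harder job: it has to steer a sequence of DNN outputs across a whole utterance, and keep the result listenable as music, rather than flip a single label on four numbers.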

Chord cutters

In essence, they used their knowledge of the way the audio would be processed to ensure the speech recognition system would hear the command within the music.

The result is adversarial audio – songs containing a command interpretable by Kaldi code but unlikely to be noticed by a human listener.

The altered audio may be perceptible to a listener, but it's doubtful the added sound would be recognized as anything other than distortion.

"You mistake some of these signals as defects in the media," said Gunter, allowing that some songs masked the command better than others. "Some of the examples, they would make you grimace. Others are more subtle."

The researchers tested a variety of in-song commands delivered directly to Kaldi as audio recordings, such as: "Okay Google, read mail" and "Echo, open the front door." The success rate of these was 100 per cent.

They also tested in-song commands delivered audibly, where environmental noise can hinder recognition, including "Echo, ask Capital One to make a credit card payment" and "Okay Google, call one one zero one one nine one two zero."

As a stand-in for actual devices, the boffins used the Kaldi software listening to songs with embedded commands, delivered via a JBL clip2 portable speaker, TAKSTAR broadcast gear and an ASUS laptop, from a distance of 1.5 metres.

For the open air test, success rates varied from 60 per cent to 94 per cent.

Gunter said that to be certain the attack would work with, say, Amazon's Echo, you'd have to reverse engineer the Alexa speech recognition engine. But he said he knows colleagues working on that.

The researchers suggest that CommanderSongs could prompt voice-recognition devices to execute any command delivered over the air without the notice of anyone nearby. And they say such attacks could be delivered through radio, TV or media players.

We already have the proof-of-concept for overt commands sent over the airwaves. In time, we may get a covert channel too.

"It's going to take continued work on it to get it to the point where it's less noticeable," said Gunter. ®
