If you hear podcasting star Joe Rogan say something dumb, it may not be his fault – an AI has cloned his voice

And it could be you who's impersonated next

Video Here’s another one of your regular reminders that AI software can be creepy.

Engineers at Dessa, an AI startup focused on helping enterprises use machine learning, have managed to clone the voice of Joe Rogan, the host of the popular podcast show The Joe Rogan Experience.

Dessa called it “the most realistic AI simulation of a voice we’ve heard to date,” and they’re not wrong. Previous clips of computers imitating human voices have been robotic and grainy at worst, or pretty convincing but super short at best. The latest attempt, however, is actually quite impressive.

Here are a few samples of the fake Joe Rogan talking about absurdities like sponsoring a hockey team made up of chimps, or being a medical expert after hooking his brain up to the internet.

YouTube video

Joe Rogan is a pretty easy target to mimic. He has recorded nearly 1,300 episodes of his talk show, each lasting at least a couple of hours, so there's well over 2,000 hours of audio to use as training data. Dessa hasn't yet revealed many details of how its deep-learning system, known as RealTalk, works. The Register has contacted Dessa for comment.
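
To give a rough sense of the data-wrangling such a project involves, here's a minimal Python sketch of how hours of podcast audio could be chopped into fixed-length clips for training. It is purely illustrative and not Dessa's pipeline, which hasn't been published; the filename, sample rate, and clip length below are all made up for the example.

```python
# Illustrative only: chunk long podcast recordings into fixed-length training clips.
# "episode_0001.wav", the 22,050 Hz sample rate and 10-second clips are assumptions.

import numpy as np
import librosa

SAMPLE_RATE = 22_050                      # resample everything to one rate
CLIP_SECONDS = 10                         # length of each training clip
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS

def episode_to_clips(path: str) -> np.ndarray:
    """Load one episode and return an array of shape (n_clips, CLIP_SAMPLES)."""
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    n_clips = len(audio) // CLIP_SAMPLES
    # Drop the ragged tail so every clip has the same length
    return audio[: n_clips * CLIP_SAMPLES].reshape(n_clips, CLIP_SAMPLES)

clips = episode_to_clips("episode_0001.wav")
print(clips.shape)  # e.g. (720, 220500) for a two-hour episode
```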

“To work on things like this responsibly, we think the public should first be made aware of the implications that speech synthesis models present before releasing anything open source. Because of this, at this time we will not be releasing our research, model or datasets publicly,” it said this week.

It's highly likely that RealTalk is some kind of neural network that has learnt to mimic the idiosyncrasies of a person's speech by taking audio signals as input and generating new audio samples as output.
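
As a purely hypothetical illustration of that idea, here's a toy PyTorch sketch of a WaveNet-style network that takes acoustic features plus past audio samples as input and predicts the next sample. It is not RealTalk, whose architecture Dessa has not disclosed; every name and number below is invented for the example.

```python
# Toy example only: a WaveNet-style stack of dilated causal convolutions that takes
# acoustic features plus past audio samples as input and predicts a distribution
# over the next (quantised) audio sample. Not Dessa's RealTalk.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVocoder(nn.Module):
    def __init__(self, n_mels: int = 80, channels: int = 64, n_classes: int = 256):
        super().__init__()
        self.cond = nn.Conv1d(n_mels, channels, kernel_size=1)  # project conditioning features
        self.inp = nn.Conv1d(1, channels, kernel_size=1)        # project the raw (mono) waveform
        # Dilated convolutions let each output position see a long history of samples
        self.dilated = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=2, dilation=2 ** i) for i in range(6)
        )
        self.out = nn.Conv1d(channels, n_classes, kernel_size=1)  # logits over quantised values

    def forward(self, audio: torch.Tensor, mel: torch.Tensor) -> torch.Tensor:
        # audio: (batch, 1, T) past waveform; mel: (batch, n_mels, T) aligned features
        x = self.inp(audio) + self.cond(mel)
        for conv in self.dilated:
            pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
            x = x + torch.tanh(conv(F.pad(x, (pad, 0))))  # left-pad => causal, plus residual
        return self.out(x)  # (batch, n_classes, T): per-timestep sample predictions

# Smoke test with random tensors standing in for real speech
model = TinyVocoder()
print(model(torch.randn(2, 1, 1024), torch.randn(2, 80, 1024)).shape)  # (2, 256, 1024)
```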

“Right now, technical expertise, ingenuity, computing power and data are required to make models like RealTalk perform well. So not just anyone can go out and do it. But in the next few years (or even sooner), we’ll see the technology advance to the point where only a few seconds of audio are needed to create a life-like replica of anyone’s voice on the planet,” Dessa warned.

You can imagine people using tools like RealTalk to make robocall scams more convincing, or to spread fake content, such as clips of politicians saying things they never actually said.

Such models can also be useful, however: DeepMind's WaveNet, for example, provides the voice of Google Assistant, and speech synthesis makes devices like smartphones and laptops easier to use for people who find typing difficult.

“We won’t pretend to have all the answers about how to build this technology ethically. That said, we think it will be inevitably built and increasingly implemented into our world over the coming years. So in addition to raising awareness and acknowledging these issues, we also want to show this work as a way of starting a conversation on speech synthesis that must be had,” it added. ®
