Ring, Ring – It's You!

An unsettling case of AI voice cloning.

No need for Caller ID (DALL·E 3)

Good morning, Normal People!

ICYMI – Yesterday we covered how AI-generated music has been impacting the music industry. Today, we’re going to continue to look at AI-generated audio, but this time, it’s personal…

Let’s start with another scenario:

You’re riding in the car with a friend when they get a phone call. As soon as they answer, you see a look of confusion on their face. After a few seconds, their face goes blank. When you ask who it is, they pause, look over at you, and quietly say, “It’s… you.”

As Twilight Zone as that may sound, this situation is not only plausible but something I’ve actually witnessed firsthand.

A few months ago, I was with a friend when his mom texted him asking if he was okay. When she kept pressing to make sure he was all right, he asked what had her in such a panic. It turned out she had just received a call from a number she didn’t recognize. When she answered, it was him. His voice was panicked, and he said he had just been in a wreck and needed money sent to him as soon as possible. Somehow she had the wherewithal to realize something wasn’t right and began asking for some confirmation of his identity, at which point the line went quiet and the caller hung up.

We sat stunned in the car for a few minutes after that, trying to figure out what on earth had just happened. I knew that AI voice cloning was getting better and assumed scammers would begin to make use of these tools eventually, but I certainly didn’t think it was happening now or to people like me.

After a brief discussion, we concluded that the scammer must have trained a model on recordings of his voice found on the public internet. I know what you’re thinking – this guy must have a massive online presence to provide enough training data for an AI to generate audio that could literally fool his own mother (even if only briefly).

Sadly, you’d be wrong – many AI voice cloning tools need only about 30 seconds of audio to build a text-to-speech voice that can then be reused indefinitely.

One man’s trash is another man’s training set (DALL·E 3)

But the point of today’s newsletter is not to depress you (or send you on a Ron Swanson-style digital purge). Instead, I simply hope you stop to consider the implications of the “Share First, Think Later” approach to online media that our generation has adopted.

Would you share as freely if you thought that data might one day be fed to a model that could imitate you digitally? Then again, should we alter or restrict the way we socialize just because it might be used against us? These are the kinds of questions we’re all going to have to figure out in the coming years. But hey, at least we get to walk through this revolution together!

And on second thought, maybe go ahead and archive those posts from high school – couldn’t hurt.


Brady Fowlkes

Subscriber count (as of today) 👉 104

We’re trying to get to 250 by the end of January, and we need your help to get the word out! 🗣️

Know any other normal people?
Share this sign-up link with them today!