Why audio deepfakes are so hard to spot

Marketplace Tech

Marketplace

Technology, News

4.5 • 1.3K Ratings

🗓️ 14 May 2026

⏱️ 12 minutes

Summary

Voice cloning is the use of artificial intelligence to generate a clone of a real person’s voice, imitating the sound, when they pause and what words they typically emphasize. And it can be hard for people to identify voices as being AI-generated.

Research last year from UC Berkeley professor Hany Farid, an expert in digital forensics, found that people correctly identify a voice as AI-generated only 60% of the time.

Marketplace’s Stephanie Hughes spoke with Farid about the rapid sophistication of audio deepfakes, why it's so hard to tell the difference between a real voice and an AI-generated one right now, and some tips to help you spot voice clones.

Transcript

0:00.0	Will the real Stephanie please stand up?
0:04.4	From American Public Media, this is Marketplace Tech.
0:07.1	I'm Stephanie Hughes.
0:17.4	Voice cloning is the use of artificial intelligence to generate a real person's voice.
0:23.2	That includes imitating the sound, when they pause, what words they typically emphasize.
0:28.1	It can be hard for people to identify voices as being AI-generated.
0:32.7	Let's give it a shot. I'm going to test you guys here.
0:35.8	One of these is the real me, and one is an AI
0:38.8	clone with audio generated by ElevenLabs. So, is this the real Stephanie? Or is this the real Stephanie?
0:46.4	I'll do it again. This first one? Or the second one? There was a little bit of reverb in the second one.
0:56.3	That's UC Berkeley professor Hany Farid, an expert in digital forensics.
1:01.4	Which you tend not to hear in the AI-generated voices.
1:04.8	And he's right.
1:05.7	The second one is the real me.
1:08.1	But if you got it wrong, you're not alone.
1:12.8	According to a study Farid published last year, people correctly identify a voice as AI-generated only 60% of the time. I asked Farid why it's
1:20.6	so hard to tell the difference between a real voice and an AI-generated one. Well, the short
1:26.4	answer is they're very, very natural.
1:28.9	And the reason for that is that these systems have been trained now on thousands and tens of
1:34.3	thousands and hundreds of thousands of hours of audio.
1:38.0	And so it's seen essentially the whole universe of possible human speech and it's learning
1:42.9	the patterns.
	...
