AI's Dark Side Is Only a Nudge Away

The Quanta Podcast

Quanta Magazine

Life Sciences, Science, Physics

4.7 • 638 Ratings

🗓️ 23 September 2025

⏱️ 24 minutes


Summary

In order to trust machines with important jobs, we need a high level of confidence that they share our values and goals. Recent work shows that this “alignment” can be brittle, superficial, even unstable. In one study, a few training adjustments led a popular chatbot to recommend murder. On this episode, contributing writer Stephen Ornes tells host Samir Patel about what this research reveals.

Audio coda from The National Archives and Records Administration.

Transcript


0:00.0	It's not often that a Quanta story comes with a disclaimer to warn readers about potentially offensive content.
0:12.0	We report on fundamental science and math, and honestly, it just doesn't come up that much.
0:17.0	However, we did feel the need to do this recently, but the potentially offensive content
0:23.1	didn't come from us. It came from an AI. The story, called “The AI Was Fed Sloppy Code. It
0:30.5	Turned Into Something Evil,” showed that after being fed some seemingly innocuous data, popular large language models could just
0:39.9	go off the rails, like really off the rails, like kill all humans off the rails.
0:52.4	Welcome to the Quanta Podcast, where we explore the frontiers of fundamental science and math.
0:57.9	I'm Samir Patel, editor-in-chief of Quanta Magazine.
1:01.7	There's a subfield of AI computer science known as alignment,
1:06.6	and it's all about building AI tools that have the same values and morals and goals that we do.
1:14.7	Most people who use popular AI chatbots probably don't encounter problems with alignment.
1:20.4	There are some safeguards in place.
1:23.0	But recent research suggests that the dark side of AI is still in there somewhere, and honestly,
1:29.2	it's not that hard to draw it out. Here to speak with us today about this is the author of that
1:35.1	story, science and math journalist and Quanta contributing writer Stephen Ornes. Welcome to the show,
1:41.0	Stephen. Hi, Samir, good to be here. At the outset of every episode, we always ask, what's the big idea?
1:47.0	What are we going to be exploring today?
1:48.9	This is a story of unintended consequences, driven by the good guys.
1:54.9	The idea is that you've got these researchers who study ethical AI and safe AI,
2:04.3	who trained these giant existing models on a pretty specific task. But after that, through a series of steps, through really just a succession of
2:09.4	increasingly focused questions, they discovered that just by training for this one small task,
2:15.8	the models became broadly misaligned, which I know we'll get
	...


Disclaimer: The podcast and artwork embedded on this page are from Quanta Magazine, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of Quanta Magazine and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.