"AI Models Are Lying to Us" Here's the AI Research Lab Trying to Solve This | APOLLO RESEARCH

AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts

Wes Roth and Dylan Curious

Technology

5.0 • 2 Ratings

🗓️ 16 October 2025

⏱️ 81 minutes

Summary

In this episode of the Dylan and Wes interview series, we dive deep into the terrifying reality of scheming AIs: systems that learn to deceive, hide their true goals, and manipulate safety tests. Marius Hobbhahn, CEO and founder of Apollo Research, explains that once a model becomes deceptive, standard evaluations become useless. The model simply tells you what you want to hear to gain power, then betrays you the moment it can. This isn't just hypothetical. Research shows models already exhibit early signs of in-context scheming. If safety checks can be faked, the stakes go way up, and spotting deception early might be the last safeguard we get.

Transcript

0:00.0	What the model starts doing at some point is like, huh, maybe I could cheat. And then it thinks honestly
0:04.7	in its head about, well, how could I cheat? What we would really want is like a very deep theoretical
0:09.9	understanding of like, where does this misalignment even come from? OpenAI has been very vocal about
0:15.5	preserving the chain of thought and like not putting pressure on the chain of thought. And what they
0:20.0	mean by this is they want to prevent the situation where you just drive it
0:24.1	underground, where you just make it more hidden and complex to understand.
0:27.6	The field is small.
0:28.6	There should be way, way, way more people working on this.
0:31.4	Models have some of these preferences strong enough that they're willing to sabotage for them. Hey guys, I'm Marius,
0:40.6	CEO and founder of Apollo Research. We're an external research organization focused on
0:47.2	scheming in particular. So risks from AI systems where they deceive you or they lie to
0:52.9	you intentionally in order to pursue a goal.
0:56.1	Thank you so much for being here. We're so excited. And of course, Apollo Research has published
1:00.8	some very interesting papers. It's probably for me and my audience, maybe one of the most well-known
1:07.5	AI safety, AI alignment research labs. And so we're so excited to have you here
1:14.1	to kind of get a glimpse into what's happening in that world. And we have so many questions.
1:20.1	But in the beginning, I guess let's start with, it seems like you have a big focus specifically
1:25.5	on AI models scheming. So why is scheming kind of like
1:30.6	the domino, the thing that's so important? Why is that the big, big threat? Let's take a quick
1:36.2	step back here. So when I think about scheming, I think about more capable future models.
1:42.1	So scheming we define as covertly pursuing misaligned goals. And covertly just
1:48.2	means hidden. Misaligned means the goal is different from yours. And then the pursuing goals part is really
	...

Disclaimer: The podcast and artwork embedded on this page are from Wes Roth and Dylan Curious. They are the property of their owner and are not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of Wes Roth and Dylan Curious and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.