AI is learning how to lie

Marketplace All-in-One

Marketplace

Business, News

🗓️ 5 August 2024

⏱️ 10 minutes

Summary

Large language models go through a lot of vetting before they’re released to the public. That includes safety tests, bias checks, ethical reviews and more. But what if, hypothetically, a model could dodge a safety question by lying to developers, hiding its real response to a safety test and instead giving the exact response its human handlers are looking for? A recent study shows that advanced LLMs are developing the capacity for deception, and that could bring that hypothetical situation closer to reality. Marketplace’s Lily Jamali speaks with Thilo Hagendorff, a researcher at the University of Stuttgart and the author of the study, about his findings.

Transcript

0:00.0	Is AI learning how to lie? From American Public Media, this is Marketplace Tech. I'm
0:07.0	Lily Jamali. Large language models like OpenAI's GPT-4 and Anthropic's Claude go through a lot of vetting before they're released to the public.
0:26.8	That includes safety tests, bias checks, ethical reviews, the works.
0:32.4	But what if, hypothetically, a model could dodge a safety question by
0:36.6	lying to developers, hiding its real response to a safety test and instead giving the exact response its human handlers are looking for?
0:45.8	A recent study shows advanced LLMs are learning how to deceive, and it could be the first step
0:51.4	toward that hypothetical becoming a reality.
0:54.3	Thilo Hagendorff, a researcher at the University of Stuttgart, conducted the study.
0:59.6	He described his reaction to his own findings.
1:03.0	Frankly, I was pretty astonished.
1:06.0	The tasks that I apply to the language models for us humans
1:10.0	might seem trivial, but seeing deceptive behavior emerging in language models, this was really, really
1:19.6	surprising for me.
1:22.0	And is it troubling to you?
1:23.0	And if so, why?
1:25.0	Actually, it's not troubling to me.
1:27.0	I think in the AI safety discourse there is this fear that one day we will have extremely
1:36.2	intelligent or superintelligent AI systems that are capable of deceiving
1:42.0	humans during test situations.
1:44.5	This is not yet achieved.
1:47.0	However, a certain, let's say, prerequisite is already achieved,
1:52.4	which is namely that language models have this
	...

Disclaimer: The podcast and artwork embedded on this page are from Marketplace, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of Marketplace and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.