GPT-4 needs more robust testing, “red team” member says

Marketplace Tech

Marketplace

News, Technology

🗓️ 29 March 2023

⏱️ 6 minutes

Summary

Earlier this month, OpenAI released its newest and most powerful chatbot, GPT-4, along with a technical paper summarizing the testing the company did to ensure its product is safe. The testing involved asking the chatbot how to build weapons of mass destruction or to engage in antisemitic attacks. In the cybersecurity world, this testing process is known as red teaming. In it, experts look for vulnerabilities, security gaps and anything that could go wrong before the product launches. Marketplace’s Meghan McCarty Carino spoke to Aviv Ovadya, a researcher at Harvard University’s Berkman Klein Center who was on the red team for GPT-4. He said this kind of testing needs to go further.

Transcript

0:00.0	We really don't have great ways to test if a new technology is going to destroy civilization.
0:09.0	From American Public Media, this is Marketplace Tech.
0:11.9	I'm Meghan McCarty Carino.
0:22.5	Earlier this month, OpenAI released its newest, most powerful chatbot, GPT-4.
0:29.1	And along with it, a technical paper that summarizes the testing the company did to ensure
0:35.0	its product is safe, like asking it how to build weapons of mass destruction or to engage
0:41.8	in antisemitic attacks.
0:44.2	It's a process known in the cybersecurity world as red teaming.
0:48.3	You look for vulnerabilities, security gaps, anything that could go wrong before a product
0:54.4	launches.
0:55.5	Aviv Ovadya was part of the red team for GPT-4.
0:59.7	He's now at Harvard's Berkman Klein Center.
1:02.4	And he says this kind of testing needs to go further.
1:06.6	With traditional red teaming, you're mostly trying to protect the system itself.
1:10.5	But here these new capabilities can affect so many other parts of our lives.
1:15.9	And the extent of that impact on public institutions, even things like trust, more diffuse public
1:21.8	goods, are just enormous, and we don't have the level of resilience that we necessarily
1:28.8	need to do this.
1:29.8	And there's a lot of calls to slow down development of systems like this, but the economic incentives
1:36.3	aren't there.
1:37.4	And so given that, we have to think about how can we make things resilient and do it as
1:41.4	quickly as possible?
	...

Disclaimer: The podcast and artwork embedded on this page are from Marketplace, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of Marketplace and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.