Tiny Language Models Come of Age

The Quanta Podcast

Quanta Magazine

Physics, Life Sciences, Science

4.7 • 643 Ratings

🗓️ 6 March 2024

⏱️ 21 minutes

Summary

To better understand how neural networks learn to simulate writing, researchers trained simpler versions on synthetic children’s stories. Read more at QuantaMagazine.org. Music is “Thought Bot” by Audionautix.

Transcript

0:00.0	Welcome to Quanta Magazine's podcast.
0:07.0	Each episode, we bring you stories about developments in science and mathematics.
0:11.3	I'm Susan Valot.
0:12.7	To better understand how neural networks learn to simulate writing,
0:17.0	researchers are training simpler versions, using something right out of bedtime: synthetic children's stories. That's next.
0:29.5	It's season three of The Joy of Why, and I still have a lot of questions. Like, what is this thing we call time? Why does altruism exist? And where is Janna Levin?
0:39.9	I'm here. Astrophysicist and co-host. Ready for anything. That's right. I'm bringing in the A-team.
0:46.0	So brace yourselves. Get ready to learn. I'm Janna Levin. I'm Steve Strogatz.
0:51.0	And this is... Quanta Magazine's podcast, The Joy of Why. New episodes drop every other Thursday.
1:01.7	As countless students know, learning English is no easy task. But when the student is a computer,
1:13.0	one approach works surprisingly well.
1:20.0	Simply feed mountains of text from the Internet into a giant mathematical model called a neural network.
1:27.0	That's the operating principle behind generative language models, like OpenAI's ChatGPT, whose ability to converse coherently,
1:29.8	if not always truthfully, on a wide range of topics, has surprised researchers and the public
1:36.1	over the past year or so.
1:38.4	But the approach has its drawbacks.
1:40.6	For one thing, the training procedure required to transform vast text archives into state-of-the-art
1:47.0	language models is costly and time-intensive. For another, even the people who train large
1:53.0	language models find it hard to understand their inner workings. That, in turn, makes it hard
1:59.2	to predict the many ways they can fail.
2:02.4	Faced with these difficulties, some researchers have opted to train smaller models on smaller
2:08.0	data sets and then study their behavior.
	...

Disclaimer: The podcast and artwork embedded on this page are from Quanta Magazine, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of Quanta Magazine and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.