A Big Week in Tech: NotebookLM, OpenAI’s Speech API, & Custom Audio

The a16z Show

a16z

Culture, Business, Science, Disruption, Technology, Software Eating The World, Entrepreneurship, Innovation

🗓️ 8 October 2024

⏱️ 32 minutes

Summary

Last week was another big week in technology. Google’s NotebookLM introduced its Audio Overview feature, enabling users to create customizable podcasts in over 35 languages. OpenAI followed with their real-time speech-to-speech API, making voice integration easier for developers, while Pika’s 1.5 model made waves in the AI world. In this episode, we chat with the a16z Consumer team—Anish Acharya, Olivia Moore, and Bryan Kim—about the rise of voice technology, the latest AI breakthroughs, and what it takes to capture attention in 2024. Anish shares why he believes this could finally be the year of voice tech.

Transcript

0:00.0	There's elements of it that are almost similar to early
0:03.4	ChatGPT. Anyone who's now building a conversational voice product can have access
0:09.0	to that level of conversational performance. The way the majority of people may experience AI for the first time is actually going to be via the phone call.
0:18.0	We're taking the oldest and most information dense of all of our mediums of communication and finally making it
0:24.5	almost programmable.
0:27.0	Phone calls are kind of this API to the world.
0:31.3	Within a couple weeks of deploying their voice model they'd had 3 million users do 20 million calls.
0:38.0	Last week was yet another big week in technology.
0:42.0	For one, NotebookLM, Google's latest
0:44.5	sensation, has been making its way across the Twitterverse with its new Audio
0:48.8	Overview feature. The feature uses end-user customizable RAG, which basically means that people can create their own context window for generating surprisingly good podcasts across 35 languages.
1:00.4	And to add to the voice mix, OpenAI held their developer day and announced their real-time
1:05.2	speech-to-speech API, enabling any developer to add real-time speech functionality
1:10.0	to their own apps.
1:11.6	Plus, they noted a whopping 3 million active developers on the platform.
1:16.2	Finally, we saw one video model company, Pika, break through the AI noise with their 1.5 model,
1:21.8	giving us fodder to discuss what is really required to capture
1:24.9	attention in 2024 and beyond.
1:28.0	Today we discuss all that and more with a16z consumer partners Olivia Moore and Bryan Kim and general partner Anish Acharya.
1:36.0	This was also recorded in two segments, one with Olivia and another with all three partners.
1:41.4	So you'll hear us pivot between the two.
1:43.9	Plus, Anish actually predicted that this would be the year of voice,
	...

Disclaimer: The podcast and artwork embedded on this page are from a16z, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of a16z and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.