Beyond Uncanny Valley: Breaking Down Sora

The a16z Show

a16z

Technology, Culture, Disruption, Science, Entrepreneurship, Software Eating The World, Business, Innovation

🗓️ 24 February 2024

⏱️ 35 minutes

Summary

In this episode of the a16z Podcast, a16z General Partner Anjney Midha connects with Stefano Ermon, Professor of Computer Science at Stanford and key figure at the lab behind the diffusion models now used in Sora, ChatGPT, and Midjourney. Together, they delve into the challenges of video generation, the cutting-edge mechanics of Sora, and what this all could mean for the road ahead.

Transcript

0:00.0	Yeah, honestly, I was very, very surprised.
0:03.0	I mean, I know the two of us often talk about how quickly the field is moving, how hard it is to keep track of all the things that are happening,
0:10.0	and I was not expecting a model so good coming out so soon.
0:15.0	We generally converged on it was going to be a when, not an if.
0:19.0	I thought it was maybe six months out, a year out, so I was shocked when I saw those videos: the
0:26.8	quality of the videos, the length, and the ability to generate 60-second videos.
0:31.4	I was really amazed.
0:33.0	This is obviously the worst that this technology will ever be
0:36.0	almost definitionally, right?
0:38.0	We're at the earliest stages of progress here.
0:40.0	I always felt that that is one of the secret weapons of diffusion models and why they are so
0:44.6	effective in practice.
0:47.4	If you were to ask many people at the beginning of 2024 when we would get high-fidelity, believable AI-generated video, most would have said that we were years away.
0:58.0	But on February 15th, OpenAI surprised the world with examples from their new model,
1:04.0	Sora, bringing those predictions down from years to weeks.
1:08.0	And of course, the emergence of this model and its impressive modeling of physics and videos of up to 60 seconds
1:14.8	have spurred much speculation around not only how this was accomplished but also how it was accomplished so soon.
1:21.2	And although OpenAI has stated that the model uses a transformer-based diffusion model,
1:26.1	the results have been so good that some have even questioned whether explicit 3D modeling or a game engine was involved. So naturally we decided to bring in an expert.
1:36.7	Sitting down with a16z General Partner Anjney Midha is Professor of Computer Science at
1:41.2	Stanford, Stefano Ermon, whose group pioneered the earliest diffusion
1:45.4	models and their applications in generative AI. Of course, these approaches laid the foundation
	...

Disclaimer: The podcast and artwork embedded on this page are from a16z, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of a16z and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.