Millions of books died so Claude could live

The Vergecast

Vox Media Podcast Network

Tech News, News, Technology

4.3 • 4.3K Ratings

🗓️ 3 February 2026

⏱️ 87 minutes

Summary

AI companies want all the data, everywhere, to make their models bigger and better. That means a lot of questions about piracy and copyright, and at least in one case it means Anthropic systematically destroying countless books just to feed them to the model. The Washington Post's Will Oremus joins the show to explain how that worked, why Anthropic, Meta, OpenAI and others are doing it, and what the law has to say. Then, Puck's Julia Alexander helps David figure out whether Netflix is serious about showing movies in theaters, and what theaters need to do to survive in the entertainment business going forward.

Further reading:

From The Washington Post:
Anthropic ‘destructively’ scanned millions of books to build Claude
Anthropic wins a major fair use victory for AI — but it’s still in trouble for stealing books
Meta’s AI copyright win comes with a warning about fair use
Did AI companies win a fight with authors? Technically

From Puck:
Why Netflix Needs Warner Bros.
Welcome to the big leagues, Netflix

Subscribe to The Verge for unlimited access to theverge.com, subscriber-exclusive newsletters, and our ad-free podcast feed. We love hearing from you! Email your questions and thoughts to vergecast@theverge.com or call us at 866-VERGE11. Learn more about your ad choices. Visit podcastchoices.com/adchoices

Transcript

0:00.0	Welcome to the Vergecast, the flagship podcast of hydraulic-powered cutting machines,
0:06.5	a very cool phrase that is going to make sense in 10 minutes or so.
0:10.3	I'm your friend David Pierce, and I am currently phone shopping.
0:13.4	So I have this iPhone 16, which is fine.
0:16.0	It's blue, which is why I bought it, if we're being honest with each other.
0:18.9	And the problem with it now is that I miss having some camera power, but what I really miss is having a battery that isn't awful. This battery is awful. Like, I'm at like 3 p.m. every day and I'm having to charge my battery. So there's a world in which I could just, you know, replace the battery or upgrade to an iPhone 17. But I figure I'm a tech journalist. So what if I just go out and experience a bunch of phones? So I'm going to try a bunch of stuff. I have a Pixel here. I think I need to get a foldable phone. But I actually want your help, which is, A, what phone do you think I should get? I'm in that phase of being sort of unusually willing to switch from iOS to Android. Switching operating systems has traditionally been very hard. People largely don't do it. I'm very willing to do it. I don't know if the answer is like I should go get the Samsung Z TriFold for $3,000 or buy one of the flip phones that everybody's excited about, including me. Or if the answer is just shut up and go buy the orange iPhone 17 Pro, which I will like very much. I don't know. If you have thoughts, especially like weird thoughts about what phone I should get, I want to hear them. The hotline is 866-VERGE11. The email is vergecast@theverge.com. Get at me. I'm going to do a bunch of weird phone experiments on this show over the next couple of months, and then I don't know what I'm going to do. I'm going to break iMessage forever. I know that for sure. But that's not what we're here to talk about today. Today, we're here to talk about two things. First, we're going to talk to Will Oremus, a reporter at The Washington Post, about a big story he and a couple of his colleagues wrote about the way Anthropic and other
1:44.4	companies are training their AI models using books in particular. There are some really fascinating details and some really big questions about how we're supposed to feel about AI in general and all of that. So we're going to talk about it. Then Julia Alexander, our old colleague at The Verge, who is now at Puck, is going to come on and we're going to talk about Netflix and movie theaters. We've been talking a lot about Netflix recently, but I think this company is important and fascinating and also kind of a way to talk about the whole entertainment industry all at once. After that, we have a hotline question about the smart home. Jen Tuohy is back answering weird smart home questions. It's going to be awesome. All that is coming up in just a second, but I've just realized that I have to go charge like 12 phones in order to do this experiment. So here I go. It's phone charging time.
2:22.5	This is the Vergecast. We'll be right back.
2:25.3	Support for the show comes from L'Oreal Group, the global beauty leader, defining the future of beauty
2:30.6	through science and technology. L'Oreal Group, create the beauty that moves the world.
2:39.7	All right, we're back. So over the last couple of years, there has been this slew of lawsuits
2:44.6	against AI companies all about the way that they've trained their models. These AI models,
2:49.9	these large language models require just a vast amount of data. And to get all of that data, these companies are going and getting
2:56.4	whatever they can. Right. There was lots of reporting a couple of years ago about OpenAI,
3:00.3	essentially transcribing every YouTube video on the planet and then feeding all of that into its
3:04.1	models. There's been a lot of talk about books in particular, a lot of authors
3:08.4	and publishers suing these companies over the way that they are acquiring and then actually
3:13.8	using that data to train their own models. So Will Oremus at The Washington Post wrote a story with a
3:18.6	couple of his colleagues about this thing called Project Panama, which was an Anthropic project
3:23.3	to digitize and use just
3:27.2	an unbelievably staggering number of books to train its models. And it's not the only company
3:33.0	doing this, but we have a lot of interesting data thanks to some newly unsealed documents
	...

Generated transcripts are the property of Vox Media Podcast Network and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.