Generative AI Models Are Sucking Up Data from All Over the Internet, Yours Included

Science Quickly

Scientific American

Science

4.4 • 1.4K Ratings

🗓️ 23 October 2023

⏱️ 12 minutes

Summary

In the rush to build and train ever larger AI models, developers have swept up much of the searchable Internet, quite possibly including some of your own public data, and potentially some of your private data as well.

Transcript

0:19.9	To train a large artificial intelligence model, you need lots of text and images created by actual humans.
0:30.0	As the AI boom continues, it's becoming clearer that some of this data is coming from copyrighted sources.
0:36.0	Now writers and artists are filing a spate of lawsuits to challenge how AI developers are using their work.
0:42.6	But it's not just published authors and visual artists
0:45.3	that should care about how generative AI is being trained.
0:48.4	If you're listening to this podcast,
0:50.1	you might want to take notice too.
0:52.2	I'm Lauren Leffer, the Technology
0:53.8	Reporting Fellow at Scientific American. And I'm Sophie Bushwick,
0:57.1	tech editor at Scientific American. You're listening to Tech Quickly, the
1:01.2	digital data-diving version of Scientific American's Science Quickly podcast.
1:03.0	So, Lauren, people often say that generative AI is trained on the whole internet, but it seems
1:20.0	like there's not a lot of clarity on what that means.
1:23.6	When this came up in the office, lots of our colleagues had questions.
1:26.7	Totally.
1:27.7	People were asking about their individual social media profiles, password-protected content,
1:31.8	old blogs, all sorts of stuff.
1:34.0	It's hard to wrap your head around what online data means when, as Emily M. Bender, a computational
	...
