Theories of Everything with Curt Jaimungal

David Hand: How Dark Data Makes AI and LLMs Dangerously Unreliable

Curt Jaimungal

Physics, Philosophy, Society & Culture, Science

4.6 (606 Ratings)

🗓️ 14 August 2023

⏱️ 84 minutes

Summary

David Hand, a professor of statistics, explains how ChatGPT and other large language models hide behind dark data, leading to misleading outputs. He also critiques the peer‑review process and the perils of partial truths in scientific modeling.

- 00:00:00 - Introduction
- 00:01:34 - What is Dark Data? (missing data matters more than what you have)
- 00:07:03 - The perils of "changing definitions"
- 00:09:15 - David on writing and his selective process
- 00:20:15 - Theory-driven vs. data-driven models (& the constitution of LLMs)
- 00:32:08 - The dilemma of partial truths
- 00:34:40 - The "File Drawer Problem" & its adverse effects on clinical trials
- 00:39:09 - Regression to the mean (how random variations lead to misleading conclusions)
- 00:44:12 - Publication bias
- 00:48:03 - Open-access models and their pitfalls
- 00:54:06 - Why LLMs are simultaneously brilliant & stupid
- 01:03:40 - David’s daily routine
- 01:06:24 - The mean vs. median
- 01:11:07 - Every type of "Dark Data" listed (watch this first!)

SPONSORS:
- Patreon: https://patreon.com/curtjaimungal
- Crypto: https://tinyurl.com/cryptoTOE
- PayPal: https://tinyurl.com/paypalTOE
- Twitter: https://twitter.com/TOEwithCurt
- Discord Invite: https://discord.com/invite/kBcnfNVwqs
- iTunes: https://podcasts.apple.com/ca/podcast...
- Pandora: https://pdora.co/33b9lfP
- Spotify: https://open.spotify.com/show/4gL14b9...
- Subreddit r/TheoriesOfEverything: https://reddit.com/r/theoriesofeveryt...
- TOE Merch: https://tinyurl.com/TOEmerch

RESOURCES:
- YouTube Link: https://www.youtube.com/watch?v=41JBrC5e5tA
- Dark Data: https://amzn.to/446Fou1
- The Improbability Principle: https://amzn.to/3DOn1iX

Theories of Everything with Curt Jaimungal features long-form, technically detailed interviews with leading researchers in physics, mathematics, consciousness, and philosophy, exploring topics at the level of active research. For academics, graduate students, and anyone seeking depth beyond popular science.
SPONSOR: I personally subscribe to The Economist. TOE listeners get 35% off the annual subscription. No other podcast has this! https://economist.com/TOE

FOLLOW: Substack | Spotify | YouTube | Twitter

Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript

0:31.4

Most current AI systems in these large language models are based on data-driven models,

0:36.6

and there is this risk of them being

0:38.1

fundamentally brittle.

0:40.1

Man, I'm excited to bring Professor David Hand to the TOE podcast.

0:45.0

He's an eminent British statistician and a professor of mathematics at Imperial College London.

0:50.7

David's made significant contributions to the field of data analysis, statistical

0:55.1

theory, as well as something else that he's pioneered called Dark Data.

0:59.2

Professor Hand has the ability to dissect and simplify ordinarily convoluted statistical

1:04.5

concepts and articulate them in a digestible manner.

1:07.5

This is a rare skill and it's one of the main reasons I'm honored to introduce you to him.

1:11.6

Questions explored today are: what is dark data?

1:14.8

Are there different kinds?

1:16.0

What is its relationship to dark matter?

...
