Behind The Knife: The Surgery Podcast

Journal Review in Artificial Intelligence: Four Times Better Than Us

Science, Health & Fitness, Medicine, Education

4.8 (1.4K Ratings)

🗓️ 17 July 2025

⏱️ 23 minutes

Summary

You have probably seen recent headlines that Microsoft has developed an AI model that is 4x more accurate than humans at difficult diagnoses. It’s been published everywhere: AI is 80% accurate compared to a measly 20% human rate, and AI was cheaper too! Does this signal the end of the human physician? Is the title nothing more than clickbait? Or is the truth somewhere in between? Join Behind the Knife fellow Ayman Ali and Dr. Adam Rodman from Beth Israel Deaconess/Harvard Medical School to discuss what this study means for our future.

Studies:
Sequential Diagnosis with Large Language Models: https://arxiv.org/abs/2506.22405v1
METR study: https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Hosts:
Ayman Ali, MD
Ayman Ali is a Behind the Knife fellow and general surgery PGY-4 at Duke Hospital, currently in his academic development time, where he focuses on applications of data science and artificial intelligence to surgery.

Adam Rodman, MD, MPH, FACP, @AdamRodmanMD
Dr. Rodman is an Assistant Professor and a practicing hospitalist at Beth Israel Deaconess Medical Center, where he is also Director of AI Programs and co-director of the iMED Initiative.
Podcast Link: http://bedside-rounds.org/

Please visit https://behindtheknife.org to access other high-yield surgical education podcasts, videos and more.  

If you liked this episode, check out our recent episodes here: https://app.behindtheknife.org/listen

Transcript

0:00.0

Behind the Knife, the Surgery Podcast, relevant and engaging content designed to help you

0:11.6

dominate the day. Imagine, Behind the Knife listeners, that you're sitting in the ICU on your 19th

0:26.6

in a row, and your patient, who is post-op day 5 from a Hartmann's procedure, goes from a 1- to a 4-liter-per-minute oxygen requirement.

0:33.1

You don't think anything of it. He's a big guy. He's to do it out of CPAP.

0:36.5

But then you get a notification from the hospital AI to consider a stat chest X-ray. Fine, you order it. Next thing you see is a massive right-sided hemopneumothorax. Somehow the AI caught the iatrogenic injury from the central line placed yesterday, which developed more slowly than you would have guessed. Is this plausible? So welcome, all. I'm Ayman Ali, one of the surgical

0:54.6

education fellows at Behind the Knife. Welcome back to our AI Journal Club. And today's episode is

0:59.2

about a really hot paper from Microsoft that went viral. It's titled Sequential Diagnosis with

1:04.1

Large Language Models. And the typical news headlines are about a revolutionary medical

1:07.8

AI that outperforms physicians with diagnostic performance of 80% compared to,

1:12.2

unfortunately, a physician average of 20%. So today we're going to discuss what that means,

1:17.0

what implications it has for us as physicians, and give a brief overview of the paper itself.

1:21.9

Now, luckily, I'm joined by Dr. Adam Rodman to help dissect the study and tell you what you need to know.

1:28.7

Dr. Rodman is an assistant professor, practicing hospitalist, and AI researcher at Beth Israel Deaconess Medical Center.

1:34.6

Dr. Rodman, thank you for taking the time to join today. It is my true pleasure. Thank you for

1:38.7

having me. Thank you again. I'll start with a brief overview of the study. The title is Sequential Diagnosis

1:45.8

with Large Language Models. And when they say sequential diagnosis, they're referring to what we do

1:50.7

every day, which is just a workup. So how do you refine your differential with an iterative

1:56.2

process and get to the most likely ultimate diagnosis? Now, to quantitatively score physicians and the AI itself,

2:05.5

the authors developed a sequential diagnosis benchmark.

2:09.1

And I think that's worth just a minute to explain,

2:11.3

because that's how they're grading everybody.

2:13.7

Now in this benchmark, they have multiple LLMs that talk to each other,

...
