Programming Throwdown

180: Reinforcement Learning

Patrick Wheeler and Jason Gauci

Objective C, Java, Programming Throwdown, Education, News, Programming Languages, How To, Tech News, C, Python

4.6 • 604 Ratings

🗓️ 17 March 2025

⏱️ 112 minutes

Summary

Intro topic: Grills

News/Links:

Book of the Show


Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h


Tool of the Show

  • Patrick: 
    • Pokemon Sword and Shield
  • Jason: 

Topic: Reinforcement Learning

  • Three types of AI
    • Supervised Learning
    • Unsupervised Learning
    • Reinforcement Learning
  • Online vs Offline RL
  • Optimization algorithms
    • Value optimization
      • SARSA
      • Q-Learning
    • Policy optimization
      • Policy Gradients
      • Actor-Critic
      • Proximal Policy Optimization
  • Value vs Policy Optimization
    • Value optimization is more intuitive (Value loss)
    • Policy optimization is less intuitive at first (policy gradients)
    • Converting values to policies in deep learning is difficult
  • Imitation Learning
    • Supervised policy learning
    • Often used to bootstrap reinforcement learning
  • Policy Evaluation
    • Propensity scoring versus model-based
  • Challenges to training an RL model
    • Two optimization loops
      • Collecting feedback vs updating the model
    • Difficult optimization target
      • Policy evaluation
  • RLHF & GRPO
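
To make the value-optimization bullets concrete, here is a minimal tabular sketch of the two update rules named above, SARSA and Q-Learning. It is an illustration written for these notes, not code from the episode: the five-state chain environment, the function names, and the hyperparameters (alpha, gamma, epsilon) are all assumptions chosen to keep the example small.

```python
import random


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon explore; otherwise act greedily (ties broken at random)."""
    n_actions = len(q[state])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(q[state])
    return rng.choice([a for a in range(n_actions) if q[state][a] == best])


def q_learning_update(q, s, a, r, s_next, done, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def sarsa_update(q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def train_chain(n_states=5, episodes=500, epsilon=0.2, seed=0):
    """Q-Learning on a hypothetical chain: action 0 moves left, action 1 moves right.

    Reaching the last state ends the episode with reward 1; every other step gives 0.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so an episode always terminates
            a = epsilon_greedy(q, s, epsilon, rng)
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            q_learning_update(q, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return q


if __name__ == "__main__":
    q_table = train_chain()
    # Converting values to a policy is easy in the tabular case: take the argmax.
    policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
    print(policy)
```

The only difference between the two updates is the bootstrap term: Q-Learning uses the maximum over next actions, while SARSA uses the value of the next action the behavior policy actually chose.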

★ Support this podcast on Patreon ★

Transcript

0:00.0

Programming Throwdown, Episode 180, Reinforcement Learning.

0:21.6

Take it away, Patrick.

0:23.0

Welcome to another episode.

0:25.0

This is going to be a good one.

0:26.4

Excited to be here, actually, because this is a topic I have been meaning to learn about,

0:30.3

and Jason has agreed to put on his professor hat, robe.

0:35.7

I don't know, what does a professor wear?

0:37.4

I got hooded.

0:38.9

When I got the PhD,

0:39.9

I got hooded, which I thought would be an actual hood, but it's really just a sash. Wait. What is getting hooded? That's like what you get when you get, I don't know about this. Okay. So when you get a PhD, you get hooded, which means you go through the same ceremony as the master's students,

0:56.0

or I think the same ceremony as everybody, but you get a hood, which is actually a sash,

1:01.9

and your PhD advisor actually puts the sash around you over you as part of the ceremony.

1:11.1

Okay.

1:11.6

I feel like maybe I've heard that term, but I always just kind of had some weird,

1:16.0

probably bad association with hoodwinked.

1:18.4

But anyways.

...
