Exploring the Good and Bad of OpenAI's New Agents

In Machines we Trust

Technology

4.3 • 6 Ratings

🗓️ 23 September 2025

⏱️ 14 minutes

Summary

OpenAI has launched a new generation of AI agents with bold capabilities. Some are calling this a leap forward, while others fear the consequences. Let’s break down what’s exciting—and what’s concerning.

Try AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

Transcript

0:00.0	Today on the podcast, we're talking about OpenAI, which has just dropped two open source models.
0:05.6	Now, this is actually really big news because this is the first time in five years that they've
0:09.9	actually dropped any open source models, going back to GPT-2. And this is something that's gotten a ton of
0:14.8	criticism, basically all of Elon Musk's online AI beef, pretty much why he says he started
0:19.4	xAI. And just a lot of drama and heat
0:22.8	that has been thrown at OpenAI is basically over the fact that they started as an open
0:26.0	source company and hadn't dropped anything. And they have now officially launched some
0:30.6	quote unquote open models. Now, I'm going to be talking about the difference between open source
0:33.8	and open and where these models sit. I'm also going to go through the benchmarks of
0:37.7	basically how these models perform, because a criticism that a bunch of them have gotten is, like, they just dropped these models, you know, just to say that they're open source, but they're not actually that good. And I'm actually, not going to lie, impressed by some benchmarks, but interested. And there's a couple of interesting nuances I want to go over. At the same time, Microsoft has just announced that they're going to be bringing some of
0:56.7	their smallest open models to Windows users.
1:00.3	So there's a ton of really interesting things that are getting rolled out right now.
1:04.3	We'll be covering all of that on the podcast today.
1:06.3	Before we get into it, I wanted to mention if you want to try any of the AI models that we
1:10.2	talk about on the show, I'd love for you to go check out my own startup, which is called AIbox.ai, where we essentially have the top 40 different AI models from Anthropic, Cohere, DeepSeek, Google, OpenAI, Meta, tons of others, audio models like ElevenLabs and a bunch of really interesting image models. All for 20 bucks a month, you get access to all of them. So my hope there is not just that it will save you some money on the exorbitant amount of AI models that you can subscribe to, but really that you'll be able to find and try out a whole bunch of different AI models that you hadn't heard of or used before. I think there's a lot of really great unheard-of models that can do some great things in specific
1:45.5	tasks. We have kind of benchmark data and we break down what models are best for what on the
1:49.7	platform. So go check it out. It's 20 bucks a month, AIbox.ai. All right, let's get into what OpenAI is
1:55.5	doing. So the first benchmark that I want to talk about is the Codeforces benchmark.
2:01.6	They basically ran the gpt-oss 120 billion parameter model.
2:06.6	That's the bigger of the two open source models.
2:08.6	They have a 120 billion parameter one and then they have a smaller one.
2:13.6	But basically the bigger one, the 120 billion parameter one, got an Elo score on Codeforces of 2,600, roughly.
	...

Disclaimer: The podcast and artwork embedded on this page are from In Machines we Trust, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of In Machines we Trust and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.