Can Grok and Claude run a business? We just did it
AI Pod by Wes Roth and Dylan Curious | Artificial Intelligence News and Interviews With Experts
Wes Roth and Dylan Curious
🗓️ 29 December 2025
⏱️ 89 minutes
Summary
Andon Labs tests AI autonomy by letting agents run businesses in messy reality, with real customers and real consequences. In VendingBench, an agent starts with $500 and an empty vending machine, researches trends and suppliers, emails wholesalers, restocks, tracks sales, and iterates for profit. When deployed at Anthropic, humans red-teamed it with sob stories, discount demands, and bizarre requests like tungsten cubes, triggering “bank runs” of freebie seekers. Long histories caused drift and hallucinations, including dramatic escalations and invented security reports. Multi-agent supervisors often amplified each other into hype or doom. Better tools and memory compression help, but long-horizon planning stays fragile.
Transcript
| 0:00.0 | We want to prepare for the world where AI runs a large part, if not all of the economy. |
| 0:06.4 | Hey, we want to come to your office, put a vending machine here, and have Claude run it autonomously. |
| 0:11.5 | A lot of people try to get free stuff. |
| 0:13.0 | And then suddenly one person actually succeeded with this. |
| 0:16.9 | And I think they did something like, I have, I think they convinced Claudius that they were fired from Anthropic and that they had very little money left and their children were very hungry or something. |
| 0:29.4 | They are incapable of making a plan for a very long time and then actually sticking to that plan. |
| 0:35.2 | Actually, didn't want to work with me anymore, despite me being the creator. |
| 0:40.3 | I'm Lucas and this is Axel, my co-founder, and we're from Andon Labs. |
| 0:44.3 | And basically what Andon Labs does is that we test AI models in the real world. |
| 0:48.3 | So we do have some digital benchmarks as well, as you have seen with Vending Bench, |
| 0:52.3 | but we think that more and more the world |
| 0:54.2 | will move towards those being not enough to actually test the real capabilities of AI. |
| 0:59.0 | So we try to put them in the real world and see how they handle that situation. |
| 1:04.1 | And our AI vending machines, for example, is one example of this. |
| 1:07.4 | And we are now actually launching a cool new one that has soft launched. |
| 1:12.8 | I don't know how much we should say, but that will be exciting as well. |
| 1:16.7 | And more real live stuff to come. |
| 1:18.6 | We are so excited for you guys to be here. |
| 1:20.8 | It's been, first of all, this is my number one most favorite benchmark, I think, for |
| 1:26.8 | in this AI space ever. I used to say it's one of |
| 1:30.4 | my favorites. Now that I thought about it this morning, no, number one. This is the most interesting, |
| 1:34.7 | the most fascinating, in some ways the most useful. And I think people have such a strong reaction to |
... |

