The PhD students who became the judges of the AI industry

Equity

TechCrunch

Entrepreneurship, Business News, News, Business, Technology

4.2 • 372 Ratings

🗓️ 18 March 2026

⏱️ 26 minutes

Summary

Artificial intelligence models are multiplying fast, and competition is stiff. With so many players crowding the space, which one will be the best, and who decides that? Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs, influencing funding, launches, and PR cycles. In just seven months, the startup went from a UC Berkeley PhD research project to being valued at $1.7 billion.

On this episode of TechCrunch's Equity podcast, Rebecca Bellan catches up with Arena co-founders Anastasios Angelopoulos and Wei-Lin Chiang to determine how a team like theirs can build a neutral benchmark when the companies they're ranking are also their backers.

Listen to the full episode to hear:
How Arena actually works, and why its founders say you can't game it the way you might a static benchmark.
What "structural neutrality" actually means, and whether taking money from OpenAI, Google, and Anthropic is a conflict of interest.
How Arena is moving beyond chat to benchmark agents, coding, and real-world tasks with a new enterprise product.
Why Claude is currently winning the expert leaderboard for legal and medical use cases.
Arena's bet on what comes after LLMs, and why agents are next on the leaderboard.

Subscribe to Equity on YouTube, Apple Podcasts, Overcast, Spotify and all the casts. You can also follow Equity on X and Threads, at @EquityPod.

Chapters:
00:00 Intro
03:00 How Arena's leaderboard works, and why it's different from static benchmarks
07:00 Reproducibility concerns and how to scale
08:45 Can Arena stay independent while taking money from the labs it ranks?
11:15 Diversity, fraud prevention, and abuse mitigation
18:15 Arena's "data moat"
19:20 Agent benchmarking and expert leaderboards
21:40 Open sourcing data
22:45 How do Arena's rankings shape AI development?
24:15 Outro

Transcript

0:00.0	Presented by Dot Tech Domains, where tech founders find sharp, memorable names for their tech startups.
0:05.9	Hello and welcome back to Equity, TechCrunch's flagship podcast about the business of startups.
0:10.3	I'm Rebecca Bellan, and this is the episode where we bring on industry experts to help us
0:14.4	explore a trend in the tech world and dive deep. AI models are multiplying fast, competition is
0:19.9	stiff, and the question is, which one will be the best, and who gets to decide that?
0:24.4	Well, Arena, formerly LM Arena, has emerged as the de facto public leaderboard for frontier LLMs.
0:30.3	They're influencing funding, launches, and PR cycles.
0:33.5	So today we're joined by Arena co-founders Anastasios Angelopoulos and Wei-Lin Chiang.
0:46.2	Anastasios, Wei-Lin, welcome to the show.
0:48.2	Thanks so much for having us.
0:49.8	Yeah, thanks.
0:50.2	Tell me a little, give me a quick background about both of you, so our listeners can know who they're talking to.
0:56.3	Yeah, so Wei-Lin and I, I'm Anastasios, the CEO of Arena.
0:59.7	Wei-Lin is our CTO.
1:00.9	Wei-Lin and I met in graduate school, you know, three years ago or so when we were at that time just PhD students at University of California, Berkeley,
1:11.6	working on trying to deal with the consequences of ChatGPT and understand
1:17.6	how do we evaluate LLMs.
1:20.6	At the time, ChatGPT had just recently been released,
1:24.6	and there were some new models coming out.
1:26.6	We didn't know how to compare them,
1:28.5	in particular on the distribution of real-world users.
1:32.6	So it's not just about a static benchmark
	...

Disclaimer: The podcast and artwork embedded on this page are from TechCrunch, and are the property of its owner and not affiliated with or endorsed by Tapesearch.

Generated transcripts are the property of TechCrunch and are distributed freely under the Fair Use doctrine. Transcripts generated by Tapesearch are not guaranteed to be accurate.