TechCheck

Nvidia's Inference Growth Engine 5/29/25

CNBC

Disruptors, Investing, FAANG, Technology, Business, Management, CNBC, Tech


🗓️ 29 May 2025

⏱️ 5 minutes


Summary

China may have been the big headline out of Nvidia's quarter, with 28 mentions on the earnings call, but right behind was inference at 27. It represents the next wave of AI, models that generate responses after getting trained, and could unlock a major new growth engine for Nvidia.

Transcript

Click on a timestamp to play from that location

0:00.0

China was one major headline out of NVIDIA's quarter, mentioned 28 times on the call,

0:05.0

but inference was right behind at 27, and that's no coincidence for today's TechCheck.

0:09.5

Our Deirdre Bosa digs into how AI's next stage is unlocking a major new growth engine for the company.

0:15.1

Hey, Dee.

0:16.1

Hey, good morning, Carl.

0:17.3

So you're right.

0:17.8

China was the loud part of the quarter, but inference may be the quiet and potentially more durable driver of NVIDIA's next trillion.

0:25.9

Now the early AI race, it was all about training models, building the brains. Now that is shifting to using them in the real world.

0:32.0

And that's inference. It's where the next wave of demand is building. And it's what everyone is talking about out here in the Bay Area.

0:38.0

It's every time that ChatGPT or Google's Gemini answers a question, or it's

0:43.0

what happens when an agent reasons through a task.

0:46.1

Behind every response is a stream of tokens.

0:48.2

These are small chunks of text that the model reads, processes, and then generates.

0:52.5

So the more complex the task, the more tokens it

0:55.0

consumes. And it's only getting more compute heavy. New reasoning models, they think through

0:59.6

answers step by step, rechecking their work, and that translates into more tokens per query,

1:04.7

running through Nvidia's GPUs. Now, we're just starting to see how big that demand is getting.

1:10.0

Here's CFO Colette Kress on the call last night.

1:13.6

We are witnessing a sharp jump in inference demand.

1:18.3

OpenAI, Microsoft, and Google are seeing a step function leap in token generation.

1:25.0

Microsoft processed over 100 trillion tokens in Q1, a five-fold increase on a year-over-year basis.

1:35.1

Last week at Google I/O, this hockey-stick inflection chart, it represents token usage. Gemini is now in everyday tools like Search and Gmail that touch billions of

...

