The ultimate guide to A/B testing | Ronny Kohavi (Airbnb, Microsoft, Amazon)
Lenny's Podcast: Product | Growth | Career
Lenny Rachitsky
🗓️ 27 July 2023
⏱️ 83 minutes
Summary
Brought to you by Mixpanel—Event analytics that everyone can trust, use, and afford | Round—The private network built by tech leaders for tech leaders | Eppo—Run reliable, impactful experiments
—
Ronny Kohavi, PhD, is a consultant, teacher, and leading expert on the art and science of A/B testing. Previously, Ronny was Vice President and Technical Fellow at Airbnb, Technical Fellow and corporate VP at Microsoft (where he led the Experimentation Platform team), and Director of Data Mining and Personalization at Amazon. He was also honored with a lifetime achievement award by the Experimentation Culture Awards in September 2020 and teaches a popular course on experimentation on Maven. In today’s podcast, we discuss:
• How to foster a culture of experimentation
• How to avoid common pitfalls and misconceptions when running experiments
• His most surprising experiment results
• The critical role of trust in running successful experiments
• When not to A/B test something
• Best practices for helping your tests run faster
• The future of experimentation
—
Enroll in Ronny’s Maven class: Accelerating Innovation with A/B Testing at https://bit.ly/ABClassLenny. Promo code “LENNYAB” will give $500 off the class for the first 10 people to use it.
—
Find the full transcript at: https://www.lennyspodcast.com/the-ultimate-guide-to-ab-testing-ronny-kohavi-airbnb-microsoft-amazon/
—
Where to find Ronny Kohavi:
• Twitter: https://twitter.com/ronnyk
• LinkedIn: https://www.linkedin.com/in/ronnyk/
• Website: http://ai.stanford.edu/~ronnyk/
—
Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• Twitter: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/
—
In this episode, we cover:
(00:00) Ronny’s background
(04:29) How one A/B test helped Bing increase revenue by 12%
(09:00) What data says about opening new tabs
(10:34) Small effort, huge gains vs. incremental improvements
(13:16) Typical fail rates
(15:28) UI resources
(16:53) Institutional learning and the importance of documentation and sharing results
(20:44) Testing incrementally and acting on high-risk, high-reward ideas
(22:38) A failed experiment at Bing on integration with social apps
(24:47) When not to A/B test something
(27:59) Overall evaluation criterion (OEC)
(32:41) Long-term experimentation vs. models
(36:29) The problem with redesigns
(39:31) How Ronny implemented testing at Microsoft
(42:54) The stats on redesigns
(45:38) Testing at Airbnb
(48:06) Covid’s impact and why testing is more important during times of upheaval
(50:06) Ronny’s book, Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing
(51:45) The importance of trust
(55:25) Sample ratio mismatch and other signs your experiment is flawed
(1:00:44) Twyman’s law
(1:02:14) P-value
(1:06:27) Getting started running experiments
(1:07:43) How to shift the culture in an org to push for more testing
(1:10:18) Building platforms
(1:12:25) How to improve speed when running experiments
(1:14:09) Lightning round
—
Referenced:
• Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing: https://experimentguide.com/
• Seven rules of thumb for website experimenters: https://exp-platform.com/rules-of-thumb/
• GoodUI: https://goodui.org
• Defaults for A/B testing: http://bit.ly/CH2022Kohavi
• Ronny’s LinkedIn post about A/B testing for startups: https://www.linkedin.com/posts/ronnyk_abtesting-experimentguide-statisticalpower-activity-6982142843297423360-Bc2U
• Sanchan Saxena on Lenny’s Podcast: https://www.lennyspodcast.com/sanchan-saxena-vp-of-product-at-coinbase-on-the-inside-story-of-how-airbnb-made-it-through-covid-what-he8217s-learned-from-brian-chesky-brian-armstrong-and-kevin-systrom-much-more/
• Optimizely: https://www.optimizely.com/
• Optimizely was statistically naive: https://analythical.com/blog/optimizely-got-me-fired
• SRM: https://www.linkedin.com/posts/ronnyk_seat-belt-wikipedia-activity-6917959519310401536-jV97
• SRM checker: http://bit.ly/srmCheck
• Twyman’s law: http://bit.ly/twymanLaw
• “What’s a p-value” question: http://bit.ly/ABTestingIntuitionBusters
• Fisher’s method: https://en.wikipedia.org/wiki/Fisher%27s_method
• Evolving experimentation: https://exp-platform.com/Documents/2017-05%20ICSE2017_EvolutionOfExP.pdf
• CUPED for variance reduction/increased sensitivity: http://bit.ly/expCUPED
• Ronny’s recommended books: https://bit.ly/BestBooksRonnyk
• Chernobyl on HBO: https://www.hbo.com/chernobyl
• Blink cameras: https://blinkforhome.com/
• Narrative not PowerPoint: https://exp-platform.com/narrative-not-powerpoint/
—
Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.
—
Lenny may be an investor in the companies discussed.
Get full access to Lenny's Newsletter at www.lennysnewsletter.com/subscribe
Transcript
| 0:00.0 | I'm very clear that I'm a big fan of "test everything," which is: any code change that you make, |
| 0:07.0 | any feature that you introduce, has to be in some experiment, because, again, I've observed this sort of surprising result that even small |
| 0:15.9 | bug fixes, even small changes, can sometimes have surprising, unexpected impact. And so I don't think it's possible to experiment too much. You have to |
| 0:26.7 | allocate sometimes to these high-risk, high-reward ideas. We're going to try something |
| 0:32.4 | that's most likely to fail, but if it does |
| 0:35.8 | win, it's going to be a home run. And you have to be ready to understand and agree that most will fail. |
| 0:43.4 | And it's amazing how many times I've seen |
| 0:46.0 | people come up with new designs or a radical new idea |
| 0:51.0 | and they believe in it, and that's okay. I'm just cautioning them all the time to say, if you go for something big, |
| 0:57.8 | try it out, but be ready to fail 80% of the time. |
| 1:13.9 | Welcome to Lenny's Podcast, where I interview world-class product leaders and growth experts to learn from their hard-won experiences building and growing today's most successful products. Today my guest is Ronny Kohavi. Ronny is seen by many as the world expert on |
| 1:19.6 | A/B testing and experimentation. Most recently, he was VP and technical fellow of relevance at |
| 1:25.2 | Airbnb, where he led their search experience team. Prior to that, he was corporate vice |
| 1:29.4 | president at Microsoft where he led Microsoft's experimentation |
| 1:32.8 | platform team. |
| 1:33.9 | Before that, he was director of data mining |
| 1:35.8 | and personalization at Amazon. |
| 1:38.0 | He's currently a full-time advisor and instructor. |
| 1:40.5 | He's also the author of the go-to book on experimentation, called |
| 1:44.2 | Trustworthy Online Controlled Experiments, and in our show notes you'll find a code |
| 1:48.6 | to get a discount on taking his live cohort-based course on Maven. In our conversation, we get |
| 1:54.1 | super tactical about A/B testing. Ronny shares his advice for when you should |
... |

