Next-Level A/B Testing
Repeatable, Rapid, and Flexible Product Experimentation
by Leemay Nassery
The better the tools in your experimentation toolkit, the better
equipped teams are to ship and evaluate new features on a product.
Learn how to create robust A/B testing strategies that evolve with
your product and engineering needs. See how to run experiments
quickly, efficiently, and at lower cost, with the overarching goal of
improving your product experience and your company’s bottom line.
The long-term success of any product hinges on a company’s ability to
experiment quickly and effectively. As a company evolves and grows, so
does the demand on its experimentation platform. To keep meeting that
demand and empower teams to use A/B testing throughout the product
development life cycle, it’s vital to adopt techniques that improve
testing velocity and quality while reducing cost.
Learn how to create an A/B testing environment for the long term that
lets you quickly construct, run, and analyze tests and enables the
business to explore and exploit new features in a cost-effective,
controlled way. Know when to use techniques such as stratified random
sampling, interleaving, and metric sensitivity analysis that let you
work faster, more accurately, and more cost-effectively. With practical
strategies and hands-on engineering tasks focused on improving the
rate and quality of testing on a product, you can apply what you’ve
learned to optimize your experimentation practices.
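To make one of those techniques concrete, here is a minimal Python
sketch of stratified random sampling for variant assignment. It is
not code from the book; the user records, stratum key, and function
name are illustrative assumptions. Randomizing within each stratum
(platform, in this example) keeps that covariate balanced across
variants, which reduces variance in the metrics you measure.

    import random
    from collections import defaultdict

    def assign_stratified(users, stratum_key,
                          variants=("control", "treatment"), seed=42):
        """Randomize users to variants within each stratum so the
        covariate captured by the stratum stays balanced across
        variants."""
        rng = random.Random(seed)

        # Group users by the stratifying covariate (e.g., platform).
        strata = defaultdict(list)
        for user in users:
            strata[user[stratum_key]].append(user)

        # Shuffle each stratum, then deal its members out round-robin
        # so every variant gets an equal share (up to one user).
        assignment = {}
        for members in strata.values():
            rng.shuffle(members)
            for i, user in enumerate(members):
                assignment[user["id"]] = variants[i % len(variants)]
        return assignment

    # Illustrative usage: stratify by platform before randomizing.
    users = [
        {"id": 1, "platform": "ios"},
        {"id": 2, "platform": "ios"},
        {"id": 3, "platform": "android"},
        {"id": 4, "platform": "android"},
    ]
    print(assign_stratified(users, stratum_key="platform"))

Because each variant receives a near-equal share of every stratum,
any difference you observe between variants is less likely to be an
artifact of an imbalanced user mix.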
A/B testing is vital to product development. It’s time to create the
tools and environment that let you run these tests easily, affordably,
and reliably.
What You Need
N/A
Resources
Releases:
- P1.0 2025/06/09
- B5.0 2025/04/28
- B4.0 2025/02/11
- B3.0 2024/10/18
Contents
- Introduction
  - Who Should Read This Book
  - Simplifying Complex Concepts
  - How This Book Is Organized
  - Online Resources
- Taking Your Experimentation to the Next Level
  - Why Experimentation Rate, Quality, and Cost Matter
  - Advancing Your Experimentation Practices
  - Increasing Experimentation Rate
  - Facilitating an Experimentation Workshop
  - Improving Experimentation Quality
  - Decreasing Experimentation Cost
  - Guiding Principles
  - Chapter Roundup: Running an Experimentation Workshop
  - Wrapping Up
- Improving Experimentation Throughput
  - Reasoning with Limited Testing Availability
  - Varying Testing Strategies
  - Shifting Experimentation Mindset
  - Illustrating Interaction Effects
  - Defining General Guidelines to Increase Testing Space
  - Chapter Roundup: What Type of Testing Strategy Best Suits Your Use Cases?
  - Wrapping Up
- Designing Better Experiments
  - Improving Experiment Design
  - Opting for Sensitive Metrics
  - Leveraging the Capping Metric Technique
  - Aligning on Experiment Goal
  - Reducing the Number of Variants
  - Reducing Up-Front Sample Size with CUPED
  - Sharing Experimentation Best Practices
  - Chapter Roundup: Identifying Experiment Design Improvements
  - Wrapping Up
- Improving Machine Learning Evaluation Practices
  - Identifying Challenges with Machine Learning
  - Measuring Effect with Offline Methods
  - Understanding Why Offline-Online Correlation Is Challenging
  - Increasing Reward with Multi-Armed Bandits
  - Comparing Multiple Rankers with Interleaving
  - Chapter Roundup: When to Implement New Strategies for Machine Learning Evaluations
  - Wrapping Up
- Verifying and Monitoring Experiments
  - Tracking Metrics to Measure Experimentation Strategy Effectiveness
  - Verifying Experiments Before Launch
  - Leveraging Canaries to Catch Issues Early
  - Conducting Health Checks with A/A Tests
  - Recognizing Spillover Effect
  - Structuring the Experimentation Process
  - Chapter Roundup: Checklist for Creating an Experimentation Quality Roadmap
  - Wrapping Up
- Ensuring Trustworthy Insights
  - Why Insights Quality Matters
  - Understanding False Positives and False Negatives
  - Comparing Effect with Meta-Analysis
  - Considering Metric Sensitivity in Relation to Quality Insights
  - Increasing Precision with Stratified Random Sampling
  - Measuring Outcomes with Covariate Adjustments
  - Navigating False Positive Risk
  - Doubling Down on Statistical Power
  - Preventing False Positives and False Negatives
  - Chapter Roundup: Verifying You’re Measuring True Effect
  - Wrapping Up
- Practicing Adaptive Testing Strategies
  - Navigating the Potential of Adaptive Testing Strategies
  - What Is Adaptive Testing?
  - Making Decisions Early with Sequential Testing
  - Making Multi-Armed Bandits Effective for You
  - Opting for the Thompson Sampling Algorithm
  - Personalizing the Decision with Contextual Bandits
  - Generalizing Components to Support Adaptive Testing
  - Chapter Roundup: Engineering Team Requirements to Support Adaptive Testing
  - Wrapping Up
- Measuring Long-Term Impact
  - Why You Should Measure Long-Term Impact
  - Defining the Relationship Between Short-Term and Long-Term Metrics
  - Deploying a Long-Term Holdback
  - Leveraging Post-Period Analysis
  - Monitoring Impact Continuously After Feature Rollout
  - Predicting Long-Term Impact with CLV Models
  - Chapter Roundup: Optimizing Your Long-Term Evaluation Strategy Based on Cost
  - Wrapping Up
- Tying It All Together
  - Sharing a Cautionary Tale
  - Building Blocks to Improve Rate, Quality, and Cost
  - Understanding the Company’s Strategic Goals
  - Keeping Your Users Top of Mind
  - Balancing Complexity with Usability
  - Considering Your Experimentation Platform’s Robustness
  - Comparing Experimentation Cost Versus Quality
  - Combating the “Too Costly” Myth
  - Increasing Experimentation Rate Is a Balancing Act
  - Operating as a Data-Influenced Company
  - How to Evaluate a New Strategy
  - Revisiting Experimentation at MarketMax
  - Chapter Roundup: Tying It All Together
  - Wrapping Up
Author
Leemay Nassery is an engineering leader specializing in
experimentation and personalization. With a notable track record that
includes evolving Spotify’s A/B testing strategy for the Homepage,
launching Comcast’s For You page, and establishing data warehousing
teams at Etsy, she firmly believes that the key to innovation at any
company is the ability to experiment effectively.