Where A-B Testing Goes Wrong: How Divergent Delivery Affects What Online Experiments Cannot (and Can) Tell You About How Customers Respond to Advertising

Marketers use online advertising platforms to compare user responses to different ad content. However, platforms’ experimentation tools deliver ads to distinct, optimized, undetectable mixes of users that vary across ads, even during the test. As a result, the A-B comparison estimated from the data reflects the combination of ad content and algorithmic selection of users, which differs from what would have occurred under random exposure. We empirically demonstrate this “divergent delivery” pattern using data from an A-B test that we ran on a major ad platform. This paper explains how algorithmic targeting, user heterogeneity, and data aggregation conspire to confound the magnitude, and even the sign, of ad A-B test results, and what the implications are for different roles in the marketing organization with varying experimentation goals. We also consider the counterfactual case of disabling divergent delivery, where user types are balanced across ads. By extending the potential outcomes model of causal inference, we treat random assignment of ads and user exposure to ads as independent decisions. Since not all marketers have the same decision-making goals for these ad A-B tests, we offer prescriptive guidance to experimenters based on their needs.
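
The sign reversal described above can be illustrated with a minimal sketch. It is not the paper's model or data: the two user types, the response rates, and the delivery mixes below are all hypothetical numbers chosen only to show how aggregation over unequal user mixes can flip a comparison, in the manner of Simpson's paradox.

```python
# Hypothetical illustration only: every number here is made up, not taken from the paper.

# Assumed response rates by (ad, user type). Ad A beats ad B within each type.
RATES = {
    ("A", "high"): 0.10, ("A", "low"): 0.02,
    ("B", "high"): 0.08, ("B", "low"): 0.01,
}

# Assumed share of each user type among the users who see each ad.
# Divergent delivery: the platform sends each ad to a different mix of users.
DIVERGENT_MIX = {
    "A": {"high": 0.2, "low": 0.8},
    "B": {"high": 0.8, "low": 0.2},
}

# Balanced delivery: both ads reach the same mix of user types.
BALANCED_MIX = {
    "A": {"high": 0.5, "low": 0.5},
    "B": {"high": 0.5, "low": 0.5},
}


def aggregate_rate(ad, mix):
    """Overall response rate for an ad: type-level rates weighted by exposure shares."""
    return sum(share * RATES[(ad, user_type)] for user_type, share in mix[ad].items())


def a_minus_b(mix):
    """Aggregate A-B difference as an experimenter would see it in pooled data."""
    return aggregate_rate("A", mix) - aggregate_rate("B", mix)


if __name__ == "__main__":
    print("A-B under divergent delivery:", round(a_minus_b(DIVERGENT_MIX), 4))
    print("A-B under balanced delivery: ", round(a_minus_b(BALANCED_MIX), 4))
```

Under these assumed numbers, ad A outperforms ad B for both user types, yet the pooled comparison favors B when the delivery mixes diverge and favors A when they are balanced.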

Targeted online advertising, A/B testing, measuring advertising effectiveness, causal inference, experimental design, Simpson's paradox, social media


Advertising and Promotion Management | Data Science | Design of Experiments and Sample Surveys | Management Sciences and Quantitative Methods | Marketing




SMU Cox: Marketing (Topic)