Anyone Can’t Write an A/B Test
Ⅰ. It’s a Hard Problem

Modern enterprises whose revenue depends on human-facing software applications, such as ecommerce sites, rely on online controlled experiments to make key product decisions. Historically, such decisions were informed by human experience and intuition, but with the advent of the Web a more rigorous approach has become viable: online controlled experiments, or A/B tests. In such a test, one or more candidate user experiences are compared to the existing experience in a randomized controlled trial. User traffic is split randomly among all the experiences, so that any observed difference between them with respect to some metric can be explained only by the difference in the experience or by chance.
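To make the “or chance” part concrete, here is a minimal Python sketch of one common way to ask whether a difference in conversion rates could plausibly be due to chance: a two-proportion z-test. The choice of test, the function name `two_proportion_z_test`, and its parameters are my assumptions for illustration only; a real experimentation practice would likely use something more careful.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / n_a: conversions and visitors in the existing experience.
    conv_b / n_b: conversions and visitors in the candidate experience.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both experiences convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers, purely for illustration.
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says that a difference this large would be unlikely if the two experiences really performed the same. It says nothing about whether the traffic was split correctly in the first place.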

This attractive and seemingly feasible value proposition belies the real difficulties involved in delivering on it. Fifteen years after the idea went mainstream, and despite roughly five hundred million dollars of venture capital thrown at it, the problem remains essentially unsolved in the general case. Although many tech-capable enterprises have managed to develop custom experimentation platforms, most have had to spend sweat and treasure on what are essentially stop-gap solutions.

Here are the three principal reasons why implementing an A/B testing practice is a particularly gnarly problem to solve in the wild.

Experimentation Is a Third Wheel

Ordinarily, a technology product is shaped by two independent disciplines: product and development. Product managers face outward and shape the feature roadmap based on customers’ needs. Developers gaze inward and deliver against that roadmap. Running an A/B test requires a third expertise: statistics. It takes a competent statistician to design and analyze experiments. New data-science graduates occasionally have enough training in inferential statistics, but even at the graduate level it’s not the degree’s focus. The companies that have been successful at building first-rate experimentation practices have had to hire statistics PhDs to lead the effort.

Money Can’t Buy Experimentation

Typically, companies address needs that lie outside their core competency by buying an off-the-shelf product. Over the past 15 years, at least six startups have offered a SaaS product focused on online experimentation. Some are long since in the grave; others are well on their way there.

Understanding why none of these startups succeeded requires a side trip, which I intend to take in the next post in this series. For now, we can go with the intuition that an experiment is just too tightly coupled with the host application for the SaaS approach to work. The customer would still need to design and instrument the experiment before the event logs could be pushed to a SaaS vendor for little more than statistical analysis. That analysis is a complex problem, to be sure, but it is only one-third of the complete solution.

Instrumentation Is the Weakest Link

Why, then, is the instrumentation of A/B tests so hard? Recently, someone wrote on my LinkedIn feed, “anyone can write an A/B test.” Can one? What the author likely meant is that an ordinary developer is capable of writing the extra code needed for a new code path to coexist with the old one and for traffic to be sent randomly into one or the other. Doing this right, where “right” is defined narrowly as producing statistically sound measurements, requires a lot of smelly instrumentation code. It also adds unpredictable overhead to the host application, which may exceed its current provisioning. Does this count as writing an A/B test?

The answer, as it were, “is blowing in the wind.” If you only need to write an occasional experiment and you know what you’re doing, the ad hoc approach is your best bet. Once you need to write a few of them a month, it’s only a matter of time before the ad hoc approach breaks your application.
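For a sense of what the ad hoc approach looks like in practice, here is a minimal Python sketch. It is only an illustration under stated assumptions: the names `assign_variant` and `render_checkout`, the experiment name `new_checkout`, and the in-memory `log` list are all hypothetical, and I have swapped a plain random draw for deterministic hash-based bucketing so that a returning user keeps seeing the same experience.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically assign a user to "control" or "treatment"."""
    # Hash the user together with the experiment name so that the same user
    # always lands in the same bucket within a given experiment.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return "treatment" if bucket < treatment_share else "control"


def render_checkout(user_id: str, log: list) -> str:
    """Hypothetical host-application code path, instrumented by hand."""
    variant = assign_variant(user_id, "new_checkout")
    # The assignment has to be logged, or the analysis has nothing to work with.
    log.append({"user": user_id, "experiment": "new_checkout", "variant": variant})
    if variant == "treatment":
        return "new checkout page"
    return "old checkout page"


events = []
print(render_checkout("user-42", events))
print(events)
```

Even in this toy version, assignment, logging, and branching all live inside the host application’s own code path. Multiply that by every experiment and every touched page, and the coupling described above starts to show.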

Real enterprises run dozens, even hundreds, of experiments at any given time. To handle that kind of volume, companies have invariably had to develop custom experimentation frameworks, the opposite of “anyone can.”

In the upcoming posts in this series, I will examine the typical challenges associated with building a custom experimentation framework and how these challenges are solved by Variant CVM.
