
Anyone Can’t Write an A/B Test
Ⅲ. Separating Instrumentation from Implementation.

The fundamental challenge in building experimentation infrastructure is scaling beyond a handful of concurrent experiments. Just as programming languages were invented to increase the programmer’s productivity, a good experimentation framework should enable the programmer to instrument more experiments. A good experimentation framework, like a good programming language, should also correspondingly reduce the chance of human error, so that the overall rate of bugs remains acceptable. 

The common, albeit naive, approach to building an experimentation framework is to treat it as an extension of the host application’s code base. After all, that’s how most problems in application programming are solved: by writing (or using someone else’s) specialized library. Let’s, as a thought experiment, walk down that path and see how far it can get us.

As consummate library designers, we want our framework to do only one thing but do it well, i.e. to

  • Provide a clean separation between experience implementation and its instrumentation as an experiment;
  • Not conflate experimentation with other concerns, like implicit knowledge of the host application’s semantics, or of the downstream analytics;

Our instrumentation library should have as light-handed an interface as possible, perhaps, something like this:

public abstract class OurExperimentLibrary {

  // Create an instance of this library.
  public static OurExperimentLibrary build(Configuration config) {...}

  // Get experiment by name.
  public abstract Optional<Experiment> getExperiment(String name);

  public interface Experiment {
    // Get the live experience we've been targeted for.
    Optional<Experience> getLiveExperience();
  }
}

Listing 1. An ideal, if overly simplified API for an experimentation framework.

With this interface, the application programmer can easily instrument an experiment. Let’s consider a concrete example. Suppose you work for an online store selling aquarium fish by mail, and want to test the idea of reengaging inactive customers with an offer of free shipping. You create three treatment experiences offering free shipping on purchases over $25, $50 and $100, to co-exist with the control experience of no such offer. While it may take a bit of coding to implement these new experiences on all the pages where you want the offer to appear, the instrumentation should not. Here’s all the code you should have to write in order to instrument this experiment on a particular page:

// Create an instance of the experimentation library with some externalized configuration.
final var library = OurExperimentLibrary.build(config);
// Get the live experience in a given experiment.
library.getExperiment("FreeShippingExperiment").ifPresentOrElse(
    experiment -> experiment.getLiveExperience().ifPresentOrElse(
        liveExperience -> {
          // Take the code path suitable for this experience.
        },
        () -> {
          // Take the existing code path.
        }),
    () -> {
      // The experiment may have been taken offline.
      logger.error("Experiment FreeShippingExperiment does not exist.");
      // Take the existing code path.
    });

Listing 2. The use example for the library presented in Listing 1.
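Note that this listing only instruments the experiment; implementing the experiences themselves remains the host application’s job. As a rough sketch of what that might look like for the free shipping example (the enum, its constants and method names below are purely illustrative, not part of OurExperimentLibrary):

```java
// Hypothetical implementation of the free shipping experiences. Each
// treatment carries a minimum order total; the control carries none.
public enum FreeShippingExperience {
  CONTROL(null), OVER_25(25.0), OVER_50(50.0), OVER_100(100.0);

  private final Double minOrderTotal; // null means no free shipping offer

  FreeShippingExperience(Double minOrderTotal) {
    this.minOrderTotal = minOrderTotal;
  }

  // Should this order qualify for the free shipping offer?
  public boolean offersFreeShipping(double orderTotal) {
    return minOrderTotal != null && orderTotal >= minOrderTotal;
  }
}
```

The page code inside the `liveExperience` branch would then consult the matching enum constant to decide whether to render the offer.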

That’s it! The instrumentation of new experiments should be as simple as these few lines of code. All the iffy details of an experiment’s configuration, lifecycle and runtime management should be conveniently hidden from the host application, freeing the application programmer from the burden of experiment instrumentation. Moving experimentation-related details out of the host application’s domain achieves the complete separation of experience implementation (handled by the host application) and experiment instrumentation (handled by the experimentation framework)—the holy grail of any successful experimentation framework.

This perceived simplicity often leads development teams to significantly underestimate the complexity of building an experimentation framework in-house. In reality, as soon as you wish to deploy more than a few experiments, you have to deal with what happens if they overlap on some pages. The likelihood of such an overlap is surprisingly high and easily explained by the Pareto principle: 80% of your traffic is handled by 20% of your pages. These hot pages are frequently subjects of multiple concurrent experiments.
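To see why overlap is costly, consider the arithmetic: a hot page subject to several concurrent experiments must behave correctly under every combination of their experiences, and the combinations multiply. A minimal illustration (the class name and the particular experiment sizes are hypothetical):

```java
// Illustration of experiment overlap: the number of distinct combined states
// a page must handle is the product of the experience counts of every
// experiment instrumented on it.
public class OverlapMath {
  public static int combinedStates(int... experiencesPerExperiment) {
    int combinations = 1;
    for (int n : experiencesPerExperiment) combinations *= n;
    return combinations;
  }

  public static void main(String[] args) {
    // E.g. the 4-experience free shipping experiment overlapping with a
    // hypothetical 3-way layout test and a 2-way banner test.
    System.out.println(combinedStates(4, 3, 2)); // prints 24
  }
}
```

Three modest experiments on one page already yield 24 combined experiences that someone, in principle, has to reason about.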

There are a multitude of other caveats that will have to be considered by the developers of OurExperimentLibrary, for example:

  • Qualification: Who is even eligible for an experiment?
  • Targeting: What experience should eligible sessions be assigned to?
  • Consonance: How to deal with experiments that are incompatible or complementary with one another?
  • Tracing: How to log information useful for the downstream statistical analysis, without imposing overhead on the host application?
  • Distribution: How to support distributed host applications?
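To make just one of these caveats concrete, consider Targeting. A common technique is deterministic hash-based assignment, so a returning session always lands in the same experience without any stored state. A minimal sketch, assuming experiences are identified by name (the class and method names below are illustrative, not any real API):

```java
import java.util.List;

// Deterministic, stateless targeting: hash the session id onto an experience
// index, so the same session is always assigned the same experience.
public class HashTargeter {
  private final List<String> experiences;

  public HashTargeter(List<String> experiences) {
    this.experiences = experiences;
  }

  // Math.floorMod keeps the index non-negative even when hashCode() is negative.
  public String target(String sessionId) {
    return experiences.get(Math.floorMod(sessionId.hashCode(), experiences.size()));
  }
}
```

Even this toy version hints at the real design questions: uniform hashing gives equal weights only, and re-weighting or restarting an experiment silently reshuffles assignments—exactly the kind of semantics an ad hoc in-house library tends to get wrong.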

The crux of the problem lies in how to formally describe experiments in the external configuration, i.e. the line

public static OurExperimentLibrary build(Configuration config) {...}

This can only be achieved with a good domain model. In the next post I will start introducing the experimentation model we’ve developed at Variant, which we call the Code Variation Model (CVM). We use the term code variation to refer to both online experiments and feature flags, as we view the latter as a narrow special case of the former. In the subsequent posts we will discover that while the interface offered by OurExperimentLibrary is exactly what we want, handling experimentation in the address space of the host application is not: the experimentation semantics must be taken not just out of the host application’s code, but all the way out of its address space and into a separate process.
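To make the configuration problem concrete, here is one hypothetical shape such an external description could take for the free shipping example. Every field name, value and weight below is purely illustrative; the actual domain model is the subject of the upcoming posts.

```json
{
  "experiments": [
    {
      "name": "FreeShippingExperiment",
      "experiences": [
        { "name": "control", "isControl": true, "weight": 1 },
        { "name": "freeShippingOver25", "weight": 1 },
        { "name": "freeShippingOver50", "weight": 1 },
        { "name": "freeShippingOver100", "weight": 1 }
      ]
    }
  ]
}
```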
