Server-Side A/B Testing with Optimizely: A Practical Guide

Many A/B tests run in the browser: a script swaps a headline or button color after the page loads. That works for surface-level UI changes, but it cannot test the logic that runs before a page is ever rendered, such as a pricing algorithm, a search ranking model, a checkout flow, or a backend API response. Server-side A/B testing moves the experiment decision into your application code, where you control the full request lifecycle. This guide explains when to test server-side, how it differs from client-side testing, and how to implement it with Optimizely Feature Experimentation, with SDK code for Node.js and Python.

What Server-Side A/B Testing Is

In a server-side A/B test, your application server decides which variation a user sees and renders the response accordingly. Instead of shipping the control experience and patching it in the browser, the server already knows the assignment by the time it builds the HTML, the JSON payload, or the rendered component.

The decision is deterministic: a given user ID is always bucketed into the same variation, so the experience stays stable across requests, and across devices too, as long as each device sends the same ID. Your code branches on that assignment, serves the corresponding experience, and reports a conversion event when the user does something that matters, such as a purchase, a signup, or a search that leads to a click.
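To make "deterministic" concrete, here is a minimal sketch of hash-based bucketing. It is an illustration, not Optimizely's implementation: the SDK uses its own hash function and takes traffic ranges from the datafile. This sketch assumes SHA-256 from the standard library, a 10,000-bucket range, and a hypothetical allocation list.

```python
import hashlib

BUCKET_RANGE = 10_000  # buckets 0..9999, so allocations can be set in 0.01% steps


def bucket_value(user_id: str, experiment_id: str) -> int:
    """Map a (user, experiment) pair to a stable bucket in [0, BUCKET_RANGE)."""
    digest = hashlib.sha256(f"{user_id}:{experiment_id}".encode("utf-8")).digest()
    # Read the first 8 bytes as an unsigned integer and fold it into the range.
    return int.from_bytes(digest[:8], "big") % BUCKET_RANGE


def assign_variation(user_id, experiment_id, allocations):
    """allocations: list of (variation_key, end_of_range) pairs in ascending order."""
    value = bucket_value(user_id, experiment_id)
    for variation_key, end in allocations:
        if value < end:
            return variation_key
    return None  # the bucket falls outside the experiment's traffic allocation


# Hypothetical 50/50 split between control and treatment.
ALLOCATIONS = [("control", 5_000), ("treatment", 10_000)]
```

No state is stored anywhere: calling `assign_variation("user123", "exp_1", ALLOCATIONS)` on any server, at any time, returns the same variation, because the result depends only on the user ID, the experiment ID, and the allocation ranges.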

This is the model Optimizely calls Feature Experimentation. If you have used Optimizely before, you may know the product by its former name, Full Stack. The SDKs, datafile, and decision model come from the same lineage, and current product pages and documentation use the Feature Experimentation name, though you will still see "Optimizely Full Stack" in older material.

When to Test Server-Side

Server-side testing is the right tool when the thing you are changing is not a cosmetic, post-render tweak. Reach for it in these situations:

  • Backend logic and algorithms. Recommendation engines, search ranking, fraud scoring, feed ordering, and routing logic all live on the server. The browser never sees the alternative implementations, only their output.

  • Pricing, offers, and business rules. Price tests, discount eligibility, and plan packaging should be decided server-side so the values are authoritative and cannot be inspected or tampered with in the client.

  • Eliminating flicker. Client-side tests momentarily render the control before the test script rewrites the DOM — the "flash of original content." Because a server-side test renders the correct variation from the first byte, there is no flicker.

  • Full-stack and omnichannel features. When the same experiment needs to span a web app, a mobile app, and an email pipeline, a server-side decision keyed on a shared user ID gives you one consistent assignment everywhere.

  • Performance-sensitive paths. Client-side experiment scripts add weight to the page and can delay rendering. A server-side decision adds no client-side JavaScript.

If the change is purely visual and lives entirely in the rendered page — copy, layout, imagery on a marketing page — a client-side tool is often faster to deploy and requires no engineering. Use the model that matches where the change actually lives.

Client-Side vs Server-Side Compared

| Dimension | Client-side (Web Experimentation) | Server-side (Feature Experimentation) |
|---|---|---|
| Where the decision runs | Browser, after page load | Application server, before response |
| What it can change | Rendered DOM, styling, copy | Any code path: APIs, algorithms, pricing, UI |
| Flicker | Possible (flash of original content) | None |
| Who implements | Marketers, often no code | Engineers, in the codebase |
| Deployment | Visual editor, instant | Code release, behind feature flags |
| Best for | Marketing pages, visual tweaks | Backend logic, full-stack and omnichannel features |

These are complementary, not competing. Many teams run client-side tests on marketing surfaces and server-side tests on product and backend behavior.

How Optimizely Feature Experimentation Works

Feature Experimentation runs experiments through SDKs you embed in your application (available for Node.js, Python, Java, Go, Ruby, PHP, C#, and more). Each experiment is wrapped in a feature flag. A flag can carry one or more rules — an A/B test rule that splits traffic between variations, or a delivery rule that rolls a feature out to an audience.

The SDK reads a datafile: a JSON snapshot of your project's flags, experiments, audiences, and traffic allocations for one environment. Because the SDK evaluates rules against a locally cached datafile, a decision involves no blocking network call — it resolves in microseconds. The SDK only makes network calls in the background to refresh the datafile and to send event data.

The runtime loop is the same in every language:

flowchart LR
    A[Incoming request] --> B[SDK: create user context]
    B --> C[decide flag]
    C --> D{Variation?}
    D -->|treatment| E[Serve treatment]
    D -->|control| F[Serve control]
    E --> G[track conversion event]
    F --> G
    G --> H[Optimizely results]

You create a user context, call decide on a flag, branch on the returned variation, serve the corresponding experience, and later track a conversion event. Optimizely's Stats Engine ties those events back to the variation and reports the results.

Setting Up the SDK

Install the SDK for your language. For Node.js:

npm install @optimizely/optimizely-sdk

For Python:

pip install optimizely-sdk

Each environment in your project (for example, development and production) has its own SDK key. The SDK uses that key to fetch the matching datafile from Optimizely's CDN. Initialize the client once at application startup and reuse it across requests — do not create a new instance per request.

Initializing in Node.js

For the JavaScript SDK v6 and later, you compose the client from a polling config manager and a batch event processor. Polling keeps the datafile current; batching reduces the number of network calls for event tracking.

import {
  createInstance,
  createPollingProjectConfigManager,
  createBatchEventProcessor,
} from '@optimizely/optimizely-sdk';

const pollingConfigManager = createPollingProjectConfigManager({
  sdkKey: process.env.OPTIMIZELY_SDK_KEY,
  autoUpdate: true,
  updateInterval: 60000, // refresh the datafile every 60 seconds
});

const optimizelyClient = createInstance({
  projectConfigManager: pollingConfigManager,
  eventProcessor: createBatchEventProcessor(),
});

await optimizelyClient.onReady();
// The client is ready: the datafile is loaded and decisions will resolve.

Always wait for onReady() before making decisions. Until the datafile is loaded, decide cannot evaluate rules and will fall back to the flag's default (off) state.

Initializing in Python

The Python SDK takes the SDK key directly and manages datafile polling internally:

from optimizely import optimizely

optimizely_client = optimizely.Optimizely(sdk_key="YOUR_SDK_KEY")

Instantiate this once (for example, as a module-level singleton or in your application factory) and share it across requests.

Making a Decision

A decision requires a user context — an object that pairs a stable user ID with optional attributes. The user ID is the key Optimizely hashes to bucket the user into a variation, so it must be consistent for the same person across requests.
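For visitors who are not logged in, one common way to get a stable ID is to mint a random ID on the first request and persist it in a cookie. A minimal sketch, assuming your framework exposes request cookies as a dict; the cookie name and the function are hypothetical.

```python
import uuid

ANON_COOKIE = "anon_id"  # hypothetical cookie name


def resolve_anonymous_id(cookies):
    """Return (user_id, cookie_to_set); cookie_to_set is None if the cookie already exists."""
    existing = cookies.get(ANON_COOKIE)
    if existing:
        # Reuse the ID minted on an earlier request so bucketing stays stable.
        return existing, None
    # First visit: mint a random ID once; the caller must persist it as a cookie.
    new_id = str(uuid.uuid4())
    return new_id, (ANON_COOKIE, new_id)
```

Pass the returned ID to `create_user_context` on every request, and set the cookie on the response whenever `cookie_to_set` is not None; otherwise each request would get a fresh ID and a fresh bucketing.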

In Node.js:

const attributes = { logged_in: true, plan: 'pro' };
const user = optimizelyClient.createUserContext('user123', attributes);

const decision = user.decide('product_sort');

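// Pattern 1: read the variation's configuration from flag variables.
if (decision.enabled) {
  const sortMethod = decision.variables['sort_method'];
  // Apply the variation's configuration, e.g. sort the catalog by sortMethod
}

// Pattern 2: branch directly on the assigned variation key.
if (decision.variationKey === 'treatment') {
  // Serve the treatment experience
} else {
  // Serve the control experience (also reached when the flag is off)
}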

In Python:

user = optimizely_client.create_user_context("user123", {"logged_in": True})
decision = user.decide("product_sort")

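# Pattern 1: read the variation's configuration from flag variables.
if decision.enabled:
    sort_method = decision.variables["sort_method"]
    # Apply the variation's configuration

# Pattern 2: branch directly on the assigned variation key.
if decision.variation_key == "treatment":
    pass  # Serve the treatment experience
else:
    pass  # Serve the control experience (also reached when the flag is off)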

The decide call returns a decision object exposing variationKey (the assigned variation), enabled (whether the flag is on for this user), variables (the flag's configuration values for that variation), and reasons (diagnostics when something goes wrong). Calling decide also sends a decision event recording that the user was exposed to the experiment — that exposure is what the results page measures conversions against.

A clean pattern is to drive behavior from flag variables rather than branching on variationKey. Reading sort_method from decision.variables means you can change the experience from the Optimizely UI without shipping new code.
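A minimal sketch of the variable-driven pattern: map each allowed `sort_method` value to an implementation, and fall back to control when the flag is off or the value is unrecognized. The `Decision` dataclass is a hypothetical stand-in so the sketch runs without the SDK; in real code you would pass the object returned by `user.decide("product_sort")`, which exposes `enabled` and `variables`. The sort method names are also hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """Hypothetical stand-in with the two fields this sketch reads from an SDK decision."""
    enabled: bool
    variables: dict = field(default_factory=dict)


# Each sort_method value configured in Optimizely maps to one implementation.
SORTERS = {
    "price_asc": lambda items: sorted(items, key=lambda p: p["price"]),
    "price_desc": lambda items: sorted(items, key=lambda p: p["price"], reverse=True),
    "popularity": lambda items: sorted(items, key=lambda p: p["sales"], reverse=True),
}
DEFAULT_SORT = "popularity"  # the control behavior


def sort_catalog(items, decision):
    # Fall back to control when the flag is off or the variable value is unknown.
    method = decision.variables.get("sort_method") if decision.enabled else None
    sorter = SORTERS.get(method, SORTERS[DEFAULT_SORT])
    return sorter(items)
```

Because unknown values fall back to control, a typo in the Optimizely UI degrades to the control experience instead of raising an error in the request path.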

Tracking Events and Metrics

Exposure alone is not a result. To measure impact, track the conversion events that represent value — purchases, signups, upgrades. Call trackEvent (Node) or track_event (Python) with an event key that matches an event you defined in the Optimizely app:

user.trackEvent('purchased');

To attach revenue or other numeric metrics, pass event tags. Optimizely reserves revenue (an integer in cents) and value (a float) for metric aggregation:

tags = {
    "revenue": 10000,  # $100.00, in cents
    "value": 100.00,
}
user.track_event("purchased", tags)
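Because `revenue` must be an integer number of cents, the conversion from a currency amount deserves care: with binary floats, `int(0.29 * 100)` evaluates to 28, not 29. A minimal sketch that builds the tags with `Decimal` instead; the helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def purchase_tags(order_total: str) -> dict:
    """Build purchase event tags from a decimal string such as "100.00"."""
    amount = Decimal(order_total)
    # Exact decimal arithmetic, then round half-up to a whole number of cents.
    cents = int((amount * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    return {
        "revenue": cents,        # integer cents
        "value": float(amount),  # float, for value-based metrics
    }
```

Pass the amount as a string (for example, straight from your order record) so it never goes through a float before the conversion, then call `user.track_event("purchased", purchase_tags("100.00"))`.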

Track one event per real conversion, even when several experiments measure the same action — Optimizely attributes the conversion to every experiment the user was exposed to. The user ID on the tracking call must match the ID used for the decision, or the conversion will not be attributed correctly.

Targeting and Audiences

Audience targeting decides who is eligible for an experiment. You pass attributes when you create the user context, and Optimizely evaluates your audience conditions against them:

const user = optimizelyClient.createUserContext('user123', {
  country: 'US',
  plan: 'pro',
  app_version: '4.3.0',
});

Attributes can be strings, numbers, Booleans, or null. Define the matching audience conditions in the Optimizely app, then scope an experiment rule to that audience. One nuance worth knowing: if you pass an attribute value of the wrong type for a condition (a string where a Boolean is expected) or omit it entirely, the SDK does not raise an error. It skips that condition and only logs a warning, which is easy to miss, so pass attributes with consistent types.
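Attribute values often arrive as strings (from headers, query parameters, or form data). One way to keep types consistent is to coerce them in one place before calling `create_user_context`. A minimal sketch; the schema and attribute names are hypothetical and should mirror your own audience conditions.

```python
# Hypothetical schema: the type each audience condition expects for an attribute.
ATTRIBUTE_TYPES = {"logged_in": bool, "plan": str, "age": int}


def normalize_attributes(raw: dict) -> dict:
    """Coerce known attributes to the types the audience conditions expect."""
    out = {}
    for key, value in raw.items():
        expected = ATTRIBUTE_TYPES.get(key)
        if expected is None or value is None:
            # Unknown attributes and explicit nulls pass through unchanged.
            out[key] = value
        elif expected is bool and isinstance(value, str):
            # bool("false") would be True, so parse Boolean strings explicitly.
            out[key] = value.strip().lower() in ("true", "1", "yes")
        else:
            # int("abc") raises ValueError, surfacing a bad value instead of
            # letting the audience condition be skipped with only a warning.
            out[key] = expected(value)
    return out
```

For example, `normalize_attributes({"logged_in": "true", "age": "42"})` yields `{"logged_in": True, "age": 42}`, which can then be passed straight to `create_user_context`.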

For application-version targeting, pass the version as a semantic-version string and use a version audience condition to target ranges.
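The reason to use a version condition rather than a plain string condition is that versions must compare numerically, part by part: as strings, "4.10.0" sorts before "4.9.0". The sketch below only illustrates that ordering. It is not the SDK's matcher, and it simply drops pre-release and build suffixes, which a full semantic-version comparison would not.

```python
def parse_version(version: str) -> tuple:
    """Split a dotted version like "4.10.0" into integers: (4, 10, 0)."""
    # Pre-release and build suffixes (e.g. "-beta.1", "+build.7") are ignored here.
    core = version.split("+", 1)[0].split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def in_range(version: str, minimum: str, maximum: str) -> bool:
    """True if minimum <= version < maximum, compared numerically part by part."""
    return parse_version(minimum) <= parse_version(version) < parse_version(maximum)
```

This sketch assumes every version has the same number of parts (major.minor.patch); with mixed lengths, Python's tuple comparison would treat "4.3" as lower than "4.3.0".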

Common Pitfalls and How to Avoid Them

Server-side experimentation is reliable once it is set up correctly, but a handful of mistakes recur in production implementations.

SDK initialization latency

The SDK must download the datafile before it can make decisions. If you create the client inside a request handler, that request blocks on a network fetch. Initialize the client once at startup, wait for it to become ready, and reuse the singleton. If a decision still runs before the datafile is ready (for example, because the initial fetch failed or timed out), the SDK returns the flag's default state, so always check enabled and keep a sensible control fallback.

Datafile synchronization

The datafile is a cached snapshot, so there is a tradeoff between freshness and network traffic. Optimizely supports three sync strategies:

  • Pull (recommended): the SDK polls for a new datafile at an interval you set. More frequent polling means changes propagate faster, at the cost of more requests.

  • Push: Optimizely calls a webhook on your server when the project configuration changes, and your handler fetches the new datafile, for near-instant updates.

  • Custom: you fetch the datafile directly from the Optimizely CDN URL and manage caching yourself.
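With the push strategy, the webhook handler should reject requests that did not come from Optimizely before it fetches anything. A minimal sketch, assuming the request carries an HMAC-SHA1 signature of the raw body, computed with a shared secret and sent as `sha1=<hex digest>`; confirm the header name and format against Optimizely's current webhook documentation.

```python
import hashlib
import hmac


def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Check an HMAC-SHA1 signature sent as "sha1=<hex digest>" (assumed format)."""
    digest = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha1).hexdigest()
    expected = "sha1=" + digest
    # compare_digest runs in constant time, so the check does not leak the
    # signature through response timing; a missing header simply fails.
    return hmac.compare_digest(expected, signature_header or "")
```

Compute the signature over the raw request bytes, before any JSON parsing, since re-serializing the body can change it. Only after a successful check would the handler fetch the new datafile.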

In a fleet of servers, instances poll independently, so a configuration change does not reach every instance at the same instant. For a large fleet, consider running Optimizely Agent — a standalone service that centralizes datafile management and exposes decisions over a REST API, so your application instances do not each maintain their own SDK and datafile.

Sticky bucketing and user IDs

By default the SDKs are stateless: bucketing is a deterministic hash of the user ID and experiment ID, so the same ID always lands in the same variation without any stored state. Two things can break that consistency:

  • Unstable user IDs. If you use an anonymous ID before login and the customer ID afterward, the user can flip variations mid-session. Create separate user contexts for the anonymous and logged-in journeys, fire events on each, and never mutate the ID on an existing context.

  • Reconfiguring a running experiment. Decreasing and then increasing traffic allocation, or other mid-flight changes, can rebucket users who have no persisted assignment. To pin assignments, implement a User Profile Service — a lookup/save pair backed by a store like Redis that persists each user's variation. This is the SDK's sticky-bucketing mechanism and is the recommended safeguard if you anticipate changing an experiment while it runs.
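A minimal sketch of a User Profile Service with the lookup/save pair described above. A plain dict stands in for Redis so the sketch runs on its own. The profile layout (a `user_id` plus an `experiment_bucket_map` of experiment ID to variation ID) is an assumption based on the Python SDK; verify it against the SDK version you use.

```python
import json


class DictUserProfileService:
    """Sticky-bucketing store exposing lookup(user_id) and save(user_profile).

    A dict stands in for Redis; with redis-py you would call get/set instead.
    """

    def __init__(self, store=None):
        self.store = store if store is not None else {}

    def lookup(self, user_id):
        # Return the saved profile, or None when this user has no stored assignment.
        raw = self.store.get(f"user_profile:{user_id}")
        return json.loads(raw) if raw is not None else None

    def save(self, user_profile):
        # Persist the whole profile, including its experiment -> variation map.
        key = f"user_profile:{user_profile['user_id']}"
        self.store[key] = json.dumps(user_profile)
```

In the Python SDK, you register an instance when constructing the client through the `user_profile_service` argument, for example `optimizely.Optimizely(sdk_key=..., user_profile_service=DictUserProfileService())`.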

Server-side SDKs do not run in the browser

A common architectural mistake is calling a server SDK from client code. Keep server-side decisions on the server. If you need a decision in the browser, render it into the page or expose it through your own API — do not embed your server SDK key in client-side JavaScript.
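One way to render the decision into the page is to serialize only its outcome (the variation key and variables) into a JSON script element, so the browser never sees the SDK or its key. A minimal sketch; the element id and payload shape are hypothetical.

```python
import json


def decision_script_tag(variation_key, variables):
    """Embed only the decision's outcome in the page; no SDK key reaches the browser."""
    payload = json.dumps({"variation": variation_key, "variables": variables})
    # Escape <, > and & as JSON unicode escapes so a value such as "</script>"
    # cannot close the element early; JSON parsers decode them back unchanged.
    safe = payload.replace("<", "\\u003c").replace(">", "\\u003e").replace("&", "\\u0026")
    return f'<script id="experiment-decision" type="application/json">{safe}</script>'
```

Browser code can then read the outcome with `JSON.parse(document.getElementById('experiment-decision').textContent)`.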

Verifying Your Results

Before trusting an experiment, confirm the full loop works end to end:

  1. Confirm exposure. Make a decision for a few test user IDs and check that decide returns the expected variations and that decision events appear in your Optimizely environment.

  2. Confirm tracking. Trigger the conversion event and verify it lands on the Experiment Results page, attributed to the right experiment and variation.

  3. Confirm bucketing stability. Re-run decisions for the same user IDs and check the assignment does not change between calls.

  4. Check for sample ratio mismatch. If the observed split between variations diverges from the configured allocation, Optimizely's automatic SRM detection flags it — usually a sign of inconsistent user IDs or a logging gap.

  5. Read results with the Stats Engine. Let the experiment accumulate enough conversions to reach statistical significance before acting on the numbers.
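Optimizely runs SRM detection for you, but the check behind step 4 is simple enough to reproduce, for example against your own exposure logs. A minimal sketch using a chi-square goodness-of-fit test for a two-variation split; this is a standard SRM test, not necessarily the exact method Optimizely uses, and the alert threshold is your choice (strict values such as 0.001 are common).

```python
import math


def srm_p_value(observed_a: int, observed_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-variation split."""
    total = observed_a + observed_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, a 5,000 / 5,600 split on a configured 50/50 allocation gives a p-value far below 0.001, which points to an assignment or logging problem rather than chance, while 5,000 / 5,100 is well within normal variation.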

A correctly instrumented server-side test gives you something a client-side tool cannot: confidence that you measured a real change in backend behavior, with no flicker, no client weight, and a decision your application fully controls.
