Integrate Microsoft Clarity with Optimizely Web Experimentation


Microsoft Clarity is a free behavioral analytics tool that provides heatmaps, session recordings, and engagement metrics. The basic integration with Optimizely Web Experimentation is straightforward — Microsoft's own docs cover pasting a JSON plugin in under five minutes. This guide goes beyond the basic setup. It covers how to build an experiment analysis workflow in Clarity that surfaces qualitative insights your A/B test metrics cannot reveal.

Why Add Clarity to Your Experiments

Conversion metrics tell you which variation won. They do not tell you why. A variation with a 12% lift in signups might also show increased rage clicks on the pricing toggle, confusion scrolling past a new hero image, or users abandoning a form halfway through. Without behavioral data, you ship the winning variation without understanding the friction it introduces.

Clarity fills this gap at zero cost. Filtered session recordings and heatmaps per variation let you:

  • Watch real users interact with each variation — not aggregate click counts, but actual behavior sequences

  • Identify UX friction that metrics miss — rage clicks, dead clicks, excessive scrolling, quick-backs

  • Build confidence in your results — seeing 20 users smoothly complete a flow in Variation B is stronger evidence than a p-value alone

  • Catch unintended side effects — a variation might improve the target metric while degrading experience elsewhere on the page

```mermaid
flowchart LR
    A["Optimizely: Variation B wins with +12% signups"] --> B{"But why?"}
    B --> C["Clarity recordings: Users find new CTA faster"]
    B --> D["Clarity heatmaps: More clicks above the fold"]
    B --> E["Clarity rage clicks: Form validation confusing 8% of users"]
    C --> F["Ship with confidence"]
    D --> F
    E --> G["Fix form validation before shipping"]
```

Setting Up the Integration

The integration uses Optimizely's Custom Analytics Integration to send experiment and variation names to Clarity as custom tags. The official setup takes about 5 minutes:

  1. In Optimizely, go to Settings > Integrations.

  2. Click Create Analytics Integration > Using JSON.

  3. Paste the JSON configuration (below) and click Create Extension.

  4. Toggle the integration to Enabled.

  5. Optionally check Enable by default for all new experiments.

```json
{
  "plugin_type": "analytics_integration",
  "name": "Custom Clarity integration",
  "form_schema": [],
  "description": "",
  "options": {
    "track_layer_decision": "var state = window['optimizely'].get('state');\nvar campaignObject = state.getDecisionObject({'campaignId':campaignId});\n\nif(campaignObject !== null){\n var utils = window[\"optimizely\"].get(\"utils\");\n utils.waitUntil(function() {\n \treturn typeof(clarity) === 'function';\n }).then(function() {\n   clarity(\"set\", \"Optimizely\", campaignObject.experiment +' - ' + campaignObject.variation);\n });\n}\n"
  }
}
```
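The escaped string inside track_layer_decision is hard to read. Unescaped, the logic is equivalent to the sketch below. In the real integration, Optimizely supplies campaignId in scope and the page provides the window and clarity globals; the stubs and the sample experiment name here are assumptions added only to make the sketch runnable on its own.

```javascript
// Stand-ins for the browser globals (assumption: not part of the integration).
var tags = {};
function clarity(cmd, key, value) { tags[key] = value; } // stub for Clarity's global

var window = {
  optimizely: {
    get: function (key) {
      if (key === 'state') {
        // getDecisionObject returns null when the visitor is not bucketed
        return {
          getDecisionObject: function () {
            return { experiment: 'Homepage Hero Test', variation: 'Blue CTA' };
          }
        };
      }
      if (key === 'utils') {
        // Synchronous stand-in for utils.waitUntil, which polls until the
        // predicate is true (here: until Clarity's script has loaded)
        return {
          waitUntil: function (pred) {
            return { then: function (cb) { if (pred()) { cb(); } } };
          }
        };
      }
    }
  }
};
var campaignId = '10001'; // provided by Optimizely in the integration context

// --- the integration logic itself, unescaped ---
var state = window.optimizely.get('state');
var campaignObject = state.getDecisionObject({ campaignId: campaignId });

if (campaignObject !== null) {
  var utils = window.optimizely.get('utils');
  utils.waitUntil(function () {
    return typeof clarity === 'function'; // wait for Clarity to load
  }).then(function () {
    // One shared tag key "Optimizely", value "Experiment - Variation"
    clarity('set', 'Optimizely', campaignObject.experiment + ' - ' + campaignObject.variation);
  });
}
```

The waitUntil guard matters: Clarity's script often loads after Optimizely's, and setting the tag before clarity exists would silently drop it.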

That is the standard integration. The rest of this guide focuses on what to actually do with it.

Building an Experiment Analysis Workflow

Most teams set up the integration, glance at a few recordings, and move on. A structured workflow extracts far more value.

Step 1: Create Saved Segments Before the Test Starts

Before launching your experiment, set up Clarity segments for each variation. This way, data accumulates from day one and you can monitor behavioral patterns throughout the test — not just at the end.

  1. In Clarity, go to Dashboard > Filters > Custom tags > Optimizely.

  2. Select your experiment's control variation.

  3. Click Save as segment. Name it clearly: [Experiment Name] - Control.

  4. Repeat for each treatment variation.

Creating segments upfront has two benefits: you avoid the 30-minute to 2-hour tag propagation delay when you want to start analysis, and you build a library of segments that makes it easy to revisit old experiments.

Step 2: Establish Baseline Behavior (Control)

Before analyzing treatment variations, understand how users behave with the current experience:

  1. Switch to your Control segment.

  2. Watch 15-20 session recordings. Note:

    • Where do users pause or hesitate? Look for mouse hovering without clicking, or scrolling back up to re-read.

    • What is the typical scroll depth? How far down the page do most users get?

    • Where do users click? Are they clicking elements you expect, or trying to click non-interactive elements (dead clicks)?

    • What is the path to conversion? How many steps does a typical converting user take?

  3. Check the heatmap for click distribution and scroll depth.

  4. Note Clarity's automatic insights: rage click rate, dead click rate, excessive scrolling rate.

Document these observations. They become your comparison baseline.

Step 3: Compare Treatment Variations

Now switch to each treatment segment and look for behavioral differences:

| What to Compare | Control Behavior | Treatment Behavior | What It Means |
| --- | --- | --- | --- |
| Scroll depth | 60% reach CTA | 80% reach CTA | New layout keeps attention longer |
| Rage clicks | 2% of sessions | 8% of sessions | New element is confusing or broken |
| Dead clicks | Low on hero image | High on hero image | Users expect the new hero image to be clickable |
| Time to first click | 4 seconds | 8 seconds | New design creates decision paralysis |
| Quick-backs | 5% of sessions | 15% of sessions | Users land, see something unexpected, hit back |

Step 4: Build a Decision Framework

Combine quantitative results from Optimizely with qualitative insights from Clarity:

```mermaid
flowchart TD
    A["Optimizely declares a winner"] --> B{"Check Clarity recordings"}
    B --> C{"Rage clicks or dead clicks increased?"}
    C -->|Yes| D["Fix UX issues before shipping"]
    C -->|No| E{"Scroll depth and engagement improved?"}
    E -->|Yes| F["Strong winner — ship with confidence"]
    E -->|No| G{"Conversion up but engagement down?"}
    G -->|Yes| H["Investigate: short-term gain may not sustain"]
    G -->|No| I["Neutral behavioral impact — ship based on metrics"]
```

This framework prevents the common mistake of shipping a statistically significant winner that introduces hidden UX problems.

Advanced Techniques

Multi-Experiment Tagging

The standard integration sends a single "Optimizely" tag key whose value combines the experiment and variation names. With multiple concurrent experiments this still works: the shared key accumulates one value per experiment per session. But filtering gets awkward when you want to analyze the intersection of two experiments.

For cleaner multi-experiment filtering, modify the integration to send the experiment name as the tag key instead of using a generic "Optimizely" key:

```json
{
  "plugin_type": "analytics_integration",
  "name": "Clarity Per-Experiment Tags",
  "form_schema": [],
  "description": "Sends each experiment as its own Clarity custom tag key",
  "options": {
    "track_layer_decision": "var state = window['optimizely'].get('state');\nvar campaignObject = state.getDecisionObject({'campaignId':campaignId});\n\nif(campaignObject !== null){\n var utils = window[\"optimizely\"].get(\"utils\");\n utils.waitUntil(function() {\n \treturn typeof(clarity) === 'function';\n }).then(function() {\n   var expName = campaignObject.experiment || String(campaignId);\n   var varName = campaignObject.variation || String(variationId);\n   clarity(\"set\", expName, varName);\n });\n}\n"
  }
}
```

Now in Clarity, each experiment appears as its own tag key. You can filter by "Homepage Hero Test" = "Blue CTA" AND "Pricing Layout" = "Annual Toggle" simultaneously to see how users experience both variations together.
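The only functional change from the standard snippet is how the tag key and value are computed. A minimal stand-alone sketch of that computation (in the integration, campaignObject, campaignId, and variationId are supplied by Optimizely's context; the experiment name and IDs below are hypothetical):

```javascript
// Per-experiment tagging: tag KEY = experiment name, tag VALUE = variation name.
// If descriptive names are masked or missing, fall back to the numeric IDs.
function perExperimentTag(campaignObject, campaignId, variationId) {
  var key = campaignObject.experiment || String(campaignId);
  var value = campaignObject.variation || String(variationId);
  return { key: key, value: value };
}

// With descriptive names available (hypothetical experiment):
var named = perExperimentTag(
  { experiment: 'Homepage Hero Test', variation: 'Blue CTA' }, 10001, 20001
);

// With names masked (see Troubleshooting), the IDs are used instead:
var masked = perExperimentTag({}, 10001, 20001);
// masked.key === '10001', masked.value === '20001'
```

The ID fallback keeps the tag usable even when Mask descriptive names is enabled, though numeric keys make the Clarity filter UI harder to read.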

Trade-off: Clarity's custom tag limit is 128 unique key-value pairs per page. If you run many concurrent experiments (unlikely but possible), the per-experiment approach uses more of that budget than the single "Optimizely" key approach.

Rage Click Analysis by Variation

Clarity automatically detects rage clicks — rapid repeated clicks on the same element, usually indicating frustration. Combining rage click data with experiment variations reveals UX problems that aggregate metrics hide.

  1. In Clarity, filter by your treatment variation segment.

  2. Go to Dashboard and check the Rage clicks metric.

  3. If rage clicks are higher than the control, go to Recordings and filter for sessions with rage clicks (Clarity marks these automatically).

  4. Watch 5-10 rage click sessions. Common causes:

    • Unresponsive buttons: The variation changed button styling but broke the click target.

    • Misleading elements: The new design makes non-clickable elements look clickable.

    • Slow loading: A new component loads slowly, and users click repeatedly thinking it did not register.

  5. Document the specific element causing rage clicks and fix it before declaring the variation a winner.

Dead Click Heatmaps for Design Validation

Dead clicks — clicks on non-interactive elements — reveal mismatches between what users expect to interact with and what actually responds. This is particularly valuable for design experiments:

  1. Filter Clarity heatmaps by your treatment variation.

  2. Switch to Click heatmap view.

  3. Look for click clusters on non-interactive elements (images, text blocks, icons without links).

  4. Compare against the control variation's dead click pattern.

If the treatment variation introduces new dead click hotspots, the design is creating false affordances. Users think something is clickable when it is not. Fix the design by either making those elements interactive or changing their visual treatment to look non-interactive.

Scroll Depth Comparison

Scroll depth by variation reveals whether a layout change keeps users engaged or causes them to bounce earlier:

  1. In Clarity, switch to your control segment and open Heatmaps > Scroll view.

  2. Note the percentage of users reaching key content sections (CTA, pricing, testimonials).

  3. Switch to the treatment segment and compare.

  4. If the treatment shows significantly lower scroll depth to the CTA, the new design may be creating a barrier — even if conversion is up (users who do reach the CTA convert at higher rates, but fewer users get there).

What Clarity Cannot Tell You

Understanding the limitations prevents over-relying on behavioral data:

  • Statistical significance: Clarity does not calculate significance. Use Optimizely's stats engine for that. Clarity is for understanding why, not whether.

  • Server-side experiments: Clarity is client-side JavaScript only. It works with Optimizely Web Experimentation, not Feature Experimentation (server-side SDKs).

  • Revenue attribution: Clarity tracks behavior, not transactions. It cannot tell you which variation generated more revenue — only how users behaved differently.

  • Mobile apps: Clarity is web-only. For mobile experiments, use tools like Firebase or Amplitude.

  • Real-time data: Custom tags take 30 minutes to 2 hours to appear. Do not use Clarity for real-time experiment monitoring.

Troubleshooting

Tags Show Numbers Instead of Names

Your Optimizely project has Mask descriptive names enabled. Go to Settings > Privacy and disable it. Only new sessions will show readable names — existing sessions retain masked values.

Tags Not Appearing in Clarity

  1. Wait 2 hours — tags are not real-time.

  2. Verify Clarity is loaded: open browser console, run typeof clarity === 'function'.

  3. Verify Optimizely is loaded: run typeof window.optimizely !== 'undefined'.

  4. Confirm the experiment is running (not paused or draft).

  5. Confirm the integration is enabled for the specific experiment.
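Checks 2 and 3 can be bundled into one helper. A small sketch, parameterised over the window object so it can run outside a browser; in the console, call it with the real window:

```javascript
// Reports whether the Clarity and Optimizely globals are present and usable.
function integrationStatus(win) {
  return {
    clarityLoaded: typeof win.clarity === 'function',
    optimizelyLoaded: typeof win.optimizely !== 'undefined' &&
      typeof win.optimizely.get === 'function'
  };
}

// Example with a stubbed window where only Clarity has loaded:
var status = integrationStatus({ clarity: function () {} });
// status → { clarityLoaded: true, optimizelyLoaded: false }
```

If either flag is false on a page where both scripts are installed, check for an ad blocker or a tag manager rule suppressing one of them before debugging the integration itself.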

Data Discrepancies Between Optimizely and Clarity

Expect 5-15% differences. Causes:

  • Different counting: Optimizely counts visitors, Clarity counts sessions.

  • Ad blockers: May block one script but not the other.

  • Session sampling: Clarity samples above 100,000 sessions/day.

Investigate if discrepancies exceed 20%.
