Analyze Optimizely Feature Experimentation Data in Snowflake or BigQuery


Optimizely Feature Experimentation generates a decision event each time the SDK assigns a user to an experiment variation. Three approaches exist for getting that data into Snowflake or BigQuery: native data sharing (zero-copy for both Snowflake and BigQuery), the Experimentation Events Export (daily S3 Parquet files you load into any warehouse), or Warehouse-Native Experimentation Analytics (Optimizely queries your existing warehouse data to calculate experiment metrics). This guide covers all three approaches, when to choose each, and how to query the resulting data.

How the Integration Works

The three approaches serve different purposes and vary in setup complexity.

flowchart LR
    subgraph approach_a["Approach A — Native Data Sharing"]
        direction LR
        SA1[Optimizely event pipeline] -->|Zero-copy share| SA2[Snowflake / BigQuery]
        SA2 --> SA3[Query immediately]
    end

    subgraph approach_b["Approach B — S3 Export + ETL"]
        direction LR
        A1[Optimizely S3 bucket] -->|Daily Parquet| A2[Any warehouse]
        A2 --> A3[Custom SQL analysis]
    end

    subgraph approach_c["Approach C — Warehouse-Native Analytics"]
        direction LR
        B1[Your warehouse tables] -->|Optimizely queries| B2[Analytics engine]
        B2 --> B3[Optimizely Results UI]
    end

| | Native Data Sharing | S3 Export + ETL | Warehouse-Native Analytics |
| --- | --- | --- | --- |
| Data direction | Optimizely → your warehouse (zero-copy) | Optimizely S3 → your warehouse | Your warehouse → Optimizely |
| Setup complexity | Low — request access, start querying | Medium — S3 credentials + ETL pipeline | Medium — dataset configuration |
| Warehouse support | Snowflake, BigQuery | Any (Snowflake, BigQuery, Redshift) | Snowflake, BigQuery, Redshift, Databricks |
| Latency | Daily (~13:00 UTC next day) | Daily (~13:00 UTC next day) | Real-time (on-demand queries) |
| Best for | Zero-ETL access to Optimizely data | Custom pipelines, joining with external data | Using your existing business metrics as experiment outcomes |
| Monthly limit | 1 billion events (Snowflake) | Unlimited (your S3 access) | N/A |

Use Approach A if you use Snowflake or BigQuery and want the simplest setup — no ETL, no credentials to manage. Use Approach B if you need full control over the pipeline, want to transform data during load, or use a warehouse not supported by data sharing. Use Approach C when your warehouse already contains the business metrics you care about and you want Optimizely to evaluate experiment results directly against that data.

Prerequisites

Before starting, confirm the following:

  • Optimizely Feature Experimentation account with at least one running experiment.

  • Snowflake or BigQuery project with sufficient permissions to create tables, stages (Snowflake), and datasets (BigQuery).

  • For Snowflake Data Sharing: Snowflake account in a supported AWS region (ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2, or us-west-2). Secure Data Sharing must be available in your Snowflake plan.

  • For S3 Export: AWS CLI or SDK (version 1.11.108 or later) with KMS decryption capability.

  • For Warehouse-Native Analytics: Contact [email protected] to have your Optimizely account ID provisioned for the Analytics app.

Experimentation Events Export (Approaches A and B) is not available for EU data server customers.

Approach A: Native Data Sharing

The simplest path for both Snowflake and BigQuery users. Optimizely shares your experiment data directly into your warehouse account — no S3 credentials, no ETL pipelines, no data movement. The data appears as read-only shared tables that you query like any other table.

Snowflake: Secure Data Sharing

Step 1: Request Access

Contact Optimizely support with:

  1. Your Snowflake account name (the account identifier, e.g., orgname-accountname).

  2. Your Snowflake region (must be one of: ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2).

Optimizely provisions the share within a few business days and confirms when it is active.

Step 2: Accept the Share

Once Optimizely confirms the share is active:

  1. In Snowflake, go to Data > Shared Data > Inbound.

  2. Find the Optimizely share and click Get Data.

  3. Assign a database name (e.g., OPTIMIZELY_EVENTS) and the roles that should have access.

Step 3: Query Your Data

The shared database contains decisions and conversions tables. Query them directly:

-- Count decisions per experiment and variation
SELECT
  parse_json(experiments)[0]:experiment_id::STRING AS experiment_id,
  parse_json(experiments)[0]:variation_id::STRING  AS variation_id,
  COUNT(DISTINCT visitor_id) AS unique_visitors
FROM OPTIMIZELY_EVENTS.PUBLIC.decisions
WHERE date = '2024-03-15'
GROUP BY 1, 2
ORDER BY 1, 2;
-- Query conversion events
SELECT
  event_name,
  visitor_id,
  timestamp,
  parse_json(tags):revenue::FLOAT AS revenue
FROM OPTIMIZELY_EVENTS.PUBLIC.conversions
WHERE date = '2024-03-15'
  AND event_name = 'purchase';

The experiments field in the decisions table is a JSON array. Use parse_json() to extract experiment and variation IDs.

BigQuery: Analytics Hub Data Sharing

Step 1: Request Access

Contact Optimizely support with:

  1. Your Optimizely Account ID.

  2. The principal email address for the Google Cloud Platform user or group who will set up the connection.

  3. The contact person's full name and email address.

Use a managed service account (not tied to one person) as the principal to future-proof the integration.

Step 2: Subscribe via Analytics Hub

Once Optimizely grants access:

  1. Go to the BigQuery Analytics Hub.

  2. Click Search listings and enter Optimizely.

  3. Select Optimizely Experimentation.

  4. Review the terms of service and click Subscribe.

  5. On the Create linked dataset page, select your Project and enter a display name for the linked dataset.

  6. Click Save.

The linked Optimizely Experimentation dataset now appears in your BigQuery project.

Step 3: Query Your Data

The linked dataset contains decision and conversion tables. BigQuery uses experiment_id_hash for integer-range partitioning — always include it in your WHERE clause for performance:

-- Count decisions per experiment and variation
SELECT
  variation_id,
  COUNT(DISTINCT visitor_id) AS unique_visitors
FROM `<optimizely_account_id>.decision`
WHERE experiment_id_hash = ABS(MOD(FARM_FINGERPRINT('12345678'), 4000))
  AND experiment_id = '12345678'
  AND timestamp BETWEEN '2024-03-01T00:00:00Z' AND '2024-03-31T23:59:59Z'
  AND is_holdback = FALSE
GROUP BY variation_id;
-- Attribute conversions to experiment variations
WITH first_decisions AS (
  SELECT variation_id, visitor_id, timestamp
  FROM `<optimizely_account_id>.decision`
  WHERE experiment_id_hash = ABS(MOD(FARM_FINGERPRINT('12345678'), 4000))
    AND experiment_id = '12345678'
    AND timestamp BETWEEN '2024-03-01T00:00:00Z' AND '2024-03-31T23:59:59Z'
    AND is_holdback = FALSE
  QUALIFY ROW_NUMBER() OVER (PARTITION BY visitor_id ORDER BY timestamp) = 1
),
attributed_conversions AS (
  SELECT fd.variation_id, fd.visitor_id, c.revenue
  FROM first_decisions fd
  JOIN `<optimizely_account_id>.conversion` c
    ON  fd.visitor_id = c.visitor_id
    AND c.event_name  = 'purchase'
    AND c.timestamp   >= fd.timestamp
    AND c.timestamp BETWEEN '2024-03-01T00:00:00Z' AND '2024-03-31T23:59:59Z'
)
SELECT
  variation_id,
  COUNT(DISTINCT visitor_id) AS converters,
  SUM(revenue) AS total_revenue
FROM attributed_conversions
GROUP BY variation_id;

Replace <optimizely_account_id> with your Optimizely account ID. Replace '12345678' with your experiment ID.

Data Sharing Limitations

  • Snowflake regions: ap-southeast-2, eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2. If your Snowflake account is in a different region, use Approach B.

  • BigQuery regions: The linked dataset lives in the US or EU multi-region; your queries must run in the same region as the data.

  • Monthly limit: Up to 1 billion events (Snowflake). Contact Optimizely if you exceed this.

  • Snowflake EU data: Not available for EU-hosted Optimizely data. BigQuery supports both US and EU.

  • Read-only: You cannot write to the shared tables. If you need to join with internal data, create views or materialize into your own tables.

Approach B: S3 Export + Manual ETL

For BigQuery teams or Snowflake users who need full control, the Experimentation Events Export delivers raw Parquet files to an S3 bucket that you load into your warehouse.

Step 1: Obtain Export Credentials

Generate a Personal Access Token in Optimizely and exchange it for S3 credentials using the Authentication API:

# Generate a short-lived S3 access credential
curl -X POST "https://api.optimizely.com/v2/export/credentials" \
  -H "Authorization: Bearer <YOUR_PERSONAL_ACCESS_TOKEN>"

The response returns access_key_id, secret_access_key, session_token, and your account_id. These credentials expire periodically — automate renewal in your ETL pipeline.
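Renewal is easy to script. The sketch below, using only the standard library, calls the endpoint shown above and maps the response into boto3-style client arguments; the field names follow the response description in this section, and the helper names are illustrative — adjust if your response shape differs:

```python
import json
import urllib.request

EXPORT_CREDENTIALS_URL = "https://api.optimizely.com/v2/export/credentials"


def fetch_export_credentials(personal_access_token: str) -> dict:
    """Exchange a Personal Access Token for temporary S3 credentials."""
    req = urllib.request.Request(
        EXPORT_CREDENTIALS_URL,
        method="POST",
        headers={"Authorization": f"Bearer {personal_access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def to_s3_client_kwargs(payload: dict) -> dict:
    """Map the API response to boto3 client keyword arguments.

    Field names follow the response described above; adjust if
    your account's response nests them differently.
    """
    return {
        "aws_access_key_id": payload["access_key_id"],
        "aws_secret_access_key": payload["secret_access_key"],
        "aws_session_token": payload["session_token"],
    }
```

Call `fetch_export_credentials()` at the start of each ETL run rather than caching the result, since the tokens are short-lived.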

S3 bucket structure:

s3://optimizely-events-data/v1/
  account_id=<account_id>/
    type=decisions/
      date=YYYY-MM-DD/
        experiment=<experiment_id>/
          part-*.parquet
    type=events/
      date=YYYY-MM-DD/
        event=<event_name>/
          part-*.parquet

Check for the _SUCCESS file before loading. Each daily partition is complete when this file is present:

s3://optimizely-events-data/v1/account_id=<account_id>/type=decisions/date=YYYY-MM-DD/_SUCCESS

Do not start loading a partition until _SUCCESS exists at that path.
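The completeness check can be sketched as a small gate in your pipeline. This assumes a boto3-style S3 client (`head_object` raising `ClientError` for missing keys); the helper names are illustrative:

```python
def partition_prefix(account_id: str, date: str, kind: str = "decisions") -> str:
    """Build the S3 key prefix for one daily partition of the export."""
    return f"v1/account_id={account_id}/type={kind}/date={date}/"


def is_partition_complete(s3_client, bucket: str, account_id: str, date: str) -> bool:
    """Return True once the _SUCCESS marker exists for the partition.

    Call this before loading; skip the partition (and retry later)
    while it returns False.
    """
    key = partition_prefix(account_id, date) + "_SUCCESS"
    try:
        s3_client.head_object(Bucket=bucket, Key=key)
        return True
    except s3_client.exceptions.ClientError:
        return False
```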

Step 2: Load into Snowflake (via S3 Stage)

If you cannot use Secure Data Sharing (wrong region, need write access, or need to transform data during load), use an external stage:

CREATE OR REPLACE STAGE optimizely_events_stage
  URL = 's3://optimizely-events-data/v1/account_id=<YOUR_ACCOUNT_ID>/'
  CREDENTIALS = (
    AWS_KEY_ID      = '<ACCESS_KEY_ID>'
    AWS_SECRET_KEY  = '<SECRET_ACCESS_KEY>'
    AWS_TOKEN       = '<SESSION_TOKEN>'
  )
  FILE_FORMAT = (
    TYPE = 'PARQUET'  -- Snappy compression is detected automatically
  );
CREATE TABLE IF NOT EXISTS optimizely_decisions (
  uuid              VARCHAR,
  timestamp         BIGINT,
  process_timestamp BIGINT,
  visitor_id        VARCHAR,
  account_id        VARCHAR,
  campaign_id       VARCHAR,
  experiment_id     VARCHAR,
  variation_id      VARCHAR,
  is_holdback       BOOLEAN,
  user_agent        VARCHAR,
  attributes        VARIANT
);
COPY INTO optimizely_decisions
FROM @optimizely_events_stage/type=decisions/date=2024-03-15/
FILE_FORMAT = (TYPE = 'PARQUET')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

For continuous loading, set up Snowpipe with AUTO_INGEST = TRUE and coordinate with Optimizely support to add your SNS or SQS ARN as an S3 event destination.

Step 3: Load into BigQuery

BigQuery does not read S3 directly. Transfer the Parquet files to Google Cloud Storage first, then load into BigQuery.

# Transfer from S3 to GCS
aws s3 sync \
  s3://optimizely-events-data/v1/account_id=<ACCOUNT_ID>/type=decisions/date=2024-03-15/ \
  /tmp/optimizely_decisions/

gsutil -m cp -r /tmp/optimizely_decisions/ \
  gs://<YOUR_GCS_BUCKET>/optimizely/decisions/date=2024-03-15/

Automate this with Cloud Scheduler + Cloud Functions, or use BigQuery Data Transfer Service with an S3 source to pull directly.

CREATE TABLE IF NOT EXISTS `your_project.optimizely.decisions` (
  uuid              STRING,
  timestamp         INT64,
  process_timestamp INT64,
  visitor_id        STRING,
  account_id        STRING,
  campaign_id       STRING,
  experiment_id     STRING,
  variation_id      STRING,
  is_holdback       BOOL,
  user_agent        STRING,
  attributes        JSON
)
PARTITION BY DATE(_PARTITIONTIME);  -- ingestion-time partitioning; BigQuery cannot partition on an INT64 epoch-millis expression
bq load \
  --source_format=PARQUET \
  --replace=false \
  your_project:optimizely.decisions \
  "gs://<YOUR_GCS_BUCKET>/optimizely/decisions/date=2024-03-15/*.parquet"

Or use the BigQuery UI: Create table > Google Cloud Storage > File format: Parquet.

Alternatively, use Fivetran — Optimizely offers an official Fivetran connector that syncs Experimentation Events Export data into any supported warehouse (Snowflake, BigQuery, Redshift) with zero custom code.

Approach C: Warehouse-Native Experimentation Analytics

Warehouse-Native Analytics connects Optimizely directly to your existing warehouse. Instead of moving data out, Optimizely queries your warehouse when you view results. Your business metrics stay in your warehouse — Optimizely reads them.

Step 1: Connect Your Warehouse

  1. Email [email protected] with your Optimizely account ID to provision access to the Analytics app.

  2. In the Analytics app, go to Data > Connection > New Connection.

  3. Select Snowflake or BigQuery.

For Snowflake, provide:

  • Account identifier (e.g., orgname-accountname)

  • Warehouse name

  • Database and schema

  • Service account username and password (or key pair authentication)

For BigQuery, provide:

  • Project ID

  • Dataset

  • Service account JSON key with roles/bigquery.dataViewer and roles/bigquery.jobUser permissions

Test the connection before saving.

Step 2: Enable Materialization

Optimizely caches intermediate computations in your warehouse to reduce query cost and latency.

  1. Go to Settings > General Settings > Materialization.

  2. Enable materialization.

  3. Configure:

    • Database — where materialized tables are created

    • Schema — schema for those tables

    • Refresh Cron Schedule — recommended: 0 0 * * * (daily at midnight)

Step 3: Create the Three Required Datasets

Warehouse-Native Analytics requires three linked datasets: decision, actor, and event.

Decision Dataset

The decision dataset tells Optimizely which users were assigned to which experiments. It maps to a table in your warehouse that you populate (via the SDK listener approach in Step 4, or from any existing logging pipeline).

  1. Go to Data > Datasets > + New Dataset > Source Dataset.

  2. Select your decision table.

  3. Under Semantics, select Decision stream.

  4. Map the required fields:

| Optimizely Field | Your Column |
| --- | --- |
| Experiment ID | The column containing the Optimizely experiment ID |
| Variation | The column containing the variation ID or key |
| Timestamp | Decision timestamp (UTC) |
| Actor dataset | Link to your user/actor dataset |
| Is holdback (optional) | Boolean column for holdback exclusion |

Actor Dataset

The actor dataset represents your users. It joins decisions to events via a shared user identifier.

  1. + New Dataset > Source Dataset.

  2. Select your user table (e.g., users or user_events).

  3. Under Semantics, select Actor.

  4. Map the actor ID field to your user identifier column.

Event Dataset

The event dataset contains your business metrics — purchases, clicks, conversions, or any other outcome you want to measure.

  1. + New Dataset > Source Dataset.

  2. Select your events or conversions table.

  3. Under Semantics, select Event stream.

  4. Map: event name, actor ID, timestamp, and any numeric value fields (revenue, etc.).

Link the datasets: decision → actor → event, all joined on the actor ID.

Step 4: Populate Decision Data via SDK Listener

If you don't have an existing decision logging pipeline, add a DECISION notification listener to your SDK initialization to write decisions into your warehouse table.

JavaScript / Node.js:

const { createInstance, enums } = require('@optimizely/optimizely-sdk');

const optimizelyClient = createInstance({ sdkKey: 'YOUR_SDK_KEY' });

optimizelyClient.notificationCenter.addNotificationListener(
  enums.NOTIFICATION_TYPES.DECISION,
  function(decisionData) {
    if (decisionData.decisionInfo.ruleKey == null) return;

    // The DECISION payload exposes keys (ruleKey, flagKey, variationKey),
    // not Optimizely's numeric IDs; map keys to numeric IDs downstream
    // if your analysis requires them.
    const record = {
      experiment_key: decisionData.decisionInfo.ruleKey,
      flag_key:       decisionData.decisionInfo.flagKey,
      variation_key:  decisionData.decisionInfo.variationKey,
      user_id:        decisionData.userId,
      attributes:     decisionData.attributes,
      is_holdback:    decisionData.decisionInfo.enabled === false,
      timestamp:      Date.now(),
    };

    // Write to your warehouse table or push to your data pipeline
    writeToWarehouse(record);
  }
);

Python:

import time

from optimizely import optimizely
from optimizely.helpers import enums

client = optimizely.Optimizely(sdk_key='YOUR_SDK_KEY')

def on_decision(decision_type, user_id, attributes, decision_info):
    if decision_info.get('rule_key') is None:
        return

    # The DECISION payload exposes keys, not Optimizely's numeric IDs.
    record = {
        'experiment_key': decision_info['rule_key'],
        'variation_key':  decision_info['variation_key'],
        'user_id':        user_id,
        'attributes':     attributes,
        'is_holdback':    not decision_info.get('enabled', True),
        'timestamp':      int(time.time() * 1000),
    }

    # Write to your warehouse table or push to your data pipeline
    write_to_warehouse(record)

client.notification_center.add_notification_listener(
    enums.NotificationTypes.DECISION,
    on_decision
)

Push these records to your warehouse via batch inserts, a message queue (Kafka, Pub/Sub), or a streaming insert API.
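A common pattern is to buffer records in memory and flush them in batches rather than issuing one insert per decision. A minimal sketch — `flush_fn` is a placeholder for whatever actually loads a batch (INSERT statements, a streaming-insert API, a Kafka or Pub/Sub producer):

```python
class DecisionBuffer:
    """Accumulate decision records and flush them to the warehouse in batches."""

    def __init__(self, flush_fn, batch_size: int = 500):
        self.flush_fn = flush_fn          # callable taking a list of records
        self.batch_size = batch_size
        self._records = []

    def add(self, record: dict) -> None:
        """Queue one record; flush automatically when the batch is full."""
        self._records.append(record)
        if len(self._records) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any queued records and reset the buffer."""
        if self._records:
            self.flush_fn(self._records)
            self._records = []
```

Register `buffer.add` inside the notification listener, and call `buffer.flush()` on shutdown so trailing records are not lost.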

Analyzing Experiment Data

Once decisions are in your warehouse, standard SQL patterns cover most analysis needs.

Variation Counts and Holdback Check

-- BigQuery syntax; in Snowflake, replace TIMESTAMP_MILLIS(timestamp) with TO_TIMESTAMP(timestamp, 3)
SELECT
  experiment_id,
  variation_id,
  is_holdback,
  COUNT(DISTINCT visitor_id) AS unique_users
FROM optimizely_decisions
WHERE DATE(TIMESTAMP_MILLIS(timestamp)) BETWEEN '2024-03-01' AND '2024-03-31'
  AND experiment_id = '12345678'
GROUP BY 1, 2, 3
ORDER BY 3, 2;

Join Decisions to Conversions

-- Attribute conversions to the experiment variation at decision time
SELECT
  d.experiment_id,
  d.variation_id,
  COUNT(DISTINCT d.visitor_id)                       AS assigned_users,
  COUNT(DISTINCT e.visitor_id)                       AS converted_users,
  COUNT(DISTINCT e.visitor_id) * 1.0
    / NULLIF(COUNT(DISTINCT d.visitor_id), 0)        AS conversion_rate
FROM optimizely_decisions d
LEFT JOIN optimizely_events e
  ON  d.visitor_id  = e.visitor_id
  AND e.event_name  = 'purchase'
  AND e.timestamp   >= d.timestamp   -- conversion must come after decision
WHERE d.experiment_id = '12345678'
  AND d.is_holdback IS DISTINCT FROM TRUE
GROUP BY 1, 2
ORDER BY 2;

Sample Ratio Mismatch Detection

Unequal traffic splits indicate a bucketing problem. A 50/50 experiment should have roughly equal user counts per variation.

WITH variation_counts AS (
  SELECT
    variation_id,
    COUNT(DISTINCT visitor_id) AS n
  FROM optimizely_decisions
  WHERE experiment_id = '12345678'
    AND is_holdback IS DISTINCT FROM TRUE
  GROUP BY 1
),
total AS (
  SELECT SUM(n) AS total_n FROM variation_counts
)
SELECT
  v.variation_id,
  v.n,
  t.total_n,
  ROUND(v.n * 100.0 / t.total_n, 1) AS pct
FROM variation_counts v, total t
ORDER BY v.variation_id;

If any variation deviates from the expected allocation by more than 1%, investigate for mutual exclusion issues, caching, or bot traffic.
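The 1% rule of thumb can be replaced with a proper chi-squared test on the counts the query above returns. A stdlib-only sketch for the common two-variation 50/50 case (one degree of freedom); for more variations or unequal allocations, use `scipy.stats.chisquare` instead:

```python
import math


def srm_pvalue_5050(n_a: int, n_b: int) -> float:
    """Chi-squared p-value that a two-variation 50/50 split produced these counts.

    A very small p-value (e.g. below 0.001) is strong evidence of a
    sample ratio mismatch.
    """
    total = n_a + n_b
    expected = total / 2
    chi2 = (n_a - expected) ** 2 / expected + (n_b - expected) ** 2 / expected
    # Survival function of chi-squared with df=1: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, 5,200 vs. 4,800 users looks like a mere 2-point deviation but is wildly improbable under a true 50/50 split, while 5,020 vs. 4,980 is unremarkable.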

Revenue Impact per Variation

SELECT
  d.variation_id,
  COUNT(DISTINCT d.visitor_id)          AS users,
  SUM(e.revenue)                        AS total_revenue,
  AVG(e.revenue)                        AS avg_revenue_per_converter,
  SUM(e.revenue) / COUNT(DISTINCT d.visitor_id)  AS revenue_per_user
FROM optimizely_decisions d
LEFT JOIN optimizely_events e
  ON  d.visitor_id = e.visitor_id
  AND e.event_name = 'purchase'
  AND e.timestamp  >= d.timestamp
WHERE d.experiment_id = '12345678'
  AND d.is_holdback IS DISTINCT FROM TRUE
GROUP BY 1
ORDER BY 1;

Gotchas

Daily Export Lag

The Experimentation Events Export is available approximately 24 hours after events occur, usually by 13:00 UTC. The export does not support real-time or near-real-time analysis. If you need same-day data, use the SDK notification listener to write decisions directly to your warehouse.

KMS Decryption Requirement

The S3 export is encrypted with AWS-KMS. Your AWS client must have the kms:Decrypt permission for the Optimizely-managed KMS key. The credentials returned by the Authentication API include the required session token — use them as-is with the AWS CLI or SDK.

EU Data Server Customers

The Experimentation Events Export is not available for Optimizely accounts on the EU data server. Warehouse-Native Analytics remains an option, but the S3-based export approach does not apply.

Credential Rotation

S3 export credentials expire periodically. Your ETL job must refresh them by calling the Authentication API before each load run. Storing and reusing expired credentials causes silent permission errors when accessing the S3 bucket.

Holdback Exclusion

Always filter out holdback users (is_holdback = true) before computing experiment metrics. Holdback users are assigned to a campaign but are intentionally excluded from variations — including them inflates the control group and skews conversion rates.

Decision Deduplication

The same user may generate multiple decision events for the same experiment (repeated page loads, re-initialization). Deduplicate by uuid (Approach A) or by visitor_id + experiment_id keeping the earliest timestamp before computing unique user counts.
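The earliest-decision rule looks like this in Python (the same logic the `QUALIFY ROW_NUMBER()` pattern expresses in SQL), useful if you deduplicate in a pipeline step before loading:

```python
def dedupe_decisions(decisions: list[dict]) -> list[dict]:
    """Keep only the earliest decision per (visitor_id, experiment_id)."""
    earliest: dict[tuple, dict] = {}
    for d in decisions:
        key = (d["visitor_id"], d["experiment_id"])
        if key not in earliest or d["timestamp"] < earliest[key]["timestamp"]:
            earliest[key] = d
    return list(earliest.values())
```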

Warehouse-Native Analytics Requires Support Provisioning

The Warehouse-Native Analytics app is not self-serve. You must contact [email protected] to have your account ID provisioned before the connection UI is accessible. Factor this into your setup timeline.

Troubleshooting

S3 Access Denied Errors

If aws s3 ls returns an access denied error:

  • Verify the AWS_TOKEN (session token) is included in the request — the credentials use temporary STS tokens, not long-lived IAM keys.

  • Confirm the credentials were generated within the last hour. Refresh them via the Authentication API if expired.

  • Ensure your AWS CLI version is 1.11.108 or later.

_SUCCESS File Not Present

If the _SUCCESS file is missing for today's partition, data for that day is not yet available. The export runs on a daily schedule — check again after 13:00 UTC. Do not load partial data from a partition without the _SUCCESS file.

Snowflake COPY INTO Matches Zero Rows

If the COPY INTO command completes but inserts no rows:

  • Confirm the stage path matches the S3 partition structure: type=decisions/date=YYYY-MM-DD/.

  • Run LIST @optimizely_events_stage/type=decisions/date=YYYY-MM-DD/ to verify the stage can see the files.

  • Check that MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE is set — Parquet column names are case-sensitive by default.

BigQuery Load Fails with Schema Mismatch

The attributes field in the Parquet export is a nested structure. If BigQuery infers the schema incorrectly:

  • Use --autodetect flag during the first load to let BigQuery infer schema from Parquet metadata.

  • After the first load, inspect the schema and create an explicit schema file for subsequent loads to prevent drift.

Warehouse-Native Analytics Shows No Experiments

If no experiments appear in the Analytics results UI after creating datasets:

  • Confirm the decision dataset Semantics is set to Decision stream (not Event stream or Actor).

  • Verify that the Experiment ID field is mapped to a column that contains Optimizely's numeric experiment ID — not the experiment name or rule key.

  • Check that the linked actor dataset shares at least one row's actor ID with the decision dataset.