Chrome Extension A/B Testing Without a Backend or Remote Code

A/B testing is the boring superpower of every grown-up product team — except in a Chrome extension. Manifest V3 explicitly forbids loading code from a remote server, which kills the standard playbook (LaunchDarkly client, fetch a config, hot-swap a variant). Most extensions also don't have a backend at all. This guide is the practical pattern for shipping feature flags and A/B tests inside an extension using just chrome.storage, a deterministic hash, and the event stream you already have — with code that survives service-worker termination and SQL for the per-variant analysis afterward.

Why A/B testing in an extension is different

Three constraints don't exist in a web app:

No remote code. Manifest V3 bans downloading and executing JavaScript at runtime. You can fetch data, but you can't fetch a function-as-string and eval it, and you can't inject a <script src="…"> from a remote origin. The extension's default content security policy blocks it at runtime, and Web Store review checks for it at submission.
The service worker dies frequently. Chrome terminates it after roughly 30 seconds of inactivity, and anything held in memory goes with it. Variant assignments held in module-level variables are lost, so the assignment has to be persisted to disk or be cheap to recompute.
There's often no server at all. Many extensions are pure client-side products. They can't hit a feature-flag service per request because they don't want to run one — and shouldn't need to for the basic case.

The combination of these three constraints is why the standard SaaS feature-flag tools don't cleanly fit. The good news: you don't need them. A 40-line pattern using primitives Chrome already gives you handles 90% of the use cases.

The MV3 "no remote code" rule, in plain English

The full policy is long, but the testable summary is:

You may fetch JSON at any time. The data can drive UI, change copy, enable/disable a feature. That's fine.
You may not fetch executable code — no remote JS modules, no eval-style execution of a server-sent function body, no remote-loaded scripts.
The boundary is enforced by the manifest's default CSP (script-src 'self') and Web Store policy review.

Practical translation for A/B testing: variant assignment can be remote (a JSON file you fetch saying "10% of users get B"), but variant implementations must ship inside the extension bundle. Both branches of the if-statement need to be in your v4.2.0 release; the flag just chooses one.

The deterministic-hash variant assignment pattern

The core trick: derive each user's variant from a hash of their anonymous_id plus the flag name. Two properties fall out for free:

Stable per user. Same user, same hash, same variant — across service-worker restarts, across browser restarts, across profile sync (if you want it). No re-assignment, no flicker.
Independent across flags. Same user gets different variants on different flags because the flag name is in the hash. You can run multiple experiments at once without one biasing the other.

Plus you don't need a backend to assign — assignment happens inside the extension. The server only sees the result (the variant chosen) when the user fires an event.

// Stable assignment: hash(anonymous_id + flag) % 100 → bucket [0..99]
async function hashBucket(input) {
  const data = new TextEncoder().encode(input);
  const buf = await crypto.subtle.digest('SHA-256', data);
  const view = new DataView(buf);
  // First 4 bytes as uint32; modulo 100 → 0..99 bucket.
  return view.getUint32(0, false) % 100;
}

async function assign(anonymous_id, flag_name, weights) {
  // weights: { A: 50, B: 50 } or { A: 90, B: 10 } etc., summing to 100.
  // Variants claim bucket ranges in key order, so keep the key order stable across releases.
  const bucket = await hashBucket(anonymous_id + ':' + flag_name);
  let cum = 0;
  for (const [variant, weight] of Object.entries(weights)) {
    cum += weight;
    if (bucket < cum) return variant;
  }
  // Should never happen if weights sum to 100, but be defensive.
  return Object.keys(weights)[0];
}

The result: every user is deterministically slotted into a variant the moment they have an anonymous_id, without you ever asking a server.
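
The defensive fallback in assign() only matters when a flag's weights are misconfigured. One way to catch that at startup instead of at assignment time is a small validator. The sketch below is an assumption layered on top of the pattern, not part of it, and validateFlagConfig is a hypothetical helper name.

```javascript
// Hypothetical helper: returns a list of human-readable problems (empty list = config is valid).
function validateFlagConfig(config) {
  const problems = [];
  for (const [flag, weights] of Object.entries(config)) {
    const entries = Object.entries(weights);
    if (entries.length === 0) {
      problems.push(flag + ': no variants defined');
      continue;
    }
    let sum = 0;
    for (const [variant, weight] of entries) {
      if (!Number.isInteger(weight) || weight < 0) {
        problems.push(flag + '.' + variant + ': weight must be a non-negative integer');
      } else {
        sum += weight;
      }
    }
    if (sum !== 100) {
      problems.push(flag + ': weights sum to ' + sum + ', expected 100');
    }
  }
  return problems;
}
```

Run once when the service worker starts, a check like this makes a typo such as { A: 60, B: 50 } fail loudly. Without it, the cumulative loop would quietly serve a 60/40 split.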

Code: a 40-line SDK-style variant API

Wrapping the above into something callable:

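// flags.js: tiny in-extension experimentation layer.
// Assumes hashBucket() and assign() from the previous section live in this file,
// and that sendEvent() is your existing analytics transport.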
const FLAG_CONFIG = {
  // The set of flags lives in code. Both variant branches ship in the bundle.
  onboarding_step_count: { A: 50, B: 50 },          // 50/50 split
  popup_cta_color: { control: 90, blue: 10 },       // 10% rollout
  fast_chunk_renderer: { off: 100, on: 0 },         // off for now; flip to roll out
};

// Cache the assigned variants in memory so we don't re-hash every call.
// This lasts only as long as the current service worker; it is rebuilt after termination.
let _variants = null;

async function getVariants() {
  if (_variants) return _variants;
  const { anon_id } = await chrome.storage.local.get('anon_id');
  if (!anon_id) throw new Error('anon_id missing — run init first');
  const out = {};
  for (const [flag, weights] of Object.entries(FLAG_CONFIG)) {
    out[flag] = await assign(anon_id, flag, weights);
  }
  _variants = out;
  return out;
}

export async function variant(flag) {
  const v = (await getVariants())[flag];
  if (v === undefined) throw new Error('unknown flag: ' + flag);
  return v;
}

export async function track(eventName, properties = {}) {
  // Always attach the user's current variants to outgoing events.
  const variants = await getVariants();
  return sendEvent({
    name: eventName,
    properties: { ...properties, _variants: variants },
  });
}

Use it in the actual feature code:

import { variant, track } from './flags.js';

async function showPopup() {
  const cta = await variant('popup_cta_color');  // 'control' or 'blue'
  document.getElementById('cta').classList.add('cta-' + cta);
  track('popup.opened').catch(console.error);     // fire-and-forget; _variants attached automatically
}

Two important properties:

Every event ships the current variants as properties, so you can join any event to any variant in SQL without a separate assignment table.
Both variants' code ships in the bundle. The flag picks which path runs — no remote code, MV3-compliant.
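
getVariants() throws until anon_id exists, so something has to create it first. Here is a minimal sketch of that init step, assuming the ID is a random UUID stored under the same anon_id key. ensureAnonId and its injectable parameters are hypothetical, chosen so the function can be exercised outside the browser.

```javascript
// Hypothetical init step: create the anonymous ID once, then reuse it on every later run.
// `storage` defaults to chrome.storage.local; `makeId` defaults to crypto.randomUUID().
async function ensureAnonId(
  storage = chrome.storage.local,
  makeId = () => crypto.randomUUID()
) {
  const { anon_id } = await storage.get('anon_id');
  if (anon_id) return anon_id;            // already initialised on an earlier run
  const fresh = makeId();
  await storage.set({ anon_id: fresh });  // persist before anything hashes it
  return fresh;
}
```

Awaiting this before the first variant() or track() call keeps the "run init first" error from firing. Because the ID is only written when none exists, repeated calls across service-worker restarts return the same value.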

Percentage rollouts (10 → 50 → 100)

For rolling out a feature gradually (vs a balanced A/B test), skew the weights:

// 10% rollout
{ off: 90, on: 10 }

// 50% rollout
{ off: 50, on: 50 }

// 100% rollout
{ off: 0, on: 100 }

Three things to watch:

The assignment is stable. Going from 10% to 50% moves buckets 50–89 (40% of all users) from off to on, so 80% of the resulting on-population is newly promoted. Existing on-users stay on because their hash bucket never changes. Nobody gets demoted, provided the on weight only grows and the key order stays the same.
Rollback flips correctly. Going from 100% back to 0% demotes everyone. The hash doesn't care; the weights decide.
The config lives in code. Updating the rollout means shipping a release. If you need to flip a flag faster than your release cadence, you need the remote JSON variant in the next section.
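
The stability claim can be checked without hashing anything, because assignment depends only on the bucket number. The sketch below walks all 100 buckets through the same cumulative-weight loop as assign(). variantForBucket and compareRollout are hypothetical helper names.

```javascript
// Same cumulative-weight walk as assign(), but taking the bucket directly.
function variantForBucket(bucket, weights) {
  let cum = 0;
  for (const [variant, weight] of Object.entries(weights)) {
    cum += weight;
    if (bucket < cum) return variant;
  }
  return Object.keys(weights)[0];
}

// Compare two weight configs bucket by bucket for one target variant.
function compareRollout(before, after, target) {
  const result = { promoted: 0, demoted: 0, unchanged: 0 };
  for (let bucket = 0; bucket < 100; bucket++) {
    const was = variantForBucket(bucket, before) === target;
    const is = variantForBucket(bucket, after) === target;
    if (!was && is) result.promoted++;
    else if (was && !is) result.demoted++;
    else result.unchanged++;
  }
  return result;
}

console.log(compareRollout({ off: 90, on: 10 }, { off: 50, on: 50 }, 'on'));
// → { promoted: 40, demoted: 0, unchanged: 60 }
```

With more than two variants, or if keys are reordered between releases, the bucket ranges shift and some users can change variant. Running a check like this before shipping new weights is a cheap way to see who would move.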

For most extensions a one-week release cadence is enough — the version-adoption curve from error tracking shows ~80% of users adopt within 48–72 hours, so a release-gated rollout reaches scale fast.

Persisting variant across service-worker restarts

The pattern above re-hashes on every cold start, which is fine — the hash is deterministic, so every restart produces the same variant for the same anonymous_id. But if you want to cache the assignment (e.g., to skip the hash entirely on hot paths, or to log the moment a user enters a variant for analytics purposes), persist it:

async function getVariants() {
  if (_variants) return _variants;
  const { variants_cache } = await chrome.storage.local.get('variants_cache');
  if (variants_cache?.flagSetHash === currentFlagSetHash()) {
    _variants = variants_cache.variants;
    return _variants;
  }
  // Cache miss or flag set changed — recompute.
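  const { anon_id } = await chrome.storage.local.get('anon_id');
  // Guard as in the uncached version: a missing ID would hash as "undefined:<flag>"
  // and put every user in the same bucket.
  if (!anon_id) throw new Error('anon_id missing: run init first');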
  const out = {};
  for (const [flag, weights] of Object.entries(FLAG_CONFIG)) {
    out[flag] = await assign(anon_id, flag, weights);
  }
  await chrome.storage.local.set({
    variants_cache: { flagSetHash: currentFlagSetHash(), variants: out },
  });
  _variants = out;
  return out;
}

function currentFlagSetHash() {
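  // Cheap fingerprint of FLAG_CONFIG (the serialized config, not a true hash).
  // It changes whenever a release edits flags or weights, which invalidates the cache.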
  return JSON.stringify(FLAG_CONFIG);
}

The storage write is async but cheap (~1ms). The deterministic hash means cache misses are safe: recomputing gives the same answer for the same user. Critical: the cache is keyed by the flag set hash, so changing weights in a release invalidates the cache and forces a recompute. Otherwise you'd ship a 50% rollout and existing users would be stuck on the old assignment until they cleared storage.

The service-worker-survival principles (sync listener registration, queue-on-storage pattern) are covered in the service worker survival guide — apply them to the variant cache the same way.

Measuring outcomes: per-variant SQL

Because every event carries properties._variants, the analysis is a single group-by:

-- Activation rate by variant, last 14 days
WITH variant_assignment AS (
  SELECT
    anonymous_id,
    MIN(properties->'_variants'->>'popup_cta_color') AS variant
  FROM events
  WHERE project_id = $1
    AND timestamp >= now() - interval '14 days'
    AND properties->'_variants'->>'popup_cta_color' IS NOT NULL
  GROUP BY 1
),
activations AS (
  SELECT DISTINCT anonymous_id
  FROM events
  WHERE project_id = $1
    AND event_name IN ('translation.run', 'clip.saved')   -- your activation set
    AND timestamp >= now() - interval '14 days'
)
SELECT
  va.variant,
  COUNT(DISTINCT va.anonymous_id) AS users,
  COUNT(DISTINCT a.anonymous_id) AS activated,
  ROUND(100.0 * COUNT(DISTINCT a.anonymous_id) / COUNT(DISTINCT va.anonymous_id), 1) AS activation_rate_pct
FROM variant_assignment va
LEFT JOIN activations a USING (anonymous_id)
GROUP BY 1
ORDER BY 1;

Two cautions on reading the result:

Sample size matters. A 2-point difference between variants with 200 users each is noise. Rough rule of thumb: at 50/50, you need ~3,000 users per arm to detect a 5-point lift on a 30% baseline metric, ~12,000 per arm to detect 2.5 points. If you don't have that yet, decide if you can wait — or pick bigger changes.
Pair with retention, not just activation. A change that lifts activation but tanks D7 retention is a loss. Run the same query against your retention bucket — the methodology is in the DAU/MAU guide.
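
For a rough sense of where sample-size figures come from, the textbook normal-approximation formula for comparing two proportions can be sketched in a few lines. The significance level (two-sided 5%) and power (80%) below are assumptions, not values taken from this guide, and usersPerArm is a hypothetical helper name.

```javascript
// Approximate users needed per arm to detect `lift` over `baseline` in a 50/50 test.
// Defaults assume two-sided alpha = 0.05 (z = 1.96) and 80% power (z = 0.8416).
function usersPerArm(baseline, lift, zAlpha = 1.96, zBeta = 0.8416) {
  const p1 = baseline;
  const p2 = baseline + lift;
  // Sum of the two Bernoulli variances, one per arm.
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (lift ** 2));
}

console.log(usersPerArm(0.30, 0.05));   // 5-point lift on a 30% baseline
console.log(usersPerArm(0.30, 0.025));  // 2.5-point lift on the same baseline
```

Under those assumptions the formula lands at roughly 1,400 users per arm for the 5-point case and roughly 5,400 for the 2.5-point case. The rule of thumb above is more conservative, which leaves headroom for higher power or noisier data. Either way, halving the detectable lift roughly quadruples the users required.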

Three things you should already be A/B testing

Onboarding step count. Three steps vs one step. Most extensions over-onboard; a 1-step variant frequently lifts activation by 15–25%. Easy to test, hard to regret.
Default settings. Default-on vs default-off for the feature you're unsure about. The default wins 80% of the time — A/B tests turn opinions into numbers. Pair with the install → activation funnel.
Permission prompts. When and how you ask for optional permissions, especially after the migration in the permission warning guide. Test "ask on first use" vs "ask on second use"; the latter usually wins because the user has context.

Resist the temptation to A/B test colors and copy as your first experiment. The lift is small, the sample-size requirements are high, and you'll burn out before you find a winner. Structural changes (steps, defaults, timing) move the needle.

FAQ

Can I fetch the flag config from my server?

Yes — JSON is data, not code, and MV3 allows it. Fetch a config like { popup_cta_color: { control: 90, blue: 10 } } from your server on extension start, fall back to a bundled default if the fetch fails, and rebuild the variant cache on every config change. The implementation still has to live in the bundle; only the rollout weights are remote.
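
A minimal sketch of that fetch-with-fallback flow is below, assuming the remote file has the same shape as the bundled config. loadFlagConfig, isValidWeights, the url argument and the injectable fetchImpl are all hypothetical. The sketch accepts remote weights only for flags and variants that already exist in the bundle, and it rebuilds each flag in the bundled key order so bucket ranges do not move when the server reorders keys.

```javascript
// Hypothetical check: weights must be non-negative integers summing to 100.
function isValidWeights(weights) {
  if (weights === null || typeof weights !== 'object' || Array.isArray(weights)) return false;
  const values = Object.values(weights);
  return values.length > 0 &&
    values.every((w) => Number.isInteger(w) && w >= 0) &&
    values.reduce((a, b) => a + b, 0) === 100;
}

// Hypothetical loader: remote weights override bundled ones only when they validate.
async function loadFlagConfig(url, bundledDefault, fetchImpl = fetch) {
  const merged = { ...bundledDefault };
  try {
    const res = await fetchImpl(url);
    if (!res.ok) return merged;
    const remote = await res.json();
    for (const flag of Object.keys(bundledDefault)) {
      const candidate = remote?.[flag];
      if (!isValidWeights(candidate)) continue;
      const bundledVariants = Object.keys(bundledDefault[flag]);
      const sameVariants =
        Object.keys(candidate).sort().join() === [...bundledVariants].sort().join();
      if (!sameVariants) continue;  // unknown variant: its code is not in the bundle
      const ordered = {};
      for (const variant of bundledVariants) ordered[variant] = candidate[variant];
      merged[flag] = ordered;       // bundled key order keeps bucket ranges stable
    }
  } catch (err) {
    // Network or parse failure: keep the bundled weights.
  }
  return merged;
}
```

The returned object would then take the place of FLAG_CONFIG as the input to assign(). If you persist a variants cache, its fingerprint should be computed from this loaded config rather than from the bundled constant.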

Won't carrying "_variants" on every event bloat the payload?

Marginally. The variants object is small (maybe 200 bytes with 5 flags). If you have 50+ flags or 100+ events/sec of volume, attach only the flags that are active in the current code path to each event. For most extensions the all-flags approach is fine and dramatically simpler to query.

How do I clean up old flags?

Two-stage. First: roll the winning variant to 100% and ship a release. Second: in the next release, remove the losing branch from the code and delete the flag from the config. Don't do both at once — if anything goes wrong with the rollout, you want the rollback path intact.

What about server-side or cohort-based variant assignment?

Possible: have your backend assign and return the variant on first contact, then cache it in storage. The trade-off is you now depend on the backend to enroll users — extensions that work offline (during initial setup, on flights) miss the enrollment moment. The hash-based pattern in this guide is offline-safe.

How does this interact with anonymous_id rotation?

Rotation breaks variant stability — the user's hash changes, so they re-roll into a (possibly new) variant. For most extensions the anonymous_id never rotates (see identity rules in the DAU/MAU guide). If yours does, log the rotation event so you can exclude rotated users from experiment analysis.
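
One consequence for the persisted variants_cache shown earlier: it is keyed only by the flag set, so after a rotation it would keep serving the variants computed for the old ID. A possible fix, sketched here as an assumption and not as part of the original pattern, is to fold the anon_id into the value stored in flagSetHash. cacheKey and isCacheFresh are hypothetical names.

```javascript
// Hypothetical cache key: changes when either the anon_id or the flag config changes.
function cacheKey(flagConfig, anonId) {
  return JSON.stringify([anonId, flagConfig]);
}

// True only if the stored cache was computed for this exact user and config.
// `cache` has the same shape as variants_cache: { flagSetHash, variants }.
function isCacheFresh(cache, flagConfig, anonId) {
  return Boolean(cache) && cache.flagSetHash === cacheKey(flagConfig, anonId);
}
```

Using this means reading anon_id before checking the cache. Both keys can be requested in a single chrome.storage.local.get(['anon_id', 'variants_cache']) call, so it does not add a second round trip.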

Can I run mutually-exclusive experiments?

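Yes. Pre-bucket users into mutex groups with a separate flag first (e.g., { group_alpha: 50, group_beta: 50 }), then run each experiment on only one group at a time. Because the flag name is part of the hash input, group membership is effectively independent of every other flag, so the groups should come out balanced on those flags in expectation, though not exactly.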
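
In code, the gate can be a thin wrapper over the variants object that getVariants() already returns. This is a sketch only: experimentVariant is a hypothetical helper, and the flag and group names passed to it are illustrative.

```javascript
// Hypothetical gate: a user takes part in an experiment only if they are in its mutex group.
// Returns the experiment's variant, or null when the user is excluded.
function experimentVariant(variants, experimentFlag, mutexFlag, requiredGroup) {
  if (variants[mutexFlag] !== requiredGroup) return null;  // excluded: use default behaviour
  return variants[experimentFlag] ?? null;
}

const variants = { mutex_group: 'group_alpha', popup_cta_color: 'blue' };
console.log(experimentVariant(variants, 'popup_cta_color', 'mutex_group', 'group_alpha')); // 'blue'
console.log(experimentVariant(variants, 'popup_cta_color', 'mutex_group', 'group_beta'));  // null
```

Users who get null should see the default experience and be filtered out of that experiment's analysis. Since the mutex flag is attached to every event alongside the others, the filter is just one more condition on the _variants property.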

What about errors per variant?

Join the variant property to error events from the error tracking guide. If variant B has 3× the error rate of A, that's a signal regardless of what the activation number says — never ship a variant whose error rate is higher than control.

A/B tests with the variant join done for you

Crxlytics auto-detects _variants properties on your events and renders the per-variant funnel, activation, error-rate and retention split — no separate experimentation table to maintain. Anonymous-by-default, MV3-policy-safe.
