Everything You Need to Know About Running A/B Tests That Actually Move Revenue
A/B testing sounds simple: show version A to half your visitors, version B to the other half, see which wins. In practice, most A/B tests fail — not because the ideas are bad, but because the process is broken.
This guide covers the complete A/B testing process from hypothesis to analysis, with the behavioral science and statistical rigor that separates real optimization from random changes.
What Is an A/B Test?
An A/B test (split test) is a controlled experiment where you show two versions of a page or element to different segments of visitors, then measure which version performs better against a predefined metric.
- Control (A): The current version
- Variation (B): The modified version with one or more changes
- Primary metric: The KPI you’re trying to improve (conversion rate, revenue per visitor, add-to-cart rate, etc.)
The A/B Testing Process (Step by Step)
Step 1: Research and Observation
Never start with “let’s test a different button color.” Start with understanding WHY visitors aren’t converting.
Research methods:
- Quantitative: GA4 funnel analysis, page performance data, device/source segmentation
- Qualitative: Heatmaps, session recordings, user surveys, customer interviews
- Heuristic analysis: Systematic UX evaluation against behavioral science principles
Note: Our AI audit engine automates the heuristic analysis step — scanning your pages against 40+ behavioral science tactics and delivering prioritized observations in minutes.
Output: A list of observations — specific, evidence-backed findings about what’s happening on your site.
Step 2: Generate Hypotheses
Every test needs a structured hypothesis:
“If we [specific change], then [metric] will [direction] because [behavioral science reason].”
Good hypothesis:
“If we add the number of 5-star reviews next to the product title on mobile, then add-to-cart rate will increase because social proof reduces purchase uncertainty for first-time visitors.”
Bad hypothesis:
“If we change the button color to green, conversion will increase.”
The difference? The good hypothesis is specific, measurable, and grounded in behavioral science. The bad one is a guess.
Step 3: Prioritize
You’ll generate more test ideas than you can run. Prioritize using a structured framework:
ICE Score (basic):
- Impact (1-10): How much will this move the metric?
- Confidence (1-10): How sure are you it will work?
- Ease (1-10): How easy is it to implement?
- ICE Score = Impact x Confidence x Ease
AXR Score (advanced — acceleroi’s framework):
Adds heuristic win rates, lever multipliers, and priority boosts to ICE for more accurate prediction. Our AI calculates AXR scores automatically.
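To make the scoring concrete, here's a minimal Python sketch that ranks a backlog by ICE score using the Impact x Confidence x Ease product above. The idea names and scores are placeholders; swap in your own backlog.

```python
# Rank a test backlog by ICE score (Impact x Confidence x Ease, each 1-10).
# Ideas and scores below are illustrative placeholders.
ideas = [
    {"name": "Show 5-star review count near the product title", "impact": 7, "confidence": 6, "ease": 8},
    {"name": "Add trust badges to checkout", "impact": 5, "confidence": 5, "ease": 9},
    {"name": "Rewrite hero headline around the outcome", "impact": 8, "confidence": 4, "ease": 7},
]

for idea in ideas:
    idea["ice"] = idea["impact"] * idea["confidence"] * idea["ease"]

# Highest score goes to the front of the testing queue
for idea in sorted(ideas, key=lambda i: i["ice"], reverse=True):
    print(f'{idea["ice"]:>4}  {idea["name"]}')
```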
Step 4: Design and Develop the Variation
Design guidelines:
- Change only what your hypothesis specifies — don’t redesign the entire page
- Keep the change meaningful enough to detect — a single word change on a low-traffic page won’t reach significance
- Design for both desktop and mobile (or target one device specifically)
- Consider above-the-fold vs below-the-fold visibility
Development guidelines:
- Use your A/B testing tool’s visual editor for simple changes
- Use custom code for structural changes
- Ensure the variation loads without flash of original content (FOOC)
- Test across all major browsers and devices
Step 5: Calculate Sample Size and Duration
Before launching, calculate how long the test needs to run:
Key inputs:
- Baseline conversion rate (your current CVR)
- Minimum detectable effect (MDE) — the smallest improvement worth detecting (typically 5-20% relative)
- Statistical significance level — typically 95% (alpha = 0.05)
- Statistical power — typically 80% (beta = 0.20)
- Daily traffic to the test page
Example calculation:
- Baseline CVR: 3.0%
- MDE: 10% relative (detecting a lift from 3.0% to 3.3%)
- Significance: 95%
- Power: 80%
- Required sample: ~53,000 visitors per variation
- At 2,000 daily visitors (1,000 per variation): ~53 days minimum
Note: Never stop a test early because it “looks like a winner.” Early results are unreliable. Run for the full calculated duration AND at least 2 full business cycles (typically 2+ weeks).
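If you prefer to script the calculation instead of relying on an online calculator, here's a sketch using the statsmodels library with the example inputs above. Exact numbers vary slightly between calculators depending on the test statistic and one- vs two-sided assumptions.

```python
# Sample size for a two-proportion test, mirroring the example above.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_cvr = 0.03                              # current conversion rate
mde_relative = 0.10                              # smallest lift worth detecting
target_cvr = baseline_cvr * (1 + mde_relative)   # 0.033

effect_size = proportion_effectsize(target_cvr, baseline_cvr)  # Cohen's h
n_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,              # 95% significance
    power=0.80,              # 80% power
    ratio=1.0,               # equal 50/50 split
    alternative="two-sided",
)

daily_visitors = 2_000                            # traffic to the test page
days = 2 * n_per_variation / daily_visitors       # two variations share the traffic
print(f"~{n_per_variation:,.0f} visitors per variation, ~{days:.0f} days")
```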
Step 6: Launch and Monitor
Pre-launch checklist:
- QA on Chrome, Safari, Firefox, Edge (desktop + mobile)
- Verify tracking fires correctly for the primary metric
- Confirm traffic split is working (50/50 or your chosen allocation)
- Set calendar reminder for minimum test duration
- Document the hypothesis, expected outcome, and success criteria
During the test:
- Monitor for technical issues (broken layouts, tracking errors, JavaScript errors)
- Do NOT make decisions based on early data
- Do NOT change anything on the test page
- Do NOT add more variations mid-test
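One monitoring check worth automating is a sample ratio mismatch (SRM) test: if the observed split drifts meaningfully from the allocation you configured, the bucketing or tracking is broken and the results can't be trusted. A minimal sketch using scipy, with illustrative visitor counts:

```python
# SRM check: does the observed traffic split match the intended 50/50 allocation?
# A very small p-value (< 0.001 by convention) signals a broken split.
from scipy.stats import chisquare

visitors_a = 10_090   # illustrative counts per variation
visitors_b = 9_910
total = visitors_a + visitors_b

result = chisquare([visitors_a, visitors_b], f_exp=[total / 2, total / 2])

if result.pvalue < 0.001:
    print(f"Possible SRM (p = {result.pvalue:.5f}): investigate before trusting results")
else:
    print(f"Split looks healthy (p = {result.pvalue:.3f})")
```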
Step 7: Analyze Results
When to call a test:
- Minimum sample size reached
- Minimum duration reached (2+ weeks)
- Statistical significance reached (95%+ for Frequentist, or strong posterior probability for Bayesian)
How to analyze:
Frequentist approach:
- p-value < 0.05 = statistically significant
- Look at the confidence interval — does it include zero?
- Check for segment-level effects (device, traffic source, visitor type)
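As a sketch of what that read-out looks like in code, here's a two-proportion z-test plus a confidence interval for the lift, using statsmodels; the conversion counts are illustrative, not from a real test.

```python
# Frequentist read-out: two-proportion z-test and a 95% CI for the difference.
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

conversions = [1_750, 1_590]    # variation B, control A (illustrative)
visitors    = [53_000, 53_000]

z_stat, p_value = proportions_ztest(conversions, visitors)
ci_low, ci_high = confint_proportions_2indep(
    conversions[0], visitors[0],   # variation B
    conversions[1], visitors[1],   # control A
)

print(f"p-value: {p_value:.4f}")                               # < 0.05 = statistically significant
print(f"95% CI for the lift: [{ci_low:.4f}, {ci_high:.4f}]")   # an interval containing 0 is inconclusive
```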
Bayesian approach (recommended):
- What’s the probability that B beats A?
- What’s the expected revenue impact?
- What’s the risk of implementing B? (expected loss)
Note: We use Bayesian analysis at acceleroi. It answers the question decision-makers actually care about: “What’s the probability this change will make us more money?” — rather than the Frequentist question: “How surprised should I be if there’s no difference?”
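For the Bayesian read-out, a simple Beta-Binomial simulation answers all three questions directly. This is a generic sketch with uniform priors and illustrative counts, not acceleroi's production model.

```python
# Bayesian read-out: simulate posterior conversion rates for A and B.
import numpy as np

rng = np.random.default_rng(42)
draws = 200_000

# Beta(1, 1) prior updated with conversions and non-conversions (illustrative counts)
post_a = rng.beta(1 + 1_590, 1 + 53_000 - 1_590, draws)   # control
post_b = rng.beta(1 + 1_750, 1 + 53_000 - 1_750, draws)   # variation

prob_b_beats_a = (post_b > post_a).mean()
expected_lift = (post_b - post_a).mean()
expected_loss = np.maximum(post_a - post_b, 0).mean()   # risk of shipping B if A is actually better

print(f"P(B beats A):       {prob_b_beats_a:.1%}")
print(f"Expected lift:      {expected_lift:+.4f} (absolute CVR)")
print(f"Expected loss of B: {expected_loss:.5f}")
```

Multiplying the expected lift by traffic and average order value turns it into the revenue estimate decision-makers actually care about.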
Step 8: Document and Share Learnings
Every test — winner or loser — is a data point. Document:
- What you tested and why
- The result (with confidence level)
- Revenue impact (actual or projected)
- What you learned about user behavior
- Implications for future tests
This builds your experimentation knowledge base and improves future hypothesis quality.
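One way to keep those learnings queryable instead of buried in slide decks is a structured record per test; the field names and values below are just one illustrative shape, not a standard schema.

```python
# Illustrative experiment record for a knowledge base; adapt fields to your stack.
test_record = {
    "name": "PDP mobile review-count badge",
    "hypothesis": "Showing the 5-star review count next to the product title on mobile "
                  "increases add-to-cart rate by reducing purchase uncertainty (social proof)",
    "primary_metric": "add_to_cart_rate",
    "result": "winner",
    "prob_b_beats_a": 0.97,                       # from the Bayesian analysis
    "observed_relative_lift": 0.10,
    "projected_annual_revenue_impact": 84_000,    # illustrative figure
    "learnings": "Social proof near the title outperforms badges below the fold on mobile",
    "follow_up_ideas": ["Test review count on collection pages"],
}
```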
Common A/B Testing Mistakes
| Mistake | Why It’s a Problem | How to Avoid It |
|---|---|---|
| Stopping tests early | False positives — you implement a change that doesn’t actually work | Calculate sample size upfront, commit to the full duration |
| Testing without a hypothesis | Random changes don’t build knowledge or compound results | Always start with research, observation, hypothesis |
| Testing too many things at once | Can’t attribute results to any specific change | One hypothesis per test. Use multivariate testing for multiple variables. |
| Ignoring sample size requirements | Underpowered tests produce unreliable results | Calculate required sample size before launch. If you don’t have enough traffic, test higher-impact changes. |
| Only looking at overall results | Missing segment-specific effects (mobile might win while desktop loses) | Always check device, traffic source, and visitor type segments |
| Measuring the wrong metric | Optimizing CTR might decrease revenue if click quality drops | Choose a primary metric tied to revenue. Monitor guardrail metrics. |
A/B Testing Tools Comparison (2026)
| Tool | Best For | Starting Price |
|---|---|---|
| VWO | Mid-market eCommerce, all-in-one platform | $350/mo |
| Optimizely | Enterprise, feature experimentation | Custom pricing |
| Convert | Privacy-focused, Shopify integration | $299/mo |
| AB Tasty | European market, personalization | Custom pricing |
| Google Optimize (sunset 2023) | No longer available; migrate to one of the alternatives above | N/A |
| Shopify native A/B | Shopify stores, basic tests | Included in Shopify |
How AI Is Changing A/B Testing
The biggest shift in A/B testing in 2026 is AI-powered hypothesis generation and prioritization:
- AI-generated test ideas — Computer vision analyzes your pages and suggests experiments based on behavioral science principles
- Predictive prioritization — AXR scoring predicts which tests are most likely to win based on historical data
- Automated analysis — AI interprets session recordings and heatmaps at scale to identify testing opportunities
- Faster learning — AI connects experiment outcomes to heuristic win rates, making future predictions more accurate
This doesn’t replace human creativity and strategic thinking — it supercharges the research and prioritization phases so more time is spent on high-impact work.
Frequently Asked Questions
How long should an A/B test run?
Minimum 2 weeks (to capture weekly patterns), or until you reach the calculated sample size — whichever is longer. Never stop early just because results “look significant.”
What’s a good A/B test win rate?
Industry benchmarks put the average at roughly 1 in 3 tests producing a statistically significant winner. With strong research and behavioral-science-grounded hypotheses you can push that rate higher, but losing tests are still valuable data.
How many tests should I run per month?
Depends on your traffic. High-traffic sites (100K+ monthly sessions) can run 4-6 concurrent tests. Lower-traffic sites should focus on 1-2 high-impact tests at a time with larger variations.
Can I A/B test with low traffic?
Yes, but you need to adjust your approach: test bigger changes (higher MDE), use Bayesian statistics, and supplement with qualitative research for directional validation.
Note: Get AI-generated A/B test ideas for your website. Our AI audit analyzes your pages against 40+ behavioral science heuristics and delivers prioritized experiment hypotheses — ready to test.