A/B Testing Sample Size: How Much Traffic You Need

A/B Testing Sample Size: How to Calculate It, Why It Matters, and What to Do When You Don’t Have Enough

~19,600 visitors needed per variation (2% baseline CVR, 20% relative MDE)

14 days minimum test duration, regardless of sample size

80% standard statistical power (1-beta)

2× the per-variation sample needed in total for a standard A/B test

Sample size is the make-or-break factor in A/B testing. Run a test with too few visitors and your results are unreliable. Require too many and you’ll never finish a test. This guide covers how to calculate sample size, how to choose an MDE, and what to do when your traffic falls short.

The Sample Size Formula

Sample size per variation depends on four inputs:

Baseline conversion rate (p) — Your current conversion rate
Minimum Detectable Effect (MDE) — The smallest improvement worth detecting
Significance level (alpha) — Typically 0.05 (95% confidence)
Statistical power (1-beta) — Typically 0.80 (80% power)

Simplified formula:

n = (16 x p x (1-p)) / (MDE)^2

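Where MDE is the absolute change in conversion rate (not relative). For example, a 20% relative lift on a 2% baseline is an absolute MDE of 0.4 percentage points (0.004). The constant 16 assumes alpha = 0.05 (two-sided) and 80% power.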
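The simplified formula can be sketched in Python as below. The function names are illustrative, not from any particular calculator. The second function is an assumption-labelled variant that derives the constant from alpha and power under the same normal approximation, which shows where the 16 comes from (it rounds roughly 15.7 up).

```python
import math
from statistics import NormalDist


def sample_size_simplified(baseline, mde_abs):
    """Per-variation sample size from n = 16 * p * (1 - p) / MDE^2."""
    return math.ceil(16 * baseline * (1 - baseline) / mde_abs ** 2)


def sample_size_normal(baseline, mde_abs, alpha=0.05, power=0.80):
    """Same approximation, with the constant derived from alpha and power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    constant = 2 * (z_alpha + z_beta) ** 2  # about 15.7 at the defaults
    return math.ceil(constant * baseline * (1 - baseline) / mde_abs ** 2)


# 2% baseline with a 20% relative MDE, i.e. 0.4 percentage points absolute
baseline = 0.02
mde_abs = baseline * 0.20
print(sample_size_simplified(baseline, mde_abs))
print(sample_size_normal(baseline, mde_abs))
```

Dedicated calculators often use slightly different variance terms, so expect their answers to differ from this sketch by a few percent.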

Quick Reference Table

Baseline CVR	MDE (relative)	MDE (absolute)	Sample per variation
1%	20%	0.2%	~39,600
2%	20%	0.4%	~19,600
2%	10%	0.2%	~78,400
3%	15%	0.45%	~23,000
3%	10%	0.3%	~51,700
5%	10%	0.5%	~30,400
5%	20%	1.0%	~7,600
10%	10%	1.0%	~14,400
10%	5%	0.5%	~57,600

Key relationships:

Halving the MDE means 4x the sample size (sample size scales with the inverse square of the MDE, so smaller effects need far more data)
At a fixed relative MDE, a higher baseline CVR means a smaller sample (the absolute difference to detect is larger)
More variations means more total sample needed (traffic is split more ways)
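The relationships above can be checked with a short sketch built on the simplified formula. The helper names and the example baselines are assumptions chosen for illustration.

```python
import math


def per_variation(baseline, relative_mde):
    """Simplified per-variation sample size; relative MDE is converted to absolute."""
    mde_abs = baseline * relative_mde
    return math.ceil(16 * baseline * (1 - baseline) / mde_abs ** 2)


def total_sample(baseline, relative_mde, variations=2):
    """Total visitors across all variations (the control counts as one)."""
    return per_variation(baseline, relative_mde) * variations


# Halving the MDE roughly quadruples the sample
for rel in (0.20, 0.10, 0.05):
    print(f"2% baseline, {rel:.0%} relative MDE: {per_variation(0.02, rel):,}")

# Higher baseline at the same relative MDE needs fewer visitors
for p in (0.01, 0.05, 0.10):
    print(f"{p:.0%} baseline, 10% relative MDE: {per_variation(p, 0.10):,}")

# An A/B/C test needs three times the per-variation sample in total
print(total_sample(0.02, 0.20, variations=3))
```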

Understanding MDE: The Most Important Decision

Note: MDE is the single most important input in sample size calculation. It’s not a statistical concept; it’s a business decision. You’re choosing: “What’s the smallest improvement I care about detecting?”

Small MDE (5%): requires a huge sample and catches small wins
Medium MDE (10-15%): a balanced approach, standard for most tests
Large MDE (20%+): requires a small sample but only catches big wins

How to choose your MDE:

For high-traffic pages (10K+ daily visitors): 10% relative MDE
For medium-traffic pages (2-10K daily): 15% relative MDE
For low-traffic pages (under 2K daily): 20%+ relative MDE, or don’t A/B test
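One way to sanity-check these tiers is to invert the simplified formula and ask what MDE a page can actually support. The sketch below assumes a 28-day window, a two-variation test, and a 2% baseline. All three are illustrative choices, not recommendations from this guide.

```python
import math


def detectable_relative_mde(baseline, daily_visitors, days=28, variations=2):
    """Smallest relative lift detectable, inverting n = 16 * p * (1 - p) / MDE^2."""
    n_per_variation = daily_visitors * days / variations
    mde_abs = math.sqrt(16 * baseline * (1 - baseline) / n_per_variation)
    return mde_abs / baseline


for daily in (1_000, 5_000, 20_000):
    mde = detectable_relative_mde(0.02, daily)
    print(f"{daily:,} visitors/day: about {mde:.0%} relative MDE in 28 days")
```

If the detectable MDE comes out larger than any lift you could plausibly produce, that is a sign the page is not a good candidate for A/B testing.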

What If You Don’t Have Enough Traffic?

If your calculated sample size requires months of testing, you have options:

1. Test Bigger Changes (Increase MDE)

Instead of testing a button color, test a completely different page layout or value proposition. Bigger changes tend to produce bigger effects, and bigger effects need fewer visitors to detect.

2. Test on Higher-Traffic Pages

Move your testing to the homepage, top product pages, or checkout, wherever traffic is highest.

3. Use a More Sensitive Metric

Instead of purchase conversion, test add-to-cart rate or click-through rate. Higher-frequency events require smaller samples.

4. Use Bayesian Methods

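Bayesian analysis reports the probability that one variation beats the other, rather than testing against a null hypothesis, which can make a result easier to act on at smaller sample sizes. It does not create information that isn’t in the data: with few visitors the estimate stays uncertain, so set your decision threshold (for example, a 95% probability of being better) before the test starts.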
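As a minimal sketch of the "probability of winning" calculation, the function below draws from Beta posteriors for each variation and counts how often B comes out ahead. It assumes uniform Beta(1, 1) priors and made-up conversion counts. Testing platforms may use different priors or decision rules.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 200 vs 260 conversions from 10,000 visitors each
print(prob_b_beats_a(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000))
```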

5. Don’t A/B Test (Use Other Methods)

With under 1,000 daily visitors:

User testing — Watch 5-10 people use your site
Qualitative research — Surveys, interviews, session recordings
Before/after analysis — Implement changes and compare time periods
Expert review — Heuristic analysis against best practices

Common Sample Size Mistakes

1. Using overall site traffic for calculations

You need traffic to THE SPECIFIC PAGE being tested, not total site traffic. If your site gets 50K visitors/month but the test page only gets 5K, use 5K.

2. Forgetting about traffic splitting

Sample size is PER VARIATION. An A/B test needs 2x the per-variation sample. An A/B/C test needs 3x.

3. Setting unrealistic MDE

Detecting a 3% relative lift on a 2% baseline requires ~870,000 visitors per variation. At 1,000 visitors per day to each variation, that’s about 2.4 years. Be realistic about what your traffic can support.

4. Not adjusting for multiple comparisons

If you’re testing multiple metrics (CVR, AOV, CTR), each comparison increases false positive risk. Adjust your significance threshold or designate one primary metric.
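The inflation is easy to quantify. The sketch below assumes the comparisons are independent, which real metrics rarely are, so treat the first number as a rough guide. The second function is the Bonferroni correction, one common way to adjust the threshold.

```python
def family_wise_error(alpha, comparisons):
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons


def bonferroni_alpha(alpha, comparisons):
    """Per-comparison threshold that keeps the family-wise rate at or below alpha."""
    return alpha / comparisons


for k in (1, 3, 5):
    print(k, round(family_wise_error(0.05, k), 3), bonferroni_alpha(0.05, k))
```

A stricter per-metric threshold also raises the sample size each metric needs, which is one reason designating a single primary metric is usually the simpler option.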

5. Ignoring the time dimension

Even if you reach the required sample in 5 days, you should still run for at least 14 days to capture weekly patterns.

Sample Size for Different Test Types

Test Type	Sample Multiplier	Notes
A/B test (2 variations)	2x per-variation sample	Standard, most efficient
A/B/C test (3 variations)	3x per-variation sample	Adjust significance for multiple comparisons
Multivariate test	Depends on combinations	Requires very high traffic; typically impractical
Bandit/MAB test	Similar total, but adaptive	Traffic shifts toward winners automatically

How to Calculate Sample Size for Revenue Metrics

Revenue per visitor (RPV) has higher variance than binary conversion rate, requiring larger samples:

Rule of thumb: RPV-based tests need 2–5x more sample than CVR-based tests because revenue distributions are skewed (a few large orders dominate).

Practical approach:

Calculate sample size for CVR
Multiply by 2–5x for RPV
Or use a bootstrap simulation with your actual revenue distribution
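The bootstrap option can be sketched as follows: resample historical per-visitor revenue into two arms, apply a hypothetical lift to one, and count how often a z-test detects it. The revenue history here is synthetic, and the lift is modelled as scaling every visitor's revenue by the same factor. Both are simplifying assumptions you would replace with your own data and lift model.

```python
import math
import random


def _mean_and_var(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def bootstrap_power(revenues, n_per_variation, relative_lift, sims=100, seed=0):
    """Share of simulated tests where a two-sided z-test (alpha=0.05) detects the lift."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(sims):
        control = rng.choices(revenues, k=n_per_variation)
        # Assumption: the lift scales every visitor's revenue by the same factor.
        variant = [r * (1 + relative_lift)
                   for r in rng.choices(revenues, k=n_per_variation)]
        mean_c, var_c = _mean_and_var(control)
        mean_v, var_v = _mean_and_var(variant)
        se = math.sqrt((var_c + var_v) / n_per_variation)
        if se > 0 and abs(mean_v - mean_c) / se > 1.96:
            detected += 1
    return detected / sims


# Synthetic history: mostly zero-revenue visits plus a few large orders
history = [0.0] * 970 + [50.0] * 25 + [400.0] * 5
print(bootstrap_power(history, n_per_variation=2_000, relative_lift=0.5))
```

To size a test, increase `n_per_variation` until the estimated power reaches your target (typically 0.80), using more simulations than the default for a stable estimate.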

Common Sample Size Mistakes (& How to Fix Them)

Mistake	Consequence	Fix
Using overall site traffic instead of page traffic	Overestimates power; test never reaches significance	Use traffic to THE SPECIFIC PAGE being tested
Forgetting traffic split	Assumes all traffic goes to one variation	Multiply the per-variation sample by the number of variations (A/B = 2x, A/B/C = 3x)
Setting too-small MDE	Need 6–12 months of testing for marginal effects	Be realistic about smallest effect worth detecting (10–20% for most tests)
Not adjusting for multiple metrics	False positive rate inflates (1 metric = 5%, 5 independent metrics = ~23%)	Designate one primary metric OR use multiple-comparisons correction
Ignoring seasonality in duration	Week 1 results vary wildly from week 3	Minimum 14 days regardless of sample; capture weekly patterns
Using conversions instead of visitors	Fundamental formula error	Always use VISITORS in sample size formula, not conversion counts

Sample Size Calculators & Tools (2026)

Tool	Type	Cost	Best For
Optimizely Sample Size Calculator	Web-based calculator	Free (requires Optimizely account)	Quick estimates, integrates with platform
Evan Miller’s Calculator	Web-based calculator	Free	Clean, simple interface; fast estimates
VWO Sample Size Calculator	Web-based calculator	Free	Good for Frequentist; includes power curve
R (pwr package)	Statistical software	Free (open-source)	Precise calculations, full control
Statsig Sample Size Tool	Platform-native	Free (inside Statsig)	Bayesian approach, continuous monitoring

For most teams: use Evan Miller’s calculator for quick math, then verify with your platform’s built-in tool.

Test Duration Planning by Traffic Level

Use this to plan how long a test needs to run before you can expect a valid result. Durations assume a two-variation test, all page traffic split 50/50, and the simplified formula above:

Monthly Visitors to Test Page	Baseline CVR	MDE (relative)	Expected Test Duration
3,000/mo (100/day)	3%	20%	~260 days
6,000/mo (200/day)	3%	20%	~130 days
15,000/mo (500/day)	3%	15%	~92 days
30,000/mo (1,000/day)	2%	15%	~70 days
60,000/mo (2,000/day)	2%	10%	~78 days
150,000/mo (5,000/day)	2%	10%	~31 days
300,000/mo (10,000/day)	2%	5%	~63 days

Rule of thumb: If your required test duration exceeds 60 days, either test a bigger change (larger MDE) or move to a higher-traffic page. Long tests are subject to seasonal effects that contaminate results.
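A duration estimate for your own numbers can be sketched like this. It assumes the simplified formula, an even traffic split, and that every visitor to the page enters the test. The function names are illustrative.

```python
import math


def plan_duration_days(baseline, relative_mde, daily_visitors,
                       variations=2, min_days=14):
    """Days to reach the simplified-formula sample, never fewer than min_days."""
    mde_abs = baseline * relative_mde
    per_variation = 16 * baseline * (1 - baseline) / mde_abs ** 2
    days_for_sample = math.ceil(per_variation * variations / daily_visitors)
    return max(min_days, days_for_sample)


def needs_rethink(days, limit=60):
    """Flag plans that exceed the 60-day rule of thumb."""
    return days > limit


for daily in (1_000, 5_000, 20_000):
    days = plan_duration_days(0.02, 0.10, daily)
    print(daily, days, "rethink" if needs_rethink(days) else "ok")
```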

Traffic Minimums by Test Type

Test Type	Minimum Monthly Traffic	Notes
Homepage hero CTA	20,000+/mo	Large but not all traffic converts from here
Checkout flow	3,000+/mo	High-intent traffic, high event frequency
Product page (top product)	5,000+/mo	Use your highest-traffic product first
Category/collection page	10,000+/mo	Medium frequency events
Pricing page (SaaS)	2,000+/mo	High intent, smaller sample viable
Cart page	3,000+/mo	Bottleneck stage, high value
Email signup form	10,000+/mo page views	Low baseline rate needs more traffic

Frequently Asked Questions

How many visitors do I need to start A/B testing?

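Minimum: roughly 1,000 weekly visitors to the specific page being tested, targeting a 20% relative MDE. By the simplified formula, that supports test durations of about 30 days only when the metric’s baseline rate is high (around 15%, as with a click-through or add-to-cart metric); lower baselines take proportionally longer. Ideal: 5,000+ weekly visitors for a 10% MDE. This level also allows you to run 2–3 concurrent tests on different pages without sample contamination concerns.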

Can I use conversions instead of visitors for sample size?

The sample size formula uses visitors (sessions), not conversions. The conversion rate determines how many visitors you need to accumulate a given number of conversions. You need enough conversions to detect a difference — but the actual calculation is always based on the visitor count, not the conversion count.

What if my test reaches significance before the planned sample size?

Don’t stop early in frequentist testing. Run to the pre-calculated sample size and minimum duration, because early significance often reverses. A test that shows 95% confidence on day 5 may fall to 60% confidence by day 14. Checking results repeatedly and stopping at the first significant reading (the peeking problem) inflates the false positive rate well above the nominal 5%, and early extreme results tend to regress toward the mean. This is one of the most common sources of false positives in CRO programs.
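The effect of stopping early can be illustrated with simulated A/A tests, where both arms share the same true rate and every "significant" result is a false positive. The traffic level, conversion rate, and daily-check schedule below are assumptions for illustration only.

```python
import math
import random


def is_significant(conv_a, conv_b, n, z_crit=1.96):
    """Two-sided pooled z-test for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return se > 0 and abs(conv_b - conv_a) / n / se > z_crit


def false_positive_rates(rate=0.05, daily_per_arm=200, days=14, sims=300, seed=0):
    """Simulate A/A tests; return (stop-at-first-significance rate, fixed-horizon rate)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        ever_significant = False
        for day in range(1, days + 1):
            conv_a += sum(rng.random() < rate for _ in range(daily_per_arm))
            conv_b += sum(rng.random() < rate for _ in range(daily_per_arm))
            significant = is_significant(conv_a, conv_b, day * daily_per_arm)
            ever_significant = ever_significant or significant
        peeking_hits += ever_significant  # would have stopped on some day
        final_hits += significant  # result at the planned end of the test
    return peeking_hits / sims, final_hits / sims


peeking, fixed = false_positive_rates()
print(f"stop at first significance: {peeking:.1%}, fixed horizon: {fixed:.1%}")
```

The first rate can never be lower than the second, since the final day is one of the checks. The gap between them is the cost of peeking.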

Should I use one-tailed or two-tailed tests?

Two-tailed is the standard and almost always the right choice. It detects both improvements AND degradations — a variation that hurts conversion is as important to know as one that helps. One-tailed tests require about 20% less sample and are only appropriate when you’ve already established that the variant cannot possibly perform worse than control (which is rarely the case).
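The "about 20% less" figure can be reproduced from the normal approximation, as in this sketch at the standard alpha = 0.05 and 80% power (the function name is illustrative):

```python
from statistics import NormalDist


def sample_ratio_one_vs_two_tailed(alpha=0.05, power=0.80):
    """Ratio of one-tailed to two-tailed sample size under the normal approximation."""
    z = NormalDist().inv_cdf
    one_tailed = (z(1 - alpha) + z(power)) ** 2
    two_tailed = (z(1 - alpha / 2) + z(power)) ** 2
    return one_tailed / two_tailed


saving = 1 - sample_ratio_one_vs_two_tailed()
print(f"One-tailed needs about {saving:.0%} less sample")
```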

How do I handle multiple concurrent tests?

Running multiple tests simultaneously creates three risks: (1) traffic overlap — if the same visitors see both tests, their behavior may be affected by both variants; (2) interaction effects — two changes may interact with each other in unpredictable ways; (3) false significance — more tests means higher aggregate false positive rate. Best practice: run concurrent tests on separate, non-overlapping pages. On the same page, run tests sequentially, not concurrently.

Related Reading to Master Test Planning

Frequentist vs Bayesian — Why Bayesian reduces sample size requirements
Segmentation in A/B Testing — How to budget for segment-level analysis
False Positives in A/B Testing — Why longer test durations reduce false positives
CRO Roadmap Template — How to plan a quarter of tests with realistic sample sizes

