
A/B Testing Sample Size: How Much Traffic You Need

By Denys Pankov · February 3, 2026 · 10 min read

A/B Testing Sample Size: How to Calculate It, Why It Matters, and What to Do When You Don’t Have Enough

Traffic needed for a standard A/B test:
~19,600: visitors needed per variation (2% CVR, 20% relative MDE, using the simplified formula below)
14 days: minimum test duration regardless of sample
80%: standard statistical power (1-beta)

Sample size is the make-or-break factor in A/B testing. Run a test with too few visitors and you cannot tell a real effect from noise. Require too many and you’ll never finish a test. This guide covers how to calculate sample size, how to choose the inputs, and what to do when your traffic falls short.


The Sample Size Formula

Sample size per variation depends on four inputs:

  1. Baseline conversion rate (p) — Your current conversion rate
  2. Minimum Detectable Effect (MDE) — The smallest improvement worth detecting
  3. Significance level (alpha) — Typically 0.05 (95% confidence)
  4. Statistical power (1-beta) — Typically 0.80 (80% power)

Simplified formula:

n = (16 x p x (1-p)) / (MDE)^2

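Where n is the number of visitors per variation and MDE is the absolute change in conversion rate (not relative). For example, a 20% relative lift on a 2% baseline is an absolute MDE of 0.4 percentage points, or 0.004.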
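The source of the constant 16 is not spelled out here. A common way to arrive at it, assuming a two-sided test at alpha = 0.05, 80% power, and roughly equal variance in both arms, is the standard two-proportion approximation:

```latex
n \approx \frac{2\,(z_{1-\alpha/2} + z_{1-\beta})^2 \; p(1-p)}{\mathrm{MDE}^2}
  = \frac{2\,(1.96 + 0.84)^2 \; p(1-p)}{\mathrm{MDE}^2}
  \approx \frac{15.7 \; p(1-p)}{\mathrm{MDE}^2}
  \approx \frac{16 \; p(1-p)}{\mathrm{MDE}^2}

% Worked example: p = 0.02, MDE = 0.004 (20% relative lift)
n \approx \frac{16 \times 0.02 \times 0.98}{0.004^2}
  = \frac{0.3136}{0.000016}
  = 19{,}600
```

Different alpha or power settings change the constant, so treat 16 as a planning shortcut and not an exact value.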


Quick Reference Table

Baseline CVR | MDE (relative) | MDE (absolute) | Sample per variation
1% | 20% | 0.2% | ~39,600
2% | 20% | 0.4% | ~19,600
2% | 10% | 0.2% | ~78,400
3% | 15% | 0.45% | ~23,000
3% | 10% | 0.3% | ~51,700
5% | 10% | 0.5% | ~30,400
5% | 20% | 1.0% | ~7,600
10% | 10% | 1.0% | ~14,400
10% | 5% | 0.5% | ~57,600
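For baseline and MDE pairs not listed, the simplified formula is easy to evaluate directly. The sketch below is one way to do that; the function name `sample_size_per_variation` is illustrative, and it assumes the MDE is given as a relative lift and converted to an absolute change first.

```python
def sample_size_per_variation(baseline_cvr: float, relative_mde: float) -> int:
    """Visitors per variation from n = 16 * p * (1 - p) / MDE_abs^2."""
    if not 0 < baseline_cvr < 1:
        raise ValueError("baseline_cvr must be between 0 and 1")
    if relative_mde <= 0:
        raise ValueError("relative_mde must be positive")
    # The formula expects the absolute change, so convert the relative lift.
    absolute_mde = baseline_cvr * relative_mde
    n = 16 * baseline_cvr * (1 - baseline_cvr) / absolute_mde ** 2
    return round(n)


if __name__ == "__main__":
    for p, rel in [(0.02, 0.20), (0.02, 0.10), (0.05, 0.20), (0.10, 0.10)]:
        n = sample_size_per_variation(p, rel)
        print(f"{p:.0%} baseline, {rel:.0%} relative MDE: {n:,} per variation")
```

This is the rule-of-thumb version only; a dedicated calculator that takes alpha and power as inputs will give slightly different numbers.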

Key relationships:

  • Halving MDE means 4x the sample size (sample size grows with the inverse square of MDE, so smaller effects get expensive quickly)
  • Higher baseline CVR means smaller sample needed at the same relative MDE (more “signal” in the data)
  • More variations means more total sample needed (traffic is split more ways)
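The first two relationships can be read off the simplified formula once it is rewritten in terms of relative MDE. As a sketch, write r for the relative MDE (a symbol introduced here for illustration, so that MDE = r x p):

```latex
n = \frac{16\,p(1-p)}{(r\,p)^2} = \frac{16\,(1-p)}{p\,r^2}

% Halving r quadruples n:
\frac{n(r/2)}{n(r)} = \frac{r^2}{(r/2)^2} = 4

% At r = 0.20, a higher baseline needs fewer visitors:
p = 0.02:\; n = \frac{16 \times 0.98}{0.02 \times 0.04} = 19{,}600
\qquad
p = 0.10:\; n = \frac{16 \times 0.90}{0.10 \times 0.04} = 3{,}600
```

Because (1-p)/p shrinks as p grows, the same relative lift is cheaper to detect on a higher baseline.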

Understanding MDE: The Most Important Decision

Note: MDE is the single most important input in sample size calculation. The statistics don’t choose it for you; it’s a business decision. You’re choosing: “What’s the smallest improvement I care about detecting?”

  • Small MDE (5%): requires a huge sample and catches small wins
  • Medium MDE (10-15%): a balanced approach, standard for most tests
  • Large MDE (20%+): requires a small sample but only catches big wins

How to choose your MDE:

  • For high-traffic pages (10K+ daily visitors): 10% relative MDE
  • For medium-traffic pages (2-10K daily): 15% relative MDE
  • For low-traffic pages (under 2K daily): 20%+ relative MDE, or don’t A/B test

What If You Don’t Have Enough Traffic?

If your calculated sample size requires months of testing, you have options:

1. Test Bigger Changes (Increase MDE)

Instead of testing a button color, test a completely different page layout or value proposition. Bigger changes tend to produce bigger effects, which lets you plan around a larger MDE and fewer visitors.

2. Test on Higher-Traffic Pages

Move your testing to the homepage, top product pages, or checkout, wherever traffic is highest.

3. Use a More Sensitive Metric

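Instead of purchase conversion, test add-to-cart rate or click-through rate. Higher-frequency events require smaller samples at the same relative MDE. For example, at a 20% relative MDE the simplified formula gives about 3,600 visitors per variation for a 10% add-to-cart rate, versus about 19,600 for a 2% purchase rate.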

4. Use Bayesian Methods

Bayesian analysis estimates the probability that one variation beats the other, rather than testing against a null hypothesis. That framing can support an actionable decision with fewer samples, provided you accept more uncertainty about the size of the effect.
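As a rough illustration of the "probability of winning" calculation, the sketch below uses a Monte Carlo estimate with uniform Beta(1, 1) priors. The priors, the function name `prob_b_beats_a`, and the example counts are all assumptions made for illustration, not the method of any particular testing platform.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    # Hypothetical counts: 100 of 5,000 visitors convert on A, 125 of 5,000 on B.
    print(f"P(B beats A) = {prob_b_beats_a(100, 5000, 125, 5000):.3f}")
```

What probability counts as "actionable" is a judgment call you should set before the test starts.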

5. Don’t A/B Test (Use Other Methods)

With under 1,000 daily visitors:

  • User testing — Watch 5-10 people use your site
  • Qualitative research — Surveys, interviews, session recordings
  • Before/after analysis — Implement changes and compare time periods
  • Expert review — Heuristic analysis against best practices

Common Sample Size Mistakes

1. Using overall site traffic for calculations

You need traffic to THE SPECIFIC PAGE being tested, not total site traffic. If your site gets 50K visitors/month but the test page only gets 5K, use 5K.

2. Forgetting about traffic splitting

Sample size is PER VARIATION. An A/B test needs 2x the per-variation sample. An A/B/C test needs 3x.

3. Setting unrealistic MDE

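Detecting a 3% relative lift on a 2% baseline requires ~870,000 visitors per variation. At 1,000 visitors/day split across two variations, that’s about 4.8 years (2.4 years even if every visitor went to a single variation). Be realistic about what your traffic can support.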
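The arithmetic behind that figure, using the simplified formula and assuming a two-variation test with an even split:

```latex
\mathrm{MDE} = 0.02 \times 0.03 = 0.0006

n \approx \frac{16 \times 0.02 \times 0.98}{0.0006^2}
  = \frac{0.3136}{0.00000036}
  \approx 871{,}000 \text{ per variation}

% Time to fill one variation at 1,000 visitors/day:
\frac{871{,}000}{1{,}000} \approx 871 \text{ days} \approx 2.4 \text{ years}

% Time to fill both variations at 1,000 visitors/day in total:
\frac{2 \times 871{,}000}{1{,}000} \approx 1{,}742 \text{ days} \approx 4.8 \text{ years}
```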

4. Not adjusting for multiple comparisons

If you’re testing multiple metrics (CVR, AOV, CTR), each comparison increases false positive risk. Adjust your significance threshold or designate one primary metric.
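To see how quickly the risk grows, the sketch below computes the chance of at least one false positive across several metrics, along with a Bonferroni-adjusted threshold. It assumes the metrics are independent, which real metrics such as CVR and AOV usually are not, so read the output as a rough guide. The function names are illustrative.

```python
def familywise_error_rate(alpha: float, num_metrics: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** num_metrics


def bonferroni_alpha(alpha: float, num_metrics: int) -> float:
    """Per-metric significance threshold under a Bonferroni correction."""
    return alpha / num_metrics


if __name__ == "__main__":
    for k in (1, 3, 5):
        fwer = familywise_error_rate(0.05, k)
        adjusted = bonferroni_alpha(0.05, k)
        print(f"{k} metric(s): false positive risk {fwer:.1%}, "
              f"Bonferroni threshold {adjusted:.4f}")
```

Bonferroni is the simplest correction and is conservative. Designating a single primary metric avoids the adjustment altogether.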

5. Ignoring the time dimension

Even if you reach the required sample in 5 days, you should still run for at least 14 days to capture weekly patterns.
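One way to combine the sample requirement with the 14-day floor is sketched below. Rounding up to whole weeks is an assumption added here so that every weekday is represented equally. The function name and parameters are illustrative.

```python
import math


def planned_duration_days(per_variation_sample, num_variations,
                          daily_visitors, min_days=14):
    """Days to run: enough for the total sample, in whole weeks, never under min_days."""
    total_sample = per_variation_sample * num_variations
    days_for_sample = math.ceil(total_sample / daily_visitors)
    # Assumption: round up to complete weeks to cover full weekly cycles.
    full_weeks = math.ceil(days_for_sample / 7) * 7
    return max(min_days, full_weeks)


if __name__ == "__main__":
    # 19,600 per variation, A/B test, 8,000 visitors/day: sample is reached in 5 days.
    print(planned_duration_days(19_600, 2, 8_000))
    # Same test at 1,000 visitors/day.
    print(planned_duration_days(19_600, 2, 1_000))
```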


Sample Size for Different Test Types

Test Type | Sample Multiplier | Notes
A/B test (2 variations) | 2x per-variation sample | Standard, most efficient
A/B/C test (3 variations) | 3x per-variation sample | Adjust significance for multiple comparisons
Multivariate test | Depends on combinations | Requires very high traffic; typically impractical
Bandit/MAB test | Similar total, but adaptive | Traffic shifts toward winners automatically

How to Calculate Sample Size for Revenue Metrics

Revenue per visitor (RPV) has higher variance than a binary conversion rate, so it requires larger samples.

Rule of thumb: RPV-based tests need 2–5x more sample than CVR-based tests because revenue distributions are skewed (a few large orders dominate).

Practical approach:

  1. Calculate sample size for CVR
  2. Multiply by 2–5x for RPV, in line with the rule of thumb
  3. Or use a bootstrap simulation with your actual revenue distribution
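Short of a full bootstrap, a quicker check is the variance-based analogue of the simplified formula, n = 16 x variance / (absolute MDE)^2, which reduces to the conversion-rate version when the metric is 0/1. The sketch below is that swapped-in technique, not the bootstrap itself. It runs on synthetic, hypothetical revenue data, so replace `synthetic_revenue()` with your own per-visitor revenue.

```python
import math
import random
import statistics


def sample_size_for_mean(values, relative_mde):
    """Visitors per variation from n = 16 * variance / (absolute MDE)^2."""
    mean = statistics.fmean(values)
    if mean <= 0:
        raise ValueError("mean of values must be positive")
    variance = statistics.pvariance(values)
    absolute_mde = mean * relative_mde
    return math.ceil(16 * variance / absolute_mde ** 2)


def synthetic_revenue(num_visitors=20_000, cvr=0.03, seed=7):
    """Hypothetical data: most visitors spend 0, buyers spend a skewed amount."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(4.0, 1.0) if rng.random() < cvr else 0.0
        for _ in range(num_visitors)
    ]


revenue = synthetic_revenue()
converted = [1.0 if r > 0 else 0.0 for r in revenue]

n_cvr = sample_size_for_mean(converted, 0.20)  # 0/1 metric: variance is p(1-p)
n_rpv = sample_size_for_mean(revenue, 0.20)    # skewed revenue inflates variance
print(f"CVR-based: {n_cvr:,}  RPV-based: {n_rpv:,}  ratio: {n_rpv / n_cvr:.1f}x")
```

The ratio depends entirely on how skewed your order values are, which is why real revenue data beats a fixed multiplier.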

Common Sample Size Mistakes (& How to Fix Them)

Mistake | Consequence | Fix
Using overall site traffic instead of page traffic | Overestimates power; test never reaches significance | Use traffic to THE SPECIFIC PAGE being tested
Forgetting traffic split | Assumes all traffic goes to one variation, so total traffic needed is underestimated | Multiply the per-variation sample by the number of variations (A/B = 2x, A/B/C = 3x)
Setting too-small MDE | Need 6–12 months of testing for marginal effects | Be realistic about smallest effect worth detecting (10–20% for most tests)
Not adjusting for multiple metrics | False positive rate inflates (1 metric = 5%, 5 metrics = ~23%) | Designate one primary metric OR use multiple-comparisons correction
Ignoring seasonality in duration | Week 1 results vary wildly from week 3 | Minimum 14 days regardless of sample; capture weekly patterns
Using conversions instead of visitors | Fundamental formula error | Always use VISITORS in sample size formula, not conversion counts

Sample Size Calculators & Tools (2026)

Tool | Type | Cost | Best For
Optimizely Sample Size Calculator | Web-based calculator | Free (requires Optimizely account) | Quick estimates, integrates with platform
Evan Miller’s Calculator | Web-based calculator | Free | Clean, simple interface; fast estimates
VWO Sample Size Calculator | Web-based calculator | Free | Good for Frequentist; includes power curve
R (pwr package) | Statistical software | Free (open-source) | Precise calculations, full control
Statsig Sample Size Tool | Platform-native | Free (inside Statsig) | Bayesian approach, continuous monitoring

For most teams: use Evan Miller’s calculator for quick math, then verify with your platform’s built-in tool.


Test Duration Planning by Traffic Level

Use this to plan how long to run a test before you can expect a valid result. Durations assume a two-variation test with a 50/50 split and the simplified formula n = 16 x p x (1-p) / (MDE)^2:

Monthly Visitors to Test Page | Baseline CVR | MDE (relative) | Expected Test Duration
3,000/mo (100/day) | 3% | 20% | ~259 days
6,000/mo (200/day) | 3% | 20% | ~129 days
15,000/mo (500/day) | 3% | 15% | ~92 days
30,000/mo (1,000/day) | 2% | 15% | ~70 days
60,000/mo (2,000/day) | 2% | 10% | ~78 days
150,000/mo (5,000/day) | 2% | 10% | ~31 days
300,000/mo (10,000/day) | 2% | 5% | ~63 days

Rule of thumb: If your required test duration exceeds 60 days, either test a bigger change (larger MDE) or move to a higher-traffic page. Long tests are subject to seasonal effects that contaminate results.


Traffic Minimums by Test Type

Test Type | Minimum Monthly Traffic | Notes
Homepage hero CTA | 20,000+/mo | Large but not all traffic converts from here
Checkout flow | 3,000+/mo | High-intent traffic, high event frequency
Product page (top product) | 5,000+/mo | Use your highest-traffic product first
Category/collection page | 10,000+/mo | Medium frequency events
Pricing page (SaaS) | 2,000+/mo | High intent, smaller sample viable
Cart page | 3,000+/mo | Bottleneck stage, high value
Email signup form | 10,000+/mo page views | Low baseline rate needs more traffic

Frequently Asked Questions

How many visitors do I need to start A/B testing?

Minimum: 1,000 weekly visitors to the specific page being tested, targeting a 20% MDE. Under the simplified formula, this gives roughly 30-day test durations only when the baseline conversion rate is high (around 15%); at a 2–3% baseline, expect six months or more. Ideal: 5,000+ weekly visitors for a 10% MDE. This level allows you to run 2–3 concurrent tests on different pages without sample contamination concerns.

Can I use conversions instead of visitors for sample size?

The sample size formula uses visitors (sessions), not conversions. The conversion rate determines how many visitors you need to accumulate a given number of conversions. You need enough conversions to detect a difference — but the actual calculation is always based on the visitor count, not the conversion count.

What if my test reaches significance before the planned sample size?

Don’t stop early in frequentist testing. Run to the pre-calculated sample size and minimum duration, because early significance often reverses. A test that shows 95% confidence on day 5 may fall to 60% confidence by day 14. Early estimates are noisy and tend to regress toward the mean, and checking repeatedly and stopping at the first significant reading (the “peeking” problem) inflates the false positive rate. This is one of the most common sources of false positives in CRO programs.
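A small simulation makes the risk concrete. The sketch below runs A/A tests (both arms share the same true rate, so every "significant" result is a false positive) and compares stopping at the first significant look against evaluating only at the end. The number of looks, the traffic per look, the 10% rate, and the function names are arbitrary values chosen for illustration.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for two arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b / n - conv_a / n) / se


def simulate_aa_tests(num_tests=300, looks=10, visitors_per_look=100,
                      rate=0.10, seed=1):
    """Return (false positive rate with peeking, false positive rate at final look)."""
    rng = random.Random(seed)
    peeking_hits = 0
    final_hits = 0
    for _ in range(num_tests):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        z = 0.0
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            z = z_stat(conv_a, conv_b, n)
            if abs(z) > 1.96:  # two-sided 5% threshold
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += abs(z) > 1.96
    return peeking_hits / num_tests, final_hits / num_tests


if __name__ == "__main__":
    peeking, final_only = simulate_aa_tests()
    print(f"False positives with peeking: {peeking:.1%}")
    print(f"False positives at final look only: {final_only:.1%}")
```

The peeking rate can never be lower than the final-look rate, since a test that is significant at the end also counts as significant at some look. In practice it is usually several times higher.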

Should I use one-tailed or two-tailed tests?

Two-tailed is the standard and almost always the right choice. It detects both improvements AND degradations — a variation that hurts conversion is as important to know as one that helps. One-tailed tests require about 20% less sample and are only appropriate when you’ve already established that the variant cannot possibly perform worse than control (which is rarely the case).
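The "about 20% less" figure can be checked from the normal quantiles, since required sample size scales with the square of the summed z-values. The sketch below assumes alpha = 0.05 and 80% power. `tail_sample_ratio` is an illustrative name.

```python
from statistics import NormalDist


def tail_sample_ratio(alpha=0.05, power=0.80):
    """Ratio of one-tailed to two-tailed sample size at the same alpha and power."""
    z = NormalDist().inv_cdf
    one_tailed = (z(1 - alpha) + z(power)) ** 2
    two_tailed = (z(1 - alpha / 2) + z(power)) ** 2
    return one_tailed / two_tailed


if __name__ == "__main__":
    ratio = tail_sample_ratio()
    print(f"One-tailed needs {1 - ratio:.0%} less sample than two-tailed")
```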

How do I handle multiple concurrent tests?

Running multiple tests simultaneously creates three risks: (1) traffic overlap — if the same visitors see both tests, their behavior may be affected by both variants; (2) interaction effects — two changes may interact with each other in unpredictable ways; (3) false significance — more tests means higher aggregate false positive rate. Best practice: run concurrent tests on separate, non-overlapping pages. On the same page, run tests sequentially, not concurrently.



Note: Don’t waste test slots on underpowered tests. Our AI audit helps you identify the highest-impact test opportunities — so every test slot is used on ideas with sufficient traffic AND high predicted impact.
