A/B Testing Sample Size: How to Calculate It, Why It Matters, and What to Do When You Don’t Have Enough
Sample size is the make-or-break factor in A/B testing. Run a test with too few visitors and you cannot tell a real effect from noise. Require too many and you’ll never finish a test. This guide covers how to calculate sample size, how to choose the inputs, and what to do when your traffic falls short.
The Sample Size Formula
Sample size per variation depends on four inputs:
- Baseline conversion rate (p) — Your current conversion rate
- Minimum Detectable Effect (MDE) — The smallest improvement worth detecting
- Significance level (alpha) — Typically 0.05 (95% confidence)
- Statistical power (1-beta) — Typically 0.80 (80% power)
Simplified formula:
n = (16 x p x (1-p)) / (MDE)^2
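Here n is the number of visitors per variation, and MDE is the absolute change in conversion rate, not the relative one: a 10% relative lift on a 5% baseline is an absolute MDE of 0.5 percentage points (0.005). The constant 16 assumes the typical settings above (alpha = 0.05 two-tailed, 80% power).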
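As a rough sketch, the simplified formula can be compared with the fuller normal-approximation formula it abbreviates. The function names below are illustrative and not taken from any library. The constant 16 is approximately 2 x (1.96 + 0.84)^2, which is about 15.7.

```python
from math import ceil
from statistics import NormalDist


def sample_size_simple(baseline, mde_abs):
    """Rule-of-thumb visitors per variation: 16 * p * (1 - p) / MDE^2."""
    return ceil(16 * baseline * (1 - baseline) / mde_abs ** 2)


def sample_size_normal(baseline, mde_abs, alpha=0.05, power=0.80):
    """Two-tailed, two-proportion normal approximation (per variation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = baseline, baseline + mde_abs
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / mde_abs ** 2)


baseline = 0.05            # 5% baseline conversion rate
mde_abs = baseline * 0.10  # 10% relative lift = 0.5 points absolute
print(sample_size_simple(baseline, mde_abs))  # about 30,400
print(sample_size_normal(baseline, mde_abs))  # about 31,200
```

The two results land within a few percent of each other here, which is why the shortcut is usually good enough for planning.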
Quick Reference Table
| Baseline CVR | MDE (relative) | MDE (absolute) | Sample per variation |
|---|---|---|---|
| 1% | 20% | 0.2% | ~39,600 |
| 2% | 20% | 0.4% | ~19,600 |
| 2% | 10% | 0.2% | ~78,400 |
| 3% | 15% | 0.45% | ~23,000 |
| 3% | 10% | 0.3% | ~51,700 |
| 5% | 10% | 0.5% | ~30,400 |
| 5% | 20% | 1.0% | ~7,600 |
| 10% | 10% | 1.0% | ~14,400 |
| 10% | 5% | 0.5% | ~57,600 |
Key relationships:
- Halving MDE means 4x the sample size (sample size grows with the inverse square of MDE, so small effects get expensive quickly)
- Higher baseline CVR means smaller sample needed at the same relative MDE (more “signal” in the data)
- More variations means more total sample needed (traffic is split more ways)
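These relationships follow directly from the simplified formula. A minimal sketch, where the `sample_size` helper and the example rates are only for illustration:

```python
def sample_size(baseline, mde_rel):
    """Rule of thumb per variation, with MDE given as a relative lift."""
    mde_abs = baseline * mde_rel
    return 16 * baseline * (1 - baseline) / mde_abs ** 2


base = sample_size(0.02, 0.20)

mde_halved = sample_size(0.02, 0.10) / base        # relative MDE 20% -> 10%
baseline_doubled = sample_size(0.04, 0.20) / base  # baseline 2% -> 4%
abc_vs_ab = (3 * base) / (2 * base)                # total traffic, A/B/C vs A/B

print(round(mde_halved, 2), round(baseline_doubled, 2), abc_vs_ab)
```

With these inputs the ratios come out at roughly 4, 0.49, and 1.5: halving MDE quadruples the sample, doubling the baseline roughly halves it, and a third variation adds 50% to the total.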
Understanding MDE: The Most Important Decision
MDE is the single most important input in the sample size calculation, and choosing it is a business decision rather than a statistical one. You’re choosing: “What’s the smallest improvement I care about detecting?”
- Small MDE (5%): requires a huge sample, catches small wins
- Medium MDE (10-15%): a balanced approach, standard for most tests
- Large MDE (20%+): requires a small sample, but only catches big wins
How to choose your MDE:
- For high-traffic pages (10K+ daily visitors): 10% relative MDE
- For medium-traffic pages (2-10K daily): 15% relative MDE
- For low-traffic pages (under 2K daily): 20%+ relative MDE, or don’t A/B test
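One way to sanity-check these tiers is to invert the simplified formula and ask what relative MDE a page can detect in a fixed test window. The sketch below assumes a 3% baseline, a 28-day window, and two variations. All three are illustrative choices, and the function name is hypothetical.

```python
from math import sqrt


def detectable_relative_mde(baseline, daily_visitors, days=28, variations=2):
    """Smallest relative MDE reachable with the traffic available."""
    n_per_variation = daily_visitors * days / variations
    mde_abs = sqrt(16 * baseline * (1 - baseline) / n_per_variation)
    return mde_abs / baseline


for daily in (10_000, 5_000, 1_000):
    mde = detectable_relative_mde(baseline=0.03, daily_visitors=daily)
    print(f"{daily:>6} visitors/day -> about {mde:.0%} relative MDE")
```

Under these assumptions the three traffic levels give roughly 6%, 9%, and 19%, broadly in line with the tiers above. A lower baseline rate pushes every figure up.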
What If You Don’t Have Enough Traffic?
If your calculated sample size requires months of testing, you have options:
1. Test Bigger Changes (Increase MDE)
Instead of testing a button color, test a completely different page layout or value proposition. Bigger changes tend to produce bigger effects, and bigger effects can be detected with fewer visitors.
2. Test on Higher-Traffic Pages
Move your testing to homepage, top product pages, or checkout — wherever traffic is highest.
3. Use a More Sensitive Metric
Instead of purchase conversion, test add-to-cart rate or click-through rate. Higher-frequency events have a higher baseline rate, so the same relative MDE requires a smaller sample.
4. Use Bayesian Methods
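Bayesian analysis calculates the probability of one variation winning, rather than testing against a null hypothesis. That framing can support a decision earlier when the cost of a wrong call is low, but it does not create information: with little data, the probability estimate is still uncertain.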
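A minimal sketch of the “probability of winning” calculation, assuming uniform Beta(1, 1) priors and a simple Monte Carlo estimate. The visitor and conversion counts are made up for illustration, and real tools may use different priors or decision rules.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm: Beta(1 + conversions, 1 + non-conversions)
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 2,000 visitors per arm, 100 vs 125 conversions
print(prob_b_beats_a(conv_a=100, n_a=2_000, conv_b=125, n_b=2_000))
```

With these counts the estimate should land somewhere around 95%. Whether that is enough to act on depends on how costly a wrong decision would be.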
5. Don’t A/B Test (Use Other Methods)
With under 1,000 daily visitors:
- User testing — Watch 5-10 people use your site
- Qualitative research — Surveys, interviews, session recordings
- Before/after analysis — Implement changes and compare time periods
- Expert review — Heuristic analysis against best practices
Common Sample Size Mistakes
1. Using overall site traffic for calculations
You need traffic to THE SPECIFIC PAGE being tested, not total site traffic. If your site gets 50K visitors/month but the test page only gets 5K, use 5K.
2. Forgetting about traffic splitting
Sample size is PER VARIATION. An A/B test needs 2x the per-variation sample. An A/B/C test needs 3x.
3. Setting unrealistic MDE
Detecting a 3% relative lift on a 2% baseline requires ~870,000 visitors per variation. At 1,000 visitors per day to each variation, that’s 2.4 years. Be realistic about what your traffic can support.
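The arithmetic behind that estimate can be sketched as follows. The helper name is hypothetical, and the traffic is assumed to be split evenly across two variations. The 2.4-year figure corresponds to 1,000 visitors per day reaching each variation, which is 2,000 per day to the page. If the page only gets 1,000 per day in total, the test takes twice as long.

```python
from math import ceil


def days_to_finish(n_per_variation, daily_page_visitors, variations=2):
    """Days until every variation has its sample, with traffic split evenly."""
    return ceil(n_per_variation * variations / daily_page_visitors)


baseline, mde_rel = 0.02, 0.03
n = ceil(16 * baseline * (1 - baseline) / (baseline * mde_rel) ** 2)

for daily in (2_000, 1_000):
    days = days_to_finish(n, daily)
    print(f"{daily} visitors/day -> {days} days (~{days / 365:.1f} years)")
```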
4. Not adjusting for multiple comparisons
If you’re testing multiple metrics (CVR, AOV, CTR), each comparison increases false positive risk. Adjust your significance threshold or designate one primary metric.
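One common way to adjust the threshold is a Bonferroni correction, which divides alpha by the number of comparisons. The sketch below shows how that feeds back into sample size. It applies the conversion-rate formula to every comparison for simplicity, and the helper name and inputs are illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size(baseline, mde_abs, alpha=0.05, power=0.80):
    """Per-variation sample for a two-tailed test at the given alpha."""
    z = NormalDist().inv_cdf
    z_total = z(1 - alpha / 2) + z(power)
    return ceil(2 * z_total ** 2 * baseline * (1 - baseline) / mde_abs ** 2)


comparisons = 3                      # e.g. three metrics all used for the decision
alpha_adjusted = 0.05 / comparisons  # Bonferroni-adjusted threshold

n_single = sample_size(0.05, 0.005)
n_adjusted = sample_size(0.05, 0.005, alpha=alpha_adjusted)
print(n_single, n_adjusted, round(n_adjusted / n_single, 2))
```

With three comparisons the required sample grows by roughly a third, which is one reason designating a single primary metric is often the cheaper option.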
5. Ignoring the time dimension
Even if you reach the required sample in 5 days, you should still run for at least 14 days to capture weekly patterns.
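A small sketch of how the duration rule can be combined with the sample size requirement. Rounding up to whole weeks is an assumption added here so that every weekday is represented equally. The function name and traffic figures are illustrative.

```python
from math import ceil


def planned_duration_days(n_per_variation, daily_visitors, variations=2, min_days=14):
    """Test length: enough days for the sample, in full weeks, never under min_days."""
    days_for_sample = ceil(n_per_variation * variations / daily_visitors)
    full_weeks = ceil(days_for_sample / 7) * 7
    return max(min_days, full_weeks)


# Sample reached in 6 days, but the 14-day floor applies
print(planned_duration_days(7_600, daily_visitors=3_000))
# Sample needs 21 days of traffic, which is already three full weeks
print(planned_duration_days(30_400, daily_visitors=3_000))
```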
Sample Size for Different Test Types
| Test Type | Sample Multiplier | Notes |
|---|---|---|
| A/B test (2 variations) | 2x per-variation sample | Standard, most efficient |
| A/B/C test (3 variations) | 3x per-variation sample | Adjust significance for multiple comparisons |
| Multivariate test | Depends on combinations | Requires very high traffic; typically impractical |
| Bandit/MAB test | Similar total, but adaptive | Traffic shifts toward winners automatically |
How to Calculate Sample Size for Revenue Metrics
Revenue per visitor (RPV) has higher variance than binary conversion rate, so it requires larger samples.
Rule of thumb: RPV-based tests need 2-5x more sample than CVR-based tests because revenue distributions are skewed (a few large orders dominate).
Practical approach:
- Calculate sample size for CVR
- Multiply by 2-5x for RPV, leaning toward the high end when a few large orders dominate
- Or use a bootstrap simulation with your actual revenue distribution
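If you have per-visitor revenue data, a third option is to plug its variance straight into the continuous-metric version of the rule of thumb, n = 16 x variance / (absolute MDE)^2. This is a direct variance estimate, not a bootstrap. The sketch below uses synthetic revenue (3% of visitors buy, with log-normal order values) purely as a stand-in for real data.

```python
import random
from statistics import fmean, pvariance


def sample_size_continuous(values, mde_rel):
    """Rule of thumb n = 16 * variance / (absolute MDE)^2, per variation."""
    mean = fmean(values)
    return 16 * pvariance(values, mean) / (mean * mde_rel) ** 2


# Synthetic stand-in for real data: most visitors spend nothing, a few spend a lot
rng = random.Random(42)
revenue = [rng.lognormvariate(4, 1) if rng.random() < 0.03 else 0.0
           for _ in range(50_000)]
converted = [1.0 if r > 0 else 0.0 for r in revenue]

n_cvr = sample_size_continuous(converted, mde_rel=0.10)
n_rpv = sample_size_continuous(revenue, mde_rel=0.10)
print(round(n_cvr), round(n_rpv), round(n_rpv / n_cvr, 1))
```

For a 0/1 metric this reduces to the conversion-rate formula, so the printed ratio is the RPV multiplier implied by the data. The more skewed the order values, the larger it gets.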
Frequently Asked Questions
How many visitors do I need to start A/B testing?
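Minimum: 1,000 weekly visitors to the page being tested, targeting a 20% MDE. Ideal: 5,000+ weekly visitors for a 10% MDE. Both figures assume a high baseline rate (around 10%), where the simplified formula gives about 3,600 and 14,400 visitors per variation respectively. Even then, expect a test to run six to seven weeks. Lower baseline rates need more traffic or a different research method.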
Can I use conversions instead of visitors for sample size?
The calculation is expressed in visitors (total sample), but the conversion rate determines how many visitors you need: at the same relative MDE, a higher baseline rate means more conversions per visitor and therefore more statistical power.
What if my test reaches significance before the planned sample size?
Don’t stop early (in frequentist testing). Run to the pre-calculated sample size and minimum duration. Every extra look at the data is another chance for a false positive, and early significance often reverses.
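The effect of peeking can be illustrated with a small A/A simulation in which both arms have the same true rate, so every “significant” result is a false positive. The number of runs, the ten interim looks, and the 10% conversion rate are arbitrary choices for this sketch.

```python
import random
from math import sqrt


def z_stat(conv_a, conv_b, n):
    """Pooled two-proportion z statistic for equal-sized arms."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_b - conv_a) / n / se


def simulate(runs=400, n=2_000, looks=10, rate=0.10, seed=1):
    """Return (false positive rate when peeking, rate at the planned sample)."""
    rng = random.Random(seed)
    step = n // looks
    peeking_hits = final_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        crossed = False
        for look in range(1, looks + 1):
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            significant = abs(z_stat(conv_a, conv_b, look * step)) > 1.96
            crossed = crossed or significant
        peeking_hits += crossed      # any look crossed the threshold
        final_hits += significant    # only the final, planned look counts
    return peeking_hits / runs, final_hits / runs


peeking_rate, fixed_rate = simulate()
print(f"stop at first significance: {peeking_rate:.1%}, fixed horizon: {fixed_rate:.1%}")
```

The fixed-horizon rate should sit near the nominal 5%, while stopping at the first significant look typically produces a false positive rate several times higher.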
Should I use one-tailed or two-tailed tests?
Two-tailed is standard and recommended. It detects both improvements AND degradations. One-tailed tests require ~20% less sample but miss harmful effects.
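The ~20% figure can be checked from the z-values alone, since sample size is proportional to (z_alpha + z_beta)^2. A brief sketch at alpha = 0.05 and 80% power, with an illustrative helper name:

```python
from statistics import NormalDist


def z_factor(alpha, power, two_tailed=True):
    """The (z_alpha + z_beta)^2 term that sample size is proportional to."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    return (z_alpha + z(power)) ** 2


two = z_factor(0.05, 0.80, two_tailed=True)
one = z_factor(0.05, 0.80, two_tailed=False)
print(f"one-tailed needs {1 - one / two:.0%} less sample")
```

At these settings the saving works out to about 21%.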