A/B Testing Sample Size: How to Calculate It, Why It Matters, and What to Do When You Don’t Have Enough
Sample size is the make-or-break factor in A/B testing. Run a test with too few visitors and your results are unreliable. Require too many and you’ll never finish a test. This guide covers the formula, how to choose a minimum detectable effect, what to do when traffic is low, and the mistakes that derail most tests.
The Sample Size Formula
Sample size per variation depends on four inputs:
- Baseline conversion rate (p) — Your current conversion rate
- Minimum Detectable Effect (MDE) — The smallest improvement worth detecting
- Significance level (alpha) — Typically 0.05 (95% confidence)
- Statistical power (1-beta) — Typically 0.80 (80% power)
Simplified formula:
n = (16 x p x (1-p)) / (MDE)^2
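Where MDE is the absolute change in conversion rate (not relative), expressed as a fraction: a 10% relative lift on a 5% baseline is an absolute MDE of 0.005. The constant 16 approximates 2 x (1.96 + 0.84)^2 ≈ 15.7, which corresponds to alpha = 0.05 (two-sided) and 80% power; other settings need a different constant.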
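As a minimal sketch, the simplified formula can be wrapped in a small Python helper. The function names `sample_size_per_variation` and `relative_to_absolute` are illustrative and not taken from any library, and the constant 16 bakes in alpha = 0.05 (two-sided) and 80% power.

```python
import math

def sample_size_per_variation(baseline: float, mde_abs: float) -> int:
    """Visitors needed per variation using n = 16 * p * (1 - p) / MDE^2.

    baseline: current conversion rate as a fraction (0.05 for 5%).
    mde_abs:  absolute minimum detectable effect as a fraction (0.01 for 1 point).
    """
    if not 0 < baseline < 1:
        raise ValueError("baseline must be between 0 and 1")
    if mde_abs <= 0:
        raise ValueError("mde_abs must be positive")
    return math.ceil(16 * baseline * (1 - baseline) / mde_abs ** 2)

def relative_to_absolute(baseline: float, mde_rel: float) -> float:
    """Convert a relative MDE (0.20 for 20%) into an absolute one."""
    return baseline * mde_rel

# Example: 5% baseline, 20% relative MDE (1.0 point absolute).
mde = relative_to_absolute(0.05, 0.20)
print(sample_size_per_variation(0.05, mde))  # roughly 7,600 per variation
```

Keeping the relative-to-absolute conversion as its own step makes it harder to plug a relative percentage into the formula by accident, which understates the required sample dramatically.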
Quick Reference Table
| Baseline CVR | MDE (relative) | MDE (absolute) | Sample per variation |
|---|---|---|---|
| 1% | 20% | 0.2% | ~39,600 |
| 2% | 20% | 0.4% | ~19,600 |
| 2% | 10% | 0.2% | ~78,400 |
| 3% | 15% | 0.45% | ~23,000 |
| 3% | 10% | 0.3% | ~51,700 |
| 5% | 10% | 0.5% | ~30,400 |
| 5% | 20% | 1.0% | ~7,600 |
| 10% | 10% | 1.0% | ~14,400 |
| 10% | 5% | 0.5% | ~57,600 |
Key relationships:
- Halving MDE means 4x the sample size (sample size scales with 1/MDE^2, so smaller effects need quadratically more data)
- Higher baseline CVR means smaller sample needed at the same relative MDE (more “signal” in the data)
- More variations means more total sample needed (traffic is split more ways)
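The three relationships above can be checked numerically. This is a small sketch built on the simplified formula; `n_per_variation` and `total_sample` are hypothetical helper names used only for illustration.

```python
def n_per_variation(p: float, mde_abs: float) -> float:
    """Simplified per-variation sample size: 16 * p * (1 - p) / MDE^2."""
    return 16 * p * (1 - p) / mde_abs ** 2

def total_sample(p: float, mde_abs: float, variations: int) -> float:
    """Total visitors across all variations."""
    return variations * n_per_variation(p, mde_abs)

# 1. Halving the MDE quadruples the sample (n scales with 1 / MDE^2).
ratio_halved_mde = n_per_variation(0.05, 0.005) / n_per_variation(0.05, 0.010)

# 2. At the same 10% relative MDE, a higher baseline needs fewer visitors.
n_low_baseline = n_per_variation(0.02, 0.02 * 0.10)
n_high_baseline = n_per_variation(0.10, 0.10 * 0.10)

# 3. Going from two to three variations raises total traffic by 1.5x.
ratio_three_vs_two = total_sample(0.05, 0.01, 3) / total_sample(0.05, 0.01, 2)

print(round(ratio_halved_mde, 2), n_low_baseline > n_high_baseline, ratio_three_vs_two)
```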
Understanding MDE: The Most Important Decision
Note: MDE is the single most important input in sample size calculation. It’s less a statistical concept than a business decision. You’re choosing: “What’s the smallest improvement I care about detecting?”
- Small MDE (5% relative): requires a huge sample but catches small wins
- Medium MDE (10-15% relative): a balanced approach, standard for most tests
- Large MDE (20%+ relative): requires a small sample but only catches big wins
How to choose your MDE:
- For high-traffic pages (10K+ daily visitors): 10% relative MDE
- For medium-traffic pages (2-10K daily): 15% relative MDE
- For low-traffic pages (under 2K daily): 20%+ relative MDE, or don’t A/B test
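One way to sanity-check an MDE choice is to invert the simplified formula and ask what your traffic can actually detect. The sketch below does that; the function name, the 28-day default window, and the 3% baseline in the example are all assumptions for illustration, not recommendations from a specific tool.

```python
import math

def detectable_relative_mde(baseline: float, daily_visitors: int,
                            days: int = 28, variations: int = 2) -> float:
    """Smallest relative MDE detectable in `days`, found by inverting
    n = 16 * p * (1 - p) / MDE^2 for the per-variation sample available."""
    n_per_variation = daily_visitors * days / variations
    mde_abs = math.sqrt(16 * baseline * (1 - baseline) / n_per_variation)
    return mde_abs / baseline

# Hypothetical page with a 3% baseline at three traffic levels.
for daily in (1_000, 5_000, 20_000):
    mde = detectable_relative_mde(baseline=0.03, daily_visitors=daily)
    print(f"{daily:>6} visitors/day -> about {mde:.0%} relative MDE")
```

If the detectable MDE comes out larger than any lift you could plausibly achieve, that is a signal to test a bolder change or pick a different page.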
What If You Don’t Have Enough Traffic?
If your calculated sample size requires months of testing, you have options:
1. Test Bigger Changes (Increase MDE)
Instead of testing a button color, test a completely different page layout or value proposition. Bigger changes tend to produce bigger effects, and bigger effects need fewer visitors to detect.
2. Test on Higher-Traffic Pages
Move your testing to homepage, top product pages, or checkout — wherever traffic is highest.
3. Use a More Sensitive Metric
Instead of purchase conversion, test add-to-cart rate or click-through rate. Higher-frequency events require smaller samples at the same relative MDE, though a lift in a proxy metric does not guarantee a lift in purchases.
4. Use Bayesian Methods
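Bayesian analysis can sometimes reach actionable conclusions with fewer samples by calculating the probability that one variation beats the other, rather than testing against a null hypothesis. It changes the decision rule, not the amount of information in the data, so very small samples still produce wide uncertainty.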
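To make "the probability of one variation winning" concrete, here is a minimal Monte Carlo sketch using a Beta-Binomial model. The function name `prob_b_beats_a`, the flat Beta(1, 1) priors, and the visitor counts are all assumptions chosen for illustration; commercial platforms may use different priors and decision rules.

```python
import random

def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors (an assumption; pick priors that suit your data)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each arm is Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical counts: 2,000 visitors per arm, 60 vs 78 conversions.
p_win = prob_b_beats_a(60, 2_000, 78, 2_000)
print(f"P(B beats A) is about {p_win:.2f}")
```

What probability counts as "actionable" is a business threshold you set in advance, much like choosing an MDE.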
5. Don’t A/B Test (Use Other Methods)
With under 1,000 daily visitors to the test page, consider these alternatives:
- User testing — Watch 5-10 people use your site
- Qualitative research — Surveys, interviews, session recordings
- Before/after analysis — Implement changes and compare time periods
- Expert review — Heuristic analysis against best practices
Common Sample Size Mistakes
1. Using overall site traffic for calculations
You need traffic to THE SPECIFIC PAGE being tested, not total site traffic. If your site gets 50K visitors/month but the test page only gets 5K, use 5K.
2. Forgetting about traffic splitting
Sample size is PER VARIATION. An A/B test needs 2x the per-variation sample. An A/B/C test needs 3x.
3. Setting unrealistic MDE
Detecting a 3% relative lift on a 2% baseline requires ~870,000 visitors per variation. At 1,000 visitors/day to each variation, that’s 2.4 years; with 1,000/day split across two variations, it’s closer to 4.8 years. Be realistic about what your traffic can support.
4. Not adjusting for multiple comparisons
If you’re testing multiple metrics (CVR, AOV, CTR), each comparison increases false positive risk. Adjust your significance threshold or designate one primary metric.
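The inflation is easy to quantify. The sketch below assumes the metrics are independent (real metrics such as CVR and AOV are usually correlated, so treat the result as an upper-bound style estimate) and shows a Bonferroni adjustment as one simple correction; both function names are illustrative.

```python
def family_wise_error_rate(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive across independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison threshold that keeps the overall rate at or below alpha."""
    return alpha / comparisons

for metrics in (1, 3, 5):
    fwer = family_wise_error_rate(0.05, metrics)
    adjusted = bonferroni_alpha(0.05, metrics)
    print(f"{metrics} metric(s): overall risk {fwer:.1%}, adjusted alpha {adjusted:.4f}")
```

Bonferroni is conservative, and a stricter alpha raises the required sample, which is one reason designating a single primary metric is often the more practical choice.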
5. Ignoring the time dimension
Even if you reach the required sample in 5 days, you should still run for at least 14 days to capture weekly patterns.
Sample Size for Different Test Types
| Test Type | Sample Multiplier | Notes |
|---|---|---|
| A/B test (2 variations) | 2x per-variation sample | Standard, most efficient |
| A/B/C test (3 variations) | 3x per-variation sample | Adjust significance for multiple comparisons |
| Multivariate test | Depends on combinations | Requires very high traffic; typically impractical |
| Bandit/MAB test | Similar total, but adaptive | Traffic shifts toward winners automatically |
How to Calculate Sample Size for Revenue Metrics
Revenue per visitor (RPV) has higher variance than a binary conversion rate, so it requires larger samples.
Rule of thumb: RPV-based tests need 2–5x more sample than CVR-based tests because revenue distributions are skewed (a few large orders dominate).
Practical approach:
- Calculate sample size for CVR
- Multiply by 2–5x for RPV, per the rule of thumb above
- Or use a bootstrap simulation with your actual revenue distribution
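If you have per-visitor revenue data, a quicker alternative to a full bootstrap is to plug the observed variance into the same style of formula, n = 16 x variance / delta^2. Note that this is a normal-approximation shortcut, not the bootstrap simulation mentioned above, and heavy-tailed revenue can make it optimistic. The function name and the sample data are hypothetical.

```python
import math
import statistics

def sample_size_continuous(values: list[float], mde_rel: float) -> int:
    """Per-variation sample for a mean metric such as RPV, using
    n = 16 * variance / delta^2 (alpha = 0.05 two-sided, 80% power).

    values:  per-visitor revenue, including zeros for non-buyers.
    mde_rel: relative lift in the mean worth detecting (0.10 for 10%).
    """
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values)
    delta = mean * mde_rel
    return math.ceil(16 * variance / delta ** 2)

# Hypothetical data: 1,000 visitors, 3% buy, one order is much larger.
revenue = [0.0] * 970 + [40.0] * 29 + [400.0]
converted = [1.0 if r > 0 else 0.0 for r in revenue]

n_rpv = sample_size_continuous(revenue, 0.10)
n_cvr = sample_size_continuous(converted, 0.10)  # same as 16 * p * (1 - p) / MDE^2
print(n_rpv, n_cvr, round(n_rpv / n_cvr, 2))
```

In this made-up dataset a single large order is enough to push the RPV requirement to a bit over 2.5x the CVR requirement, which is the skew effect the rule of thumb describes.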
Common Sample Size Mistakes (& How to Fix Them)
| Mistake | Consequence | Fix |
|---|---|---|
| Using overall site traffic instead of page traffic | Overestimates available traffic; test runs far longer than planned | Use traffic to THE SPECIFIC PAGE being tested |
| Forgetting traffic split | Assumes all traffic goes to one variation | Multiply the per-variation sample by the number of variations (A/B = 2x, A/B/C = 3x) |
| Setting too-small MDE | Need 6–12 months of testing for marginal effects | Be realistic about smallest effect worth detecting (10–20% for most tests) |
| Not adjusting for multiple metrics | False positive rate inflates (1 metric = 5%, 5 independent metrics = ~23%) | Designate one primary metric OR use multiple-comparisons correction |
| Ignoring seasonality in duration | Week 1 results vary wildly from week 3 | Minimum 14 days regardless of sample; capture weekly patterns |
| Using conversions instead of visitors | Fundamental formula error | Always use VISITORS in sample size formula, not conversion counts |
Sample Size Calculators & Tools (2026)
| Tool | Type | Cost | Best For |
|---|---|---|---|
| Optimizely Sample Size Calculator | Web-based calculator | Free (requires Optimizely account) | Quick estimates, integrates with platform |
| Evan Miller’s Calculator | Web-based calculator | Free | Clean, simple interface; fast estimates |
| VWO Sample Size Calculator | Web-based calculator | Free | Good for Frequentist; includes power curve |
| R (pwr package) | Statistical software | Free (open-source) | Precise calculations, full control |
| Statsig Sample Size Tool | Platform-native | Free (inside Statsig) | Bayesian approach, continuous monitoring |
For most teams: use Evan Miller’s calculator for quick math, then verify with your platform’s built-in tool.
Test Duration Planning by Traffic Level
Use this to plan how long a test must run before you can expect a valid result. Durations are computed from the simplified formula above and assume a two-variation test with all page traffic enrolled:
| Monthly Visitors to Test Page | Baseline CVR | MDE (relative) | Expected Test Duration |
|---|---|---|---|
| 3,000/mo (100/day) | 3% | 20% | ~260 days |
| 6,000/mo (200/day) | 3% | 20% | ~130 days |
| 15,000/mo (500/day) | 3% | 15% | ~92 days |
| 30,000/mo (1,000/day) | 2% | 15% | ~70 days |
| 60,000/mo (2,000/day) | 2% | 10% | ~78 days |
| 150,000/mo (5,000/day) | 2% | 10% | ~31 days |
| 300,000/mo (10,000/day) | 2% | 5% | ~63 days |
Rule of thumb: If your required test duration exceeds 60 days, either test a bigger change (larger MDE) or move to a higher-traffic page. Long tests are subject to seasonal effects that contaminate results.
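Duration planning is just the total sample divided by daily traffic, with the 14-day minimum applied as a floor. The sketch below assumes every visitor to the page enters the test and traffic is split evenly; the function names are illustrative.

```python
import math

def planned_duration_days(baseline: float, mde_rel: float, daily_visitors: int,
                          variations: int = 2, min_days: int = 14) -> int:
    """Days needed to collect the full sample, never less than `min_days`."""
    mde_abs = baseline * mde_rel
    per_variation = 16 * baseline * (1 - baseline) / mde_abs ** 2
    days = math.ceil(variations * per_variation / daily_visitors)
    return max(days, min_days)

def needs_rethink(days: int, limit: int = 60) -> bool:
    """Flag plans that exceed the 60-day rule of thumb."""
    return days > limit

# Hypothetical page: 5% baseline, 20% relative MDE, 1,000 visitors/day.
days = planned_duration_days(baseline=0.05, mde_rel=0.20, daily_visitors=1_000)
print(days, needs_rethink(days))
```

If only a fraction of visitors is enrolled in the test, scale `daily_visitors` down accordingly before calling the function.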
Traffic Minimums by Test Type
| Test Type | Minimum Monthly Traffic | Notes |
|---|---|---|
| Homepage hero CTA | 20,000+/mo | Large but not all traffic converts from here |
| Checkout flow | 3,000+/mo | High-intent traffic, high event frequency |
| Product page (top product) | 5,000+/mo | Use your highest-traffic product first |
| Category/collection page | 10,000+/mo | Medium frequency events |
| Pricing page (SaaS) | 2,000+/mo | High intent, smaller sample viable |
| Cart page | 3,000+/mo | Bottleneck stage, high value |
| Email signup form | 10,000+/mo page views | Low baseline rate needs more traffic |
Frequently Asked Questions
How many visitors do I need to start A/B testing?
Minimum: 1,000 weekly visitors to the specific page being tested, targeting a 20% MDE. How long that takes depends on your baseline: with the simplified formula and two variations, it is roughly a month at a 15% baseline but about six months at 3%. Ideal: 5,000+ weekly visitors for a 10% MDE. This level allows you to run 2–3 concurrent tests on different pages without sample contamination concerns.
Can I use conversions instead of visitors for sample size?
The sample size formula uses visitors (sessions), not conversions. The conversion rate determines how many visitors you need to accumulate a given number of conversions. You need enough conversions to detect a difference — but the actual calculation is always based on the visitor count, not the conversion count.
What if my test reaches significance before the planned sample size?
Don’t stop early in frequentist testing. Run to the pre-calculated sample size and minimum duration, because early significance often reverses. A test that shows 95% confidence on day 5 may fall to 60% confidence by day 14. Repeatedly checking results and stopping at the first significant reading (often called peeking) gives random noise many chances to cross the threshold, and extreme early results tend to regress toward the mean. This is one of the most common sources of false positives in CRO programs.
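A small A/A simulation illustrates why stopping at the first significant reading is risky. Both arms below share the same true rate, so any "winner" is a false positive. The parameters (5% rate, 2,000 visitors per arm, 10 looks, 300 simulated tests) are arbitrary and kept small so the sketch runs quickly; the function names are illustrative.

```python
import math
import random

def z_significant(conv_a: int, conv_b: int, n: int, z_crit: float = 1.96) -> bool:
    """Two-sided pooled z-test for equal-sized arms of n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return se > 0 and abs(conv_a - conv_b) / n / se > z_crit

def false_positive_rates(rate=0.05, n=2_000, looks=10, sims=300, seed=1):
    """Return (rate when testing only at the end, rate when stopping at any look)."""
    rng = random.Random(seed)
    step = n // looks
    final_hits = peek_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        flagged_at_any_look = False
        for visitor in range(1, n + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            # Peek every `step` visitors; the last peek is the planned final look.
            if visitor % step == 0 and z_significant(conv_a, conv_b, visitor):
                flagged_at_any_look = True
        final_hits += z_significant(conv_a, conv_b, n)
        peek_hits += flagged_at_any_look
    return final_hits / sims, peek_hits / sims

final_rate, peeking_rate = false_positive_rates()
print(f"final look only: {final_rate:.1%}, stop at any look: {peeking_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the single-look rate, and in practice it is usually several times higher.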
Should I use one-tailed or two-tailed tests?
Two-tailed is the standard and almost always the right choice. It detects both improvements AND degradations — a variation that hurts conversion is as important to know as one that helps. One-tailed tests require about 20% less sample and are only appropriate when you’ve already established that the variant cannot possibly perform worse than control (which is rarely the case).
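The "about 20% less" figure can be checked from the normal quantiles that sit behind the sample size formula. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name `z_multiplier` is illustrative.

```python
from statistics import NormalDist

def z_multiplier(alpha: float = 0.05, power: float = 0.80,
                 two_tailed: bool = True) -> float:
    """(z_alpha + z_beta)^2, the factor that sample size is proportional to."""
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail_alpha)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2

saving = 1 - z_multiplier(two_tailed=False) / z_multiplier(two_tailed=True)
print(f"One-tailed needs about {saving:.0%} less sample")
```

Doubling the two-tailed multiplier also gives roughly 15.7, which is where the rounded constant 16 in the simplified formula comes from.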
How do I handle multiple concurrent tests?
Running multiple tests simultaneously creates three risks: (1) traffic overlap — if the same visitors see both tests, their behavior may be affected by both variants; (2) interaction effects — two changes may interact with each other in unpredictable ways; (3) false significance — more tests means higher aggregate false positive rate. Best practice: run concurrent tests on separate, non-overlapping pages. On the same page, run tests sequentially, not concurrently.