A/B Testing & Experimentation
Most A/B testing programs fail for one of three reasons: tests are stopped too early, sample sizes are too small to detect realistic effects, or the program runs in isolation from the business rather than being tied to revenue. The articles below address each in detail.
Statistical rigor matters more than test volume. A program running 4 well-designed tests per month with proper sample-size planning and significance discipline will outperform a program running 12 sloppy tests that nominally target 80% power but whose results nobody trusts. Speed without rigor produces a graveyard of "winners" that ship to production, fail to replicate, and quietly drag conversion down.
The pieces here cover the methodology layer: [sample size](/blog/ab-testing-sample-size-guide) and MDE calculation, [statistical significance](/blog/ab-testing-statistical-significance) thresholds, [Bayesian vs frequentist](/blog/bayesian-vs-frequentist-ab-testing) tradeoffs, peeking and stopping rules, [multivariate testing](/blog/multivariate-testing-guide), and how to handle failed or inconclusive tests. They also cover the program layer: tooling, reporting cadence, hypothesis intake, and what separates a mature experimentation team from a "we run tests sometimes" function.
If you're trying to figure out where to start testing or whether you have the traffic for it, the [free AI CRO audit](/audit) generates prioritized test ideas with predicted sample sizes for your specific traffic.
- Shopify | Shopify Plus Launchpad for A/B Testing: Complete Guide + Comparison | Complete guide to Shopify Plus Launchpad for A/B testing and controlled releases: what you can test, real limitations, step-by-step workflows, and when to use Launchpad vs third-party tools like Convert or VWO.
- A/B Testing | Using AI to Generate A/B Test Ideas | How to use AI (ChatGPT, Claude, Gemini) to generate A/B test hypotheses. Prompt templates, quality filters, and how to integrate AI into your ideation process.
- A/B Testing | AI Experimentation Platforms 2026 | AI-powered A/B testing platforms compared: Statsig, Eppo, Optimizely, VWO. CUPED variance reduction, bandits, predictive analysis. Choose the right platform for speed & rigor.
- A/B Testing | AI Copywriting for A/B Testing | How to use AI to generate copy variants for A/B tests: headlines, CTAs, product descriptions, and email subject lines. Quality control tips included.
- A/B Testing | A/B Testing Reporting: Save Hours Every Week 2026 | Cut A/B testing reporting time by 80%: automated dashboards, exec summaries, templates, and tagging systems. Spend more time optimizing, less time reporting.
- A/B Testing | Bayesian vs Frequentist A/B Testing | Bayesian vs Frequentist A/B testing explained: when to use each, how they handle early stopping, and which tools support them.
- CRO | Personalization vs A/B Testing | When to personalize vs when to A/B test: the decision framework, traffic requirements, and how the best programs combine both approaches.
- A/B Testing | Feature Flags for Experimentation | How feature flags enable safer, faster experimentation: gradual rollouts, kill switches, and the platforms that combine feature management with A/B tests.
- A/B Testing | Frequentist vs Bayesian: Deep Dive for CRO | A technical deep dive into Frequentist and Bayesian statistics for A/B testing: mathematical foundations, practical implications, and tool recommendations.
- A/B Testing | Experimentation Velocity: How Fast Should You Test? | How to measure and increase experimentation velocity: the relationship between test volume, learning speed, and compounding conversion gains.
- A/B Testing | How Much Does A/B Testing Cost? | The real cost of A/B testing: tools, team time, opportunity cost, and agency fees. How to calculate whether testing is worth it for your traffic level.
- A/B Testing | Best A/B Testing Tools Compared (2026) | Honest comparison of A/B testing tools: VWO, Optimizely, AB Tasty, Google Optimize alternatives, and Statsig. Features, pricing, and best fit.
- A/B Testing | False Positives in A/B Testing | How false positives corrupt A/B testing programs: multiple comparisons, early stopping, and the statistical corrections that prevent false wins.
- A/B Testing | Segmentation in A/B Testing | How to segment A/B test results correctly: pre-planned vs post-hoc segmentation, Simpson's paradox, and the segments that actually matter for CRO.
- A/B Testing | Multivariate Testing: When and How to Use It | When multivariate testing makes sense over A/B testing, how to design MVT experiments, and the traffic requirements that make it practical.
- CRO | Building a CRO Experimentation Culture | How to build an experimentation culture that scales: from executive buy-in to cross-functional collaboration to celebrating learning over winning.
- A/B Testing | Server-Side A/B Testing Explained | What server-side A/B testing is, why it matters for performance and reliability, and when to use it over client-side testing.
- A/B Testing | A/B Testing for eCommerce: What to Test First | The highest-impact A/B tests for eCommerce stores: product pages, cart, checkout, and navigation. Prioritized by expected revenue impact.
- A/B Testing | 12 A/B Testing Mistakes That Waste Your Budget | The most common A/B testing mistakes, from peeking at results to testing too many variants. How to avoid them and run tests that produce reliable wins.
- A/B Testing | A/B Testing Sample Size: How Much Traffic You Need | How to calculate the right sample size for your A/B tests. Why running tests too short produces unreliable results, and how to plan test duration correctly.
- A/B Testing | How to Run an A/B Test: Complete Guide | The complete guide to running A/B tests: hypothesis formation, test design, sample size calculation, duration, analysis, and common mistakes to avoid.
- A/B Testing | A/B Testing for SaaS in 2026: Pricing, Onboarding & Activation Tests That Move Revenue | The highest-impact A/B tests for SaaS in 2026: pricing pages, trial onboarding, feature discovery, and upgrade prompts. Benchmarks and test ideas included.
- A/B Testing | A/B Testing Statistical Significance Explained | What statistical significance actually means in A/B testing, why 95% is the standard, and the mistakes that lead to false positives and premature test calls.
How long should an A/B test run?
Until it reaches its pre-calculated sample size at the agreed-upon power (usually 80%) and confidence level (usually 95%, i.e. a 5% significance level), not after a fixed number of days. Practically, this means 2–4 weeks for most pages with 10K+ monthly visitors. Stopping earlier inflates false-positive rates dramatically; the result you see at day 3 is often reversed by day 14.
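To make "pre-calculated sample size" concrete, here is a minimal sketch using the standard normal-approximation formula for a two-sided, two-proportion test. The 5% baseline conversion rate, 30% relative MDE, and 2,500 weekly visitors are hypothetical inputs chosen for illustration, and the helper names are ours, not any tool's API; your own numbers will give a different answer.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant (normal approximation, two-sided test)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def weeks_to_run(n_per_variant, weekly_visitors, variants=2):
    """Round up to whole weeks so every weekday is represented equally."""
    return math.ceil(n_per_variant * variants / weekly_visitors)

# Hypothetical inputs: 5% baseline, 30% relative lift, ~10K visitors per month.
n_per_variant = sample_size_per_variant(baseline_rate=0.05, relative_mde=0.30)
weeks = weeks_to_run(n_per_variant, weekly_visitors=2500)
print(f"{n_per_variant} visitors per variant, about {weeks} weeks")
```

Note how sensitive the result is to the MDE: the required sample grows with the inverse square of the absolute difference, so halving the detectable lift roughly quadruples the traffic you need.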
Bayesian or frequentist — which should I use?
Frequentist methods are the long-standing industry default, and several testing tools (Optimizely and Convert, for example) build on them natively; others, such as VWO's SmartStats, are Bayesian. Bayesian results give you a probability-of-being-best framing that is easier for stakeholders to interpret but requires more methodological care. For most teams, pick whichever your tool defaults to, learn it deeply, and resist switching mid-program: the discipline matters more than the framework.
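As a rough illustration of how the two framings read on the same data, the sketch below computes a frequentist p-value (pooled two-proportion z-test) and a Bayesian probability that B beats A (Monte Carlo draws from Beta posteriors). The conversion counts are made up, the uniform Beta(1, 1) prior is an assumption, and neither function reproduces any vendor's exact engine.

```python
import math
import random
from statistics import NormalDist

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20_000, seed=0):
    """Share of posterior draws where B's rate exceeds A's (Beta(1, 1) prior assumed)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical results: A converts 200 of 4,000 visitors, B converts 250 of 4,000.
p_value = two_proportion_p_value(200, 4000, 250, 4000)
p_best = prob_b_beats_a(200, 4000, 250, 4000)
print(f"p-value: {p_value:.3f}, P(B beats A): {p_best:.3f}")
```

The two numbers answer different questions: the p-value says how surprising the data would be if there were no difference, while the posterior probability speaks directly to which variant is likely better. Neither one is protected against peeking unless the stopping rule is fixed in advance.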
Why do my "winning" tests not hold up in production?
Three usual causes: peeking at results before reaching sample size (inflates false positives), running too many simultaneous tests on overlapping audiences (interaction effects), or measuring the wrong primary metric (e.g., click-through rate when revenue is what you actually care about). Audit each before running another test.
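The first cause, peeking, is easy to demonstrate with a small A/A simulation: both arms share the same true conversion rate, so every "significant" result is a false positive. This is a sketch under assumed parameters (14 daily looks, 100 visitors per arm per day, a 10% conversion rate), not a model of any particular tool's statistics.

```python
import math
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided threshold for alpha = 0.05

def is_significant(conv_a, conv_b, n_per_arm):
    """Pooled two-proportion z-test with equal arm sizes."""
    pooled = (conv_a + conv_b) / (2 * n_per_arm)
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_arm)
    if se == 0:
        return False
    return abs(conv_b - conv_a) / n_per_arm / se > Z_CRIT

def false_positive_rates(sims=400, days=14, daily_visitors=100, rate=0.10, seed=1):
    """Return (rate when stopping at the first significant peek, rate at the final day only)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        ever_significant = False
        for day in range(1, days + 1):
            # Both arms draw from the same rate, so there is no real effect.
            conv_a += sum(rng.random() < rate for _ in range(daily_visitors))
            conv_b += sum(rng.random() < rate for _ in range(daily_visitors))
            significant = is_significant(conv_a, conv_b, day * daily_visitors)
            ever_significant = ever_significant or significant
        peeking_hits += ever_significant  # a peeker would have called a winner
        fixed_hits += significant         # only the final-day result counts
    return peeking_hits / sims, fixed_hits / sims

peeking_rate, fixed_rate = false_positive_rates()
print(f"peeking daily: {peeking_rate:.1%}, fixed horizon: {fixed_rate:.1%}")
```

The fixed-horizon rate should land near the nominal 5%, while the daily-peeking rate comes out several times higher, even though nothing about the underlying data differs.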