A/B test statistical significance calculator
Variant A (control)
Variant B (test)
Statistical significance — a measure of how unlikely it is that the observed difference between variants is due to chance alone. A confidence level of at least 95% is recommended for decision-making.
p-value — the probability of obtaining the observed (or greater) difference by chance. If p < 0.05, the result is considered statistically significant.
Confidence levels: 90% — low confidence, 95% — standard for most tests, 99% — high confidence for critical decisions.
Minimum sample size: for 1–5% conversion rates, a minimum of 1,000–10,000 visitors per variant is recommended. The smaller the difference between variants, the more traffic is needed.
A/B Test Calculator: Statistical Significance and Sample Size
This free A/B test calculator determines whether your test results are statistically significant and computes the sample size you need before launching. Make data-driven decisions with confidence.
What is an A/B test
An A/B test (split test) compares two versions of something — control (A) and variant (B) — by randomly splitting your audience between them and measuring a single metric. Once you've collected enough data, you determine which version performs better with statistical confidence.
A/B tests apply to landing pages, email subject lines, ad creatives, pricing, UI elements, and any hypothesis that can be measured quantitatively.
Statistical significance
Statistical significance is judged by the p-value: the probability of observing a difference at least as large as yours if there were truly no difference between A and B. The standard threshold is p < 0.05 (95% confidence level), meaning there's less than a 5% chance of such a result arising by chance alone (see the code sketch after the table for how it's computed).
| p-value | Confidence | Recommendation |
|---|---|---|
| < 0.01 | 99%+ | High confidence — act on results |
| 0.01–0.05 | 95–99% | Standard threshold — results are reliable |
| 0.05–0.10 | 90–95% | Weak significance — collect more data |
| > 0.10 | < 90% | Not significant — don't make decisions |
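Under the hood, a verdict like the ones above typically comes from a two-proportion z-test. Here is a minimal sketch in Python (the function name and the visitor/conversion counts are illustrative, not taken from the calculator):

```python
from math import erf, sqrt

def ab_test_pvalue(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a two-proportion z-test with a pooled rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # standard normal CDF via erf, doubled for a two-sided test
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# illustrative numbers: 200/10,000 conversions for A vs 260/10,000 for B
print(f"p = {ab_test_pvalue(200, 10_000, 260, 10_000):.4f}")  # ≈ 0.0047 → significant
```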
Sample size calculation
Required sample size depends on three inputs:
- Baseline conversion rate — current rate for variant A
- Minimum Detectable Effect (MDE) — smallest improvement worth detecting
- Statistical power — probability of detecting a real effect (typically 80%)
Rule of thumb: at a 2% baseline conversion rate and a 10% relative MDE (B expected at 2.2%), the standard formula at 95% confidence and 80% power calls for roughly 80,000 visitors per variant; relaxing the MDE to 20% (B at 2.4%) drops that to roughly 21,000.
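These figures come from the standard sample size formula for comparing two proportions; a minimal sketch (the function name is illustrative):

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline: float, mde_rel: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant to detect a relative lift of mde_rel."""
    p1 = baseline
    p2 = baseline * (1 + mde_rel)                  # expected rate for variant B
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 95% confidence
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

print(sample_size_per_variant(0.02, 0.10))  # 80,681 → roughly 80,000 per variant
print(sample_size_per_variant(0.02, 0.20))  # 21,109 → roughly 21,000 per variant
```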
A/B testing rules
Test one variable at a time
Change only one element per test. If you change the headline, button color, and background simultaneously, you can't determine which change caused the result.
Don't stop the test early
Peeking at results and stopping when they "look good" is the most common mistake — it inflates false positive rates. Pre-determine test duration based on your sample size calculation and stick to it.
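A minimal simulation sketch (assuming a 2% true conversion rate in both variants and 20 interim looks; all numbers are illustrative) shows the effect: even though the variants are identical, stopping at the first "significant" reading triggers far more often than the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(0)
p_true, n_max, checks, trials = 0.02, 20_000, 20, 2_000
z_crit = 1.96  # two-sided 5% threshold

def z_stat(wins_a: int, wins_b: int, n: int) -> float:
    """Pooled two-proportion z-statistic for two equal-sized groups."""
    p_pool = (wins_a + wins_b) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n)
    return 0.0 if se == 0 else (wins_b - wins_a) / n / se

peek_fp = final_fp = 0
for _ in range(trials):
    a = rng.random(n_max) < p_true  # both variants truly identical (A/A)
    b = rng.random(n_max) < p_true
    # "peeking": test after every 5% of traffic, stop at the first significant z
    for k in range(1, checks + 1):
        n = n_max * k // checks
        if abs(z_stat(a[:n].sum(), b[:n].sum(), n)) > z_crit:
            peek_fp += 1
            break
    final_fp += abs(z_stat(a.sum(), b.sum(), n_max)) > z_crit

print(f"single look at the end: {final_fp / trials:.1%} false positives")  # ≈ 5%
print(f"peeking 20 times:       {peek_fp / trials:.1%} false positives")   # well above 5%
```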
Run for full week cycles
Run tests for at least 7–14 days. Traffic patterns differ by day of week — a short test may capture an unrepresentative period.
Use an A/A test first
Before running A/B tests, run an A/A test where both variants are identical. If it shows significant differences, your tracking or traffic-splitting has a bug.
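When an A/A test does fail, one concrete thing to check is whether the traffic split itself matches the intended ratio (a sample-ratio-mismatch check); a minimal sketch with illustrative numbers:

```python
from statistics import NormalDist

def split_pvalue(n_a: int, n_b: int, share_a: float = 0.5) -> float:
    """p-value that the observed split is consistent with the intended one
    (normal approximation to the binomial)."""
    n = n_a + n_b
    se = (share_a * (1 - share_a) / n) ** 0.5
    z = (n_a / n - share_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 10,210 vs 9,790 visitors on an intended 50/50 split: random noise or a bug?
print(f"split p-value = {split_pvalue(10_210, 9_790):.4f}")  # ≈ 0.003 → investigate
```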
FAQ
Can I test more than two variants?
Yes — A/B/n tests with three or more variants (A/B/C/D) are possible, but each additional variant increases the required sample size, and running several comparisons against the control raises the odds of a false positive. With limited traffic, stick to pairwise testing.
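One simple, conservative way to keep the overall error rate in check with several variants (an illustration, not the only approach) is a Bonferroni correction: split the 5% error budget across the pairwise comparisons.

```python
# Bonferroni correction: with k test variants each compared against control,
# require every pairwise test to clear a stricter per-comparison threshold
alpha_overall = 0.05
k = 3  # variants B, C, D, each compared against control A
alpha_per_comparison = alpha_overall / k
print(f"per-comparison threshold: p < {alpha_per_comparison:.4f}")  # 0.0167
```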
What does a neutral result mean?
A null result is still valuable — it means the change produced no effect large enough to detect at your sample size, freeing you to test the next idea. Document all tests; negative results prevent repeating failed hypotheses.
See also: ROI calculator, unit economics, CPM calculator.
Useful articles
LTV: How to Calculate Customer Lifetime Value
What LTV is, how to calculate it using different methods, and why it's a key metric for making business decisions.
CPM, CPC, and CPA: Online Advertising Payment Models
A detailed breakdown of online advertising payment models: CPM, CPC, and CPA. Calculation formulas, real-world examples, and recommendations for choosing the right model for different goals.