Power Analysis Explained: Why Sample Size Matters
Understanding statistical power is crucial for rigorous impact evaluation
By Aubrey Jolex | November 26, 2025
You’re designing an impact evaluation. You’ve identified
your research question, chosen your outcome measures, and decided on a randomized controlled trial (RCT)
design. Now comes a critical question that many organizations get wrong:
How many people do you need in your study?
This is where power analysis comes in—one of the most important (and most misunderstood)
concepts in impact evaluation.
What Is Statistical Power?
In simple terms, statistical power is the probability that your study will detect an
effect if one actually exists.
Think of it like a metal detector:
- A high-powered detector can find small coins buried deep underground
- A low-powered detector will only find large metal objects near the surface
- With a weak detector, you might walk right over buried treasure and never know it was there
In evaluation, power determines whether you’ll be able to detect your program’s true
impact. A well-powered study can detect small but meaningful effects. An underpowered study might miss
real impacts entirely.
The Four Key Concepts
Every power analysis involves four interrelated components:
1. Sample Size (N)
How many people are in your study
2. Effect Size
How large an impact you expect (or want to detect)
3. Significance (α)
Your threshold for calling a result “statistically significant” (usually 5%)
4. Power (1-β)
The probability of detecting a true effect (typically 80% or 90%)
These four are mathematically linked. If you know three, you can calculate the fourth.
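To make the linkage concrete, here is a minimal sketch in Python using the normal approximation for a two-sample comparison of means (standard textbook formulas, simplified relative to what a dedicated power tool computes):

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

def power_two_sample(n_per_group, effect_size, alpha=0.05):
    """Power of a two-sided, two-sample z-test.
    effect_size is Cohen's d: mean difference / standard deviation."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group N needed to reach the target power."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Fix any three quantities and the fourth follows:
print(round(power_two_sample(n_per_group=200, effect_size=0.2), 2))  # power, given N
print(round(sample_size_per_group(effect_size=0.2)))                 # N, given power
```

The two functions are the same equation rearranged, which is exactly why knowing any three of the four quantities pins down the last one.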
Try Our Power Calculator
Skip the complex formulas! Use our RCT Workflow Toolkit to calculate sample sizes
instantly.
✓ Statistical power analysis and sample size determination
✓ Support for continuous and binary outcomes
✓ Individual and cluster randomization with ICC calculations
✓ Interactive power curves and baseline adjustments
Why Sample Size Matters: A Real Example
Let’s say you’re evaluating a girls’ education program. You believe the program increases secondary
school enrollment by 10 percentage points (from 60% to 70%).
Scenario 1: Small Sample (N=100)
- Treatment group: 50 girls
- Control group: 50 girls
- Only about an 18% chance of detecting a +10pp effect
Result: Evaluation will likely conclude “no significant effect” even though the
program works
Scenario 2: Adequate Sample (N≈710)
- Treatment group: 355 girls
- Control group: 355 girls
- 80% chance of detecting a +10pp effect
Result: Evaluation has sufficient power to detect the program’s impact
The difference? Sample size. Too small, and you’re flying blind.
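Numbers like these can be sanity-checked with Cohen's arcsine effect size for two proportions. The sketch below uses the normal approximation, so a dedicated power tool may give slightly different figures:

```python
from math import asin, sqrt
from statistics import NormalDist

z = NormalDist()

def cohens_h(p1, p2):
    """Cohen's h: arcsine-transformed difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def power_two_proportions(n_per_group, p1, p2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    h = abs(cohens_h(p1, p2))
    return z.cdf(h * sqrt(n_per_group / 2) - z.inv_cdf(1 - alpha / 2))

# Enrollment example: 60% in control vs 70% in treatment
print(round(power_two_proportions(50, 0.70, 0.60), 2))   # 50 girls per group
print(round(power_two_proportions(355, 0.70, 0.60), 2))  # 355 girls per group
```

Running this shows why 50 girls per group is far too few for a 10-percentage-point effect, and roughly where the sample needs to be for 80% power.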
The Consequences of Being Underpowered
When studies are underpowered, several bad things happen:
1. False Negatives (Type II Errors)
Your program actually works, but your evaluation fails to detect it. You conclude the program is
ineffective and shut it down, wasting a potentially valuable intervention.
Real example: An early childhood
education program genuinely improved child development, but the evaluation had only 100 children and
failed to detect statistical significance. The program was defunded. Years later, a larger study
with 800 children showed strong positive effects.
2. Wasted Resources
You spent money, time, and effort implementing an evaluation that was doomed from the start. All that
investment in data collection, analysis, and reporting yields inconclusive results.
3. Publication Bias
Journals and funders favor statistically significant results. Underpowered studies that show “no effect”
are less likely to be published, even if they were conducted rigorously.
4. Incorrect Conclusions
Sometimes, underpowered studies do find statistically significant results—but these are often false
positives or wildly inflated effect sizes. This misleads future program designers.
Conducting a Power Analysis: Step-by-Step
Here’s how to do a power analysis for your evaluation:
Step 1: Define Your Primary Outcome
What is the one most important outcome you’re measuring? Examples: test scores,
household income, clinic visits, business profit.
Step 2: Estimate Baseline Variance
How much does this outcome vary in your population? Get this from existing data, baseline
surveys, published studies, or pilot data.
Step 3: Define Minimum Detectable Effect (MDE)
What’s the smallest program impact that would be practically meaningful to
detect? This isn’t about what you hope for—it’s about what matters.
Step 4: Choose Alpha and Power
Standard choices: alpha (α) = 0.05 and power = 0.80 (an 80% chance of detecting a true effect) or 0.90.
Step 5: Calculate Required Sample Size
Use our power analysis toolkit to determine
your needed sample size.
Step 6: Adjust for Real-World Factors
Account for attrition (dropout), clustering (village/school randomization), and stratification.
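The six steps can be sketched end-to-end in code. The function below uses the standard normal-approximation formula for a difference in means, with simple inflation factors for attrition and clustering; it is an illustration under those assumptions, not a substitute for a full power tool:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist()

def required_total_sample(sd, mde, alpha=0.05, power=0.80,
                          attrition=0.0, cluster_size=1, icc=0.0):
    """Total sample (both arms) for a two-sample comparison of means,
    inflated for expected attrition and for clustering (design effect)."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n_per_arm = 2 * ((z_alpha + z_beta) * sd / mde) ** 2
    deff = 1 + (cluster_size - 1) * icc          # design effect
    return ceil(2 * n_per_arm * deff / (1 - attrition))

# Illustrative values: outcome SD = 1,400, MDE = 100
print(required_total_sample(sd=1400, mde=100))   # individual randomization
print(required_total_sample(sd=1400, mde=100, attrition=0.10,
                            cluster_size=20, icc=0.05))  # clustered, with dropout
```

Note how quickly attrition and clustering inflate the total: the real-world adjustments in Step 6 are not cosmetic.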
Use Our Free RCT Field Flow Toolkit
A comprehensive platform for managing your entire RCT lifecycle, including power calculations:
- Power Calculations: statistical power analysis with ICC calculations and interactive power curves
- Randomization: treatment assignment with balance diagnostics and validation tools
- Analysis & Results: statistical analysis with treatment effects and heterogeneity analysis
Practical Example: Sample Size Calculation
Scenario: Evaluating a Savings Program
- Outcome: Household savings
- Standard deviation: 1,400 Philippine pesos
- Mean Household Savings: 800 Philippine pesos
- MDE: 100 Philippine pesos (minimum meaningful impact)
- Alpha: 0.05
- Power: 0.80
Result: You need approximately 3,077 households per group (6,154 total) to detect a 100-peso difference in savings with 80% power.
Adjustments for Real-World Factors
Attrition
If you expect 10% of your sample to drop out before follow-up, inflate the calculated sample so that the completers still number 6,154:
N_adjusted = 6,154 / 0.90 ≈ 6,838
Clustering
If you randomize groups (villages, schools) rather than individuals, multiply the required sample by the design effect:
Design effect = 1 + (m − 1) × ICC
where m is the average cluster size and ICC is the intracluster correlation coefficient.
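As a quick illustration of the formula (the cluster size and ICC below are hypothetical values):

```python
def design_effect(m, icc):
    """Design effect: m = average cluster size, ICC = intracluster correlation."""
    return 1 + (m - 1) * icc

# e.g. 20 pupils per school with ICC = 0.05:
deff = design_effect(20, 0.05)
print(round(deff, 2))  # the required sample roughly doubles
```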
Stratification
Stratifying on baseline covariates can increase power, letting you detect smaller effects with the same sample size.
Common Power Analysis Mistakes
Mistake 1
Doing power analysis AFTER data collection
Power analysis must be done before you start. Post-hoc power computed from the observed effect is uninformative: it is just a restatement of the p-value.
Mistake 2
Powering for multiple outcomes
Pick your primary outcome and power for that. Other outcomes are
exploratory.
Mistake 3
Using unrealistic effect sizes
Be realistic based on prior research and theoretical expectations.
Mistake 4
Ignoring clustering
If you randomize groups, account for it in your power calculations.
What If You Can’t Afford the Required Sample?
Power analysis might reveal that you need 2,000 participants, but you can only afford 500. What now?
Option 1: Accept Lower Power
Document that your study is underpowered. You might still detect large effects, but you’ll miss
small-to-moderate effects.
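One way to document this honestly is to invert the calculation: given the sample you can afford, what is the smallest effect you could reliably detect? A normal-approximation sketch:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect (Cohen's d) detectable at the given
    power with a two-sided, two-sample z-test."""
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

# With 500 participants total (250 per arm):
print(round(minimum_detectable_effect(250), 2))  # MDE in standard deviations
```

Reporting this number up front tells readers, and funders, exactly which effects your study can and cannot rule out.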
Option 2: Focus on Larger Effect Sizes
Design your program to have bigger impacts. Instead of a light-touch intervention, implement
something more intensive.
Option 3: Use More Efficient Designs
Strategies like stratification, baseline covariates, or within-subject designs can increase power
without increasing sample size.
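For example, adjusting for baseline covariates reduces residual outcome variance, and the required sample scales with that residual variance. A rough rule of thumb, assuming the covariates explain a share R² of the outcome variance:

```python
def n_with_covariates(n_unadjusted, r_squared):
    """Approximate sample size after covariate adjustment: required N
    scales with the residual variance share (1 - R^2)."""
    return round(n_unadjusted * (1 - r_squared))

# e.g. a baseline measure of the outcome explaining R² = 0.5 (hypothetical):
print(n_with_covariates(6154, 0.5))
```

A good baseline measure of the outcome itself is often the cheapest power you can buy.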
Option 4: Postpone Until You Have Resources
Sometimes it’s better to wait and do a properly powered study than to proceed with an
underpowered one.
Conclusion: Don’t Skip the Power Analysis
Power analysis is not an optional luxury—it’s a fundamental requirement for any rigorous impact
evaluation. Skipping it is like building a house without checking if the foundation can support the
structure.
Key Takeaways
- Always conduct power analysis before starting your evaluation
- Be realistic about effect sizes you want to detect
- Account for attrition, clustering, and other real-world factors
- Don’t proceed with an underpowered study unless you accept the risks
Remember: An underpowered evaluation is
worse than no evaluation at all. It wastes resources and generates misleading conclusions.
Ready to Power Your Evaluation?
Get started with our tools and expert guidance
Use Our Toolkit
Access the RCT Field Flow platform for power calculations, randomization, and complete RCT
management.
Free Consultation
Need help with power analysis for your evaluation? Schedule a free consultation to discuss your study
design.
More Resources
Explore our blog for more guides on impact evaluation, RCT design, and statistical methods.
About the Author
Aubrey Jolex is the founder of AJ Impact Evaluation Consulting, specializing in rigorous impact evaluation for development programs. With more than seven years of experience at leading research organizations such as IPA, IFPRI, and IITA, Aubrey has designed and powered impact evaluations across multiple countries.