Power Analysis Explained: Why Sample Size Matters

Understanding statistical power is crucial for rigorous impact evaluation

By Aubrey Jolex | November 26, 2025

You're designing an impact evaluation. You've identified your research question, chosen your outcome measures, and decided on a randomized controlled trial (RCT) design. Now comes a critical question that many organizations get wrong:

How many people do you need in your study?

This is where power analysis comes in—one of the most important (and most misunderstood) concepts in impact evaluation.

What Is Statistical Power?

In simple terms, statistical power is the probability that your study will detect an effect if one actually exists. Think of it like a metal detector:
  • A high-powered detector can find small coins buried deep underground
  • A low-powered detector will only find large metal objects near the surface
  • With a weak detector, you might walk right over buried treasure and never know it was there

In evaluation, power determines whether you'll be able to detect your program's true impact. A well-powered study can detect small but meaningful effects. An underpowered study might miss real impacts entirely.

The Four Key Concepts

Every power analysis involves four interrelated components:

1. Sample Size (N)

How many people are in your study

2. Effect Size

How large an impact you expect (or want to detect)

3. Significance (α)

Your threshold for calling a result "statistically significant" (usually 5%)

4. Power (1-β)

The probability of detecting a true effect (typically 80% or 90%)

These four are mathematically linked. If you know three, you can calculate the fourth.
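That linkage can be sketched numerically. Below is a minimal Python illustration for a two-arm comparison of means under a normal approximation; the function names are ours for illustration, not from any particular package:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_two_sample(n_per_group, effect_size, alpha=0.05):
    """Power of a two-sided, two-sample comparison of means for a
    standardized effect size (difference in means / SD)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    # How many standard errors the true effect spans
    z_effect = effect_size * (n_per_group / 2) ** 0.5
    return Z.cdf(z_effect - z_alpha)

def n_for_power(effect_size, power=0.80, alpha=0.05):
    """Sample size per group needed to reach the target power."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Fixing three quantities determines the fourth:
print(n_for_power(effect_size=0.2))            # ≈ 392 per group
print(power_two_sample(392, effect_size=0.2))  # ≈ 0.80
```

Shrink the effect size you want to detect and the required sample grows with its inverse square: halving the effect quadruples the sample.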

Try Our Power Calculator

Skip the complex formulas! Use our RCT Workflow Toolkit to calculate sample sizes instantly.

  • Statistical power analysis and sample size determination
  • Support for continuous and binary outcomes
  • Individual and cluster randomization with ICC calculations
  • Interactive power curves and baseline adjustments

Launch Power Calculator →

Why Sample Size Matters: A Real Example

Let's say you're evaluating a girls' education program. You believe the program increases secondary school enrollment by 10 percentage points (from 60% to 70%).

Scenario 1: Small Sample (N=100)

  • Treatment group: 50 girls
  • Control group: 50 girls
  • Only about an 18% chance of detecting the +10pp effect

Result: The evaluation will likely conclude "no significant effect" even though the program works

✓ Scenario 2: Adequate Sample (N=720)

  • Treatment group: 360 girls
  • Control group: 360 girls
  • 80% chance of detecting the +10pp effect

Result: The evaluation has sufficient power to detect the program's impact

The difference? Sample size. Too small, and you're flying blind.
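Numbers like these come from a power calculation for two proportions. A rough sketch using the normal approximation (exact results vary slightly across methods and software):

```python
from statistics import NormalDist

Z = NormalDist()

def power_two_proportions(n_per_group, p_control, p_treatment, alpha=0.05):
    """Approximate power to detect p_treatment vs p_control with a
    two-sided test, using the normal approximation."""
    diff = p_treatment - p_control
    # Standard error of the difference in proportions
    se = (p_control * (1 - p_control) / n_per_group
          + p_treatment * (1 - p_treatment) / n_per_group) ** 0.5
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(abs(diff) / se - z_alpha)

print(power_two_proportions(50, 0.60, 0.70))   # ≈ 0.18 — badly underpowered
print(power_two_proportions(360, 0.60, 0.70))  # ≈ 0.81
```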

The Consequences of Being Underpowered

When studies are underpowered, several bad things happen:

1. False Negatives (Type II Errors)

Your program actually works, but your evaluation fails to detect it. You conclude the program is ineffective and shut it down, wasting a potentially valuable intervention.

Real example: An early childhood education program genuinely improved child development, but the evaluation had only 100 children and failed to detect statistical significance. The program was defunded. Years later, a larger study with 800 children showed strong positive effects.

2. Wasted Resources

You spent money, time, and effort implementing an evaluation that was doomed from the start. All that investment in data collection, analysis, and reporting yields inconclusive results.

3. Publication Bias

Journals and funders favor statistically significant results. Underpowered studies that show "no effect" are less likely to be published, even if they were conducted rigorously.

4. Incorrect Conclusions

Sometimes, underpowered studies do find statistically significant results—but these are often false positives or wildly inflated effect sizes. This misleads future program designers.
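This "winner's curse" is easy to see by simulation. Below, a hypothetical program with a true effect of 0.2 standard deviations is evaluated in many small trials; among the trials that happen to reach significance, the average estimated effect lands far above the truth (all numbers illustrative):

```python
import random
from statistics import NormalDist, mean

random.seed(0)
Z95 = NormalDist().inv_cdf(0.975)
TRUE_EFFECT, N = 0.2, 50  # true effect in SD units; participants per group

significant_estimates = []
for _ in range(5000):
    treat = [random.gauss(TRUE_EFFECT, 1) for _ in range(N)]
    control = [random.gauss(0, 1) for _ in range(N)]
    est = mean(treat) - mean(control)
    se = (2 / N) ** 0.5           # known-variance SE of the difference
    if abs(est) / se > Z95:       # "statistically significant"
        significant_estimates.append(est)

# The significant subset overstates the true effect of 0.2 by a wide margin
print(mean(significant_estimates))
```

Only the estimates large enough to clear the significance bar survive the filter, so the published ones are systematically inflated.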

Conducting a Power Analysis: Step-by-Step

Here's how to do a power analysis for your evaluation:

Step 1: Define Your Primary Outcome

What is the one most important outcome you're measuring? Examples: test scores, household income, clinic visits, business profit.

Step 2: Estimate Baseline Variance

How much does this outcome vary in your population? Get this from existing data, baseline surveys, published studies, or pilot data.

Step 3: Define Minimum Detectable Effect (MDE)

What's the smallest program impact that would be practically meaningful to detect? This isn't about what you hope for—it's about what matters.

Step 4: Choose Alpha and Power

Standard choices: Alpha (α) = 0.05 and Power = 0.80 (an 80% chance of detecting a true effect) or 0.90.

Step 5: Calculate Required Sample Size

Use our power analysis toolkit to determine your needed sample size.

Step 6: Adjust for Real-World Factors

Account for attrition (dropout), clustering (village/school randomization), and stratification.
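The six steps above can be sketched end-to-end in a few lines. The inputs below (a test-score outcome with SD 15, an MDE of 3 points, 10% attrition, clusters of 20 with ICC 0.05) are illustrative assumptions, not recommendations:

```python
from statistics import NormalDist

Z = NormalDist()

def required_sample(sd, mde, alpha=0.05, power=0.80,
                    attrition=0.0, cluster_size=1, icc=0.0):
    """Per-group sample size for a two-arm comparison of means,
    adjusted for attrition and clustering (design effect)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_beta = Z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sd / mde) ** 2   # Step 5: core formula
    n *= 1 + (cluster_size - 1) * icc              # Step 6: design effect
    n /= 1 - attrition                             # Step 6: attrition buffer
    return n

# Steps 1-4: outcome = test scores, SD = 15, MDE = 3 points,
# alpha = 0.05, power = 0.80 (illustrative values)
print(required_sample(sd=15, mde=3, attrition=0.10,
                      cluster_size=20, icc=0.05))  # ≈ 850 per group
```

Note how the real-world adjustments more than double the unadjusted requirement of roughly 392 per group.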

Use Our Free RCT Field Flow Toolkit

Comprehensive platform for managing your entire RCT lifecycle—including power calculations

Power Calculations

Statistical power analysis with ICC calculations and interactive power curves

Randomization

Treatment assignment with balance diagnostics and validation tools

Analysis & Results

Statistical analysis with treatment effects and heterogeneity analysis

Try It Free | Learn More

Practical Example: Sample Size Calculation

Scenario: Evaluating a Savings Program

  • Outcome: Household savings
  • Standard deviation: 1,400 Philippine pesos
  • Mean household savings: 800 Philippine pesos
  • MDE: 100 Philippine pesos (minimum meaningful impact)
  • Alpha: 0.05
  • Power: 0.80

Result: You need approximately 3,077 households per group (6,154 total) to detect a 100-peso difference in savings with 80% power.

Adjustments for Real-World Factors

Attrition

If you expect to lose 10% of your sample by follow-up, inflate enrollment so that the 90% who remain still meet your requirement:

N enrolled = N required / 0.90 = 6,154 / 0.90 ≈ 6,838 households
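Under standard assumptions, these figures follow from the textbook formula; the sketch below rounds sample sizes up, while dedicated calculators may apply refinements and produce slightly different numbers:

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()
sd, mde, alpha, power = 1400, 100, 0.05, 0.80

z_alpha = Z.inv_cdf(1 - alpha / 2)   # ≈ 1.96
z_beta = Z.inv_cdf(power)            # ≈ 0.84
n_per_group = ceil(2 * ((z_alpha + z_beta) * sd / mde) ** 2)
total = 2 * n_per_group
enrolled = ceil(total / (1 - 0.10))  # buffer for 10% attrition

print(n_per_group, total, enrolled)  # 3077 6154 6838
```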

Clustering

If you randomize groups (villages, schools) rather than individuals, multiply your required sample by the design effect:

Design effect = 1 + (m − 1) × ICC

where m is the average cluster size and ICC is the intracluster correlation coefficient.
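For instance, with an average cluster size of 20 and an ICC of 0.05 (illustrative values), clustering nearly doubles the required sample:

```python
m, icc = 20, 0.05                    # average cluster size; intracluster correlation
design_effect = 1 + (m - 1) * icc    # ≈ 1.95

n_individual = 6000                  # sample under individual randomization (illustrative)
n_clustered = round(n_individual * design_effect)
print(n_clustered)                   # 11700 — nearly double
```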

Stratification

Stratifying on baseline covariates can increase power, letting you detect smaller effects with the same sample size.

Common Power Analysis Mistakes

Mistake 1

Doing power analysis AFTER data collection

Power analysis must be done before you start. Post-hoc power analysis is statistically meaningless.

Mistake 2

Powering for multiple outcomes

Pick your primary outcome and power for that. Other outcomes are exploratory.

Mistake 3

Using unrealistic effect sizes

Be realistic based on prior research and theoretical expectations.

Mistake 4

Ignoring clustering

If you randomize groups, account for it in your power calculations.

What If You Can't Afford the Required Sample?

Power analysis might reveal that you need 2,000 participants, but you can only afford 500. What now?

Option 1: Accept Lower Power

Document that your study is underpowered. You might still detect large effects, but you'll miss small-to-moderate effects.

Option 2: Focus on Larger Effect Sizes

Design your program to have bigger impacts. Instead of a light-touch intervention, implement something more intensive.

Option 3: Use More Efficient Designs

Strategies like stratification, baseline covariates, or within-subject designs can increase power without increasing sample size.
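As a rough sketch of the covariate route: with an ANCOVA-style adjustment in an individually randomized trial, the required sample scales approximately by (1 − R²), where R² is the share of outcome variance explained by baseline covariates (a standard approximation; exact gains depend on the design):

```python
n_unadjusted = 2000   # sample required without covariates (illustrative)
r_squared = 0.5       # baseline covariates explain half the outcome variance
n_adjusted = round(n_unadjusted * (1 - r_squared))
print(n_adjusted)     # 1000 — same power at half the sample
```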

Option 4: Postpone Until You Have Resources

Sometimes it's better to wait and do a properly powered study than to proceed with an underpowered one.

Conclusion: Don't Skip the Power Analysis

Power analysis is not an optional luxury—it's a fundamental requirement for any rigorous impact evaluation. Skipping it is like building a house without checking if the foundation can support the structure.

Key Takeaways

  • Always conduct power analysis before starting your evaluation
  • Be realistic about effect sizes you want to detect
  • Account for attrition, clustering, and other real-world factors
  • Don't proceed with an underpowered study unless you accept the risks

Remember: An underpowered evaluation is worse than no evaluation at all. It wastes resources and generates misleading conclusions.

Ready to Power Your Evaluation?

Get started with our tools and expert guidance

Use Our Toolkit

Access the RCT Field Flow platform for power calculations, randomization, and complete RCT management. Launch Toolkit →

Free Consultation

Need help with power analysis for your evaluation? Schedule a free consultation to discuss your study design. Book Now →

More Resources

Explore our blog for more guides on impact evaluation, RCT design, and statistical methods. Read Blog →

About the Author

Aubrey Jolex is the founder of AJ Impact Evaluation Consulting, specializing in rigorous impact evaluation for development programs. With 7+ years of experience with leading research organizations such as IPA, IFPRI and IITA, Aubrey has designed and powered a number of impact evaluations across multiple countries.
Connect on LinkedIn | Get in Touch