AJ IMPACT EVALUATION CONSULTING


Power Analysis Explained: Why Sample Size Matters

Understanding statistical power is crucial for rigorous impact evaluation

By Aubrey Jolex | November 26, 2025

You’re designing an impact evaluation. You’ve identified
your research question, chosen your outcome measures, and decided on a randomized controlled trial (RCT)
design. Now comes a critical question that many organizations get wrong:

How many people do you need in your study?

This is where power analysis comes in—one of the most important (and most misunderstood)
concepts in impact evaluation.

What Is Statistical Power?

In simple terms, statistical power is the probability that your study will detect an
effect if one actually exists.

Think of it like a metal detector:

  • A high-powered detector can find small coins buried deep underground
  • A low-powered detector will only find large metal objects near the surface
  • With a weak detector, you might walk right over buried treasure and never know it was there

In evaluation, power determines whether you’ll be able to detect your program’s true
impact. A well-powered study can detect small but meaningful effects. An underpowered study might miss
real impacts entirely.

The Four Key Concepts

Every power analysis involves four interrelated components:

1. Sample Size (N)

How many people are in your study

2. Effect Size

How large an impact you expect (or want to detect)

3. Significance (α)

Your threshold for calling a result “statistically significant” (usually 5%)

4. Power (1-β)

The probability of detecting a true effect (typically 80% or 90%)

These four are mathematically linked. If you know three, you can calculate the fourth.
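That relationship can be sketched in a few lines of Python. The snippet below uses the standard normal-approximation formula for comparing two group means; it is a teaching sketch, not a substitute for dedicated power software, and the 0.2 effect size is just an illustrative input.

```python
from math import ceil, erf, sqrt

def norm_cdf(z):
    """Standard normal CDF (avoids a SciPy dependency)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

Z_ALPHA = 1.959964  # critical value for two-sided alpha = 0.05
Z_BETA = 0.841621   # critical value for power = 0.80

def n_per_group(effect_size):
    """Per-arm sample size for comparing two group means,
    where effect_size is the standardized difference (Cohen's d)."""
    return ceil(2.0 * (Z_ALPHA + Z_BETA) ** 2 / effect_size ** 2)

def power_given_n(effect_size, n):
    """Approximate power for a given per-arm sample size."""
    return norm_cdf(effect_size * sqrt(n / 2.0) - Z_ALPHA)

print(n_per_group(0.2))                    # a "small" effect: ~393 per arm
print(round(power_given_n(0.2, 393), 2))   # ~0.8, closing the loop
```

Fixing any three inputs pins down the fourth: here, effect size, alpha, and power determine the sample size, and plugging that sample size back in recovers the 80% power we started from.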

Try Our Power Calculator

Skip the complex formulas! Use our RCT Workflow Toolkit to calculate sample sizes
instantly.

✓ Statistical power analysis and sample size determination
✓ Support for continuous and binary outcomes
✓ Individual and cluster randomization with ICC calculations
✓ Interactive power curves and baseline adjustments

Launch Power Calculator →

Why Sample Size Matters: A Real Example

Let’s say you’re evaluating a girls’ education program. You believe the program increases secondary
school enrollment by 10 percentage points (from 60% to 70%).

Scenario 1: Small Sample (N=100)

  • Treatment group: 50 girls
  • Control group: 50 girls
  • Less than a 20% chance of detecting the +10pp effect

Result: Evaluation will likely conclude “no significant effect” even though the
program works

✓ Scenario 2: Adequate Sample (N≈720)

  • Treatment group: 360 girls
  • Control group: 360 girls
  • About an 80% chance of detecting the +10pp effect

Result: Evaluation has sufficient power to detect the program’s impact

The difference? Sample size. Too small, and you’re flying blind.
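As a rough check on scenarios like this, power for a difference in proportions can be computed with the unpooled normal approximation (exact figures vary slightly across formulas and software):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def power_two_proportions(p1, p2, n_per_arm, z_alpha=1.959964):
    """Approximate power to detect p1 vs p2 with n per arm
    (unpooled normal approximation, two-sided alpha = 0.05)."""
    se = sqrt(p1 * (1 - p1) / n_per_arm + p2 * (1 - p2) / n_per_arm)
    return norm_cdf(abs(p2 - p1) / se - z_alpha)

# Enrollment 60% -> 70%, as in the girls' education example
print(round(power_two_proportions(0.60, 0.70, 50), 2))   # ~0.18 with 50 per arm
print(round(power_two_proportions(0.60, 0.70, 360), 2))  # ~0.81 with 360 per arm
```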

The Consequences of Being Underpowered

When studies are underpowered, several bad things happen:

1. False Negatives (Type II Errors)

Your program actually works, but your evaluation fails to detect it. You conclude the program is
ineffective and shut it down, wasting a potentially valuable intervention.

Illustrative example: An early childhood
education program genuinely improved child development, but the evaluation had only 100 children and
failed to detect statistical significance. The program was defunded. Years later, a larger study
with 800 children showed strong positive effects.

2. Wasted Resources

You spent money, time, and effort implementing an evaluation that was doomed from the start. All that
investment in data collection, analysis, and reporting yields inconclusive results.

3. Publication Bias

Journals and funders favor statistically significant results. Underpowered studies that show “no effect”
are less likely to be published, even if they were conducted rigorously.

4. Incorrect Conclusions

Sometimes, underpowered studies do find statistically significant results—but these are often false
positives or wildly inflated effect sizes. This misleads future program designers.

Conducting a Power Analysis: Step-by-Step

Here’s how to do a power analysis for your evaluation:

Step 1: Define Your Primary Outcome

What is the one most important outcome you’re measuring? Examples: test scores,
household income, clinic visits, business profit.

Step 2: Estimate Baseline Variance

How much does this outcome vary in your population? Get this from existing data, baseline
surveys, published studies, or pilot data.

Step 3: Define Minimum Detectable Effect (MDE)

What’s the smallest program impact that would be practically meaningful to
detect? This isn’t about what you hope for—it’s about what matters.

Step 4: Choose Alpha and Power

Standard choices: alpha (α) = 0.05 and power = 0.80 (an 80% chance of detecting a true effect) or
0.90.

Step 5: Calculate Required Sample Size

Use our power analysis toolkit to determine
your needed sample size.

Step 6: Adjust for Real-World Factors

Account for attrition (dropout), clustering (village/school randomization), and stratification.
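Steps 2 through 6 can be collapsed into a single function. This is a sketch using the normal-approximation formula for a two-arm comparison of means; the SD and MDE inputs below are hypothetical illustrations, not recommendations.

```python
from math import ceil

Z_ALPHA = 1.959964  # two-sided alpha = 0.05
Z_BETA = 0.841621   # power = 0.80

def required_n(sd, mde, attrition=0.0, cluster_size=1, icc=0.0):
    """Total sample size for a two-arm comparison of means,
    inflated for attrition and clustering (Steps 2-6)."""
    n_per_arm = 2.0 * (Z_ALPHA + Z_BETA) ** 2 * sd ** 2 / mde ** 2
    deff = 1.0 + (cluster_size - 1) * icc          # design effect
    return ceil(2.0 * n_per_arm * deff / (1.0 - attrition))

# Hypothetical: outcome SD = 15 points, MDE = 5 points
print(required_n(sd=15, mde=5))                             # no adjustments
print(required_n(sd=15, mde=5, attrition=0.10))             # 10% dropout
print(required_n(sd=15, mde=5, cluster_size=20, icc=0.05))  # clustered design
```

Note how each real-world adjustment only ever pushes the required sample upward.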

Use Our Free RCT Field Flow Toolkit

Comprehensive platform for
managing your entire RCT lifecycle—including power calculations

Power Calculations

Statistical power analysis with ICC calculations and interactive
power curves

Randomization

Treatment assignment with balance diagnostics and validation tools

Analysis & Results

Statistical analysis with treatment effects and heterogeneity
analysis

Practical Example: Sample Size Calculation

Scenario: Evaluating a Savings Program

  • Outcome: Household savings
  • Standard deviation: 1,400 Philippine pesos
  • Mean Household Savings: 800 Philippine pesos
  • MDE: 100 Philippine pesos (minimum meaningful impact)
  • Alpha: 0.05
  • Power: 0.80

Result: You need approximately 3,077 households per group (6,154 total) to detect a 100-peso
difference in savings with 80% power.
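The headline number can be reproduced with the standard normal-approximation formula for two group means:

```python
from math import ceil

Z_ALPHA = 1.959964  # two-sided alpha = 0.05
Z_BETA = 0.841621   # power = 0.80

sd, mde = 1400.0, 100.0  # pesos, from the scenario above
n_per_group = ceil(2 * (Z_ALPHA + Z_BETA) ** 2 * sd ** 2 / mde ** 2)
print(n_per_group, 2 * n_per_group)  # 3077 6154
```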

Adjustments for Real-World Factors

Attrition

If you expect 10% attrition (sample loss) at follow-up, inflate your sample accordingly:

Adjusted N = 6,154 / (1 − 0.10) = 6,154 / 0.90 ≈ 6,838

Clustering

If you randomize groups (villages, schools) rather than individuals, inflate your sample size by the
design effect:

Design effect = 1 + (m − 1) × ICC

where m is the average cluster size and ICC is the intracluster correlation coefficient.
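For instance, with a hypothetical 20 households per village and an ICC of 0.05 (both illustrative values), the design effect nearly doubles the required sample:

```python
m, icc = 20, 0.05            # hypothetical cluster size and ICC
deff = 1 + (m - 1) * icc     # design effect
print(round(deff, 2))        # 1.95
print(round(6154 * deff))    # the savings example, inflated for clustering
```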

Stratification

Stratifying on baseline covariates can increase power, letting you detect smaller effects with
the same sample size.
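A common back-of-the-envelope approximation: adjusting for baseline covariates that explain a share R² of the outcome's variance shrinks the required sample by roughly a factor of (1 − R²). A sketch, reusing the 6,154-household total from the savings example with hypothetical R² values:

```python
n_unadjusted = 6154             # total from the savings example above
for r2 in (0.0, 0.3, 0.5):      # hypothetical baseline R-squared values
    print(r2, round(n_unadjusted * (1 - r2)))
```

A strong baseline measure of the outcome itself is usually the cheapest way to buy this kind of R².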

Common Power Analysis Mistakes

Mistake 1

Doing power analysis AFTER data collection

Power analysis must be done before you start. Post-hoc power analysis
is statistically meaningless.

Mistake 2

Powering for multiple outcomes

Pick your primary outcome and power for that. Other outcomes are
exploratory.

Mistake 3

Using unrealistic effect sizes

Be realistic based on prior research and theoretical expectations.

Mistake 4

Ignoring clustering

If you randomize groups, account for it in your power calculations.

What If You Can’t Afford the Required Sample?

Power analysis might reveal that you need 2,000 participants, but you can only afford 500. What now?

Option 1: Accept Lower Power

Document that your study is underpowered. You might still detect large effects, but you’ll miss
small-to-moderate effects.

Option 2: Focus on Larger Effect Sizes

Design your program to have bigger impacts. Instead of a light-touch intervention, implement
something more intensive.

Option 3: Use More Efficient Designs

Strategies like stratification, baseline covariates, or within-subject designs can increase power
without increasing sample size.

Option 4: Postpone Until You Have Resources

Sometimes it’s better to wait and do a properly powered study than to proceed with an
underpowered one.

Conclusion: Don’t Skip the Power Analysis

Power analysis is not an optional luxury—it’s a fundamental requirement for any rigorous impact
evaluation. Skipping it is like building a house without checking if the foundation can support the
structure.

Key Takeaways

  • Always conduct power analysis before starting your evaluation
  • Be realistic about effect sizes you want to detect
  • Account for attrition, clustering, and other real-world factors
  • Don’t proceed with an underpowered study unless you accept the risks

Remember: An underpowered evaluation can be worse than no evaluation at all. It wastes resources and
can generate misleading conclusions.

Ready to Power Your Evaluation?

Get started with our tools and expert guidance

Use Our Toolkit

Access the RCT Field Flow platform for power calculations, randomization, and complete RCT
management.

Launch Toolkit

Free Consultation

Need help with power analysis for your evaluation? Schedule a free consultation to discuss your study
design.

Book Now →

More Resources

Explore our blog for more guides on impact evaluation, RCT design, and statistical methods.

Read Blog →

About the Author

Aubrey Jolex is the founder of AJ Impact Evaluation Consulting, specializing in rigorous impact
evaluation for development programs. With 7+ years of experience with leading research organizations
such as IPA, IFPRI and IITA, Aubrey has designed and powered a number of impact evaluations across
multiple countries.
