Power Analysis Explained: Why Sample Size Matters
Understanding statistical power is crucial for rigorous impact evaluation
By Aubrey Jolex | November 26, 2025
You’re designing an impact evaluation. You’ve identified
your research question, chosen your outcome measures, and decided on a randomized controlled trial (RCT)
design. Now comes a critical question that many organizations get wrong:
How many people do you need in your study?
This is where power analysis comes in—one of the most important (and most misunderstood)
concepts in impact evaluation.
What Is Statistical Power?
In simple terms, statistical power is the probability that your study will detect an
effect if one actually exists.
Think of it like a metal detector:
- A high-powered detector can find small coins buried deep underground
- A low-powered detector will only find large metal objects near the surface
- With a weak detector, you might walk right over buried treasure and never know it was there
In evaluation, power determines whether you’ll be able to detect your program’s true
impact. A well-powered study can detect small but meaningful effects. An underpowered study might miss
real impacts entirely.
The Four Key Concepts
Every power analysis involves four interrelated components:
1. Sample Size (N)
How many people are in your study
2. Effect Size
How large an impact you expect (or want to detect)
3. Significance (α)
Your threshold for calling a result “statistically significant” (usually 5%)
4. Power (1-β)
The probability of detecting a true effect (typically 80% or 90%)
These four are mathematically linked. If you know three, you can calculate the fourth.
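To make the linkage concrete, here is a minimal sketch in Python using the normal approximation for a two-sample comparison of means (standard textbook formulas, simplified relative to what a dedicated power tool computes):

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

def power_two_sample(n_per_group, effect_size, alpha=0.05):
    """Power of a two-sided, two-sample z-test.
    effect_size is Cohen's d: mean difference / standard deviation."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * sqrt(n_per_group / 2) - z_alpha)

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Per-group N needed to reach the target power."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return 2 * ((z_alpha + z_beta) / effect_size) ** 2

# Fix any three quantities and the fourth follows:
print(round(power_two_sample(n_per_group=200, effect_size=0.2), 2))  # power, given N
print(round(sample_size_per_group(effect_size=0.2)))                 # N, given power
```

The two functions are the same equation rearranged, which is exactly why knowing any three of the four quantities pins down the last one.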
Try Our Power Calculator
Skip the complex formulas! Use our RCT Workflow Toolkit to calculate sample sizes
instantly.
✓ Statistical power analysis and sample size determination
✓ Support for continuous and binary outcomes
✓ Individual and cluster randomization with ICC calculations
✓ Interactive power curves and baseline adjustments
Why Sample Size Matters: A Real Example
Let’s say you’re evaluating a girls’ education program. You believe the program increases secondary
school enrollment by 10 percentage points (from 60% to 70%).
Scenario 1: Small Sample (N=100)
- Treatment group: 50 girls
- Control group: 50 girls
- Only about an 18% chance of detecting a +10pp effect
Result: Evaluation will likely conclude “no significant effect” even though the
program works
Scenario 2: Adequate Sample (N≈710)
- Treatment group: 355 girls
- Control group: 355 girls
- 80% chance of detecting a +10pp effect
Result: Evaluation has sufficient power to detect the program’s impact
The difference? Sample size. Too small, and you’re flying blind.
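Numbers like these can be sanity-checked with Cohen's arcsine effect size for two proportions. The sketch below uses the normal approximation, so a dedicated power tool may give slightly different figures:

```python
from math import asin, sqrt
from statistics import NormalDist

z = NormalDist()

def cohens_h(p1, p2):
    """Cohen's h: arcsine-transformed difference between two proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

def power_two_proportions(n_per_group, p1, p2, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    h = abs(cohens_h(p1, p2))
    return z.cdf(h * sqrt(n_per_group / 2) - z.inv_cdf(1 - alpha / 2))

# Enrollment example: 60% in control vs 70% in treatment
print(round(power_two_proportions(50, 0.70, 0.60), 2))   # 50 girls per group
print(round(power_two_proportions(355, 0.70, 0.60), 2))  # 355 girls per group
```

Running this shows why 50 girls per group is far too few for a 10-percentage-point effect, and roughly where the sample needs to be for 80% power.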
The Consequences of Being Underpowered
When studies are underpowered, several bad things happen:
1. False Negatives (Type II Errors)
Your program actually works, but your evaluation fails to detect it. You conclude the program is
ineffective and shut it down, wasting a potentially valuable intervention.
Real example: An early childhood
education program genuinely improved child development, but the evaluation had only 100 children and
failed to detect statistical significance. The program was defunded. Years later, a larger study
with 800 children showed strong positive effects.
2. Wasted Resources
You spent money, time, and effort implementing an evaluation that was doomed from the start. All that
investment in data collection, analysis, and reporting yields inconclusive results.
3. Publication Bias
Journals and funders favor statistically significant results. Underpowered studies that show “no effect”
are less likely to be published, even if they were conducted rigorously.
4. Incorrect Conclusions
Sometimes, underpowered studies do find statistically significant results—but these are often false
positives or wildly inflated effect sizes. This misleads future program designers.
Conducting a Power Analysis: Step-by-Step
Here’s how to do a power analysis for your evaluation:
Step 1: Define Your Primary Outcome
What is the one most important outcome you’re measuring? Examples: test scores,
household income, clinic visits, business profit.
Step 2: Estimate Baseline Variance
How much does this outcome vary in your population? Get this from existing data, baseline
surveys, published studies, or pilot data.
Step 3: Define Minimum Detectable Effect (MDE)
What’s the smallest program impact that would be practically meaningful to
detect? This isn’t about what you hope for—it’s about what matters.
Step 4: Choose Alpha and Power
Standard choices: alpha (α) = 0.05 and power = 0.80 (an 80% chance of detecting a true effect) or 0.90.
Step 5: Calculate Required Sample Size
Use our power analysis toolkit to determine
your needed sample size.
Step 6: Adjust for Real-World Factors
Account for attrition (dropout), clustering (village/school randomization), and stratification.
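The six steps can be sketched end-to-end in code. The function below uses the standard normal-approximation formula for a difference in means, with simple inflation factors for attrition and clustering; it is an illustration under those assumptions, not a substitute for a full power tool:

```python
from math import ceil
from statistics import NormalDist

z = NormalDist()

def required_total_sample(sd, mde, alpha=0.05, power=0.80,
                          attrition=0.0, cluster_size=1, icc=0.0):
    """Total sample (both arms) for a two-sample comparison of means,
    inflated for expected attrition and for clustering (design effect)."""
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    n_per_arm = 2 * ((z_alpha + z_beta) * sd / mde) ** 2
    deff = 1 + (cluster_size - 1) * icc          # design effect
    return ceil(2 * n_per_arm * deff / (1 - attrition))

# Illustrative values: outcome SD = 1,400, MDE = 100
print(required_total_sample(sd=1400, mde=100))   # individual randomization
print(required_total_sample(sd=1400, mde=100, attrition=0.10,
                            cluster_size=20, icc=0.05))  # clustered, with dropout
```

Note how quickly attrition and clustering inflate the total: the real-world adjustments in Step 6 are not cosmetic.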
Use Our Free RCT Field Flow Toolkit
A comprehensive platform for managing your entire RCT lifecycle, including power calculations:
- Power Calculations: statistical power analysis with ICC calculations and interactive power curves
- Randomization: treatment assignment with balance diagnostics and validation tools
- Analysis & Results: statistical analysis with treatment effects and heterogeneity analysis
Practical Example: Sample Size Calculation
Scenario: Evaluating a Savings Program
- Outcome: Household savings
- Standard deviation: 1,400 Philippine pesos
- Mean Household Savings: 800 Philippine pesos
- MDE: 100 Philippine pesos (minimum meaningful impact)
- Alpha: 0.05
- Power: 0.80
Result: You need approximately 3,077 households per group (6,154 total) to detect a 100-peso difference in savings with 80% power.
Adjustments for Real-World Factors
Attrition
If you expect 10% of your sample to drop out before follow-up, inflate the calculated sample so that the completers still number 6,154:
N_adjusted = 6,154 / 0.90 ≈ 6,838
Clustering
If you randomize groups (villages, schools) rather than individuals, multiply the required sample by the design effect:
Design effect = 1 + (m − 1) × ICC
where m is the average cluster size and ICC is the intracluster correlation coefficient.
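As a quick illustration of the formula (the cluster size and ICC below are hypothetical values):

```python
def design_effect(m, icc):
    """Design effect: m = average cluster size, ICC = intracluster correlation."""
    return 1 + (m - 1) * icc

# e.g. 20 pupils per school with ICC = 0.05:
deff = design_effect(20, 0.05)
print(round(deff, 2))  # the required sample roughly doubles
```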
Stratification
Stratifying on baseline covariates can increase power, letting you detect smaller effects with the same sample size.
Common Power Analysis Mistakes
Mistake 1
Doing power analysis AFTER data collection
Power analysis must be done before you start. Post-hoc power computed from the observed effect is uninformative: it is just a restatement of the p-value.
Mistake 2
Powering for multiple outcomes
Pick your primary outcome and power for that. Other outcomes are
exploratory.
Mistake 3
Using unrealistic effect sizes
Be realistic based on prior research and theoretical expectations.
Mistake 4
Ignoring clustering
If you randomize groups, account for it in your power calculations.
What If You Can’t Afford the Required Sample?
Power analysis might reveal that you need 2,000 participants, but you can only afford 500. What now?
Option 1: Accept Lower Power
Document that your study is underpowered. You might still detect large effects, but you’ll miss
small-to-moderate effects.
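One way to document this honestly is to invert the calculation: given the sample you can afford, what is the smallest effect you could reliably detect? A normal-approximation sketch:

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()

def minimum_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect (Cohen's d) detectable at the given
    power with a two-sided, two-sample z-test."""
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)

# With 500 participants total (250 per arm):
print(round(minimum_detectable_effect(250), 2))  # MDE in standard deviations
```

Reporting this number up front tells readers, and funders, exactly which effects your study can and cannot rule out.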
Option 2: Focus on Larger Effect Sizes
Design your program to have bigger impacts. Instead of a light-touch intervention, implement
something more intensive.
Option 3: Use More Efficient Designs
Strategies like stratification, baseline covariates, or within-subject designs can increase power
without increasing sample size.
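For example, adjusting for baseline covariates reduces residual outcome variance, and the required sample scales with that residual variance. A rough rule of thumb, assuming the covariates explain a share R² of the outcome variance:

```python
def n_with_covariates(n_unadjusted, r_squared):
    """Approximate sample size after covariate adjustment: required N
    scales with the residual variance share (1 - R^2)."""
    return round(n_unadjusted * (1 - r_squared))

# e.g. a baseline measure of the outcome explaining R² = 0.5 (hypothetical):
print(n_with_covariates(6154, 0.5))
```

A good baseline measure of the outcome itself is often the cheapest power you can buy.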
Option 4: Postpone Until You Have Resources
Sometimes it’s better to wait and do a properly powered study than to proceed with an
underpowered one.
Conclusion: Don’t Skip the Power Analysis
Power analysis is not an optional luxury—it’s a fundamental requirement for any rigorous impact
evaluation. Skipping it is like building a house without checking if the foundation can support the
structure.
Key Takeaways
- Always conduct power analysis before starting your evaluation
- Be realistic about effect sizes you want to detect
- Account for attrition, clustering, and other real-world factors
- Don’t proceed with an underpowered study unless you accept the risks
Remember: An underpowered evaluation is
worse than no evaluation at all. It wastes resources and generates misleading conclusions.
Ready to Power Your Evaluation?
Get started with our tools and expert guidance
Use Our Toolkit
Access the RCT Field Flow platform for power calculations, randomization, and complete RCT
management.
Free Consultation
Need help with power analysis for your evaluation? Schedule a free consultation to discuss your study
design.
More Resources
Explore our blog for more guides on impact evaluation, RCT design, and statistical methods.
About the Author
Aubrey Jolex is the founder of AJ Impact Evaluation Consulting, specializing in rigorous impact evaluation for development programs. With more than seven years of experience at leading research organizations such as IPA, IFPRI, and IITA, Aubrey has designed and powered impact evaluations across multiple countries.