The Gold Standard for Causal Evidence
Randomized Controlled Trials (RCTs) are widely considered the most rigorous method for measuring program impact. But why? And when are they worth the investment?
Scenario: Your education program serves 500 students. After one year, test scores improve by 15 points.
Question: Did your program cause the improvement?
Alternative explanations:
— Students would have improved anyway (maturation)
— A new government policy changed curricula
— Motivated families self-selected into your program
— Economic growth improved nutrition, enabling learning
Bottom line: Without a comparison group, you can't isolate your program's effect.
At baseline (after random assignment): Treatment and control groups are identical in expectation on all characteristics, observed and unobserved.
At endline: Any difference in outcomes can be attributed to the program (plus random noise).
Key insight: Randomization eliminates selection bias — the #1 threat to causal inference.
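The selection-bias point can be seen in a small simulation. This is a minimal sketch with made-up data: the program has zero true effect, yet self-selection by an unobserved trait ("motivation") manufactures a large apparent gap, while a coin-flip assignment does not.

```python
# Sketch: why randomization removes selection bias (simulated data; the
# program here has NO true effect, so any gap is bias or noise).
import random
import statistics

random.seed(0)

# 1,000 students; "motivation" is an unobserved trait that raises
# test scores regardless of the program.
students = [{"motivation": random.gauss(0, 1)} for _ in range(1000)]
for s in students:
    s["score"] = 50 + 10 * s["motivation"] + random.gauss(0, 5)

def mean_score(group):
    return statistics.mean(s["score"] for s in group)

# Self-selection: motivated families opt in, so the "treated" group looks better.
self_selected = [s for s in students if s["motivation"] > 0]
opted_out = [s for s in students if s["motivation"] <= 0]
naive_gap = mean_score(self_selected) - mean_score(opted_out)

# Randomization: a coin flip decides assignment, independent of motivation.
random.shuffle(students)
treated, control = students[:500], students[500:]
randomized_gap = mean_score(treated) - mean_score(control)

print(f"gap under self-selection: {naive_gap:+.1f} points")   # large spurious "effect"
print(f"gap under randomization:  {randomized_gap:+.1f} points")  # small, chance-level
```

The spurious gap under self-selection is exactly the kind of difference a naive before/after comparison would misread as program impact.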
Genuine Uncertainty Exists
Program is new or unproven; stakeholders are genuinely unsure if it works; high stakes (scaling could affect millions).
Randomization is Feasible
Can assign treatment before rollout; enough units to randomize (100+); control group won't receive similar services elsewhere.
Results Will Inform Decisions
Funders will scale if evidence is positive, or pivot/stop if negative; academic publication could influence the field.
Budget Allows Rigor
Typically $20K–$100K+ depending on scope; sufficient sample size for adequate power; timeline allows 12–24+ months.
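"Sufficient sample size for adequate power" can be made concrete with the standard normal-approximation formula for a two-arm trial. A minimal sketch using only the Python standard library; the effect sizes, significance level, and power below are illustrative assumptions, not figures from this document.

```python
# Sketch: back-of-envelope sample size per arm for a two-arm RCT,
# using the standard normal-approximation power formula.
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per arm to detect a standardized effect (Cohen's d)
    with a two-sided test at significance level alpha and given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A "small" effect (d = 0.2) needs far more sample than a "large" one (d = 0.8):
print(n_per_arm(0.2))  # 393 per arm
print(n_per_arm(0.8))  # 25 per arm
```

The quadratic dependence on effect size is why detecting modest, realistic program effects drives up both sample size and budget.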
Randomization is Unethical
Denying treatment to the control group would violate participants' rights; the program is clearly beneficial (e.g., clean water); vulnerable populations lack appropriate protections.
Randomization is Infeasible
Program already rolled out universally; sample size too small (<50 units); political constraints prevent randomization.
Research Question Doesn't Need RCT
Process questions ("How is it implemented?"), mechanism questions ("Why does it work?"), or heterogeneity questions that go beyond estimating an average effect size.
Better Alternatives Exist
Strong quasi-experimental design possible (RDD, DID); existing rigorous evidence from similar contexts; budget better spent on program improvement.
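The difference-in-differences (DID) logic mentioned above fits in a few lines of arithmetic. A minimal sketch with made-up average test scores, purely for illustration:

```python
# Sketch: difference-in-differences (DID) with illustrative numbers.
def did(treat_pre: float, treat_post: float, ctrl_pre: float, ctrl_post: float) -> float:
    """DID effect = change in treated group minus change in control group.
    The control group's change stands in for what would have happened to
    the treated group without the program (parallel-trends assumption)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Both groups improve (maturation, policy changes), but the treated
# group improves by 5 points more; that extra 5 is the DID estimate.
effect = did(treat_pre=60, treat_post=75, ctrl_pre=58, ctrl_post=68)
print(effect)  # 5
```

Unlike an RCT, the credibility here rests entirely on the parallel-trends assumption, which is why DID counts as quasi-experimental rather than experimental.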
"RCTs are too expensive."
Reality: Costs vary widely ($20K–$500K+). Smart design can reduce costs: focus on administrative data, use cluster randomization, partner with government for data collection.
Counter-question: What's the cost of scaling an ineffective program?
"RCTs are unethical."
Reality: Ethical RCTs are common when using phased rollout (everyone gets the treatment, just at different times), oversubscription lotteries, or when genuine uncertainty exists.
What's truly unethical: Launching untested programs at scale without learning if they work.
"RCTs can't be done in difficult settings."
Reality: RCTs have been successfully conducted in active conflict zones (DRC, Afghanistan), remote rural areas (Sub-Saharan Africa), urban slums (Kenya, India), and refugee camps (Jordan, Uganda).
Creative solutions: Cluster randomization (villages, not individuals), encouragement designs, stepped-wedge designs.
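Cluster randomization, the first of these creative solutions, just means the coin flip happens at the group level. A minimal sketch (village and resident names are made up for illustration):

```python
# Sketch: cluster randomization. Randomize villages, not individuals,
# so everyone in a village shares one assignment.
import random

random.seed(42)

villages = ["VillageA", "VillageB", "VillageC", "VillageD", "VillageE", "VillageF"]
residents = {v: [f"{v}-resident-{i}" for i in range(3)] for v in villages}

# Randomly assign half of the *clusters* to treatment.
shuffled = villages[:]
random.shuffle(shuffled)
treated_villages = set(shuffled[: len(shuffled) // 2])

# Every individual inherits their village's assignment.
assignment = {
    person: ("treatment" if v in treated_villages else "control")
    for v, people in residents.items()
    for person in people
}

# Check: all residents of a village are in the same arm (no within-village mixing).
for v, people in residents.items():
    assert len({assignment[p] for p in people}) == 1
```

Keeping whole villages in one arm avoids spillovers between treated and untreated neighbors, at the cost of needing more clusters for the same statistical power.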
"RCTs only measure averages."
Reality: RCTs estimate average treatment effects, but they can also test mechanisms through mediation analysis, examine heterogeneity (who benefits most?), be combined with qualitative research, and use multiple treatment arms to compare approaches.
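The average effect and a heterogeneity cut are both simple difference-in-means calculations. A minimal sketch on tiny made-up data (arms, subgroup labels, and scores are all illustrative):

```python
# Sketch: average treatment effect (ATE) plus a subgroup (heterogeneity) cut.
from statistics import mean

# Each record: (arm, baseline subgroup, endline test score); illustrative numbers.
data = [
    ("treatment", "low_baseline", 72), ("treatment", "low_baseline", 68),
    ("treatment", "high_baseline", 81), ("treatment", "high_baseline", 79),
    ("control", "low_baseline", 55), ("control", "low_baseline", 59),
    ("control", "high_baseline", 78), ("control", "high_baseline", 76),
]

def ate(rows):
    """Difference in mean outcomes between arms: the RCT's headline estimate."""
    t = [y for arm, _, y in rows if arm == "treatment"]
    c = [y for arm, _, y in rows if arm == "control"]
    return mean(t) - mean(c)

print("overall ATE:", ate(data))
for g in ("low_baseline", "high_baseline"):
    subgroup = [r for r in data if r[1] == g]
    print(g, "effect:", ate(subgroup))  # who benefits most?
```

In this toy data the overall effect of 8 points masks a split (13 points for low-baseline students, 3 for high-baseline), which is exactly the kind of heterogeneity an averages-only reading would miss.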