The impact of development impact research

TIME keynote, November 2025

Macartan Humphreys

The plan

  1. What convinced me of the need for randomized interventions [Sierra Leone DDR]
  2. Some big early investments, and poor returns: How little we know! [DRC study]
  3. A flourishing of studies, innovations, sectoral learning, high level take-aways [Many studies]
  4. What’s been picked up, what’s been set aside? [Many anecdotes]
  5. Where to from here? [Some sage reflections]

Headlines

  • We have learned a lot about how to figure out the effectiveness of development interventions
  • We have learned a lot about what kinds of interventions seem to work, with results very often going against intuitions
  • We have learned that many interventions have weak effects at best and that effects are likely very heterogeneous
  • Some of this learning appears to translate into changes in practice, but pickup is uneven

And:

  • Everything we—researchers and practitioners—have learned points to the need to be modest about our knowledge, attuned to the role of context, and open to learning from different approaches

  • As development resources become scarce, the need to focus on learning becomes stronger than ever, especially for addressing the biggest questions

1. An interest in impacts

The road from TCD, 1994

Development to Conflict (1996)

Destruction far surpassing development contributions

Sierra Leone 2003

We went to study why people fought, what sustained the conflict, and what explained the horrific abuses, following work in Casamance, Mali, Peru, and Uganda (Weinstein).

  • More questions than hypotheses: No impact evaluations in mind
  • Funding sought from Michael Kremer (❌) and Jeffrey Sachs (✅)

Focus on

  • Why people take part in violence
  • Understanding patterns of violence

Sierra Leone conflict study: 2003

Crisis when we find:

  • No chance of accessing many chiefdoms given destruction and rains
  • Opposition from ex-combatants

DDR: Disarmament, Demobilization, and Reintegration

Both problems addressed by bringing in a focus on DDR.

  • The ex-combatants issued a nationwide OK for our work if we included questions on ex-combatant welfare.

  • The (Irish) UN DDR coordinator wanted to learn about program effects and gave full helicopter access in return for including a DDR module.

DDR:

A “process of disarmament, demobilization, and reintegration has repeatedly proved to be vital to stability in a post-conflict situation” (United Nations 2000, 1; italics added).

But really little evidence.

The need for control

At the time there was not (to my knowledge) a single study comparing places with and without DDR, or individuals exposed and not exposed.

A colleague’s study highlights the inferential challenge that arises when you do not have a control group.

Blattman, short-term patterns

DDR study

  • Co-designed modules with UN, NGOs, and ex-combatants

  • Sought to interview participants and non-participants nationwide

  • Reached over 1,000 ex-combatants from all groups throughout the country

Sierra Leone

DDR Findings

  • Nearly 7% of ex-combatants experienced serious problems reintegrating after the war (approximately 5,000 individuals)
  • Little evidence that women and young people faced more difficulty returning to civilian life
  • Individuals from abusive units experienced the greatest challenges in adapting to the post-war period
  • Almost no evidence that international programs to ease reintegration have a substantial impact on individuals’ prospects

Results on the impacts of the DDR program in Sierra Leone

  • Differences mostly small and pointing in the wrong direction
  • Sobering, confusing

Things we learned

  • Huge commitment from UN and NGO partners, who were invested in the question and open to learning (still, we saw some manipulation risks)

  • Clash between qualitative insights and quantitative findings, and challenges in integrating the two

  • Is DDR a good idea? Maybe! Clarification of twin goals of interventions: DDR interventions might make sense politically, even if they have no specific benefits for individual participants

Messaging: not ‘DDR does not work (at this level),’ but ‘no evidence that it does’

  • Biggest lesson was serious difficulty estimating effects credibly

Dissatisfaction with this study

This is what’s called an “observational” study. The study was conducted after project implementation, and individuals self-selected into the intervention: some took part, others “self-reintegrated.”

We worried about three big risks to inference:

  • Selection biases: do people self-reintegrate because they are doing well? Or because they are doing very poorly?

  • Sampling biases: are successful reintegrators harder to sample?

  • Spillover biases: are non-participants doing well because their friends did participate?

These risks are quite general across impact evaluations.

But they can all be minimized with randomized intervention.
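To make these risks concrete, here is a minimal simulation sketch (all numbers hypothetical, not from our data) in which the program truly does nothing, yet the naive participant/non-participant comparison shows a large negative “effect” because the worst-off select in. Random assignment recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent welfare: individuals differ in how well they would do regardless.
baseline = rng.normal(0, 1, n)
true_effect = 0.0  # suppose the program truly does nothing

# Observational world: the worse-off are more likely to enter the program.
participates = baseline + rng.normal(0, 1, n) < 0
outcome_obs = baseline + true_effect * participates
naive = outcome_obs[participates].mean() - outcome_obs[~participates].mean()

# Randomized world: assignment is independent of baseline welfare.
assigned = rng.random(n) < 0.5
outcome_rct = baseline + true_effect * assigned
rct = outcome_rct[assigned].mean() - outcome_rct[~assigned].mean()

print(f"naive observational estimate: {naive:+.2f}")  # large, negative: pure selection bias
print(f"randomized estimate:          {rct:+.2f}")    # close to the true effect of zero
```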

Shift to randomized impact evaluation

A shift

Big push at this time to broaden partnerships between researchers and practitioners (EGAP network and others: DIME, 3ie, J-PAL, IPA; DfID was a major actor).

Dream was (is!) to maintain twin goals:

  • learn about what works in development – not just aid effectiveness
  • work as scholars not plumbers: learn about what works to do good in the world and to get a sharper handle on generalizable social processes.

CDR (community-driven reconstruction) trials

Joined forces with the International Rescue Committee (IRC) as they started a learning agenda.

Two bigger RCTs with IRC in Liberia and Congo.

“Community-driven development operations produce two primary types of results: more and better distributed assets, and stronger, more responsive institutions.” (The World Bank)

But: almost no evidence to this effect at the time and almost none now.

My biggest null: DRC

The intervention:

  • Components: CDR, elections, committees, training, projects
  • Very large scale: c. 2 million receiving the program, 2 million in control; 560 randomization units

Objectives (economic and governance):

  • Participation
  • Accountability
  • Efficiency
  • Transparency
  • Capture

DRC research strategy:

Some clear priors:

“This program is exciting because it seeks to understand and rebuild the social fabric of communities. […] It’s a program that starts to rebuild trust, it’s a grassroots democratization program.”

Strong identification:

  • Block randomization & excellent balance (see the sketch after this list)
  • Good power (800+ villages)
  • Random variation in treatment
  • Strong “first stage”
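For readers new to the design, a minimal sketch of the block randomization logic (toy data and field names, not the actual DRC assignment code): assigning treatment within blocks mechanically balances the blocking variable across arms and improves precision.

```python
import random

def block_randomize(units, block_key, seed=2007):
    """Assign treatment within blocks so each block splits ~50/50.
    `units` is a list of dicts; `block_key` names the blocking variable.
    Names here are hypothetical, for illustration only."""
    rng = random.Random(seed)
    blocks = {}
    for u in units:
        blocks.setdefault(u[block_key], []).append(u)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, u in enumerate(members):
            assignment[u["id"]] = "treatment" if i < half else "control"
    return assignment

# Toy example: 16 villages in 4 provinces; each province splits 2 vs 2.
villages = [{"id": f"v{i}", "province": f"P{i % 4}"} for i in range(16)]
print(block_randomize(villages, "province"))
```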

Strong measurement strategy:

  • Naturalistic behavioral outcomes: independent second $1,000 projects in 560 villages
  • Entire second development program introduced for measurement purposes

Pre-registration: lots of sign-offs, buy-in, enthusiasm

DRC Results

On measure after measure after measure, the distribution of outcomes in treatment and control were identical.

Reception

  • Shock among staff: these were the people closest to us; almost a grieving process
  • Messaging: political embarrassment, but implementation was sound and the focus on learning was applauded
  • But confusion: If not CDR, then what? Our ability to answer this was limited.

Broader literature and Impact

  • A collection of other RCTs similarly found null or mixed results: Liberia, Sierra Leone, Afghanistan…

  • A lot of work by IRC and DfID to make sense of this.

The major implication seems to have been picked up: CDR (CDD) might make sense for implementing projects but not for improving governance. Allocate resources accordingly.

Perhaps a bigger check on the idea that external actors can (or should) be trying to alter local governance structures.

Since then:

  • IRC has largely shifted out of governance interventions, focusing instead on areas such as:

    • Treatment of Acute Malnutrition
    • Contraception
    • Humanitarian Immunization
    • Multi-purpose Cash Assistance

Expansion

A flourishing of studies

  • Expansion of studies in the last 20 years
  • Meta-analysis possible in more and more areas (see the sketch after this list)
  • Long-term impact assessment possible
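When several comparable studies exist, their estimates can be pooled. A minimal sketch of the standard DerSimonian-Laird random-effects estimator used in many such meta-analyses (the five study estimates below are purely illustrative):

```python
import numpy as np

def random_effects_meta(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of study-level estimates."""
    y = np.asarray(estimates, dtype=float)
    v = np.asarray(std_errors, dtype=float) ** 2
    w = 1 / v                                # fixed-effect weights
    y_fe = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - y_fe) ** 2)          # Cochran's Q statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)  # between-study variance
    w_re = 1 / (v + tau2)                    # random-effects weights
    pooled = np.sum(w_re * y) / np.sum(w_re)
    return pooled, np.sqrt(1 / np.sum(w_re)), tau2

est, se, tau2 = random_effects_meta([0.10, -0.02, 0.05, 0.00, 0.08],
                                    [0.04, 0.05, 0.06, 0.03, 0.07])
print(f"pooled effect = {est:.3f} (SE {se:.3f}), tau^2 = {tau2:.4f}")
```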

Irish development NGOs and academic researchers are playing a big role.

  • Concern has run multiple major RCTs; GOAL is also engaging
  • TIME is punching well above its weight, with multiple collaborations with the World Bank, FCDO, and others

What kinds of findings are emerging?

Bad news on Micro-credit

An average increase of less than 8 percent of the current average profit, and less than 5 percent of the standard deviation, is not likely to be a transformative change for a household. (Meager 2019)

Bad news on information and accountability (Metaketa I)

No evidence that getting information to voters about political malfeasance affects vote choice (Dunning et al. 2019)!

Some bright news on a DDR-like intervention

A recent Liberia study (Blattman) looks specifically at Cognitive Behavioral Therapy with at-risk youths post-conflict.

CBT shows some surprisingly encouraging results after 10 years

CBT Liberia project (Blattman et al.)

  • Not a UN intervention. Initial work by local NGO NEPI with IPA support (largely UK aid funded)
  • Almost all effects concentrated in a small group with severe initial problems (targeting)

Good news on community based natural resource management (Metaketa III)

Community monitoring reduces extraction rates (and has other benefits), but effects are heterogeneous and only detectable when pooling across studies.

Good news on Cash

Meta-analysis of 115 studies of 72 UCT programs in middle- and low-income countries: strong and positive average treatment effects on 10 of 13 outcomes: monthly household total and food consumption, monthly income, labor supply, school enrollment, food security, psychological well-being, total assets, financial assets, and children’s height-for-age (Crosta et al.).

Good news on migration

  • Bryan, Chowdhury, and Mobarak (2014) on seasonal migration (encouragement design)
  • Mobarak, Sharif, and Shrestha (2023) on international migration (natural experiment using lotteries)

Good news on ‘graduation programs’

Multifaceted programs combine components (assets, skills, training) to help move people out of poverty traps (Banerjee et al. 2015).

A lot of work, including by TIME, on optimizing the design – e.g., understanding gender aspects.

High level conclusions

  • Many nulls (both in individual studies and meta-analyses)

  • Much heterogeneity

  • Heterogeneity across fields

  • It seems lots and lots of interventions don’t work

  • We systematically over-estimate program effectiveness

  • But for all that, clearly a cumulation of knowledge

[Features of RCTs that make nulls more likely]

They are more likely to:

  • be prospective (protecting you from the lure of retrospective “natural experiments”)
  • be independent
  • use more distal behavioral measures
  • get the statistics right
  • include adjustments for multiple comparisons (see the sketch after this list)
  • employ preanalysis plans
  • be large (and unhideable!)
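On the multiple-comparisons point: with many outcomes, a raw p-value near 0.05 is expected by chance, and adjustment removes it. A minimal sketch of one standard correction, the Benjamini-Hochberg step-up procedure (the p-values are illustrative):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return which hypotheses are rejected while controlling the
    false discovery rate at level alpha (Benjamini-Hochberg step-up)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # find the largest k with p_(k) <= (k/m) * alpha ...
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    # ... and reject exactly the k_max smallest p-values
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected

# With 10 outcomes, a lone p = 0.03 no longer counts as a finding.
pvals = [0.03, 0.20, 0.45, 0.61, 0.08, 0.74, 0.33, 0.90, 0.12, 0.55]
print(benjamini_hochberg(pvals))  # all False: nothing survives adjustment
```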

What gets picked up

What do we know about the impact of impact studies?

An irony for impact evaluation is that, for the most part, there is no good evidence on the impact of impact evaluations themselves.

Impact evaluations have all the features that make impact hard to study.

In preparation for the talk I spoke with a few people working on policy and research at

  • (former) DfID
  • (former?) USAID
  • WFP
  • IRC
  • others

Where did they see research having an impact?

Learning about learning: a little evidence

Exposed policy makers to a single experimental study, a single observational study, and meta-analyses of experimental studies

Learning about learning: practitioners put weight on RCTs, on meta-analyses, and on negative evidence

DDR: a (mostly) stalled agenda

We saw null results from the Sierra Leone observational DDR project

  • 20 years later, what do we know?

  • We urged the UN and others to implement some DDR RCTs

  • We tried with UN in Haiti but initial fieldwork suggested the program was unlikely to be effective

  • There is still not a single completed RCT of a UN-style DDR program

Major design questions remain unanswered:

  • Should they focus on ex-combatants or youth in general?
  • Should authority structures be employed or ignored?
  • Should opportunities be expanded case-by-case or through larger development plans?

However, the Blattman study is now having influence (though it’s still just one study).

We are currently working on a German-funded project in Nigeria, directly influenced by the Blattman study, following literature reviews.

Instances where negative results likely changed policy

“In general, it is easier to use evidence to stop programmes. … It has far larger reputational risk – there is evidence and documents that could be requested … that state advice the minister should not do something that is a risk.

It is a much smaller risk if a minister did not do what someone advised based on evidence. The argument is that there are many good things they could do so they can pick and not pick.”

  • CDR: Reduced in some portfolios, restructured in others

  • Microcredit

Null results also helpful for simplifying

Bednets example

  • Citizens were historically charged for bednets in malarial zones based on quasi-behavioral arguments.

  • Cohen and Dupas varied whether bednets were subsidized and found no deterioration in the quality of usage (but gains in uptake)

In 2009, the British government cited the study in calling for the abolition of user fees for health products and services in poor countries. (IPA)

Targeting

World Food Program recently worried about the effectiveness of aid-targeting algorithms:

  • Tried multiple versions
  • No evidence of marked differences
  • Go with simpler models
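The logic of that comparison can be sketched in a few lines (purely hypothetical data and models, not WFP’s actual algorithms or results): when one proxy carries most of the signal, a richer model buys little targeting accuracy over a simple cutoff rule.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5_000

# Hypothetical households: one strong poverty proxy plus noise.
assets = rng.normal(0, 1, n)
consumption = 0.8 * assets + rng.normal(0, 0.8, n)
poor = consumption < np.quantile(consumption, 0.3)   # true bottom 30%

# Simple rule: flag the asset-poorest 30%.
simple_flag = assets < np.quantile(assets, 0.3)

# "Complex" rule: regression score on many mostly-irrelevant proxies.
X = np.column_stack([np.ones(n), assets] +
                    [rng.normal(0, 1, n) for _ in range(8)])
beta, *_ = np.linalg.lstsq(X, consumption, rcond=None)
complex_flag = (X @ beta) < np.quantile(X @ beta, 0.3)

for name, flag in [("simple ", simple_flag), ("complex", complex_flag)]:
    print(name, "share correctly classified:",
          round(float((flag == poor).mean()), 3))
```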

Instances where impact studies led to design improvements

Many of these:

  • WFP is now regularly doing “lean” evaluations: rapid low cost randomized pilot interventions to decide whether or how to proceed.

  • A WFP Jordan study uses a school-feeding menu-change pilot to learn about the optimal design for boosting school attendance

An instance where positive results likely supported a policy

When it should have:

  • World Food Program decided to largely shift to cash before the evidence came in, but was able to refer to the evidence to support the policy.

  • DfID also shifted to cash before the evidence, but relied on evidence later to defend it

  • Germany was much more hesitant

When it maybe shouldn’t have:

Community monitoring of health workers

  • One small study reported enormous effects and was very influential—or at least much cited to support other interventions
  • Later (larger) replications found much weaker evidence (Raffler et al. and GOAL) or that community monitoring was not better than simply paying workers (Voors et al.). Will they have the same impact?

Looking ahead

For researcher / practitioner partnerships

Build trust at all levels

  • Invest in partnerships: Joint communities of practice, be present

  • Work across methodological divides: seek to integrate quantitative and qualitative knowledge rather than treating them as alternative perspectives

  • Engage ethically: minimize interference; step back when unexpected risks arise; adhere to the principle of justice

  • Align “central” (HQ, global) learning agendas and “local” agendas: neither pure top-down nor pure bottom-up approaches are working, for different reasons

  • Communicate: use MOUs and share PAPs; be clear about the purpose and about what implications will be drawn from different findings

For researcher / practitioner partnerships

A best practice model from WFP:

  • Head office-country office consultations on major learning questions
  • Funding to support country offices with projects that align with agenda
  • Long-term relations with research teams (DIME, external researchers)

Twin focus:

  • Internal focus: using evidence for decisions
  • External focus: self-consciously working with a community of practice to share agendas

For ODA actors

The biggest lesson is maybe that many development interventions are likely ineffective, and those that are effective are likely not effective everywhere.

Expect to have your priors challenged.

Knowing that things so often do not work out as we expect has both welfare and political implications.

Two responses:

  • A: Regroup and focus on what works
  • B: Place more value on learning

For ODA actors

Approach A is to focus on “best buys” in development

  • A lot to be said for this, especially when resources are scarce and you want to be sure of impact.

  • But a focus on doing “what works” means reduced investment in things that

    • don’t work
    • haven’t been shown to work (or haven’t been tried)

This:

  • can prevent innovation
  • can leave the biggest and hardest questions unaddressed

For ODA actors

Approach B: Focus on learning. In particular: use impact evaluations more as a tool for learning than for accountability

Accountability goal

  • Can threaten trust

  • Discourage risk taking

  • Impact evaluation is a noisy tool for accountability anyhow

  • Implementers are not (always) responsible for impact; impact depends on the design (commissioners are also responsible)

Learning agenda:

  • Knowledge is a public good and the impacts (in principle) can be large: a contribution in its own right

  • Even if the evidence on individual projects is noisy, it cumulates (see the sketch after this list)

  • Implications for which interventions, or aspects of interventions, should be examined:

    • goal is portable knowledge
    • focus on variations in treatment
  • Learning agenda is a shared agenda, shared with development partners
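The cumulation point in a few lines: under the simplifying assumption of independent, equally precise studies (illustrative numbers, not drawn from any portfolio), the pooled standard error shrinks with the square root of the number of studies.

```python
import math

# Precision-weighted pooling of equally noisy, independent studies.
se_single = 0.10                     # illustrative standard error of one study
for n_studies in (1, 4, 16, 64):
    se_pooled = se_single / math.sqrt(n_studies)
    detectable = 1.96 * se_pooled    # rough threshold for a two-sided 5% test
    print(f"{n_studies:3d} studies -> pooled SE {se_pooled:.3f}; "
          f"effects above ~{detectable:.3f} detectable")
```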

Last word for fellow researchers

  • Don’t be plumbers: getting the details of implementation right is not our comparative advantage and not our job; our challenge is to identify which general ideas are portable across settings
  • Maintain an independent agenda: connecting impact evidence to more fundamental theories of development processes
  • That, perhaps, is where the biggest impacts will be

References

Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei, William Parienté, Jeremy Shapiro, Bram Thuysbaert, and Christopher Udry. 2015. “A Multifaceted Program Causes Lasting Progress for the Very Poor: Evidence from Six Countries.” Science 348 (6236): 1260799.
Bryan, Gharad, Shyamal Chowdhury, and Ahmed Mushfiq Mobarak. 2014. “Underinvestment in a Profitable Technology: The Case of Seasonal Migration in Bangladesh.” Econometrica 82 (5): 1671–1748.
Dunning, Thad, Guy Grossman, Macartan Humphreys, Susan D. Hyde, Craig McIntosh, and Gareth Nellis. 2019. Information, Accountability, and Cumulative Learning: Lessons from Metaketa I. Cambridge University Press.
Meager, Rachael. 2019. “Understanding the Average Impact of Microcredit Expansions: A Bayesian Hierarchical Analysis of Seven Randomized Experiments.” American Economic Journal: Applied Economics 11 (1): 57–91.
Mobarak, Ahmed Mushfiq, Iffath Sharif, and Maheshwor Shrestha. 2023. “Returns to International Migration: Evidence from a Bangladesh-Malaysia Visa Lottery.” American Economic Journal: Applied Economics 15 (4): 353–88.