Experiments workflow

Macartan Humphreys

1 Experimentation: Workflows

Good questions studied well

1.1 Outline

  • Scope for experimentation
  • Ethics of experiments
  • Open science workflows

1.2 When to experiment

Prospects and priorities

1.2.1 Prospects

  • Whenever someone is uncertain about something they are doing (all the time)
  • Whenever someone hits scarcity constraints
  • When people have incentives to demonstrate that they are doing the right thing (careful…)

1.2.2 Priorities

  • Advice: If you can, start from theory and find an intervention, rather than the other way around.
  • Advice: If you can, go for structure rather than gimmicks.
  • Advice: In attempts to parse, beware of generating unnatural interventions (how should a voter think of a politician who describes his policy towards Korea in detail but does not mention the economy? Is not mentioning the economy sending an unintended message?)

1.2.3 Innovative designs

  • Randomization of where police are stationed (India)
  • Randomization of how government tax collectors get paid (do they get a share?) (Pakistan)
  • Randomization of the voting rules for determining how decisions get made (Afghanistan)
  • Random assignment of populations to peacekeepers (Liberia)
  • Random assignment of ex-combatants out of their networks (Indonesia)
  • Randomization of students to ethnically homogeneous or ethnically diverse schools (anywhere?)

1.3 Ethics

1.3.1 Constraint: Is it ethical to manipulate subjects for research purposes?

  • There is no foundationless answer to this question. So let’s take some foundations from the Belmont Report and seek to ensure:

    1. Respect for persons
    2. Beneficence
    3. Justice
  • Unfortunately, operationalizing these requires further ethical theories. Let’s assume that (1) is operationalized by informed consent (a very liberal idea). We are a bit at sea for (2) and (3) (the Belmont Report suggests something like a utilitarian solution).

  • The major focus on (1) by IRBs might follow from the view that if subjects consent, then they endorse the ethical calculations made for (2) and (3): they think that it is good and fair.

  • This is a little tricky, though, since the study may not be good or fair because of implications for non-subjects.

1.3.2 Is it ethical to manipulate subjects for research purposes?

  • The problem is that many (many) field experiments have nothing like informed consent.

  • For example, whether the government builds a school in your village, whether an ad appears on your favorite radio show, and so on.

  • Consider three cases:

    1. You work with a nonprofit to post (true?) posters about the crimes of politicians on billboards to see effects on voters
    2. You hire confederates to offer bribes to police officers to see if they are more likely to bend the law for coethnics
    3. The British government asks you to work on figuring out how the use of water cannons helps stop rioters from rioting

  • In all cases, there is no consent given by subjects.

  • In cases 2 and 3, the treatment is possibly harmful for subjects, and the results might also be harmful. But even in case 1, there could be major unintended harmful consequences.

  • In cases 1 and 3, however, the “intervention” is within the sphere of normal activities for the implementer.

1.3.3 Constraint: Is it ethical to manipulate subjects for research purposes?

  • Sometimes it is possible to use this point of difference to make a “spheres of ethics” argument for “embedded experimentation.”

  • Spheres of Ethics Argument: Experimental research that involves manipulations that are not normally appropriate for researchers may nevertheless be ethical if:

    • Researchers and implementers agree on a division of responsibility where implementers take on responsibility for actions
    • Implementers have legitimacy to make these decisions within the sphere of the intervention
    • Implementers are indeed materially independent of researchers (no swapping hats)

1.3.4 Constraint: Is it ethical to manipulate subjects for research purposes?

  • Difficulty with this argument:
    • Question begging: How to determine the legitimacy of the implementer? (Can we rule out Nazi doctors?)

  • Otherwise: keep the focus on consent, and desist if consent is not possible.

1.4 Transparency & Experimentation

1.4.1 Transparent workflows

Experimental researchers are deeply engaged in the movement towards more transparent social science research.

  • Analytic replication. This should be a no-brainer. Set everything up so that replication is easy. Use Quarto, R Markdown, or similar. Or produce your replication code as a package.
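
For concreteness, a minimal sketch of what a one-click replication script can look like in R (the file names, variables, and model here are hypothetical placeholders, not from any particular study):

```r
# replicate.R: one script that reproduces every number in the paper.
# renv::init() / renv::snapshot() can pin package versions so that
# others can restore the same environment later.

set.seed(2024)                               # fix any randomness
dat <- read.csv("data/clean_data.csv")       # post-cleaning, pre-manipulation data
fit <- lm(outcome ~ treatment, data = dat)   # the analysis reported in the paper

# export results so every table in the paper traces back to code output
write.csv(broom::tidy(fit), "output/table_1.csv", row.names = FALSE)
```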

1.4.2 Contentious Issues

Contentious issues (mostly):

  • Data. How soon should you make your data available? My view: as soon as possible, along with working papers and before publication. Before it affects policy, in any case. Own the ideas, not the data.

    • Hard-core position: no citation without (analytic) replication. Perhaps. Non-replicable results should not be influencing policy.
  • Where should you make your data available? Dataverse is focal for political science. Not your personal website (mea culpa).

  • What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can, but at least post-cleaning and pre-manipulation.

1.4.3 Open science checklist

  • Should you register? Hard to find reasons against, but the case is strongest in the testing phase rather than the exploratory phase.

  • Registration: When should you register? My view: before treatment assignment. (Not just before analysis; mea culpa.)

  • Registration: Should you deviate from a preanalysis plan if you change your mind about optimal estimation strategies? My view: yes, but make the case and describe both sets of results.

1.5 Pre-registration rationales and structures

1.5.1 Two distinct rationales for registration

  • File drawer bias (Publication bias)

  • Analysis bias (Fishing)

1.5.2 File drawer bias

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.

– Even if each individual study is indisputably correct, the account in the research record – that \(X\) affects \(Y\) in 100% of cases – will be wrong.
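
A quick simulation makes the logic concrete (a sketch; the study count, sample size, and effect size are illustrative, not from any real literature):

```r
# File drawer bias: X truly affects Y in half of all studies,
# but only studies with p < .05 get written up.
set.seed(42)
n_studies <- 1000; n <- 100
true_effect <- rep(c(0, 0.5), each = n_studies / 2)

p_values <- sapply(true_effect, function(b) {
  x <- rnorm(n)
  y <- b * x + rnorm(n)
  summary(lm(y ~ x))$coefficients[2, 4]   # p-value on x
})

published <- p_values < 0.05
mean(true_effect != 0)   # truth: X matters in 50% of studies
mean(published)          # about half of all studies get written up
# But every published study reports an effect, so the research
# record says "X affects Y" in 100% of published cases.
```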

1.5.3 File drawer bias

Exacerbated by:

– Publication bias – the positive results get published

– Citation bias – the positive results get read and cited

– Chatter bias – the positive results get blogged, tweeted, and TEDed.

1.5.4 Analysis bias (Fishing)

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or enjoy discretion to select statistical models after seeing \(X\) and \(Y\) in each case.

– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.
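
A sketch of how this discretion inflates false positives: suppose (hypothetically) the researcher tries ten outcome measures for a treatment that affects none of them, and reports the best-looking one:

```r
# Fishing: X has no effect on any outcome, but the researcher
# may pick the most significant of k candidate outcome measures.
set.seed(42)
k <- 10; n <- 100
best_p <- replicate(1000, {
  x <- rnorm(n)
  p <- replicate(k, summary(lm(rnorm(n) ~ x))$coefficients[2, 4])
  min(p)   # report only the best-looking outcome
})
mean(best_p < 0.05)   # about 0.40, far above the nominal 0.05
```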

1.5.5 Analysis bias (Fishing)

– Try it yourself: An Exact Fishy Test (https://macartan.shinyapps.io/fish/)

– What’s the problem with this test?

1.5.6 Evidence-Proofing: Illustration

  • When your conclusions do not really depend on the data

  • E.g.:
    – some evidence will always support your proposition
    – some interpretation of the evidence will always support your proposition

  • Knowing the mapping from data to inference in advance gives a handle on the false positive rate.
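
A back-of-the-envelope calculation shows why: a single pre-specified test at level \(\alpha\) has false positive rate \(\alpha\), while discretion to choose among \(k\) independent tests gives

\[
\Pr(\text{at least one significant result}) = 1 - (1 - \alpha)^k,
\]

which for \(\alpha = 0.05\) and \(k = 10\) is already about \(0.40\).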

1.5.7 The scope for fishing

1.5.8 Evidence from political science

Source: Gerber and Malhotra

1.5.9 More evidence from TESS

  • Malhotra tracked 221 TESS (Time-sharing Experiments for the Social Sciences) studies.
  • 20% of the null studies were published; 65% were not even written up (file drawer, or anticipation of publication bias).
  • 60% of studies with strong results were published.

Implications are:

  • the population of published results is not representative
  • (subtler) individual published studies are also more likely to be overestimates
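
A back-of-the-envelope sketch of the first implication (the 50/50 split between null and strong studies is an assumption for illustration, not a figure from the TESS data):

```r
# With publication rates of 20% for null studies and 60% for strong
# ones, the published record overrepresents strong results.
share_strong <- 0.5                  # assumed true share of strong results
pub_null <- 0.2; pub_strong <- 0.6   # publication rates from the slide

share_strong * pub_strong /
  (share_strong * pub_strong + (1 - share_strong) * pub_null)
# = 0.75: strong results are 75% of the published record,
#   against 50% in the population of studies
```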

1.5.10 The problem

  • Summary: we do not know when we can or cannot trust claims made by researchers.

  • [Not a tradition-specific claim]

1.5.11 Registration as a possible solution

Simple idea:

  • It’s about communication:
    • just say what you are planning on doing before you do it
    • if you don’t have a plan, say that
    • if you do things differently from what you were planning to do, say that

1.6 Worries and Myths around registration

1.6.1 Myth: Concerns about fishing presuppose researcher dishonesty

  • Fishing can happen in very subtle ways, and may seem natural and justifiable.

  • Example:

    • I am interested in whether more democratic institutions result in better educational outcomes.
    • I examine the relationship between institutions and literacy and between institutions and school attendance.
    • The attendance measure is significant and the literacy one is not. Puzzled, I look more carefully at the literacy measure and see various outliers and indications of measurement error. As I think more, I realize too that literacy is a slow-moving variable and may not be the best measure anyhow. I move forward and analyze the attendance measure only, perhaps conducting new tests, albeit with the same data.

1.6.2 Structural challenge

Our journal review process is largely organized around advising researchers on how to adjust their analyses in light of findings in the data.

1.6.3 Myth: Fishing is technique-specific

  • Frequentists can do it.

  • Bayesians can do it too.

  • Qualitative researchers can also do it.

  • You can even do it with descriptive statistics.

1.6.4 Myth: Fishing is estimand-specific

  • You can do it when estimating causal effects
  • You can do it when studying mechanisms
  • You can do it when estimating counts

1.6.5 Myth: Registration only makes sense for experimental studies, not for observational studies

  • The key distinction is between prospective and retrospective studies.

  • Not between experimental and observational studies.

  • A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects

  • A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.

1.6.6 Worry: Registration will create administrative burdens for researchers, reviewers, and journals

  • Registration produces some burden, but it does not require creating content that is not needed anyway.

  • It does shift the preparation of analyses forward.

  • And it can increase the burden of developing analysis plans even for projects that do not work out. But that is, in part, the point.

  • The upside is that the ultimate analyses may be much easier.

1.6.7 Worry: Registration will force people to implement analyses that they know are wrong

  • Most arguments for registration in social science advocate for non-binding registration, where deviations from designs are possible, though they should be described.
  • Even if registration does not prevent such deviations, a merit is that it makes them visible.

1.6.8 Myth: Replication (or other transparency practices) obviates the need for registration

  • There are lots of good things to do, including replication.
  • Many of these do not substitute for each other. (How to interpret a fished replication of a fished analysis?)
  • And they likely act as complements.
  • Registration can clarify details of design and analysis and ensure early preparation of materials. Indeed, material needed for replication may be available even before data collection.
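
As one illustration, a design declared in code contains mock data and the registered analysis before any real data exist. Here is a minimal sketch using the DeclareDesign package (the sample size and effect size are placeholders):

```r
library(DeclareDesign)  # also attaches randomizr, fabricatr, estimatr

design <-
  declare_model(N = 200, U = rnorm(N),
                potential_outcomes(Y ~ 0.2 * Z + U)) +   # hypothetical effect
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +           # estimand
  declare_assignment(Z = complete_ra(N)) +               # random assignment
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")              # registered analysis

draw_data(design)        # mock data set: replication materials pre-data
diagnose_design(design)  # power, bias, coverage of the registered analysis
```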

1.6.9 Worry: Registration will put researchers at risk of scooping

  • But existing registries allow people to protect registered designs for some period
  • Registration may let researchers lay claim to a design

1.6.10 Worry: Registration will kill creativity

  • This is an empirical question. However, under a non-mandatory system researchers could:
    • register a plan for structured exploratory analysis, or
    • decide that exploration is at a sufficiently early stage that no substantive registration is possible, and proceed without registration.

1.6.11 Implications

  • In neither case would the creation of a registration facility prevent exploration.

  • What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.

  • Registration communicates whether researchers are engaged in exploration or not. We love exploration and should be proud of it.

1.6.12 Punchline

  • Do it!
  • But if you have reasons to deviate, deviate transparently
  • Don’t implement bad analysis just because you pre-registered
  • Instead: reconcile

1.7 Reconciliation

Incentives and strategies

1.7.1 Reconciliation

Table 1: Illustration of an inquiry reconciliation table.

| Inquiry       | In the preanalysis plan | In the paper | In the appendix |
|---------------|-------------------------|--------------|-----------------|
| Gender effect | X                       | X            |                 |
| Age effect    |                         | X            |                 |

1.7.2 Reconciliation

Table 2: Illustration of an answer strategy reconciliation table.

| Inquiry       | Following A from the PAP    | Following A from the paper  | Notes |
|---------------|-----------------------------|-----------------------------|-------|
| Gender effect | estimate = 0.6, s.e. = 0.31 | estimate = 0.6, s.e. = 0.25 | Difference due to change in control variables [provide cross-references to tables and code] |