Experimenting
Cox & Reid (2000) define experiments as:
investigations in which an intervention, in all its essential elements, is under the control of the investigator.
Two types of control:
Experimental studies use research designs in which the researcher uses:
Let’s discuss:
Then: a deep dive into actual experiments
Then: Plans for our own
– Model: a set of models of what causes what and how
– Inquiry: a question stated in terms of the model
– Data strategy: the set of procedures we use to gather information from the world (sampling, assignment, measurement)
– Answer strategy: how we summarize the data produced by the data strategy

Design declaration is telling the computer (and readers) what M, I, D, and A are.
Design diagnosis is figuring out how the design will perform under imagined conditions.
Estimating “diagnosands” like power, bias, RMSE, error rates, ethical harm, and “amount learned”.
Diagnosis takes account of model uncertainty: it aims to identify models for which the design works well and models for which it does not.
Redesign is the fine-tuning of the data and answer strategies to understand how changing them affects the diagnosands.
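To make the declare–diagnose–redesign loop concrete, here is a minimal sketch in Python. (The canonical tool is the DeclareDesign package in R; the effect size, sample sizes, and simulation counts below are illustrative assumptions, not values from the text.)

```python
# A minimal M-I-D-A declaration and diagnosis, sketched in Python.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def run_design(n=100, tau=0.3, sims=2000):
    """Simulate one design many times; return estimates and p-values."""
    estimates, pvals = [], []
    for _ in range(sims):
        # M: potential outcomes with a constant treatment effect tau
        y0 = rng.normal(0, 1, n)
        y1 = y0 + tau
        # D: complete random assignment of half the units
        z = rng.permutation(n) < n // 2
        y = np.where(z, y1, y0)
        # A: difference in means, with a t-test for the p-value
        estimates.append(y[z].mean() - y[~z].mean())
        pvals.append(stats.ttest_ind(y[z], y[~z]).pvalue)
    return np.array(estimates), np.array(pvals)

# Diagnosis: diagnosands for the inquiry I (the average treatment effect, tau)
est, p = run_design()
print("bias: ", est.mean() - 0.3)
print("RMSE: ", np.sqrt(((est - 0.3) ** 2).mean()))
print("power:", (p < 0.05).mean())

# Redesign: vary one feature (n) and watch the diagnosands move
for n in (50, 100, 200, 400):
    _, p = run_design(n=n)
    print(f"n = {n}: power = {(p < 0.05).mean():.2f}")
```

A diagnosand here is just a summary of the simulated sampling distribution, so adding a new one (e.g., an error rate) is one extra line; this is what makes diagnosis under many imagined models cheap.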
Good questions studied well
Randomization of
There is no foundationless answer to this question.
The Belmont principles are commonly used for guidance: (1) respect for persons, (2) beneficence, and (3) justice.
Unfortunately, operationalizing these requires further ethical theories. (1) is often operationalized by informed consent (a very liberal idea); (2) and (3) sometimes by more utilitarian principles.
The major focus on (1) by IRBs might follow from the view that if subjects consent, then they endorse the ethical calculations made for (2) and (3): they think that it is good and fair.
Trickiness: can a study be good or fair because of implications for non-subjects?
Many (many) field experiments have nothing like informed consent.
For example, whether the government builds a school in your village, whether an ad appears on your favorite radio show, and so on.
Consider three cases:
In all cases, there is no consent given by subjects.
In cases 2 and 3, the treatment is possibly harmful for subjects, and the results might also be harmful. But even in case 1, there could be major unintended harmful consequences.
In cases 1 and 3, however, the “intervention” is within the sphere of normal activities for the implementer.
Sometimes it is possible to use this point of difference to make a “spheres of ethics” argument for “embedded experimentation.”
Spheres of Ethics Argument: Experimental research that involves manipulations that are not normally appropriate for researchers may nevertheless be ethical if:
Otherwise, keep the focus on consent, and desist if consent is not possible.
Political science researchers should respect autonomy, consider the wellbeing of participants and other people affected by their research, and be open about the ethical issues they face.
Political science researchers have an individual responsibility to consider the ethics of their research related activities and cannot outsource ethical reflection to review boards, other institutional bodies, or regulatory agencies.
These principles describe the standards of conduct and reflexive openness that are expected of political science researchers. … [In cases of reasonable deviations], researchers should acknowledge and justify deviations in scholarly publications and presentations of their work.
[Note: no general injunction against]
Researchers should generally avoid harm when possible, minimize harm when avoidance is not possible, and not conduct research when harm is excessive.
do not limit concern to physical and psychological risks to the participant.
cases in which research that produces impacts on political processes without consent of individuals directly engaged by the research might be appropriate. [examples]
Studies of interventions by third parties do not usually invoke this principle on impact. [details]
This principle is not intended to discourage any form of political engagement by political scientists in their non-research activities or private lives.
researchers should report likely impacts
Mentors, advisors, dissertation committee members, and instructors
Graduate programs in political science should include ethics instruction in their formal and informal graduate curricula;
Editors and reviewers should encourage researchers to be open about the ethical decisions …
Journals, departments, and associations should incorporate ethical commitments into their mission, bylaws, instruction, practices, and procedures.
Experimental researchers are deeply engaged in the movement towards more transparent social science research.
Contentious issues (mostly):
Data. How soon should you make your data available? My view: as soon as possible, along with working papers and before publication. Before it affects policy in any case. Own the ideas, not the data.
Where should you make your data available? Dataverse is focal for political science. Not a personal website (mea culpa).
What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can, but at least post-cleaning and pre-manipulation.
Should you register? Hard to find reasons against, but the case is strongest in the testing phase rather than the exploratory phase.
Registration: When should you register? My view: before treatment assignment. (Not just before analysis, mea culpa.)
Registration: Should you deviate from a preanalysis plan if you change your mind about optimal estimation strategies? My view: yes, but make the case and describe both sets of results.
File drawer bias (Publication bias)
Analysis bias (Fishing)
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.
– Even if each individual study is indisputably correct, the account in the research record (that \(X\) affects \(Y\) in 100% of cases) will be wrong.
Exacerbated by:
– Publication bias – the positive results get published
– Citation bias – the positive results get read and cited
– Chatter bias – the positive results get blogged, tweeted, and TEDed.
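The arithmetic above in simulation form, as a hedged sketch (the sample size, effect size, and publication filter are illustrative assumptions, not values from the text): when only significant positive results are written up, the published record says \(X\) affects \(Y\) every time, even though the truth is 50%.

```python
# File-drawer bias in miniature: X truly affects Y in half of all studies,
# but only studies with significant positive results get written up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_studies = 200, 1000

published, real_among_published = 0, []
for _ in range(n_studies):
    has_effect = rng.random() < 0.5      # truth: X matters in 50% of cases
    tau = 0.5 if has_effect else 0.0
    z = rng.permutation(n) < n // 2
    y = rng.normal(0, 1, n) + tau * z
    est = y[z].mean() - y[~z].mean()
    p = stats.ttest_ind(y[z], y[~z]).pvalue
    if p < 0.05 and est > 0:             # the publication filter
        published += 1
        real_among_published.append(has_effect)

print("share of studies run in which X matters: 0.50")
print("share of published studies reporting an effect: 1.00 (by construction)")
print("published:", published, "of", n_studies)
print("share of published effects that are real:",
      round(float(np.mean(real_among_published)), 2))
```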
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– But say that in each case researchers enjoy discretion to select measures for \(X\) or \(Y\), or to select statistical models after seeing \(X\) and \(Y\).
– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.
– Try the Exact Fishy Test (https://macartan.shinyapps.io/fish/)
– What’s the problem with this test?
When your conclusions do not really depend on the data. E.g.:
– some evidence will always support your proposition
– some interpretation of evidence will always support your proposition
Knowing the mapping from data to inference in advance gives a handle on the false positive rate.
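A back-of-the-envelope illustration (assuming independent outcomes, an assumption made here for arithmetic only): testing a null effect on one prespecified outcome gives a 5% false positive rate; with discretion to pick the best of \(K = 10\) outcomes it rises to \(1 - 0.95^{10} \approx 0.40\).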
Source: Gerber and Malhotra
Implications are:
Summary: we do not know when we can or cannot trust claims made by researchers.
[Not a tradition-specific claim]
Simple idea:
Lots of misunderstandings around registration
Fishing can happen in very subtle ways, and may seem natural and justifiable.
Example:
Our journal review process is largely organized around advising researchers on how to adjust analyses in light of findings in the data.
Frequentists can do it.
Bayesians can do it too.
Qualitative researchers can also do it.
You can even do it with descriptive statistics.
The key distinction is between prospective and retrospective studies.
Not between experimental and observational studies.
A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects
A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.
Registration will produce some burden but does not require the creation of content that is not needed anyway.
It does shift preparation of analyses forward.
And it can also increase the burden of developing analysis plans even for projects that don't work. But that is, in part, the point.
The upside is that the ultimate analyses may be much easier.
In neither case would the creation of a registration facility prevent exploration.
What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.
Registration communicates whether researchers are engaged in exploration or not. We love exploration and should be proud of it.
Incentives and strategies
| Inquiry | In the preanalysis plan | In the paper | In the appendix |
|---|---|---|---|
| Gender effect | X | X | |
| Age effect | X | | |
| Inquiry | Following A from the PAP | Following A from the paper | Notes |
|---|---|---|---|
| Gender effect | estimate = 0.6, s.e. = 0.31 | estimate = 0.6, s.e. = 0.25 | Difference due to change in control variables [provide cross-references to tables and code] |