Replication reflections

Macartan Humphreys

Thoughts

  1. More standardization on the producer side?
  2. Journal issues 1: Errors
  3. Journal issues 2: Re-analyses and field replications

First

  • The most important dimension of replication, for our fields, is forward looking: many-site field replication designed to cumulate knowledge.

  • Here, though, I focus on backward-looking replications, as these are of particular interest to journals and to individuals.

  • Note also a third category: learning more from re-analysis of existing studies.

The producer side

Making replication materials in political science

  • still very ad hoc approaches
  • many places
  • many file structures
  • not automatically verified
  • high access costs
  • shift to AI replication: risk that we outsource not just coding but understanding

The producer side: Introducing…

replicate_everything()

with Vermon Washington and Cord Masche

  • code with data
  • code without data or with links to data
  • replication packages

Next stage

re-analysis tabs:

  • mark up code edits?
  • summary re-interpretations
  • live together
  • populate with my studies! or with a particular journal’s recent studies?

The hard part:

  • adoption?

Journal issues 1: Errors and Corrections

Errors and corrections

I would like to live in a world in which:

  1. Critic spots error
  2. Critic informs author
  3. Author takes responsibility: checks, issues a correction on the journal website, and updates repo files. [Even better: tracked-change edits to the article!] Does not minimize issues. Thanks Critic in a footnote.
  4. Critic happy with footnote. Speedy process.

Errors and corrections

People think this will not / does not work because:

  1. Critics will not have incentives to spot mistakes.
  2. Critics deserve more credit for spotting mistakes.
  3. Authors will downplay importance of errors.

My thoughts: 1. Really? 2. Really? 3. For sure!

For 3: we may need the threat of third-party correction?

Status quo: Bloat risk

  • Critics dramatize risks
  • Focus on errors, not the overall reliability of results; spend a lot of time cataloging minor errors
  • Add a lot of additional material to make a full article, e.g., alternative analyses, checks for heterogeneity

Risks:

  • Dull articles, and lots of scope for new errors
  • Mediocre material passes on the coattails of the correction
  • Big push back from authors and ultimate confusion

Remedies?

  • Correction articles should be short! (two pages?)
  • Summarizing essential features and avoiding bloat
  • Contribution should be new code base more than article
  • Possibly provided as a markup of original?
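One way a correction could be delivered as a markup of the original is as a unified diff of the replication code. A minimal sketch using Python's standard `difflib` module; the file names, both script versions, and the age-filter error are hypothetical, made up purely for illustration (the scripts are treated as text and never run).

```python
import difflib

# Hypothetical original analysis script (as text) containing an error.
original = """\
df = load("survey.csv")
df = df[df.age > 18]
model = ols("y ~ treat", df)
""".splitlines(keepends=True)

# Hypothetical corrected version supplied by the critic.
corrected = """\
df = load("survey.csv")
df = df[df.age >= 18]  # correction: original dropped 18-year-olds
model = ols("y ~ treat", df)
""".splitlines(keepends=True)

# The correction is the markup itself: only changed lines are flagged.
markup_text = "".join(
    difflib.unified_diff(
        original,
        corrected,
        fromfile="original/analysis.py",
        tofile="correction/analysis.py",
    )
)
print(markup_text)
```

A diff keeps the contribution short and anchored to the original code base, which is the point of the two-page remedy.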

Bigger question: What is the unit of research output in the age of AI?

Journal issues 2: Extension / reappraisal replications

Extension / reappraisal replications

If about scope:

  • Should merit publication in their own right
  • Taking account of contribution to cumulation, as always

When is a re-analysis a correction?

Disagreement here: Some favor “gardens of forking paths.”

I worry a lot about:

  • Scope for negative fishing (unpreregistered re-analyses with many degrees of freedom)
  • A bad analysis debunking a reasonable analysis because it finds different results
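A rough sense of why negative fishing is a worry: suppose a real effect exists and each alternative specification detects it with probability 0.8 (an assumed power). If specifications were independent, the chance that at least one fails to find it is \(1 - 0.8^k\): about 0.67 for \(k = 5\) and 0.89 for \(k = 10\). Real specifications are correlated, so this overstates the risk, but the direction is the point. A small sketch checking the formula against simulation; the power, the values of \(k\), and independence are illustrative assumptions.

```python
import random


def prob_some_spec_fails(power: float, k: int) -> float:
    """Chance that at least one of k independent specifications
    fails to detect a real effect, each with the same power."""
    return 1 - power ** k


def simulate(power: float, k: int, sims: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo check: share of re-analyses in which a critic
    searching k specifications finds at least one 'null'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        # A specification fails to reject with probability 1 - power.
        if any(rng.random() >= power for _ in range(k)):
            hits += 1
    return hits / sims


for k in (1, 5, 10, 20):
    print(k, round(prob_some_spec_fails(0.8, k), 3), round(simulate(0.8, k), 3))
```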

I would like to see:

  • An expectation that re-analyses are justified on ex ante grounds, e.g., in the DeclareDesign framework.

Ex ante justification

  • Home ground dominance. Holding the original model \(M\) constant (i.e., the home ground of the original study), if you can show that a new answer strategy \(A'\) yields better diagnosands than the original answer strategy \(A\)…

  • Robustness to alternative models. You can show that a new answer strategy is robust to both the original model \(M\) and a new, also plausible, model \(M'\)

  • Model plausibility. If the diagnosands for a design with \(A'\) are worse than those under \(M\) but better under \(M'\), then justify by showing \(M'\) is more plausible than \(M\)
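These justifications can be checked by simulation before any re-analysis is run. The sketch below is a stdlib-Python stand-in, not DeclareDesign itself (an R package); the models and answer strategies are hypothetical. Under \(M\), treatment is randomly assigned; under \(M'\), units with high covariate \(X\) are more likely to be treated. \(A\) is a difference in means; \(A'\) is OLS of \(Y\) on treatment controlling for \(X\). Diagnosands are bias and RMSE for an assumed true effect of 0.5.

```python
import random
import statistics

TAU = 0.5  # hypothetical true average treatment effect (the estimand)


def draw(model: str, n: int, rng: random.Random):
    """Simulate one dataset under hypothetical model 'M' (random
    assignment) or "M'" (assignment confounded by x)."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    if model == "M":
        z = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    else:
        z = [1 if xi + rng.gauss(0, 1) > 0 else 0 for xi in x]
    y = [TAU * zi + xi + rng.gauss(0, 1) for zi, xi in zip(z, x)]
    return x, z, y


def A(x, z, y):
    """Original answer strategy: difference in means."""
    treated = [yi for zi, yi in zip(z, y) if zi == 1]
    control = [yi for zi, yi in zip(z, y) if zi == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def residualize(v, x):
    """Residuals from a simple OLS regression of v on x (with intercept)."""
    mx, mv = statistics.fmean(x), statistics.fmean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - slope * (xi - mx) for xi, vi in zip(x, v)]


def A_prime(x, z, y):
    """New answer strategy: coefficient on z in OLS of y on z and x,
    computed via the Frisch-Waugh-Lovell theorem."""
    rz, ry = residualize(z, x), residualize(y, x)
    return sum(a * b for a, b in zip(rz, ry)) / sum(a * a for a in rz)


def diagnose(model, estimator, n=200, sims=500, seed=0):
    """Bias and RMSE of an estimator across simulated datasets."""
    rng = random.Random(seed)
    est = [estimator(*draw(model, n, rng)) for _ in range(sims)]
    bias = statistics.fmean(est) - TAU
    rmse = statistics.fmean([(e - TAU) ** 2 for e in est]) ** 0.5
    return {"bias": bias, "rmse": rmse}


for model in ("M", "M'"):
    for name, estimator in (("A", A), ("A'", A_prime)):
        d = diagnose(model, estimator)
        print(model, name, round(d["bias"], 3), round(d["rmse"], 3))
```

In this setup \(A'\) should have lower RMSE than \(A\) under \(M\) (home ground dominance) and stay roughly unbiased under \(M'\), where \(A\) does not (robustness). Whether \(M'\) is the more plausible model is the substantive argument no simulation settles.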

A reckoning is coming

  • We are going to have to become more willing to accept errors

  • An article that replicates everything and catalogues a million errors, soon there will be

  • Very many errors in code and in interpretation found there will be

  • We’ll have to figure out how to sort through them, and it can’t be just machines talking to machines