Lab²

On the Incubator for Collaborative and Transparent Economic Sciences | Macartan Humphreys

1 Outline

  • A stocktaking
  • A wishlist

2 Punchlines

Stocktaking

  • Extraordinary gains over 10 years

A workflow-oriented wishlist

  1. Reproduction: let’s get it out of the box
  2. Registration: let’s agree on a completeness criterion
  3. Reconciliation: let’s get this automated
  4. Reanalysis: let’s have re-analyses justified by design not data
  5. Replication (field replication): let’s confront scope condition confusion
  6. Cumulation: let’s make coordinated trials rolling
  7. Reporting: let’s communicate better at what_we_know.com

2.1 Stocktaking

We started in a bad place

2.2 Stocktaking

Lots of gains

  • Much more open access (e.g., EconStor)
  • Much greater access to reproduction data. Imperfect, but still major gains.
  • Serious reproduction initiatives (I4R discussion paper series has 163 papers)
  • Much greater reproduction success rates (I4R success rate > 50%)
  • Pre-publication reproduction increasingly common
  • Registration now mainstream
  • Information on priors commonly gathered

2.3 Quick check

AER today:

2.4 Quick check

  • Open access
  • Open data with DOI
  • Reproduced
  • Nice badge!

2.5 Implications

  • It’s getting better all the time
  • Let’s keep dreaming: Lab² should think big and systematic

3 Reproduction out of the box

3.1 Vultures article

I tried this one yesterday

 doedit "196461-V1\CODE\build_data_run_analysis_complie_PDF.do" 
...
* EDIT THE PATH TO THE REPLICATION FOLDER HERE
* BUT ALSO IN 
* ~/CODE/Python/fuzzy_merge_water.PY
...

* MAKE SURE TO COPY ols_spatial_HAC_W.ado from ~/CODE/Stata/ADO to your local folder
  • So a little bumpy: manual path edits; proprietary software; Python and Stata installations needed; TeX via shell pdflatex
  • but it’s there and seems to work

3.2 Discourse article

 doedit "198744-V1\BVWYAnalysisAll.do" 

One click

Not bad really

3.3 What I’d love

library("replicate_anything")

replicate(issn = "0002-828",  what = "fig_1", from_raw = FALSE,  format = "html")

yielding:

# > Replication of fig_1 from Bartling et al 2024 using pre-processed data

3.4 Reproduction out of the box

What I’d love:

library("replicate_anything")

get_code(issn = "0002-828",  what = "tab_1", from_raw = FALSE,  formatted = FALSE)

yielding:

# Code for tab_1 from Bartling et al 2024 using pre-processed data

"
  read.csv("0002-828/processed/tab_1.csv") |> 
  ggplot(..)
  
"  

3.5 Reproduction out of the box

  • And of course: ex ante we would then code and organize our files so as to make them work with replicate_anything

  • And even better: as a web interface to avoid local installations

Can someone build this?

4 Registration completeness

Problem: We are all registering now but we don’t know what we need to register

Deeper issue: Fuzziness about what a design is

4.1 Registration completeness

4.2 DeclareDesign: Main idea

  • Think of a research design as an interrogable object: design_1
  • Define: Models, Inquiries, Data Strategies, Answer Strategies
[Figure: the MIDA framework, linking Theory, Empirics, Simulations, Design, and Reality. Model (M): the worlds you consider, including the real world and imagined worlds. Inquiry (I): the question you ask. Data strategy (D): how you generate data — the data you’ll get, or simulated data. Answer strategy (A): how you analyze it — yielding your estimate, or a simulated estimate of a conjectured estimand (the answer you seek).]

4.3 A design

With DeclareDesign designs can be quite compact and readable:

b <- .3

design_1 <- 
  
  declare_model(
    N = 500,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, model = lm_robust)

4.4 A complete design

This is a complete design declaration.

  • it can be “run”: run_design(design_1)
  • and “diagnosed” diagnose_design(design_1)
| Inquiry | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean SE | Mean Estimand |
|---------|------|------|-------|----------|---------------|-------------|---------|---------------|
| ate | 0.00 | 0.10 | 0.78 | 0.97 | 0.30 | 0.11 | 0.11 | 0.30 |
|     | (0.00) | (0.00) | (0.02) | (0.01) | (0.00) | (0.00) | (0.00) | (0.00) |
  • We see the design has nice features: reasonable power, unbiasedness
  • Coverage is off though (why?)

4.5 Registration

The DeclareDesign idea is that the design object is the thing you register.

  • The design object captures:

    • your background assumptions about the conditions in which you are operating
    • what your question really is
    • how you plan to generate data
    • how you plan to analyze it
  • It’s short, complete, interrogable.

This is being done but it is a lift.

5 Reconciliation

Issues:

  • We are getting better at registration but deviations are the norm.

  • How to spot and make sense of deviations?

5.1 In practice

  • We start with design_1

  • Then we implement and we do a few things differently:

    • We gather data on fewer cases than we expected (at random)
    • We have some missingness in outcome measures (not at random)
    • We end up focusing on analysis in a subgroup
  • In addition, we learn things from that data that we only speculated about at the design stage.
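The implemented study can itself be declared. A minimal sketch of what design_2 might look like, reusing the structure of design_1 from section 4.3 (the realized N, the missingness mechanism, and the subgroup are illustrative placeholders, not values from the talk):

```r
library(DeclareDesign)

design_2 <-
  declare_model(
    N = 400,                                # fewer cases than planned (at random)
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z * (0.3 + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(
    Y = reveal_outcomes(Y ~ Z),
    R = rbinom(N, 1, prob = plogis(Y)),     # missingness related to outcomes (not at random)
    Y = ifelse(R == 1, Y, NA)) +
  declare_estimator(Y ~ Z, model = lm_robust,
                    subset = u_0 > 0)       # analysis ends up focused on a subgroup
```

Each deviation from design_1 then sits in the declaration itself, where compare_designs can find it.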

5.2 In practice

  • Try to represent the actually implemented research as an alternative design: design_2

  • Reconcile formally: compare_designs(design_1, design_2) to:

    • automate reconciliation
    • clarify nature of deviations
    • clarify implications of deviations
    • invite critique

5.3 Comparisons

If you provide design_2, with adjusted inputs, along with design_1 then readers can compare them.

| inquiry | estimator | diagnosand | design_1 | design_2 | difference |
|---------|-----------|------------|----------|----------|------------|
| ate | estimator | bias | 0.00 | 0.00 | -0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | coverage | 0.97 | 0.96 | -0.01 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_estimand | 0.30 | 0.30 | 0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_estimate | 0.30 | 0.30 | -0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_se | 0.11 | 0.12 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | power | 0.79 | 0.74 | -0.05* |
| | | | (0.01) | (0.01) | (0.01) |
| ate | estimator | rmse | 0.10 | 0.11 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | sd_estimate | 0.11 | 0.12 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | type_s_rate | 0.00 | 0.00 | 0.00 |
| | | | (0.00) | (0.00) | (0.00) |

5.4 Reconciliation and critique

Your reconciliation can clarify:

  1. how things are different
  2. whether things are different in relevant ways (e.g. under optimistic conditions)
  3. whether conclusions in fact depend on the reasons for the deviations

Proposal: attach both the original (registered) design and the reconciled design to your manuscript.

6 Design-based Critique

These design modification choices could just as easily be made by a different researcher who is re-analyzing your work

A researcher might propose altering the answer strategy, A. Currently common practice is to evaluate this decision based on how results change (robustness) and not (simply) based on ex ante properties.

7 Design-based Critique

Instead, justify reanalysis decisions with respect to:

  • Home ground dominance. Holding the original model constant (the “home ground” of the original study), show that a new answer strategy yields better diagnosands than the original

  • Robustness to alternative models. Demonstrate that a new answer strategy is robust to both the original model and a new, also plausible, model
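A sketch of how the home-ground check might be run in DeclareDesign: hold design_1 fixed, swap only the answer strategy, and compare diagnosands. The alternative estimator here (Lin covariate adjustment on u_0) is purely illustrative:

```r
library(DeclareDesign)

# Replace only the answer strategy of design_1 (from section 4.3),
# keeping its model -- the original study's "home ground" -- constant
design_alt <- replace_step(
  design_1, "estimator",
  declare_estimator(Y ~ Z, covariates = ~ u_0,
                    model = lm_lin, label = "estimator"))

# Does the new answer strategy deliver better diagnosands
# (lower RMSE, equal or better coverage) under the original model?
diagnose_design(design_1, design_alt)
```

The same comparison rerun under a second, also-plausible model would speak to the robustness criterion.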

8 Replication

There seems to be a lot of confusion about when a (field) replication succeeds or fails in a heterogeneous world

8.1 Replication

Tests of the form:

  • Do both results have the same sign?
  • Is the original effect in the CI of the new result (or vice versa)? Or even:
  • Can we reject the null that the two effects are the same?

don’t make much sense given substantial heterogeneity
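A toy simulation of the point (illustrative numbers; base R only): when sites have genuinely different true effects, the standard tests can “fail” even though each study estimates its own site’s effect without bias.

```r
set.seed(1)

# A heterogeneous world: the two sites have different true effects
tau_original    <- 0.5
tau_replication <- 0.1
n <- 500

run_study <- function(tau) {
  Z   <- rep(0:1, n / 2)
  Y   <- tau * Z + rnorm(n)
  fit <- lm(Y ~ Z)
  list(est = unname(coef(fit)["Z"]),
       se  = summary(fit)$coefficients["Z", "Std. Error"])
}

original    <- run_study(tau_original)
replication <- run_study(tau_replication)

# Common "replication tests": both can come back negative here,
# not because either study is wrong, but because the estimands differ
same_sign <- sign(original$est) == sign(replication$est)
in_ci     <- abs(original$est - replication$est) < 1.96 * replication$se
```

The tests answer “are these the same number?” when the real question is “should we have expected the same number?”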

8.2 But

  • Refusing to test shifts us to non-falsifiability

    • Do Posner et al.’s results threaten Björkman and Svensson or not?
  • Obvious: the exercise presupposes we are interested in population estimands, not sample estimands: so let’s own that

  • Less obvious: articles (findings) should be accompanied by explicit “generalization claims”: to what populations, under what conditions, and how far do results plausibly extend?

  • Takeaway: these are what need to be stated and tested

9 Cumulation

We have gotten better at coordinated trials:

9.1 Cumulation

Though they are almost never on random samples

A rolling trial would have:

  • admission criteria
  • priors on effects
  • sampling information
  • heterogeneity data

9.2 Cumulation

A rolling trial would:

  • provide a basis to determine ex ante if a trial should be implemented
  • imply a workflow designed to aggregate

Will someone set these up?

10 Reporting

Temptations to produce splashy headlines appear unbearable

10.1 Reporting

Need some way to keep focus on cumulative findings: though seems very contrary to instincts

Dream: A combination of this

10.2 Reporting

Dream: And this

10.3 Reporting

So we can collectively say: this is what the evidence says, about effects under these and those conditions

All at what_we_know.com

11 So…

  • We are moving rapidly in the right direction
  • Still money on the table
  • Can some of these be done in a way that also makes the lives of researchers easier rather than harder?