Lab\(^2\)

On the Incubator for Collaborative and Transparent Economic Sciences | Macartan Humphreys

1 Outline

A stocktaking
A wishlist

2 Punchlines

Stocktaking

Extraordinary gains over 10 years

A workflow-oriented wishlist

Reproduction: let’s get it out of the box
Registration: let’s agree on a completeness criterion
Reconciliation: let’s get this automated
Reanalysis: let’s have re-analyses justified by design not data
Replication (field replication): let’s confront scope condition confusion
Cumulation: let’s make coordinated trials rolling
Reporting: let’s communicate better what_we_know.com

2.1 Stocktaking

We started in a bad place

2.2 Stocktaking

Lots of gains

Much more open access; Econstor
Much greater access to reproduction data. Imperfect, but still major gains.
Serious reproduction initiatives (I4R discussion paper series has 163 papers)
Much greater reproduction success rates (I4R success rate > 50%)
Pre-publication reproduction increasingly common
Registration now mainstream
Information on priors commonly gathered

2.3 Quick check

AER today:

2.4 Quick check

Open access | Open data with DOI | Reproduced | Nice badge!

2.5 Implications

It’s getting better all the time
Let’s keep dreaming: Lab\(^2\) should think big and systematic

3 Reproduction out of the box

3.1 Vultures article

I tried this one yesterday

 doedit "196461-V1\CODE\build_data_run_analysis_complie_PDF.do" 
...
* EDIT THE PATH TO THE REPLICATION FOLDER HERE
* BUT ALSO IN 
* ~/CODE/Python/fuzzy_merge_water.PY
...

* MAKE SURE TO COPY ols_spatial_HAC_W.ado from ~/CODE/Stata/ADO to your local folder

So a little bumpy: manual path edits; proprietary software; Python and Stata installations needed, plus TeX (pdflatex called via the shell)
But it’s there and seems to work

3.2 Discourse article

 doedit "198744-V1\BVWYAnalysisAll.do"

One click

Not bad really

3.3 What I’d love

library("replicate_anything")

replicate(issn = "0002-828",  what = "fig_1", from_raw = FALSE,  format = "html")

yielding:

# > Replication of fig_1 from Bartling et al 2024 using pre-processed data

3.4 Reproduction out of the box

What I’d love:

library("replicate_anything")

get_code(issn = "0002-828",  what = "fig_1", from_raw = FALSE,  formatted = FALSE)

yielding:

# Code for fig_1 from Bartling et al 2024 using pre-processed data

"
  read.csv("0002-828/processed/fig_1.csv") |> 
  ggplot(..)
  
"

3.5 Reproduction out of the box

And of course: ex ante we would then code and organize our files so that they work with replicate_anything
And even better: as a web interface to avoid local installations

Can someone build this?
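
One way to picture the file organization this would require: each archive ships a small manifest that maps a target (say fig_1) to the scripts that produce it, and a dispatcher runs them relative to the archive root so that no paths need manual edits. The base R sketch below is only an illustration of that idea. Every name in it (manifest, scripts_for, run_target, the script paths) is hypothetical, and no such package is assumed to exist.

```r
# Hypothetical manifest: one entry per replication target.
# Paths and field names are illustrative, not taken from any real archive.
manifest <- list(
  fig_1 = list(raw = "code/01_build.R", processed = "code/fig_1.R"),
  tab_1 = list(raw = "code/01_build.R", processed = "code/tab_1.R")
)

# Return the scripts to run, in order, for one target.
# from_raw = TRUE prepends the data-building script.
scripts_for <- function(manifest, what, from_raw = FALSE) {
  if (!what %in% names(manifest)) {
    stop("Unknown target: ", what)
  }
  entry <- manifest[[what]]
  if (from_raw) c(entry$raw, entry$processed) else entry$processed
}

# Run one target relative to the archive root, each script in a fresh environment.
run_target <- function(root, manifest, what, from_raw = FALSE) {
  for (script in scripts_for(manifest, what, from_raw)) {
    source(file.path(root, script), local = new.env())
  }
  invisible(what)
}

# Which scripts would be run to rebuild fig_1 from raw data?
scripts_for(manifest, "fig_1", from_raw = TRUE)
```

The point of the design is that the manifest, not the user, carries the knowledge of what depends on what; a web front end could call the same dispatcher.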

4 Registration completeness

Problem: We are all registering now but we don’t know what we need to register

Deeper issue: Fuzziness about what a design is

4.1 Registration completeness

4.2 `DeclareDesign` Main idea

Think of a research design as an interrogable object: design_1
Define: Models, Inquiries, Data Strategies, Answer Strategies

4.3 A design

With DeclareDesign, designs can be quite compact and readable:

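library(DeclareDesign)  # provides declare_*(), run_design(), diagnose_design()

b <- .3  # average treatment effect assumed in the model

design_1 <- 
  # Model: 500 units with heterogeneous effects, b + u_1
  declare_model(
    N = 500,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  # Inquiry: the average treatment effect
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  # Data strategy: complete random assignment, then reveal observed outcomes
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # Answer strategy: regression of Y on Z with robust standard errors
  declare_estimator(Y ~ Z, model = lm_robust)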

4.4 A complete design

This is a complete design declaration.

It can be “run”: run_design(design_1)
and “diagnosed”: diagnose_design(design_1)

Inquiry	Bias	RMSE	Power	Coverage	Mean Estimate	SD Estimate	Mean Se	Mean Estimand
ate	0.00	0.10	0.78	0.97	0.30	0.11	0.11	0.30
	(0.00)	(0.00)	(0.02)	(0.01)	(0.00)	(0.00)	(0.00)	(0.00)

We see the design has nice features: reasonable power, unbiasedness
Coverage is off though (why?)
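
One candidate answer to the coverage question: treatment effects in this model are heterogeneous (b + u_1), and the usual Neyman-style variance estimator is conservative in that case, so intervals are a little too wide and coverage lands above 95%. The base R sketch below re-simulates the same data-generating process without DeclareDesign to check this. It assumes that the robust standard error used by lm_robust coincides with the Neyman estimator for a two-arm difference in means (I believe its default HC2 does), and it uses 2000 simulations and an arbitrary seed.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# One simulated study: draw the model, assign treatment, estimate, build a CI.
simulate_once <- function(N = 500, b = 0.3) {
  u_0 <- rnorm(N)
  u_1 <- rnorm(N)
  Y_Z_0 <- u_0
  Y_Z_1 <- u_0 + b + u_1
  ate <- mean(Y_Z_1 - Y_Z_0)                    # estimand for this draw
  Z <- sample(rep(c(0, 1), each = N / 2))       # complete random assignment
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)
  est <- mean(Y[Z == 1]) - mean(Y[Z == 0])
  # Neyman variance estimator: ignores the (unidentified) variance of effects
  se <- sqrt(var(Y[Z == 1]) / sum(Z == 1) + var(Y[Z == 0]) / sum(Z == 0))
  half_width <- qt(0.975, df = N - 2) * se
  c(error = est - ate, se = se, covered = abs(est - ate) <= half_width)
}

sims <- t(replicate(2000, simulate_once()))
diagnosis <- c(
  rmse     = sqrt(mean(sims[, "error"]^2)),  # actual spread of the errors
  mean_se  = mean(sims[, "se"]),             # spread the estimator reports
  coverage = mean(sims[, "covered"])
)
print(round(diagnosis, 3))
```

If the explanation is right, mean_se should exceed rmse and coverage should sit above the nominal 0.95, in line with the diagnosis table.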

4.5 Registration

The DeclareDesign idea is that the design object is the thing you register.

The design object captures:
- your background assumptions about the conditions in which you are operating
- what your question really is
- how you plan to generate data
- how you plan to analyze it
It’s short, complete, interrogable.

This is being done but it is a lift.

5 Reconciliation

Issues:

We are getting better at registration but deviations are the norm.
How to spot and make sense of deviations?

5.1 In practice

We start with design_1
Then we implement and we do a few things differently:
- We gather data on fewer cases than we expected (at random)
- We have some missingness in outcome measures (not at random)
- We end up focusing on analysis in a subgroup
In addition, we learn things from that data that we only speculated about at the design stage.

5.2 In practice

Try to represent the actually implemented research as an alternative design: design_2
Reconcile formally: compare_designs(design_1, design_2) to:
- automate reconciliation
- clarify nature of deviations
- clarify implications of deviations
- invite critique
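
To make the idea concrete without relying on any package, here is a base R sketch that diagnoses a registered design and an implemented one side by side. The deviations are hypothetical stand-ins: the implemented design has 400 rather than 500 units, and outcomes go missing with a probability that depends on treatment and u_0. The subgroup deviation is not modeled, the test uses a normal critical value, and none of the numbers are taken from the talk.

```r
set.seed(3)  # arbitrary seed

# Diagnose a two-arm design by simulation.
# attrition = TRUE adds outcome missingness that depends on Z and u_0
# (a hypothetical, non-random missingness process).
diagnose <- function(N, attrition = FALSE, b = 0.3, sims = 1000) {
  one_run <- function() {
    u_0 <- rnorm(N)
    u_1 <- rnorm(N)
    Z <- sample(rep(c(0, 1), each = N / 2))
    Y <- u_0 + Z * (b + u_1)
    ate <- b + mean(u_1)                       # average effect in this draw
    observed <- rep(TRUE, N)
    if (attrition) observed <- rbinom(N, 1, plogis(-2 + Z * u_0)) == 0
    y1 <- Y[Z == 1 & observed]
    y0 <- Y[Z == 0 & observed]
    est <- mean(y1) - mean(y0)
    se <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
    c(error = est - ate, significant = abs(est / se) > qnorm(0.975))
  }
  runs <- replicate(sims, one_run())
  c(bias = mean(runs["error", ]), power = mean(runs["significant", ]))
}

design_1_diag <- diagnose(N = 500)                    # as registered
design_2_diag <- diagnose(N = 400, attrition = TRUE)  # as implemented
comparison <- rbind(design_1 = design_1_diag,
                    design_2 = design_2_diag,
                    difference = design_2_diag - design_1_diag)
print(round(comparison, 2))
```

Laying the two diagnoses next to each other separates deviations that only cost precision (fewer cases) from those that change what is being estimated (non-random missingness).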

5.3 Comparisons

If you provide design_2, with adjusted inputs, along with design_1 then readers can compare them.

inquiry	estimator	diagnosand	design1	design2	difference
ate	estimator	bias	0.00	0.00	-0.00
			(0.00)	(0.00)	(0.00)
ate	estimator	coverage	0.97	0.96	-0.01
			(0.00)	(0.00)	(0.00)
ate	estimator	mean_estimand	0.30	0.30	0.00
			(0.00)	(0.00)	(0.00)
ate	estimator	mean_estimate	0.30	0.30	-0.00
			(0.00)	(0.00)	(0.00)
ate	estimator	mean_se	0.11	0.12	0.01*
			(0.00)	(0.00)	(0.00)
ate	estimator	power	0.79	0.74	-0.05*
			(0.01)	(0.01)	(0.01)
ate	estimator	rmse	0.10	0.11	0.01*
			(0.00)	(0.00)	(0.00)
ate	estimator	sd_estimate	0.11	0.12	0.01*
			(0.00)	(0.00)	(0.00)
ate	estimator	type_s_rate	0.00	0.00	0.00
			(0.00)	(0.00)	(0.00)

5.4 Reconciliation and critique

Your reconciliation can clarify:

how things are different
whether things are different in relevant ways (e.g. under optimistic conditions)
where conclusions in fact depend on the reasons for deviations

Proposal: attach both the original (registered) design and the reconciled design to your manuscript.

6 Design-based Critique

These design modification choices could just as easily be made by a different researcher who is re-analyzing your work

A researcher might propose altering the answer strategy, A. Common practice currently is to evaluate this decision by how the results change (robustness), not (simply) by the ex ante properties of the new strategy.

7 Design-based Critique

Instead, justify reanalysis decisions with respect to:

Home ground dominance. Holding the original model constant (the “home ground” of the original study), show that a new answer strategy yields better diagnosands than the original
Robustness to alternative models. Demonstrate that a new answer strategy is robust to both the original model and a new, also plausible, model
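
A minimal base R sketch of the two checks, under assumptions of my own choosing: the "home" model has a prognostic covariate X (coefficient beta = 1), the original answer strategy is a regression of Y on Z, the proposed one adds X, and the rival model sets beta = 0 so that X is irrelevant. The diagnosand is RMSE against a constant true effect tau. All names and parameter values are hypothetical.

```r
set.seed(2)  # arbitrary seed

# One draw from a model: returns the error of each answer strategy
# relative to the true effect tau.
one_draw <- function(N = 200, tau = 0.3, beta = 1) {
  X <- rnorm(N)
  Z <- sample(rep(c(0, 1), each = N / 2))
  Y <- beta * X + tau * Z + rnorm(N)
  c(original = unname(coef(lm(Y ~ Z))["Z"]) - tau,
    proposed = unname(coef(lm(Y ~ Z + X))["Z"]) - tau)
}

# RMSE of each strategy across simulations (rows are strategies).
rmse <- function(sims) sqrt(rowMeans(sims^2))

home  <- rmse(replicate(500, one_draw(beta = 1)))  # original model held fixed
rival <- rmse(replicate(500, one_draw(beta = 0)))  # alternative, also plausible
print(round(rbind(home, rival), 3))
```

The reanalysis is justified here if the proposed strategy has lower RMSE on the home ground and is not noticeably worse under the rival model, and neither judgment requires looking at how the realized estimate changes.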

8 Replication

There seems to be a lot of confusion about when a (field) replication succeeds or fails in a heterogeneous world

8.1 Replication

Tests of the form:

Do both results have the same sign?
Is the original effect in the CI of the new result (or vice versa); or even
Can we reject the null that these are the same?

don’t make too much sense given substantial heterogeneity
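
A small base R simulation shows why. In the sketch below every study is unbiased for the true effect at its own site, and sites differ only because true effects vary with standard deviation tau. The values of mu, se and tau are hypothetical choices for illustration.

```r
set.seed(4)  # arbitrary seed

# Share of original/replication pairs that pass two common "success" tests.
# tau is the sd of true effects across sites; se is each study's standard error.
replication_checks <- function(tau, mu = 0.2, se = 0.1, pairs = 10000) {
  theta_orig <- rnorm(pairs, mu, tau)   # true effect at the original site
  theta_rep  <- rnorm(pairs, mu, tau)   # true effect at the replication site
  est_orig <- rnorm(pairs, theta_orig, se)
  est_rep  <- rnorm(pairs, theta_rep, se)
  c(same_sign = mean(sign(est_orig) == sign(est_rep)),
    orig_in_rep_ci = mean(abs(est_orig - est_rep) <= qnorm(0.975) * se))
}

homogeneous   <- replication_checks(tau = 0)
heterogeneous <- replication_checks(tau = 0.2)
print(round(rbind(homogeneous, heterogeneous), 2))
```

With tau > 0 both pass rates fall even though no study is wrong about its own site, so a "failed" replication by these criteria says little unless the claim being tested is about a population of sites.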

8.2 But

Refusing to test shifts us toward non-falsifiability
- Do Posner et al.’s results threaten Björkman and Svensson’s or not?
Obvious: the exercise presupposes we are interested in population estimands, not sample estimands, so let’s own that
Less obvious: articles (findings) should be accompanied by explicit “generalization claims”: to what populations, under what conditions, and how far do the results plausibly extend?
Takeaway: these claims are what need to be stated and tested

9 Cumulation

We have gotten better at coordinated trials:

9.1 Cumulation

Though they are almost never on random samples

A rolling trial would have:

admission criteria
priors on effects
sampling information
heterogeneity data

9.2 Cumulation

A rolling trial would:

provide a basis to determine ex ante if a trial should be implemented
imply a workflow designed to aggregate

Will someone set these up?
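
As a rough illustration of the ex ante decision, the base R sketch below pools completed sites under a normal prior and asks how much one more site would tighten the posterior on the average effect. It is a simplification under stated assumptions: the between-site standard deviation tau is treated as known, all variances are known, and the estimates, standard errors, prior and planned standard error are made-up numbers.

```r
# Hypothetical inputs: estimates and standard errors from completed sites.
estimates <- c(0.10, 0.25, 0.18)
ses <- c(0.10, 0.12, 0.08)

# Posterior for the average effect mu under a normal prior and known tau.
pool <- function(est, se, tau, prior_mean = 0, prior_sd = 1) {
  w <- 1 / (se^2 + tau^2)                 # precision each site adds about mu
  precision <- 1 / prior_sd^2 + sum(w)
  c(mean = (prior_mean / prior_sd^2 + sum(w * est)) / precision,
    sd = sqrt(1 / precision))
}

# Posterior sd for mu after adding one more site with a planned standard error.
# With known variances this does not depend on the new site's data.
sd_after_new_site <- function(current_sd, planned_se, tau) {
  sqrt(1 / (1 / current_sd^2 + 1 / (planned_se^2 + tau^2)))
}

for (tau in c(0, 0.15)) {
  current <- pool(estimates, ses, tau)
  after <- sd_after_new_site(current[["sd"]], planned_se = 0.09, tau = tau)
  cat("tau =", tau, ": sd now", round(current[["sd"]], 3),
      ", sd after new site", round(after, 3), "\n")
}
```

An admission rule could then compare the expected gain in precision with the cost of the new site; the heterogeneity term matters because a site informs the average effect only through se^2 + tau^2, however large its sample.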

10 Reporting

The temptation to produce splashy headlines appears irresistible

10.1 Reporting

We need some way to keep the focus on cumulative findings, though this seems very contrary to instincts

Dream: A combination of this

10.2 Reporting