On the Incubator for Collaborative and Transparent Economic Sciences | Macartan Humphreys
Stocktaking
A workflow-oriented wishlist
what_we_know.com
We started in a bad place
Lots of gains
Econstor
I4R (discussion paper series has 163 papers)
AER today:
Open access | Open data with doi | Reproduced | Nice badge!
I tried this one yesterday
TeX
via shell pdflatex
One click
Not bad really
yielding:
What I’d love:
yielding:
And of course: ex ante we would then code and organize our files so as to make it work with replicate_everything
And even better: as a web interface to avoid local installations
Can someone build this?
Problem: We are all registering now but we don’t know what we need to register
Deeper issue: Fuzziness about what a design is
DeclareDesign
Main idea
With DeclareDesign, designs can be quite compact and readable, for example design_1:
library(DeclareDesign)

b <- .3  # average treatment effect assumed in the model

design_1 <-
  declare_model(
    N = 500,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z * (b + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, model = lm_robust)
This is a complete design declaration.
run_design(design_1)
diagnose_design(design_1)
| Inquiry | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean SE | Mean Estimand |
|---|---|---|---|---|---|---|---|---|
| ate | 0.00 | 0.10 | 0.78 | 0.97 | 0.30 | 0.11 | 0.11 | 0.30 |
| | (0.00) | (0.00) | (0.02) | (0.01) | (0.00) | (0.00) | (0.00) | (0.00) |
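As a rough illustration of what a diagnosis computes, the following base R sketch simulates a design like design_1 many times and summarizes the results. It is not DeclareDesign's implementation: classical `lm()` standard errors stand in for `lm_robust`, and the helper name `run_once` and the choice of 500 simulations are assumptions made here.

```r
# Hand-rolled diagnosis of a design like design_1, using base R only.
# Assumption: classical lm() standard errors stand in for lm_robust().
set.seed(1)
b <- 0.3   # average treatment effect assumed in the model
N <- 500   # sample size

run_once <- function() {
  u_0 <- rnorm(N)
  u_1 <- rnorm(N)
  Y_Z_0 <- u_0                          # potential outcome under control
  Y_Z_1 <- u_0 + b + u_1                # potential outcome under treatment
  estimand <- mean(Y_Z_1 - Y_Z_0)       # sample average treatment effect
  Z <- sample(rep(0:1, each = N / 2))   # complete random assignment
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)     # reveal observed outcomes
  fit <- summary(lm(Y ~ Z))$coefficients["Z", ]
  ci <- fit["Estimate"] + c(-1, 1) * qt(0.975, N - 2) * fit["Std. Error"]
  c(estimate = unname(fit["Estimate"]),
    estimand = estimand,
    significant = unname(fit["Pr(>|t|)"] < 0.05),
    covered = ci[1] <= estimand && estimand <= ci[2])
}

# One row per simulated run of the design.
sims <- t(replicate(500, run_once()))

diagnosands <- c(
  bias = mean(sims[, "estimate"] - sims[, "estimand"]),
  rmse = sqrt(mean((sims[, "estimate"] - sims[, "estimand"])^2)),
  power = mean(sims[, "significant"]),
  coverage = mean(sims[, "covered"])
)
print(round(diagnosands, 2))
```

Each diagnosand is just a summary over simulated runs, which is why a complete declaration is enough to compute them before any data are collected.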
The DeclareDesign idea is that the design object is the thing you register.
The design object captures:
It’s short, complete, interrogable.
This is being done but it is a lift.
Issues:
We are getting better at registration but deviations are the norm.
How to spot and make sense of deviations?
We start with design_1
Then we implement and we do a few things differently:
In addition, we learn things from that data that we only speculated about at the design stage.
Try to represent the actually implemented research as an alternative design: design_2
Reconcile formally: compare_designs(design_1, design_2)
to:
If you provide design_2, with adjusted inputs, along with design_1, then readers can compare them.
| inquiry | estimator | diagnosand | design1 | design2 | difference |
|---|---|---|---|---|---|
| ate | estimator | bias | 0.00 | 0.00 | -0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | coverage | 0.97 | 0.96 | -0.01 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_estimand | 0.30 | 0.30 | 0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_estimate | 0.30 | 0.30 | -0.00 |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | mean_se | 0.11 | 0.12 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | power | 0.79 | 0.74 | -0.05* |
| | | | (0.01) | (0.01) | (0.01) |
| ate | estimator | rmse | 0.10 | 0.11 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | sd_estimate | 0.11 | 0.12 | 0.01* |
| | | | (0.00) | (0.00) | (0.00) |
| ate | estimator | type_s_rate | 0.00 | 0.00 | 0.00 |
| | | | (0.00) | (0.00) | (0.00) |
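A comparison of this kind might be produced along the following lines. This is a sketch under stated assumptions, not the code behind the table: the helper `make_design` is hypothetical, the implemented sample size of 440 is an illustrative guess at a deviation, and the estimator step uses the `.method` argument of recent DeclareDesign releases (older releases use `model =`, as in the declaration earlier in the talk).

```r
library(DeclareDesign)

# Hypothetical helper: the talk's design, parameterized by sample size.
make_design <- function(n_units, b = 0.3) {
  declare_model(
    N = n_units,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z * (b + u_1))) +
    declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
    declare_assignment(Z = complete_ra(N = N)) +
    declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
    # .method is the argument name in recent releases (assumption about version).
    declare_estimator(Y ~ Z, .method = lm_robust, inquiry = "ate")
}

design_1 <- make_design(500)  # registered design
design_2 <- make_design(440)  # design as implemented (illustrative N)

# Diagnose both designs and report differences in diagnosands.
comparison <- compare_diagnoses(design_1, design_2, sims = 200)
print(comparison)
```

Because both objects are complete declarations, a reader can rerun the comparison with different inputs rather than relying on a prose description of the deviation.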
Your reconciliation can clarify:
Proposal: attach both the original (registered) design and the reconciled design to your manuscript.
These design modification choices could just as easily be made by a different researcher who is re-analyzing your work
A researcher might propose altering the answer strategy, A. Current common practice is to evaluate this decision by how the results change (robustness) rather than (simply) by its ex ante properties.
Instead, justify reanalysis decision with respect to:
Home ground dominance. Holding the original model constant (the “home ground” of the original study), show that a new answer strategy yields better diagnosands than the original
Robustness to alternative models. Demonstrate that a new answer strategy is robust to both the original model and a new, also plausible, model
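The two criteria can be sketched with a small base R simulation. Everything specific here is an assumption for illustration: a hypothetical covariate X, an original model in which X predicts the outcome, an alternative model in which it does not, and a proposed answer strategy that adjusts for X. The effect is held constant at b, so RMSE is measured against b.

```r
set.seed(2)
b <- 0.3   # treatment effect, held constant across units (assumption)
N <- 500

# RMSE of two answer strategies under a model where X has effect x_effect on Y.
simulate_rmse <- function(x_effect, sims = 300) {
  errors <- replicate(sims, {
    X <- rnorm(N)                         # hypothetical pre-treatment covariate
    Z <- sample(rep(0:1, each = N / 2))   # complete random assignment
    Y <- x_effect * X + rnorm(N) + b * Z
    c(original = unname(coef(lm(Y ~ Z))["Z"]) - b,       # unadjusted
      proposed = unname(coef(lm(Y ~ Z + X))["Z"]) - b)   # covariate adjusted
  })
  sqrt(rowMeans(errors^2))
}

home_ground <- simulate_rmse(x_effect = 1)   # the original study's model
alternative <- simulate_rmse(x_effect = 0)   # a different, also plausible model

print(round(rbind(home_ground, alternative), 3))
```

In this setup the proposed strategy is justified by its diagnosands under both models, without looking at how the published result changes.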
There seems to be a lot of confusion about when a (field) replication succeeds or fails in a heterogeneous world
Tests of the form:
don’t make too much sense given substantial heterogeneity
Refusing to test shifts to non-falsifiability
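To illustrate the heterogeneity problem, here is a minimal base R sketch. The specific tests listed on the slide are not reproduced; as an assumption, the sketch takes a test of whether a replication estimate differs from the original estimate. The mean effect, the within-site standard error, and the spread of site effects are all illustrative values.

```r
set.seed(3)
mean_effect <- 0.3   # population average effect (assumed)
se <- 0.1            # sampling standard error within each site (assumed)

# Share of original/replication pairs declared "different" at the 5% level.
rejection_rate <- function(sd_sites, sims = 5000) {
  # Each study is run at a site with its own true effect.
  tau_original    <- rnorm(sims, mean_effect, sd_sites)
  tau_replication <- rnorm(sims, mean_effect, sd_sites)
  est_original    <- rnorm(sims, tau_original, se)
  est_replication <- rnorm(sims, tau_replication, se)
  z <- (est_replication - est_original) / sqrt(2 * se^2)
  mean(abs(z) > qnorm(0.975))
}

rates <- c(homogeneous   = rejection_rate(sd_sites = 0),
           heterogeneous = rejection_rate(sd_sites = 0.2))
print(round(rates, 2))
```

With site-level variation, such a test rejects far more often than its nominal rate even though both studies draw from the same population of effects, so a "failed" replication need not contradict the original finding.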
Obvious: the exercise presupposes that we are interested in population estimands, not sample estimands, so let’s own that
Less obvious: articles (findings) should be accompanied by explicit “generalization claims”: to what populations, under what conditions, and to what extent do results plausibly extend?
Takeaway: These claims are what need to be stated and tested
We have gotten better at coordinated trials:
Though they are almost never on random samples
A rolling trial would have:
A rolling trial would:
Will someone set these up?
The temptation to produce splashy headlines appears irresistible
Need some way to keep the focus on cumulative findings, though this seems very contrary to our instincts
Dream: A combination of this
Dream: And this
So we can collectively say: this is what the evidence says, about effects under these and those conditions
All at what_we_know.com