compare_designs: Using design declaration for reconciliation and critique

Graeme Blair, Alex Coppock, Macartan Humphreys

2 June 2021: CAS Workshop on Prediction, Registration, and Replication of Scientific Findings

Roadmap

For more: https://declaredesign.org/

Main idea

The MIDA framework

Many research designs can be represented as positions or decisions with respect to four elements: Models, Inquiries, Data Strategies, and Answer Strategies.
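
As a quick orientation (an illustrative mapping, not part of the original slides), each MIDA component corresponds to one or more declare_*() calls in DeclareDesign:

# M (Model):            declare_model()
# I (Inquiry):          declare_inquiry()
# D (Data strategy):    declare_sampling(), declare_assignment(), declare_measurement()
# A (Answer strategy):  declare_estimator()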

MIDA as a DAG

A design

library(DeclareDesign)
library(dplyr)   # for the pipe and the data wrangling used below

b <- .3

design_1 <- 
  
  declare_model(
    N = 500,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, model = lm_robust)

This is a complete design declaration.
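
Because design_1 is an ordinary R object, we can interact with it directly. A brief sketch using standard DeclareDesign helpers:

draw_data(design_1)       # one simulated data set implied by M and D
draw_estimands(design_1)  # the value of the inquiry in one run of the model
draw_estimates(design_1)  # the estimate produced by A for one simulated data set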

Diagnosis

diagnose_design(design_1)
Inquiry     Bias    RMSE    Power   Coverage   Mean Estimate   SD Estimate   Mean Se   Mean Estimand
ate         0.00    0.10    0.78    0.97       0.30            0.11          0.11      0.30
            (0.00)  (0.00)  (0.02)  (0.01)     (0.00)          (0.00)        (0.00)    (0.00)
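
The diagnosis can also be tuned. A sketch (with illustrative argument values) that raises the number of simulations and restricts attention to two diagnosands:

diagnose_design(
  design_1,
  sims = 1000,
  diagnosands = declare_diagnosands(
    bias  = mean(estimate - estimand),
    power = mean(p.value <= 0.05)))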

Registration

The DeclareDesign idea is that the design object is the thing you register.
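
One concrete way to do this (an illustrative sketch; the file name is a placeholder) is to save the declared object itself so that reviewers can reload it and re-run the diagnosis:

saveRDS(design_1, file = "design_1.rds")

design_registered <- readRDS("design_1.rds")
diagnose_design(design_registered)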

In practice

(Note: not D, though the data \(d=D(m)\) might change because of changes in M)

Data deviations

Our data might have looked like this:

df <- design_1 %>% redesign(b = .2) %>% draw_data()

But instead it looks like this:

df_observed <- 
  df %>%
  dplyr::filter(simple_ra(n(), prob = .8) == 1) %>%
  mutate(Male = simple_ra(n())==1) %>%
  mutate(Y  = ifelse(Y_Z_1 < 0 & Male, NA, Y))

The differences are:

  1. Only about 80% of the planned units appear in the observed data.
  2. A Male indicator was recorded that was not part of the planned design.
  3. Y is missing for men whose treated potential outcome is negative, so the missingness is non-random.
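
These deviations can be verified directly in the observed data (an illustrative check, not from the original slides):

nrow(df)            # the planned draw has 500 rows
nrow(df_observed)   # only about 80% of units were reached

# missingness in Y is concentrated among men
df_observed %>%
  group_by(Male) %>%
  summarise(share_missing = mean(is.na(Y)))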

Analysis deviations

Our analysis should have looked like this:

get_estimates(design_1, df)
estimate std.error statistic p.value conf.low conf.high df
0.13 0.11 1.22 0.22 -0.08 0.34 498

Might have looked like this:

get_estimates(design_1, df_observed)
estimate std.error statistic p.value conf.low conf.high df
0.26 0.13 1.91 0.06 -0.01 0.52 322

But instead looks like this:

get_estimates(design_1, df_observed %>% filter(Male)) 
estimate std.error statistic p.value conf.low conf.high df
0.77 0.15 5.04 0 0.46 1.07 108

So what to do?

Our analysis has produced a very misleading answer that stems in part from our analysis decision and in part from unexpected selection biases arising during data collection.

Two key ideas:

  1. The deviations come from two sources: factors that we can control and factors that we cannot control.
  2. We can reconcile our actual design with our planned design in a way that opens up avenues for critique rather than closing them down.

Let’s see this kind of reconciliation in operation.

Adjusting beliefs I

First, knowing what we think we know now about challenges in the field, we could provide more realistic inputs to the design. Consider:

design_2 <- 
  
  declare_model(
    N = 500,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_sampling(S = complete_rs(N = N, prob = .9)) +
  declare_estimator(Y ~ Z, model = lm_robust)

This design differs in that it allows for a failure to hit recruitment targets in data gathering: only 90% of units are sampled.

This is treated as a deviation that we did not intend but that we do understand.
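
As an illustrative check, a single draw from design_2 should now contain 450 of the 500 units, since complete_rs() samples exactly N * prob units and unsampled units are dropped:

nrow(draw_data(design_2))   # should return 450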

Comparisons

If you provide design_2 along with design_1, then readers can compare them.

A set of tools automates the comparison of the reconciled design with the original: compare_design_code, compare_design_data, compare_design_summaries, and compare_diagnoses.

Compare code

compare_design_code(design_1, design_2, rmd = TRUE, format = "auto", pager = "on", context = 0)
@@ 9,2 @@
> sampling
> "declare_sampling(S = complete_rs(N = N, prob = 0.9))"

Compare data structures

compare_design_data(design_1, design_2, rmd = TRUE, format = "html")
@@ 1,8 @@
@@ 1,9 @@
< 'data.frame': 500 obs. of 7 variables:
> 'data.frame': 450 obs. of 8 variables:
  $ ID   : chr "001" "002" "003" "004" ...
  $ u_0  : num -0.294 -0.563 0.378 -0.143 0.224 ...
  $ u_1  : num -0.933 -0.979 0.67 1.328 -0.766 ...
  $ Y_Z_0: num -0.294 -0.563 0.378 -0.143 0.224 ...
  $ Y_Z_1: num -0.927 -1.243 1.347 1.485 -0.243 ...
  $ Z    : int 1 0 1 1 0 1 0 1 1 1 ...
  $ Y    : num -0.927 -0.563 1.347 1.485 0.224 ...
> $ S    : num 1 1 1 1 1 1 1 1 1 1 ...

Compare design runs

compare_design_summaries(design_1, design_2, rmd = TRUE, format = "html", pager = "on")
@@ 1,58 @@
@@ 1,132 @@

Design Summary

Step 1 (model): declare_model(N = 500, u_0 = rnorm(N), u_1 = rnorm(N), potential_outcomes(Y ~ u_0 + Z * (b + u_1)))

N = 500

Added variable: ID
  N_missing N_unique     class
          0      500 character

Added variable: u_0
    min median  mean  max   sd N_missing N_unique
< -2.77  -0.11 -0.06 3.28 1.00         0      500
> -3.04   0.00  0.04 3.14 1.01         0      500

Added variable: u_1
    min median  mean  max   sd N_missing N_unique
< -2.89  -0.05 -0.05 2.59 1.01         0      500
> -3.43   0.06  0.03 3.22 1.02         0      500

Added variable: Y_Z_0
    min median  mean  max   sd N_missing N_unique
< -2.77  -0.11 -0.06 3.28 1.00         0      500
> -3.04   0.00  0.04 3.14 1.01         0      500

Added variable: Y_Z_1
    min median mean  max   sd N_missing N_unique
< -4.25   0.19 0.19 4.44 1.42         0      500
> -4.14   0.44 0.37 5.39 1.50         0      500

Step 2 (inquiry): declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) -------------------

A single draw of the inquiry:
  inquiry  estimand
< ate     0.2508963
> ate     0.3295257

Step 3 (assignment): declare_assignment(Z = complete_ra(N = N)) ----------------

Added variable: Z
    0    1
  250  250
 0.50 0.50

Step 4 (measurement): declare_measurement(Y = reveal_outcomes(Y ~ Z)) ----------

Added variable: Y
    min median mean  max   sd N_missing N_unique
< -3.57  -0.01 0.07 4.44 1.24         0      500
> -2.82   0.22 0.24 5.39 1.26         0      500

< Step 5 (estimator): declare_estimator(Y ~ Z, model = lm_robust) ----------------
> Step 5 (sampling): declare_sampling(S = complete_rs(N = N, prob = 0.9)) --------
>
> N = 450 (50 subtracted)
>
> Added variable: S
>      1
>    450
>   1.00
>
> Altered variable: ID
>   Before: N_missing 0, N_unique 500, class character
>   After:  N_missing 0, N_unique 450, class character
>
> Altered variable: u_0
>   Before: min -3.04, median 0.00, mean 0.04, max 3.14, sd 1.01, N_missing 0, N_unique 500
>   After:  min -3.04, median 0.00, mean 0.06, max 3.14, sd 1.03, N_missing 0, N_unique 450
>
> Altered variable: u_1
>   Before: min -3.43, median 0.06, mean 0.03, max 3.22, sd 1.02, N_missing 0, N_unique 500
>   After:  min -3.43, median 0.06, mean 0.02, max 3.22, sd 1.01, N_missing 0, N_unique 450
>
> Altered variable: Y_Z_0
>   Before: min -3.04, median 0.00, mean 0.04, max 3.14, sd 1.01, N_missing 0, N_unique 500
>   After:  min -3.04, median 0.00, mean 0.06, max 3.14, sd 1.03, N_missing 0, N_unique 450
>
> Altered variable: Y_Z_1
>   Before: min -4.14, median 0.44, mean 0.37, max 5.39, sd 1.50, N_missing 0, N_unique 500
>   After:  min -4.14, median 0.44, mean 0.37, max 5.39, sd 1.51, N_missing 0, N_unique 450
>
> Altered variable: Z
>   Before: 0: 250 (0.50), 1: 250 (0.50)
>   After:  0: 227 (0.50), 1: 223 (0.50)
>
> Altered variable: Y
>   Before: min -2.82, median 0.22, mean 0.24, max 5.39, sd 1.26, N_missing 0, N_unique 500
>   After:  min -2.82, median 0.22, mean 0.23, max 5.39, sd 1.27, N_missing 0, N_unique 450
>
> Step 6 (estimator): declare_estimator(Y ~ Z, model = lm_robust) ----------------

Formula: Y ~ Z

Model: lm_robust

A single draw of the estimator:
  estimator term  estimate std.error statistic      p.value  conf.low conf.high  df outcome
< estimator    Z 0.3653772 0.1097515  3.329133 0.0009357255 0.1497442 0.5810101 498       Y
> estimator    Z 0.3955818 0.1190763  3.322087 0.0009666505 0.1615643 0.6295993 448       Y

Compare diagnoses

We can diagnose both designs and look to see how they perform side by side.

diagnose_design(design_1, design_2)
Design     Inquiry     Bias    RMSE    Power   Coverage   Mean Estimate   SD Estimate   Mean Se   Mean Estimand
design_1   ate         0.00    0.10    0.79    0.97       0.30            0.11          0.11      0.30
                       (0.00)  (0.00)  (0.01)  (0.00)     (0.00)          (0.00)        (0.00)    (0.00)
design_2   ate         0.00    0.11    0.75    0.97       0.30            0.11          0.12      0.30
                       (0.00)  (0.00)  (0.01)  (0.00)     (0.00)          (0.00)        (0.00)    (0.00)

The main thing we see here is a decline in power (along with an increase in RMSE and standard errors). There are no major differences beyond this.

Comparing diagnoses

Diagnoses may differ due to simulation error. The compare_diagnoses function helps assess whether two diagnoses are really different. (It is best run with a low alpha and a large number of simulations.)

comparison <- compare_diagnoses(design_1, design_2, alpha = .01)
inquiry   estimator   diagnosand      design1   design2   difference
ate       estimator   bias            0.00      0.00      -0.00
                                      (0.00)    (0.00)    (0.00)
ate       estimator   coverage        0.97      0.96      -0.01
                                      (0.00)    (0.00)    (0.00)
ate       estimator   mean_estimand   0.30      0.30      0.00
                                      (0.00)    (0.00)    (0.00)
ate       estimator   mean_estimate   0.30      0.30      -0.00
                                      (0.00)    (0.00)    (0.00)
ate       estimator   mean_se         0.11      0.12      0.01*
                                      (0.00)    (0.00)    (0.00)
ate       estimator   power           0.79      0.74      -0.05*
                                      (0.01)    (0.01)    (0.01)
ate       estimator   rmse            0.10      0.11      0.01*
                                      (0.00)    (0.00)    (0.00)
ate       estimator   sd_estimate     0.11      0.12      0.01*
                                      (0.00)    (0.00)    (0.00)
ate       estimator   type_s_rate     0.00      0.00      0.00
                                      (0.00)    (0.00)    (0.00)

Adjusting beliefs II

Now we also have larger estimated effects for men than we expected, and we know more about the variance within different groups.

More importantly, we have also switched estimands.

Adjusting beliefs II

We introduce male as a variable, set an effect size closer to the one we estimated, and add the new estimand to the design.

b <- .5

design_3 <- 
  
  declare_model(
    N = 500,
    male = simple_rs(N = N)==1,
    u_0 = rnorm(N),
    u_1 = rnorm(N),
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0),
                  ate_men = mean((Y_Z_1 - Y_Z_0)[male])) + 
  declare_assignment(Z = complete_ra(N = N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_sampling(S = complete_rs(N = N, prob = .9)) +
  declare_estimator(Y ~ Z, model = lm_robust, inquiry = "ate") +
  declare_estimator(Y ~ Z, model = lm_robust, subset = male,  inquiry = "ate_men", label = "men")

Diagnosis 3

Does the design comparison give support to the decision to switch estimands?

diagnose_design(design_1, design_2, design_3)
Design     Inquiry     Bias    RMSE    Power   Coverage   Mean Estimate   SD Estimate   Mean Se   Mean Estimand
design_1   ate         -0.01   0.10    0.76    0.97       0.29            0.11          0.11      0.30
                       (0.00)  (0.00)  (0.01)  (0.00)     (0.00)          (0.00)        (0.00)    (0.00)
design_2   ate         0.00    0.11    0.74    0.97       0.30            0.11          0.12      0.30
                       (0.00)  (0.00)  (0.01)  (0.00)     (0.00)          (0.00)        (0.00)    (0.00)
design_3   ate         0.01    0.11    0.99    0.97       0.51            0.11          0.12      0.50
                       (0.00)  (0.00)  (0.00)  (0.01)     (0.00)          (0.00)        (0.00)    (0.00)
design_3   ate_men     0.01    0.15    0.87    0.97       0.51            0.16          0.16      0.50
                       (0.00)  (0.00)  (0.01)  (0.01)     (0.01)          (0.00)        (0.00)    (0.00)

A change in estimand is generally hard to justify on design grounds, and it is certainly not supported here. (The diagnostics for ate_men in design_3 are better than those for ate in design_1, but the diagnostics for ate in design_3 are better still.)

Last, and bravest: Redesign III

There is a part of the design-in-practice that we do not understand: the pattern of non-random data missingness among men.

Here is an attempt to reconcile.

rho <- .7

design_4 <- 
  
  declare_model(
    N = 500,
    male = simple_rs(N = N)==1,
    u_0 = rnorm(N), u_1 = rnorm(N),
    # A new term, correlated with potential outcomes:
    u_2 = correlate(given = u_1, rho = rho, rnorm), 
    potential_outcomes(Y ~ u_0 + Z*(b + u_1))) +
  declare_inquiry(ate = mean((Y_Z_1 - Y_Z_0)),
                  ate_men = mean((Y_Z_1 - Y_Z_0)[male]),
                  ate_women = mean((Y_Z_1 - Y_Z_0)[!male])) + 
  declare_assignment(Z = complete_ra(N = N)) +
  declare_sampling(S = complete_rs(N = N, prob = .9)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  
  # Non-random missing data
  declare_measurement(Y = ifelse(male & u_2 < quantile(u_2, .25), NA, Y)) +
  declare_estimator(Y ~ Z, model = lm_robust, inquiry = "ate") +
  declare_estimator(Y ~ Z, model = lm_robust, subset = male, 
                    inquiry = "ate_men", label = "men") +
  declare_estimator(Y ~ Z, model = lm_robust, subset = !male, 
                    inquiry = "ate_women", label = "women" )

Compare diagnoses 4

diagnose_design(design_1, design_4)
Design     Inquiry     Bias    RMSE    Power   Coverage   Mean Estimate   SD Estimate   Mean Se   Mean Estimand
design_1   ate         -0.00   0.10    0.76    0.96       0.30            0.11          0.11      0.30
                       (0.00)  (0.00)  (0.02)  (0.01)     (0.00)          (0.00)        (0.00)    (0.00)
design_4   ate         0.12    0.17    1.00    0.84       0.62            0.12          0.12      0.50
                       (0.00)  (0.00)  (0.00)  (0.02)     (0.00)          (0.00)        (0.00)    (0.00)
design_4   ate_men     0.30    0.35    1.00    0.63       0.80            0.17          0.18      0.50
                       (0.01)  (0.01)  (0.00)  (0.02)     (0.01)          (0.01)        (0.00)    (0.00)
design_4   ate_women   -0.01   0.15    0.84    0.97       0.49            0.17          0.16      0.50
                       (0.01)  (0.01)  (0.02)  (0.01)     (0.01)          (0.01)        (0.00)    (0.00)

Critique

The honest reconciliation in the last design includes a possible model of the data missingness.

design_4 %>% redesign(rho = c(-.7, 0, .7)) %>% diagnose_design()
rho    Inquiry    Bias    RMSE    Power   Coverage   Mean Estimate   SD Estimate   Mean Se   Mean Estimand
-0.7   ate_men    -0.29   0.34    0.21    0.64       0.21            0.19          0.18      0.50
0      ate_men    -0.00   0.18    0.76    0.96       0.50            0.18          0.19      0.50
0.7    ate_men    0.30    0.35    0.99    0.62       0.79            0.19          0.18      0.50

The discussion can then focus on the researcher’s claims about missingness patterns, which are provided in the reconciliation.

Takeaways

Your reconciliation can clarify:

  1. how things are different
  2. whether things are different in relevant ways (e.g. under optimistic conditions)
  3. where conclusions in fact depend on the reasons for deviations

Proposal: attach both the original (registered) design and the reconciled design to your manuscript.

Risks and questions