DeclareDesign basics: How to define and assess research designs
DeclareDesign: key resources

- Model: a set of models of what causes what and how
- Inquiry: a question stated in terms of the model
- Data strategy: the set of procedures we use to gather information from the world (sampling, assignment, measurement)
- Answer strategy: how we summarize the data produced by the data strategy

Design declaration is telling the computer (and readers) what M, I, D, and A are.
Design diagnosis is figuring out how the design will perform under imagined conditions: estimating "diagnosands" like power, bias, RMSE, expected error rates, expected ethical harm, or the expected "amount learned."

Diagnosis takes account of model uncertainty: it aims to identify models for which the design works well and models for which it does not.

Redesign is the fine-tuning of features of the data and answer strategies to understand how changing them affects the diagnosands.
declare_model()
declare_inquiry()
declare_sampling()
declare_assignment()
declare_measurement()
declare_estimator()
and there are more declare_ functions!
draw_data(design)
draw_estimands(design)
draw_estimates(design)
get_estimates(design, data)
run_design(design)
, simulate_design(design)
diagnose_design(design)
redesign(design, N = 200)
compare_designs()
, compare_diagnoses()
https://raw.githubusercontent.com/rstudio/cheatsheets/master/declaredesign.pdf
?DeclareDesign
Design steps are chained together with + to form a design object.
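The simplest design used in the examples below is not declared in this excerpt; here is a minimal sketch consistent with the printed summary further down (the object mean, a number giving the true mean of Y, is an assumption):

library(DeclareDesign)

mean <- 0  # assumed: the true mean of Y, referred to by the model and the inquiry

simplest_design <-
  declare_model(N = 100, Y = rnorm(N, mean)) +
  declare_inquiry(Q = mean) +
  declare_estimator(Y ~ 1)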
Each step is a function (or rather: a function that generates functions) and each function presupposes what is created by previous functions.

Most steps take the main data frame in and push the main data frame out; this data frame normally builds up as you move along the pipe.

There are exceptions: declare_estimator steps take the main data frame in and send out an estimator_df data frame; declare_inquiry steps take the main data frame in and send out an estimand_df data frame.

You can also just run through the whole design once by typing the name of the design:
Research design declaration summary
Step 1 (model): declare_model(N = 100, Y = rnorm(N, mean)) ---------------------
Step 2 (inquiry): declare_inquiry(Q = mean) ------------------------------------
Step 3 (estimator): declare_estimator(Y ~ 1) -----------------------------------
Run of the design:
inquiry estimand estimator term estimate std.error statistic p.value
Q 0 estimator (Intercept) 0.155 0.108 1.43 0.155
conf.low conf.high df outcome
-0.0597 0.371 99 Y
Or by asking for a run of the design
one_run <- simplest_design |> run_design()
one_run |> kable(digits = 2) |> kable_styling(font_size = 18)
inquiry | estimand | estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|
Q | 0 | estimator | (Intercept) | 0.08 | 0.1 | 0.8 | 0.43 | -0.12 | 0.28 | 99 | Y |
A single run creates data, calculates estimands (the answers to inquiries), and calculates estimates plus ancillary statistics.
Or by asking for many runs of the design
some_runs <- simplest_design |> simulate_design(sims = 1000)
some_runs |> head() |> kable(digits = 2) |> kable_styling(font_size = 16)
design | sim_ID | inquiry | estimand | estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
simplest_design | 1 | Q | 0 | estimator | (Intercept) | 0.03 | 0.09 | 0.32 | 0.75 | -0.15 | 0.21 | 99 | Y |
simplest_design | 2 | Q | 0 | estimator | (Intercept) | 0.00 | 0.10 | 0.03 | 0.98 | -0.19 | 0.20 | 99 | Y |
simplest_design | 3 | Q | 0 | estimator | (Intercept) | -0.13 | 0.09 | -1.46 | 0.15 | -0.31 | 0.05 | 99 | Y |
simplest_design | 4 | Q | 0 | estimator | (Intercept) | -0.14 | 0.10 | -1.36 | 0.18 | -0.35 | 0.06 | 99 | Y |
simplest_design | 5 | Q | 0 | estimator | (Intercept) | 0.02 | 0.12 | 0.16 | 0.88 | -0.22 | 0.26 | 99 | Y |
simplest_design | 6 | Q | 0 | estimator | (Intercept) | -0.07 | 0.09 | -0.69 | 0.49 | -0.25 | 0.12 | 99 | Y |
Once you have simulated many times you can "diagnose". This is the next topic.

For instance, we can ask about bias: the average difference between the estimate and the estimand:
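A sketch of that calculation (with dplyr loaded):

some_runs |>
  summarize(mean_estimate = mean(estimate),
            mean_estimand = mean(estimand),
            bias = mean(estimate - estimand))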
mean_estimate | mean_estimand | bias |
---|---|---|
0 | 0 | 0 |
diagnose_design() does this in one step for a set of common "diagnosands":
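A sketch of the call (the number of simulations is chosen to match the table below):

diagnosis <- diagnose_design(simplest_design, sims = 500)

diagnosis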
Design | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|
simplest_design | 500 | 0.00 | -0.00 | -0.00 | 0.10 | 0.10 | 0.05 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) | (0.01) |
The diagnosis object is also a list, of class "diagnosis". Its components include the simulations data frame:
design | sim_ID | inquiry | estimand | estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
simplest_design | 1 | Q | 0 | estimator | (Intercept) | 0.03 | 0.09 | 0.31 | 0.76 | -0.16 | 0.21 | 99 | Y |
simplest_design | 2 | Q | 0 | estimator | (Intercept) | 0.10 | 0.09 | 1.07 | 0.29 | -0.09 | 0.29 | 99 | Y |
simplest_design | 3 | Q | 0 | estimator | (Intercept) | -0.16 | 0.10 | -1.54 | 0.13 | -0.37 | 0.05 | 99 | Y |
simplest_design | 4 | Q | 0 | estimator | (Intercept) | -0.08 | 0.11 | -0.72 | 0.48 | -0.30 | 0.14 | 99 | Y |
simplest_design | 5 | Q | 0 | estimator | (Intercept) | -0.14 | 0.10 | -1.34 | 0.18 | -0.34 | 0.07 | 99 | Y |
simplest_design | 6 | Q | 0 | estimator | (Intercept) | -0.08 | 0.09 | -0.90 | 0.37 | -0.26 | 0.10 | 99 | Y |
The diagnosands data frame:

design | inquiry | estimator | outcome | term | mean_estimand | se(mean_estimand) | mean_estimate | se(mean_estimate) | bias | se(bias) | sd_estimate | se(sd_estimate) | rmse | se(rmse) | power | se(power) | coverage | se(coverage) | n_sims |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
simplest_design | Q | estimator | Y | (Intercept) | 0 | 0 | 0 | 0 | 0 | 0 | 0.1 | 0 | 0.1 | 0 | 0.05 | 0.01 | 0.95 | 0.01 | 500 |
And the bootstrap replicates data frame:

design | bootstrap_id | inquiry | estimator | outcome | term | mean_estimand | mean_estimate | bias | sd_estimate | rmse | power | coverage |
---|---|---|---|---|---|---|---|---|---|---|---|---|
simplest_design | 1 | Q | estimator | Y | (Intercept) | 0 | 0.00 | 0.00 | 0.1 | 0.10 | 0.05 | 0.95 |
simplest_design | 2 | Q | estimator | Y | (Intercept) | 0 | -0.01 | -0.01 | 0.1 | 0.11 | 0.06 | 0.94 |
simplest_design | 3 | Q | estimator | Y | (Intercept) | 0 | -0.01 | -0.01 | 0.1 | 0.10 | 0.05 | 0.95 |
simplest_design | 4 | Q | estimator | Y | (Intercept) | 0 | -0.01 | -0.01 | 0.1 | 0.10 | 0.05 | 0.95 |
simplest_design | 5 | Q | estimator | Y | (Intercept) | 0 | 0.00 | 0.00 | 0.1 | 0.10 | 0.05 | 0.95 |
simplest_design | 6 | Q | estimator | Y | (Intercept) | 0 | 0.00 | 0.00 | 0.1 | 0.10 | 0.05 | 0.95 |
The bootstraps dataframe is produced by resampling from the simulations dataframe and producing a diagnosis dataframe from each resampling.
This lets us generate estimates of uncertainty around our diagnosands.
It can be controlled thus:
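For example, via the bootstrap_sims argument (bootstrap_sims = 0 turns bootstrapping off):

diagnose_design(simplest_design, sims = 500, bootstrap_sims = 100)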
It’s reshapeable: as a tidy dataframe, ready for graphing
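For example, assuming the diagnosis object created above:

diagnosis |> tidy()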
design | inquiry | estimator | outcome | term | diagnosand | estimate | std.error | conf.low | conf.high |
---|---|---|---|---|---|---|---|---|---|
simplest_design | Q | estimator | Y | (Intercept) | mean_estimand | 0.00 | 0.00 | 0.00 | 0.00 |
simplest_design | Q | estimator | Y | (Intercept) | mean_estimate | 0.00 | 0.00 | -0.01 | 0.00 |
simplest_design | Q | estimator | Y | (Intercept) | bias | 0.00 | 0.00 | -0.01 | 0.00 |
simplest_design | Q | estimator | Y | (Intercept) | sd_estimate | 0.10 | 0.00 | 0.10 | 0.11 |
simplest_design | Q | estimator | Y | (Intercept) | rmse | 0.10 | 0.00 | 0.10 | 0.11 |
simplest_design | Q | estimator | Y | (Intercept) | power | 0.05 | 0.01 | 0.03 | 0.07 |
simplest_design | Q | estimator | Y | (Intercept) | coverage | 0.95 | 0.01 | 0.93 | 0.97 |
Or it can be turned into a formatted table:
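For example:

diagnosis |>
  reshape_diagnosis() |>
  kable() |>
  kable_styling(font_size = 20)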
Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|---|---|---|---|
simplest_design | Q | estimator | Y | (Intercept) | 500 | 0.00 | -0.00 | -0.00 | 0.10 | 0.10 | 0.05 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) | (0.01) |
Diagnosis alerts you to problems in a design. Consider the following simple alternative design.

Here we define the inquiry as the sample average of \(Y\) (instead of the population mean). But otherwise things stay the same.
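A sketch of such a declaration (the only change from simplest_design is the inquiry):

simplest_design_2 <-
  declare_model(N = 100, Y = rnorm(N)) +
  declare_inquiry(Q = mean(Y)) +
  declare_estimator(Y ~ 1)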
What do we think of this design?
Here is the diagnosis
Design | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|
simplest_design_2 | 500 | -0.00 | -0.00 | 0.00 | 0.10 | 0.00 | 0.04 | 1.00 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) | (0.00) |
You can diagnose multiple designs or a list of designs
list(dum = simplest_design, dee = simplest_design) |>
diagnose_design(sims = 5) |>
reshape_diagnosis() |>
kable() |>
kable_styling(font_size = 20)
Design | Inquiry | Estimator | Outcome | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|---|---|---|---|
dum | Q | estimator | Y | (Intercept) | 5 | 0.00 | -0.01 | -0.01 | 0.08 | 0.07 | 0.00 | 1.00 |
(0.00) | (0.03) | (0.03) | (0.01) | (0.01) | (0.00) | (0.00) | ||||||
dee | Q | estimator | Y | (Intercept) | 5 | 0.00 | 0.02 | 0.02 | 0.12 | 0.11 | 0.00 | 1.00 |
(0.00) | (0.05) | (0.05) | (0.03) | (0.02) | (0.00) | (0.00) |
Redesign is the process of taking a design and modifying it in some way.

There are a few ways to do this:

- using replace_step, insert_step, or delete_step
- using redesign to change design parameters

We will focus on the redesign approach.
A design parameter is a modifiable quantity of a design. These quantities are objects that were in your global environment when you made your design, that are referred to explicitly in the design steps, and that got scooped up when the design was formed.
In our simplest design above we had a fixed N
, but we could make N
a modifiable quantity like this:
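A sketch:

N <- 100

simplest_design_N <-
  declare_model(N = N, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)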
Note that N is defined in memory and gets referenced in one of the steps. It has now become a parameter of the design and can be modified using redesign.
Here is a version of the design with N = 200:
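For example:

simplest_design_200 <- redesign(simplest_design_N, N = 200)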
Here is a list of three different designs with different Ns.
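For example:

designs <- redesign(simplest_design_N, N = c(100, 200, 300))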
The good thing here is that it is now easy to diagnose over multiple designs and compare diagnoses. The parameter names then end up in the diagnosis_df
Consider this:
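A sketch of a design with two parameters, N and m, consistent with the output shown below:

N <- 100
m <- 0

design_N_m <-
  declare_model(N = N, Y = rnorm(N, mean = m)) +
  declare_inquiry(Q = m) +
  declare_estimator(Y ~ 1)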
Then:
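A sketch: redesign over both parameters (all combinations are created), diagnose, and tidy:

diagnoses <-
  design_N_m |>
  redesign(N = c(100, 200, 300), m = c(0, .1, .2)) |>
  diagnose_design()

diagnoses |> tidy()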
Output:
N | m | diagnosand | estimate | std.error | conf.low | conf.high |
---|---|---|---|---|---|---|
100 | 0.0 | mean_estimand | 0.00 | 0.00 | 0.00 | 0.00 |
100 | 0.0 | mean_estimate | 0.00 | 0.00 | -0.01 | 0.01 |
100 | 0.0 | bias | 0.00 | 0.00 | -0.01 | 0.01 |
100 | 0.0 | sd_estimate | 0.10 | 0.00 | 0.10 | 0.11 |
200 | 0.0 | mean_estimand | 0.00 | 0.00 | 0.00 | 0.00 |
200 | 0.0 | mean_estimate | 0.00 | 0.00 | -0.01 | 0.00 |
200 | 0.1 | mean_estimand | 0.10 | 0.00 | 0.10 | 0.10 |
200 | 0.1 | mean_estimate | 0.10 | 0.00 | 0.09 | 0.10 |
300 | 0.2 | bias | 0.00 | 0.00 | 0.00 | 0.00 |
300 | 0.2 | sd_estimate | 0.06 | 0.00 | 0.05 | 0.06 |
300 | 0.2 | rmse | 0.06 | 0.00 | 0.05 | 0.06 |
300 | 0.2 | power | 0.93 | 0.01 | 0.91 | 0.95 |
300 | 0.2 | coverage | 0.95 | 0.01 | 0.92 | 0.97 |
Graphing after redesign is easy:
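For example, with dplyr and ggplot2 loaded (a sketch using the diagnoses object from above):

diagnoses |>
  tidy() |>
  filter(diagnosand %in% c("power", "rmse")) |>
  ggplot(aes(N, estimate, color = factor(m))) +
  geom_point() +
  geom_line() +
  facet_wrap(~ diagnosand)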
What can you do with a design once you have it?
We motivate with a slightly more complex experimental design (more on the components of this later)
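A sketch of a declaration consistent with the data and estimates shown below (a constant unit-level effect of 1):

design <-
  declare_model(
    N = 100,
    U = rnorm(N),
    potential_outcomes(Y ~ Z + U)
  ) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ate")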
ID | U | Y_Z_0 | Y_Z_1 | Z | Y |
---|---|---|---|---|---|
001 | -1.2516446 | -1.2516446 | -0.2516446 | 0 | -1.2516446 |
002 | -1.4308705 | -1.4308705 | -0.4308705 | 1 | -0.4308705 |
003 | 0.8499598 | 0.8499598 | 1.8499598 | 0 | 0.8499598 |
004 | 0.2390656 | 0.2390656 | 1.2390656 | 0 | 0.2390656 |
005 | -0.5418315 | -0.5418315 | 0.4581685 | 1 | 0.4581685 |
006 | 2.3695282 | 2.3695282 | 3.3695282 | 1 | 3.3695282 |
Play with the data:
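For example, draw one dataset and analyze it by hand (estimatr and broom-style tidy, as used elsewhere in these slides):

data <- draw_data(design)

lm_robust(Y ~ Z, data = data) |> tidy()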
term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|
(Intercept) | -0.25 | 0.13 | -1.95 | 0.05 | -0.51 | 0.00 | 98 | Y |
Z | 1.06 | 0.20 | 5.33 | 0.00 | 0.67 | 1.46 | 98 | Y |
Using your actual data:
design | sim_ID | inquiry | estimand | estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
design | 1 | ate | 1 | estimator | Z | 0.80 | 0.19 | 4.10 | 0 | 0.41 | 1.18 | 98 | Y |
design | 2 | ate | 1 | estimator | Z | 0.95 | 0.20 | 4.80 | 0 | 0.56 | 1.35 | 98 | Y |
design | 3 | ate | 1 | estimator | Z | 1.25 | 0.21 | 5.96 | 0 | 0.83 | 1.66 | 98 | Y |
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.99 | -0.01 | 0.22 | 0.22 | 1.00 | 0.93 |
(0.02) | (0.02) | (0.01) | (0.01) | (0.00) | (0.02) |
diagnosand | mean_1 | mean_2 | mean_difference | conf.low | conf.high |
---|---|---|---|---|---|
mean_estimand | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 |
mean_estimate | 0.48 | 0.50 | 0.02 | -0.01 | 0.04 |
bias | -0.02 | 0.00 | 0.02 | -0.01 | 0.04 |
sd_estimate | 0.28 | 0.20 | -0.08 | -0.10 | -0.06 |
rmse | 0.28 | 0.20 | -0.08 | -0.10 | -0.06 |
power | 0.38 | 0.71 | 0.32 | 0.26 | 0.37 |
coverage | 0.97 | 0.96 | -0.01 | -0.04 | 0.01 |
DeclareDesign: A deeper dive

We start with a simple experimental design with all four elements of MIDA and then show ways to extend.
- model declarations use the fabricatr package (and others)
- assignment and sampling use the randomizr package (and others)
- estimation uses the estimatr package (and others)

New elements:

- declare_model can be used much like mutate, with multiple columns created in sequence
- the potential_outcomes function is a special function that creates potential outcome columns for different values of Z
- use reveal_outcomes to reveal the outcome; Z and Y are the default names
- estimation: lm_robust is the default method

Order matters: e.g. if you sample before defining the inquiry you get a different inquiry than if you sample after you define the inquiry.
You can generate hierarchical data like this:
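A sketch using fabricatr's add_level (the variable names are illustrative):

M <-
  declare_model(
    households = add_level(
      N = 100,
      N_members = sample(1:4, N, replace = TRUE)
    ),
    individuals = add_level(
      N = N_members,
      age = sample(18:90, N, replace = TRUE)
    )
  )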
You can generate panel data like this:
M <-
declare_model(
countries = add_level(
N = 196,
country_shock = rnorm(N)
),
years = add_level(
N = 100,
time_trend = 1:N,
year_shock = runif(N, 1, 10),
nest = FALSE
),
observation = cross_levels(
by = join_using(countries, years),
observation_shock = rnorm(N),
Y = 0.01 * time_trend + country_shock + year_shock + observation_shock
)
)
Here are the first rows of the resulting panel data:
countries | country_shock | years | time_trend | year_shock | observation | observation_shock | Y |
---|---|---|---|---|---|---|---|
001 | 0.39 | 001 | 1 | 4.66 | 00001 | -1.40 | 3.66 |
002 | 0.09 | 001 | 1 | 4.66 | 00002 | 0.18 | 4.95 |
003 | -1.85 | 001 | 1 | 4.66 | 00003 | 0.75 | 3.58 |
004 | -1.92 | 001 | 1 | 4.66 | 00004 | -2.06 | 0.69 |
005 | -0.11 | 001 | 1 | 4.66 | 00005 | 1.13 | 5.69 |
006 | 0.07 | 001 | 1 | 4.66 | 00006 | 0.88 | 5.63 |
You can repeat steps and play with the order, always conscious of the direction of the pipe
design <-
declare_model(N = N, X = rep(0:1, N/2)) +
declare_model(U = rnorm(N), potential_outcomes(Y ~ b * Z * X + U)) +
declare_assignment(Z = block_ra(blocks = X), Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
declare_inquiry(cate = mean(Y_Z_1[X==0] - Y_Z_0[X==0])) +
declare_estimator(Y ~ Z, inquiry = "ate", label = "ols") +
declare_estimator(Y ~ Z*X, inquiry = "cate", label = "fe")
Many causal inquiries are simple summaries of potential outcomes:
Inquiry | Units | Code |
---|---|---|
Average treatment effect in a finite population (PATE) | Units in the population | mean(Y_D_1 - Y_D_0) |
Conditional average treatment effect (CATE) for X = 1 | Units for whom X = 1 | mean(Y_D_1[X == 1] - Y_D_0[X == 1]) |
Complier average causal effect (CACE) | Complier units | mean(Y_D_1[D_Z_1 > D_Z_0] - Y_D_0[D_Z_1 > D_Z_0]) |
Causal interactions of \(D_1\) and \(D_2\) | Units in the population | mean((Y_D1_1_D2_1 - Y_D1_0_D2_1) - (Y_D1_1_D2_0 - Y_D1_0_D2_0)) |
Generating potential outcomes columns gets you far.

Often, though, we need to define inquiries as a function of continuous variables. For this, writing a potential outcomes function can make life easier. This helps, for example, with complex counterfactuals and with continuous treatments, as in the two examples below.
Here is an example of using functions to define complex counterfactuals:
f_M <- function(X, UM) 1*(UM < X)
f_Y <- function(X, M, UY) X + M - .4*X*M + UY
design <-
declare_model(N = 100,
X = simple_rs(N),
UM = runif(N),
UY = rnorm(N),
M = f_M(X, UM),
Y = f_Y(X, M, UY)) +
declare_inquiry(Q1 = mean(f_Y(1, f_M(0, UM), UY) - f_Y(0, f_M(0, UM), UY)))
design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)
inquiry | estimand |
---|---|
Q1 | 1 |
Here is an example of using functions to define effects of continuous treatments.
f_Y <- function(X, UY) X - .25*X^2 + UY
design <-
declare_model(N = 100,
X = rnorm(N),
UY = rnorm(N),
Y = f_Y(X, UY)) +
declare_inquiry(
Q1 = mean(f_Y(X+1, UY) - f_Y(X, UY)),
Q2 = mean(f_Y(1, UY) - f_Y(0, UY)),
Q3 = (lm_robust(Y ~ X)|> tidy())[2,2]
)
design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)
inquiry | estimand |
---|---|
Q1 | 0.7237174 |
Q2 | 0.7500000 |
Q3 | 1.2303680 |
Which one is the ATE?
The randomizr package has a set of functions for different types of block and cluster assignments:

- Simple random assignment: simple_ra(N = 100, prob = 0.25)
- Complete random assignment: complete_ra(N = 100, m = 40)
- Block random assignment: block_ra(blocks = regions)
- Cluster random assignment: cluster_ra(clusters = households)
- Block-and-cluster assignment: cluster random assignment within blocks of clusters: block_and_cluster_ra(blocks = regions, clusters = villages)
You can combine these in various ways. For example, with saturation random assignment, clusters are first assigned to a saturation level and then units within clusters are assigned to treatment conditions according to that saturation level, as sketched below.
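A sketch (the level names, saturation levels, and exact approach are assumptions, not the only way to do this):

design_saturation <-
  declare_model(
    cluster = add_level(N = 10),
    unit = add_level(N = 20, U = rnorm(N))
  ) +
  declare_assignment(
    # each cluster is first assigned to a saturation arm
    arm = cluster_ra(clusters = cluster, conditions = c("low", "high")),
    saturation = ifelse(arm == "high", 0.8, 0.2),
    # units are then assigned within clusters at their cluster's saturation
    Z = block_ra(blocks = cluster, block_prob = tapply(saturation, cluster, unique))
  )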
By default declare_estimator() assumes you are interested in the first term after the constant from the output of an estimation procedure.

But you can say which term you are interested in directly using term, and you can also associate different terms with different quantities of interest using inquiry.
design <-
declare_model(
N = 100,
X1 = rnorm(N),
X2 = rnorm(N),
X3 = rnorm(N),
Y = X1 - X2 + X3 + rnorm(N)
) +
declare_inquiries(ate_2 = -1, ate_3 = 1) +
declare_estimator(Y ~ X1 + X2 + X3,
term = c("X2", "X3"),
inquiry = c("ate_2", "ate_3"))
design |> run_design() |> kable(digits = 2) |> kable_styling(font_size = 20)
inquiry | estimand | term | estimator | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|
ate_2 | -1 | X2 | estimator | -0.96 | 0.09 | -10.31 | 0 | -1.15 | -0.78 | 96 | Y |
ate_3 | 1 | X3 | estimator | 1.15 | 0.10 | 12.00 | 0 | 0.96 | 1.34 | 96 | Y |
Sometimes it can be confusing what the name of a term is, but you can figure it out by running the estimation strategy directly. Here's an example where the names of the terms might be confusing.
lm_robust(Y ~ A*B,
data = data.frame(A = rep(c("a", "b"), 3),
B = rep(c("p", "q"), each = 3),
Y = rnorm(6))) |>
coef() |> kable() |> kable_styling(font_size = 20)
 | x |
---|---|
(Intercept) | -0.7100967 |
Ab | 2.1415833 |
Bq | -0.0055076 |
Ab:Bq | -1.9702876 |
The names as they appear in the output here are the names of the terms that declare_estimator will look for.
DeclareDesign works natively with estimatr, but you can use whatever packages you like. You do have to make sure, though, that DeclareDesign receives a nice tidy data frame of estimates, and that might require some tidying.
design <-
declare_model(N = 1000, U = runif(N),
potential_outcomes(Y ~ as.numeric(U < .5 + Z/3))) +
declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
declare_estimator(Y ~ Z, inquiry = "ate",
.method = glm,
family = binomial(link = "probit"))
Note that we passed additional arguments to glm
; that’s easy.
It’s not a good design though. Just look at the diagnosis:
# run is a flag (defined elsewhere, e.g. in a setup chunk) controlling whether to re-simulate
if (run)
  diagnose_design(design) |> write_rds("saved/probit.rds")
read_rds("saved/probit.rds") |>
reshape_diagnosis() |>
kable() |>
kable_styling(font_size = 20)
Design | Inquiry | Estimator | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|---|---|---|
design | ate | estimator | Z | 500 | 0.33 | 0.97 | 0.64 | 0.09 | 0.64 | 1.00 | 0.00 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
Why is it so terrible?
Because the probit estimate does not target the ATE directly; you need to do more work to get there.
You essentially have to write a function to get the estimates, calculate the quantity of interest and other stats, and turn these into a nice dataframe.
Luckily you can use the margins package with tidy to create a .summary function which you can pass to declare_estimator to do all this for you.
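A sketch of that approach (the helper function and the use of margins::summary are assumptions; the new estimator is labeled "margins" to match the diagnosis below):

library(margins)
library(dplyr)

tidy_margins <- function(x) {
  # average marginal effects, renamed to the tidy columns DeclareDesign expects
  summary(margins(x)) |>
    transmute(term = factor, estimate = AME, std.error = SE,
              statistic = z, p.value = p, conf.low = lower, conf.high = upper)
}

design <-
  design +
  declare_estimator(Y ~ Z,
                    .method = glm,
                    family = binomial(link = "probit"),
                    .summary = tidy_margins,
                    inquiry = "ate",
                    label = "margins")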
# as before, run controls whether to re-simulate or to read saved results
if (run)
  diagnose_design(design) |> write_rds("saved/probit_2.rds")
read_rds("saved/probit_2.rds") |> reshape_diagnosis() |> kable() |>
kable_styling(font_size = 20)
Design | Inquiry | Estimator | Term | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|---|---|---|
design | ate | estimator | Z | 500 | 0.33 | 0.97 | 0.64 | 0.09 | 0.64 | 1.00 | 0.00 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |||||
design | ate | margins | Z | 500 | 0.33 | 0.31 | -0.02 | 0.02 | 0.03 | 1.00 | 0.90 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) |
Much better
Other diagnosands you might declare include, for example:

mean_se = mean(std.error)
type_s_rate = mean((sign(estimate) != sign(estimand))[p.value <= alpha])
exaggeration_ratio = mean((estimate/estimand)[p.value <= alpha])
var_estimate = pop.var(estimate)
mean_var_hat = mean(std.error^2)
prop_pos_sig = mean(estimate > 0 & p.value <= alpha)
mean_ci_length = mean(conf.high - conf.low)
my_diagnosands <-
declare_diagnosands(median_bias = median(estimate - estimand))
diagnose_design(simplest_design, diagnosands = my_diagnosands, sims = 10) |>
reshape_diagnosis() |> kable() |> kable_styling(font_size = 20)
Design | Inquiry | Estimator | Outcome | Term | N Sims | Median Bias |
---|---|---|---|---|---|---|
simplest_design | Q | estimator | Y | (Intercept) | 10 | 0.02 |
(0.04) |
You can partition the simulations data frame into groups before calculating diagnosands.
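This is done with the make_groups argument (a sketch, with dplyr loaded for vars(); design_1 is the diagnosed design):

diagnose_design(
  design_1,
  make_groups = vars(significant = p.value <= 0.05),
  sims = 500
)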
Design | Significant | N Sims | Mean Estimand | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|---|---|---|
design_1 | FALSE | 474 | 0.00 | -0.00 | -0.00 | 0.09 | 0.09 | 0.00 | 1.00 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |||
design_1 | TRUE | 26 | 0.00 | -0.02 | -0.02 | 0.23 | 0.23 | 1.00 | 0.00 |
(0.00) | (0.04) | (0.04) | (0.01) | (0.01) | (0.00) | (0.00) |
Note especially the mean estimate, the power, the coverage, the RMSE, and the bias. (Bias is not large because we have both under- and over-estimates.)
Consider for instance this sampling design:
Compare these two diagnoses:
diagnosis | N Sims | Mean Estimand | Mean Estimate | SD Estimate | Average Se | RMSE | Coverage |
---|---|---|---|---|---|---|---|
diagnosis_1 | 10000 | 1.00 | 0.99 | 1.01 | 1.00 | 0.90 | 0.97 |
diagnosis_1 | (0.00) | (0.01) | (0.01) | (0.00) | (0.01) | (0.00) | |
diagnosis_2 | 10000 | 1.59 | 1.59 | 0.93 | 1.03 | 0.93 | 0.97 |
diagnosis_2 | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
In the second, the estimand is drawn just once. The SD of the estimate is lower. But the RMSE is not very different.
my_estimator <- function(data)
data.frame(outcome = "Y", estimate = mean(data$Y), std.error = 0)
design <-
declare_model(N = 5000, Y = rnorm(N)) +
declare_inquiry(Y_bar_pop = mean(Y)) +
declare_sampling(S = complete_rs(N = N, n = 500)) +
declare_inquiry(Y_bar_sample = mean(Y)) +
declare_estimator(Y ~ 1, inquiry = "Y_bar_pop", label = "ols") +
declare_estimator(handler = label_estimator(my_estimator),
inquiry = "Y_bar_sample",
label = "mean")
my_diagnosands <-
declare_diagnosands(
bias = mean(estimate - estimand),
rmse = mean((estimate - estimand)^2)^.5,
mean_se = mean(std.error))
Design | Inquiry | Estimator | Outcome | Term | N Sims | Bias | RMSE | Mean Se |
---|---|---|---|---|---|---|---|---|
design | Y_bar_pop | ols | Y | (Intercept) | 500 | 0.00 | 0.04 | 0.04 |
(0.00) | (0.00) | (0.00) | ||||||
design | Y_bar_sample | mean | Y | NA | 500 | 0.00 | 0.00 | 0.00 |
(0.00) | (0.00) | (0.00) |
When redesigning with arguments that are vectors, use list()
in redesign, with each list item representing a design you wish to create
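For instance, if prob_each were a vector-valued parameter of some design (the parameter and design names here are illustrative):

designs <- redesign(design, prob_each = list(c(.2, .8), c(.5, .5)))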
A parameter has to be called correctly, and you get no warning if you misname it.
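A sketch of the problem, using the parameterized design from above (here called simplest_design_N): the design's parameter is N, but we pass lowercase n.

simplest_design_N |> redesign(n = 200) |> draw_data() |> nrow()
# still 100 rows: the misnamed argument has no effect (the parameter is N, not n)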
A parameter also has to be called explicitly:
N <- 100
my_N <- function(n = N) n
simplest_design_N2 <-
declare_model(N = my_N(), Y = rnorm(N)) +
declare_inquiry(Q = 0) +
declare_estimator(Y ~ 1)
simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()
[1] 100
Why not 200? Because N enters the design only through my_N()'s default argument and is never referenced explicitly in any step, so redesign has nothing to modify.

A parameter has to be called explicitly:
N <- 100
my_N <- function(n = N) n
simplest_design_N2 <-
declare_model(N = my_N(N), Y = rnorm(N)) +
declare_inquiry(Q = 0) +
declare_estimator(Y ~ 1)
simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()
[1] 200
OK
Here is an example of redesigning where the “parameter” is a function
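A sketch (all names illustrative): because the assignment function is itself an object referenced by a step, redesign can swap it out.

my_assignment <- function(N) complete_ra(N)

design_f <-
  declare_model(N = 100, U = rnorm(N), potential_outcomes(Y ~ Z + U)) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = my_assignment(N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ate")

# swap in a different assignment procedure
design_g <- redesign(design_f, my_assignment = function(N) simple_ra(N, prob = .2))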
A design with hierarchical data and different assignment schemes.
design <-
declare_model(
school = add_level(N = 16,
u_school = rnorm(N, mean = 0)),
classroom = add_level(N = 4,
u_classroom = rnorm(N, mean = 0)),
student = add_level(N = 20,
u_student = rnorm(N, mean = 0))
) +
declare_model(
potential_outcomes(Y ~ .1*Z + u_classroom + u_student + u_school)
) +
declare_assignment(Z = simple_ra(N)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
declare_estimator(Y ~ Z, .method = difference_in_means)
Here are the first couple of rows and columns of the resulting data frame.
school | u_school | classroom | u_classroom | student | u_student | Y_Z_0 | Y_Z_1 | Z | Y |
---|---|---|---|---|---|---|---|---|---|
01 | -1.4 | 01 | -0.2 | 0001 | -0.97 | -2.57 | -2.47 | 0 | -2.57 |
01 | -1.4 | 01 | -0.2 | 0002 | 0.67 | -0.93 | -0.83 | 1 | -0.83 |
01 | -1.4 | 01 | -0.2 | 0003 | -1.97 | -3.57 | -3.47 | 1 | -3.47 |
01 | -1.4 | 01 | -0.2 | 0004 | 0.52 | -1.08 | -0.98 | 0 | -1.08 |
01 | -1.4 | 01 | -0.2 | 0005 | 0.60 | -1.00 | -0.90 | 1 | -0.90 |
01 | -1.4 | 01 | -0.2 | 0006 | -0.41 | -2.01 | -1.91 | 0 | -2.01 |
Here is the distribution between treatment and control:
We can draw a new set of data and look at the number of subjects in the treatment and control groups.
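For example (with dplyr loaded):

design |> draw_data() |> count(Z)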
But what if all students in a given class have to be assigned the same treatment?
assignment_clustered <-
declare_assignment(Z = cluster_ra(clusters = classroom))
estimator_clustered <-
declare_estimator(Y ~ Z, clusters = classroom,
.method = difference_in_means)
design_clustered <-
design |>
replace_step("assignment", assignment_clustered) |>
replace_step("estimator", estimator_clustered)
assignment_clustered_blocked <-
declare_assignment(Z = block_and_cluster_ra(blocks = school,
clusters = classroom))
estimator_clustered_blocked <-
declare_estimator(Y ~ Z, blocks = school, clusters = classroom,
.method = difference_in_means)
design_clustered_blocked <-
design |>
replace_step("assignment", assignment_clustered_blocked) |>
replace_step("estimator", estimator_clustered_blocked)
Design | Power | Coverage |
---|---|---|
simple | 0.16 | 0.95 |
(0.01) | (0.01) | |
complete | 0.20 | 0.96 |
(0.01) | (0.01) | |
blocked | 0.42 | 0.95 |
(0.01) | (0.01) | |
clustered | 0.06 | 0.96 |
(0.01) | (0.01) | |
clustered_blocked | 0.08 | 0.96 |
(0.01) | (0.01) |
In many designs you seek to assign an integer number of subjects to treatment from some set. Sometimes, however, your assignment targets are not integers.

Example: you want to treat 50% of units in each of several blocks, but some blocks have an odd number of units.

Two strategies, both handled by prob_ra from the probra package: randomize how the non-integer targets get rounded, or assign units directly with heterogeneous propensities.

prob_ra can be used to set targets:
# remotes::install_github("macartan/probra")
library(probra)
set.seed(1)
fabricate(N = 4, size = c(47, 53, 87, 25), n_treated = prob_ra(.5*size)) %>%
janitor::adorn_totals("row") |>
kable(caption = "Setting targets to get 50% targets with minimal variance")
ID | size | n_treated |
---|---|---|
1 | 47 | 23 |
2 | 53 | 27 |
3 | 87 | 43 |
4 | 25 | 13 |
Total | 212 | 106 |
prob_ra can also be used for complete assignment with heterogeneous propensities:
probs <- c(.8, .2)
design <-
declare_model(N = 2,
Y_Z_1 = c(1, 1),
Y_Z_0 = c(-1, 1)) +
declare_inquiry(ATE = 1) +
declare_assignment(
Z = prob_ra(prob = probs),
condition_prs = probs,
Y = reveal_outcomes(Y ~ Z)) +
declare_estimator(Y ~ Z, label = "ht",
.method = horvitz_thompson,
condition_prs = condition_prs)
Inquiry | Estimator | N Sims | Bias | RMSE |
---|---|---|---|---|
ATE | ht | 10000 | -0.00 | 2.00 |
(0.02) | (0.02) |
Unbiased but very, very noisy (the simulations are also noisy).
Indirect control
Indirect assignments are generally generated by applying a direct assignment and then figuring out an implied indirect assignment.
Looks better: but there are trade offs between the direct and indirect distributions
Figuring out the optimal procedure requires full diagnosis
Load up:
 | \(T2=0\) | \(T2=1\) |
---|---|---|
T1 = 0 | \(50\%\) | \(0\%\) |
T1 = 1 | \(0\%\) | \(50\%\) |
Spread out:
 | \(T2=0\) | \(T2=1\) |
---|---|---|
T1 = 0 | \(25\%\) | \(25\%\) |
T1 = 1 | \(25\%\) | \(25\%\) |
Three arm it?:
 | \(T2=0\) | \(T2=1\) |
---|---|---|
T1 = 0 | \(33.3\%\) | \(33.3\%\) |
T1 = 1 | \(33.3\%\) | \(0\%\) |
Bunch it?:
 | \(T2=0\) | \(T2=1\) |
---|---|---|
T1 = 0 | \(40\%\) | \(20\%\) |
T1 = 1 | \(20\%\) | \(20\%\) |
Two ways to do factorial assignments in DeclareDesign (sketched below):
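A sketch of the two approaches (names illustrative):

# 1. assign the two factors separately, blocking the second on the first
assignment_factorial_1 <-
  declare_assignment(
    Z1 = complete_ra(N),
    Z2 = block_ra(blocks = Z1)
  )

# 2. assign a single four-arm treatment and then split it into two factors
assignment_factorial_2 <-
  declare_assignment(
    Z = complete_ra(N, conditions = c("00", "01", "10", "11")),
    Z1 = as.numeric(Z %in% c("10", "11")),
    Z2 = as.numeric(Z %in% c("01", "11"))
  )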
In practice if you have a lot of treatments it can be hard to do full factorial designs – there may be too many combinations.
In such cases people use fractional factorial designs, like the one below (5 treatments but only 8 units!)
Variation | T1 | T2 | T3 | T4 | T5 |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 1 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 1 |
4 | 0 | 1 | 1 | 1 | 0 |
5 | 1 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 1 | 0 | 1 |
7 | 1 | 1 | 0 | 0 | 0 |
8 | 1 | 1 | 1 | 1 | 1 |
Then randomly assign units to rows. Note columns might also be blocking covariates.
In R, look at library(survey)
Unit | T1 | T2 | T3 | T4 | T5 |
---|---|---|---|---|---|
1 | 0 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 1 | 0 | 0 |
3 | 0 | 1 | 0 | 0 | 1 |
4 | 0 | 1 | 1 | 1 | 0 |
5 | 1 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 1 | 0 | 1 |
7 | 1 | 1 | 0 | 0 | 0 |
8 | 1 | 1 | 1 | 1 | 1 |
A focus on power
In the classical approach to testing a hypothesis we ask: how likely are we to see data like this if indeed the hypothesis is true?

If the answer is "not very likely" we reject the hypothesis. But how unlikely is "not very likely"?
When we test a hypothesis we decide first on what sort of evidence we need to see in order to decide that the hypothesis is not reliable.
Othello has a hypothesis that Desdemona is innocent.
Iago confronts him with evidence:
Note that Othello is focused on the probability of the events if she were innocent but not the probability of the events if Iago were trying to trick him.
He is not assessing his belief in whether she is faithful, but rather how likely the data would be if she were faithful.
So:
Illustrating \(p\) values via “randomization inference”
Say you randomized assignment to treatment and your data looked like this.
Unit | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
Treatment | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Health score | 4 | 2 | 3 | 1 | 2 | 3 | 4 | 8 | 7 | 6 |
Then:
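A minimal sketch of the logic, using the data above: under the sharp null of no effect the outcomes are fixed, so we can recompute the test statistic for every possible assignment of the single treated unit.

Y <- c(4, 2, 3, 1, 2, 3, 4, 8, 7, 6)
Z <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)

diff_means <- function(Z, Y) mean(Y[Z == 1]) - mean(Y[Z == 0])

observed <- diff_means(Z, Y)

# all assignments that treat exactly one of the ten units
null_distribution <- sapply(1:10, function(i) diff_means(replace(rep(0, 10), i, 1), Y))

# p value: share of assignments with a statistic at least as extreme as the observed one
mean(null_distribution >= observed)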
Power is just the probability of getting a significant result, that is, of rejecting a hypothesis.

Simple enough, but it presupposes a test (a rule that tells you when to reject) and a belief about how the world actually works.
I want to test the hypothesis that a six never comes up on this die.

Here's my test: I roll the die and reject the hypothesis if a six comes up.

What is the power of this test?
Power sometimes seems more complicated because hypothesis rejection involves a calculated probability and so you need the probability of a probability.
I want to test the hypothesis that this die is fair.

Here's my test: I roll the die many times and reject the hypothesis if the number of sixes I see is surprisingly low or surprisingly high.

Now:
For this we need to figure out a rule for rejection. This is based on identifying events that should be unlikely under the hypothesis.

Here is how many sixes I would expect if the die is fair. I can figure out from this that 143 or fewer is really very few and 190 or more is really very many:
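A sketch of where those cutoffs come from, assuming the test is based on 1,000 rolls (consistent with the numbers quoted):

# central 95% range for the number of sixes under a fair die
qbinom(c(.025, .975), size = 1000, prob = 1/6)
# roughly 144 and 190: reject "fair" if the count falls outside this range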
Now we need to stipulate some belief about how the world really works—this is not the null hypothesis that we plan to reject, but something that we actually take to be true.
For instance: we think that in fact sixes appear 20% of the time.
Now what’s the probability of seeing at least 190 sixes?
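Under that belief, and assuming the same 1,000 rolls:

1 - pbinom(189, size = 1000, prob = 0.2)
# roughly 0.8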
So given I think sixes appear 20% of the time, I think it likely I'll see at least 190 sixes and reject the hypothesis of a fair die.
Simulation-based power calculation is arbitrarily flexible:
sim_ID | estimate | p.value |
---|---|---|
1 | 0.81 | 0.00 |
2 | 0.40 | 0.04 |
3 | 0.88 | 0.00 |
4 | 0.72 | 0.00 |
5 | 0.38 | 0.05 |
6 | 0.44 | 0.02 |
Obviously related to the estimates you might get
A valid \(p\)-value satisfies \(\Pr(p \leq x) \leq x\) for every \(x \in [0,1]\) (under the null)
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.50 | 0.00 | 0.20 | 0.20 | 0.70 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
b | Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|---|
0 | -0.00 | -0.00 | 0.20 | 0.20 | 0.05 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |
0.25 | 0.25 | -0.00 | 0.20 | 0.20 | 0.23 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |
0.5 | 0.50 | 0.00 | 0.20 | 0.20 | 0.70 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | |
1 | 1.00 | 0.00 | 0.20 | 0.20 | 1.00 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
Coming up: we often focus on sample sizes, but power also depends on other features of the design: the variability of outcomes, the size of effects, and the data and answer strategies (for instance, whether and how you use baseline measures).
Say we have access to a "pre" measure of the outcome Y_now; call it Y_base. Y_base is informative about potential outcomes. We are considering using Y_now - Y_base as the outcome instead of Y_now.
N <- 100
rho <- .5
design <-
declare_model(N,
Y_base = rnorm(N),
Y_Z_0 = 1 + correlate(rnorm, given = Y_base, rho = rho),
Y_Z_1 = correlate(rnorm, given = Y_base, rho = rho),
Z = complete_ra(N),
Y_now = Z*Y_Z_1 + (1-Z)*Y_Z_0,
Y_change = Y_now - Y_base) +
declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
declare_estimator(Y_now ~ Z, label = "level") +
declare_estimator(Y_change ~ Z, label = "change")+
declare_estimator(Y_now ~ Z + Y_base, label = "RHS")
Punchline: whether the change score (Y_change) helps or hurts depends on how strongly Y_base predicts the outcome (rho); including Y_base on the right-hand side is generally the safer choice.
You can see from the null design that power is great but bias is terrible and coverage is way off.
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
1.59 | 1.59 | 0.12 | 1.60 | 1.00 | 0.00 |
(0.01) | (0.01) | (0.00) | (0.01) | (0.00) | (0.00) |
Power without unbiasedness corrupts, absolutely
another_bad_design <-
declare_model(
N = 100,
female = rep(0:1, N/2),
U = rnorm(N),
potential_outcomes(Y ~ female * Z + U)) +
declare_assignment(
Z = block_ra(blocks = female, block_prob = c(.1, .5)),
Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
declare_estimator(Y ~ Z + female, inquiry = "ate",
.method = lm_robust)
You can see from the null design that power is great but bias is terrible and coverage is way off.
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.76 | 0.26 | 0.24 | 0.35 | 0.84 | 0.85 |
(0.01) | (0.01) | (0.01) | (0.01) | (0.01) | (0.02) |
clustered_design <-
declare_model(
cluster = add_level(N = 10, cluster_shock = rnorm(N)),
individual = add_level(
N = 100,
Y_Z_0 = rnorm(N) + cluster_shock,
Y_Z_1 = rnorm(N) + cluster_shock)) +
declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
declare_assignment(Z = cluster_ra(clusters = cluster)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_estimator(Y ~ Z, inquiry = "ATE")
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
-0.00 | -0.00 | 0.64 | 0.64 | 0.79 | 0.20 |
(0.01) | (0.01) | (0.01) | (0.01) | (0.01) | (0.01) |
What alerts you to a problem?
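The coverage, far below the nominal 95%, is the alert. A sketch of one fix, clustering the standard errors at the level of assignment (the step label and new design name are assumptions):

clustered_design_2 <-
  clustered_design |>
  replace_step(
    "estimator",
    declare_estimator(Y ~ Z, clusters = cluster, inquiry = "ATE")
  )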
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.00 | -0.00 | 0.66 | 0.65 | 0.06 | 0.94 |
(0.02) | (0.02) | (0.01) | (0.01) | (0.01) | (0.01) |
design_uncertain <-
declare_model(N = 1000, b = 1+rnorm(1), Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ate = mean(b)) +
declare_estimator(Y ~ factor(Z), term = TRUE)
draw_estimands(design_uncertain)
inquiry estimand
1 ate -0.3967765
draw_estimands(design_uncertain)

inquiry estimand
1 ate 0.7887188
Say I run two tests and want to correct for multiple comparisons.
Two approaches. First, by hand:
b = .2
design_mc <-
declare_model(N = 1000, Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_inquiry(ate = b) +
declare_estimator(Y ~ factor(Z), term = TRUE)
design_mc |>
simulate_designs(sims = 1000) |>
filter(term != "(Intercept)") |>
group_by(sim_ID) |>
mutate(p_bonferroni = p.adjust(p = p.value, method = "bonferroni"),
p_holm = p.adjust(p = p.value, method = "holm"),
p_fdr = p.adjust(p = p.value, method = "fdr")) |>
ungroup() |>
summarize(
"Power using naive p-values" = mean(p.value <= 0.05),
"Power using Bonferroni correction" = mean(p_bonferroni <= 0.05),
"Power using Holm correction" = mean(p_holm <= 0.05),
"Power using FDR correction" = mean(p_fdr <= 0.05)
)
Power using naive p-values | Power using Bonferroni correction | Power using Holm correction | Power using FDR correction |
---|---|---|---|
0.7374 | 0.6318 | 0.6886 | 0.7032 |
The alternative approach (generally better!) is to design with a custom estimator that includes your corrections.
my_estimator <- function(data)
lm_robust(Y ~ factor(Z), data = data) |>
tidy() |>
filter(term != "(Intercept)") |>
mutate(p.naive = p.value,
p.value = p.adjust(p = p.naive, method = "bonferroni"))
design_mc_2 <- design_mc |>
replace_step(5, declare_estimator(handler = label_estimator(my_estimator)))
run_design(design_mc_2) |>
select(term, estimate, p.value, p.naive) |> kable()
term | estimate | p.value | p.naive |
---|---|---|---|
factor(Z)2 | 0.1182516 | 0.2502156 | 0.1251078 |
factor(Z)3 | 0.1057031 | 0.3337476 | 0.1668738 |
Let's try the same thing for a null model (using redesign(design_mc_2, b = 0))…

…and power:
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.00 | 0.00 | 0.08 | 0.08 | 0.02 | 0.95 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) |
-0.00 | -0.00 | 0.08 | 0.08 | 0.02 | 0.96 |
(0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.01) |
bothered?
Introduction to observational strategies using DeclareDesign
Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?
 | X=0 | X=1 |
---|---|---|
Z=0 | \(\overline{y}_{00}\) (\(n_{00}\)) | \(\overline{y}_{01}\) (\(n_{01}\)) |
Z=1 | \(\overline{y}_{10}\) (\(n_{10}\)) | \(\overline{y}_{11}\) (\(n_{11}\)) |
Say that people are one of 3 types: always-takers (a), who take the medicine whether or not they are given it; never-takers (n), who never take it; and compliers (c), who take it if and only if they are given it.
Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine's effect?

 | X=0 | X=1 |
---|---|---|
Z=0 | \(\overline{y}_{00}\) (\(n_{00}\)) | \(\overline{y}_{01}\) (\(n_{01}\)) |
Z=1 | \(\overline{y}_{10}\) (\(n_{10}\)) | \(\overline{y}_{11}\) (\(n_{11}\)) |
We can figure out something about the types:

 | \(X=0\) | \(X=1\) |
---|---|---|
\(Z=0\) | \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}\) | \(\overline{y}_{a}\) |
\(Z=1\) | \(\overline{y}_{n}\) | \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}\) |
You give a medicine to 50% of people but only a nonrandom sample of them actually try to use it. Can you still estimate the medicine's effect?

 | \(X=0\) | \(X=1\) |
---|---|---|
\(Z=0\) | \(\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n\) | \(\overline{y}_{a}\) |
(n) | (\(\frac{1}{2}(n_c + n_n)\)) | (\(\frac{1}{2}n_a\)) |
\(Z=1\) | \(\overline{y}_{n}\) | \(\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}\) |
(n) | (\(\frac{1}{2}n_n\)) | (\(\frac{1}{2}(n_a+n_c)\)) |
Key insight: the contributions of the \(a\)s and \(n\)s are the same in the \(Z=0\) and \(Z=1\) groups so if you difference you are left with the changes in the contributions of the \(c\)s.
Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)
Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)
So, the difference is the ITT: \(({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\)
Last step:
\[ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\]
\[\leftrightarrow\]
\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]
declaration_iv <-
declare_model(
N = 100,
U = rnorm(N),
potential_outcomes(D ~ if_else(Z + U > 0, 1, 0),
conditions = list(Z = c(0, 1))),
potential_outcomes(Y ~ 0.1 * D + 0.25 + U,
conditions = list(D = c(0, 1))),
complier = D_Z_1 == 1 & D_Z_0 == 0
) +
declare_inquiry(ATE = mean(Y_D_1 - Y_D_0),
LATE = mean(Y_D_1[complier] - Y_D_0[complier])) +
declare_assignment(Z = complete_ra(N, prob = 0.5)) +
declare_measurement(D = reveal_outcomes(D ~ Z),
Y = reveal_outcomes(Y ~ D)) +
declare_estimator(Y ~ D, inquiry = "ATE", label = "OLS") +
declare_estimator(Y ~ D | Z, .method = iv_robust, inquiry = "LATE",
label = "IV")
Inquiry | Estimator | Mean Estimand | Mean Estimate | Bias | RMSE |
---|---|---|---|---|---|
ATE | OLS | 0.10 | 1.55 | 1.45 | 1.46 |
(0.00) | (0.00) | (0.00) | (0.00) | ||
LATE | IV | 0.10 | -0.05 | -0.15 | 0.80 |
(0.00) | (0.01) | (0.01) | (0.03) |
Note: OLS (regressing Y on D) is badly biased for the ATE because D is confounded by U; the IV estimator targets the LATE and does much better, though it is noisier.
Key idea: the evolution of units in the control group allow you to impute what the evolution of units in the treatment group would have been had they not been treated
We have group \(A\) that enters treatment at some point and group \(B\) that never does
The estimate:
\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in \(A\) compared to the change in \(B\)?)
can be written using potential outcomes as:
\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]
With some manipulation (adding and subtracting \(\color{red}{\mathbb{E}[Y_0^A | post]}\)) and cleaning up:

\[\hat\tau = \left(\mathbb{E}[Y_1^A | post] - \color{red}{\mathbb{E}[Y_0^A | post]}\right) + \left(\left(\color{red}{\mathbb{E}[Y_0^A | post]} - \mathbb{E}[Y_0^A | pre]\right) - \left(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre]\right)\right)\]
\[\hat\tau_{ATT} = \tau_{ATT} + \text{Difference in trends}\]
n_units <- 2
design <-
declare_model(
unit = add_level(N = n_units, I = 1:N),
time = add_level(N = 6, T = 1:N, nest = FALSE),
obs = cross_levels(by = join_using(unit, time))) +
declare_model(potential_outcomes(Y ~ I + T^.5 + Z*T)) +
declare_assignment(Z = 1*(I>(n_units/2))*(T>3)) +
declare_measurement(Y = reveal_outcomes(Y~Z)) +
declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0),
ATT = mean(Y_Z_1[Z==1] - Y_Z_0[Z==1])) +
declare_estimator(Y ~ Z, label = "naive") +
declare_estimator(Y ~ Z + I, label = "FE1") +
declare_estimator(Y ~ Z + as.factor(T), label = "FE2") +
declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")
unit | I | time | T | obs | Y_Z_0 | Y_Z_1 | Z | Y |
---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 01 | 2.000000 | 3.000000 | 0 | 2.000000 |
2 | 2 | 1 | 1 | 02 | 3.000000 | 4.000000 | 0 | 3.000000 |
1 | 1 | 2 | 2 | 03 | 2.414214 | 4.414214 | 0 | 2.414214 |
2 | 2 | 2 | 2 | 04 | 3.414214 | 5.414214 | 0 | 3.414214 |
1 | 1 | 3 | 3 | 05 | 2.732051 | 5.732051 | 0 | 2.732051 |
2 | 2 | 3 | 3 | 06 | 3.732051 | 6.732051 | 0 | 3.732051 |
Here only the two-way fixed effects estimator (FE3, with both unit and time effects) is unbiased, and only for the ATT.
The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!
Inquiry | Estimator | Bias |
---|---|---|
ATE | FE1 | 2.25 |
ATE | FE2 | 6.50 |
ATE | FE3 | 1.50 |
ATE | naive | 5.40 |
ATT | FE1 | 0.75 |
ATT | FE2 | 5.00 |
ATT | FE3 | 0.00 |
ATT | naive | 3.90 |
Things get much more complicated when there is (a) heterogeneous timing in treatment take-up and (b) heterogeneous effects. It's only recently been appreciated how tricky things can get.

But we already have an intuition from our analysis of trials with heterogeneous assignment propensities and heterogeneous effects: in such cases fixed-effects analysis weights stratum-level treatment effects by the variance of assignment to treatment. Something similar happens here.
Just two units assigned at different times:
trend = 0
design <-
declare_model(
unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
obs = cross_levels(by = join_using(unit, time))) +
declare_model(
potential_outcomes(Y ~ trend*T + (1+Z)*(I == 2))) +
declare_assignment(Z = 1*((I == 1) * (T>3) + (I == 2) * (T>5))) +
declare_measurement(Y = reveal_outcomes(Y~Z),
I_c = I - mean(I)) +
declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
declare_estimator(Y ~ Z, label = "1. naive") +
declare_estimator(Y ~ Z + I, label = "2. FE1") +
declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") +
declare_estimator(Y ~ Z + I + as.factor(T), label = "4. FE3") +
declare_estimator(Y ~ Z*I_c + as.factor(T), label = "5. Sat")
Estimator | Mean Estimand | Mean Estimate |
---|---|---|
1. naive | 0.50 | -0.12 |
(0.00) | (0.00) | |
2. FE1 | 0.50 | 0.36 |
(0.00) | (0.00) | |
3. FE2 | 0.50 | -1.00 |
(0.00) | (0.00) | |
4. FE3 | 0.50 | 0.25 |
(0.00) | (0.00) | |
5. Sat | 0.50 | 0.50 |
(0.00) | (0.00) |
See the causal inference slides for intuitions on what is happening here, as well as a declaration using the approach of de Chaisemartin and D'Haultfoeuille (2020).
Errors and diagnostics
See excellent introduction: Lee and Lemieux (2010)
Kids born on 31 August start school a year younger than kids born on 1 September: does starting younger help or hurt?
Kids born on 12 September 1983 are more likely to register Republican than kids born on 10 September 1983: can this identify the effects of registration on long term voting?
A district in which Republicans got 50.1% of the vote get a Republican representative while districts in which Republicans got 49.9% of the vote do not: does having a Republican representative make a difference for these districts?
Setting: units have a running variable \(X\) and receive treatment when \(X\) crosses a known threshold (normalized here to 0), so \(D = 1(X > 0)\).
Two arguments:
Continuity: \(\mathbb{E}[Y(1)|X=x]\) and \(\mathbb{E}[Y(0)|X=x]\) are continuous (at \(x=0\)) in \(x\): so \(\lim_{\hat x \rightarrow 0}\mathbb{E}[Y(0)|X=\hat x] = \mathbb{E}[Y(0)|X=0]\)
Local randomization: tiny things that determine exact values of \(x\) are as if random and so we can think of a local experiment around \(X=0\).
Note:
Exclusion restriction is implicit in continuity: If something else happens at the threshold then the conditional expectation functions jump at the thresholds
Implicit: \(X\) is exogenous in the sense that units cannot adjust \(X\) in order to be on one or the other side of the threshold
Typically researchers show:
In addition:
Sometimes:
library(rdss) # for helper functions
library(rdrobust)
cutoff <- 0.5
bandwidth <- 0.5
control <- function(X) {
as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))}
treatment <- function(X) {
as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25}
rdd_design <-
declare_model(
N = 1000,
U = rnorm(N, 0, 0.1),
X = runif(N, 0, 1) + U - cutoff,
D = 1 * (X > 0),
Y_D_0 = control(X) + U,
Y_D_1 = treatment(X) + U
) +
declare_inquiry(LATE = treatment(0) - control(0)) +
declare_measurement(Y = reveal_outcomes(Y ~ D)) +
declare_sampling(S = X > -bandwidth & X < bandwidth) +
declare_estimator(Y ~ D*X, term = "D", label = "lm") +
declare_estimator(
Y, X,
term = "Bias-Corrected",
.method = rdrobust_helper,
label = "optimal"
)
Note that rdrobust implements local polynomial estimation with data-driven bandwidth selection and robust bias-corrected confidence intervals. See Calonico, Cattaneo, and Titiunik (2014) and related papers, and ?rdrobust::rdrobust.
Estimator | Mean Estimate | Bias | SD Estimate | Coverage |
---|---|---|---|---|
lm | 0.23 | -0.02 | 0.01 | 0.64 |
(0.00) | (0.00) | (0.00) | (0.02) | |
optimal | 0.25 | 0.00 | 0.03 | 0.89 |
(0.00) | (0.00) | (0.00) | (0.01) |