Introduction to design declaration with `DeclareDesign`

Graeme Blair, Alex Coppock, Macartan Humphreys

1 Roadmap

DeclareDesign basics
DeclareDesign deepdive
Assignment with DeclareDesign
Power analysis with DeclareDesign
Declaring observational strategies

2 `DeclareDesign` Basics

How to define and assess research designs

2.1 Roadmap

The MIDA framework and the declaration-diagnosis-redesign cycle
DeclareDesign: key resources
Design
Diagnosis
Redesign
Using designs

2.2 The MIDA Framework

2.2.1 Four elements of any research design

Model: set of models of what causes what and how
Inquiry: a question stated in terms of the model
Data strategy: the set of procedures we use to gather information from the world (sampling, assignment, measurement)
Answer strategy: how we summarize the data produced by the data strategy

2.2.2 Four elements of any research design

2.2.3 Declaration

Design declaration is telling the computer (and readers) what M, I, D, and A are.

2.2.4 Diagnosis

Design diagnosis is figuring out how the design will perform under imagined conditions.
Estimating “diagnosands” like power, bias, rmse, expected error rates, expected ethical harm, expected “amount learned”.
Diagnosis takes account of model uncertainty: it aims to identify models for which the design works well and models for which it does not

2.2.5 Redesign

Redesign is the fine-tuning of features of the data- and answer strategies to understand how changing them affects the diagnosands

Different sample sizes
Different randomization procedures
Different estimation strategies
Implementation (e.g. gains from effort into compliance versus more effort into sample size)

2.2.6 Very often you have to simulate

Doing all this is often too hard to work out from rules of thumb or power calculators
Specialized formulas exist for some diagnosands, but not all

2.3 Key functions and resources

2.3.1 Key commands for design declaration

declare_model()
declare_inquiry()
declare_sampling()
declare_assignment()
declare_measurement()
declare_estimator()

and there are more declare_ functions!

2.3.2 Key commands for using a design

draw_data(design)
draw_estimands(design)
draw_estimates(design)
get_estimates(design, data)
run_design(design), simulate_design(design)
diagnose_design(design)
redesign(design, N = 200)
compare_designs(), compare_diagnoses()

2.3.3 Pipeable commands

design |> 
  redesign(N = c(200, 400)) |>
  diagnose_designs() |> 
  tidy() |> 
  ggplot(...)

2.3.4 Cheat sheet

https://raw.githubusercontent.com/rstudio/cheatsheets/master/declaredesign.pdf

2.3.5 Other resources

The website: https://declaredesign.org/
The book: https://book.declaredesign.org
The console: ?DeclareDesign

2.4 Design declaration-diagnosis-redesign workflow: Design

2.4.1 The simplest possible (diagnosable) design?

mean <- 0

simplest_design <- 
  declare_model(N = 100, Y = rnorm(N, mean)) +
  declare_inquiry(Q = mean) +
  declare_estimator(Y ~ 1)

we draw 100 units from a standard normal distribution
we define our inquiry as the population expectation
we estimate the average using a regression with a constant term

2.4.2 The simplest possible design?

simplest_design <- 
  declare_model(N = 100, Y = rnorm(N, mean)) +
  declare_inquiry(Q = mean) +
  declare_estimator(Y ~ 1)

This design has three steps, with steps connected by a +
The design itself is just a list of steps and has class design

str(simplest_design)

List of 3
 $ model    :design_step:    declare_model(N = 100, Y = rnorm(N, mean)) 
 $ Q        :design_step:    declare_inquiry(Q = mean) 
 $ estimator:design_step:    declare_estimator(Y ~ 1) 
 - attr(*, "call")= language construct_design(steps = steps)
 - attr(*, "class")= chr [1:2] "design" "dd"

2.4.3 The design is a pipe

Each step is a function (or rather: a function that generates functions) and each function presupposes what is created by previous functions.

The ordering of steps is quite important
Most steps take the main data frame in and push the main dataframe out; this data frame normally builds up as you move along the pipe.

2.4.4 The design is a pipe

Each step is a function (or rather: a function that generates functions) and each function presupposes what is created by previous functions.

The ordering of steps is quite important
declare_estimator steps take the main data frame in and send out an estimator_df dataframe
declare_inquiry steps take the main data frame in and send out an estimand_df dataframe.

2.4.5 The design is a pipe

You can run these functions one at a time if you like.
For instance the third step presupposes the data from the first step:

df <- simplest_design[[1]]()
A  <- simplest_design[[3]](df)

A |> kable(digits = 2) |> kable_styling(font_size = 20)

estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
estimator	(Intercept)	-0.1	0.09	-1.2	0.23	-0.27	0.07	99	Y

Estimand  <- simplest_design[[2]](df)

Estimand |> kable(digits = 2) |> kable_styling(font_size = 20)

inquiry	estimand
Q	0

2.4.6 Run it once

You can also just run through the whole design once by typing the name of the design:

simplest_design


Research design declaration summary

Step 1 (model): declare_model(N = 100, Y = rnorm(N, mean)) ---------------------

Step 2 (inquiry): declare_inquiry(Q = mean) ------------------------------------

Step 3 (estimator): declare_estimator(Y ~ 1) -----------------------------------

Run of the design:

 inquiry estimand estimator        term estimate std.error statistic p.value
       Q        0 estimator (Intercept)    0.177    0.0906      1.95  0.0535
 conf.low conf.high df outcome
 -0.00272     0.357 99       Y

2.4.7 Run it again

Or by asking for a run of the design

one_run <- simplest_design |> run_design()
one_run |> kable(digits = 2) |> kable_styling(font_size = 18)

inquiry	estimand	estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
Q	0	estimator	(Intercept)	-0.21	0.11	-2.04	0.04	-0.42	-0.01	99	Y

A single run creates data, calculates estimands (the answer to inquiries) and calculates estimates plus ancillary statistics.

2.4.8 Simulation

Or by asking for many runs of the design

some_runs <- simplest_design |> simulate_design(sims = 1000)

some_runs |> head() |> kable(digits = 2) |> kable_styling(font_size = 16)

design	sim_ID	inquiry	estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
simplest_design	1	Q	estimator	(Intercept)	-0.06	0.11	-0.53	0.60	-0.28	0.16	99	Y
simplest_design	2	Q	estimator	(Intercept)	-0.13	0.10	-1.26	0.21	-0.33	0.07	99	Y
simplest_design	3	Q	estimator	(Intercept)	-0.13	0.09	-1.39	0.17	-0.31	0.05	99	Y
simplest_design	4	Q	estimator	(Intercept)	0.13	0.10	1.31	0.19	-0.07	0.32	99	Y
simplest_design	5	Q	estimator	(Intercept)	0.08	0.09	0.89	0.37	-0.10	0.27	99	Y
simplest_design	6	Q	estimator	(Intercept)	-0.07	0.11	-0.67	0.51	-0.29	0.15	99	Y

2.4.9 Diagnosis

Once you have simulated many times you can “diagnose”.

This is the next topic

2.5 Design declaration-diagnosis-redesign workflow: Diagnosis

2.5.1 Diagnosis by hand

Once you have simulated many times you can “diagnose”.

For instance we can ask about bias: the average difference between the estimand and the estimate:

some_runs |> 
  summarize(mean_estimate = mean(estimate), 
            mean_estimand = mean(estimand), 
            bias = mean(estimate - estimand))

mean_estimate	mean_estimand	bias
0	0	0

2.5.2 `diagnose_design()`

diagnose_design() does this in one step for a set of common “diagnosands”:

diagnosis <-
  simplest_design |>
  diagnose_design()

Design	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
simplest_design	500	0.00	-0.00	-0.00	0.10	0.10	0.05	0.95
		(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)	(0.01)

2.5.3 What is the diagnosis object?

The diagnosis object is also a list; of class diagnosis

names(diagnosis)

[1] "simulations_df"       "diagnosands_df"       "diagnosand_names"    
[4] "group_by_set"         "parameters_df"        "bootstrap_replicates"
[7] "bootstrap_sims"       "duration"

class(diagnosis)

[1] "diagnosis"

2.5.4 What is the diagnosis object?

diagnosis$simulations_df |> 
  head()

design	sim_ID	inquiry	estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
simplest_design	1	Q	estimator	(Intercept)	0.03	0.09	0.31	0.76	-0.16	0.21	99	Y
simplest_design	2	Q	estimator	(Intercept)	0.10	0.09	1.07	0.29	-0.09	0.29	99	Y
simplest_design	3	Q	estimator	(Intercept)	-0.16	0.10	-1.54	0.13	-0.37	0.05	99	Y
simplest_design	4	Q	estimator	(Intercept)	-0.08	0.11	-0.72	0.48	-0.30	0.14	99	Y
simplest_design	5	Q	estimator	(Intercept)	-0.14	0.10	-1.34	0.18	-0.34	0.07	99	Y
simplest_design	6	Q	estimator	(Intercept)	-0.08	0.09	-0.90	0.37	-0.26	0.10	99	Y

2.5.5 What is the diagnosis object?

diagnosis$diagnosands_df |> 
  head()

design	inquiry	estimator	outcome	term	mean_estimand	se(mean_estimand)	mean_estimate	se(mean_estimate)	bias	se(bias)	sd_estimate	se(sd_estimate)	rmse	se(rmse)	power	se(power)	coverage	se(coverage)	n_sims
simplest_design	Q	estimator	Y	(Intercept)	0	0	0	0	0	0	0.1	0	0.1	0	0.05	0.01	0.95	0.01	500

2.5.6 What is the diagnosis object?

diagnosis$bootstrap_replicates |> 
  head()

design	bootstrap_id	inquiry	estimator	outcome	term	mean_estimate	bias	sd_estimate	rmse	power	coverage
simplest_design	1	Q	estimator	Y	(Intercept)	0.00	0.00	0.1	0.10	0.05	0.95
simplest_design	2	Q	estimator	Y	(Intercept)	-0.01	-0.01	0.1	0.11	0.06	0.94
simplest_design	3	Q	estimator	Y	(Intercept)	-0.01	-0.01	0.1	0.10	0.05	0.95
simplest_design	4	Q	estimator	Y	(Intercept)	-0.01	-0.01	0.1	0.10	0.05	0.95
simplest_design	5	Q	estimator	Y	(Intercept)	0.00	0.00	0.1	0.10	0.05	0.95
simplest_design	6	Q	estimator	Y	(Intercept)	0.00	0.00	0.1	0.10	0.05	0.95

2.5.7 Diagnosis: Bootstraps

The bootstraps dataframe is produced by resampling from the simulations dataframe and producing a diagnosis dataframe from each resampling.
This lets us generate estimates of uncertainty around our diagnosands.
It can be controlled thus:

diagnose_design(
  ...,
  bootstrap_sims = 100
)

2.5.8 After Diagnosis

It’s reshapeable: as a tidy dataframe, ready for graphing

diagnosis |> 
  tidy()

design	inquiry	estimator	outcome	term	diagnosand	estimate	std.error	conf.low	conf.high
simplest_design	Q	estimator	Y	(Intercept)	mean_estimand	0.00	0.00	0.00	0.00
simplest_design	Q	estimator	Y	(Intercept)	mean_estimate	0.00	0.00	-0.01	0.00
simplest_design	Q	estimator	Y	(Intercept)	bias	0.00	0.00	-0.01	0.00
simplest_design	Q	estimator	Y	(Intercept)	sd_estimate	0.10	0.00	0.10	0.11
simplest_design	Q	estimator	Y	(Intercept)	rmse	0.10	0.00	0.10	0.11
simplest_design	Q	estimator	Y	(Intercept)	power	0.05	0.01	0.03	0.07
simplest_design	Q	estimator	Y	(Intercept)	coverage	0.95	0.01	0.93	0.97

2.5.9 After Diagnosis

It’s reshapeable: as a tidy dataframe, ready for graphing

diagnosis |> 
  tidy() |> 
  ggplot(aes(estimate, diagnosand)) + geom_point() + 
  geom_errorbarh(aes(xmax = conf.high, xmin = conf.low, height = .2))

2.5.10 After Diagnosis: Tables

Or turn into a formatted table:

diagnosis |> 
  reshape_diagnosis()

Design	Inquiry	Estimator	Outcome	Term	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
simplest_design	Q	estimator	Y	(Intercept)	500	0.00	-0.00	-0.00	0.10	0.10	0.05	0.95
						(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)	(0.01)

2.5.11 Spotting design problems with diagnosis

Diagnosis alerts to problems in a design. Consider the following simple alternative design.

simplest_design_2 <- 
  
  declare_model(N = 100, Y = rnorm(N)) +
  declare_inquiry(Q = mean(Y)) +
  declare_estimator(Y ~ 1)

Here we define the inquiry as the sample average $Y$ (instead of the population mean). But otherwise things stay the same.

What do we think of this design?

2.5.12 Spotting design problems with diagnosis

Here is the diagnosis

Design	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
simplest_design_2	500	-0.00	-0.00	0.00	0.10	0.00	0.04	1.00
		(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)	(0.00)

Why is coverage so high? is that OK?
Why is the RMSE 0? But in each run the std.error > 0? is that OK?
- Is it because the RMSE is too low?
- Or the standard error is too large?

2.5.13 It depends on the inquiry

If we are really interested in the sample average then our standard error is off: we should have no error at all!
If we are really interested in the population average then our inquiry is badly defined: it should not be redefined on each run!

2.5.14 Diagnosing multiple designs

You can diagnose multiple designs or a list of designs

list(dum = simplest_design, dee = simplest_design) |>
  diagnose_design(sims = 5) |>
  reshape_diagnosis() |> 
  kable() |> 
  kable_styling(font_size = 20)

Design	Inquiry	Estimator	Outcome	Term	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
dum	Q	estimator	Y	(Intercept)	5	0.00	-0.08	-0.08	0.10	0.12	0.20	0.80
						(0.00)	(0.04)	(0.04)	(0.03)	(0.02)	(0.19)	(0.19)
dee	Q	estimator	Y	(Intercept)	5	0.00	0.04	0.04	0.05	0.06	0.00	1.00
						(0.00)	(0.02)	(0.02)	(0.01)	(0.02)	(0.00)	(0.00)

2.6 Design declaration-diagnosis-redesign workflow: Redesign

2.6.1 What is redesign?

Redesign is the process of taking a design and modifying it in some way.

There are a few ways to do this:

Just make a new design using modified code
Take a design and alter some steps using replace_step, insert_step or delete_step
Modify a design parameter using redesign

we will focus on the third approach

2.6.2 Redesign parameters

A design parameter is a modifiable quantity of a design.
These quantities are objects that were in your global environment when you made your design, get referred to explicitly in your design, and got scooped up when the design was formed.
In our simplest design above we had a fixed N, but we could make N a modifiable quantity like this:

N <- 100

simplest_design_N <- 
  
  declare_model(N = N, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

2.6.3 Redesign parameter definition

N <- 100

simplest_design_N <- 
  
  declare_model(N = N, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

Note that N is defined in memory; and it gets called in one of the steps. It has now become a parameter of the design and it can be modified using redesign.

2.6.4 Redesign illustration

Here is a version of the design with N = 200:

design_200 <- simplest_design_N |> redesign(N = 200)
  
design_200 |> draw_data() |> nrow()

[1] 200

2.6.5 Redesigning to a list

Here is a list of three different designs with different Ns.

design_Ns <- simplest_design_N |> redesign(N = c(200, 400, 800))

design_Ns |> lapply(draw_data) |> lapply(nrow)

$design_1
[1] 200

$design_2
[1] 400

$design_3
[1] 800

2.6.6 Redesigning to a list

The good thing here is that it is now easy to diagnose over multiple designs and compare diagnoses. The parameter names then end up in the diagnosis_df

Consider this:

N <- 100
m <- 0

design <- 
  declare_model(N = N, Y = rnorm(N, m)) +
  declare_inquiry(Q = m) +
  declare_estimator(Y ~ 1)

Then:

designs <-  redesign(design, N = c(100, 200, 300), m = c(0, .1, .2))
  
designs |> diagnose_design() |> tidy()

2.6.7 Redesigning to a list

Output:

designs |> diagnose_design() |> tidy()

N	m	diagnosand	estimate	std.error	conf.low	conf.high
100	0.0	mean_estimand	0.00	0.00	0.00	0.00
100	0.0	mean_estimate	0.00	0.00	-0.01	0.01
100	0.0	bias	0.00	0.00	-0.01	0.01
100	0.0	sd_estimate	0.10	0.00	0.10	0.11
200	0.0	mean_estimand	0.00	0.00	0.00	0.00
200	0.0	mean_estimate	0.00	0.00	-0.01	0.00
200	0.1	mean_estimand	0.10	0.00	0.10	0.10
200	0.1	mean_estimate	0.10	0.00	0.09	0.10
300	0.2	bias	0.00	0.00	0.00	0.00
300	0.2	sd_estimate	0.06	0.00	0.05	0.06
300	0.2	rmse	0.06	0.00	0.05	0.06
300	0.2	power	0.93	0.01	0.91	0.95
300	0.2	coverage	0.95	0.01	0.92	0.97

2.6.8 Graphing after redesigning to a list

Graphing after redesign is easy:

designs |> diagnose_design() |> 
  tidy() |>
  filter(diagnosand %in% c("power", "rmse")) |> 
  ggplot(aes(N, estimate, color = factor(m))) + 
  geom_line() + 
  facet_wrap(~diagnosand)

Power depends on N and m, rmse depends on N only

2.7 Using a design

What can you do with a design once you have it?

2.7.1 Using a design

We motivate with a slightly more complex experimental design (more on the components of this later)

b <-1
N <- 100
design <- 
  declare_model(N = N, 
                U = rnorm(N), 
                potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N), 
                     Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate")

2.7.2 Make data from the design

data <- draw_data(design)

data |> head () |> kable() |> kable_styling(font_size = 20)

ID	U	Y_Z_0	Y_Z_1	Z	Y
001	0.1894884	0.1894884	1.1894884	0	0.1894884
002	-0.7969571	-0.7969571	0.2030429	0	-0.7969571
003	-0.3575031	-0.3575031	0.6424969	1	0.6424969
004	-0.2300527	-0.2300527	0.7699473	0	-0.2300527
005	-0.4669124	-0.4669124	0.5330876	0	-0.4669124
006	-0.6736014	-0.6736014	0.3263986	0	-0.6736014

2.7.3 Make data from the design

Play with the data:

lm_robust(Y ~ Z, data = data)

term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
(Intercept)	-0.15	0.14	-1.07	0.29	-0.42	0.13	98	Y
Z	1.09	0.20	5.42	0.00	0.69	1.49	98	Y

2.7.4 Draw estimands

draw_estimands(design) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 20)

inquiry	estimand
ate	1

2.7.5 Draw estimates

draw_estimates(design) |> 
  kable(digits = 2) |> 
  kable_styling(font_size = 20)

estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome	inquiry
estimator	Z	0.82	0.18	4.49	0	0.46	1.18	98	Y	ate

2.7.6 Get estimates

Using your actual data:

get_estimates(design, data) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 20)

estimator	term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome	inquiry
estimator	Z	1.09	0.2	5.42	0	0.69	1.49	98	Y	ate

2.7.7 Simulate design

simulate_design(design, sims = 3) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 16)

design	sim_ID	inquiry	estimand	estimator	term	estimate	std.error	statistic	conf.low	conf.high	df	outcome
design	1	ate	1	estimator	Z	0.91	0.22	4.07	0.47	1.35	98	Y
design	2	ate	1	estimator	Z	0.92	0.20	4.52	0.52	1.32	98	Y
design	3	ate	1	estimator	Z	1.19	0.20	5.90	0.79	1.59	98	Y

2.7.8 Diagnose design

design |> 
  diagnose_design(sims = 100)

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
1.01	0.01	0.20	0.19	1.00	0.94
(0.02)	(0.02)	(0.02)	(0.02)	(0.00)	(0.03)

2.7.9 Redesign

new_design <-
  
  design |> redesign(b = 0)

Modify any arguments that are explicitly called on by design steps.
Or add, remove, or replace steps

2.7.10 Compare designs

compare_diagnoses(design1 = design,
                  design2 = redesign(design, N = 50))

diagnosand	mean_1	mean_2	mean_difference	conf.low	conf.high
mean_estimand	0.50	0.50	0.00	0.00	0.00
mean_estimate	0.48	0.50	0.02	-0.01	0.04
bias	-0.02	0.00	0.02	-0.01	0.04
sd_estimate	0.28	0.20	-0.08	-0.10	-0.06
rmse	0.28	0.20	-0.08	-0.10	-0.06
power	0.38	0.71	0.32	0.26	0.37
coverage	0.97	0.96	-0.01	-0.04	0.01

3 `DeclareDesign`: A deeper dive

3.1 Steps in an experimental design

We start with a simple experimental design with all four elements of MIDA and then show ways to extend.

Variations to M and I are supported by the fabricatr package (and others)
Variations to D are supported by the randomizr package (and others)
Variations to A are supported by the estimatr package (and others)

3.1.1 A simple experimental design

N <- 100
b <- .5

design <- 
  declare_model(N = N, U = rnorm(N), 
                potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

New elements:

declare_model can be used much like mutate with multiple columns created in sequence
the potential_outcomes function is a special function that creates potential outcome columns for different values of Z
when you assign a treatment that affects an outcome you can use reveal_outcome to reveal the outcome; Z and Y are default names
when you declare an estimator you should normally include a label, specify an inquiry, and provide the method to be used (lm_robust is default)

3.1.2 Steps: Order matters

e.g. If you sample before defining the inquiry you get a different inquiry to if you sample after you define the inquiry

design_1 <- 
  declare_model(N = 1000, X = rep(0:1, N/2), Y = X + rnorm(N)) + 
  declare_sampling(S= strata_rs(strata = X, strata_prob = c(.2, .8))) +
  declare_inquiry(m = mean(Y))

design_1 |> draw_estimands()

  inquiry  estimand
1       m 0.7606704

3.1.3 Steps: Order matters

e.g. If you sample before defining the inquiry you get a different inquiry to if you sample after you define the inquiry

design_2 <- 
  declare_model(N = 1000, X = rep(0:1, N/2), Y = X + rnorm(N)) + 
  declare_inquiry(m = mean(Y)) +
  declare_sampling(S= strata_rs(strata = X, strata_prob = c(.2, .8))) 

design_2 |> draw_estimands()

  inquiry  estimand
1       m 0.4957891

3.2 M: Key extensions to model declaration

3.2.1 Hierarchical data

You can generate hierarchical data like this:

M <- 
  declare_model(
    households = add_level(
      N = 100, 
      N_members = sample(c(1, 2, 3, 4), N, 
                         prob = c(0.2, 0.3, 0.25, 0.25), replace = TRUE)
    ),
    individuals = add_level(
      N = N_members, 
      age = sample(18:90, N, replace = TRUE)
    )
  )

3.2.2 Hierarchical data

You can generate hierarchical data like this:

M() |> head() |> kable(digits = 2) |> kable_styling(font_size = 20)

households	N_members	individuals	age
001	1	001	79
002	2	002	69
002	2	003	19
003	3	004	21
003	3	005	37
003	3	006	64

3.2.3 Panel data

You can generate panel data like this:

M <- 
  declare_model(
    countries = add_level(
      N = 196, 
      country_shock = rnorm(N)
    ),
    years = add_level(
      N = 100, 
      time_trend = 1:N,
      year_shock = runif(N, 1, 10), 
      nest = FALSE
    ),
    observation = cross_levels(
      by = join_using(countries, years),
      observation_shock = rnorm(N),
      Y = 0.01 * time_trend + country_shock + year_shock + observation_shock 
    )
  )

3.2.4 Panel data

You can generate panel data like this:

M() |> head() |> kable(digits = 2) |> kable_styling(font_size = 20)

countries	country_shock	years	time_trend	year_shock	observation	observation_shock	Y
001	0.39	001	1	4.66	00001	-1.40	3.66
002	0.09	001	1	4.66	00002	0.18	4.95
003	-1.85	001	1	4.66	00003	0.75	3.58
004	-1.92	001	1	4.66	00004	-2.06	0.69
005	-0.11	001	1	4.66	00005	1.13	5.69
006	0.07	001	1	4.66	00006	0.88	5.63

3.2.5 Preexisting data

M <- 
  declare_model(
    data = baseline_data,
    attitudes = sample(1:5, N, replace = TRUE)
  )

3.2.6 Steps

You can repeat steps and play with the order, always conscious of the direction of the pipe

design <- 
  declare_model(N = N, X = rep(0:1, N/2)) +
  declare_model(U = rnorm(N), potential_outcomes(Y ~ b * Z * X + U)) + 
  declare_assignment(Z = block_ra(blocks = X), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_inquiry(cate = mean(Y_Z_1[X==0] - Y_Z_0[X==0])) + 
  declare_estimator(Y ~ Z, inquiry = "ate", label = "ols") + 
  declare_estimator(Y ~ Z*X, inquiry = "cate", label = "fe")

3.2.7 You can generate multiple columns together

M2 <-
  declare_model(
    draw_multivariate(c(X1, X2) ~ MASS::mvrnorm(
      n = 1000,
      mu = c(0, 0),
      Sigma = matrix(c(1, 0.3, 0.3, 1), nrow = 2)
    )))

3.2.8 You can generate multiple columns together

M2() |> head() |> kable(digits = 2) |> kable_styling(font_size = 28)

X1	X2
0.42	-0.70
1.54	0.28
-0.62	-0.35
-0.60	0.07
0.79	0.79
0.84	0.96

3.2.9 Cluster structures with cluster correlations

M <-
  declare_model(households = add_level(N = 1000),
                individuals = add_level(
                  N = 4,
                  X = draw_normal_icc(
                    mean = 0,
                    clusters = households,
                    ICC = 0.65
                  )
                ))

3.2.10 Cluster structures with cluster correlations

model <- lm_robust(X ~ households, data = M())
model$adj.r.squared

[1] 0.6215795

3.3 I: Inquiries

3.3.1 Inquiries using predefined potential outcomes

Many causal inquiries are simple summaries of potential outcomes:

Inquiry	Units	Code
Average treatment effect in a finite population (PATE)	Units in the population	`mean(Y_D_1 - Y_D_0)`
Conditional average treatment effect (CATE) for X = 1	Units for whom X = 1	`mean(Y_D_1[X == 1] - Y_D_0[X == 1])`
Complier average causal effect (CACE)	Complier units	`mean(Y_D_1[D_Z_1 > D_Z_0] - Y_D_0[D_Z_1 > D_Z_0])`
Causal interactions of $D_1$ and $D_2$	Units in the population	`mean((Y_D1_1_D2_1 - Y_D1_0_D2_1) - (Y_D1_1_D2_0 - Y_D1_0_D2_0))`

Generating potential outcomes columns gets you far

3.3.2 Inquiries using functions

Often though we need to define inquiries as a function of continuous variables. For this generating a potential outcomes function can make life easier. This helps for:

Continuous quantities
Spillover quantities
Complex counterfactuals

3.3.3 Complex counterfactuals

Here is an example of using functions to define complex counterfactuals:

f_M <- function(X, UM) 1*(UM < X)
f_Y <- function(X, M, UY) X + M - .4*X*M + UY

design <- 
  declare_model(N = 100,
                X = simple_rs(N),
                UM = runif(N),
                UY = rnorm(N),
                M = f_M(X, UM),
                Y = f_Y(X, M, UY)) +
  declare_inquiry(Q1 = mean(f_Y(1, f_M(0, UM), UY) - f_Y(0, f_M(0, UM), UY)))

design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)

inquiry	estimand
Q1	1

3.3.4 Complex counterfactuals

Here is an example of using functions to define effects of continuous treatments.

f_Y <- function(X, UY) X - .25*X^2 + UY

design <- 
  declare_model(N = 100,
                X  = rnorm(N),
                UY = rnorm(N),
                Y = f_Y(X, UY)) +
  declare_inquiry(
    Q1 = mean(f_Y(X+1, UY) - f_Y(X, UY)),
    Q2 = mean(f_Y(1, UY) - f_Y(0, UY)),
    Q3 = (lm_robust(Y ~ X)|> tidy())[2,2]
    )

design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)

inquiry	estimand
Q1	0.7237174
Q2	0.7500000
Q3	1.2303680

which one is the ATE?

3.4 D

3.4.1 Assignment schemes

The randomizr package has a set of functions for different types of block and cluster assignments.

Simple random assignment: “Coin flip” or Bernoulli random assignment. All units have the same probability of assignment: simple_ra(N = 100, prob = 0.25)
Complete random assignment: Exactly m of N units are assigned to treatment, and all units have the same probability of assignment m/N complete_ra(N = 100, m = 40)

3.4.2 Assignment schemes

Block random assignment: Complete random assignment within pre-defined blocks. Units within the same block have the same probability of assignment $m_b / N_b$: block_ra(blocks = regions)
Cluster random assignment: Whole groups of units are assigned to the same treatment condition. cluster_ra(clusters = households) * Block-and-cluster assignment: Cluster random assignment within blocks of clusters block_and_cluster_ra(blocks = regions, clusters = villages)

3.4.3 Assignment schemes

You can combine these in various ways. For examples with saturation random assignment first clusters are assigned to a saturation level, then units within clusters are assigned to treatment conditions according to the saturation level:

saturation = cluster_ra(clusters = villages, 
                        conditions = c(0, 0.25, 0.5, 0.75))

block_ra(blocks = villages, prob_unit = saturation)

3.5 A

3.5.1 A: Answers: terms

By default declare_estimates() assumes you are interested in the first term after the constant from the output of an estimation procedure.

But you can say what you are interested in directly using term and you can also associate different terms with different quantities of interest using inquiry.

design <-
  declare_model(
    N = 100,
    X1 = rnorm(N),
    X2 = rnorm(N),
    X3 = rnorm(N),
    Y = X1 - X2 + X3 + rnorm(N)
  ) +
  declare_inquiries(ate_2 = -1, ate_3 = 1) +
  declare_estimator(Y ~ X1 + X2 + X3,
                    term = c("X2", "X3"),
                    inquiry = c("ate_2", "ate_3"))

design  |> run_design()  |> kable(digits = 2) |> kable_styling(font_size = 20)

inquiry	estimand	term	estimator	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
ate_2	-1	X2	estimator	-0.94	0.13	-7.36	0	-1.19	-0.69	96	Y
ate_3	1	X3	estimator	0.96	0.11	9.09	0	0.75	1.17	96	Y

3.5.2 A: Answers: terms

Sometimes it can be confusing what the names of a term is but you can figure this by running the estimation strategy directly. Here’s an example where the names of a term might be confusing.

lm_robust(Y ~ A*B, 
          data = data.frame(A = rep(c("a",  "b"), 3), 
                            B = rep(c("p", "q"), each = 3), 
                            Y = rnorm(6))) |>
  coef() |> kable() |> kable_styling(font_size = 20)

	x
(Intercept)	-0.2010915
Ab	0.0874179
Bq	-0.6500188
Ab:Bq	0.3296254

The names as they appear in the output here is the name of the term that declare_estimator will look for.

3.5.3 A: Answers: other packages

DeclareDesign works natively with estimatr but you you can use whatever packages you like. You do have to make sure though that estimatr gets as input a nice tidy dataframe of estimates, and that might require some tidying.

design <- 
  declare_model(N = 1000, U = runif(N), 
                potential_outcomes(Y ~ as.numeric(U < .5 + Z/3))) + 
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", 
                    .method = glm, 
                    family = binomial(link = "probit"))

Note that we passed additional arguments to glm; that’s easy.

It’s not a good design though. Just look at the diagnosis:

3.5.4 A: Answers: other packages

diagnose_design(design)

if(run)
  diagnose_design(design) |> write_rds("saved/probit.rds")

read_rds("saved/probit.rds") |> 
  reshape_diagnosis() |>
  kable() |> 
  kable_styling(font_size = 20)

Design	Inquiry	Estimator	Term	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
design	ate	estimator	Z	500	0.33	0.97	0.64	0.09	0.64	1.00	0.00
					(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)

Why is it so terrible?

3.5.5 A: Answers: other packages

Because the probit estimate does not target the ATE directly; you need to do more work to get there.

You essentially have to write a function to get the estimates, calculate the quantity of interest and other stats, and turn these into a nice dataframe.

Luckily you can use the margins package with tidy to create a .summary function which you can pass to declare_estimator to do all this for you

tidy_margins <- function(x) 
  broom::tidy(margins::margins(x, data = x$data), conf.int = TRUE)

design <- design +  
  declare_estimator(Y ~ Z, inquiry = "ate", 
                    .method = glm, 
                    family = binomial(link = "probit"),
                    .summary = tidy_margins,
                    label = "margins")

3.5.6 A: Answers: other packages

Design	Inquiry	Estimator	Term	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
design	ate	estimator	Z	500	0.33	0.97	0.64	0.09	0.64	1.00	0.00
					(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)
design	ate	margins	Z	500	0.33	0.31	-0.02	0.02	0.03	1.00	0.90
					(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)

Much better

3.6 Advanced diagnosis

3.6.1 diagnosands code

DeclareDesign:::default_diagnosands

    mean_estimand <- mean(estimand)
    mean_estimate <- mean(estimate)
    bias <- mean(estimate - estimand)
    sd_estimate <- sd(estimate)
    rmse <- sqrt(mean((estimate - estimand)^2))
    power <- mean(p.value <= alpha)
    coverage <- mean(estimand <= conf.high & estimand >= conf.low)

3.6.2 More diagnosands

    mean_se = mean(std.error)
    type_s_rate = mean((sign(estimate) != sign(estimand))[p.value <= alpha])
    exaggeration_ratio = mean((estimate/estimand)[p.value <= alpha])
    var_estimate = pop.var(estimate)
    mean_var_hat = mean(std.error^2)
    prop_pos_sig = estimate > 0 & p.value <= alpha
    mean_ci_length = mean(conf.high - conf.low)

3.6.3 Custom diagnosands

my_diagnosands <-
  declare_diagnosands(median_bias = median(estimate - estimand))

diagnose_design(simplest_design, diagnosands = my_diagnosands, sims = 10) |>
  reshape_diagnosis() |> kable() |> kable_styling(font_size = 20)

Design	Inquiry	Estimator	Outcome	Term	N Sims	Median Bias
simplest_design	Q	estimator	Y	(Intercept)	10	0.03
						(0.04)

3.6.4 Adding diagnosands to a design

simplest_design <- 
  set_diagnosands(simplest_design, my_diagnosands)

simplest_design |> diagnose_design(sims = 10)|>
  reshape_diagnosis() |> kable() |> kable_styling(font_size = 20)

Design	Inquiry	Estimator	Outcome	Term	N Sims	Median Bias
simplest_design	Q	estimator	Y	(Intercept)	10	0.00
						(0.04)

3.6.5 Diagnosing in groups

You can partition the simulations data frame into groups before calculating diagnosands.

grouped_diagnosis <- 
  
  simplest_design |>
  diagnose_design(
    make_groups = vars(significant = p.value <= 0.05),
    sims = 500
  )

Design	Significant	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
design_1	FALSE	474	0.00	-0.00	-0.00	0.09	0.09	0.00	1.00
			(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)
design_1	TRUE	26	0.00	-0.02	-0.02	0.23	0.23	1.00	0.00
			(0.00)	(0.04)	(0.04)	(0.01)	(0.01)	(0.00)	(0.00)

Note especially the mean estimate, the power, the coverage, the RMSE, and the bias. (Bias is not large because we have both under and over estimates)

3.6.6 Significance filter

grouped_diagnosis$simulations_df |>
  ggplot(aes(estimate, p.value, color = significant)) + geom_point()

3.6.7 Multistage simulation

Usually a design simulation simulates “from the top”: going from the beginning to the end of the design in each run and repeating
But sometimes you might want to follow a tree like structure and simulate different steps a different number of times

3.6.8 Multistage simulation illustration

Consider for instance this sampling design:

sampling_design <- 
  
  declare_model(N = 500, Y = 1 + rnorm(N, sd = 10)) +
  declare_inquiry(Q = mean(Y)) +
  declare_sampling(S = complete_rs(N = N, n = 100)) + 
  declare_estimator(Y ~ 1)

3.6.9 Multistage simulation illustration

Compare these two diagnoses:

diagnosis_1 <- diagnose_design(sampling_design, sims = c(5000, 1, 1, 1)) 
diagnosis_2 <- diagnose_design(sampling_design, sims = c(1, 1, 5000, 1))

diagnosis	N Sims	Mean Estimand	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
diagnosis_1	5000	1.00	1.00	-0.00	1.01	0.90	0.17	0.97
diagnosis_1		(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.00)
diagnosis_2	5000	0.73	0.75	0.02	0.92	0.92	0.08	0.97
diagnosis_2		(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)

In the second, the estimand is drawn just once. The SD of the estimate is lower. But the RMSE is not very different.

3.6.10 Illustration with tailored estimator and diagnosand

my_estimator <- function(data) 
  data.frame(outcome = "Y", estimate = mean(data$Y), std.error = 0)

design <-
  declare_model(N = 5000, Y = rnorm(N)) +
  declare_inquiry(Y_bar_pop = mean(Y, 1)) +
  declare_sampling(S = complete_rs(N = N, n = 500)) +
  declare_inquiry(Y_bar_sample = mean(Y)) +
  declare_estimator(Y ~ 1, inquiry = "Y_bar_pop",label = "ols") +
  declare_estimator(handler = label_estimator(my_estimator),
                    inquiry = "Y_bar_sample",
                    label = "mean")
my_diagnosands <-
  declare_diagnosands(
    bias = mean(estimate - estimand),
    rmse = mean((estimate - estimand)^2)^.5,
    mean_se = mean(std.error))

3.6.11 Diagnosis

diagnose_design(design, diagnosands = my_diagnosands)

Design	Inquiry	Estimator	Outcome	Term	N Sims	Bias	RMSE	Mean Se
design	Y_bar_pop	ols	Y	(Intercept)	500	0.00	0.04	0.04
						(0.00)	(0.00)	(0.00)
design	Y_bar_sample	mean	Y	NA	500	0.00	0.00	0.00
						(0.00)	(0.00)	(0.00)

3.7 Advanced Redesign

3.7.1 Redesign with vector arguments

When redesigning with arguments that are vectors, use list() in redesign, with each list item representing a design you wish to create

prob_each <- c(.1, .5, .4)

design_multi  <- 
  declare_model(N = 10) +
  declare_assignment(Z = complete_ra(N = N, prob_each = prob_each))

## returns two designs

designs <- design_multi |> 
  redesign(prob_each = list(c(.2, .5, .3), c(0, .5, .5)))
  
designs |> lapply(draw_data)

3.7.2 Redesign warnings

A parameter has to be called correctly. And you get no warning if you misname.

simplest_design_N  |> redesign(n = 200) |> draw_data() |> nrow()

[1] 100

why not 200?

3.7.3 Redesign warnings

A parameter has to be called explicitly

N <- 100

my_N <- function(n = N) n

simplest_design_N2 <- 
  
  declare_model(N = my_N(), Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()

[1] 100

why not 200?

3.7.4 Redesign warnings

A parameter has to be called explicitly

N <- 100

my_N <- function(n = N) n

simplest_design_N2 <- 
  
  declare_model(N = my_N(N), Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()

[1] 200

3.7.5 Redesign with a function

Here is an example of redesigning where the “parameter” is a function

new_N <- function(n, factor = 1.31) n*factor

simplest_design_N2 |> redesign(my_N = new_N) |> draw_data() |> nrow()

[1] 131

4 Assignments with `DeclareDesign`

4.1 Running example

4.1.1 A design: Multilevel data

A design with hierarchical data and different assignment schemes.

design <- 
  declare_model(
    school = add_level(N = 16, 
                       u_school = rnorm(N, mean = 0)),     
    classroom = add_level(N = 4,    
                  u_classroom = rnorm(N, mean = 0)),
    student =  add_level(N = 20,    
                         u_student = rnorm(N, mean = 0))
    ) +
  declare_model(
    potential_outcomes(Y ~ .1*Z + u_classroom + u_student + u_school)
    ) +
  declare_assignment(Z = simple_ra(N)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ Z))  +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, .method = difference_in_means)

4.1.2 Sample data

Here are the first couple of rows and columns of the resulting data frame.

my_data <- draw_data(design)
kable(head(my_data), digits = 2)

school	u_school	classroom	u_classroom	student	u_student	Y_Z_0	Y_Z_1	Z	Y
01	-1.4	01	-0.2	0001	-0.97	-2.57	-2.47	0	-2.57
01	-1.4	01	-0.2	0002	0.67	-0.93	-0.83	1	-0.83
01	-1.4	01	-0.2	0003	-1.97	-3.57	-3.47	1	-3.47
01	-1.4	01	-0.2	0004	0.52	-1.08	-0.98	0	-1.08
01	-1.4	01	-0.2	0005	0.60	-1.00	-0.90	1	-0.90
01	-1.4	01	-0.2	0006	-0.41	-2.01	-1.91	0	-2.01

4.1.3 Sample data

Here is the distribution between treatment and control:

kable(t(as.matrix(table(my_data$Z))), 
      col.names = c("control", "treatment"))

control	treatment
631	649

4.2 Complete

4.2.1 Complete Random Assignment using the built in function

assignment_complete <-   declare_assignment(Z = complete_ra(N))  

design_complete <- 
  replace_step(design, "assignment", assignment_complete)

4.2.2 Data from complete assignment

We can draw a new set of data and look at the number of subjects in the treatment and control groups.

set.seed(1:5)
data_complete <- draw_data(design_complete)

kable(t(as.matrix(table(data_complete$Z))))

0	1
640	640

4.2.3 Plotted

4.3 Block

4.3.1 Block Random Assignment

The treatment and control group will in expectation contain the same share of students in different classrooms.
But as we saw this does necessarily hold in realization
We make this more obvious by sorting the students by treatment status with schools

4.3.2 Blocked design

assignment_blocked <-   
  declare_assignment(Z = block_ra(blocks = classroom))  

estimator_blocked <- 
  declare_estimator(Y ~ Z, blocks = classroom, 
                    .method = difference_in_means)  

design_blocked <- 
  design |> 
  replace_step("assignment", assignment_blocked) |>
  replace_step("estimator", estimator_blocked)

4.3.3 Illustration of blocked assignment

Note that subjects are sorted here after the assignment to make it easier to see that in this case blocking ensures that exactly 5 students within each classroom are assigned to treatment.

4.4 Clustered

4.4.1 Clustering

But what if all students in a given class have to be assigned the same treatment?

assignment_clustered <- 
  declare_assignment(Z = cluster_ra(clusters = classroom))  
estimator_clustered <- 
  declare_estimator(Y ~ Z, clusters = classroom, 
                    .method = difference_in_means)  


design_clustered <- 
  design |> 
  replace_step("assignment", assignment_clustered) |> 
  replace_step("estimator", estimator_clustered)

4.4.2 Illustration of clustered assignment

4.5 Clustered and Blocked

4.5.1 Clustered and Blocked

assignment_clustered_blocked <-   
  declare_assignment(Z = block_and_cluster_ra(blocks = school,
                                              clusters = classroom))  
estimator_clustered_blocked <- 
  declare_estimator(Y ~ Z, blocks = school, clusters = classroom, 
                    .method = difference_in_means)  


design_clustered_blocked <- 
  design |> 
  replace_step("assignment", assignment_clustered_blocked) |> 
  replace_step("estimator", estimator_clustered_blocked)

4.5.2 Illustration of clustered and blocked assignment

4.6 Comparisons

4.6.1 Illustration of efficiency gains from blocking

designs <- 
  list(
    simple = design, 
    complete = design_complete, 
    blocked = design_blocked, 
    clustered = design_clustered,  
    clustered_blocked = design_clustered_blocked)

diagnoses <- diagnose_design(designs)

4.6.2 Illustration of efficiency gains from blocking

Design	Power	Coverage
simple	0.16	0.95
	(0.01)	(0.01)
complete	0.20	0.96
	(0.01)	(0.01)
blocked	0.42	0.95
	(0.01)	(0.01)
clustered	0.06	0.96
	(0.01)	(0.01)
clustered_blocked	0.08	0.96
	(0.01)	(0.01)

4.6.3 Sampling distributions

diagnoses$simulations_df |> 
  mutate(design = factor(design, c("blocked", "complete", "simple", "clustered_blocked", "clustered"))) |>
  ggplot(aes(estimate)) +
  geom_histogram() + facet_grid(~design)

4.7 Nasty integer issues

4.7.1 The issues

In many designs you seek to assign an integer number of subjects to treatment from some set.
Sometimes however your assignment targets are not integers.

Example:

I have 12 subjects in four blocks of 3 and I want to assign each subject to treatment with a 50% probability.

Two strategies:

I randomly set a target of either 1 or 2 for each block and then do complete assignment in each block. This can result in the numbers treated varying from 4 to 8
I randomly assign a target of 1 for two blocks and 2 for the other two blocks: Intuition–set a floor for the minimal target and then distribue the residual probability across blocks

4.7.2 Nasty integer issues

# remotes::install_github("macartan/probra")
library(probra)

Error in library(probra): there is no package called 'probra'

set.seed(1)

blocks <- rep(1:4, each = 3)

table(blocks, prob_ra(blocks = blocks))

Error in prob_ra(blocks = blocks): could not find function "prob_ra"

table(blocks, block_ra(blocks = blocks))

4.7.3 Nasty integer issues

Can also be used to set targets

# remotes::install_github("macartan/probra")
library(probra)

Error in library(probra): there is no package called 'probra'

set.seed(1)

fabricate(N = 4,  size = c(47, 53, 87, 25), n_treated = prob_ra(.5*size)) %>%
  janitor::adorn_totals("row") |> 
  kable(caption = "Setting targets to get 50% targets with minimal variance")

Error in loadNamespace(x): there is no package called 'janitor'

4.7.4 Nasty integer issues

Can also be used to set for complete assignment with heterogeneous propensities

set.seed(1)

df <- fabricate(N = 100,  p = seq(.1, .9, length = 100), Z = prob_ra(p))

Error in prob_ra(p): could not find function "prob_ra"

mean(df$Z)

[1] NA

df |> ggplot(aes(p, Z)) + geom_point() + theme_bw()

Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'p' not found

4.7.5 Design with two units and heterogeneous probabilities

probs <- c(.8, .2)

design <- 
  declare_model(N = 2, 
                Y_Z_1 = c(1, 1),
                Y_Z_0 = c(-1, 1)) +
  declare_inquiry(ATE = 1) +
  declare_assignment(
    Z = prob_ra(prob = probs),
    condition_prs = probs,
    Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, label  = "ht", 
                    .method = horvitz_thompson,
                    condition_prs = condition_prs)

4.7.6 Design with two units and heterogeneous probabilities

Inquiry	Estimator	N Sims	Bias	RMSE
ATE	ht	10000	-0.00	2.00
			(0.02)	(0.02)

Unbiased but very very noisy (simulations also noisy)

4.8 Indirect assignments

Indirect control

4.8.1 Indirect assignments

Indirect assignments are generally generated by applying a direct assignment and then figuring our an implied indirect assignment

set.seed(1)

df <-
  fabricate(
    N = 100, 
    latitude = runif(N),
    longitude = runif(N))

adjacency <- 
  sapply(1:nrow(df), function(i) 
    1*((df$latitude[i] - df$latitude)^2 + (df$longitude[i] - df$longitude)^2)^.5 < .1)

diag(adjacency) <- 0

4.8.2 Indirect assignments

adjacency |>  
  reshape2::melt(c("x", "y"), value.name = "z") |> mutate(z = factor(z)) |>
  ggplot(aes(x=x,y=y,fill=z))+
  geom_tile()

Error in loadNamespace(x): there is no package called 'reshape2'

4.8.3 Indirect assignments

n_assigned <- 50

design <-
  declare_model(data = df) + 
  declare_assignment(
    direct = complete_ra(N, m = n_assigned),
    indirect = 1*(as.vector(as.vector(direct) %*% adjacency) >= 1))

draw_data(design) |> with(table(direct, indirect))

      indirect
direct  0  1
     0 13 37
     1 13 37

4.8.4 Indirect assignments: Properties

indirect_propensities <- replicate(5000, draw_data(design)$indirect) |> 
  apply(1, mean)

4.8.5 Indirect assignments: Properties

df |> ggplot(aes(latitude, longitude, label = round(indirect_propensities_1, 2))) + geom_text()

4.8.6 Indirect assignments: Redesign

replicate(5000, draw_data(design |> redesign(n_assigned = 25))$indirect) |> 
  apply(1, mean)

4.8.7 Indirect assignments: Redesign

df |> ggplot(aes(latitude, longitude, label = round(indirect_propensities_2, 2))) + 
  geom_text()

Looks better: but there are trade offs between the direct and indirect distributions

Figuring out the optimal procedure requires full diagnosis

4.9 Factorial Designs

4.9.1 Factorial Designs

Often when you set up an experiment you want to look at more than one treatment.
Should you do this or not? How should you use your power?

4.9.2 Factorial Designs

Often when you set up an experiment you want to look at more than one treatment.
Should you do this or not? How should you use your power?

Load up:

	$T2=0$	$T2=1$
T1 = 0	$50\%$	$0\%$
T1 = 1	$50\%$	$0\%$

Spread out:

	$T2=0$	$T2=1$
T1 = 0	$25\%$	$25\%$
T1 = 1	$25\%$	$25\%$

4.9.3 Factorial Designs

Often when you set up an experiment you want to look at more than one treatment.
Should you do this or not? How should you use your power?

Three arm it?:

	$T2=0$	$T2=1$
T1 = 0	$33.3\%$	$33.3\%$
T1 = 1	$33.3\%$	$0\%$

Bunch it?:

	$T2=0$	$T2=1$
T1 = 0	$40\%$	$20\%$
T1 = 1	$20\%$	$20\%$

4.9.4 Factorial Designs

Two ways to do factorial assignments in DeclareDesign:

# Block the second assignment
declare_assignment(Z1 = complete_ra(N)) +
declare_assignment(Z2 = block_ra(blocks = Z1)) +
  
# Recode four arms  
declare_assignment(Z = complete_ra(N, num_arms = 4)) +
declare_measurement(Z1 = (Z == "T2" | Z == "T4"),
                      Z2 = (Z == "T3" | Z == "T4"))

4.9.5 Factorial Designs: In practice

In practice if you have a lot of treatments it can be hard to do full factorial designs – there may be too many combinations.
In such cases people use fractional factorial designs, like the one below (5 treatments but only 8 units!)

Variation	T1	T2	T3	T4	T5
1	0	0	0	1	1
2	0	0	1	0	0
3	0	1	0	0	1
4	0	1	1	1	0
5	1	0	0	1	0
6	1	0	1	0	1
7	1	1	0	0	0
8	1	1	1	1	1

4.9.6 Factorial Designs: In practice

Then randomly assign units to rows. Note columns might also be blocking covariates.
In R, look at library(survey)

4.9.7 Factorial Designs: In practice

But be careful: you have to be comfortable with possibly not having any simple counterfactual unit for any unit (invoke sparsity-of-effects principle).

Unit	T1	T2	T3	T4	T5
1	0	0	0	1	1
2	0	0	1	0	0
3	0	1	0	0	1
4	0	1	1	1	0
5	1	0	0	1	0
6	1	0	1	0	1
7	1	1	0	0	0
8	1	1	1	1	1

In R, look at library(survey)

5 Design diagnosis

A focus on power

5.1 Outline

Tests review
$p$ values and significance
Power
Sources of power
Advanced applications

5.2 Tests

5.2.1 Review

In the classical approach to testing a hypothesis we ask:

How likely are we to see data like this if indeed the hypothesis is true?

If the answer is “not very likely” then we treat the hypothesis as suspect.
If the answer is not “not very likely” then the hypothesis is maintained (some say “accepted” but this is tricky as you may want to “maintain” multiple incompatible hypotheses)

How unlikely is “not very likely”?

5.2.2 Weighing Evidence

When we test a hypothesis we decide first on what sort of evidence we need to see in order to decide that the hypothesis is not reliable.

Othello has a hypothesis that Desdemona is innocent.
Iago confronts him with evidence:
- See how she looks at him: would she look a him like that if she were innocent?
- … would she defend him like that if she were innocent?
- … would he have her handkerchief if she were innocent?
- Othello, the chances of all of these things arising if she were innocent is surely less than 5%

5.2.3 Hypotheses are often rejected, sometimes maintained, but rarely accepted

Note that Othello is focused on the probability of the events if she were innocent but not the probability of the events if Iago were trying to trick him.
He is not assessing his belief in whether she is faithful, but rather how likely the data would be if she were faithful.

So:

He assesses: $\Pr(\text{Data} | \text{Hypothesis is TRUE})$
While a Bayesian would assess: $\Pr(\text{Hypothesis is TRUE} | \text{Data})$

5.2.4 Recap: Calculate a $p$ value in your head

Illustrating $p$ values via “randomization inference”
Say you randomized assignment to treatment and your data looked like this.

Unit	1	2	3	4	5	6	7	8	9	10
Treatment	0	0	0	0	0	0	0	1	0	0
Health score	4	2	3	1	2	3	4	8	7	6

Then:

Does the treatment improve your health?
What’s the $p$ value for the null that treatment had no effect on anybody?

5.3 Power

5.4 What power is

Power is just the probability of ~~getting a significant result~~ rejecting a hypothesis.

Simple enough but it presupposes:

A well defined hypothesis
An actual stipulation of the world under which you evaluate the probability
A procedure for producing results and determining of they are significant / rejecting a hypothesis

5.4.1 By hand

I want to test the hypothesis that a six never comes up on this dice.

Here’s my test:

I will roll the dice once.
If a six comes up I will reject the hypothesis.

What is the power of this test?

5.4.2 By hand

I want to test the hypothesis that a six never comes up on this dice.

Here’s my test:

I will roll the dice twice.
If a six comes up either time I will reject the hypothesis.

What is the power of this test?

5.4.3 Two probabilities

Power sometimes seems more complicated because hypothesis rejection involves a calculated probability and so you need the probability of a probability.

I want to test the hypothesis that this dice is fair.

Here’s my test:

I will roll the dice 1000 times and if I see fewer than x 6s or more than y 6s I will reject the hypothesis.

Now:

What should x and y be?
What is the power of this test?

5.4.4 Step 1: When do you reject?

For this we need to figure a rule for rejection. This is based on identifying events that should be unlikely under the hypothesis.

Here is how many 6’s I would expect if the dice is fair:

fabricate(N = 1001, sixes = 0:1000, p = dbinom(sixes, 1000, 1/6)) |>
  ggplot(aes(sixes, p)) + geom_line()

5.4.5 Step 1: When do you reject?

I can figure out from this that 143 or fewer is really very few and 190 or more is really very many:

c(lower = pbinom(143, 1000, 1/6), upper = 1 - pbinom(189, 1000, 1/6))

     lower      upper 
0.02302647 0.02785689

5.4.6 Step 2: What is the power?

Now we need to stipulate some belief about how the world really works—this is not the null hypothesis that we plan to reject, but something that we actually take to be true.
For instance: we think that in fact sixes appear 20% of the time.

Now what’s the probability of seeing at least 190 sixes?

1 - pbinom(189, 1000, .2)

[1] 0.796066

So given I think 6s appear 20% of the time, I think it likely I’ll see at least 190 sixes and reject the hypothesis of a fair dice.

5.4.7 Rule of thumb

80% or 90% is a common rule of thumb for “sufficient” power
but really, how much power you need depends on the purpose

5.5 Power via design diagnosis

5.5.1 The good

Is arbitrarily flexible

N <- 100
b <- .5

design <- 
  declare_model(N = N, 
    U = rnorm(N),
    potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N),
                     Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

5.5.2 Run it many times

sims_1 <- simulate_design(design) 

sims_1 |> select(sim_ID, estimate, p.value)

sim_ID	estimate	p.value
1	0.81	0.00
2	0.40	0.04
3	0.88	0.00
4	0.72	0.00
5	0.38	0.05
6	0.44	0.02

5.5.3 Power is mass of the sampling distribution of decisions under the model

sims_1 |>
  ggplot(aes(p.value)) + 
  geom_histogram() +
  geom_vline(xintercept = .05, color = "red")

5.5.4 Power is mass of the sampling distribution of decisions under the model

Obviously related to the estimates you might get

sims_1 |>
  mutate(significant = p.value <= .05) |>
  ggplot(aes(estimate, p.value, color = significant)) + 
  geom_point()

5.5.5 Check coverage is correct

sims_1 |>
  mutate(within = (b > sims_1$conf.low) & (b < sims_1$conf.high)) |> 
  pull(within) |> mean()

[1] 0.9573333

5.5.6 Check validity of $p$ value

A valid $p$-value satisfies $\Pr(p≤x)≤x$ for every $x \in[0,1]$ (under the null)

sims_2 <- 
  
  redesign(design, b = 0) |>
  
  simulate_design()

5.5.7 Design diagnosis does it all (over multiple designs)

  diagnose_design(design)

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
0.50	0.00	0.20	0.20	0.70	0.95
(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)

5.5.8 Design diagnosis does it all

design |>
  redesign(b = c(0, 0.25, 0.5, 1)) |>
  diagnose_design()

b	Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
0	-0.00	-0.00	0.20	0.20	0.05	0.95
	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)
0.25	0.25	-0.00	0.20	0.20	0.23	0.95
	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)
0.5	0.50	0.00	0.20	0.20	0.70	0.95
	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)
1	1.00	0.00	0.20	0.20	1.00	0.95
	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.00)

5.5.9 Diagnose over multiple moving parts (and ggplot)

design |>
  ## Redesign
  redesign(b = c(0.1, 0.3, 0.5), N = 100, 200, 300) |>
  ## Diagnosis
  diagnose_design() |>
  ## Prep
  tidy() |>
  filter(diagnosand == "power") |>
  ## Plot
  ggplot(aes(N, estimate, color = factor(b))) +
  geom_line()

5.5.10 Diagnose over multiple moving parts (and ggplot)

5.5.11 Diagnose over multiple moving parts and multiple diagnosands (and ggplot)

design |>

  ## Redesign
  redesign(b = c(0.1, 0.3, 0.5), N = 100, 200, 300) |>
  
  ## Diagnosis
  diagnose_design() |>
  
  ## Prep
  tidy() |>
  
  ## Plot
  ggplot(aes(N, estimate, color = factor(b))) +
  geom_line()+
  facet_wrap(~diagnosand)

5.5.12 Diagnose over multiple moving parts and multiple diagnosands (and ggplot)

5.6 Beyond basics

5.6.1 Power tips

coming up:

power everywhere
power with bias
power with the wrong standard errors
power with uncertainty over effect sizes
power and multiple comparisons

5.6.2 Power depends on all parts of MIDA

We often focus on sample sizes

But

Power also depends on

the model – obviously signal to noise
the assignments and specifics of sampling strategies
estimation procedures

5.6.3 Power from a lag?

Say we have access to a “pre” measure of outcome Y_now; call it Y_base. Y_base is informative about potential outcomes. We are considering using Y_now - Y_base as the outcome instead of Y_now.

N <- 100
rho <- .5

design <- 
  declare_model(N,
                 Y_base = rnorm(N),
                 Y_Z_0 = 1 + correlate(rnorm, given = Y_base, rho = rho),
                 Y_Z_1 = correlate(rnorm, given = Y_base, rho = rho),
                 Z = complete_ra(N),
                 Y_now = Z*Y_Z_1 + (1-Z)*Y_Z_0,
                 Y_change = Y_now - Y_base) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y_now ~ Z, label = "level") +
  declare_estimator(Y_change ~ Z, label = "change")+
  declare_estimator(Y_now ~ Z + Y_base, label = "RHS")

5.6.4 Power from a lag?

design |> redesign(N = c(10, 100, 1000, 10000), rho = c(.1, .5, .9)) |>
  diagnose_design()

5.6.5 Power from a lag?

Punchline:

if you difference: the lag has to be sufficiently information to pay its way (the $\rho = .5$ equivalent between level and change follows from Gerber and Green (2012) equation 4.6)
The right hand side is your friend, at least for experiments (Ding and Li (2019))
As $N$ grows the stakes fall

5.6.6 Power when estimates are biased

bad_design <- 
  
  declare_model(N = 100, 
    U = rnorm(N),
    potential_outcomes(Y ~ 0 * X + U, conditions = list(X = 0:1)),
    X = ifelse(U > 0, 1, 0)) + 
  
  declare_measurement(Y = reveal_outcomes(Y ~ X)) + 
  
  declare_inquiry(ate = mean(Y_X_1 - Y_X_0)) + 
  
  declare_estimator(Y ~ X, inquiry = "ate", .method = lm_robust)

5.6.7 Power when estimates are biased

You can see from the null design that power is great but bias is terrible and coverage is way off.

diagnose_design(bad_design)

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
1.59	1.59	0.12	1.60	1.00	0.00
(0.01)	(0.01)	(0.00)	(0.01)	(0.00)	(0.00)

Power without unbiasedness corrupts, absolutely

5.6.8 Power with a more subtly biased experimental design

another_bad_design <- 
  
  declare_model(
    N = 100, 
    female = rep(0:1, N/2),
    U = rnorm(N),
    potential_outcomes(Y ~ female * Z + U)) + 
  
  declare_assignment(
    Z = block_ra(blocks = female, block_prob = c(.1, .5)),
    Y = reveal_outcomes(Y ~ Z)) + 

  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  
  declare_estimator(Y ~ Z + female, inquiry = "ate", 
                    .method = lm_robust)

  diagnose_design(another_bad_design)

5.6.9 Power with a more subtly biased experimental design

You can see from the null design that power is great but bias is terrible and coverage is way off.

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
0.76	0.26	0.24	0.35	0.84	0.85
(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.02)

5.6.10 Power with the wrong standard errors

clustered_design <-
  declare_model(
    cluster = add_level(N = 10, cluster_shock = rnorm(N)),
    individual = add_level(
        N = 100,
        Y_Z_0 = rnorm(N) + cluster_shock,
        Y_Z_1 = rnorm(N) + cluster_shock)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = cluster_ra(clusters = cluster)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
-0.00	-0.00	0.64	0.64	0.79	0.20
(0.01)	(0.01)	(0.01)	(0.01)	(0.01)	(0.01)

What alerts you to a problem?

5.6.11 Let’s fix that one

clustered_design_2  <-
  clustered_design |> replace_step(5, 
  declare_estimator(Y ~ Z, clusters = cluster))

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
0.00	-0.00	0.66	0.65	0.06	0.94
(0.02)	(0.02)	(0.01)	(0.01)	(0.01)	(0.01)

5.6.12 Power when you are not sure about effect sizes (always!)

you can do power analysis for multiple stipulations
or you can design with a distribution of effect sizes

design_uncertain <-
  declare_model(N = 1000, b = 1+rnorm(1), Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
  declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(b)) +
  declare_estimator(Y ~ factor(Z), term = TRUE)

draw_estimands(design_uncertain)

  inquiry   estimand
1     ate -0.3967765

draw_estimands(design_uncertain)

  inquiry  estimand
1     ate 0.7887188

5.6.13 Multiple comparisons correction (complex code)

Say I run two tests and want to correct for multiple comparisons.

Two approaches. First, by hand:

b = .2

design_mc <-
  declare_model(N = 1000, Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
  declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = b) +
  declare_estimator(Y ~ factor(Z), term = TRUE)

5.6.14 Multiple comparisons correction (complex code)

design_mc |>
  simulate_designs(sims = 1000) |>
  filter(term != "(Intercept)") |>
  group_by(sim_ID) |>
  mutate(p_bonferroni = p.adjust(p = p.value, method = "bonferroni"),
         p_holm = p.adjust(p = p.value, method = "holm"),
         p_fdr = p.adjust(p = p.value, method = "fdr")) |>
  ungroup() |>
  summarize(
    "Power using naive p-values" = mean(p.value <= 0.05),
    "Power using Bonferroni correction" = mean(p_bonferroni <= 0.05),
    "Power using Holm correction" = mean(p_holm <= 0.05),
    "Power using FDR correction" = mean(p_fdr <= 0.05)
    )

Power using naive p-values	Power using Bonferroni correction	Power using Holm correction	Power using FDR correction
0.7374	0.6318	0.6886	0.7032

5.6.15 Multiple comparisons correction (approach 2)

The alternative approach (generally better!) is to design with a custom estimator that includes your corrections.

my_estimator <- function(data) 
  lm_robust(Y ~ factor(Z), data = data) |> 
  tidy() |>
  filter(term != "(Intercept)") |>
  mutate(p.naive = p.value,
         p.value = p.adjust(p = p.naive, method = "bonferroni"))
  

design_mc_2 <- design_mc |>
  replace_step(5, declare_estimator(handler = label_estimator(my_estimator))) 

run_design(design_mc_2) |> 
  select(term, estimate, p.value, p.naive) |> kable()

term	estimate	p.value	p.naive
factor(Z)2	0.1212598	0.2395434	0.1197717
factor(Z)3	0.2722942	0.0008214	0.0004107

5.6.16 Multiple comparisons correction (Null model case)

Lets try same thing for a null model (using redesign(design_mc_2, b = 0))

design_mc_3 <- 
  design_mc_2 |> 
  redesign(b = 0) 

run_design(design_mc_3) |> select(estimate, p.value, p.naive) |> kable(digits = 3)

estimate	p.value	p.naive
0.159	0.072	0.036
0.110	0.282	0.141

5.6.17 Multiple comparisons correction (Null model case)

…and power:

Mean Estimate	Bias	SD Estimate	RMSE	Power	Coverage
0.00	0.00	0.08	0.08	0.02	0.95
(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)
-0.00	-0.00	0.08	0.08	0.02	0.96
(0.00)	(0.00)	(0.00)	(0.00)	(0.00)	(0.01)

bothered?

5.6.18 Big takeaways

Power is affected not just by sample size, variability and effect size but also by you data and analysis strategies.
Try to estimate power under multiple scenarios
Try to use the same code for calculating power as you will use in your ultimate analysis
Basically the same procedure can be used for any design. If you can declare a design and have a test, you can calculate power
Your power might be right but misleading. For confidence:
- Don’t just check power, check bias and coverage also
- Check power especially under the null
Don’t let a focus on power distract you from more substantive diagnosands

6 Observational designs

Introduction to observational strategies using DeclareDesign

6.1 Outline

LATE
Diff in Diff
RDD

6.2 Noncompliance and the LATE estimand

6.2.1 Local Average Treatment Effects

Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

	X=0	X=1
Z=0	$\overline{y}_{00}$ ($n_{00}$)	$\overline{y}_{01}$ ($n_{01}$)
Z=1	$\overline{y}_{10}$ ($n_{10}$)	$\overline{y}_{11}$ ($n_{11}$)

Say that people are one of 3 types:

$n_a$ “always takers” have $X=1$ no matter what and have average outcome $\overline{y}_a$
$n_n$ “never takers” have $X=0$ no matter what with outcome $\overline{y}_n$
$n_c$ “compliers have” $X=Z$ and average outcomes $\overline{y}^1_c$ if treated and $\overline{y}^0_c$ if not.

6.2.2 Local Average Treatment Effects

Sometimes you give a medicine but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?

	X=0	X=1
Z=0	$\overline{y}_{00}$ ($n_{00}$)	$\overline{y}_{01}$ ($n_{01}$)
Z=1	$\overline{y}_{10}$ ($n_{10}$)	$\overline{y}_{11}$ ($n_{11}$)

We can figure something about types:

	$X=0$	$X=1$
$Z=0$	$\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}$	$\overline{y}_{a}$
$Z=1$	$\overline{y}_{n}$	$\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}$

6.2.3 Local Average Treatment Effects

You give a medicine to 50% but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?

	$X=0$	$X=1$
$Z=0$	$\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n$	$\overline{y}_{a}$
(n)	($\frac{1}{2}(n_c + n_n)$)	($\frac{1}{2}n_a$)
$Z=1$	$\overline{y}_{n}$	$\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}$
(n)	($\frac{1}{2}n_n$)	($\frac{1}{2}(n_a+n_c)$)

Key insight: the contributions of the $a$s and $n$s are the same in the $Z=0$ and $Z=1$ groups so if you difference you are left with the changes in the contributions of the $c$s.

6.2.4 Local Average Treatment Effects

Average in $Z=0$ group: $\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}$

Average in $Z=1$ group: $\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}$

So, the difference is the ITT: $({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}$

Last step:

\[ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\]

\[\leftrightarrow\]

\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]

6.2.5 The good and the bad of LATE

(with infinite data) You get a good estimate even when there is non-random take-up
May sometimes be used to assess mediation or knock-on effects
But:
- You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
- Your estimate is only for a subpopulation
- The subpopulation is not chosen by you and is unknown
- Different encouragements may yield different estimates since they may encourage different subgroups

6.2.6 Declaration

declaration_iv <-
  declare_model(
    N = 100, 
    U = rnorm(N),
    potential_outcomes(D ~ if_else(Z + U > 0, 1, 0), 
                       conditions = list(Z = c(0, 1))), 
    potential_outcomes(Y ~ 0.1 * D + 0.25 + U, 
                       conditions = list(D = c(0, 1))),
    complier = D_Z_1 == 1 & D_Z_0 == 0
  ) + 
  declare_inquiry(ATE = mean(Y_D_1 - Y_D_0), 
                  LATE = mean(Y_D_1[complier] - Y_D_0[complier])) + 
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(D = reveal_outcomes(D ~ Z),
                      Y = reveal_outcomes(Y ~ D)) + 
  declare_estimator(Y ~ D, inquiry = "ATE", label = "OLS")  +
  declare_estimator(Y ~ D | Z, .method = iv_robust, inquiry = "LATE",
                    label = "IV")

6.2.7 Diagnosis

Inquiry	Estimator	Mean Estimand	Mean Estimate	Bias	RMSE
ATE	OLS	0.10	1.55	1.45	1.46
		(0.00)	(0.00)	(0.00)	(0.00)
LATE	IV	0.10	-0.05	-0.15	0.80
		(0.00)	(0.01)	(0.01)	(0.03)

Note:

The estimands might be the same
The estimators might both be biased
And in opposite directions

6.3 Diff in diff

Key idea: the evolution of units in the control group allow you to impute what the evolution of units in the treatment group would have been had they not been treated

6.3.1 Logic

We have group $A$ that enters treatment at some point and group $B$ that never does

The estimate:

\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in $A$ compared to the change in $B$?)

can be written using potential outcomes as:

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

6.3.2 Logic

With some manipulation and cleaning up:

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

$$ \[\begin{aligned} \hat\tau = (\mathbb{E}[Y_1^A | post] - \color{red}{\mathbb{E}[Y_0^A | post]}) + ((\color{red}{\mathbb{E}[Y_0^A | post]} - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])) \end{aligned}\]

\[\hat\tau_{ATT} = \tau_{ATT} + \text{Difference in trends}\]

6.3.3 Simplest DiD: Design

n_units <- 2
design <- 
  declare_model(
    unit = add_level(N = n_units, I = 1:N),
    time = add_level(N = 6, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(potential_outcomes(Y ~ I + T^.5 + Z*T)) +
  declare_assignment(Z = 1*(I>(n_units/2))*(T>3)) +
  declare_measurement(Y = reveal_outcomes(Y~Z)) + 
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0),
                  ATT = mean(Y_Z_1[Z==1] - Y_Z_0[Z==1])) +
  declare_estimator(Y ~ Z, label = "naive") + 
  declare_estimator(Y ~ Z + I, label = "FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")

6.3.4 Simplest DiD: Data

draw_data(design) |> 
  head() |> kable()

unit	I	time	T	obs	Y_Z_0	Y_Z_1	Y
1	1	1	1	01	2.000000	3.000000	2.000000
2	2	1	1	02	3.000000	4.000000	3.000000
1	1	2	2	03	2.414214	4.414214	2.414214
2	2	2	2	04	3.414214	5.414214	3.414214
1	1	3	3	05	2.732051	5.732051	2.732051
2	2	3	3	06	3.732051	6.732051	3.732051

6.3.5 Simplest DiD: Diagnosis

Here only the two way fixed effects is unbiased and only for the ATT.

The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!

design |> diagnose_design()

Inquiry	Estimator	Bias
ATE	FE1	2.25
ATE	FE2	6.50
ATE	FE3	1.50
ATE	naive	5.40
ATT	FE1	0.75
ATT	FE2	5.00
ATT	FE3	0.00
ATT	naive	3.90

6.3.6 The classic graph

design |> 
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

6.3.7 Extends to multiple units easily

design |> redesign(n_units = 10) |> diagnose_design()

Inquiry	Estimator	Bias
ATE	FE1	2.25
ATE	FE2	6.50
ATE	FE3	1.50
ATE	naive	5.40
ATT	FE1	0.75
ATT	FE2	5.00
ATT	FE3	0.00
ATT	naive	3.90

6.3.8 Extends to multipe units easily

design |> 
  redesign(n_units = 10) |>  
  draw_data() |> 
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

6.3.9 In practice

Need to defend parallel trends
Most typically using an event study
Sometimes: report balance between treatment and control groups in covariates
Placebo leads and lags

6.3.10 Heterogeneity

Things get much more complicated when there is (a) heterogeneous timing in treatment take up and (b) heterogeneous effects
It’s only recently been appreciated how tricky things can get
But we already have an intuition from our analysis of trials with heterogeneous assignment and heterogeneous effects:
in such cases fixed effects analysis weights stratum level treatment effects by the variance in assignment to treatment
something similar here

6.3.11 Staggared assignments

Just two units assigned at different times:

trend = 0

design <- 
  declare_model(
    unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
    time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(
    potential_outcomes(Y ~ trend*T + (1+Z)*(I == 2))) +
  declare_assignment(Z = 1*((I == 1) * (T>3) + (I == 2) * (T>5))) +
  declare_measurement(Y = reveal_outcomes(Y~Z), 
                      I_c = I - mean(I)) +
  declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "1. naive") + 
  declare_estimator(Y ~ Z + I, label = "2. FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "4. FE3") + 
  declare_estimator(Y ~ Z*I_c + as.factor(T), label = "5. Sat")

6.3.12 Staggared assignments diagnosis

Estimator	Mean Estimand	Mean Estimate
1. naive	0.50	-0.12
	(0.00)	(0.00)
2. FE1	0.50	0.36
	(0.00)	(0.00)
3. FE2	0.50	-1.00
	(0.00)	(0.00)
4. FE3	0.50	0.25
	(0.00)	(0.00)
5. Sat	0.50	0.50
	(0.00)	(0.00)

See causal inference slides for intuitions on what is happening here as well as declaration using approach of Chaisemartin and d’Haultfoeuille (2020).

6.4 Regression discontintuity

Errors and diagnostics

6.4.1 Intuition

The core idea in an RDD design is that if a decision rule assigns units that are almost identical to each other to treatment and control conditions then we can infer effects for those cases¹ by looking at those cases.

See excellent introduction: Lee and Lemieux (2010)

6.4.2 Intuition

Kids born on 31 August start school a year younger than kids born on 1 September: does starting younger help or hurt?
Kids born on 12 September 1983 are more likely to register Republican than kids born on 10 September 1983: can this identify the effects of registration on long term voting?
A district in which Republicans got 50.1% of the vote get a Republican representative while districts in which Republicans got 49.9% of the vote do not: does having a Republican representative make a difference for these districts?

6.4.3 Argument for identification

Setting:

Typically the decision is based on a value on a “running variable”, $X$. e.g. Treatment if $X > 0$
The estimand is $\mathbb{E}[Y(1) - Y(0)|X=0]$

Two arguments:

Continuity: $\mathbb{E}[Y(1)|X=x]$ and $\mathbb{E}[Y(0)|X=x]$ are continuous (at $x=0$) in $x$: so $\lim_{\hat x \rightarrow 0}\mathbb{E}[Y(0)|X=\hat x] = \mathbb{E}[Y(0)|X=\hat 0]$
Local randomization: tiny things that determine exact values of $x$ are as if random and so we can think of a local experiment around $X=0$.

6.4.4 Argument for identification

Note:

continuity argument requires continuous $x$: granularity
also builds off a conditional expectation function defined at $X=0$

Exclusion restriction is implicit in continuity: If something else happens at the threshold then the conditional expectation functions jump at the thresholds

Implicit: $X$ is exogenous in the sense that units cannot adjust $X$ in order to be on one or the other side of the threshold

6.4.5 Evidence

Typically researchers show:

“First stage” results: assignment to treatment does indeed jump at the threshold
“ITT”: outcomes jump at the threshold
LATE (if fuzzy / imperfect compliance) using IV

6.4.6 Evidence

Typically researchers show:

In addition:

Arguments for no other treatments at the threshold
Arguments for no “sorting” at the threshold
Evidence for no “heaping” at the threshold (McCrary density test)

Sometimes:

argue for why estimates extend beyond the threshold
exclude points at the threshold (!)

6.4.7 Design

library(rdss) # for helper functions

Error in library(rdss): there is no package called 'rdss'

library(rdrobust)

Error in library(rdrobust): there is no package called 'rdrobust'

cutoff <- 0.5
bandwidth <- 0.5

control <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))}
treatment <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25}

rdd_design <-
  declare_model(
    N = 1000,
    U = rnorm(N, 0, 0.1),
    X = runif(N, 0, 1) + U - cutoff,
    D = 1 * (X > 0),
    Y_D_0 = control(X) + U,
    Y_D_1 = treatment(X) + U
  ) +
  declare_inquiry(LATE = treatment(0) - control(0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ D)) + 
  declare_sampling(S = X > -bandwidth & X < bandwidth) +
  declare_estimator(Y ~ D*X, term = "D", label = "lm") + 
  declare_estimator(
    Y, X, 
    term = "Bias-Corrected",
    .method = rdrobust_helper,
    label = "optimal"
  )

Error: object 'rdrobust_helper' not found

6.4.8 RDD Data plotted

Note rdrobust implements:

local polynomial Regression Discontinuity (RD) point estimators
robust bias-corrected confidence intervals

See Calonico, Cattaneo, and Titiunik (2014) and related papers ? rdrobust::rdrobust

6.4.9 RDD Data plotted

rdd_design  |> draw_data() |> 
  ggplot(aes(X, Y, color = factor(D))) + 
  geom_point(alpha = .3) + theme_bw()+ theme(legend.position = "none") + 
  geom_smooth(aes(X, Y_D_0)) + geom_smooth(aes(X, Y_D_1))

Error: object 'rdd_design' not found

6.4.10 RDD diagnosis

rdd_design |> diagnose_design()

Estimator	Mean Estimate	Bias	SD Estimate	Coverage
lm	0.23	-0.02	0.01	0.64
	(0.00)	(0.00)	(0.00)	(0.02)
optimal	0.25	0.00	0.03	0.89
	(0.00)	(0.00)	(0.00)	(0.01)

6.4.11 Bandwidth tradeoff

rdd_design |> 
  redesign(bandwidth = seq(from = 0.05, to = 0.5, by = 0.05)) |> 
  diagnose_designs()

As we increase the bandwidth, the lm bias gets worse, but slowly, while the error falls.
The best bandwidth is relatively wide.
This is more true for the optimal estimator.

7 References

Calonico, Sebastian, Matias D Cattaneo, and Rocio Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326.

Ding, Peng, and Fan Li. 2019. “A Bracketing Relationship Between Difference-in-Differences and Lagged-Dependent-Variable Adjustment.” Political Analysis 27 (4): 605–15.

Gerber, Alan S, and Donald P Green. 2012. Field Experiments: Design, Analysis, and Interpretation. Norton.

Lee, David S, and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48 (2): 281–355.

	\(T2=0\)	\(T2=1\)
T1 = 0	\(50\%\)	\(0\%\)
T1 = 1	\(50\%\)	\(0\%\)

	\(T2=0\)	\(T2=1\)
T1 = 0	\(25\%\)	\(25\%\)
T1 = 1	\(25\%\)	\(25\%\)

	\(T2=0\)	\(T2=1\)
T1 = 0	\(33.3\%\)	\(33.3\%\)
T1 = 1	\(33.3\%\)	\(0\%\)

	\(T2=0\)	\(T2=1\)
T1 = 0	\(40\%\)	\(20\%\)
T1 = 1	\(20\%\)	\(20\%\)

	X=0	X=1
Z=0	\(\overline{y}_{00}\) (\(n_{00}\))	\(\overline{y}_{01}\) (\(n_{01}\))
Z=1	\(\overline{y}_{10}\) (\(n_{10}\))	\(\overline{y}_{11}\) (\(n_{11}\))

	\(X=0\)	\(X=1\)
\(Z=0\)	\(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}\)	\(\overline{y}_{a}\)
\(Z=1\)	\(\overline{y}_{n}\)	\(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}\)

	\(X=0\)	\(X=1\)
\(Z=0\)	\(\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n\)	\(\overline{y}_{a}\)
(n)	(\(\frac{1}{2}(n_c + n_n)\))	(\(\frac{1}{2}n_a\))
\(Z=1\)	\(\overline{y}_{n}\)	\(\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}\)
(n)	(\(\frac{1}{2}n_n\))	(\(\frac{1}{2}(n_a+n_c)\))

Introduction to design declaration with DeclareDesign

1 Roadmap

2 DeclareDesign Basics

2.1 Roadmap

2.2 The MIDA Framework

2.2.1 Four elements of any research design

2.2.2 Four elements of any research design

2.2.3 Declaration

2.2.4 Diagnosis

2.2.5 Redesign

2.2.6 Very often you have to simulate

2.3 Key functions and resources

2.3.1 Key commands for design declaration

2.3.2 Key commands for using a design

2.3.3 Pipeable commands

2.3.4 Cheat sheet

2.3.5 Other resources

2.4 Design declaration-diagnosis-redesign workflow: Design

2.4.1 The simplest possible (diagnosable) design?

2.4.2 The simplest possible design?

2.4.3 The design is a pipe

2.4.4 The design is a pipe

2.4.5 The design is a pipe

2.4.6 Run it once

2.4.7 Run it again

2.4.8 Simulation

2.4.9 Diagnosis

2.5 Design declaration-diagnosis-redesign workflow: Diagnosis

2.5.1 Diagnosis by hand

2.5.2 diagnose_design()

2.5.3 What is the diagnosis object?

2.5.4 What is the diagnosis object?

2.5.5 What is the diagnosis object?

2.5.6 What is the diagnosis object?

2.5.7 Diagnosis: Bootstraps

2.5.8 After Diagnosis

2.5.9 After Diagnosis

2.5.10 After Diagnosis: Tables

2.5.11 Spotting design problems with diagnosis

2.5.12 Spotting design problems with diagnosis

2.5.13 It depends on the inquiry

2.5.14 Diagnosing multiple designs

2.6 Design declaration-diagnosis-redesign workflow: Redesign

2.6.1 What is redesign?

2.6.2 Redesign parameters

2.6.3 Redesign parameter definition

2.6.4 Redesign illustration

2.6.5 Redesigning to a list

2.6.6 Redesigning to a list

2.6.7 Redesigning to a list

2.6.8 Graphing after redesigning to a list

2.7 Using a design

2.7.1 Using a design

2.7.2 Make data from the design

2.7.3 Make data from the design

2.7.4 Draw estimands

2.7.5 Draw estimates

2.7.6 Get estimates

2.7.7 Simulate design

2.7.8 Diagnose design

2.7.9 Redesign

2.7.10 Compare designs

3 DeclareDesign: A deeper dive

3.1 Steps in an experimental design

3.1.1 A simple experimental design

3.1.2 Steps: Order matters

3.1.3 Steps: Order matters

3.2 M: Key extensions to model declaration

3.2.1 Hierarchical data

3.2.2 Hierarchical data

3.2.3 Panel data

3.2.4 Panel data

3.2.5 Preexisting data

3.2.6 Steps

3.2.7 You can generate multiple columns together

3.2.8 You can generate multiple columns together

3.2.9 Cluster structures with cluster correlations

3.2.10 Cluster structures with cluster correlations

3.3 I: Inquiries

3.3.1 Inquiries using predefined potential outcomes

Introduction to design declaration with `DeclareDesign`

2 `DeclareDesign` Basics

2.5.2 `diagnose_design()`

3 `DeclareDesign`: A deeper dive

4 Assignments with `DeclareDesign`