Puzzles for causal inference and experimental design

Author

Macartan Humphreys

Published

January 4, 2025

For each puzzle: explore the issues raised by the puzzle and generate a self-contained presentation in .qmd (or .Rmd) that reports on your investigations. Present in the next class session.

1 Familiarity with `DeclareDesign`

Q 1.1

Basic manipulations

Consider the following design:

design <- 
  declare_model(
     N = 1000, 
     U = rnorm(N),
     potential_outcomes(Y ~ (1-Z)*rnorm(N) + Z*rnorm(N, 1))) +
  declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0))

What does each of these two steps do?
Draw data from this design. What are the dimensions of the data set? Is this what you would expect? Can you describe what each of the columns in this dataset is?
Calculate the population average treatment effect for this dataset using Y_Z_1 and Y_Z_0 columns. Is this what you would expect from the potential outcomes function?
Diagnose this design. Why so many NAs?

Now extend the design as follows:

design <- 
  design + 
  declare_sampling(S = complete_rs(N = N, n = 20)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(SATE = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, .method = lm)

What does each of the new steps do? What is the difference between the two inquiries? Explain how they are different.
Draw data from this design. What are the dimensions of the data set? Is this what you would expect?
Diagnose the design using diagnose_design. Use a large number of simulations, e.g. setting sims = 2000. Why is there a difference in the RMSE for the two estimands? Why is the power the same? Why is there a difference in coverage?

Q 1.2

[From DeclareDesign instructor material] In this exercise, you will conduct a power analysis for a two-arm trial with 100 units using DeclareDesign. Start with the following model, which defines 100 units to be studied, a variable U representing (unknown) unit characteristics, and two potential outcomes (for treatment and for control) that depend on the expected effect size, the treatment variable Z, and the variable U.

You can find help with the conceptual material in this exercise in Chapters 10 and 11 and help with coding in Chapter 13.

expected_effect_size <- 0.1

two_arm_trial_model <- 
  declare_model(
    N = 100,
    U = rnorm(N), # unknown heterogeneity
    potential_outcomes(Y ~ expected_effect_size * Z + U)
  )

Add an inquiry declaration, which should be the average treatment effect, or average difference between the treated and control potential outcomes.
Add a data strategy declaration, including an assignment step that creates a variable Z using simple random assignment with probability 0.5, and a measurement step in which you reveal the observed outcomes Y using the reveal_outcomes function.
Add an answer strategy declaration, which uses the difference_in_means function (pass it to the .method argument of declare_estimator) and calculates the difference in mean observed outcomes between the treatment and control groups. Link the estimator to the inquiry.
Draw data from the design, and confirm that the variables you expect are present, including unit characteristics, two potential outcomes, the treatment variable, and the observed outcome.
Run the design once using run_design to confirm that the estimand and estimate columns are present. What do these columns refer to?
Diagnose the design with 1000 simulations and calculate its statistical power. What is the power?
Calculate the minimum detectable effect size of the design, which means the smallest expected effect size for which the statistical power of the design reaches the common threshold of 80%. Use the redesign function to redesign over values of expected_effect_size between 0.1 and 1 in steps of 0.1 and then diagnose the resulting list of designs. Find the smallest expected effect size where the diagnosis reports the power is at least 0.8.
Bonus: What is the smallest sample size you would need to obtain 80% power for an expected effect size of 0.3? You can estimate power in sample size steps of 100 and report the smallest size that exceeds 80% power.

Q 1.3

False positives and \(N\)s

Sometimes people worry that with larger samples you are more likely to get a false positive. Is that true?
Assess by generating a simple experimental design from scratch in which we can vary the N and in which there is no true effect of some treatment.

Then:

Plot the distribution of \(p\) values from the simulations_df. What shape is it and why?
Plot the power as \(N\) increases, using the diagnosands_df
Plot the estimates against \(p\) values for different values of \(N\); what do you see?
Discuss

Hint: the slides contain code for a simple experimental design
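A base-R sketch of the kind of simulation this puzzle asks for, written without DeclareDesign so that it is self-contained. The helper `one_p`, the sample sizes, and the number of simulations are illustrative assumptions and not part of the original exercise; in your own answer the same quantities would come from simulations_df and diagnosands_df.

```r
set.seed(10)

# p-value from one simulated experiment with n units and no true effect
one_p <- function(n) {
  Z <- sample(rep(0:1, length.out = n))   # complete random assignment
  Y <- rnorm(n)                           # outcome unrelated to Z
  t.test(Y[Z == 1], Y[Z == 0])$p.value
}

Ns <- c(20, 200, 2000)
p_values <- sapply(Ns, function(n) replicate(2000, one_p(n)))
colnames(p_values) <- paste0("N_", Ns)

# Share of simulations with p < 0.05 at each N
false_positive_rate <- colMeans(p_values < 0.05)
false_positive_rate
```

Calling `hist()` on any column of `p_values` is one way to inspect the shape of the distribution at a given N.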

Q 1.4

Clustering

Say that you have a set of 20 schools randomly sampled from a superpopulation of schools. There are 5 classrooms in each school and 5 students in each classroom.
Say you assign a treatment at the classroom level. Should you cluster your standard errors at the level of the school or at the level of the classroom?

Now:

Declare a design with this hierarchical data structure. Allow for the possibility that treatment effects vary at the school level. Assess the performance of the standard errors when you cluster at each of these levels (and when you do not cluster at all).
Examine whether the performance depends on whether you are interested in the population average effects or the sample average effects.

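Hint: For generating hierarchical models use add_level. Also, be sure to include a reasonably large top-level shock in order to see differences arising from clustering at the school level. You could also try heterogeneous effects by school.

library(DeclareDesign) # also attaches fabricatr, which provides add_level
library(dplyr)         # slice
library(knitr)         # kable

g <- 
  declare_model(
    L1 = add_level(N = 10, u = rnorm(N)),
    L2 = add_level(N = 12, v = rnorm(N)))

# show the first three units in each of the first two L1 groups
g() |> slice(1:3, 13:15) |> kable()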

L1	u	L2	v
01	-1.120762	001	-0.3777158
01	-1.120762	002	-0.6034576
01	-1.120762	003	-1.7960492
02	-0.686423	013	0.7824514
02	-0.686423	014	-1.4078945
02	-0.686423	015	-0.0141374

Q 1.5

Learning about standard errors

The standard error is the standard deviation of the sampling distribution of an estimate.
That sounds complicated, but actually the sampling distribution of an estimate lives in the simulations data frame so you can look at its standard deviation and assess whether standard errors estimate it well.
Challenge: Generate a simple experimental design in which there is a correlation (rho) between the two potential outcomes (Y_Z_0 and Y_Z_1).
Show the distribution of the estimates over different values of rho
Assess the performance of the estimates of the standard errors and the coverage as rho goes from -1 to 0 to 1. Describe how coverage changes. (Be sure to be clear on what coverage is!)

Y_Z_0 <- rnorm(1000)
Y_Z_1 <- correlate(rnorm, given = Y_Z_0, rho = .5)

cor(Y_Z_0, Y_Z_1)

[1] 0.5038955
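The idea that the sampling distribution "lives in the simulations" can be sketched in base R. This is a minimal illustration under stated assumptions: correlated potential outcomes are built by hand rather than with `correlate`, the helper `sim_rho` and the values N = 100 and 500 simulations are hypothetical, and the standard error is the usual unequal-variance (Neyman) estimate for a difference in means.

```r
set.seed(1)
N <- 100

# Simulate the sampling distribution for a given correlation rho
# between fixed potential outcomes (hypothetical helper).
sim_rho <- function(rho, sims = 500) {
  Y0 <- rnorm(N)
  Y1 <- rho * Y0 + sqrt(1 - rho^2) * rnorm(N)
  out <- replicate(sims, {
    Z <- sample(rep(0:1, each = N / 2))        # complete random assignment
    Y <- ifelse(Z == 1, Y1, Y0)                # reveal outcomes
    est <- mean(Y[Z == 1]) - mean(Y[Z == 0])
    se  <- sqrt(var(Y[Z == 1]) / (N / 2) + var(Y[Z == 0]) / (N / 2))
    c(estimate = est, std.error = se)
  })
  c(rho = rho,
    true_se = sd(out["estimate", ]),           # sd of the sampling distribution
    mean_est_se = mean(out["std.error", ]))    # average estimated standard error
}

results <- t(sapply(c(-1, 0, 1), sim_rho))
results
```

Comparing the `true_se` and `mean_est_se` columns row by row is the base-R analogue of comparing sd(estimate) with mean(std.error) in a diagnosis.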

Q 1.6

Confounded.

Declare a design in which:

The assignment of a treatment \(X\) depends in part upon some other, binary, variable \(W\): in particular \(\Pr(X=1|W=0) = .2\) and \(\Pr(X=1|W=1) = .5\)
The outcome \(Y\) depends on both \(X\) and \(W\): in particular \(Y = X*W + u\) where \(u\) is a random shock.
Diagnose a design with three approaches to estimating the effect of \(X\) on \(Y\): (a) ignoring \(W\) (b) adding \(W\) as a linear control (c) including both \(W\) and an interaction between \(W\) and \(X\).

Discuss results. Do any of these return the right answer?

Hint: You can add three separate declare_estimator steps. They should have distinct labels. The trickiest part is to figure out how to extract the estimate in (c) because you will have both a main term and an interaction term for \(X\).
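One possible way to combine the main term and the interaction term, sketched in base R outside of DeclareDesign. The data generating process follows the description above, but \(\Pr(W=1) = 0.5\), the sample size, and the name `est_c` are assumptions made for illustration only.

```r
set.seed(6)
n <- 5000
W <- rbinom(n, 1, 0.5)                       # assumed distribution of W
X <- rbinom(n, 1, ifelse(W == 1, 0.5, 0.2))  # assignment depends on W
Y <- X * W + rnorm(n)

fit <- lm(Y ~ X * W)
b <- coef(fit)

# The fitted effect of X at W = w is b["X"] + b["X:W"] * w;
# averaging over the sample distribution of W gives one summary estimate.
est_c <- unname(b["X"] + b["X:W"] * mean(W))
est_c
```

Inside a declaration the same arithmetic could sit in a custom handler; an alternative is to center W before interacting it so that the coefficient on X can be read off directly.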

2 Causality

2.1 Potential outcomes

These do not require coding

Q 2.1

See the graph from slide:

https://macartan.github.io/ci/ci_2025.html#/back-to-this-example-1

Imagine all variables are binary.

Write down a set of potential outcomes representing a process where

\(X\) does not affect \(Y\), regardless of \(U1\).
\(X\) and \(Y\) are uncorrelated if you do not condition on \(Z\); but correlated if you do.

Potential outcome	Values
X(0)
X(1)
Z(U1 = 0,U2 = 0)
Z(U1 = 0,U2 = 1)
Z(U1 = 1,U2 = 0)
Z(U1 = 1,U2 = 1)
Y(U1=0,X=0)
Y(U1=0,X=1)
Y(U1=1,X=0)
Y(U1=1,X=1)

Demonstrate by completing the following realized dataset:

U1	U2	X	Z	Y
0	0
0	1
1	0
1	1

and describe the relation between X and Y when you condition on Z or do not.

Hint: a lot of the action comes from how you define \(Z(U1, U2)\): consider values in which U1 and U2 interact to produce \(Z\) such that still \(Z=1\) half the time and \(Z=0\) half the time.

Q 2.2

Consider an outcome Y that can be affected by two variables X1 and X2 but say that X2 can itself be affected by X1. Write down possible potential outcomes for Y and X2 when:

X1 causes X2 and Y, but X1 does not cause Y through X2
X1 causes Y via X2 only (with no direct effect when X2 is fixed at 0)
X1 causes Y via X2 and directly
X1 causes X2, X2 causes Y, but (overall), X1 does not cause Y

Potential outcome	1.	2.	3.	4.
X2(0) =
X2(1) =
Y(0,0) =
Y(1,0) =
Y(0,1) =
Y(1,1) =

Q 2.3

Consider a process with: Y(0,0) =0, Y(1,0) = 1, Y(0,1) =1, Y(1,1) =1.

Say X1=1, X2=1. Then Y = 1. What caused Y = 1?
Say X1=0, X2=0. Then Y = 0. What caused Y = 0?
Say X1 = 1 with 10% probability, otherwise 0 and, independently, X2 = 1 with 50% probability, otherwise 0.

Then what is the average effect of X1 on Y? What is the average effect of X2 on Y? Which cause has the biggest effect?

Q 2.4

Potential outcomes

Consider an outcome \(Y\) that can be affected by two variables \(X_1\) and \(X_2\). All variables are binary. Can you fill in the potential outcomes (rows) for each of the column types?

	X1 is a necessary and sufficient condition for Y	X1 is necessary but not sufficient	X1 is sufficient but not necessary	X1 sometimes causes Y but is neither necessary nor sufficient
Y(0,0) =
Y(1,0) =
Y(0,1) =
Y(1,1) =

Q 2.5

A set of units have outcome \(Y^1_i\) at baseline.
At endline they have potential outcomes \(Y^2_i(0)\) and \(Y^2_i(1)\)

Write down the estimand for the average effect of treatment on endline outcomes
Write down the estimand for the average effect of treatment on the change from baseline to endline for all units

Compare these and discuss.

2.2 Inquiries and identification

These next questions use some concepts we have not introduced yet. Don’t worry if your answers are incomplete but do share your thought processes around these.

Q 2.6

Collider bias

Declare a simple design in which (i) \(X\) and \(Y\) both have a positive effect on (binary) \(K\) but \(X\) does not cause \(Y\) (ii) a researcher conditions on \(K==1\) when estimating the effect of \(X\) on \(Y\)
Show that this can generate biased results. Can you find situations where the bias can be either positive or negative?

Hint: The direction of collider bias is related to the ways that \(X\) and \(Y\) interact to produce \(K\). Also: by “conditioning on \(K=1\)” we mean using only cases for which \(K=1\).

Q 2.7

Draw a DAG with 5 nodes representing a situation in which \(X\) causes \(Y\) through \(M\), \(C\) affects both \(X\) and \(M\) and \(D\) affects both \(M\) and \(Y\).
Think through which set of nodes, when controlled for, would allow for the identification of the effect of \(X\) on \(Y\).
Represent it in dagitty and check your answer
Bonus: Declare the design and compare the behavior of designs that do and do not control for these nodes.

Q 2.8

Make a DAG that is consistent with this distribution

A	B	C	p
0	0	0	0.32
0	0	1	0.08
0	1	0	0.08
0	1	1	0.02
1	0	0	0.08
1	0	1	0.12
1	1	0	0.12
1	1	1	0.18

Set up a model in DeclareDesign that has this distribution. Draw a large dataset from it and check whether the relations of conditional independence implied by your DAG hold in the data.

Hint: This is relatively tricky. From the slides you will see that a DAG is a directed acyclic graph. A DAG should represent relations of conditional independence in the sense that any node \(A\) that is separated from another node \(B\) given nodes \(W\) should be conditionally independent of \(B\) given \(W\). You should be able to read from this table which nodes are conditionally independent of each other given other nodes. You should be able to check consistency between the probability distribution and an underlying model by calculating quantities such as \(\Pr(x, m,y) = \Pr(x)\times\Pr(m|x)\times\Pr(y|m,x)\).
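Reading conditional independence off the table can be mechanized. The sketch below enters the distribution above and defines a hypothetical helper, `cond_indep`, that checks whether two binary variables are independent within every stratum of a third. It is shown for one pair only; the remaining pairs follow the same pattern.

```r
joint <- data.frame(
  A = c(0, 0, 0, 0, 1, 1, 1, 1),
  B = c(0, 0, 1, 1, 0, 0, 1, 1),
  C = c(0, 1, 0, 1, 0, 1, 0, 1),
  p = c(0.32, 0.08, 0.08, 0.02, 0.08, 0.12, 0.12, 0.18)
)

# Hypothetical helper: TRUE if binary x and y are independent given each value of w.
# For binary variables it is enough to check Pr(x=1, y=1 | w) = Pr(x=1 | w) Pr(y=1 | w).
cond_indep <- function(d, x, y, w, tol = 1e-8) {
  all(sapply(split(d, d[[w]]), function(s) {
    pw  <- sum(s$p)
    pxy <- sum(s$p[s[[x]] == 1 & s[[y]] == 1]) / pw
    px  <- sum(s$p[s[[x]] == 1]) / pw
    py  <- sum(s$p[s[[y]] == 1]) / pw
    abs(pxy - px * py) < tol
  }))
}

# Is A independent of B given C?
cond_indep(joint, "A", "B", "C")
```

With data drawn from a declared model, the analogous check would compare empirical frequencies, so a looser tolerance would be needed.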

Q 2.9

Imagine a model that looks like this:

Say that in truth the ATE of \(X\) on \(M\) is .9 and that the ATE of \(M\) on \(Y\) is .9. Is the implied effect of 0.81 of \(X\) on \(Y\) identified?
Say that in truth the ATE of \(X\) on \(M\) is 1 and that the ATE of \(M\) on \(Y\) is 1. Is the implied effect of 1 of \(X\) on \(Y\) identified?
Discuss

Hint: This question is asking about the front door criterion. Check whether the conditions apply for the front door criterion to hold. Note that an effect is not identified if the data pattern it produces is also consistent with a different effect. Is that the case here? The second part of this is more important than the first part. Note you can generate and update models of this form with CausalQueries.

3 Estimation and Inference

3.1 Frequentist

Q 3.1

Consider the following data. Treatment is assigned in two blocks. One third of block 1 got randomly assigned to treatment; one half of block two got randomly assigned to treatment.

The data are as below:

Block	Z	Y
1	0	0
1	0	0
1	1	1
2	0	0
2	0	0
2	0	1
2	1	0
2	1	1
2	1	1

Can you estimate the ATE? How about the ATT? And the ATC?
How do these compare to a simple difference in means between treatment and control?
Use DeclareDesign to compare the answers you get if you use exactly this data and calculate (1) OLS with a linear control for block, (2) IPW, and (3) the Lin estimator (demean the Block variable and interact it with Z).
Imagine now letting the size of the data increase by a factor \(k\), meaning that if \(k=2\) you would have twice as many units in each block. Show how the precision of your estimates changes under these different strategies as \(k\) increases.

Hint: If you use exactly this data and replicate it there is no stochastic component; you can run with sims = 1; to get the precision of your estimates you can declare the diagnosand mean_se = mean(std.error)
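A minimal base-R scaffold for working with exactly this data: it enters the table, recovers the block-level assignment probabilities, builds inverse probability weights, and stacks \(k\) copies of the data. The names `dat`, `ipw`, and `replicate_data` are hypothetical, and the estimation steps themselves are left to the declaration.

```r
dat <- data.frame(
  Block = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  Z     = c(0, 0, 1, 0, 0, 0, 1, 1, 1),
  Y     = c(0, 0, 1, 0, 0, 1, 0, 1, 1)
)

# Block-level assignment probabilities, recovered as the share treated per block
dat$p <- ave(dat$Z, dat$Block)

# Inverse probability weights: 1/p for treated units, 1/(1 - p) for controls
dat$ipw <- ifelse(dat$Z == 1, 1 / dat$p, 1 / (1 - dat$p))

# Hypothetical helper: stack k copies of the data
replicate_data <- function(k) dat[rep(seq_len(nrow(dat)), times = k), ]

dat
```

A function like `replicate_data` could be wrapped in a model step so that \(k\) becomes a parameter to redesign over.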

Optional: Instead of simulated data, use any dataset you have access to that has a treatment and a covariate and implement the same tests using that data (the outcome does not have to be binary).

Q 3.2

Randomization inference

Imagine a population with 100 northerners (X=0) and 100 southerners (X=1).

Assume 50 in each group are assigned to binary treatment \(Z\), so that each unit is treated with probability 0.5 (this is a blocked assignment).

Assume the following model:

\[Y = X + Z - 2*X*Z\]

Note for simplicity there is no stochastic component.

In this case the average treatment effect for Z is 0. Verify that!

Now use randomization inference to calculate \(p\) values for the hypothesis that Z has no effect for all units. Please:

do this by permuting Z (given X) and using a coefficient on Z from the regression Y ~ Z + X as a test statistic
do this by permuting Z (given X) and using a coefficient on X:Z from the regression Y ~ Z + X + X:Z as a test statistic

Note: the “given X” bit for permutations means that you permute Z within the X=0 group and also within the X=1 group.

Implement two sided tests in each case and discuss differences you get from the two approaches. Which one is right? Or are they both right?
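A base-R sketch of permuting within groups and of a two-sided randomization inference p-value. The helpers `permute_within` and `ri_p`, the seed, and the number of permutations are assumptions for illustration; only the first test statistic is shown, and the second is obtained by swapping in a different `stat` function.

```r
set.seed(2)
X <- rep(0:1, each = 100)

# Permute a vector separately within each level of block
permute_within <- function(z, block) ave(z, block, FUN = sample)

# Blocked assignment: exactly 50 treated in each group
Z <- permute_within(rep(rep(0:1, each = 50), 2), X)
Y <- X + Z - 2 * X * Z

# Two-sided p-value under the sharp null of no effect for any unit:
# Y is held fixed and Z is re-permuted within X.
ri_p <- function(stat, sims = 1000) {
  observed <- stat(Z)
  null <- replicate(sims, stat(permute_within(Z, X)))
  # small tolerance guards against floating-point ties
  mean(abs(null) >= abs(observed) - 1e-12)
}

p_main <- ri_p(function(z) coef(lm(Y ~ z + X))["z"])
p_main
```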

Bonus: How would things change if the true data generating process were \(Y = X + Z\)? Implement both tests again with data from this model.

Optional: Instead of simulated data, use any dataset you have access to that has a treatment and a covariate and implement the same tests using that data.

Q 3.3

Controls

Replicate Table 1 in Lin (2012).

Focus on these rows:

SD(empirical) × 1000
- Unadjusted
- Usual OLS-adjusted
- OLS with interaction rows
Bias(estimated) × 1000
- Unadjusted
- Usual OLS-adjusted
- OLS with interaction rows

Discuss what you see here.

Note: Precision for some of these may require a large number of simulations.

Q 3.4

Missing data on controls

Say you want to include a control variable. But you have missingness in the control. Should you proceed and what can you do about it?

Declare a design for an experiment in which a binary covariate X is related to potential outcomes, according to b, and so to treatment effects. Say X is missing with probability p.

Compare answer strategies in which you:

do not control for \(X\)
do control for \(X\) but drop whenever \(X\) is missing and
Treat \(X\) as a block in your analysis design with three values (0, 1, and missing).

Assess performance (RMSE) over a range of values for p and b. How do you think the comparison of strategies depends upon N?

Optional: Instead of simulated data, use any dataset you have access to that has a treatment and missing data in a control variable.

Q 3.5

What weights

Say you sampled subject A with probability .6 and subject B with probability .4. Say you assigned each to treatment with probability .6. What weight should you put on A in your analysis if they end up in treatment? What if they end up in control? How about B?

4 Bayesian

Q 4.1

Bayes by hand

Say that we have 50 observations of \(Y_0\) and 50 observations of \(Y_1\) from a random experiment. Assume these are each drawn independently from a normal distribution centered on \(\mu_j\) with sd \(\sigma_j\), \(j\in\{0,1\}\).

Write down a likelihood function that returns the probability of seeing the 100 observations that you see given the four parameters: \(\mu_0, \mu_1, \sigma_0, \sigma_1\).
Use grid <- expand_grid(m0 = ..., m1 = ..., s0 = ..., s1 = ... ) to generate a grid of possible values for the four parameters (make sure s0 and s1 are positive).
Apply your likelihood function to all the possible parameter values contained in your grid.
Now:

Identify the maximum likelihood set of values
Calculate the posterior distribution assuming uniform priors over the range
Identify the posterior modes
Calculate the posterior means
Compare the estimates of a treatment effect you would obtain from
- maximum likelihood
- posterior means
- ols
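The grid steps above can be sketched in base R as follows. This is one possible setup and not the intended solution: the simulated data, the grid ranges and step sizes, and the use of `expand.grid` (a base-R stand-in for `expand_grid`) are all assumptions, and the likelihood is computed on the log scale to avoid numerical underflow.

```r
set.seed(3)
y0 <- rnorm(50, mean = 0, sd = 1)   # illustrative control outcomes
y1 <- rnorm(50, mean = 1, sd = 1)   # illustrative treated outcomes

# Log likelihood of all 100 observations given the four parameters
log_lik <- function(m0, m1, s0, s1)
  sum(dnorm(y0, m0, s0, log = TRUE)) + sum(dnorm(y1, m1, s1, log = TRUE))

grid <- expand.grid(m0 = seq(-1, 2, 0.1), m1 = seq(-1, 2, 0.1),
                    s0 = seq(0.6, 1.6, 0.1), s1 = seq(0.6, 1.6, 0.1))
grid$ll <- mapply(log_lik, grid$m0, grid$m1, grid$s0, grid$s1)

# Uniform prior over the grid: posterior is the normalized likelihood
grid$posterior <- exp(grid$ll - max(grid$ll))
grid$posterior <- grid$posterior / sum(grid$posterior)

mle <- grid[which.max(grid$ll), ]                          # maximum likelihood row
post_mean_effect <- sum(grid$posterior * (grid$m1 - grid$m0))
c(mle_effect = mle$m1 - mle$m0, posterior_mean_effect = post_mean_effect)
```

Because the prior is uniform over the grid, the posterior mode and the maximum likelihood row coincide here; they would differ under an informative prior.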

Q 4.2

Bayesian sequencing and posterior variance

Say you knew the probability that \(Y = 1\) given \(X_1\) and \(X_2\):

	X_1 = 0	X_1 = 1
X_2 = 0	.2	.5
X_2 = 1	.8	.9

In addition you know that \(X_1 = 1\) with probability 0.5, and \(X_2 = 1\) with probability 0.5, independently.

Say you do not know \(X_1\) or \(X_2\). What is your prior that \(Y = 1\)?
How uncertain are you? (You can use variance as a measure, where the variance in beliefs about a bernoulli event that arises with probability \(p\) is just \(p(1-p)\))
Say you now learn \(X_1 = 0\): what is your posterior belief now?
What is your uncertainty now? (Again use variance, but now you should look at the posterior variance). Has it gone up or down with the additional information?
Say in addition you now learn \(X_2 = 1\): what now is your posterior? What now is your uncertainty?
There are two ways in which you could have answered the last question: one is to use your prior from (1) and update based on the joint observation of \(X_1, X_2\) using the likelihood: \(\Pr(X_1 = 0, X_2=1 | Y = 1)\). The other is to use the posterior from (3) as your new prior and update using only the new information, given the existing data: \(\Pr(X_2=1 | Y = 1, X_1 = 0)\). Show, analytically, that these are equivalent.
Ex ante you cannot tell what you would observe if you looked for \(X_1\) or \(X_2\). Thinking through all the patterns you might see, what is the expected estimate you would get if you just looked for \(X_1\) or just looked for \(X_2\)? What is the expected posterior variance? If you had to choose just one of \(X_1\) or \(X_2\) to look for, which would be more informative?

Bonus: Think through how things might change if \(X_1\) and \(X_2\) were not independent.

Q 4.3

Hierarchical

Generate a simple multilevel experimental design (e.g. 20 children in each of 20 schools), similar to what we saw in slides. Assume that the treatment effect in each school is drawn from a normal distribution with a given variance \(\sigma\).

Use design diagnosis to assess the ability of a Bayesian hierarchical (multilevel) model to recover \(\sigma\).

Help: You can make a custom estimation function for declare_estimator or you can try using rstanarm, which has the advantage of slipping more easily into a declaration; see Declaration 9.3 (https://book.declaredesign.org/declaration-diagnosis-redesign/choosing-answer-strategy.html#answer-2). Note: stan updating is not very fast and so you should do this with a relatively small number of simulations.

Bonus: Are estimates of treatment effects more or less reliable than what you would get from a frequentist approach that interacts school IDs with treatment?

Q 4.4

Priors matter

Generate a simple experimental design and estimate treatment effects using

Bayesian regression with stan (you can make a custom estimation function for declare_estimator or you can try using rstanarm, which has the advantage of slipping more easily into a declaration; see Declaration 9.3 (https://book.declaredesign.org/declaration-diagnosis-redesign/choosing-answer-strategy.html#answer-2)). Provide informative priors (e.g. a tight distribution around 0 or some other number).
lm_robust

Let the size of the data increase from 10 to 100 to 1000 and plot the estimates from the two approaches as \(N\) increases, plot the average standard error and the posterior variance from the two approaches as \(N\) increases.

Note: stan updating is not always very fast and so you should do this with a relatively small number of simulations. (The first run of a stan model is usually especially slow as the model has to compile.)

Q 4.5

Clues from causal processes

Set up a model in which \(X \rightarrow Y\) (\(X\) as if random) and there is a true effect of 0.56.
Update using large data generated according to some distribution over causal types
Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\)
Say you know that the effect of \(X\) on \(Y\) passes through \(M\) and so you have model \(X \rightarrow M \rightarrow Y\) and effects of 0.8 at the first stage and .7 at the second stage.
Update using large data generated according to some distribution over causal types
Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\) (a) when you know that \(M=1\) and (b) when you know that \(M=0\)

Bonus: Say instead of a single mediator \(M\) you had a chain: \(M_1, M_2, \dots\). Does lengthening the chain sufficiently allow you to identify causal effects?

Help: This exercise uses the CausalQueries package.

You set up the model using model <- make_model("X -> Y").
To set up a true effect of .56 you need to specify parameter values for the model and specifically the share of units for which \(X\) has a positive effect on \(Y\) (Y.01) and for which \(X\) has a negative effect on \(Y\) (Y.10). See ?set_parameters for help setting parameters.
You can draw data using data <- draw_data(model, using = "parameters").
You update using update_model(model, data)
You query using query_model(model, query = ...), This is a conditional query which can be written query = "Y[X=1] > Y[X=0] :|: X==1 & Y==1"
Similar steps are taken for the X->M->Y model.

Q 4.6

Napkin identification

Consider the Napkin model: W->Z->X->Y; W <-> X; W <-> Y

make_model("W->Z->X->Y; W <-> X; W <-> Y") |> plot(x_coord = 1:4, y_coord = c(1,1,1,1))

Consider some true parameter vector and generate data from this vector, varying the amount of data from 10 to 100 to 1000 observations. Assume in particular that there is confounding: e.g. that the probability \(X=1, Y=1\) depends on \(W\).

In each case calculate the posterior distribution on the average effect of \(X\) on \(Y\). Assess whether the quantity appears to be identified.

Can you use a formula to calculate an effect directly?

Hint: You can generate a “target” model, generate data from that, and calculate from that a target query.

target_model <- make_model("W->Z->X->Y; W <-> X; W <-> Y") |>  
  set_parameters(param_name = "Y.11_W.1", parameters = .9) |> 
  set_parameters(param_name = "X.11_W.1", parameters = .9)

Note: this is hard because the W <-> Y confounding implies an X <-> Y confounding. There is no scope for front door adjustment. If you control for \(W\) you open a path from \(X\) to \(Y\) (since \(W\) is a collider) and, more subtly, conditioning on \(Z\) also partly opens a collider path. See this discussion: https://twitter.com/analisereal/status/1273099716956430340

5 Design

5.1 Experimental Design

Q 5.1

Randomization

100 students sign up to take part in an experiment. You want to measure the effect of immigration on social trust. Half your subjects are men and half are women and you believe gender is very predictive of social trust.

Your experiment involves varying whether a “native” or an “immigrant” facilitator instructs players in how to play a trust game. You have five native and five immigrant facilitators and you want them each to conduct one session with 10 subjects.

You are free to assign both subjects and facilitators to sessions. Propose an appropriate randomization strategy. Is it blocked? Clustered? Both?
Say now that subjects have already signed up for sessions. You can only assign facilitators to sessions, but you have access to the subject lists before you do so. Describe your optimal randomization strategy. Is it blocked? Clustered? Both?

Q 5.2

Heterogeneous propensities

Two of four units are going to be assigned to treatment. A researcher sets up a design in which subjects can decide for themselves the probability with which they receive a treatment. Requested propensities are as below.

Can you:

List the set of admissible treatment allocations
Describe a scheme for allocating subjects to treatment
Calculate your estimate under each allocation
Assess whether your estimate will be biased or not.

Requested propensity	Y0	Y1
0.2	0	1
0.4	0	2
0.6	1	3
0.8	1	4
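As a starting point, the candidate allocations can be enumerated in base R. This sketch lists every way of treating exactly two of the four units and the difference in means each would produce; which of these are admissible, and with what probability each should be drawn, depends on the allocation scheme you propose. The object names are hypothetical.

```r
units <- data.frame(
  propensity = c(0.2, 0.4, 0.6, 0.8),
  Y0 = c(0, 0, 1, 1),
  Y1 = c(1, 2, 3, 4)
)

# Each column is one way of treating exactly two of the four units
allocations <- combn(4, 2, FUN = function(t) as.integer(1:4 %in% t))

# Difference in means that would be observed under each allocation
estimates <- apply(allocations, 2, function(Z)
  mean(units$Y1[Z == 1]) - mean(units$Y0[Z == 0]))

rbind(allocations, estimate = estimates)
```

Given a vector of allocation probabilities `q` that sums to one, `allocations %*% q` returns the implied unit-level propensities, which can be compared with the requested ones.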

Q 5.3

Network Randomization

You have access to a network of all friendship links in a classroom. This is in the form of an \(n\times n\) adjacency matrix where a 1/0 means the row individual is / is not a friend of the column individual.

You want to provide political information to a set of students and see how much more likely it is that a student that you do not give information to receives the information if a friend is treated compared to the situation in which a friend of a friend is treated. So you want to be sure that some subjects have friends treated and some have only friends of friends treated. How would you assign treatment? How can you work out your treatment assignment probabilities?
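One way to work out exposure probabilities is by simulation: repeat the assignment many times and record each unit's exposure condition. The base-R sketch below assumes a randomly generated symmetric network in which 1 marks a friendship, 5 of 20 students treated under complete random assignment, and a hypothetical `exposure` helper; none of these choices come from the exercise itself.

```r
set.seed(7)
n <- 20
A <- matrix(rbinom(n * n, 1, 0.1), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]   # make friendship symmetric (an assumption)
diag(A) <- 0

# Friends of friends who are neither oneself nor direct friends (1 = yes)
A2 <- ((A %*% A) > 0 & A == 0) * 1
diag(A2) <- 0

# Exposure condition of every unit for one assignment vector Z
exposure <- function(Z) {
  friend <- as.vector(A %*% Z) > 0
  fof    <- as.vector(A2 %*% Z) > 0
  ifelse(Z == 1, "treated",
         ifelse(friend, "friend treated",
                ifelse(fof, "only friend of friend treated", "neither")))
}

conditions <- c("treated", "friend treated",
                "only friend of friend treated", "neither")
sims <- replicate(2000, exposure(sample(rep(0:1, c(n - 5, 5)))))

# Rows are units, columns are estimated probabilities of each condition
exposure_probs <- t(apply(sims, 1, function(x)
  table(factor(x, conditions)) / length(x)))
round(exposure_probs, 2)
```

Units with probability zero for a condition cannot contribute to comparisons involving it, and the remaining probabilities can serve as inverse weights in the analysis.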

Bonus: Generate data and diagnose the estimation strategy available using the interference package

Q 5.4

Re-randomization

You have 10 units that you want to assign to treatment and control. On each unit you have two covariates, each with a lopsided distribution (for example, log normally distributed) and each strongly associated with the outcome of interest.

You are worried that there is a good chance of substantial imbalance on the covariates and are thinking about using a procedure in which you re-randomize whenever balance is poor (for this you need to define what you mean by poor balance; e.g. you might require that imbalance falls in the lowest quartile of possible imbalances).

Compare the following strategies in terms of (a) bias (b) RMSE and (c) coverage:

Ignoring imbalances and using whatever randomization gets realized
Set a rule and rerandomize if the rule is broken, then proceed as normal
Randomize many times and select the set of randomizations that meet your rule. Then select from that set at random.

In case 3 you can make use of your knowledge of the randomization to generate assignment propensities and implement randomization inference.

See Morgan and Rubin
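A base-R sketch of the third strategy, assuming 5 of the 10 units are treated. The balance measure (largest absolute standardized difference in covariate means), the quartile rule, and all object names are illustrative assumptions; Morgan and Rubin discuss other criteria.

```r
set.seed(4)
n <- 10
X <- cbind(x1 = rlnorm(n), x2 = rlnorm(n))   # two lopsided covariates

# Hypothetical balance measure: largest absolute standardized mean difference
imbalance <- function(Z) {
  max(abs(colMeans(X[Z == 1, , drop = FALSE]) -
          colMeans(X[Z == 0, , drop = FALSE])) / apply(X, 2, sd))
}

# All possible assignments of 5 of 10 units (one per column) and their imbalance
assignments <- combn(n, 5, FUN = function(t) as.integer(seq_len(n) %in% t))
scores <- apply(assignments, 2, imbalance)
threshold <- quantile(scores, 0.25)

# Keep the assignments that meet the rule, then draw one at random
admissible <- assignments[, scores <= threshold, drop = FALSE]
Z <- admissible[, sample(ncol(admissible), 1)]

# Implied assignment propensities under this procedure
propensities <- rowMeans(admissible)
propensities
```

The columns of `admissible` also define the reference set for randomization inference: a test statistic can be recomputed under each admissible assignment.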

5.2 Design evaluation

Q 5.5

Covariates 1

Compare the power gains from two strategies:

Adding covariates in the analysis stage
Blocked randomization in the design stage

Q 5.6

Covariates 2

Compare the performance of the Lin estimator and the doubly robust estimator discussed in slides.

Q 5.7

Covariates 3

Sometimes researchers running an experiment look for imbalance on a covariate and then include the covariate as a control if and only if they see imbalance. Set up a design in which a covariate may or may not affect potential outcomes and assess the performance given different rules:

no control
control as a function of correlations of covariates with outcomes
control as a function of correlations of covariates with outcomes in control only
control as a function of correlations of covariates with treatments
control regardless of correlations

6 Topics 1

Q 6.1

Covariates

Equation (4.6) in Gerber and Green (2012) suggests that if the sum of two slopes exceeds 1 then there are gains in efficiency from adding a covariate. Show that this is true in practice using a design declaration.

Q 6.2

Difference in Differences

DD 16.3 shows poor behavior of a multi-period difference in differences design when there is effect heterogeneity.

Imai and Kim (2021) highlight the risk of negative weights in this setting. Can you recover the implied weights and identify which units contribute negatively to estimates?

Q 6.3

Doubly Robust Estimation

Say in truth that:

\(Z\) is assigned according to: \(\Pr(Z=1) := \pi = 1/2 + X1/4 + \alpha X2/4\)
Potential outcomes are given by \(Y = Z + \beta X1 + X2 + \epsilon\)

In particular:

declare_model(
    N = 500, 
    
    X1 = runif(N, -1, 1),
    X2 = runif(N, -1, 1),
    
    # propensity model correct when alpha = 0
    p = 1/2 + X1/4 + alpha*X2/4,
    Z = rbinom(N, 1, p),
    
    # outcome model correct when beta = 0
    Y = Z + beta*X1 + X2 + rnorm(N)
    )

However, you have:

a prediction \(\hat{\pi}\) from regressing \(Z\) on \(X1\) (e.g. using a probit)
a prediction \(\hat{Y_z}\) from regressing \(Y\) on \(X2\) in each treatment group (and predicting to the other group)

Compare your estimates from:

IPW (using \(\hat{\pi}\))
Calculating the average of \(\hat{Y_1} - \hat{Y_0}\)
Employing the doubly robust formula from the slides to get an estimate of treatment effects (don’t worry about estimating standard errors)
Naive model regressing Y on Z.

as \(\alpha\) and \(\beta\) vary over {0, .5, 1}

Discuss whether and when doubly robust estimation dominates the other approaches.

Hint for 2: your estimate from average differences in imputed potential outcomes should align exactly with what you would get from Lin estimation: why?

Hint for 3: You can use a handler for the doubly robust estimator; assuming you have figured out predicted probabilities (p_pred) and potential outcomes (y0 and y1):

dr <- function(data)
  with(data,
       data.frame(
    estimate = mean((Z/p_pred)*(Y - y1) - ((1-Z)/(1-p_pred))*(Y - y0) + (y1 - y0))
    ))

and

declare_estimator(handler = label_estimator(dr), label = "dr")
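The handler above assumes that p_pred, y0 and y1 already exist in the data. A minimal base-R sketch of one way to construct them is given below, using a probit for the propensity and separate outcome regressions by treatment group as described in the text. The values alpha = 0.5 and beta = 0.5 and the seed are arbitrary illustrative choices, and `dr` is repeated so that the block runs on its own.

```r
set.seed(5)
alpha <- 0.5; beta <- 0.5; N <- 500   # illustrative values only

X1 <- runif(N, -1, 1)
X2 <- runif(N, -1, 1)
p  <- 1/2 + X1/4 + alpha * X2/4
Z  <- rbinom(N, 1, p)
Y  <- Z + beta * X1 + X2 + rnorm(N)
dat <- data.frame(X1, X2, Z, Y)

# Propensity prediction from a probit of Z on X1 only
dat$p_pred <- predict(glm(Z ~ X1, family = binomial(link = "probit"), data = dat),
                      type = "response")

# Outcome predictions from regressing Y on X2 within each arm, predicted for all units
dat$y1 <- predict(lm(Y ~ X2, data = subset(dat, Z == 1)), newdata = dat)
dat$y0 <- predict(lm(Y ~ X2, data = subset(dat, Z == 0)), newdata = dat)

# Doubly robust handler, as given in the hint
dr <- function(data)
  with(data,
       data.frame(
    estimate = mean((Z/p_pred)*(Y - y1) - ((1-Z)/(1-p_pred))*(Y - y0) + (y1 - y0))
    ))

dr(dat)
```

In a declaration these three prediction steps could sit in a measurement step or inside the handler itself, so that they are re-estimated in every simulation.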

Q 6.4

Matching

Assume the same data generating process as in the last example.

declare_model(
    N = 500, 
    
    X1 = runif(N, -1, 1),
    X2 = runif(N, -1, 1),
    
    # propensity model correct when alpha = 0
    p = 1/2 + X1/4 + alpha*X2/4,
    Z = rbinom(N, 1, p),
    
    # outcome model correct when beta = 0
    Y = Z + beta*X1 + X2 + rnorm(N)
    )

Compare results when you match just on \(X1\) or on both \(X1\) and \(X2\). Use coarsened exact matching and propensity score matching, varying \(\alpha\) and \(\beta\) as above.

7 Topics 2

Q 7.1

Spillovers

Imagine a study with three subjects. Each subject’s potential outcomes are as follows:

0 if in control
\(n\) if in treatment when \(n\) subjects are assigned to treatment

So for example if one unit is in treatment that unit has outcome Y=1, and the others have Y = 0; if all 3 are in treatment they all have Y=3.

Write down the potential outcomes for all possible assignments, including all and none assigned to treatment
Write down the difference in means calculation for each possible assignment
Define two estimands of interest.
Define a randomization scheme and answer strategy that will return an unbiased estimate of each estimand.
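The bookkeeping for the first two steps can be sketched in base R. The code enumerates all eight assignments, applies the outcome rule stated above, and computes the difference in means where it is defined; the object names are hypothetical and the choice of estimands and randomization scheme is left open.

```r
# All 2^3 assignments of three subjects (rows), including none and all treated
assignments <- as.matrix(expand.grid(Z1 = 0:1, Z2 = 0:1, Z3 = 0:1))

# Outcome rule from the text: 0 in control, n in treatment when n are treated
outcomes <- assignments * rowSums(assignments)
colnames(outcomes) <- c("Y1", "Y2", "Y3")

# Difference in means, undefined (NaN) when one of the groups is empty
diff_in_means <- sapply(seq_len(nrow(assignments)), function(i) {
  Z <- assignments[i, ]
  Y <- outcomes[i, ]
  mean(Y[Z == 1]) - mean(Y[Z == 0])
})

cbind(assignments, outcomes, diff_in_means)
```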

Q 7.2

Mediation

Baron and Kenny have provided a popular method to implement mediation analysis.

Declare a design with a mediation process and possible violations of sequential ignorability, governed by some parameters k.

Demonstrate under what conditions estimates using the Baron-Kenny procedure are misleading.

Q 7.3

LATE

Which of these problems could be addressed using instrumental variables? In each case, what kinds of concern might you have about the IV strategy?

Experimenters introduce unconditional cash transfers into a set of villages in 2012 and use them to measure effects on school attendance in 2015. You come on the scene later and are interested in whether the transfers could have led to greater political participation in 2017.
Experimenters introduce unconditional cash transfers into a set of villages in 2012 and use them to measure effects on school attendance in 2015. You come on the scene later and are interested in whether the increased school attendance could have led to greater political participation in 2017.
You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration.
You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration but you find out later that your enumerators did not deliver the flyers in a bunch of areas.
You want to understand the effects of sending flyers about an upcoming demonstration but you find out later that your computer code used incomplete data when making assignments and so failed to assign treatment to a whole bunch of regions.
You want to understand whether sending flyers increases participation because people actually go to the rallies or because people’s general level of awareness of the election increases, whether or not they go.

8 References

Gerber, Alan S, and Donald P Green. 2012. Field Experiments: Design, Analysis, and Interpretation. Norton.

Imai, Kosuke, and In Song Kim. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.

Lin, Winston. 2012. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” arXiv Preprint arXiv:1208.2301.
