Puzzles for causal inference and experimental design
Class: https://macartan.github.io/ci/
For each puzzle: explore the issues raised by the puzzle and generate a self-contained presentation in .qmd (or .Rmd) that reports on your investigations. Present in the next class session.
1 Familiarity with DeclareDesign
Q 1.1
Basic manipulations
Consider the following design:
```r
library(DeclareDesign)

design <-
  declare_model(
    N = 1000,
    U = rnorm(N),
    potential_outcomes(Y ~ (1 - Z) * rnorm(N) + Z * rnorm(N, 1))
  ) +
  declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0))
```
- What does each of these two steps do?
- Draw data from this design. What are the dimensions of the data set? Is this what you would expect?
- Calculate the population average treatment effect for this dataset using the `Y_Z_1` and `Y_Z_0` columns. Is this what you would expect from the potential outcomes function?
- Diagnose this design. Why so many NAs?
Now extend the design as follows:
```r
design <-
  design +
  declare_sampling(S = complete_rs(N = N, n = 20)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(SATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, .method = lm)
```
- What does each of the new steps do? What is the difference between the two inquiries, and why can their values differ?
- Draw data from this design. What are the dimensions of the data set? Is this what you would expect?
- Diagnose the design using `diagnose_design`. Use a large number of simulations, e.g. setting `sims = 2000`. Why is there a difference in the RMSE for the two estimands? Why is the power the same? Why is there a difference in coverage?
Q 1.2
[From DeclareDesign instructor material] In this exercise, you will conduct a power analysis for a two-arm trial with 100 units using DeclareDesign. Start with the following model, which defines 100 units to be studied, a variable `U` representing (unknown) unit characteristics, and two potential outcomes (for treatment and for control) that depend on the expected effect size, the treatment variable `Z`, and the variable `U`.
You can find help with the conceptual material in this exercise in Chapters 10 and 11 and help with coding in Chapter 13.
```r
expected_effect_size <- 0.1

two_arm_trial_model <-
  declare_model(
    N = 100,
    U = rnorm(N), # unknown heterogeneity
    potential_outcomes(Y ~ expected_effect_size * Z + U)
  )
```
- Add an inquiry declaration, which should be the average treatment effect, or average difference between the treated and control potential outcomes.
- Add a data strategy declaration, including an assignment step that creates a variable `Z` using simple random assignment with probability 0.5, and a measurement step in which you reveal the observed outcomes `Y` using the `reveal_outcomes` function.
- Add an answer strategy declaration, which uses the `difference_in_means` function (pass it to the `.method` argument of `declare_estimator`) and calculates the difference in mean observed outcomes between the treatment and control groups. Link the estimator to the inquiry.
- Draw data from the design, and confirm that the variables you expect are present, including unit characteristics, two potential outcomes, the treatment variable, and the observed outcome.
- Run the design once using `run_design` to confirm that the estimand and estimate columns are present. What do these columns refer to?
- Diagnose the design with 1000 simulations and calculate its statistical power. What is the power?
- Calculate the minimum detectable effect size of the design, meaning the smallest expected effect size for which the statistical power of the design reaches the common threshold of 80%. Use the `redesign` function to redesign over values of `expected_effect_size` between 0.1 and 1 in steps of 0.1, and then diagnose the resulting list of designs with `diagnose_design`. Find the smallest expected effect size for which the diagnosis reports power of at least 0.8.
- Bonus: What is the smallest sample size you would need to obtain 80% power for an expected effect size of 0.3? You can estimate power in sample size steps of 100 and report the smallest size that exceeds 80% power.
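As a rough analytic benchmark for the simulation-based answers, base R's `power.t.test` gives the power of a two-sample t-test. This sketch assumes exactly 50 units per arm, whereas simple random assignment only gives a 50/50 split on average, so simulated power may come out slightly lower. It also assumes an outcome standard deviation of 1 in each arm, which follows from `U = rnorm(N)` in the model above.

```r
# Analytic power for a two-arm trial, outcome sd = 1 in each arm
# (assumes a fixed 50/50 split; simple random assignment only gives this on average)
effect_sizes <- seq(0.1, 1, by = 0.1)
power_by_effect <- sapply(effect_sizes, function(d) {
  power.t.test(n = 50, delta = d, sd = 1, sig.level = 0.05)$power
})

# Smallest effect size on the grid with power of at least 0.8
mde_grid <- min(effect_sizes[power_by_effect >= 0.8])

# Bonus check: smallest total N (steps of 100) with power above 0.8 for an effect of 0.3
total_N <- seq(100, 1000, by = 100)
power_by_N <- sapply(total_N, function(n_total) {
  power.t.test(n = n_total / 2, delta = 0.3, sd = 1, sig.level = 0.05)$power
})
min_N <- min(total_N[power_by_N > 0.8])

c(mde_grid = mde_grid, min_N = min_N)
```

These numbers are a cross-check on the `redesign` and `diagnose_design` results, not a replacement for them.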
Q 1.3
Sometimes people worry that with larger samples you are more likely to get a false positive. Is that true?
Assess by generating a simple experimental design from scratch in which we can vary `N` and in which there is no true effect of some treatment.
Then:
- Plot the distribution of \(p\) values from the `simulations_df`. What shape is it and why?
- Plot the power as \(N\) increases, using the `diagnosands_df`.
- Plot the estimates against \(p\) values for different values of \(N\); what do you see?
- Discuss
Hint: the slides contain code for a simple experimental design.
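Before building the DeclareDesign version, the logic can be checked with a short base-R simulation. It records the share of \(p\) values below 0.05 at several values of \(N\) when the treatment has no effect. The sample sizes, the number of simulations, and the use of a Welch t-test are illustrative choices.

```r
set.seed(42)

# p values from a difference-in-means (Welch t) test when Z has no effect on Y
null_p_values <- function(N, sims = 2000) {
  replicate(sims, {
    Z <- sample(rep(c(0, 1), length.out = N))  # complete random assignment
    Y <- rnorm(N)                              # outcome does not depend on Z
    t.test(Y[Z == 1], Y[Z == 0])$p.value
  })
}

Ns <- c(20, 200, 2000)
p_values <- lapply(Ns, null_p_values)

# Share of p values below 0.05 at each N
false_positive_rate <- sapply(p_values, function(p) mean(p < 0.05))
names(false_positive_rate) <- Ns
false_positive_rate
```

To compare the shape of these distributions with the one from `simulations_df`, call `hist(p_values[[1]])` or `hist(p_values[[3]])`.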
Q 1.4
Clustering
- Say that you have a set of 20 schools randomly sampled from a superpopulation of schools. There are 5 classrooms in each school and 5 students in each classroom.
- Say you assign a treatment at the classroom level. Should you cluster your standard errors at the level of the school or at the level of the classroom?
Now:
- Declare a design with this hierarchical data structure. Allow for the possibility that treatment effects vary at the school level. Assess the performance of the standard errors when you cluster at each of these levels (and when you do not cluster at all).
- Examine whether the performance depends on whether you are interested in the population average effects or the sample average effects.
Hint: For generating hierarchical models use `add_level`. Also: be sure to have a reasonably large top-level shock in order to see differences arising from clustering at the school level. You could also try heterogeneous effects by school.
```r
library(dplyr)  # slice()
library(knitr)  # kable()

g <- declare_model(
  L1 = add_level(N = 10, u = rnorm(N)),
  L2 = add_level(N = 12, v = rnorm(N))
)

g() |> slice(1:3, 13:15) |> kable()
```
L1 | u | L2 | v |
---|---|---|---|
01 | -0.2935756 | 001 | 2.0392066 |
01 | -0.2935756 | 002 | -0.5946925 |
01 | -0.2935756 | 003 | -1.0750439 |
02 | 1.0698344 | 013 | 2.1752217 |
02 | 1.0698344 | 014 | 0.3008131 |
02 | 1.0698344 | 015 | -0.4301839 |
Q 1.5
Learning about standard errors
The standard error is the standard deviation of the sampling distribution of an estimate.
That sounds complicated, but actually the sampling distribution of an estimate lives in the simulations data frame so you can look at its standard deviation and assess whether standard errors estimate it well.
Challenge: Generate a simple experimental design in which there is a correlation (`rho`) between the two potential outcomes (`Y_Z_0` and `Y_Z_1`). Show the distribution of the estimates over different values of `rho`.
Assess the performance of the estimates of the standard errors, and the coverage, as `rho` goes from -1 to 0 to 1. Describe how coverage changes. (Be sure to be clear on what coverage is!)
```r
library(fabricatr)  # correlate()

Y_Z_0 <- rnorm(1000)
Y_Z_1 <- correlate(rnorm, given = Y_Z_0, rho = .5)
cor(Y_Z_0, Y_Z_1)
#> [1] 0.5018144
```
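The reason `rho` matters can be seen in a base-R sketch for a fixed finite population under complete random assignment. The population size, `rho = -0.5`, and the way `Y_Z_1` is constructed are illustrative assumptions. The true sampling variance of the difference in means is \(S_1^2/n_1 + S_0^2/n_0 - S_\tau^2/N\), where \(S_\tau^2\) is the variance of the unit-level effects. The usual standard error formula estimates only the first two terms.

```r
set.seed(7)
N <- 100
n1 <- N / 2
n0 <- N / 2
rho <- -0.5  # illustrative; try values between -1 and 1

# Fixed finite population of potential outcomes with correlation roughly rho
Y_Z_0 <- rnorm(N)
Y_Z_1 <- rho * Y_Z_0 + sqrt(1 - rho^2) * rnorm(N)

# Exact randomization variance of the difference in means (Neyman)
true_var <- var(Y_Z_1) / n1 + var(Y_Z_0) / n0 - var(Y_Z_1 - Y_Z_0) / N

# Expected value of the usual estimate s1^2/n1 + s0^2/n0 (omits the last term)
usual_var <- var(Y_Z_1) / n1 + var(Y_Z_0) / n0

# Sampling distribution of the estimate over re-randomizations
estimates <- replicate(10000, {
  Z <- sample(rep(c(0, 1), each = N / 2))
  mean(Y_Z_1[Z == 1]) - mean(Y_Z_0[Z == 0])
})

c(simulated = var(estimates), true = true_var, usual = usual_var)
```

Rerunning with `rho` set to -1, 0, and 1 shows how the gap between `usual_var` and `true_var` changes. This sketch covers a finite-population (sample average) target only. A design that redraws the population each time is a different setting.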
Q 1.6
Declare a design in which:
- The assignment of a treatment \(X\) depends in part upon some other, binary, variable \(W\): in particular \(\Pr(X=1|W=0) = .2\) and \(\Pr(X=1|W=1) = .5\).
- The outcome \(Y\) depends on both \(X\) and \(W\): in particular \(Y = X*W + u\) where \(u\) is a random shock.
- Diagnose a design with three approaches to estimating the effect of \(X\) on \(Y\): (a) ignoring \(W\) (b) adding \(W\) as a linear control (c) including both \(W\) and an interaction between \(W\) and \(X\).
Discuss results. Do any of these return the right answer?
Hint: You can add three separate `declare_estimator` steps. They should have distinct labels. The trickiest part is to figure out how to extract the estimate in (c) because you will have both a main term and an interaction term for \(X\).
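For approach (c), one way to recover an average effect from a model with an interaction is to add the interaction term, weighted by the mean of \(W\), to the main term. An equivalent approach is to center \(W\) before interacting. The base-R sketch below assumes \(\Pr(W=1) = 0.5\) and standard normal \(u\), neither of which the puzzle specifies. It is not a DeclareDesign estimator step.

```r
set.seed(11)
N <- 10000
W <- rbinom(N, 1, 0.5)                       # assumed Pr(W = 1) = 0.5
X <- rbinom(N, 1, ifelse(W == 1, 0.5, 0.2))  # assignment depends on W
Y <- X * W + rnorm(N)                        # Y = X*W + u

# Model (c): X, W, and their interaction
b <- coef(lm(Y ~ X * W))

# Average effect implied by (c): main term plus interaction at the mean of W
ate_c <- unname(b["X"] + b["X:W"] * mean(W))

# Equivalent: with W centred, the coefficient on X is the average effect directly
ate_centred <- unname(coef(lm(Y ~ X * I(W - mean(W))))["X"])

c(ate_c = ate_c, ate_centred = ate_centred)
```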
2 Causality
2.1 Potential outcomes
These do not require coding
Q 2.1
Potential outcomes
Consider an outcome \(Y\) that can be affected by two variables \(X_1\) and \(X_2\). All variables are binary. Can you fill in the potential outcomes (rows) for each of the column types?
Potential outcome | X1 is a necessary and sufficient condition for Y | X1 is necessary but not sufficient | X1 is sufficient but not necessary | X1 sometimes causes Y but is neither necessary nor sufficient |
---|---|---|---|---|
Y(0,0) = | | | | |
Y(1,0) = | | | | |
Y(0,1) = | | | | |
Y(1,1) = | | | | |
Q 2.2
Consider an outcome Y that can be affected by two variables X1 and X2, but say that X2 can itself be affected by X1. Write down possible potential outcomes for Y and X2 when X1 causes X2 and Y, but X1 does not cause Y through X2.
Potential outcome | Value (0/1) |
---|---|
Y(0,0) = | |
Y(1,0) = | |
Y(0,1) = | |
Y(1,1) = | |
X2(0) = | |
X2(1) = | |
Q 2.3
- Say X1=1, X2=1. Then Y = 1. What caused Y = 1?
- Say X1=0, X2=0. Then Y = 0. What caused Y = 0?
- Say X1 = 1 with 10% probability, otherwise 0, and, independently, X2 = 1 with 50% probability, otherwise 0. Then what is the average effect of X1 on Y? What is the average effect of X2 on Y? Which cause has the bigger effect?
Q 2.4
- A set of units have outcome \(Y^1_i\) at baseline.
- At endline they have potential outcomes \(Y^2_i(0)\) and \(Y^2_i(1)\)
- Write down the estimand for the average effect of treatment on endline outcomes
- Write down the estimand for the average effect of treatment on the change from baseline to endline for all units
Compare these and discuss.
2.2 Inquiries and identification
These next questions use some concepts we have not introduced yet. Don’t worry if your answers are incomplete but do share your thought processes around these.
Q 2.5
- Declare a simple design in which (i) \(X\) and \(Y\) both have a positive effect on (binary) \(K\) but \(X\) does not cause \(Y\) (ii) a researcher conditions on \(K==1\) when estimating the effect of \(X\) on \(Y\)
- Show that this can generate biased results. Can you find situations where the bias can be either positive or negative?
Hint: The direction of collider bias is related to the ways that \(X\) and \(Y\) interact to produce \(K\). Also: by “conditioning on \(K=1\)” we mean using only cases for which \(K=1\).
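A minimal base-R illustration of the mechanism follows. \(X\) and \(Y\) are generated independently, both raise the chance that \(K = 1\), and the slope of \(Y\) on \(X\) is estimated using all cases and then only the \(K = 1\) cases. The threshold rule for \(K\) and the sample size are illustrative assumptions. The puzzle itself asks for a DeclareDesign declaration and for cases of both signs.

```r
set.seed(5)
N <- 10000
X <- rbinom(N, 1, 0.5)
Y <- rnorm(N)                          # X has no effect on Y
K <- as.numeric(X + Y + rnorm(N) > 1)  # X and Y both raise Pr(K = 1)
d <- data.frame(X, Y, K)

slope_all <- unname(coef(lm(Y ~ X, data = d))["X"])
slope_K1  <- unname(coef(lm(Y ~ X, data = d[d$K == 1, ]))["X"])  # K == 1 only

c(all_cases = slope_all, K_equals_1 = slope_K1)
```

To explore the sign question, vary the rule that maps \(X\) and \(Y\) into \(K\).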
Q 2.6
- Draw a DAG with 5 nodes representing a situation in which \(X\) causes \(Y\) through \(M\), \(C\) affects both \(X\) and \(M\), and \(D\) affects both \(M\) and \(Y\).
- Think through which set of nodes, when controlled for, would allow for the identification of the effect of \(X\) on \(Y\).
- Represent it in `dagitty` and check your answer.
- Bonus: Declare the design and compare the behavior of designs that do and do not control for these nodes.
Q 2.7
A | B | C | p |
---|---|---|---|
0 | 0 | 0 | 0.32 |
0 | 0 | 1 | 0.08 |
0 | 1 | 0 | 0.08 |
0 | 1 | 1 | 0.02 |
1 | 0 | 0 | 0.08 |
1 | 0 | 1 | 0.12 |
1 | 1 | 0 | 0.12 |
1 | 1 | 1 | 0.18 |
Set up a model in DeclareDesign that has this distribution. Draw a large dataset from it and check whether the relations of conditional independence implied by your DAG hold.
Hint: This is relatively tricky. From the slides you will see that a DAG is a directed acyclic graph. A DAG should represent relations of conditional independence, in the sense that any node \(A\) that is d-separated from another node \(B\) given nodes \(W\) should be conditionally independent of \(B\) given \(W\). You should be able to read from this table which nodes are conditionally independent of each other given other nodes. You should be able to check consistency between the probability distribution and an underlying model by calculating quantities such as \(\Pr(x, m, y) = \Pr(x)\times\Pr(m|x)\times\Pr(y|m,x)\).
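Before simulating, the table itself can be checked directly. The base-R sketch below uses a hypothetical helper, `ci_gap`. For a candidate statement "x is independent of y given z", it returns the largest absolute gap between \(\Pr(x, y \mid z)\) and \(\Pr(x \mid z)\Pr(y \mid z)\) across all values. A gap of (numerically) zero means the independence holds in the table. This complements the DeclareDesign simulation the puzzle asks for; it does not replace it.

```r
# Joint distribution from the table, rows in the same (A, B, C) order
joint <- expand.grid(C = 0:1, B = 0:1, A = 0:1)[, c("A", "B", "C")]
joint$p <- c(0.32, 0.08, 0.08, 0.02, 0.08, 0.12, 0.12, 0.18)

# Hypothetical helper: largest |Pr(x, y | z) - Pr(x | z) Pr(y | z)| over all values
ci_gap <- function(joint, x, y, z) {
  gaps <- numeric(0)
  for (zv in 0:1) {
    sub <- joint[joint[[z]] == zv, ]
    pz <- sum(sub$p)
    for (xv in 0:1) {
      for (yv in 0:1) {
        p_xy <- sum(sub$p[sub[[x]] == xv & sub[[y]] == yv]) / pz
        p_x  <- sum(sub$p[sub[[x]] == xv]) / pz
        p_y  <- sum(sub$p[sub[[y]] == yv]) / pz
        gaps <- c(gaps, abs(p_xy - p_x * p_y))
      }
    }
  }
  max(gaps)
}

gaps_table <- c(
  AB_given_C = ci_gap(joint, "A", "B", "C"),
  AC_given_B = ci_gap(joint, "A", "C", "B"),
  BC_given_A = ci_gap(joint, "B", "C", "A")
)
gaps_table
```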
Q 2.8
- Say that in truth the ATE of \(X\) on \(M\) is .9 and the ATE of \(M\) on \(Y\) is .9. Is the implied effect of 0.81 of \(X\) on \(Y\) identified?
- Say that in truth the ATE of \(X\) on \(M\) is 1 and the ATE of \(M\) on \(Y\) is 1. Is the implied effect of 1 of \(X\) on \(Y\) identified?
- Discuss
Hint: This question is asking about the front door criterion. Check whether the conditions apply for the front door criterion to hold. Note that an effect is not identified if the data pattern it produces is also consistent with a different effect. Is that the case here? The second part of this is more important than the first part. Note you can generate and update models of this form with `CausalQueries`.
3 Estimation and Inference
3.1 Frequentist
Q 3.1
Block | Z | Y |
---|---|---|
1 | 0 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
2 | 0 | 0 |
2 | 0 | 0 |
2 | 0 | 1 |
2 | 1 | 0 |
2 | 1 | 1 |
2 | 1 | 1 |
- Can you estimate the ATE? How about the ATT? And the ATC?
- How do these compare to a simple difference in means between treatment and control?
- Use `DeclareDesign` to compare the answers you get if you use exactly this data and calculate (1) OLS controlling for block, (2) IPW, and (3) the Lin estimator.
- Imagine now letting the size of the data increase by a factor \(k\), meaning that if \(k=2\) you would have twice as many units in each block. Show how the precision of your estimates changes under these different strategies as \(k\) increases.
Hint: If you use exactly this data and replicate it, there is no stochastic component, so you can run with `sims = 1`. To get the precision of your estimates you can declare the diagnosand `mean_se = mean(std.error)`.
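The block-level arithmetic for the first two bullets can be done directly in base R, treating assignment as random within blocks. The sketch computes the within-block difference in means and averages across blocks with three sets of weights: block size (ATE), number of treated units (ATT), and number of control units (ATC). It also computes the pooled difference in means for comparison.

```r
dat <- data.frame(
  block = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  Z     = c(0, 0, 1, 0, 0, 0, 1, 1, 1),
  Y     = c(0, 0, 1, 0, 0, 1, 0, 1, 1)
)

# Within-block difference in means and block counts
by_block <- do.call(rbind, lapply(split(dat, dat$block), function(d) {
  data.frame(
    diff = mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0]),
    n    = nrow(d),
    n1   = sum(d$Z == 1),
    n0   = sum(d$Z == 0)
  )
}))

ate_hat <- weighted.mean(by_block$diff, by_block$n)   # weight by block size
att_hat <- weighted.mean(by_block$diff, by_block$n1)  # weight by treated units
atc_hat <- weighted.mean(by_block$diff, by_block$n0)  # weight by control units
dim_hat <- mean(dat$Y[dat$Z == 1]) - mean(dat$Y[dat$Z == 0])  # pooled

c(ATE = ate_hat, ATT = att_hat, ATC = atc_hat, DIM = dim_hat)
```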
Q 3.2
Declare a factorial design with two binary treatments and an interaction between them. For example \(Y = .1*X1 + .3*X2 - .2*X1*X2\)
Have a non-stochastic model declaration so that the only source of randomness is in the assignment procedure.
Calculate a \(p\) value for the null hypothesis of no interaction between treatments using randomization inference.
- Bonus: Can you check the validity of the p value using the simulations dataframe?
Hint: You can use the `ri2` package. Hint: you can keep the top level fixed by declaring shocks outside the design (`U <- rnorm(N)`) or by using a simulation vector in diagnosis (`sims = c(1,1,500, 1, ...)`).
Q 3.3
Replicate Table 1 in Lin (2012); ignore the first rows, which are theoretical results; include standard errors of estimates (these are not included in Table 1 but are produced by `diagnose_design`); the ‘Tyranny of the minority’ estimator is optional.
Q 3.4
Say you want to include a control variable. But you have missingness in the control. Should you proceed and what can you do about it?
Declare a design for an experiment in which a binary covariate `X` is related to potential outcomes, according to `b`, and so to treatment effects. Say `X` is missing with probability `p`.
Compare answer strategies in which you:
- do not control for \(X\)
- do control for \(X\) but drop whenever \(X\) is missing and
- Treat \(X\) as a block in your analysis design with three values (0, 1, and missing).
Assess performance (RMSE) over a range of values for `p` and `b`. How do you think the comparison of strategies depends upon N?
3.2 Bayesian
Q 3.5
- Set up a model in which \(X \rightarrow Y\) (\(X\) as if random) and there is a true effect of 0.56.
- Update using large data generated according to some distribution over causal types
- Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\)
- Say you know that the effect of \(X\) on \(Y\) passes through \(M\) and so you have model \(X \rightarrow M \rightarrow Y\) and effects of 0.8 at the first stage and .7 at the second stage.
- Update using large data generated according to some distribution over causal types
- Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\) (a) when you know that \(M=1\) and (b) when you know that \(M=0\)
Bonus: Say instead of a single mediator \(M\) you had a chain: \(M_1, M_2, \dots\). Does lengthening the chain sufficiently allow you to identify causal effects?
Q 3.6
Consider the Napkin model: W->Z->X->Y; W <-> X; W <-> Y
```r
library(CausalQueries)

make_model("W->Z->X->Y; W <-> X; W <-> Y") |>
  plot(x_coord = 1:4, y_coord = c(1, 1, 1, 1))
```
Consider some true parameter vector and generate data from this vector, varying the amount of data from 10 to 100 to 1000 observations. Assume in particular that there is confounding: e.g. that the probability \(X=1, Y=1\) depends on \(W\).
In each case calculate the posterior distribution on the average effect of \(X\) on \(Y\). Assess whether the quantity appears to be identified.
Can you use a formula to calculate an effect directly?
Hint: You can generate a “target” model, generate data from that, and calculate from that a target query.
```r
target_model <-
  make_model("W->Z->X->Y; W <-> X; W <-> Y") |>
  set_parameters(param_name = "Y.11_W.1", parameters = .9) |>
  set_parameters(param_name = "X.11_W.1", parameters = .9)
```
Note: this is hard because the `W <-> Y` confounding implies an `X <-> Y` confounding. There is no scope for front door adjustment. If you control for \(W\) you open a path from \(X\) to \(Y\) (since \(W\) is a collider) and, more subtly, conditioning on \(Z\) also partly opens a collider path. See this discussion: https://twitter.com/analisereal/status/1273099716956430340
Q 3.7
Bayes by hand
Say that we have 50 observations of \(Y_0\) and 50 observations of \(Y_1\) from a random experiment. Assume these are each drawn independently from a normal distribution centered on \(\mu_j\) with sd \(\sigma_j\), \(j\in\{0,1\}\).
Write down a likelihood function that returns the probability of seeing the 100 observations that you see given the four parameters: \(\mu_0, \mu_1, \sigma_0, \sigma_1\).
Use `grid <- expand_grid(m0 = ..., m1 = ...)` to generate a grid of possible values for the four parameters. Apply your likelihood function to all the possible parameter values contained in your grid.
Now:
- Identify the maximum likelihood set of values
- Calculate the posterior distribution assuming uniform priors over the range
- Identify the posterior modes
- Calculate the posterior means
- Compare the estimates of a treatment effect you would obtain from
  - maximum likelihood
  - posterior means
  - OLS
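A minimal base-R sketch of the grid approach follows. It uses assumed true values (\(\mu_0 = 0\), \(\mu_1 = 0.5\), \(\sigma_0 = \sigma_1 = 1\)) and an assumed grid. It uses base `expand.grid` rather than `expand_grid`, and it works on the log scale so that multiplying 100 densities does not underflow.

```r
set.seed(3)
y0 <- rnorm(50, mean = 0, sd = 1)    # assumed true mu_0 = 0, sigma_0 = 1
y1 <- rnorm(50, mean = 0.5, sd = 1)  # assumed true mu_1 = 0.5, sigma_1 = 1

# Log likelihood of all 100 observations given the four parameters
log_lik <- function(m0, m1, s0, s1) {
  sum(dnorm(y0, m0, s0, log = TRUE)) + sum(dnorm(y1, m1, s1, log = TRUE))
}

# Assumed grid over the four parameters
grid <- expand.grid(
  m0 = seq(-0.5, 1, by = 0.1), m1 = seq(-0.5, 1, by = 0.1),
  s0 = seq(0.6, 1.5, by = 0.1), s1 = seq(0.6, 1.5, by = 0.1)
)
grid$ll <- mapply(log_lik, grid$m0, grid$m1, grid$s0, grid$s1)

# Maximum likelihood values on the grid
mle <- grid[which.max(grid$ll), ]

# Posterior under uniform priors over the grid: proportional to the likelihood
grid$post <- exp(grid$ll - max(grid$ll))
grid$post <- grid$post / sum(grid$post)

# Treatment effect estimates: ML, posterior mean, OLS
effect_ml   <- mle$m1 - mle$m0
effect_post <- sum((grid$m1 - grid$m0) * grid$post)
effect_ols  <- unname(coef(lm(c(y0, y1) ~ rep(0:1, each = 50)))[2])

c(ml = effect_ml, posterior_mean = effect_post, ols = effect_ols)
```

With a flat prior, the joint posterior mode is the same grid row as the maximum likelihood values. Marginal posterior modes can be found by summing `grid$post` within each value of a parameter, for example with `tapply(grid$post, grid$m1, sum)`.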
Q 3.8
Generate a simple multilevel experimental design (e.g., 20 children in each of 20 schools). Assume that the treatment effect in each school is drawn from a normal distribution with a given variance \(\sigma^2\).
Use design diagnosis to assess the ability of a Bayesian multilevel (hierarchical) model to recover \(\sigma\).
Bonus: Are estimates of treatment effects more or less reliable than what you would get from a frequentist approach that interacts school IDs with treatment?
4 Design
4.1 Experimental Design
Q 4.1
Randomization
100 students sign up to take part in an experiment. You want to measure the effect of immigration on social trust. Half your subjects are men and half are women and you believe gender is very predictive of social trust.
Your experiment involves varying whether a “native” or an “immigrant” facilitator instructs players in how to play a trust game. You have five native and five immigrant facilitators and you want them each to conduct one session with 10 subjects.
You are free to assign both subjects and facilitators to sessions. Propose an appropriate randomization strategy. Is it blocked? Clustered? Both?
Say now that subjects have already signed up for sessions. You can only assign facilitators to sessions, but you have access to the subject lists before you do so. Describe your optimal randomization strategy. Is it blocked? Clustered? Both?
Q 4.2
Heterogeneous propensities
Two of four units are going to be assigned to treatment. A researcher sets up a design in which subjects can decide for themselves the probability with which they receive a treatment. Requested propensities are as below.
Can you:
- List the set of admissible treatment allocations
- Describe a scheme for allocating subjects to treatment
- Calculate your estimate under each allocation
- Assess whether your estimate will be biased or not.
Requested propensity | Y0 | Y1 |
---|---|---|
0.2 | 0 | 1 |
0.4 | 0 | 2 |
0.6 | 1 | 3 |
0.8 | 1 | 4 |
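A candidate scheme can be checked in base R. The allocation probabilities in `p_alloc` below are one assumed choice, not the only one, that reproduces the requested propensities exactly. The sketch compares a Horvitz-Thompson style inverse-propensity estimator with the simple difference in means. For each, it computes the expected value over the scheme so that it can be set against the true average effect.

```r
Y0 <- c(0, 0, 1, 1)
Y1 <- c(1, 2, 3, 4)
requested <- c(0.2, 0.4, 0.6, 0.8)

# Every way of treating two of the four units (one allocation per column)
allocs <- combn(4, 2)

# Assumed scheme: probabilities for {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}
p_alloc <- c(0, 0.1, 0.1, 0.1, 0.3, 0.4)

# Marginal propensities implied by the scheme
implied <- sapply(1:4, function(i) sum(p_alloc[apply(allocs, 2, function(a) i %in% a)]))

# Estimates under each allocation: Horvitz-Thompson and simple difference in means
ht <- apply(allocs, 2, function(a) {
  Z <- as.numeric(1:4 %in% a)
  mean(Z * Y1 / requested - (1 - Z) * Y0 / (1 - requested))
})
dm <- apply(allocs, 2, function(a) mean(Y1[a]) - mean(Y0[-a]))

# Expected value of each estimator under the scheme vs the true ATE
c(ht = sum(p_alloc * ht), dm = sum(p_alloc * dm), truth = mean(Y1 - Y0))
```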
Q 4.3
You have access to a network of all friendship links in a classroom. This is in the form of an \(n\times n\) adjacency matrix where a 1/0 means the row individual is / is not a friend of the column individual.
You want to provide political information to a set of students and see how much more likely it is that a student that you do not give information to receives the information if a friend is treated compared to the situation in which a friend of a friend is treated. So you want to be sure that some subjects have friends treated and some have only friends of friends treated. How would you assign treatment? How can you work out your treatment assignment probabilities?
Bonus: Generate data and diagnose the available estimation strategy using the `interference` package.
Q 4.4
You have 10 units that you want to assign to treatment and control. On each unit you have two covariates, each with a lopsided distribution (for example, log normally distributed) and each strongly associated with the outcome of interest.
You are worried that you will have a good chance of significant imbalance between covariates and are thinking about using a procedure in which you re-randomize in the event in which you have some poor balance (for this you need to define what you mean by poor balance, e.g. you might want that balance is in the lower quartile of possible imbalances).
Compare the following strategies in terms of (a) bias (b) RMSE and (c) coverage:
- Ignoring imbalances and using whatever randomization gets realized
- Set a rule and rerandomize if the rule is broken, then proceed as normal
- Randomize many times and select the set of randomizations that meet your rule. Then select from that set at random.
In case 3 you can make use of your knowledge of the randomization to generate assignment propensities and implement randomization inference.
See Morgan and Rubin
4.2 Design evaluation
Q 4.5
Covariates 1
Compare the power gains from two strategies:
- Adding covariates in the analysis stage
- Blocked randomization in the design stage
Q 4.6
Equation (4.6) in Gerber and Green (2012) suggests that if the sum of two slopes exceeds 1 then there are gains in efficiency from adding a covariate. Show that this is true in practice using a design declaration.
Q 4.7
Compare the performance of the Lin estimator and the doubly robust estimator discussed in the slides.
Q 4.8
Covariates 4
Sometimes researchers running an experiment look for imbalance on a covariate and then include the covariate as a control if and only if they see imbalance. Set up a design in which a covariate may or may not affect potential outcomes and assess performance under each of the following rules:
- no control
- control as a function of correlations of covariates with outcomes
- control as a function of correlations of covariates with outcomes in control only
- control as a function of correlations of covariates with treatments
- control regardless of correlations
5 Topics 1
Q 5.1
Spillovers
Imagine a study with three subjects. Each subject’s potential outcomes are as follows:
- 0 if in control
- \(n\) if in treatment when \(n\) subjects are assigned to treatment
So for example if one unit is in treatment that unit has outcome Y=1, and the others have Y = 0; if all 3 are in treatment they all have Y=3.
- Write down the potential outcomes for all possible assignments, including all and none assigned to treatment
- Write down the difference in means calculation for each possible assignment
- Define two estimands of interest.
- Define a randomization scheme and answer strategy that will return an unbiased estimate of each estimand.
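The enumeration in the first two bullets is mechanical, so a short base-R sketch can do it. It lists each of the \(2^3 = 8\) assignments with the implied outcomes and computes the difference in means wherever both groups are non-empty.

```r
# All 2^3 assignments of three subjects
assignments <- expand.grid(Z1 = 0:1, Z2 = 0:1, Z3 = 0:1)

results <- t(apply(assignments, 1, function(Z) {
  n_treated <- sum(Z)
  Y <- unname(Z * n_treated)  # treated subjects get n, control subjects get 0
  diff_means <- if (n_treated %in% 1:2) mean(Y[Z == 1]) - mean(Y[Z == 0]) else NA
  c(Z, Y = Y, n_treated = n_treated, diff_means = diff_means)
}))
results
```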
6 Topics 2
Q 6.1
Baron and Kenny provide a popular method to implement mediation analysis.
Declare a design with a mediation process and possible violations of sequential ignorability, governed by some parameters `k`.
Demonstrate under what conditions estimates using the Baron-Kenny procedure are misleading.
Q 6.2
DD 16.3 shows poor behavior of a multi period difference in differences design when there is effect heterogeneity.
Imai and Kim (2021) highlight the risk of negative weights in this setting. Can you recover the implied weights and identify which units contribute negatively to estimates?
Q 6.3
LATE
Which of these problems could be addressed using instrumental variables? In each case, what kinds of concern might you have about the IV strategy?
- Experimenters introduce unconditional cash transfers into a set of villages in 2012 and use them to measure effects on school attendance in 2015. You come on the scene later and are interested in whether the transfer could have led to greater political participation in 2017.
- Experimenters introduce unconditional cash transfers into a set of villages in 2012 and use them to measure effects on school attendance in 2015. You come on the scene later and are interested in whether the increased school attendance could have led to greater political participation in 2017.
- You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration.
- You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration but you find out later that your enumerators did not deliver the flyers in a bunch of areas.
- You want to understand the effects of sending flyers about an upcoming demonstration but you find out later that your computer code used incomplete data when making assignments and so failed to assign treatment to a whole bunch of regions.
- You want to understand whether sending flyers increases participation because people actually go to the rallies or because people’s general level of awareness of the election increases, whether or not they go.
7 Additional puzzle assortment
Q 7.1
Say you sampled subject A with probability .6 and subject B with probability .4. Say you assigned each to treatment with probability .6. What weight should you put on A in your analysis if they end up in treatment? What if they end up in control? How about B?
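The convention used in this sketch is an assumption; the puzzle does not state it. Each unit is weighted by the inverse of its overall probability of ending up in its observed condition, that is, the sampling probability times the probability of the realized assignment. Sampling and assignment are treated as independent.

```r
p_sample <- c(A = 0.6, B = 0.4)
p_treat  <- 0.6

# Assumed convention: weight = 1 / (Pr(sampled) * Pr(observed condition))
weights <- rbind(
  treatment = 1 / (p_sample * p_treat),
  control   = 1 / (p_sample * (1 - p_treat))
)
weights
```

Whether only the relative size of these weights matters for a given estimator is worth considering alongside the numbers.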
Q 7.2
Generate a simple experimental design and estimate treatment effects using (a) a Bayesian regression with `rstanarm`, providing informative priors, and (b) `lm_robust`.
Let the size of the data increase from 10 to 1000 and: plot the estimates from the two approaches as \(N\) increases; plot the average standard error and the posterior variance from the two approaches as \(N\) increases.