puzzles for causal inference and experimental design

Author

Macartan Humphreys

Published

January 4, 2024

Class: https://macartan.github.io/ci/

For each puzzle: explore the issues raised by the puzzle and generate a self contained presentation in .qmd (or .Rmd) that reports on your investigations. Present in the next class session.

1 Course outline, tools,

2 Introduction to DeclareDesign

Q 2.1
False positives and \(N\)s

  • Sometimes people worry that with larger samples you are more likely to get a false positive. Is that true?

  • Assess by generating a simple experimental design from scratch in which we can vary the N and in which there is no true effect of some treatment.

Then:

  • Plot the distribution of \(p\) values from the simulations_df. What shape is it and why?
  • Plot the power as \(N\) increases, using the diagnosands_df
  • Plot the estimates against \(p\) values for different values of \(N\); what do you see?
  • Discuss

Hint: the slides contain code for a simple experimental design

Q 2.2

Clustering

  • Say that you have a set of 20 schools randomly sampled from a superpopulation of schools. There are 5 classrooms in each school and 5 students in each class room.
  • Say you assign a treatment at the classroom level. Should you cluster your standard errors at the level of the school or at the level of the classroom?

Now:

  • Declare a design with this hierarchical data structure. Allow for the possibility that treatment effects vary at the school level. Assess the performance of the standard errors when you cluster at each of these levels (and when you do not cluster at all).
  • Examine whether the performance depends on whether you are interested in the population average effects or the sample average effects.

Hint For generating hierarchical models use add_level. Also: be sure to have a reasonable large top level shock in order to see differences arising from clustering at the school level. You could also try heterogeneous effects by school.

g <- 
  declare_model(
    L1 = add_level(N = 10, u = rnorm(N)),
    L2 = add_level(N = 12, v = rnorm(N)))

g() |> slice(1:3, 13:15) |> kable()
L1 u L2 v
01 0.5917647 001 0.1260403
01 0.5917647 002 0.1244666
01 0.5917647 003 -1.4247324
02 0.0470489 013 1.3728940
02 0.0470489 014 -0.1026590
02 0.0470489 015 -0.9639386
Q 2.3

Learning about standard errors

  • The standard error is standard deviation of the sampling distribution of an estimate.

  • That sounds complicated, but actually the sampling distribution of an estimate lives in the simulations data frame so you can look at its standard deviation and assess whether standard errors estimate it well.

  • Challenge: Generate a simple experimental design in which there is a correlation (rho) between the two potential outcomes (Y_Z_0 and Y_Z_1).

  • Show the distribution of the estimates over different values of rho

  • Assess the performance of the estimates of the standard errors and the coverage as rho goes from -1 to 0 to 1. Describe how coverage changes. (Be sure to be clear on what coverage is!)

Y_Z_0 <- rnorm(1000)
Y_Z_1 <- correlate(rnorm, given = Y_Z_0, rho = .5)

cor(Y_Z_0, Y_Z_1)
[1] 0.5254649

Q 2.4
Confounded.

Declare a design in which:

  • The assignment of a treatment \(X\) depends in part on upon some other, binary, variable \(W\): in particular \(\Pr(X=1|W=0) = .2\) and \(\Pr(X=1|W=1) = .5\))
  • The outcome \(Y\) depends on both \(X\) and \(W\): in particular \(Y = X*W + u\) where \(u\) is a random shock.
  • Diagnose a design with three approaches to estimating the effect of \(X\) on \(Y\): (a) ignoring \(W\) (b) adding \(W\) as a linear control (c) including both \(W\) and an interaction between \(W\) and \(X\).

Discuss results. Do any of these return the right answer?

Hint: You can add three separate declare_estimator steps. They should have distinct labels. The trickiest part is to figure out how to extract the estimate in (c) because you will have both a main term and an interaction term for \(X\).

3 Causality: Potential outcomes

These do not require coding

Q 3.1

Potential outcomes

Consider an outcome \(Y\) that can be affected by two variables \(X_1\) and \(X_2\). All variables are binary. Can you fill in the potential outcomes (rows) for each of the column types?

X1 is a necessary and sufficient condition for Y X1 is necessary but not sufficient X1 is sufficient but not necessary X1 sometimes causes Y but is neither necessary nor sufficient
Y(0,0) =
Y(1,0) =
Y(0,1) =
Y(1,1) =
Q 3.2

Consider an outcome Y that can be affected by two variables X1 and X2 but say that X2 can itself be affected by X1. Write down possible potential outcomes for Y1 and X2 when: X1 causes X2 and Y, but X1 does not cause Y through X2

Potential outcome Value (0/1)
Y(0,0) =
Y(1,0) =
Y(0,1) =
Y(1,1) =
X2(0) =
X2(1) =

Q 3.3
Consider a process with: Y(0,0) =0, Y(1,0) = 1, Y(0,1) =1, Y(1,1) =1.

  • Say X1=1, X2=1. Then Y = 1. What caused Y = 1?
  • Say X1=0, X2=0. Then Y = 0. What caused Y = 0?
  • Say X1 = 1 with 10% probability, otherwise 0 and, independently, X2 = 1 with 50% probability, otherwise 0.

Then what is the average effect of X1 on Y? What is the average effect of X2 on Y? Which cause has the biggest effect?

Q 3.4
  • A set of units have outcome \(Y^1_i\) at baseline.
  • At endline they have potential outcomes \(Y^2_i(0)\) and \(Y^2_i(1)\)
  1. Write down the estimand for the average effect of treatment on endline outcomes
  2. Write down the estimand for the average effect of treatment on the change from baseline to endline for all units

Compare these and discuss.

4 Inquiries and identification

These next questions use some concepts we have not introduced yet. Don’t worry if your answers are incomplete but do share your thought processes around these.

Q 4.1
Collider bias

  • Declare a simple design in which (i) \(X\) and \(Y\) both have a positive effect on (binary) \(K\) but \(X\) does not cause \(Y\) (ii) a researcher conditions on \(K==1\) when estimating the effect of \(X\) on \(Y\)
  • Show that this can generate biased results. Can you find situations where the bias can be either positive or negative?

Hint: The direction of collider bias is related to the ways that \(K\) and \(Y\) interact to produce \(X\). Also: by “conditioning on \(K=1\)” we mean: using only cases for which \(K=1\).

Q 4.2
  • Draw a DAG with 5 nodes representing a situation in which \(X\) causes \(Y\) though \(M\), \(C\) affects both \(X\) and \(M\) and \(D\) affects both \(M\) and \(Y\).
  • Think through what set of nodes which, when controlled for, would allow for the identification of the effect of \(X\) on \(Y\).
  • Represent it in dagitty and check your answer
  • Bonus: Declare the design and compare the behavior of designs that do and do not control for these nodes.

Q 4.3
Make a DAG that is consistent with this distribution

A B C p
0 0 0 0.32
0 0 1 0.08
0 1 0 0.08
0 1 1 0.02
1 0 0 0.08
1 0 1 0.12
1 1 0 0.12
1 1 1 0.18

Set up a model in DeclareDesign that has this distribution. Draw a large dataset from it and check if relations of conditional independence implied by your DAG.

Hint:This is relatively tricky. From slides you will see a DAG is a directed acyclic graph. A DAG should represent relations of conditional independence in the sense that any node \(A\) that is separated from another node \(B\) given nodes \(W\) should be conditionally independent given \(W\). You should be able to read from this table which nodes are conditionally independent from each other given other nodes. You should be able to check consistency between the probability distribution and an underlying model by calculating quantities such as \(\Pr(x, m,y) = \Pr(x)\times\Pr(m|x)\times\Pr(y|m,x)\).

Q 4.4
Imagine a model that looks like this:

  • Say that in truth ATE of \(X\) on \(M\) is .9 and that the ATE of \(M\) on \(Y\) is .9. Is the implied effect of 0.81 on \(X\) on \(Y\) identified?
  • Say that in truth ATE of \(X\) on \(M\) is 1 and that the ATE of \(M\) on \(Y\) is 1. Is the implied effect of 1 on \(X\) on \(Y\) identified?
  • Discuss

Hint: This question is asking about the front door criterion. Check whether the conditions apply for the front door criterion to hold. Note that an effect is not identified if the data pattern it produces is also consistent with a different effect. Is that the case here? The second part of this is more important than the first part. Note you can generate and update models of this form with CausalQueries.

5 Estimation and Inference: Frequentist

Q 5.1
Consider the following data. Treatment is assigned in two blocks. One third of block 1 got treatment (randomly); one third of block two got treatment (randomly). The data are as below:

Block Z Y
1 0 0
1 0 0
1 1 1
2 0 0
2 0 0
2 0 1
2 1 0
2 1 1
2 1 1
  • Can you estimate the ATE? How about the ATT? And the ATC?
  • How do these compare to a simple difference in means between treatment and control?
  • Use DeclareDesign to compare the answers you get if you use exactly this data and calculate (1) OLS and controlled for block (2) IPW and (3) the Lin estimator
  • Imagine now letting the size of the data increase by a factor \(k\)— meaning that if \(k=2\) increase you would have twice as many units in each block. Show how the precision of your estimates changes under these different strategies as \(k\) increases

Hint: If you use exactly this data and replicate it there is no stochastic component; you can run with sims = 1; to get the precision of your estimates you can declare the diagnosand mean_se = mean(std.error)

Q 5.2
randomization inferences

Declare a factorial design with two binary treatments and an interaction between them. For example \(Y = .1*X1 + .3*X2 - .2*X1*X2\)

Have a non-stochastic model declaration so that the only source of randomness is in the assignment procedure.

Calculate a \(p\) values for the null hypothesis of no interaction between treatments using randomization inference.

  • Bonus: Can you check the validity of the p value using the simulations dataframe?

Hint: You can use the ri2 package. Hint you can keep the top level fixed by declaring shocks outside the design (U <- rnorm(N)) or by using a simulation vector in diagnosis (sims = c(1,1,500, 1, ...))

Q 5.3
doubly robust

Replicate Table 1 in Lin (2012); ignore first rows which are theoretical results; include standard errors of estimates (these are not included in Table 1 but are produced by diagnose_design); ‘Tyranny of the minority’ estimator is optional.

Q 5.4
Missing data on controls

Say you want to include a control variable. But you have missingness in the control. Should you proceed and what can you do about it?

Declare a design for an experiment in which a binary covariate X is related to potential outcomes, according to b, and so to treatment effects. Say X is missing with probability p.

Compare answer strategies in which you:

  1. do not control for \(X\)
  2. do control for \(X\) but drop whenever \(X\) is missing and
  3. Treat \(X\) as a block in your analysis design with three values (0, 1, and missing).

Assess performance (RMSE) over a range of values for p and b. How do you think the comparison of strategies depends upon N?

6 Bayesian

Q 6.1
Clues from causal processes

  • Set up a model in which \(X \rightarrow Y\) (\(X\) as if random) and there is a true effect of 0.56.
  • Update using large data generated according to some distribution over causal types
  • Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\)
  • Say you know that the effect of \(X\) on \(Y\) passes through \(M\) and so you have model \(X \rightarrow M \rightarrow Y\) and effects of 0.8 at the first stage and .7 at the second stage.
  • Update using large data generated according to some distribution over causal types
  • Calculate a posterior on the share of \(X=1, Y=1\) cases for which \(Y=1\) is due to \(X=1\) (a) when you know that \(M=1\) and (b) when you know that \(M=0\)

Bonus: Say instead of a single mediator \(M\) you had a chain: \(M_1, M_2, \dots\). Does lengthening the chain sufficiently allow you to identify causal effects?

Q 6.2
Napkin identification

Consider the Napkin model: W->Z->X->Y; W <-> X; W <-> Y

make_model("W->Z->X->Y; W <-> X; W <-> Y") |> plot(x_coord = 1:4, y_coord = c(1,1,1,1))

Consider some true parameter vector and generate data from this vector, varying the amount of data from 10 to 100 to 1000 observations. Assume in particular that there is confounding: e.g. that the probability \(X=1, Y=1\) depends on \(W\).

In each case calculate the posterior distribution on the average effect of \(X\) on \(Y\). Assess whether the quantity appears to be identified.

Can you use a formula to calculate an effect directly?

Hint: You can generate a “target” model, generate data from that, and calculate from that a target query.

target_model <- make_model("W->Z->X->Y; W <-> X; W <-> Y") |>  
  set_parameters(param_name = "Y.11_W.1", parameters = .9) |> 
  set_parameters(param_name = "X.11_W.1", parameters = .9)

Note: this is hard because the W <-> Y confounding implies an X <-> Y confounding. There is no scope for front door adjustment. If you control for \(W\) you open a path from \(X\) to \(Y\) (since \(W\) is a collider) and, more subtly, conditioning on \(Z\) also partly opens a collider path. See this discussion: https://twitter.com/analisereal/status/1273099716956430340

Q 6.3

Bayes by hand

Say that we have 50 observations of \(Y_0\) and 50 observations of \(Y_1\) from a random experiment. Assume these are each drawn independently from a normal distribution centered on \(\mu_j\) with sd \(\sigma_j\), \(j\in\{0,1\}\).

  1. Write down a likelihood function that returns the probability of seeing the 100 observations that you see given the four parameters: \(\mu_0, \mu_1, \sigma_0, \sigma_1\).

  2. Use grid <- expand_grid(m0 = ..., m1 = ...) to generate a grid of possible values for the four parameters.

  3. Apply your likelihood function to all the possible parameter values contained in your grid.

  4. Now:

  • Identify the maximum likelihood set of values
  • Calculate the posterior distribution assuming uniform priors over the range
  • Identify the posterior modes
  • Calculate the posterior means
  • Compare the estimates of a treatment effect you would obtain from
    • maximum likelihood
    • posterior means
    • ols

Q 6.4
Hierarchical

Generate a simple multilevel experimental design (e.g 20 children each in 20 schools). Assume that the treatment effect in each schools is drawn from a normal distribution with a given variance \(\sigma\).

Use design diagnosis to assess the ability of a Bayesian multilevel hierarchical to recover \(\sigma\).

Bonus: Are estimates of treatment effects more or less reliable than what you would get from a frequentist approach that interacts school IDs with treatment?

7 Experimental Design

Q 7.1

Randomization

100 students sign up to take part in an experiment. You want to measure the effect of immigration on social trust. Half your subjects are men and half are women and you believe gender is very predictive of social trust.

Your experiment involves varying whether a “native” or an “immigrant” facilitator instructs players in how to play a trust game. You have five native and five immigrant facilitators and you want them each to conduct one session with 10 subjects.

  • You are free to assign both subjects and facilitators to sessions. Propose an appropriate randomization strategy. Is it blocked? Clustered? Both?

  • Say now that subjects have already signed up for sessions. You can only assign facilitators to sessions, but you have access to the subject lists before you do so. Describe your optimal randomization strategy. Is it blocked? Clustered? Both?

Q 7.2

Heterogeneous propensities

Two of four units are going to be assigned to treatment. A researcher sets up a design in which subjects can decide for themselves the probability with which they receive a treatment. Requested propensities are as below.

Can you :

  1. List the set of admissible treatment allocations
  2. Describe a scheme for allocating subjects to treatment
  3. Calculate your estimate under each allocation
  4. Assess whether your estimate will be biased or not.
Requested propensity Y0 Y1
0.2 0 1
0.4 0 2
0.6 1 3
0.8 1 4

Q 7.3
Network Randomization

You have access to a network of all friendship links in a classroom. This is in the form of an \(n\times n\) adjacency matrix where a 0/1 means the row individual is / is not a friend of the column individual.

You want to provide political information to a set of students and see how much more likely it is that a student that you do not give information to receives the information if a friend is treated compared to the situation in which a friend of a friend is treated. So you want to be sure that some subjects have friends treated and some have only friends of friends treated. How would you assign treatment? How can you work out your treatment assignment probabilities?

Bonus: Generate data and diagnose the estimation strategy available using the interference package

Q 7.4
Re-randomization

You have 10 units that you want to assign to treatment and control. On each unit you have two covariates, each with a lopsided distribution (for example, log normally distributed) and each strongly associated with the outcome of interest.

You are worried that you will have a good chance of significant imbalance between covariates and are thinking about using a procedure in which you re-randomize in the event in which you have some poor balance (for this you need to define what you mean by poor balance, e.g. you might want that balance is in the lower quartile of possible imbalances).

Compare the following strategies in terms of (a) bias (b) RMSE and (c) coverage:

  1. Ignoring imbalances and using whatever randomization gets realized
  2. Set a rule and rerandomize if the rule is broke, then proceed as normal
  3. Randomize many times and select the set of randomizations that meet your rule. Then select from that set at random.

In case 3 you can make use of your knowledge of the randomization to generate assignment propensities and implement randomization inference.

See Morgan and Rubin

8 Design evaluation

Q 8.1

Covariates 1

Compare the power gains from two strategies:

  • Adding covariates in the analysis stage
  • Blocked randomization in the design stage

Q 8.2
Covariates 2

Equation (4.6) in Gerber and Green (2012) suggests that if the sum of two slopes exceeds 1 then there are gains in efficiency from adding a covariate. Show that this is true in practice using a design declaration.

Q 8.3
Covariates 3

Compare the performance of the Lin estimator an the doubly robust estimator discussed in slides

Q 8.4

Covariates 4

Sometimes researchers running an experiment look for imbalance on a covariate and then include the covariate as a control if and only if they see imbalance. Set up a design in which a covariate may or may not affect potential outcomes and assess the performance given different rules

  • no control
  • control as a function of correlations of covariates with outcomes
  • control as a function of correlations of covariates with outcomes in control only
  • control as a function of correlations of covariates with treatments
  • control regardless of correlations

9 Topics and techniques

Q 9.1

Spillovers

Imagine a study with three subjects. Each subject’s potential outcomes are as follows:

  • 0 if in control
  • \(n\) if in treatment when \(n\) subjects are assigned to treatment

So for example if one unit is in treatment that unit has outcome Y=1, and the others have Y = 0; if all 3 are in treatment they all have Y=3.

  1. Write down the potential outcomes for all possible assignments, including all and none assigned o treatment
  2. Write down the difference in means calculation for each possible assignment
  3. Define two estimands of interest.
  4. Define a randomization scheme and answer strategy that will return an unbiased estimate of each estimand.

Q 9.2
Mediation

Baron Kenny have provided a popular method to implement mediation analysis.

Declare a design with a mediation process and possible violations of sequential ignorability, governed by some parameters k.

Demonstrate under what conditions estimates using the Baron-Kenny procedure are misleading.

Q 9.3
Difference in Differences

DD 16.3 shows poor behavior of a multi period difference in differences design when there is effect heterogeneity.

Imai and Kim (2021) highlights the risk of negative weights in this setting. Can you recover the implied weights and identify which units contribute negatively to estimates?

Q 9.4

LATE

Which of these problems could be addressed using instrumental variables? In each case, what kinds of concern might you have about the IV strategy?

  1. Experimenters introduce a unconditional cash transfers into a set of villages in 2012 and use it to measure access school attendance in 2015. You come on the scene later and are interested in whether the transfer could have led to greater political participation in 2017.
  2. Experimenters introduce an unconditional cash transfers into a set of villages in 2012 and use it to measure access school attendance in 2015. You come on the scene later and are interested in whether the increased school attendance could have led to greater political participation in 2017.
  3. You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration.
  4. You want to understand the effects of attending a rally on subsequent support for a candidate. You send a random set of voters a flyer about an upcoming demonstration but you find out later that your enumerators did not deliver the flyers in a bunch of areas.
  5. You want to understand the effects sending flyers about an upcoming demonstration but you find out later that your computer code used incomplete data when making assignments and so failed to assign treatment to a whole bunch of regions.
  6. You want to understand whether sending flyers increases participation because people actually go to the rallies or because people’s general level awareness of the election increases, whether or not they go.

10 Open science

No exercises

11 Additional puzzle assortment

Q 11.1
what weights

Say you sampled subject A with probability .6 and subject B with probability .4. Say you assigned each to treatment with probability .6. What weight should you put on A in your analysis if they end up in treatment? What if they end up in control? How about B?

Q 11.2
Priors matter

Generate a simple experimental design and estimate treatment effects using a (a) Bayesian regression with rstanarm. Provide informative priors (b) lm_robust

Let the size of the data increase from 10 to 1000 and: plot the estimates from the two approaches as \(N\) increases, plot the average standard error and the posterior variance from the two approaches as \(N\) increases.

12 References

Gerber, Alan S, and Donald P Green. 2012. Field Experiments: Design, Analysis, and Interpretation. Norton.
Imai, Kosuke, and In Song Kim. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.
Lin, Winston. 2012. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” arXiv Preprint arXiv:1208.2301.