Causal Inference

Topics III

Macartan Humphreys

1 Topics 3

1.1 Mediation

1.1.1 The problem of unidentified mediators

  • Consider a causal system like the one below (see the sketch after this list).
  • The effect of X on M1 and M2 can be measured in the usual way.
  • But unfortunately, if there are multiple mediators, the effect of M1 (or M2) on Y is not identified.
  • The ‘exclusion restriction’ is obviously violated when there are multiple mediators (unless you can account for them all).
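
A minimal sketch of such a two-mediator system in CausalQueries (the exact graph in the figure may differ, e.g. a direct X to Y edge could be added):

library(CausalQueries)

# Two mediators: X affects Y along both X -> M1 -> Y and X -> M2 -> Y
model <- make_model("X -> M1 -> Y; X -> M2 -> Y")

plot(model)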

1.1.2 The problem of unidentified mediators

Which effects are identified by the random assignment of \(X\)?

1.1.3 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • But both of these average effects may be positive (for example) even if there is no effect of X on Y through M1.

1.1.4 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • Similarly, both of these average effects may be zero even if X affects Y through M1 for every unit!

1.1.5 The problem of unidentified mediators

Both are instances of unobserved confounding between \(M\) and \(Y\):

1.1.6 The problem of unidentified mediators

Both are instances of unobserved confounding between \(M\) and \(Y\):

1.1.7 The problem of unidentified mediators

  • Another somewhat obvious approach is to see how the effect of \(X\) on \(Y\) in a regression is reduced when you control for \(M\).

  • If the effect of \(X\) on \(Y\) passes through \(M\) then surely there should be no effect of \(X\) on \(Y\) after you control for \(M\).

  • This common strategy, associated with Baron and Kenny (1986), is also not guaranteed to produce reliable results. See for instance Green, Ha, and Bullock (2010).

1.1.8 Baron Kenny issues

library(fabricatr)

# Every unit's effect of X on Y runs through M, yet controlling for M
# leaves the X coefficient unchanged and M appears to do nothing
df <- fabricate(N = 1000, 
                U = rbinom(N, 1, .5),
                X = rbinom(N, 1, .5),
                M = ifelse(U == 1, X, 1 - X),
                Y = ifelse(U == 1, M, 1 - M)) 

list(lm(Y ~ X, data = df), 
     lm(Y ~ X + M, data = df)) |> texreg::htmlreg() 
Statistical models:

|             | Model 1 | Model 2 |
|-------------|---------|---------|
| (Intercept) | 0.00*** | 0.00*** |
|             | (0.00)  | (0.00)  |
| X           | 1.00*** | 1.00*** |
|             | (0.00)  | (0.00)  |
| M           |         | 0.00    |
|             |         | (0.00)  |
| R2          | 1.00    | 1.00    |
| Adj. R2     | 1.00    | 1.00    |
| Num. obs.   | 1000    | 1000    |

***p < 0.001; **p < 0.01; *p < 0.05

1.1.9 The problem of unidentified mediators

  • See Imai et al. on better ways to think about this problem and on designs to address it.

1.1.10 The problem of unidentified mediators: Quantities

  • Using potential outcomes we can describe a mediation (indirect) effect as (see Imai et al.): \[\delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0)) \textbf{ for } t = 0,1\]
  • The direct effect is: \[\psi_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t)) \textbf{ for } t = 0,1\]
  • This is a decomposition, since: \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \frac{1}{2}(\delta_i(1) + \delta_i(0) + \psi_i(1) + \psi_i(0)) \]
  • If there are no interaction effects, i.e. \(\delta_i(1) = \delta_i(0), \psi_i(1) = \psi_i(0)\), then \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \delta_i + \psi_i\] (A numerical check follows.)
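
A quick numerical check of the decomposition, using invented unit-level response functions (purely illustrative):

Y_fun <- function(t, m) 2 * t + 3 * m + t * m   # hypothetical outcome function
M_fun <- function(t) t                          # hypothetical mediator function

delta <- function(t) Y_fun(t, M_fun(1)) - Y_fun(t, M_fun(0))  # indirect effect at t
psi   <- function(t) Y_fun(1, M_fun(t)) - Y_fun(0, M_fun(t))  # direct effect at t

# The total effect equals the average of the two decompositions: TRUE
Y_fun(1, M_fun(1)) - Y_fun(0, M_fun(0)) == (delta(1) + delta(0) + psi(1) + psi(0)) / 2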

1.1.11 The problem of unidentified mediators: Solutions?

The bad news is that although a single experiment might identify the total effect, it cannot identify these direct and indirect components.

So:

  • Check the formal requirement for identification under a single-experiment design (“sequential ignorability”: that, conditional on actual treatment, it is as if the value of the mediation variable is randomly assigned relative to potential outcomes). But this assumption is strong (and in fact unverifiable), and if it does not hold, bounds on effects always include zero (Imai et al.).

  • Consider sensitivity analyses

1.1.12 Implicit mediation

You can use interactions with covariates if you are willing to assume no heterogeneity of direct treatment effects over those covariates.

E.g., you think that money makes people get to work faster because they can buy better cars; you look at the marginal effect of more money on time to work for people with and without cars and find it larger for the latter.

This might imply mediation through transport, but only if there is no direct effect heterogeneity (e.g., people with cars being less motivated by money). A sketch with fabricated data follows.
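
A minimal sketch of this logic with fabricated data (all variable names and effect sizes here are invented for illustration):

library(fabricatr)

# Money shortens commutes only for those who start without a car;
# no direct-effect heterogeneity, by construction
df2 <- fabricate(
  N = 1000,
  has_car = rbinom(N, 1, .5),   # baseline covariate
  money   = rbinom(N, 1, .5),   # randomized treatment
  time_to_work = 60 - 10 * has_car - 15 * money * (1 - has_car) + rnorm(N, sd = 5))

# A larger money effect among those without cars suggests mediation via transport,
# but only under the assumption of no direct-effect heterogeneity
lm(time_to_work ~ money * has_car, data = df2)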

1.1.13 The problem of unidentified mediators: Solutions?

Weaker assumptions justify parallel design

  • Group A: \(T\) is randomly assigned, \(M\) left free.
  • Group B: divided into four groups \(T \times M\). (This requires two more assumptions: (1) the manipulation of the mediator affects outcomes only through the mediator; (2) no interaction: for each unit, \(Y(1,m)-Y(0,m) = Y(1,m')-Y(0,m')\).) A sketch of the assignment step follows.
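
A minimal sketch of the parallel design's two-stage assignment using randomizr (the sample size and the NA coding for "left free" are assumptions):

library(randomizr)

N <- 400
group <- complete_ra(N, conditions = c("A", "B"))   # split into parallel arms
T <- complete_ra(N)                                 # T randomized for everyone
M <- ifelse(group == "B", complete_ra(N), NA)       # M manipulated only in group B

table(group, T, M, useNA = "ifany")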

Takeaway: Understanding mechanisms is harder than you think. Figure out what assumptions fly.

1.1.14 In CausalQueries

Let’s imagine that sequential ignorability does not hold. What are our posteriors on mediation quantities when in fact all effects are mediated, effects are strong, and we have lots of data?

model <- make_model("X -> M -> Y <- X; M <-> Y")

plot(model)

1.1.15 In CausalQueries

We imagine a true model and consider estimands:

truth <- make_model("X -> M -> Y") |> 
  set_parameters(c(.5, .5, .1, 0, .8, .1, .1, 0, .8, .1))

queries  <- 
  list(
      indirect = "Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]]",
      direct = "Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]]"
      )

truth |> query_model(queries) |> kable()
| label    | query                                       | given | using      | case_level | mean | sd | cred.low | cred.high |
|----------|---------------------------------------------|-------|------------|------------|------|----|----------|-----------|
| indirect | Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]] | -     | parameters | FALSE      | 0.64 | NA | 0.64     | 0.64      |
| direct   | Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]] | -     | parameters | FALSE      | 0.00 | NA | 0.00     | 0.00      |

1.1.16 In CausalQueries

model |> update_model(data = truth |> make_data(n = 1000)) |>
  query_distribution(queries = queries, using = "posteriors") 

Why such poor behavior? Why isn’t weight going onto indirect effects?

It turns out the data are consistent with direct effects only: specifically, whenever \(M\) is responsive to \(X\), \(Y\) is responsive to \(X\).

1.1.17 In CausalQueries


1.2 Spillovers

1.2.1 SUTVA violations (Spillovers)

Spillovers can lead to estimates of weak effects even when true effects are strong.

The key problem is that \(Y(1)\) and \(Y(0)\) are no longer sufficient to describe potential outcomes.

1.2.2 SUTVA violations

| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|------|------|------|------|------|------|------|------|------|------|
| A    | 1        | 0    | 0    | 1    | 3    | 0    | 1    | 0    | 0    | 0    | 0    |
| B    | 2        | 0    | 0    | 0    | 3    | 1    | 3    | 0    | 3    | 0    | 0    |
| C    | 3        | 0    | 0    | 0    | 0    | 0    | 3    | 1    | 3    | 0    | 3    |
| D    | 4        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 1    | 3    |

Table: Potential outcomes for four units for different treatment profiles. \(D_i\) is an allocation and \(y_j(D_i)\) is the potential outcome for (row) unit \(j\) given (column) \(D_i\).

  • The key is to think through the structure of spillovers.
  • Here immediate neighbors are exposed.
  • In this case we can define a direct treatment (being exposed) and an indirect treatment (having a neighbor exposed), and we can work out each unit’s propensity to receive each type of treatment.
  • These propensities may be non-uniform (here central units are more likely to have treated neighbors), but we can still use the randomization to assess effects.

1.2.3 SUTVA violations

| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|------|------|------|------|------|------|------|------|------|------|
| A    | 1        | 0    | 0    | 1    | 3    | 0    | 1    | 0    | 0    | 0    | 0    |
| B    | 2        | 0    | 0    | 0    | 3    | 1    | 3    | 0    | 3    | 0    | 0    |
| C    | 3        | 0    | 0    | 0    | 0    | 0    | 3    | 1    | 3    | 0    | 3    |
| D    | 4        | 0    | 0    | 0    | 0    | 0    | 0    | 0    | 1    | 1    | 3    |
| \(\bar{y}_\text{treated}\)      | | | -  | | 3 | | 3   | | 3   | | 3 |
| \(\bar{y}_\text{untreated}\)    | | | 0  | | 1 | | 4/3 | | 4/3 | | 1 |
| \(\bar{y}_\text{neighbors}\)    | | | -  | | 3 | | 2   | | 2   | | 3 |
| \(\bar{y}_\text{pure control}\) | | | 0  | | 0 | | 0   | | 0   | | 0 |
| ATT-direct                      | | | -  | | 3 | | 3   | | 3   | | 3 |
| ATT-indirect                    | | | -  | | 3 | | 2   | | 2   | | 3 |
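
A quick numerical check of these summary rows (a sketch; the matrix just re-encodes the potential outcomes table above):

# Potential outcomes: rows = units A-D, columns = allocations D_0, D_1, ..., D_4
y <- rbind(c(0, 3, 1, 0, 0),
           c(0, 3, 3, 3, 0),
           c(0, 0, 3, 3, 3),
           c(0, 0, 0, 1, 3))

D <- cbind(0, diag(4))   # allocation D_d treats unit d; first column treats no one

# Mean outcomes among treated and untreated units under each allocation
sapply(1:5, function(d) mean(y[D[, d] == 1, d]))   # NaN 3 3 3 3
sapply(1:5, function(d) mean(y[D[, d] == 0, d]))   # 0 1 4/3 4/3 1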

1.2.4 Design

library(DeclareDesign)

# Outcome: own-treatment effect (Z[i]/3) plus a convex spillover in the
# number treated in one's group, plus noise
dgp <- function(i, Z, G) Z[i]/3 + sum(Z[G == G[i]])^2/5 + rnorm(1)

spillover_design <- 

  declare_model(G = add_level(N = 80), 
                     j = add_level(N = 3, zeros = 0, ones = 1)) +
  
  declare_inquiry(direct = mean(sapply(1:240,  # just i treated v no one treated 
    function(i) { Z_i <- (1:240) == i
                  dgp(i, Z_i, G) - dgp(i, zeros, G)}))) +
  
  declare_inquiry(indirect = mean(sapply(1:240, 
    function(i) { Z_i <- (1:240) == i           # all but i treated v no one treated   
                  dgp(i, ones - Z_i, G) - dgp(i, zeros, G)}))) +
  
  declare_assignment(Z = complete_ra(N)) + 
  
  declare_measurement(
    neighbors_treated = sapply(1:N, function(i) sum(Z[-i][G[-i] == G[i]])),
    one_neighbor  = as.numeric(neighbors_treated == 1),
    two_neighbors = as.numeric(neighbors_treated == 2),
    Y = sapply(1:N, function(i) dgp(i, Z, G))
  ) +
  
  declare_estimator(Y ~ Z, 
                    inquiry = "direct", 
                    model = lm_robust, 
                    label = "naive") +
  
  declare_estimator(Y ~ Z * one_neighbor + Z * two_neighbors,
                    term = c("Z", "two_neighbors"),
                    inquiry = c("direct", "indirect"), 
                    label = "saturated", 
                    model = lm_robust)

1.2.5 Spillovers: direct and indirect treatments

1.2.6 Spillovers: Simulated estimates
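
A sketch of how such simulated estimates can be produced with DeclareDesign's diagnosis tools (the number of simulations here is an arbitrary choice):

# Simulate the design and compare the naive and saturated estimators
# against the direct and indirect inquiries
diagnose_design(spillover_design, sims = 500)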

1.2.7 Spillovers: Opportunities and Warnings

You can in principle:

  • debias estimates
  • learn about interesting processes
  • optimize design parameters

But to estimate effects you still need some SUTVA-like assumption.

1.2.8 Spillovers: Opportunities and Warnings

In this example, if one compares outcomes between treated units and all control units that are at least \(n\) positions away from a treated unit, one will get the wrong answer unless \(n \geq 7\).

1.3 Transparency & Experimentation

1.3.1 Transparent workflows

Experimental researchers are deeply engaged in the movement towards more transparent social science research.

  • Analytic replication. This should be a no-brainer. Set everything up so that replication is easy. Use quarto, rmarkdown, or similar. Or produce your replication code as a package.

1.3.2 Contentious Issues

Experimental researchers are deeply engaged in the movement towards more transparent social science research.

Contentious issues (mostly):

  • Data. How soon should you make your data available? My view: as soon as possible. Along with working papers and before publication. Before it affects policy in any case. Own the ideas, not the data.

    • Hard core: no citation without (analytic) replication. Perhaps. Non-replicable results should not be influencing policy.
  • Where should you make your data available? Dataverse is focal for political science. Not your personal website (mea culpa).

  • What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can but at least post cleaning and pre-manipulation.

1.3.3 Open science checklist

Experimental researchers are deeply engaged in the movement towards more transparent social science research.

  • Should you register? Hard to find reasons against. But the case is strongest in the testing phase rather than the exploratory phase.

  • Registration: When should you register? My view: before treatment assignment. (Not just before analysis; mea culpa.)

  • Registration: Should you deviate from a preanalysis plan if you change your mind about optimal estimation strategies? My view: yes, but make the case and describe both sets of results.

1.3.4 Two distinct rationales for registration

  • File drawer bias (Publication bias)

  • Analysis bias (Fishing)

1.3.5 File drawer bias

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.

– Even if each individual study is indisputably correct, the account in the research record – that X affects Y in 100% of cases – will be wrong.

1.3.6 File drawer bias

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.

– Even if each individual study is indisputably correct, the account in the research record – that X affects Y in 100% of cases – will be wrong.
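
A toy simulation of this logic (all numbers here are assumed for illustration):

set.seed(1)

true_effect <- rep(c(0, 0.5), each = 500)   # X affects Y in 50% of studies
estimate    <- rnorm(1000, mean = true_effect, sd = 0.2)
published   <- abs(estimate) / 0.2 > 1.96   # only "significant" results written up

mean(true_effect > 0)              # in truth: effects in half of studies
mean(true_effect[published] > 0)   # in the published record: far more than half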

1.3.7 File drawer bias

Exacerbated by:

– Publication bias – the positive results get published

– Citation bias – the positive results get read and cited

– Chatter bias – the positive results get blogged, tweeted, and TEDed.

1.3.8 Analysis bias (Fishing)

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or enjoy discretion to select statistical models after seeing \(X\) and \(Y\) in each case.

– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.

1.3.9 Analysis bias (Fishing)

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or enjoy discretion to select statistical models after seeing \(X\) and \(Y\) in each case.

– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.
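
A toy illustration of this logic (the numbers are assumed): with discretion to pick the best of ten candidate outcome measures, "significant" results turn up in roughly 40% of pure-noise studies.

set.seed(1)

# Minimum p-value across 10 candidate outcomes, all pure noise, in 1000 studies
fished_p <- replicate(1000, min(replicate(10, t.test(rnorm(30))$p.value)))

mean(fished_p < 0.05)   # roughly 1 - .95^10 (about 0.40), not the nominal 0.05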

1.3.10 Analysis bias (Fishing)

– Try An Exact Fishy Test (https://macartan.shinyapps.io/fish/)

– What’s the problem with this test?

1.3.11 Evidence-Proofing: Illustration

  • When your conclusions do not really depend on the data

  • E.g.:

    – some evidence will always support your proposition
    – some interpretation of the evidence will always support your proposition

  • Knowing the mapping from data to inference in advance gives a handle on the false positive rate.

1.3.12 The scope for fishing

1.3.13 Evidence from political science

Source: Gerber and Malhotra

1.3.14 More evidence from TESS

  • Malhotra tracked 221 TESS studies.
  • 20% of the null studies were published; 65% were not even written up (file drawer, or anticipation of publication bias).
  • 60% of studies with strong results were published.

Implications are:

  • population of results not representative
  • (subtler) individual published studies are also more likely to be overestimates

1.3.15 The problem

  • Summary: we do not know when we can or cannot trust claims made by researchers.

  • [Not a tradition-specific claim]

1.3.16 Registration as a possible solution

Simple idea:

  • It’s about communication:
  • just say what you are planning on doing before you do it
  • if you don’t have a plan, say that
  • if you do things differently from what you were planning to do, say that

1.3.17 Worries and Myths

Lots of misunderstandings around registration

1.3.18 Myth: Concerns about fishing presuppose researcher dishonesty

  • Fishing can happen in very subtle ways, and may seem natural and justifiable.

  • Example:

    • I am interested in whether more democratic institutions result in better educational outcomes.
    • I examine the relationship between institutions and literacy and between institutions and school attendance.
    • The attendance measure is significant and the literacy one is not. Puzzled, I look more carefully at the literacy measure and see various outliers and indications of measurement error. As I think more I realize too that literacy is a slow moving variable and may not be the best measure anyhow. I move forward and start to analyze the attendance measure only, perhaps conducting new tests, albeit with the same data.

1.3.19 Structural challenge

Our journal review process is largely organized around advising researchers how to adjust analysis in light of findings in the data.

1.3.20 Myth: Fishing is technique specific

  • Frequentists can do it

  • Bayesians can do it too.

  • Qualitative researchers can also do it.

  • You can even do it with descriptive statistics

1.3.21 Myth: Fishing is estimand specific

  • You can do it when estimating causal effects
  • You can do it when studying mechanisms
  • You can do it when estimating counts

1.3.22 Myth: Registration only makes sense for experimental studies, not for observational studies

  • The key distinction is between prospective and retrospective studies.

  • Not between experimental and observational studies.

  • A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects

  • A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.

1.3.23 Worry: Registration will create administrative burdens for researchers, reviewers, and journals

  • Registration will produce some burden but does not require the creation of content that is not needed anyway

  • It does shift preparation of analyses forward

  • And it can also increase the burden of developing analysis plans even for projects that don’t work out. But that is, in part, the point.

  • Upside is that ultimate analyses may be much easier.

1.3.24 Worry: Registration will force people to implement analyses that they know are wrong

  • Most arguments for registration in social science advocate for non-binding registration, where deviations from designs are possible, though they should be described.
  • Even if it does not prevent them, a merit of registration is that it makes deviations visible.

1.3.25 Myth: Replication (or other transparency practices) obviates the need for registration

  • There are lots of good things to do, including replication.
  • Many of these do not substitute for each other. (How to interpret a fished replication of a fished analysis?)
  • And they likely act as complements.
  • Registration can clarify details of design and analysis and ensure early preparation of material. Indeed, material needed for replication may be available even before data collection.

1.3.26 Worry: Registration will put researchers at risk of scooping

  • But existing registries allow people to protect registered designs for some period
  • Registration may let researchers lay claim to a design

1.3.27 Worry: Registration will kill creativity

  • This is an empirical question. However, under a nonmandatory system researchers could:

    • register a plan for structured exploratory analysis, or
    • decide that exploration is at a sufficiently early stage that no substantive registration is possible, and proceed without registration.

1.3.28 Implications:

  • In neither case would the creation of a registration facility prevent exploration.

  • What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.

  • Registration communicates whether researchers are engaged in exploration or not. We love exploration and should be proud of it.

1.3.29 Punchline

  • Do it!
  • But if you have reasons to deviate, deviate transparently
  • Don’t implement bad analysis just because you pre-registered
  • Instead: reconcile

1.3.30 Reconciliation

Incentives and strategies

1.3.31 Reconciliation

Table 1: Illustration of an inquiry reconciliation table.

| Inquiry       | In the preanalysis plan | In the paper | In the appendix |
|---------------|-------------------------|--------------|-----------------|
| Gender effect | X                       | X            |                 |
| Age effect    | X                       |              |                 |

1.3.32 Reconciliation

Table 2: Illustration of an answer strategy reconciliation table.

| Inquiry       | Following A from the PAP    | Following A from the paper  | Notes |
|---------------|-----------------------------|-----------------------------|-------|
| Gender effect | estimate = 0.6, s.e. = 0.31 | estimate = 0.6, s.e. = 0.25 | Difference due to change in control variables [provide cross-references to tables and code] |

Baron, Reuben M, and David A Kenny. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51 (6): 1173.
Green, Donald P, Shang E Ha, and John G Bullock. 2010. “Enough Already about ‘Black Box’ Experiments: Studying Mediation Is More Difficult Than Most Scholars Suppose.” The Annals of the American Academy of Political and Social Science 628 (1): 200–208.