Experiments

Topics

Macartan Humphreys

1 Noncompliance and the LATE estimand

1.1 Local Average Treatment Effects

Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

X=0 X=1
Z=0 \(\overline{y}_{00}\) (\(n_{00}\)) \(\overline{y}_{01}\) (\(n_{01}\))
Z=1 \(\overline{y}_{10}\) (\(n_{10}\)) \(\overline{y}_{11}\) (\(n_{11}\))

Say that people are one of 3 types:

  1. \(n_a\) “always takers” have \(X=1\) no matter what and have average outcome \(\overline{y}_a\)
  2. \(n_n\) “never takers” have \(X=0\) no matter what with outcome \(\overline{y}_n\)
  3. \(n_c\) “compliers have” \(X=Z\) and average outcomes \(\overline{y}^1_c\) if treated and \(\overline{y}^0_c\) if not.

1.2 Local Average Treatment Effects

Sometimes you give a medicine but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?

X=0 X=1
Z=0 \(\overline{y}_{00}\) (\(n_{00}\)) \(\overline{y}_{01}\) (\(n_{01}\))
Z=1 \(\overline{y}_{10}\) (\(n_{10}\)) \(\overline{y}_{11}\) (\(n_{11}\))

We can figure something about types:

\(X=0\) \(X=1\)
\(Z=0\) \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}\) \(\overline{y}_{a}\)
\(Z=1\) \(\overline{y}_{n}\) \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}\)

1.3 Local Average Treatment Effects

You give a medicine to 50% but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?

\(X=0\) \(X=1\)
\(Z=0\) \(\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n\) \(\overline{y}_{a}\)
(n) (\(\frac{1}{2}(n_c + n_n)\)) (\(\frac{1}{2}n_a\))
\(Z=1\) \(\overline{y}_{n}\) \(\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}\)
(n) (\(\frac{1}{2}n_n\)) (\(\frac{1}{2}(n_a+n_c)\))

Key insight: the contributions of the \(a\)s and \(n\)s are the same in the \(Z=0\) and \(Z=1\) groups so if you difference you are left with the changes in the contributions of the \(c\)s.

1.4 Local Average Treatment Effects

Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)

Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)

So, the difference is the ITT: \(({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\)

Last step:

\[ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\]

\[\leftrightarrow\]

\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]

1.5 The good and the bad of LATE

  • You get a well-defined estimate even when there is non-random take-up
  • May sometimes be used to assess mediation or knock-on effects
  • But:
    • You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
    • Your estimate is only for a subpopulation
    • The subpopulation is not chosen by you and is unknown
    • Different encouragements may yield different estimates since they may encourage different subgroups

1.6 Pearl and Chickering again

With and without an imposition of monotonicity

data("lipids_data")

models <- 
  list(unrestricted =  make_model("Z -> X -> Y; X <-> Y"),
       restricted =  make_model("Z -> X -> Y; X <-> Y") |>
         set_restrictions("X[Z=1] < X[Z=0]")) |> 
  lapply(update_model,  data = lipids_data, refresh = 0) 

models |>
  query_model(query = list(CATE = "Y[X=1] - Y[X=0]", 
                           Nonmonotonic = "X[Z=1] < X[Z=0]"),
              given = list("X[Z=1] > X[Z=0]", TRUE),
              using = "posteriors") 

1.7 Pearl and Chickering again

With and without an imposition of monotonicity:

model query mean sd
unrestricted CATE 0.70 0.05
restricted CATE 0.71 0.05
unrestricted Nonmonotonic 0.01 0.01
restricted Nonmonotonic 0.00 0.00

In one case we assume monotonicity, in the other we update on it (easy in this case because of the empirically verifiable nature of one sided non compliance)

2 Survey experiments

  • Survey experiments are used to measure things: nothing (except answers) should be changed!
  • If the experiment in the survey is changing things then it is a field experiment in a survey, not a survey experiment

2.1 The list experiment: Motivation

  • Multiple survey experimental designs have been generated to make it easier for subjects to answer sensitive questions

  • The key idea is to use inference rather than measurement.

  • Subjects are placed in different conditions and the conditions affect the answers that are given in such a way that you can infer some underlying quantity of interest

2.2 The list experiment: Motivation

This is an obvious DAG but the main point is to be clear that the Value is the quantity of interest and the value is not affected by the treatment, Z.

2.3 The list experiment: Motivation

The list experiment supposes that:

  1. Subjects do not want to give a direct answer to a question
  2. They nevertheless are willing to truthfully answer an indirect question

In other words: sensitivities notwithstanding, they are happy for the researcher to make correct inferences about them or their group

2.4 The list experiment: Strategy

  • Respondents are given a short list and a long list.

  • The long list differs from the short list in having one extra item—the sensitive item

  • We ask how many items in each list does a respondent agree with:

    • \(Y_i(0)\) is the number of elements on a short list that a respondent agrees with
    • \(Y_i(1)\) is the number of elements on a long list that a respondent agrees with
    • \(Y_i(1) - Y_i(0)\) is an indicator for whether an individual agrees with the sensitive item
    • \(\mathbb{E}[Y_i(1) - Y_i(0)]\) is the share of people agreeing with sensitive item

2.5 The list experiment: Simplified example

How many of these do you agree with:

Short list Long list “Effect”
“2 + 2 = 4” “2 + 2 = 4”
“2 * 3 = 6” “2 * 3 = 6”
“3 + 6 = 8” “Climate change is real”
“3 + 6 = 8”
Answer Y(0) = 2 Y(1) = 4 Y(1) - Y(0) = 2

[Note: this is obviously not a good list. Why not?]

2.6 The list experiment: Design

declaration_17.3 <-
  declare_model(
    N = 500,
    control_count = rbinom(N, size = 3, prob = 0.5),
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count) 
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  declare_estimator(Y_list ~ Z, .method = difference_in_means, 
                    inquiry = "prevalence_rate")

diagnosands <- declare_diagnosands(
  bias = mean(estimate - estimand),
  mean_CI_width = mean(conf.high - conf.low)
)

2.7 Diagnosis

diagnose_design(declaration_17.3, diagnosands = diagnosands)
Design Inquiry Bias Mean CI Width
declaration_17.3 prevalence_rate 0.00 0.32
(0.00) (0.00)

2.8 Tradeoffs: is the question really sensitive?

declaration_17.4 <- 
  declare_model(
    N = N,
    U = rnorm(N),
    control_count = rbinom(N, size = 3, prob = 0.5),
    Y_star = rbinom(N, size = 1, prob = 0.3),
    W = case_when(Y_star == 0 ~ 0L,
                  Y_star == 1 ~ rbinom(N, size = 1, prob = proportion_hiding)),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z),
                      Y_direct = Y_star - W) +
  declare_estimator(Y_list ~ Z, inquiry = "prevalence_rate", label = "list") + 
  declare_estimator(Y_direct ~ 1, inquiry = "prevalence_rate", label = "direct")

2.9 Diagnosis

declaration_17.4 |> 
  redesign(proportion_hiding = seq(from = 0, to = 0.3, by = 0.1), 
           N = seq(from = 500, to = 2500, by = 500)) |> 
  diagnose_design()

2.10 Negatively correlated items

  • How would estimates be affected if the items selected for the list were negatively correlated?
  • How would subject protection be affected?

2.11 Negatively correlated items

rho <- -.8 

correlated_lists <- 
  declare_model(
    N = 500,
    U = rnorm(N),
    control_1 = rbinom(N, size = 1, prob = 0.5),
    control_2 = correlate(given = control_1, rho = rho, draw_binary, prob = 0.5),
    control_count = control_1 + control_2,
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  declare_estimator(Y_list ~ Z) 

2.12 Negatively correlated items

draw_data(correlated_lists) |> ggplot(aes(control_count)) + 
  geom_histogram() + theme_bw()

2.13 Negatively correlated items

correlated_lists |> redesign(rho = c(-.8, 0, .8)) |> diagnose_design()
  • These trade-off against each other: the more accuracy you have the less protection you have

2.14 Individual or group effects?

  • This is typically used to estimate average levels

  • However you can use it in the obvious way to get average levels for groups: this is equivalent to calculating group level heterogeneous effects

  • Extending the idea you can even get individual level estimates: for instance you might use causal forests

  • You can also use this to estimate the effect of an experimental treatment on an item that’s measured using a list, without requiring individual level estimates:

\[Y_i = \beta_0 + \beta_1Z_i + \beta_2Long_i + \beta_3Z_iLong_i\]

2.15 Hiders and liars

  • Note that here we looked at “hiders” – people not answering the direct question truthfully

  • See Li (2019) on bounds when the “no liars” assumption is threatened — this is about whether people respond truthfully to the list experimental question

3 Mediation

3.1 The problem of unidentified mediators

  • Consider a causal system like the below.
  • The effect of X on M1 and M2 can be measured in the usual way.
  • But unfortunately, if there are multiple mediators, the effect of M1 (or M2) on Y is not identified.
  • The ‘exclusion restriction’ is obviously violated when there are multiple mediators (unless you can account for them all).

3.2 The problem of unidentified mediators

Which effects are identified by the random assignment of \(X\)?

3.3 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • But both of these average effects may be positive (for example) even if there is no effect of X on Y through M1.

3.4 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • Similarly both of these average effects may be zero even if X affects on Y through M1 for every unit!

3.5 The problem of unidentified mediators

Both instances of unobserved confounding between \(M\) and \(Y\):

3.6 The problem of unidentified mediators

Both instances of unobserved confounding between \(M\) and \(Y\):

3.7 The problem of unidentified mediators

  • Another somewhat obvious approach is to see how the effect of \(X\) on \(Y\) in a regression is reduced when you control for \(M\).

  • If the effect of \(X\) on \(Y\) passes through \(M\) then surely there should be no effect of \(X\) on \(Y\) after you control for \(M\).

  • This common strategy associated with Baron and Kenny (1986) is also not guaranteed to produce reliable results. See for instance Green, Ha, and Bullock (2010)

3.8 Baron Kenny issues

df <- fabricate(N = 1000, 
                U = rbinom(N, 1, .5),     X = rbinom(N, 1, .5),
                M = ifelse(U==1, X, 1-X), Y = ifelse(U==1, M, 1-M)) 
            
list(lm(Y ~ X, data = df), 
     lm(Y ~ X + M, data = df)) |> texreg::htmlreg() 
Statistical models
  Model 1 Model 2
(Intercept) 0.00*** 0.00***
  (0.00) (0.00)
X 1.00*** 1.00***
  (0.00) (0.00)
M   0.00
    (0.00)
R2 1.00 1.00
Adj. R2 1.00 1.00
Num. obs. 1000 1000
***p < 0.001; **p < 0.01; *p < 0.05

3.9 The problem of unidentified mediators

  • See Imai on better ways to think about this problem and designs to address it.

3.10 The problem of unidentified mediators: Quantities

  • Using potential outcomeswe can describe a mediation effect as (see Imai et al): \[\delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0)) \textbf{ for } t = 0,1\]
  • The direct effect is: \[\psi_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t)) \textbf{ for } t = 0,1\]
  • This is a decomposition, since: \[Y_i(1, M_i(1)) - Y_1(0, M_i(0)) = \frac{1}{2}(\delta_i(1) + \delta_i(0) + \psi_i(1) + \psi_i(0)) \]
  • If there are no interaction effects—ie \(\delta_i(1) = \delta_i(0), \psi_i(1) = \psi_i(0)\), then \[Y_i(1, M_i(1)) - Y_1(0, M_i(0)) = \delta_i + \psi_i\]

3.11 The problem of unidentified mediators: Solutions?

The bad news is that although a single experiment might identify the total effect, it can not identify these elements of the direct effect.

So:

  • Check formal requirement for identification under single experiment design (“sequential ignorability”—that, conditional on actual treatment, it is as if the value of the mediation variable is randomly assigned relative to potential outcomes). But this is strong (and in fact unverifiable) and if it does not hold, bounds on effects always include zero (Imai et al)

  • Consider sensitivity analyses

3.12 Implicit mediation

You can use interactions with covariates if you are willing to make assumptions on no heterogeneity of direct treatment effects over covariates.

eg you think that money makes people get to work faster because they can buy better cars; you look at the marginal effect of more money on time to work for people with and without cars and find it higher for the latter.

This might imply mediation through transport but only if there is no direct effect heterogeneity (eg people with cars are less motivated by money).

3.13 The problem of unidentified mediators: Solutions?

Weaker assumptions justify parallel design

  • Group A: \(T\) is randomly assigned, \(M\) left free.
  • Group B: divided into four groups \(T\times M\) (requires two more assumptions (1) that the manipulation of the mediator only affects outcomes through the mediator (2) no interaction, for each unit, \(Y(1,m)-Y(0,m) = Y(1,m')-Y(0,m')\).)

Takeaway: Understanding mechanisms is harder than you think. Figure out what assumptions fly.

3.14 In CausalQueries

Lets imagine that sequential ignorability does not hold. What are our posteriors on mediation quantities when in fact all effects are mediated, effects are strong, and we have lots of data?

model <- make_model("X -> M ->Y <- X; M <-> Y")

plot(model)

3.15 In CausalQueries

We imagine a true model and consider estimands:

truth <- make_model("X -> M ->Y") |> 
  set_parameters(c(.5, .5, .1, 0, .8, .1, .1, 0, .8, .1))

queries  <- 
  list(
      indirect = "Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]]",
      direct = "Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]]"
      )

truth |> query_model(queries) |> kable()
label query given using case_level mean sd cred.low cred.high
indirect Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]] - parameters FALSE 0.64 NA 0.64 0.64
direct Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]] - parameters FALSE 0.00 NA 0.00 0.00

3.16 In CausalQueries

model |> update_model(data = truth |> make_data(n = 1000)) |>
  query_distribution(queries = queries, using = "posteriors") 
Error in if (parent_nodes == "") {: argument is of length zero

Why such poor behavior? Why isn’t weight going onto indirect effects?

Turns out the data is consistent with direct effects only: specifically that whenever \(M\) is responsive to \(X\), \(Y\) is responsive to \(X\).

3.17 In CausalQueries

Error in if (parent_nodes == "") {: argument is of length zero

4 Spillovers

4.1 SUTVA violations (Spillovers)

Spillovers can result in the estimation of weaker effects when effects are actually stronger.

The key problem is that \(Y(1)\) and \(Y(0)\) are not sufficient to describe potential outcomes

4.2 SUTVA violations

Unit Location \(D_\emptyset\) \(y(D_\emptyset)\) \(D_1\) \(y(D_1)\) \(D_2\) \(y(D_2)\) \(D_3\) \(y(D_3)\) \(D_4\) \(y(D_4)\)
A 1 0 0 1 3 0 1 0 0 0 0
B 2 0 0 0 3 1 3 0 3 0 0
C 3 0 0 0 0 0 3 1 3 0 3
D 4 0 0 0 0 0 0 0 1 1 3

Table: Potential outcomes for four units for different treatment profiles. \(D_i\) is an allocation and \(y_j(D_i)\) is the potential outcome for (row) unit \(j\) given (column) \(D_i\).

  • The key is to think through the structure of spillovers.
  • Here immediate neighbors are exposed
  • In this case we can define a direct treatment (being exposed) and an indirect treatment (having a neighbor exposed) and we can work out the propensity for each unit of receiving each type of treatment
  • These may be non uniform (here central types are more likely to have teated neighbors); but we can still use the randomization to assess effects

4.3 SUTVA violations

0 1 2 3 4
Unit Location \(D_\emptyset\) \(y(D_\emptyset)\) \(D_1\) \(y(D_1)\) \(D_2\) \(y(D_2)\) \(D_3\) \(y(D_3)\) \(D_4\) \(y(D_4)\)
A 1 0 0 1 3 0 1 0 0 0 0
B 2 0 0 0 3 1 3 0 3 0 0
C 3 0 0 0 0 0 3 1 3 0 3
D 4 0 0 0 0 0 0 0 1 1 3
\(\bar{y}_\text{treated}\) - 3 3 3
\(\bar{y}_\text{untreated}\) 0 1 4/3 4/3
\(\bar{y}_\text{neighbors}\) - 3 2 2
\(\bar{y}_\text{pure control}\) 0 0 0 0
ATT-direct - 3 3 3
ATT-indirect - 3 2 2

4.4 Design

dgp <- function(i, Z, G) Z[i]/3 + sum(Z[G == G[i]])^2/5 + rnorm(1)

spillover_design <- 

  declare_model(G = add_level(N = 80), 
                     j = add_level(N = 3, zeros = 0, ones = 1)) +
  
  declare_inquiry(direct = mean(sapply(1:240,  # just i treated v no one treated 
    function(i) { Z_i <- (1:240) == i
                  dgp(i, Z_i, G) - dgp(i, zeros, G)}))) +
  
  declare_inquiry(indirect = mean(sapply(1:240, 
    function(i) { Z_i <- (1:240) == i           # all but i treated v no one treated   
                  dgp(i, ones - Z_i, G) - dgp(i, zeros, G)}))) +
  
  declare_assignment(Z = complete_ra(N)) + 
  
  declare_measurement(
    neighbors_treated = sapply(1:N, function(i) sum(Z[-i][G[-i] == G[i]])),
    one_neighbor  = as.numeric(neighbors_treated == 1),
    two_neighbors = as.numeric(neighbors_treated == 2),
    Y = sapply(1:N, function(i) dgp(i, Z, G))
  ) +
  
  declare_estimator(Y ~ Z, 
                    inquiry = "direct", 
                    model = lm_robust, 
                    label = "naive") +
  
  declare_estimator(Y ~ Z * one_neighbor + Z * two_neighbors,
                    term = c("Z", "two_neighbors"),
                    inquiry = c("direct", "indirect"), 
                    label = "saturated", 
                    model = lm_robust)

4.5 Spillovers: direct and indirect treatments

4.6 Spillovers: Simulated estimates

4.7 Spillovers: Opportunities and Warnings

You can in principle:

  • debias estimates
  • learn about interesting processes
  • optimize design parameters

But to estimate effects you still need some SUTVA like assumption.

4.8 Spillovers: Opportunities and Warnings

In this example if one compared the outcome between treated units and all control units that are at least \(n\) positions away from a treated unit you will get the wrong answer unless \(n \geq 7\).

Baron, Reuben M, and David A Kenny. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51 (6): 1173.
Green, Donald P, Shang E Ha, and John G Bullock. 2010. “Enough Already about ‘Black Box’ Experiments: Studying Mediation Is More Difficult Than Most Scholars Suppose.” The Annals of the American Academy of Political and Social Science 628 (1): 200–208.
Li, Yimeng. 2019. “Relaxing the No Liars Assumption in List Experiment Analyses.” Political Analysis 27 (4): 540–55.