Topics II
Which effects are identified by the random assignment of \(X\)?
An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.
Both approaches are vulnerable to unobserved confounding between \(M\) and \(Y\):
Another somewhat obvious approach is to see how the effect of \(X\) on \(Y\) in a regression is reduced when you control for \(M\).
If the effect of \(X\) on \(Y\) passes through \(M\) then surely there should be no effect of \(X\) on \(Y\) after you control for \(M\).
This common strategy, associated with Baron and Kenny (1986), is also not guaranteed to produce reliable results; see for instance Green, Ha, and Bullock (2010).
library(fabricatr)

# U is an unobserved confounder: X affects Y entirely through M,
# yet controlling for M leaves the X coefficient unchanged
df <- fabricate(N = 1000,
                U = rbinom(N, 1, .5), X = rbinom(N, 1, .5),
                M = ifelse(U == 1, X, 1 - X), Y = ifelse(U == 1, M, 1 - M))
list(lm(Y ~ X, data = df),
     lm(Y ~ X + M, data = df)) |> texreg::htmlreg()
| | Model 1 | Model 2 |
|---|---|---|
| (Intercept) | 0.00*** | 0.00*** |
| | (0.00) | (0.00) |
| X | 1.00*** | 1.00*** |
| | (0.00) | (0.00) |
| M | | 0.00 |
| | | (0.00) |
| R2 | 1.00 | 1.00 |
| Adj. R2 | 1.00 | 1.00 |
| Num. obs. | 1000 | 1000 |
| ***p < 0.001; **p < 0.01; *p < 0.05 | | |
The bad news is that although a single experiment might identify the total effect, it cannot identify these elements of the direct effect.
So:
Check the formal requirement for identification under a single-experiment design ("sequential ignorability": conditional on actual treatment, it is as if the value of the mediator is randomly assigned relative to potential outcomes). But this assumption is strong (and in fact unverifiable), and if it does not hold, bounds on effects always include zero (Imai et al.).
Consider sensitivity analyses (see the sketch after this list).
You can use interactions with covariates if you are willing to assume no heterogeneity of direct treatment effects over covariates.
E.g., you think that money makes people get to work faster because they can buy better cars; you look at the marginal effect of more money on time to work for people with and without cars and find it larger for the latter.
This might imply mediation through transport, but only if there is no direct-effect heterogeneity (e.g., people with cars are less motivated by money).
Weaker assumptions justify parallel design
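As an illustration of the sensitivity-analysis route mentioned above, here is a minimal sketch using the `mediation` package of Imai et al.; the simulated data and variable names are assumptions for illustration only:

```r
library(mediation)

# Illustrative data (assumed): X randomized, M a candidate mediator
sim <- data.frame(X = rbinom(500, 1, 0.5))
sim$M <- rbinom(500, 1, plogis(-0.5 + sim$X))
sim$Y <- rnorm(500, mean = sim$M + 0.2 * sim$X)

med_fit <- lm(M ~ X, data = sim)      # mediator model
out_fit <- lm(Y ~ M + X, data = sim)  # outcome model
med_out <- mediate(med_fit, out_fit, treat = "X", mediator = "M", sims = 200)

# How strong would unobserved M-Y confounding (rho) have to be to overturn the ACME?
summary(medsens(med_out, rho.by = 0.1))
```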
Takeaway: Understanding mechanisms is harder than you think. Figure out what assumptions fly.
CausalQueries
Let's imagine that sequential ignorability does not hold. What are our posteriors on mediation quantities when in fact all effects are mediated, effects are strong, and we have lots of data?
CausalQueries
We imagine a true model and consider estimands:
library(CausalQueries)
# True model: all of X's effect on Y runs through M
truth <- make_model("X -> M -> Y") |>
  set_parameters(c(.5, .5, .1, 0, .8, .1, .1, 0, .8, .1))
queries <-
list(
indirect = "Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]]",
direct = "Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]]"
)
truth |> query_model(queries) |> knitr::kable()
| label | query | given | using | case_level | mean | sd | cred.low | cred.high |
|---|---|---|---|---|---|---|---|---|
| indirect | Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]] | - | parameters | FALSE | 0.64 | NA | 0.64 | 0.64 |
| direct | Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]] | - | parameters | FALSE | 0.00 | NA | 0.00 | 0.00 |
CausalQueries
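What this slide is after can be reproduced with something along these lines (a sketch of the assumed workflow, not output from the original): simulate a large dataset from the truth, update an analysis model that allows unobserved confounding between \(M\) and \(Y\) (so sequential ignorability is not assumed), and query the posteriors.

```r
# Simulate data from the true model defined above
data <- make_data(truth, n = 5000)

# Analysis model: same DAG, but allowing unobserved M-Y confounding
model <- make_model("X -> M -> Y; M <-> Y")

# Posterior beliefs about the direct and indirect effects
model |>
  update_model(data) |>
  query_model(queries, using = "posteriors")
```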
Why such poor behavior? Why isn’t weight going onto indirect effects?
It turns out the data are also consistent with direct effects only: specifically, whenever \(M\) is responsive to \(X\), \(Y\) is responsive to \(X\).
CausalQueries
Multiple survey experimental designs have been developed to make it easier for subjects to answer sensitive questions.
The key idea is to use inference rather than measurement.
Subjects are placed in different conditions, and the conditions affect the answers given in such a way that you can infer some underlying quantity of interest.
This is an obvious DAG, but the main point is to be clear that the Value is the quantity of interest and that it is not affected by the treatment, \(Z\).
The list experiment supposes that:
In other words: sensitivities notwithstanding, they are happy for the researcher to make correct inferences about them or their group
Respondents are given a short list and a long list.
The long list differs from the short list in having one extra item—the sensitive item
We ask how many items in each list a respondent agrees with:
How many of these do you agree with:
| | Short list | Long list | "Effect" |
|---|---|---|---|
| | "2 + 2 = 4" | "2 + 2 = 4" | |
| | "2 * 3 = 6" | "2 * 3 = 6" | |
| | "3 + 6 = 8" | "Climate change is real" | |
| | | "3 + 6 = 8" | |
| Answer | Y(0) = 2 | Y(1) = 4 | Y(1) - Y(0) = 2 |
[Note: this is obviously not a good list. Why not?]
library(DeclareDesign)

# Declaration 17.3: a simple list experiment design
declaration_17.3 <-
  declare_model(
    N = 500,
    # Number of (non-sensitive) control items each respondent agrees with
    control_count = rbinom(N, size = 3, prob = 0.5),
    # Latent sensitive trait (true prevalence 0.3)
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  # Difference in mean counts between long- and short-list groups
  declare_estimator(Y_list ~ Z, .method = difference_in_means,
                    inquiry = "prevalence_rate")
diagnosands <- declare_diagnosands(
bias = mean(estimate - estimand),
mean_CI_width = mean(conf.high - conf.low)
)
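The diagnosis reported below would come from a call along these lines (assumed; the call itself is not shown in the original):

```r
# Simulate the design many times and summarize bias and CI width
diagnose_design(declaration_17.3, diagnosands = diagnosands)
```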
| Design | Inquiry | Bias | Mean CI Width |
|---|---|---|---|
| declaration_17.3 | prevalence_rate | 0.00 | 0.32 |
| | | (0.00) | (0.00) |
# Free design parameters, set before the declaration (values here are illustrative)
N <- 500
proportion_hiding <- 0.3

# Declaration 17.4: compare the list estimator with a direct question when
# some respondents with the sensitive trait "hide" it
declaration_17.4 <-
  declare_model(
    N = N,
    U = rnorm(N),
    control_count = rbinom(N, size = 3, prob = 0.5),
    Y_star = rbinom(N, size = 1, prob = 0.3),
    # W = 1 for respondents who hold the sensitive trait but hide it when asked directly
    W = case_when(Y_star == 0 ~ 0L,
                  Y_star == 1 ~ rbinom(N, size = 1, prob = proportion_hiding)),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z),
                      Y_direct = Y_star - W) +  # direct question: hiders answer 0
  declare_estimator(Y_list ~ Z, inquiry = "prevalence_rate", label = "list") +
  # Intercept-only regression: estimates the mean of the direct answers
  declare_estimator(Y_direct ~ 1, .method = lm_robust,
                    inquiry = "prevalence_rate", label = "direct")
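A sketch (assumed, not shown in the original) of how one might then vary the share of hiders and compare the two estimators:

```r
# Diagnose the design over a range of hiding rates
declaration_17.4 |>
  redesign(proportion_hiding = c(0, 0.1, 0.3)) |>
  diagnose_design()
```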
# Negatively correlated control items reduce the variance of the control count,
# improving the precision of the list estimator
rho <- -.8

correlated_lists <-
  declare_model(
    N = 500,
    U = rnorm(N),
    control_1 = rbinom(N, size = 1, prob = 0.5),
    # Second control item correlated with the first at rho
    control_2 = correlate(given = control_1, rho = rho, draw_binary, prob = 0.5),
    control_count = control_1 + control_2,
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) +
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  declare_estimator(Y_list ~ Z, inquiry = "prevalence_rate")
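A sketch (assumed) of how to check whether negatively correlated control items buy precision, by diagnosing the design across values of `rho`:

```r
# Compare negatively correlated, independent, and positively correlated control items
correlated_lists |>
  redesign(rho = c(-0.8, 0, 0.8)) |>
  diagnose_design()
```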
This is typically used to estimate average levels
However, you can use it in the obvious way to get average levels for groups: this is equivalent to calculating group-level heterogeneous effects.
Extending the idea you can even get individual level estimates: for instance you might use causal forests
You can also use this to estimate the effect of an experimental treatment on an item that’s measured using a list, without requiring individual level estimates:
\[Y_i = \beta_0 + \beta_1 Z_i + \beta_2 \text{Long}_i + \beta_3 Z_i \text{Long}_i\]
Here the interaction coefficient \(\beta_3\) estimates the effect of the treatment \(Z\) on the sensitive item.
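A minimal sketch of this regression (the data frame `survey_data` and its variables are hypothetical):

```r
library(estimatr)

# Z: experimental treatment; Long: 1 if the respondent received the long list.
# The coefficient on Z:Long estimates the effect of Z on the sensitive item.
lm_robust(Y ~ Z * Long, data = survey_data)
```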
Note that here we looked at “hiders” – people not answering the direct question truthfully
See Li (2019) on bounds when the “no liars” assumption is threatened — this is about whether people respond truthfully to the list experimental question
Good questions studied well
Prospects and priorities
There is no foundationless answer to this question. So let’s take some foundations from the Belmont report and seek to ensure:
Unfortunately, operationalizing these requires further ethical theories. Let’s assume that (1) is operationalized by informed consent (a very liberal idea). We are a bit at sea for (2) and (3) (the Belmont report suggests something like a utilitarian solution).
The major focus on (1) by IRBs might follow from the view that if subjects consent, then they endorse the ethical calculations made for 2 and 3 — they think that it is good and fair.
This is a little tricky, though, since the study may not be good or fair because of implications for non-subjects.
The problem is that many (many) field experiments have nothing like informed consent.
For example, whether the government builds a school in your village, whether an ad appears on your favorite radio show, and so on.
Consider three cases:
In all cases, there is no consent given by subjects.
In 2 and 3, the treatment is possibly harmful for subjects, and the results might also be harmful. But even in case 1, there could be major unintended harmful consequences.
In cases 1 and 3, however, the “intervention” is within the sphere of normal activities for the implementer.
Sometimes it is possible to use this point of difference to make a “spheres of ethics” argument for “embedded experimentation.”
Spheres of Ethics Argument: Experimental research that involves manipulations that are not normally appropriate for researchers may nevertheless be ethical if:
Otherwise keep focus on consent and desist if this is not possible
Experimental researchers are deeply engaged in the movement towards more transparent social science research.
Contentious issues (mostly):
Data. How soon should you make your data available? My view: as soon as possible. Along with working papers and before publication. Before it affects policy in any case. Own the ideas, not the data.
Where should you make your data available? Dataverse is focal for political science. Not personal website (mea culpa)
What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can but at least post cleaning and pre-manipulation.
Experimental researchers are deeply engaged in the movement towards more transparent social science research.
Registration: Should you register? Hard to find reasons against. But the case is strongest in the testing phase rather than the exploratory phase.
Registration: When should you register? My view: Before treatment assignment. (Not just before analysis, mea culpa)
Registration: Should you deviate from a preanalysis plan if you change your mind about optimal estimation strategies? My view: yes, but make the case and describe both sets of results.
File drawer bias (Publication bias)
Analysis bias (Fishing)
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.
– Even if each individual study is indisputably correct, the account in the research record – that X affects Y in 100% of cases – will be wrong.
Exacerbated by:
– Publication bias – the positive results get published
– Citation bias – the positive results get read and cited
– Chatter bias – the positive results get blogged, tweeted, and TEDed.
– Say in truth \(X\) affects \(Y\) in 50% of cases.
– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or enjoy discretion to select statistical models after seeing \(X\) and \(Y\) in each case.
– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published (see the simulation sketch below).
– Try the Exact Fishy Test (https://macartan.shinyapps.io/fish/)
– What’s the problem with this test?
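A small simulation (illustrative only, not from the slides) of how discretion over which outcome to report inflates the rate of "positive" findings even under a true null:

```r
set.seed(1)
# Each study: no true effect, but the analyst reports whichever of two
# outcome measures gives the smaller p-value
fished_p <- replicate(1000, {
  X  <- rbinom(100, 1, 0.5)
  Y1 <- rnorm(100)
  Y2 <- rnorm(100)
  min(summary(lm(Y1 ~ X))$coefficients["X", 4],
      summary(lm(Y2 ~ X))$coefficients["X", 4])
})
mean(fished_p < 0.05)  # roughly 0.10, double the nominal 5% rate
```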
When your conclusions do not really depend on the data
E.g.: some evidence will always support your proposition; some interpretation of the evidence will always support your proposition.
Knowing the mapping from data to inference in advance gives a handle on the false positive rate.
Source: Gerber and Malhotra
Implications are:
Summary: we do not know when we can or cannot trust claims made by researchers.
[Not a tradition specific claim]
Simple idea:
Fishing can happen in very subtle ways, and may seem natural and justifiable.
Example:
Our journal review process is largely organized around advising researchers how to adjust analysis in light of findings in the data.
Frequentists can do it
Bayesians can do it too.
Qualitative researchers can also do it.
You can even do it with descriptive statistics
The key distinction is between prospective and retrospective studies.
Not between experimental and observational studies.
A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects
A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.
Registration will produce some burden but does not require the creation of content that is not needed anyway
It does shift preparation of analyses forward
And it can also increase the burden of developing analysis plans even for projects that don't work. But that is, in part, the point.
Upside is that ultimate analyses may be much easier.
In neither case would the creation of a registration facility prevent exploration.
What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.
Registration communicates whether or not researchers are engaged in exploration. We love exploration and should be proud of it.
Incentives and strategies
| Inquiry | In the preanalysis plan | In the paper | In the appendix |
|---|---|---|---|
| Gender effect | X | X | |
| Age effect | X | | |
| Inquiry | Following A from the PAP | Following A from the paper | Notes |
|---|---|---|---|
| Gender effect | estimate = 0.6, s.e. = 0.31 | estimate = 0.6, s.e. = 0.25 | Difference due to change in control variables [provide cross references to tables and code] |