Same structures hold for mechanistic queries and non-mechanistic clues
We often seek the probability of causation. In potential outcomes notation, the estimand is \(\Pr(Y(0)=0 \mid Y(1)=1)\).

Arguably this is an answer, not an estimand. Still, it is our focus: we seek the answer that is defensible given the data, and discuss the identifiability of this quantity.
Say that we have lots of data from a randomized experiment, so that we know the average effect of \(X\) on \(Y\) is 1/3. In particular, we have infinite data supporting the following conditional distribution of \(Y\) given \(X\):
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 2/3 | 1/3 |
| \(X=1\) | 1/3 | 2/3 |
What is the probability that \(X\) caused \(Y\) for a case drawn from this population (an "exchangeable" case)?
From these data alone, different joint distributions of the potential outcomes \((Y(0), Y(1))\) are consistent with what we observe. For instance: one in which cases are either "beneficial" (\(Y(0)=0, Y(1)=1\), with mass 2/3) or "adverse" (\(Y(0)=1, Y(1)=0\), with mass 1/3); and one with monotonic effects, in which the types \((Y(0), Y(1)) = (0,0)\), \((0,1)\), and \((1,1)\) each have mass 1/3. PC is 1 in the former case and 0.5 in the latter, so the bounds are \([0.5, 1]\).
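These bounds can be computed directly from the table. A minimal sketch, assuming the standard Tian–Pearl bounds on the probability of causation for an \(X=1, Y=1\) case:

```python
def pc_bounds(p_y1_x0: float, p_y1_x1: float) -> tuple[float, float]:
    """Tian-Pearl bounds on Pr(Y(0)=0 | Y(1)=1) from experimental data alone."""
    lower = max(0.0, (p_y1_x1 - p_y1_x0) / p_y1_x1)  # ATE / P(Y=1|X=1)
    upper = min(1.0, (1.0 - p_y1_x0) / p_y1_x1)      # P(Y=0|X=0) / P(Y=1|X=1)
    return lower, upper

# The table above: P(Y=1|X=0) = 1/3, P(Y=1|X=1) = 2/3
print(pc_bounds(1/3, 2/3))  # bounds of [0.5, 1]
```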
Sometimes distributions allow for tighter inferences:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 1 | 0 |
| \(X=1\) | 0.5 | 0.5 |
Here \(X=1\) is necessary for \(Y=1\). From this we know that if \(X=1\) and \(Y=1\), then \(X=1\) caused \(Y=1\). But for an \(X=0, Y=0\) case we do not know whether \(X=0\) caused \(Y=0\).
Sometimes distributions allow for tighter inferences in the other direction:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 0.5 | 0.5 |
| \(X=1\) | 0 | 1 |
Here \(X=1\) is sufficient for \(Y=1\). From this we know that if \(X=0\) and \(Y=0\), then \(X=0\) caused \(Y=0\). But for an \(X=1, Y=1\) case we do not know whether \(X=1\) caused \(Y=1\); in fact PC \(= 0.5\).
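As a numerical check, here is a sketch applying the same bounds formula (lower bound \((P(Y{=}1|X{=}1) - P(Y{=}1|X{=}0))/P(Y{=}1|X{=}1)\), upper bound \(\min(1, P(Y{=}0|X{=}0)/P(Y{=}1|X{=}1))\)) to the necessity and sufficiency tables; both turn out to be point-identified:

```python
# Necessity table: P(Y=1|X=0) = 0, P(Y=1|X=1) = 0.5
nec_lower, nec_upper = (0.5 - 0.0) / 0.5, min(1.0, 1.0 / 0.5)
print(nec_lower, nec_upper)  # both 1: X=1 certainly caused Y=1

# Sufficiency table: P(Y=1|X=0) = 0.5, P(Y=1|X=1) = 1.0
suf_lower, suf_upper = (1.0 - 0.5) / 1.0, min(1.0, 0.5 / 1.0)
print(suf_lower, suf_upper)  # both 0.5: PC point-identified at 0.5
```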
Say now that we could decompose the \(X \rightarrow Y\) process into a two-step process, \(X \rightarrow M \rightarrow Y\). For an \(X=1, Y=1\) case, is learning about \(M\) informative for the probability that \(X=1\) caused \(Y=1\)?
Imagine two possibilities: we learn nothing in the first case but might learn a lot in the second.
Take the second case. Imagine we could decompose this:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 0.5 | 0.5 |
| \(X=1\) | 0.25 | 0.75 |
with PC bounds of \(\left[\frac13, \frac23\right]\), into:
|   | \(M = 0\) | \(M = 1\) |
|---|---|---|
| \(X=0\) | 1 | 0 |
| \(X=1\) | 0.5 | 0.5 |

|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(M=0\) | 0.5 | 0.5 |
| \(M=1\) | 0 | 1 |
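One can check that the two stages compose to the \(X \rightarrow Y\) table, and recover its PC bounds. A quick sketch (chaining the stages is just a matrix product of the conditional distributions):

```python
import numpy as np

# Stage 1: rows X=0, X=1; columns M=0, M=1
m_given_x = np.array([[1.0, 0.0],
                      [0.5, 0.5]])
# Stage 2: rows M=0, M=1; columns Y=0, Y=1
y_given_m = np.array([[0.5, 0.5],
                      [0.0, 1.0]])

# Chaining the two stages recovers P(Y | X)
y_given_x = m_given_x @ y_given_m
print(y_given_x)  # matches the X -> Y table: rows (0.5, 0.5) and (0.25, 0.75)

# PC bounds for an X=1, Y=1 case from the composed table
p10, p11 = y_given_x[0][1], y_given_x[1][1]
print((p11 - p10) / p11, min(1.0, (1 - p10) / p11))  # 1/3 and 2/3
```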
Say now that we could decompose the \(X \rightarrow Y\) process into a 10-step process, with an effect of 0.9 at every step (\(0.9^{10} \approx 1/3\)):
|   | \(M_{j+1} = 0\) | \(M_{j+1} = 1\) |
|---|---|---|
| \(M_j=0\) | 0.95 | 0.05 |
| \(M_j=1\) | 0.05 | 0.95 |
Then the upper bound at each step remains at 1, while the lower bound at each step is \(\frac{0.9}{0.95}\), which is about 0.95. However \(0.95^{10} < 0.6\), which means the bounds are now roughly \([0.6, 1]\): better than the \([0.5, 1]\) interval we had before, but not much better.
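The arithmetic for the ten-step chain can be sketched as follows (assuming, as in the argument above, that the per-step lower bounds multiply along the chain):

```python
# Each step has P(next=1 | prev=1) = 0.95 and P(next=1 | prev=0) = 0.05,
# so the per-step effect is 0.95 - 0.05, i.e. 0.9.
step_effect = 0.95 - 0.05
print(step_effect ** 10)  # ~0.349: the overall X -> Y effect is about 1/3

# Per-step lower bound on PC: effect / P(next=1 | prev=1)
step_lower = step_effect / 0.95
print(step_lower)         # ~0.947
print(step_lower ** 10)   # ~0.58: the ten-step lower bound, roughly 0.6
```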
Knowing a lot about many steps means that you have greater certainty at each step, but there are more sites for leakage and so the accumulation of confidence is not large.
Suppose all transition matrices are equal. We can then find the largest and smallest upper and lower bounds from any complete mediation process, for different types of evidence.
The general idea, case-level process tracing following data-based model training, can be generalized; see Humphreys and Jacobs, *Integrated Inferences*, and the `stan` structure used for estimation.

In qualitative inference, a "hoop" test is a search for a clue that, if absent, greatly reduces confidence in a theory.
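The force of a hoop test can be expressed as a simple Bayesian update. The numbers below are purely illustrative assumptions: a clue that is nearly certain under the theory, but also fairly common otherwise, hurts the theory badly when absent while helping only modestly when present.

```python
def posterior(prior: float, p_clue_if_theory: float, p_clue_if_not: float) -> float:
    """Pr(theory | clue) by Bayes' rule."""
    num = p_clue_if_theory * prior
    return num / (num + p_clue_if_not * (1 - prior))

# Hoop test (illustrative): clue expected under the theory (0.95)
# but also fairly likely otherwise (0.60).
prior = 0.5
present = posterior(prior, 0.95, 0.60)           # clue observed
absent = posterior(prior, 1 - 0.95, 1 - 0.60)    # clue absent
print(present, absent)  # modest boost if present, sharp drop if absent
```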
Define a model with \(X\) causing \(Y\) through \(M\) but with confounding.
We imagine a real world in which there are in fact monotonic effects and no confounding, though this is not known to the researcher. (The data suggest a process in which \(X\) is necessary for \(M\) and \(M\) is sufficient for \(Y\).)
Define the model, then update, and query.
| Given | truth | prior | post. mean | sd |
|---|---|---|---|---|
| X==1 & Y==1 | 0.62 | 0.268 | 0.313 | 0.183 |
| X==1 & Y==1 & M==1 | 0.70 | 0.250 | 0.354 | 0.206 |
| X==1 & Y==1 & M==0 | 0.00 | 0.250 | 0.005 | 0.006 |
We see that we can find \(M\) informative about whether \(X\) caused \(Y\) in a case specifically when we see \(M=0\).
This is striking because:
We can do similarly for other Van Evera tests, specifically using moderators to generate “doubly decisive” tests.
Assume a world, like above, where in fact \(X \rightarrow M \rightarrow Y\), all effects strong (80%, 80%).
| Query | Given | Using | mean | sd |
|---|---|---|---|---|
| Q 1 | - | posteriors | 0.4 | 0.09 |
| Q 1 | M==0 | posteriors | 0.4 | 0.12 |
| Q 1 | M==1 | posteriors | 0.4 | 0.13 |
This negative result holds even if we can exclude a direct \(X \rightarrow Y\) path. This example illustrates Cartwright's idea: "no causes in, no causes out."
- For a case with weak institutions and low growth (first column), the former likely caused the latter. Similarly for cases with strong institutions and growth (last column).
- For cases with weak institutions and high growth (and vice versa), the relationship is unlikely to be causal.
- In a strong institutions / high growth case, proximity to the equator increases confidence that the strong institutions helped, despite the fact that distance and institutions are complements for the average treatment effect.
- Mortality is informative about the effect of institutions on growth even if we already know the strength of institutions.
- Learned patterns of confounding are consistent with a world in which settlers responded to low mortality by building strong institutions specifically in those places where they rationally expected strong institutions to help.
Dawid, Philip; Humphreys, Macartan; Musio, Monica (2022). "Bounding Causes of Effects With Mediators." *Sociological Methods & Research*. Sage. https://doi.org/10.1177/00491241211036161

Humphreys, Macartan; Jacobs, Alan (Forthcoming). *Integrated Inferences*. Cambridge University Press. https://macartan.github.io/integrated_inferences/