Same structures hold for mechanistic queries and non-mechanistic clues
We often seek the probability of causation. In potential outcomes notation, the estimand is \(\Pr(Y(0)=0 \mid Y(1)=1)\).

Arguably this is an answer, not an estimand. Still, it is our focus: we seek the answer that is defensible given the data, and discuss the identifiability of this quantity.
Say that we have lots of data from a randomized experiment, so that we know the average effect of \(X\) on \(Y\) is 1/3. In particular, we have infinite data supporting the following conditional distribution of \(Y\) given \(X\):
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 2/3 | 1/3 |
| \(X=1\) | 1/3 | 2/3 |
What is the probability that \(X\) caused \(Y\) for a case drawn from this population (an "exchangeable" case)?
From these data alone, different joint distributions of the potential outcomes \((Y(0), Y(1))\) are consistent with what we observe. For instance: one in which cases are either "beneficial" (\(Y(0)=0, Y(1)=1\), with mass 2/3) or "adverse" (\(Y(0)=1, Y(1)=0\), with mass 1/3); and one with monotonic effects, in which the types \((Y(0), Y(1)) = (0,0)\), \((0,1)\), and \((1,1)\) each have mass 1/3. PC is 1 in the former case and 0.5 in the latter, so the bounds are \([0.5, 1]\).
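These bounds can be computed directly from the table. A minimal sketch, assuming the standard Tian–Pearl bounds on the probability of causation for an \(X=1, Y=1\) case:

```python
def pc_bounds(p_y1_x0: float, p_y1_x1: float) -> tuple[float, float]:
    """Tian-Pearl bounds on Pr(Y(0)=0 | Y(1)=1) from experimental data alone."""
    lower = max(0.0, (p_y1_x1 - p_y1_x0) / p_y1_x1)  # ATE / P(Y=1|X=1)
    upper = min(1.0, (1.0 - p_y1_x0) / p_y1_x1)      # P(Y=0|X=0) / P(Y=1|X=1)
    return lower, upper

# The table above: P(Y=1|X=0) = 1/3, P(Y=1|X=1) = 2/3
print(pc_bounds(1/3, 2/3))  # bounds of [0.5, 1]
```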
Sometimes distributions allow for tighter inferences:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 1 | 0 |
| \(X=1\) | 0.5 | 0.5 |
Here \(X=1\) is necessary for \(Y=1\). From this we know that if \(X=1\) and \(Y=1\), then \(X=1\) caused \(Y=1\). But for an \(X=0, Y=0\) case we do not know whether \(X=0\) caused \(Y=0\).
Sometimes distributions allow for tighter inferences in the other direction:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 0.5 | 0.5 |
| \(X=1\) | 0 | 1 |
Here \(X=1\) is sufficient for \(Y=1\). From this we know that if \(X=0\) and \(Y=0\), then \(X=0\) caused \(Y=0\). But for an \(X=1, Y=1\) case we do not know whether \(X=1\) caused \(Y=1\); in fact PC \(= 0.5\).
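As a numerical check, here is a sketch applying the same bounds formula (lower bound \((P(Y{=}1|X{=}1) - P(Y{=}1|X{=}0))/P(Y{=}1|X{=}1)\), upper bound \(\min(1, P(Y{=}0|X{=}0)/P(Y{=}1|X{=}1))\)) to the necessity and sufficiency tables; both turn out to be point-identified:

```python
# Necessity table: P(Y=1|X=0) = 0, P(Y=1|X=1) = 0.5
nec_lower, nec_upper = (0.5 - 0.0) / 0.5, min(1.0, 1.0 / 0.5)
print(nec_lower, nec_upper)  # both 1: X=1 certainly caused Y=1

# Sufficiency table: P(Y=1|X=0) = 0.5, P(Y=1|X=1) = 1.0
suf_lower, suf_upper = (1.0 - 0.5) / 1.0, min(1.0, 0.5 / 1.0)
print(suf_lower, suf_upper)  # both 0.5: PC point-identified at 0.5
```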
Say now that we could decompose the \(X \rightarrow Y\) process into a two-step process, \(X \rightarrow M \rightarrow Y\). For an \(X=1, Y=1\) case, is learning about \(M\) informative for the probability that \(X=1\) caused \(Y=1\)?
Imagine two possibilities: we learn nothing in the first case but might learn a lot in the second.
Take the second case. Imagine we could decompose this:
|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(X=0\) | 0.5 | 0.5 |
| \(X=1\) | 0.25 | 0.75 |
with PC bounds of \(\left[\frac13, \frac23\right]\), into:
|   | \(M = 0\) | \(M = 1\) |
|---|---|---|
| \(X=0\) | 1 | 0 |
| \(X=1\) | 0.5 | 0.5 |

|   | \(Y = 0\) | \(Y = 1\) |
|---|---|---|
| \(M=0\) | 0.5 | 0.5 |
| \(M=1\) | 0 | 1 |
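One can check that the two stages compose to the \(X \rightarrow Y\) table, and recover its PC bounds. A quick sketch (chaining the stages is just a matrix product of the conditional distributions):

```python
import numpy as np

# Stage 1: rows X=0, X=1; columns M=0, M=1
m_given_x = np.array([[1.0, 0.0],
                      [0.5, 0.5]])
# Stage 2: rows M=0, M=1; columns Y=0, Y=1
y_given_m = np.array([[0.5, 0.5],
                      [0.0, 1.0]])

# Chaining the two stages recovers P(Y | X)
y_given_x = m_given_x @ y_given_m
print(y_given_x)  # matches the X -> Y table: rows (0.5, 0.5) and (0.25, 0.75)

# PC bounds for an X=1, Y=1 case from the composed table
p10, p11 = y_given_x[0][1], y_given_x[1][1]
print((p11 - p10) / p11, min(1.0, (1 - p10) / p11))  # 1/3 and 2/3
```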
Say now that we could decompose the \(X \rightarrow Y\) process into a 10-step process, with an effect of 0.9 at every step (\(0.9^{10} \approx 1/3\)):
|   | \(M_{j+1} = 0\) | \(M_{j+1} = 1\) |
|---|---|---|
| \(M_j=0\) | 0.95 | 0.05 |
| \(M_j=1\) | 0.05 | 0.95 |
Then the upper bound at each step remains at 1, while the lower bound at each step is \(\frac{0.9}{0.95}\), which is about 0.95. However \(0.95^{10} < 0.6\), which means the bounds are now roughly \([0.6, 1]\): better than the \([0.5, 1]\) interval we had before, but not much better.
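The arithmetic for the ten-step chain can be sketched as follows (assuming, as in the argument above, that the per-step lower bounds multiply along the chain):

```python
# Each step has P(next=1 | prev=1) = 0.95 and P(next=1 | prev=0) = 0.05,
# so the per-step effect is 0.95 - 0.05, i.e. 0.9.
step_effect = 0.95 - 0.05
print(step_effect ** 10)  # ~0.349: the overall X -> Y effect is about 1/3

# Per-step lower bound on PC: effect / P(next=1 | prev=1)
step_lower = step_effect / 0.95
print(step_lower)         # ~0.947
print(step_lower ** 10)   # ~0.58: the ten-step lower bound, roughly 0.6
```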
Knowing a lot about many steps means that you have greater certainty at each step, but there are more sites for leakage and so the accumulation of confidence is not large.
Suppose all transition matrices are equal. We can then find the largest and smallest upper and lower bounds from any complete mediation process, for different types of evidence.
The general idea, case-level process tracing following data-based model training, can be generalized; see Humphreys and Jacobs, *Integrated Inferences*, and the `stan` structure used for estimation.

In qualitative inference, a "hoop" test is a search for a clue that, if absent, greatly reduces confidence in a theory.
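The force of a hoop test can be expressed as a simple Bayesian update. The numbers below are purely illustrative assumptions: a clue that is nearly certain under the theory, but also fairly common otherwise, hurts the theory badly when absent while helping only modestly when present.

```python
def posterior(prior: float, p_clue_if_theory: float, p_clue_if_not: float) -> float:
    """Pr(theory | clue) by Bayes' rule."""
    num = p_clue_if_theory * prior
    return num / (num + p_clue_if_not * (1 - prior))

# Hoop test (illustrative): clue expected under the theory (0.95)
# but also fairly likely otherwise (0.60).
prior = 0.5
present = posterior(prior, 0.95, 0.60)           # clue observed
absent = posterior(prior, 1 - 0.95, 1 - 0.60)    # clue absent
print(present, absent)  # modest boost if present, sharp drop if absent
```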
Define a model with \(X\) causing \(Y\) through \(M\) but with confounding.
We imagine a real world in which there are in fact monotonic effects and no confounding, though this is not known to the researcher. (The data suggest a process in which \(X\) is necessary for \(M\) and \(M\) is sufficient for \(Y\).)
Define the model, then update, and query.
| Given | truth | prior | post. mean | sd |
|---|---|---|---|---|
| X==1 & Y==1 | 0.62 | 0.268 | 0.313 | 0.183 |
| X==1 & Y==1 & M==1 | 0.70 | 0.250 | 0.354 | 0.206 |
| X==1 & Y==1 & M==0 | 0.00 | 0.250 | 0.005 | 0.006 |
We see that we can find \(M\) informative about whether \(X\) caused \(Y\) in a case specifically when we see \(M=0\).
This is striking because:
We can do similarly for other Van Evera tests, specifically using moderators to generate “doubly decisive” tests.
Assume a world, like above, where in fact \(X \rightarrow M \rightarrow Y\), all effects strong (80%, 80%).
| Query | Given | Using | mean | sd |
|---|---|---|---|---|
| Q 1 | - | posteriors | 0.4 | 0.09 |
| Q 1 | M==0 | posteriors | 0.4 | 0.12 |
| Q 1 | M==1 | posteriors | 0.4 | 0.13 |
This negative result holds even if we can exclude a direct \(X \rightarrow Y\) path. This example illustrates Cartwright's idea: "no causes in, no causes out."
- For a case with weak institutions and low growth (first column), the former likely caused the latter. Similarly for cases with strong institutions and growth (last column).
- For cases with weak institutions and high growth (and vice versa), the relationship is unlikely to be causal.
- In a strong institutions / high growth case, proximity to the equator increases confidence that the strong institutions helped, despite the fact that distance and institutions are complements for the average treatment effect.
- Mortality is informative about the effect of institutions on growth even if we already know the strength of institutions.
- Learned patterns of confounding are consistent with a world in which settlers responded to low mortality by building strong institutions specifically in those places where they rationally expected strong institutions to help.
Dawid, Philip; Humphreys, Macartan; Musio, Monica (2022). "Bounding Causes of Effects With Mediators." *Sociological Methods & Research*. Sage. https://doi.org/10.1177/00491241211036161

Humphreys, Macartan; Jacobs, Alan (Forthcoming). *Integrated Inferences*. Cambridge University Press. https://macartan.github.io/integrated_inferences/