Inquiries and identification

Macartan Humphreys

1 Inquiries

Well posed questions

1.1 Outline

Types of estimands
Principal strata
Identification
Backdoor
Frontdoor
dagitty

1.2 Estimands and inquiries

Your inquiry is your question and the estimand is the true (generally unknown) answer to the inquiry
The estimand is the thing you want to estimate
If you are estimating something you should be able to say what your estimand is
You are responsible for your estimand. Your estimator will not tell you what your estimand is
Just because you can calculate something does not mean that you have an estimand
You can test a hypothesis without having an estimand

Read: II ch 4, DD, ch 7

1.2.1 Estimands: ATE, ATT, ATC, S-, P-

ATE is Average Treatment Effect (all units)
ATT is Average Treatment Effect on the Treated
ATC is Average Treatment Effect on the Controls

1.2.2 Estimands: ATE, ATT, ATC, S-, P-

Say that units are randomly assigned to treatment in different strata (maybe just one); with fixed, though possibly different, shares assigned in each stratum. Then the key estimands and estimators are:

Estimand	Estimator
$\tau_{ATE} \equiv \mathbb{E}[\tau_i]$	$\widehat{\tau}_{ATE} = \sum\nolimits_{x} \frac{w_x}{\sum\nolimits_{j}w_{j}}\widehat{\tau}_x$
$\tau_{ATT} \equiv \mathbb{E}[\tau_i \| Z_i = 1]$	$\widehat{\tau}_{ATT} = \sum\nolimits_{x} \frac{p_xw_x}{\sum\nolimits_{j}p_jw_j}\widehat{\tau}_x$
$\tau_{ATC} \equiv \mathbb{E}[\tau_i \| Z_i = 0]$	$\widehat{\tau}_{ATC} = \sum\nolimits_{x} \frac{(1-p_x)w_x}{\sum\nolimits_{j}(1-p_j)w_j}\widehat{\tau}_x$

where $x$ indexes strata, $p_x$ is the share of units in each stratum that is treated, and $w_x$ is the size of a stratum.

1.2.3 Estimands: ATE, ATT, ATC, S-, P-, C-

In addition, each of these can be targets of interest:

for the population, in which case we refer to PATE, PATT, PATC and $\widehat{PATE}, \widehat{PATT}, \widehat{PATC}$
for a sample, in which case we refer to SATE, SATT, SATC, and $\widehat{SATE}, \widehat{SATT}, \widehat{SATC}$

And for different subgroups,

given some value on a covariate, in which case we refer to CATE (conditional average treatment effect)

1.2.4 Broader classes of estimands: LATE/CATE

The CATEs are conditional average treatment effects, for example the effect for men or for women. These are straightfoward.

However we might also imagine conditioning on unobservable or counterfactual features.

The LATE (or CACE: “complier average causal effect”) asks about the effect of a treatment ($X$) on an outcome ($Y$) for people that are responsive to an encouragement ($Z$)

\[LATE = \frac{1}{|C|}\sum_{j\in C}(Y_j(X=1) - Y_j(X=0))\] \[C:=\{j:X_j(Z=1) > X_j(Z=0) \}\]

We will return to these in the study of instrumental variables.

1.2.5 Quantile estimands

Other ways to condition on potential outcomes:

A quantile treatment effect: You might be interested in the difference between the median $Y(1)$ and the median $Y(0)$ (Imbens and Rubin (2015) 20.3.1)
or even be interested in the median $Y(1) - Y(0)$. Similarly for other quantiles.

1.2.6 Model estimands

Many inquiries are averages of individual effects, even if the groups are not known, but they do not have to be:

The RDD estimand is a statement about what effects would be at a threshold; it can be defined under a model even if no actual individuals are at the threshold. We imagine average potential outcomes as a function of treatment $Z$ and running variable $X$, $f(z, x)$ and define: \[\tau_{RDD} := f(1, x^*) - f(0, x^*)\]

1.2.7 Distribution estimands

Many inquiries are averages of individual effects, even if the groups are not known,

But they do not have to be:

Inquiries might relate to distributional quantities such as:
- The effect of treatment on the variance in outcomes: $var(Y(1)) - var(Y(0))$
- The variance of treatment effects: $var(Y(1) - Y(0))$
- Other inequality measures (e.g. Ginis; (Imbens and Rubin (2015) 20.3.2))

You might even be interested in $\min(Y_i(1) - Y_i(0))$.

1.2.8 Spillover estimands

There are lots of interesting “spillover” estimands.

Imagine there are three individuals and each person’s outcomes depends on the assignments of all others. For instance $Y_1(Z_1, Z_2, Z_3$, or more generally, $Y_i(Z_i, Z_{i+1 (\text{mod }3)}, Z_{i+2 (\text{mod }3)})$.

Then three estimands might be:

$\frac13\left(\sum_{i}{Y_i(1,0,0) - Y_i(0,0,0)}\right)$
$\frac13\left(\sum_{i}{Y_i(1,1,1) - Y_i(0,0,0)}\right)$
$\frac13\left(\sum_{i}{Y_i(0,1,1) - Y_i(0,0,0)}\right)$

Interpret these. What others might be of interest?

1.2.9 Differences in CATEs and interaction estimands

A difference in CATEs is a well defined estimand that might involve interventions on one node only:

$\mathbb{E}_{\{W=1\}}[Y(X=1) - Y(X=0)] - \mathbb{E}_{\{W=0\}}[Y(X=1) - Y(X=0)]$

It captures differences in effects.

An interaction is an effect on an effect:

$\mathbb{E}[Y(X=1, W=1) - Y(X=0, W=1)] - \mathbb{E}[Y(X=1, W=0) - Y(X=0, W=0)]$

Note in the latter the expectation is taken over the whole population.

1.2.10 Mediation estimands and complex counterfactuals

Say $X$ can affect $Y$ directly, or indirectly through $M$. then we can write potential outcomes as:

$Y(X=x, M=m)$
$M(X=x)$

We can then imagine inquiries of the form:

$Y(X=1, M=M(X=1)) - Y(X=0, M=M(X=0))$
$Y(X=1, M=1) - Y(X=0, M=1)$
$Y(X=1, M=M(X=1)) - Y(X=1, M=M(X=0))$

Interpret these. What others might be of interest?

1.2.11 Mediation estimands and complex counterfactuals

Again we might imagine that these are defined with respect to some group:

$A = \{i|Y_i(1, M(X=1)) > Y_i(0, M(X=0))\}$
$\frac{1}{|A|} \sum_{i\in A}(Y(1, 1) > Y(0, 1))$

here, among those for whom $X$ has a positive effect on $Y$, for what share would there be a positive effect if $M$ were fixed at 1?

1.2.12 Causes of effects and effects of causes

In qualitative research a particularly common inquiry is “did $X=1$ cause $Y=1$?

This is often given as a probability, the “probability of causation” (though at the case level we might better think of this probability as an estimate rather than an estimand):

\[\Pr(Y_i(0) = 0 | Y_i(1) = 1, X = 1)\]

1.2.13 Causes of effects and effects of causes

Intuition: What’s the probability $X=1$ caused $Y=1$ in an $X=1, Y=1$ case drawn from a large population with the following experimental distribution:

	Y=0	Y=1	All
X=0	1	0	1
X=1	0.25	0.75	1

1.2.14 Causes of effects and effects of causes

Intuition: What’s the probability $X=1$ caused $Y=1$ in an $X=1, Y=1$ case drawn from a large population with the following experimental distribution:

	Y=0	Y=1	All
X=0	0.75	0.25	1
X=1	0.25	0.75	1

1.2.15 Actual causation

Other inquiries focus on distinguishing between causes.

For the Billy Suzy problem (Hall 2004), Halpern (2016) focuses on “actual causation” as a way to distinguish between Suzy and Billy:

Imagine Suzy and Billy, simultaneously throwing stones at a bottle. Both are excellent shots and hit whatever they aim at. Suzy’s stone hits first, knocks over the bottle, and the bottle breaks. However, Billy’s stone would have hit had Suzy’s not hit, and again the bottle would have broken. Did Suzy’s throw cause the bottle to break? Did Billy’s?

1.2.16 Actual causation

Actual Causation:

$X=x$ and $Y=y$ both happened;
there is some set of variables, $\mathcal W$, such that if they were fixed at the levels that they actually took on in the case, and if $X$ were to be changed, then $Y$ would change (where $\mathcal W$ can also be an empty set);
no strict subset of $X$ satisfies 1 and 2 (there is no redundant part of the condition, $X=x$).

1.2.17 Actual causation

Suzy: Condition 2 is met if Suzy’s throw made a difference, counterfactually speaking—with the important caveat that, in determining this, we are permitted to condition on Billy’ stone not hitting the bottle.
Billy: Condition 2 is not met.

An inquiry: for what share in a population is a possible cause an actual cause?

1.2.18 Pearl’s ladder

Pearl (e.g. Pearl and Mackenzie (2018)) describes three types of inquiry:

Level	Activity	Inquiry
Association	“Seeing”	If I see $X=1$ should I expect $Y=1$?
Intervention	“Doing”	If I set $X$ to $1$ should I expect $Y=1$?
Counterfactual	“Imagining”	If $X$ were $0$ instead of 1, would $Y$ then be $0$ instead of $1$?

1.2.19 Pearl’s ladder

We can understand these as asking different types of questions about a causal model

Level	Activity	Inquiry
Association	“Seeing”	$\Pr(Y=1\|X=1)$
Intervention	“Doing”	$\mathbb{E}[\mathbb{I}(Y(1)=1)]$
Counterfactual	“Imagining”	$\Pr(Y(1)=1 \& Y(0)=0)$

The third is qualitatively different because it requires information about two mutually incompatible conditions for units. This is not (generally ) recoverable directly from knowledge of $\Pr(Y(1)=1)$ and $\Pr(Y(0)=0)$.

1.3 Inquiries as statements about principal strata

Given a causal model over nodes with discrete ranges, inquiries can generally be described as summaries of the distributions of exogenous nodes.

We already saw two instances of this:

The probability that $X$ has a positive effect on $Y$ in an $X \rightarrow Y$ model is $\lambda^Y_{01}$ (last lecture)
The share of “compliers” in an IV model $Z \rightarrow X \rightarrow Y \leftrightarrow X$ is $\lambda^X_{01}$

1.4 Identification

What it is. When you have it. What it’s worth.

1.4.1 Identification

Informally a quantity is “identified” if it can be “recovered” once you have enough data.

Say for example average wage is $x$ in some very large population. If I gather lots and lots of data on the wages of individuals and take the average then then my estimate will ultimately let be figure out $x$.

If $x$ is $1 then my estimate will end up centered on $1.
If it is $2 it will end up centered on $2.

1.4.2 Identification (Definition)

Identifiability Let $Q(M)$ be a query defined over a class of models $\mathcal M$, then $Q$ is identifiable if $P(M_1) = P(M_2) \rightarrow Q(M_1) = Q(M_1)$.
Identifiability with constrained data Let $Q(M)$ be a query defined over a class of models $\mathcal M$, then $Q$ is identifiable from features $F(M)$ if $F(M_1) = F(M_2) \rightarrow Q(M_1) = Q(M_1)$.

Based on Defn 3.2.3 in Pearl.

Essentially: Each underlying value produces a unique data distribution. When you see that distribution you recover the parameter.

1.4.3 Identification (Example without identification)

Informally a quantity is “identified” if it can be “recovered” once you have enough data.

Say for example average wage is $x^m$ for men and $x^w$ for women (in some very large population).
If I gather lots and lots of data on the wages of (male and female) couples, e.g. $x^c_i = x^m_i + x^w_i$ then, although this will be informative, it will never be sufficient to recover $x^m$ for men and $x^w$.
I can recover $x^c$, but there are too many combinations of possible values of $x^m$ and $x^w$ consistent with the observed data.

1.4.4 Identification : Goal

Our goal in causal inference is to estimate quantities such as:

\[\Pr(Y|\hat{x})\]

where $\hat{x}$ is interpreted as $X$ set to $x$ by “external” control. Equivalently: $do(X=x)$ or sometimes $X \leftarrow x$.

If this quantity is identifiable then we can recover it with infinite data.
If it is not identifiable, then, even in the best case, we are not guaranteed to get the right answer.

Are there general rules for determining whether this quantity can be identified? Yes.

1.4.5 Identification : Goal

Note first, identifying

\[\Pr(Y|x)\]

is easy.

But we are not always interested in identifying the distribution of $Y$ given observed values of $x$, but rather, the distribution of $Y$ if $X$ is set to $x$.

1.5 Levels and effects

If we can identify the controlled distribution we can calculate other causal quantities of interest.

For example for a binary $X, Y$ the causal effect of $X$ on the probability that $Y=1$ is:

\[\Pr(Y=1|\hat{x}=1) - \Pr(Y=1|\hat{x}=0)\]

Again, this is not the same as:

\[\Pr(Y=1|x=1) - \Pr(Y=1|x=0)\]

It’s the difference between seeing and doing.

1.5.1 When to condition? What to condition on?

The key idea is that you want to find a set of variables such that when you condition on these you get what you would get if you used a do operation.

Intuition:

You could imagine creating a “mutilated” graph by removing all the arrows leading out of X
Then select a set of variables, $Z$, such that $X$ and $Y$ are d-separated by $Z$ on the the mutilated graph
When you condition on these you are making sure that any covariation between $X$ and $Y$ is covariation that is due to the effects of $X$

1.5.2 Illustration

1.5.3 Illustration: Remove paths out

1.5.4 Illustration: Block backdoor path

1.5.5 Illustration: Why not like this?

1.5.6 Identification

Three results (“Graphical Identification Criteria”)
- Backdoor criterion
- Adjustment criterion
- Frontdoor criterion
There are more

1.5.7 Backdoor Criterion: (Pearl 1995)

The backdoor criterion is satisfied by $Z$ (relative to $X$, $Y$) if:

No node in $Z$ is a descendant of $X$
$Z$ blocks every backdoor path from $X$ to $Y$ (i.e. every path that contains an arrow into $X$)

In that case you can identify the effect of $X$ on $Y$ by conditioning on $Z$:

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\] (This is eqn 3.19 in Pearl (2000))

1.5.8 Backdoor Criterion: (Pearl 1995)

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\]

No notion of a linear control or anything like that; idea really is like blocking: think lots of discrete data and no missing patterns
Note this is a formula for a (possibly counterfactual) level; a counterfactual difference would be given in the obvious way by:

\[P(Y=y | \hat{x}) - P(Y=y | \hat{x}')\]

1.5.9 Backdoor Proof

Following Pearl (2009), Chapter 11. Let $T$ denote the set of parents of $X$: $T := pa(X)$, with (possibly vector valued) realizations $t$. These might not all be observed.

If the backdoor criterion is satisfied, we have:

$Y$ is independent of $T$, given $X$ and observed data, $Z$ (since $Z$ blocks backdoor paths)
$X$ is independent of $Z$ given $T$. (Since $Z$ includes only nondescendents)

Key idea: The intervention level relates to the observational level as follows: \[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]
Think of this as fully accounting for the (possibly unobserved) causes of $X$, $T$

1.5.10 Backdoor Proof

We want to get to:

\[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

But of course we do not observe $T$, rather we observe $Z$. So we now need to write everything in terms of $Z$ rather than $T$.

We bring $Z$ into the picture by writing:

\[p(y|\hat{x}) = \sum_{t\in T} p(t) \sum_z p(y|x, t, z)p(z|x, t)\]

now we want to get rid of $T$…

1.5.11 Backdoor Proof

now we want to get rid of $T$…

Using the two conditions from the backdoor definition above:
1. replace $p(y|x, t, z)$ with $p(y | x, z)$
2. replace $p(z|x, t)$ with $p(z|t)$

This gives: \[p(y|\hat x) = \sum_{t \in T} p(t) \sum_z p(y|x, z)p(z|t)\]

Cleaning up, we can get rid of $T$:

\[p(y|\hat{x}) = \sum_z p(y|x, z)\sum_{t\in T} p(z|t)p(t) = \sum_z p(y| x, z)p(z)\]

1.5.12 Backdoor proof figure

For intuition:

We would be happy if we could condition on the parent $T$, but $T$ is not observed. However we can use $Z$ instead making use of the fact that:

$p(y|x, t, z) = p(y | x, z)$ (since $Z$ blocks)
$p(z|x, t) = p(z|t)$ (since $Z$ is upstream and blocked by parents, $T$)

1.5.13 Adjustment criterion

See Shpitser, VanderWeele, and Robins (2012)

The adjustment criterion is satisfied by $Z$ (relative to $X$, $Y$) if:

no element of $Z$ is a descendant in the mutilated graph of any variable $W\not\in X$ which lies on a proper causal path from $X$ to $Y$
$Z$ blocks all noncausal paths from $X$ to $Y$

Note:

mutilated graph: remove arrows pointing into $X$
proper pathway: A proper causal pathway from $X$ to $Y$ only intersects $X$ at the endpoint

1.5.14 These are different. Simple illustration.

Here $Z$ satisfies the adjustment criterion but not the backdoor criterion:

$Z$ is descendant of $X$ but it is not a descendant of a node on a path from $X$ to $Y$. No harm adjusting for $Z$ here, but not necessary either.

1.5.15 Frontdoor criterion (Pearl)

Consider this DAG:

The relationship between $X$ and $Y$ is confounded by $U$.
However the $X\rightarrow Y$ effect is the product of the $X\rightarrow M$ effect and the $M\rightarrow Y$ effect

Why?

1.5.16 Identification through the front door

If:

$M$ (possibly a set) blocks all directed paths from $X$ to $Y$
there is no backdoor path $X$ to $M$
$X$ blocks all backdoor paths from $M$ to $Y$ and
all ($m,z$) combinations arise with positive probability

Then $\Pr(y| \hat x)$ is identifiable and given by:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

1.5.17 Frontdoor criterion (Proof)

We want to get $\Pr(y | \hat x)$

From the graph the joint distribution of variables is:

\[\Pr(x,m,y,u) = \Pr(u)\Pr(x|u)\Pr(m|x)\Pr(y|m,u)\] If we intervened on $X$ we would have ($\Pr(X = x |u)=1$):

\[\Pr(m,y,u | \hat x) = \Pr(u)\Pr(m|x)\Pr(y|m,u)\] If we sum up over $u$ and $m$ we get:

The first part is fine; the second part however involves $u$ which is unobserved. So we need to get the $u$ out of $\sum_u\left(\Pr(y|m,u)\Pr(u)\right)$.

1.5.18 Frontdoor criterion

Now, from the graph:

$M$ is d-separated from $U$ by $X$:

\[\Pr(u|m, x) = \Pr(u|x)\] 2. $X$ is d-separated from $Y$ by $M$, $U$

\[\Pr(y|x, m, u) = \Pr(y|m,u)\] That’s enough to get $u$ out of $\sum_u\left(\Pr(y|m,u)\Pr(u)\right)$

1.5.19 Frontdoor criterion

\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\sum_u\left(\Pr(y|m,u)\Pr(u|x)\Pr(x)\right)\]

Using the 2 equalities we got from the graph:

\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\sum_u\left(\Pr(y|x,m,u)\Pr(u|x,m)\Pr(x)\right)\]

So:

\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\left(\Pr(y|m,x)\Pr(x)\right)\]

Intuitively: $X$ blocks the back door between $Z$ and $Y$ just as well as $U$ does

1.5.20 Frontdoor criterion

Substituting we are left with:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

(The $'$ is to distinguish the $x$ in the summation from the value of $x$ of interest)

It’s interesting that $x$ remains in the right hand side in the calculation of the $m \rightarrow y$ effect, but this is because $x$ blocks a backdoor from $m$ to $y$

1.5.21 Front foor

Bringing all this together into a claim we have:

If:

$M$ (possibly a set) blocks all directed paths from $X$ to $Y$
there is no backdoor path $X$ to $M$
$X$ blocks all backdoor paths from $M$ to $Y$ and
all ($m,z$) combinations arise with positive probability

Then $\Pr(y| \hat x)$ is identifiable and given by:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

1.5.22 Front foor

This is a very elegant and surprising result
There are not many obvious applications of it however
The conditions would be violated for example if unobserved third things caused both $M$ and $Y$

1.6 In code: Dagitty

There is a package (Textor et al. 2016) for figuring out what to condition on.

library(dagitty)

1.6.1 In code: Dagitty

Define a dag using dagitty syntax:

g <- dagitty("dag{X -> M -> Y ; Z -> X ; Z -> R -> Y}")

There is then a simple command to check whether two sets are d-separated by a third set:

dseparated(g, "X", "Y", "M")

[1] FALSE

dseparated(g, "X", "Y", c("Z","M"))

[1] TRUE

1.6.2 Dagitty: Find adjustment sets

And a simple command to identify the adjustments needed to identify the effect of one variable on another:

adjustmentSets(g, exposure = "X", outcome = "Y")

{ R }
{ Z }

1.6.3 Important Examples : Confounding

Example where $Z$ is correlated with $X$ and $Y$ and is a confounder

1.6.4 Confounding

Example where $Z$ is correlated with $X$ and $Y$ but it is not a confounder

1.6.5 Important Examples : Collider

But controlling can also cause problems. In fact conditioning on a temporally pre-treatment variable could cause problems. Who’d have thunk? Here is an example from Pearl (2005):

1.6.6 Illustration of identification failure from conditioning on a collider

U1 <- rnorm(10000);  U2 <- rnorm(10000)
Z  <- U1+U2
X  <- U2 + rnorm(10000)/2
Y  <- U1*2 + X

lm_robust(Y ~ X) |> tidy() |> kable(digits = 2)

term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
(Intercept)	-0.02	0.02	-1.21	0.23	-0.06	0.01	9998	Y
X	1.02	0.02	56.52	0.00	0.98	1.05	9998	Y

lm_robust(Y ~ X + Z) |> tidy() |> kable(digits = 2)

term	estimate	std.error	statistic	p.value	conf.low	conf.high	df	outcome
(Intercept)	-0.01	0.01	-1.13	0.26	-0.03	0.01	9997	Y
X	-0.34	0.01	-34.98	0.00	-0.36	-0.32	9997	Y
Z	1.67	0.01	220.37	0.00	1.65	1.68	9997	Y

1.6.7 Let’s look at that in dagitty

g <- dagitty("dag{U1 -> Z  ; U1 -> y ; U2 -> Z ; U2 -> x  -> y}")
adjustmentSets(g, exposure = "x", outcome = "y")

{}

isAdjustmentSet(g, "Z", exposure = "x", outcome = "y")

[1] FALSE

isAdjustmentSet(g, NULL, exposure = "x", outcome = "y")

[1] TRUE

Which means, no need to condition on anything.

1.6.8 Collider & Confounder

A bind: from Pearl 1995.

For a solution for a class of related problems see Robins, Hernan, and Brumback (2000)

1.6.9 Let’s look at that in dagitty

g <- dagitty("dag{U1 -> Z  ; U1 -> y ; 
             U2 -> Z ; U2 -> x  -> y; 
             Z -> x}")
adjustmentSets(g, exposure = "x", outcome = "y")

{ U1 }
{ U2, Z }

which means you have to adjust on an unobservable. Here we double check that including or not including “Z” is enough:

isAdjustmentSet(g, "Z", exposure = "x", outcome = "y")

[1] FALSE

isAdjustmentSet(g, NULL, exposure = "x", outcome = "y")

[1] FALSE

1.6.10 Collider & Confounder

So we cannot identify the effect here. But can we still learn about it?

Hall, Ned. 2004. “Two Concepts of Causation.” Causation and Counterfactuals, 225–76.

Halpern, Joseph Y. 2016. Actual Causality. MIT Press.

Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.

Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic books.

Robins, James M, Miguel Angel Hernan, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” LWW.

Shpitser, Ilya, Tyler VanderWeele, and James M Robins. 2012. “On the Validity of Covariate Adjustment for Estimating Causal Effects.” arXiv Preprint arXiv:1203.3515.

Textor, Johannes, Benito van der Zander, Mark S Gilthorpe, Maciej Liśkiewicz, and George TH Ellison. 2016. “Robust Causal Inference Using Directed Acyclic Graphs: The r Package ’Dagitty’.” International Journal of Epidemiology 45 (6): 1887–94. https://doi.org/10.1093/ije/dyw341.

Estimand	Estimator
\(\tau_{ATE} \equiv \mathbb{E}[\tau_i]\)	\(\widehat{\tau}_{ATE} = \sum\nolimits_{x} \frac{w_x}{\sum\nolimits_{j}w_{j}}\widehat{\tau}_x\)
\(\tau_{ATT} \equiv \mathbb{E}[\tau_i \| Z_i = 1]\)	\(\widehat{\tau}_{ATT} = \sum\nolimits_{x} \frac{p_xw_x}{\sum\nolimits_{j}p_jw_j}\widehat{\tau}_x\)
\(\tau_{ATC} \equiv \mathbb{E}[\tau_i \| Z_i = 0]\)	\(\widehat{\tau}_{ATC} = \sum\nolimits_{x} \frac{(1-p_x)w_x}{\sum\nolimits_{j}(1-p_j)w_j}\widehat{\tau}_x\)

Level	Activity	Inquiry
Association	“Seeing”	If I see \(X=1\) should I expect \(Y=1\)?
Intervention	“Doing”	If I set \(X\) to \(1\) should I expect \(Y=1\)?
Counterfactual	“Imagining”	If \(X\) were \(0\) instead of 1, would \(Y\) then be \(0\) instead of \(1\)?

Level	Activity	Inquiry
Association	“Seeing”	\(\Pr(Y=1\|X=1)\)
Intervention	“Doing”	\(\mathbb{E}[\mathbb{I}(Y(1)=1)]\)
Counterfactual	“Imagining”	\(\Pr(Y(1)=1 \& Y(0)=0)\)