Causality

Macartan Humphreys

1 The idea of counterfactual causation

1.1 Potential outcomes and the counterfactual approach

Causation as difference making

1.1.1 Motivation

The intervention-based motivation for understanding causal effects:

  • We want to know if a particular intervention (like aid) caused a particular outcome (like reduced corruption).
  • We need to know:
    1. What happened?
    2. What would the outcome have been if there were no intervention?
  • The problem:
    1. … this is hard
    2. … this is impossible

The problem in 2 is that you need to know what would have happened if things were different. You need information on a counterfactual.

1.1.2 Potential Outcomes

  • For each unit, we assume that there are two post-treatment outcomes: \(Y_i(1)\) and \(Y_i(0)\).
    • \(Y(1)\) is the outcome that would obtain if the unit received the treatment.
    • \(Y(0)\) is the outcome that would obtain if it did not.
  • The causal effect of Treatment (relative to Control) is: \(\tau_i = Y_i(1) - Y_i(0)\)
  • Note:
    • The causal effect is defined at the individual level.
    • There is no “data generating process” or functional form.
    • The causal effect is defined relative to something else, so a counterfactual must be conceivable (did Germany cause the second world war?).
    • Are there any substantive assumptions made here so far?

1.1.3 Potential Outcomes

Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.

1.1.4 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

1.1.5 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

  • A boulder is flying down a mountain. You duck. This saves your life.

  • So the boulder caused the ducking and the ducking caused you to survive.

  • So: did the boulder cause you to survive?

1.1.6 Potential Outcomes

CONNECTEDNESS Say \(A\) causes \(B\) — does that mean that there is a spatiotemporally continuous sequence of causal intermediates?

1.1.7 Potential Outcomes

CONNECTEDNESS Say \(A\) causes \(B\) — does that mean that there is a spatiotemporally continuous sequence of causal intermediates?

  • Person A is planning some action \(Y\); Person B sets out to stop them; Person X intervenes and prevents Person B from stopping Person A. In this case Person A may complete their action, producing \(Y\), without any knowledge that B and X even exist; in particular, B and X need not be anywhere close to the action. So: did X cause Y?

1.1.8 Causal claims: Contribution or attribution?

The counterfactual model is about contribution and attribution in a very specific sense.

  • Focus is on non-rival contributions
  • Focus is on conditional attribution. Not: “what caused \(Y\)?” or “What is the cause of \(Y\)?”, but “did \(X\) cause \(Y\) given all other factors were what they were?”

1.1.9 Causal claims: Contribution or attribution?

Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):

\[Y(0,0) = 0\] \[Y(1,0) = 0\] \[Y(0,1) = 0\] \[Y(1,1) = 1\]

What caused \(Y\)? Which cause was most important?

1.1.10 Causal claims: Contribution or attribution?

The counterfactual model is about attribution in a very conditional sense.

  • This is a problem for research programs that define “explanation” in terms of figuring out the things that cause \(Y\)

  • There are real difficulties conceptualizing what it means to say one cause is more important than another cause. What does that mean?

1.1.11 Causal claims: Contribution or attribution?

Erdogan’s increasing authoritarianism was the most important reason for the attempted coup

  • More important than Turkey’s history of coups?
  • What does that mean?

1.1.12 Causal claims: No causation without manipulation

  • Some seemingly causal claims not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This renders thinking about effects of race and gender difficult
  • What does it mean to say that Aunt Pat voted for Brexit because she is old?

1.1.13 Causal claims: No causation without manipulation

  • Some seemingly causal claims not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This renders thinking about effects of race and gender difficult
  • Compare: What does it mean to say that Southern counties voted for Brexit because they have many old people?

1.1.14 Causal claims: No causation without manipulation

More uncomfortably:

What does it mean to say that the tides are caused by the moon? What exactly do we have to imagine…

1.1.15 Causal claims: Causal claims are everywhere

  • Jack exploited Jill

  • It’s Jill’s fault that bucket fell

  • Jack is the most obstructionist member of Congress

  • Melania Trump stole from Michelle Obama’s speech

  • Activists need causal claims

1.1.16 Causal claims: What is actually seen?

  • We have talked about what’s potential, now what do we observe?
  • Say \(Z_i\) indicates whether the unit \(i\) is assigned to treatment \((Z_i=1)\) or not \((Z_i=0)\). It describes the treatment process. Then what we observe is: \[ Y_i = Z_iY_i(1) + (1-Z_i)Y_i(0) \]

This is sometimes called a “switching equation”

In DeclareDesign, \(Y\) is realised from potential outcomes and assignment in this way using reveal_outcomes.
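
A minimal base-R sketch of the same idea (the potential outcomes below are made up for illustration, not DeclareDesign code): the switching equation simply selects the relevant potential outcome for each unit.

```r
# Potential outcomes are fixed; randomness enters only through Z.
Y0 <- c(0, 0, 1, 1)          # hypothetical Y_i(0) values
Y1 <- c(1, 0, 1, 0)          # hypothetical Y_i(1) values
Z  <- c(1, 0, 1, 0)          # a treatment assignment
Y  <- Z * Y1 + (1 - Z) * Y0  # the switching equation
Y                            # observed outcomes: 1 0 1 1
```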

1.1.17 Causal claims: What is actually seen?

  • Say \(Z\) is a random variable; then this is a sort of data generating process. BUT the key things to note are:

    • \(Y_i\) is random but the randomness comes from \(Z_i\) — the potential outcomes, \(Y_i(1)\), \(Y_i(0)\) are fixed
    • Compare this to a regression approach in which \(Y\) is random but the \(X\)’s are fixed. eg: \[ Y \sim N(\beta X, \sigma^2) \text{ or } Y=\alpha+\beta X+\epsilon, \epsilon\sim N(0, \sigma^2) \]

1.1.18 Causal claims: The estimand and the rub

  • The causal effect of Treatment (relative to Control) is: \[\tau_i = Y_i(1) - Y_i(0)\]
  • This is what we want to estimate.
  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)!
  • This is the fundamental problem (Holland (1986))

1.1.19 Causal claims: The rub and the solution

  • Now for some magic. We really want to estimate: \[ \tau_i = Y_i(1) - Y_i(0)\]

  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)

  • Say we lower our sights and try to estimate an average treatment effect: \[ \tau = \mathbb{E} [Y(1)-Y(0)]\]

  • Now make use of the fact that \[\mathbb E[Y(1)-Y(0)] = \mathbb E[Y(1)]- \mathbb E [Y(0)] \]

  • In words: The average of differences is equal to the difference of averages.

  • The magic is that while we can’t hope to measure the differences; we are good at measuring averages.

1.1.20 Causal claims: The rub and the solution

  • So we want to estimate \(\mathbb{E} [Y(1)]\) and \(\mathbb{E} [Y(0)]\).
  • We know that we can estimate averages of a quantity by taking the average value from a random sample of units
  • To do this here we need to select a random sample of the \(Y(1)\) values and a random sample of the \(Y(0)\) values, in other words, we randomly assign subjects to treatment and control conditions.
  • When we do that we can in fact estimate: \[ \mathbb{E}_N[Y_i(1) | Z_i = 1] - \mathbb{E}_N[Y_i(0) | Z_i = 0]\] which in expectation equals: \[ \mathbb{E} [Y_i(1) | Z_i = 1 \text{ or } Z_i = 0] - \mathbb{E} [Y_i(0) | Z_i = 1 \text{ or } Z_i = 0]\]
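
A simulation sketch of this logic, with invented potential outcomes and a hypothetical helper `diff_in_means`: over repeated random assignments, the difference in means centers on the true average effect, even though no unit-level effect is ever observed.

```r
set.seed(1)
n  <- 100
Y0 <- rnorm(n)
Y1 <- Y0 + rnorm(n, mean = 0.5)          # heterogeneous unit effects
true_ATE <- mean(Y1 - Y0)

diff_in_means <- function() {
  Z <- sample(rep(c(0, 1), n / 2))       # complete random assignment
  Y <- Z * Y1 + (1 - Z) * Y0             # switching equation
  mean(Y[Z == 1]) - mean(Y[Z == 0])
}
estimates <- replicate(5000, diff_in_means())
c(true_ATE = true_ATE, average_estimate = mean(estimates))  # close
```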

1.1.21 Causal claims: The rub and the solution

  • This highlights a deep connection between random assignment and random sampling: when we do random assignment we are in fact randomly sampling from different possible worlds.

1.1.22 Causal claims: The rub and the solution

This provides a positive argument for causal inference from randomization, rather than simply saying with randomization “everything else is controlled for”

Let’s discuss:

  • Does the fact that an estimate is unbiased mean that it is right?
  • Can a randomization “fail”?
  • Where are the covariates?

1.1.23 Causal claims: The rub and the solution

Idea: random assignment is random sampling from potential worlds: to understand anything you find, you need to know the sampling weights

1.1.24 Reflection

Idea: We now have a positive argument for claiming unbiased estimation of the average treatment effect following random assignment

But is the average treatment effect a quantity of social scientific interest?

1.1.25 Potential outcomes: why randomization works

The average of the differences \(\approx\) difference of averages

1.1.26 Potential outcomes: heterogeneous effects

The average of the differences \(\approx\) difference of averages

1.1.27 Potential outcomes: heterogeneous effects

Question: \(\approx\) or \(=\)?

1.1.28 Exercise your potential outcomes 1

Consider the following potential outcomes table:

| Unit | Y(0) | Y(1) | \(\tau_i\) |
|------|------|------|------------|
| 1    | 4    | 3    |            |
| 2    | 2    | 3    |            |
| 3    | 1    | 3    |            |
| 4    | 1    | 3    |            |
| 5    | 2    | 3    |            |

Questions for us: What are the unit level treatment effects? What is the average treatment effect?
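
To check answers, a quick base-R computation using the table’s values:

```r
Y0 <- c(4, 2, 1, 1, 2)
Y1 <- c(3, 3, 3, 3, 3)
Y1 - Y0        # unit-level treatment effects
mean(Y1 - Y0)  # average treatment effect
```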

1.1.29 Exercise your potential outcomes 2

Consider the following potential outcomes table:

| In treatment? | Y(0) | Y(1) |
|---------------|------|------|
| Yes           |      | 2    |
| No            | 3    |      |
| No            | 1    |      |
| Yes           |      | 3    |
| Yes           |      | 3    |
| No            | 2    |      |

Questions for us: Fill in the blanks.

  • Assuming a constant treatment effect of \(+1\)
  • Assuming a constant treatment effect of \(-1\)
  • Assuming an average treatment effect of \(0\)

What is the actual treatment effect?
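
A sketch of the imputation logic for the first two assumptions (here `tau` is the assumed constant effect; the real counterfactuals remain unknown):

```r
Z    <- c(1, 0, 0, 1, 1, 0)            # in treatment?
Yobs <- c(2, 3, 1, 3, 3, 2)            # observed outcomes
tau  <- 1                              # assumed constant effect; try -1 too
Y1 <- ifelse(Z == 1, Yobs, Yobs + tau) # observed or imputed Y(1)
Y0 <- ifelse(Z == 0, Yobs, Yobs - tau) # observed or imputed Y(0)
cbind(Z, Y0, Y1)
```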

1.2 Pause

Take a short break!

1.3 Endogenous subgroups

1.3.1 Endogenous Subgroups

Experiments often give rise to endogenous subgroups. The potential outcomes framework can make it clear why this can cause problems.

We are going to look at three examples in which you might be tempted to condition on an endogenous subgroup:

  1. focusing on “reporting” subgroups
  2. handling attrition
  3. analyzing multistage behavior

[We will split up and try to make sense of each of these in groups]

1.3.2 Endogenous Subgroups

The general problem:

  • usually has the form \(Y(X_1, X_2)\): outcomes depend on multiple features, but you condition on the observed or revealed value of just one

  • usually requires heterogeneity

  • involves conditioning on some post-treatment feature that varies across types

1.3.3 Heterogeneous Effects with Endogenous Categories

  • Problems arise in analyses of subgroups when the categories themselves are affected by treatment

  • Example from our work:

    • You want to know if an intervention affects reporting on violence against women
    • You measure the share of all subjects that experienced violence that file reports
    • The problem is that which subjects experienced violence is itself a function of treatment

1.3.4 Heterogeneous Effects with Endogenous Categories

  • V(t): Violence(Treatment)
  • R(t, v): Reporting(Treatment, Violence)
| Type                  | V(0) | V(1) | R(0,1) | R(1,1) | R(0,0) | R(1,0) |
|-----------------------|------|------|--------|--------|--------|--------|
| Type 1 (reporter)     | 1    | 1    | 1      | 1      | 0      | 0      |
| Type 2 (non-reporter) | 1    | 0    | 0      | 0      | 0      | 0      |
  • Expected reporting given violence in control = Pr(Type 1) (explanation: both types see violence but only Type 1 reports)

  • Expected reporting given violence in treatment = 100% (explanation: only Type 1 sees violence and this type also reports)

So you might infer a large effect on violence reporting.

Question: What is the actual effect of treatment on the propensity to report violence?

1.3.5 Heterogeneous Effects with Endogenous Categories

It is possible that in truth no one’s reporting behavior has changed; what has changed is which types of people (reporters or non-reporters) experience violence:

| Condition | Reporter | No violence | Violence | % report given violence   |
|-----------|----------|-------------|----------|---------------------------|
| Control   | Yes      | 25          | 25       | \(\frac{25}{25+25}=50\%\) |
| Control   | No       | 25          | 25       |                           |
| Treatment | Yes      | 25          | 25       | \(\frac{25}{25+0}=100\%\) |
| Treatment | No       | 50          | 0        |                           |
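
As an arithmetic check on these counts: reporting-given-violence differs sharply across arms even though every individual’s reporting behavior is fixed.

```r
violence_control   <- c(reporters = 25, non_reporters = 25)
violence_treatment <- c(reporters = 25, non_reporters = 0)
violence_control["reporters"]   / sum(violence_control)    # 0.5
violence_treatment["reporters"] / sum(violence_treatment)  # 1
```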

1.3.6 Heterogeneous Effects with Endogenous Categories

This problem can arise as easily in seemingly simple field experiments. Example:

  • In one study we provided constituents with information about performance of politicians
  • We told politicians in advance so that they could take action
  • We wanted to see whether voters punished poorly performing politicians

What’s the problem?

1.3.7 Endogenous Categories: Test yourself

Question for us:

  • Quotas for women are randomly placed in a set of constituencies in year 1. All winners in these areas are women; in other areas only some are.
  • In year 2 these quotas are then lifted.

Which problems face an endogenous subgroup issue?

1.3.8 Endogenous Categories: Test yourself

Which problems face an endogenous subgroup issue?

  1. You want to estimate the likelihood that a woman will stand for reelection in treatment versus control areas in year 2.
  2. You want to estimate whether incumbents are more likely to be reelected in treatment versus control areas in year 2.
  3. You want to estimate how many more incumbents are re-elected in year 2 elections in treatment areas compared to control areas.

1.3.9 Endogenous Categories: Responses

In such cases you can:

  • Examine the joint distribution of multiple outcomes
  • Condition on pretreatment features only
  • Engage in mediation analysis

1.3.10 Missing data can create an endogenous subgroup problem

  • It is well known that missing data can undo the magic of random assignment.
  • One seemingly promising approach is to match into pairs ex ante and drop pairs together ex post.
  • Say potential outcomes looked like this (2 pairs of 2 units):
| Pair     | I  | I | II | II |         |
|----------|----|---|----|----|---------|
| Unit     | 1  | 2 | 3  | 4  | Average |
| Y(0)     | 0  | 0 | 0  | 0  |         |
| Y(1)     | -3 | 1 | 1  | 1  |         |
| \(\tau\) | -3 | 1 | 1  | 1  | 0       |

1.3.11 Missing data

  • Say though that treated cases are likely to drop out of the sample if things go badly (e.g. they get a negative score or die)
  • Then you might see no attrition if those would-be attritors are not treated.
  • You might assume you have no problem (after all, no attrition).
  • No missing data arises when the normal cases happen to be selected for treatment:

| Pair           | I | I | II | II |         |
|----------------|---|---|----|----|---------|
| Unit           | 1 | 2 | 3  | 4  | Average |
| Y(0)           | 0 |   | 0  |    | 0       |
| Y(1)           |   | 1 |    | 1  | 1       |
| \(\hat{\tau}\) |   |   |    |    | 1       |

1.3.12 Missing data

  • But in cases in which you have attrition, dropping the pair doesn’t necessarily help.
  • The problem is potential missingness still depends on potential outcomes
  • The kicker is that the method can produce bias even if (in fact) there is no attrition!
  • But data go missing when the vulnerable case happens to be selected for treatment (bracketed cells are dropped with their pair):

| Pair           | I    | I   | II | II |         |
|----------------|------|-----|----|----|---------|
| Unit           | 1    | 2   | 3  | 4  | Average |
| Y(0)           |      | [0] | 0  |    | 0       |
| Y(1)           | [-3] |     |    | 1  | 1       |
| \(\hat{\tau}\) |      |     |    |    | 1       |

1.3.13 Missing data

Note: The right way to think about this is that bias is a property of the strategy over possible realizations of data and not normally a property of the estimator conditional on the data.
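
A small simulation makes the point, using the potential outcomes above and assuming, as in the slides, that a treated unit attrits exactly when its treated outcome is negative (`estimate` is an illustrative helper):

```r
Y0 <- c(0, 0, 0, 0)
Y1 <- c(-3, 1, 1, 1)                      # true ATE = 0
pair <- c(1, 1, 2, 2)

estimate <- function(Z) {
  miss <- Z == 1 & Y1 < 0                 # treated unit attrits if Y(1) < 0
  keep <- !(pair %in% pair[miss])         # drop the whole pair
  Y <- ifelse(Z == 1, Y1, Y0)             # switching equation
  mean(Y[keep & Z == 1]) - mean(Y[keep & Z == 0])
}
# The four equal-probability assignments (one treated unit per pair):
Zs <- list(c(1,0,1,0), c(1,0,0,1), c(0,1,1,0), c(0,1,0,1))
sapply(Zs, estimate)  # 1 1 1 1: expected estimate is 1, true ATE is 0
```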

1.3.14 Multistage games

Multistage games can also present an endogenous group problem since collections of late stage players facing a given choice have been created by early stage players.

1.3.15 Multistage games

Question: Does visibility alter the extent to which subjects follow norms to punish antisocial behavior (and reward prosocial behavior)? Consider a trust game in which we are interested in how information about responders affects behavior.

Table 1: Return rates given investments under different conditions.

| Visibility treatment                       | % invested (average) | Avg. % returned when 10% invested | Avg. % returned when 50% invested |
|--------------------------------------------|----------------------|-----------------------------------|-----------------------------------|
| Control: Masked information on respondents | 30%                  | 20%                               | 40%                               |
| Treatment: Full information on respondents | 30%                  | 0%                                | 60%                               |

What do we think? Does visibility make people react more to investments?

1.3.16 Multistage games

Imagine you could see all the potential outcomes, and they looked like this:

Table 2: Potential outcomes with (and without) identity protection. Cells give the responder’s return decision (given type).

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|------------------|--------|--------|--------|--------|--------|--------|------|
| Invest 10%       | 60%    | 60%    | 60%    | 0%     | 0%     | 0%     | 30%  |
| Invest 50%       | 60%    | 60%    | 60%    | 0%     | 0%     | 0%     | 30%  |

Conclusion: Both the offer and the information condition are completely irrelevant for all subjects.

1.3.17 Multistage games

Unfortunately you only see a sample of the potential outcomes, and that looks like this:

Table 3: Outcomes when respondent is visible. Cells give the responder’s return decision (given type).

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|------------------|--------|--------|--------|--------|--------|--------|------|
| Invest 10%       |        |        |        | 0%     | 0%     | 0%     | 0%   |
| Invest 50%       | 60%    | 60%    | 60%    |        |        |        | 60%  |

False Conclusion: When not protected, responders condition behavior strongly on offers (because offerers can select on type accurately)

In fact: The nice types invest more because they are nice. The responders return more to the nice types because they are nice.

1.3.18 Multistage games

Unfortunately you only see a (noisier!) sample of the potential outcomes, and that looks like this:

Table 4: Outcomes when respondent is not visible. Cells give the responder’s return decision (given type).

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|------------------|--------|--------|--------|--------|--------|--------|------|
| Invest 10%       | 60%    |        |        | 0%     | 0%     |        | 20%  |
| Invest 50%       |        | 60%    | 60%    |        |        | 0%     | 40%  |

False Conclusion: When protected, responders condition behavior less strongly on offers (because offerers can select on type less accurately)

1.3.19 Multistage games

What to do?

Solutions?

  1. Analysis could focus on the effect of treatment on respondent behavior, directly.
    • This would get the correct answer, but to a different question (does information affect the share of contributions returned by subjects on average?)
  2. The strategy method can sometimes help address the problem, but note that it (a) changes the question and (b) puts demands on respondents’ imagination and honesty
  3. First-mover actions could be directly manipulated, but unless deception is used that also changes the question
  4. First movers could be selected because they act in predictable ways (bordering on deception?)

Take away: Proceed with extreme caution when estimating effects beyond the first stage.

2 Estimands

The inquiries you have…

2.1 Estimands and inquiries

  • Your inquiry is your question and the estimand is the true (generally unknown) answer to the inquiry
  • The estimand is the thing you want to estimate
  • If you are estimating something you should be able to say what your estimand is
  • You are responsible for your estimand. Your estimator will not tell you what your estimand is
  • Just because you can calculate something does not mean that you have an estimand
  • You can test a hypothesis without having an estimand

Read: II ch 4, DD, ch 7

2.1.1 Estimands: ATE, ATT, ATC, S-, P-

  • ATE is Average Treatment Effect (all units)
  • ATT is Average Treatment Effect on the Treated
  • ATC is Average Treatment Effect on the Controls

2.1.2 Estimands: ATE, ATT, ATC, S-, P-

Say that units are randomly assigned to treatment in different strata (maybe just one), with fixed, though possibly different, shares assigned in each stratum. Then the key estimands and estimators are:

| Estimand | Estimator |
|----------|-----------|
| \(\tau_{ATE} \equiv \mathbb{E}[\tau_i]\) | \(\widehat{\tau}_{ATE} = \sum\nolimits_{x} \frac{w_x}{\sum\nolimits_{j}w_{j}}\widehat{\tau}_x\) |
| \(\tau_{ATT} \equiv \mathbb{E}[\tau_i \mid Z_i = 1]\) | \(\widehat{\tau}_{ATT} = \sum\nolimits_{x} \frac{p_xw_x}{\sum\nolimits_{j}p_jw_j}\widehat{\tau}_x\) |
| \(\tau_{ATC} \equiv \mathbb{E}[\tau_i \mid Z_i = 0]\) | \(\widehat{\tau}_{ATC} = \sum\nolimits_{x} \frac{(1-p_x)w_x}{\sum\nolimits_{j}(1-p_j)w_j}\widehat{\tau}_x\) |

where \(x\) indexes strata, \(p_x\) is the share of units in each stratum that is treated, and \(w_x\) is the size of a stratum.
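
As a sketch with made-up numbers for two strata, these estimators are just differently weighted averages of the stratum-level estimates:

```r
tau_x <- c(1.0, 2.0)      # estimated effects in strata A and B (invented)
w_x   <- c(100, 50)       # stratum sizes
p_x   <- c(0.2, 0.5)      # share treated within each stratum

ATE <- sum(w_x * tau_x) / sum(w_x)
ATT <- sum(p_x * w_x * tau_x) / sum(p_x * w_x)
ATC <- sum((1 - p_x) * w_x * tau_x) / sum((1 - p_x) * w_x)
round(c(ATE = ATE, ATT = ATT, ATC = ATC), 2)  # 1.33, 1.56, 1.24
```

The ATT upweights strata where a larger share of units is treated; the ATC does the reverse.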

2.1.3 Estimands: ATE, ATT, ATC, S-, P-, C-

In addition, each of these can be targets of interest:

  • for the population, in which case we refer to PATE, PATT, PATC and \(\widehat{PATE}, \widehat{PATT}, \widehat{PATC}\)
  • for a sample, in which case we refer to SATE, SATT, SATC, and \(\widehat{SATE}, \widehat{SATT}, \widehat{SATC}\)

And for different subgroups,

  • given some value on a covariate, in which case we refer to CATE (conditional average treatment effect)

2.1.4 Broader classes of estimands: LATE/CATE

The CATEs are conditional average treatment effects, for example the effect for men or for women. These are straightforward.

However we might also imagine conditioning on unobservable or counterfactual features.

  • The LATE (or CACE: “complier average causal effect”) asks about the effect of a treatment (\(X\)) on an outcome (\(Y\)) for people that are responsive to an encouragement (\(Z\))

\[LATE = \frac{1}{|C|}\sum_{j\in C}(Y_j(X=1) - Y_j(X=0))\] \[C:=\{j:X_j(Z=1) > X_j(Z=0) \}\]

We will return to these in the study of instrumental variables.
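
A toy illustration with invented potential treatments and outcomes: the LATE is just the average effect within the complier set \(C\).

```r
X0 <- c(0, 0, 0, 1, 1)     # treatment taken when Z = 0
X1 <- c(1, 1, 0, 1, 1)     # treatment taken when Z = 1
complier <- X1 > X0        # units 1 and 2
Y0 <- c(0, 0, 1, 1, 0)     # Y(X = 0)
Y1 <- c(1, 0, 1, 1, 1)     # Y(X = 1)
mean((Y1 - Y0)[complier])  # LATE = 0.5
```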

2.1.5 Quantile estimands

Other ways to condition on potential outcomes:

  • A quantile treatment effect: You might be interested in the difference between the median \(Y(1)\) and the median \(Y(0)\) (Imbens and Rubin (2015) 20.3.1)
  • or even be interested in the median \(Y(1) - Y(0)\). Similarly for other quantiles.
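
These two quantile estimands need not coincide; a three-unit illustration with invented values:

```r
Y0 <- c(0, 1, 2)
Y1 <- c(3, 1, 2)
median(Y1) - median(Y0)  # difference of medians: 1
median(Y1 - Y0)          # median of differences: 0
```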

2.1.6 Model estimands

Many inquiries are averages of individual effects, even if the groups are not known, but they do not have to be:

  • The RDD estimand is a statement about what effects would be at a threshold; it can be defined under a model even if no actual individuals are at the threshold. We imagine average potential outcomes as a function of treatment \(Z\) and running variable \(X\), \(f(z, x)\) and define: \[\tau_{RDD} := f(1, x^*) - f(0, x^*)\]

2.1.7 Distribution estimands

Many inquiries are averages of individual effects, even if the groups are not known. But they do not have to be:

  • Inquiries might relate to distributional quantities such as:

    • The effect of treatment on the variance in outcomes: \(var(Y(1)) - var(Y(0))\)
    • The variance of treatment effects: \(var(Y(1) - Y(0))\)
    • Other inequality measures (e.g. Ginis; Imbens and Rubin (2015), 20.3.2)

You might even be interested in \(\min(Y_i(1) - Y_i(0))\).

2.1.8 Spillover estimands

There are lots of interesting “spillover” estimands.

Imagine there are three individuals and each person’s outcome depends on the assignments of all others. For instance \(Y_1(Z_1, Z_2, Z_3)\), or more generally, \(Y_i(Z_i, Z_{i+1 (\text{mod }3)}, Z_{i+2 (\text{mod }3)})\).

Then three estimands might be:

  • \(\frac13\left(\sum_{i}{Y_i(1,0,0) - Y_i(0,0,0)}\right)\)
  • \(\frac13\left(\sum_{i}{Y_i(1,1,1) - Y_i(0,0,0)}\right)\)
  • \(\frac13\left(\sum_{i}{Y_i(0,1,1) - Y_i(0,0,0)}\right)\)

Interpret these. What others might be of interest?
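
To fix ideas, here is one hypothetical response function, assumed the same for all three units, under which the three estimands take distinct values:

```r
# Y_i as a function of own treatment and the two others' treatments
Y <- function(own, n1, n2) own + 0.5 * n1 + 0.25 * n2

Y(1, 0, 0) - Y(0, 0, 0)  # direct effect of own treatment alone:  1
Y(1, 1, 1) - Y(0, 0, 0)  # total effect of treating everyone:     1.75
Y(0, 1, 1) - Y(0, 0, 0)  # pure spillover from others' treatment: 0.75
```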

2.1.9 Estimands for a continuous treatment

Say our treatment \(X\) varies continuously over \([0,1]\) and is randomly assigned.

What estimand captures “the average effect of \(X\) on \(Y\)”?

2.1.10 Estimands for a continuous treatment

In class exercise:

  • Write down potential outcomes for two units that can take on up to 5 values of X.

  • Assume a nonlinear but (largely?) homogeneous relationship (up to a constant, for example)

  • Define your estimand in terms of the potential outcomes

  • Imagine the multiple plot results you might get from observing two units.

2.1.11 Differences in CATEs and interaction estimands

A difference in CATEs is a well-defined estimand that might involve interventions on one node only:

  • \(\mathbb{E}_{\{W=1\}}[Y(X=1) - Y(X=0)] - \mathbb{E}_{\{W=0\}}[Y(X=1) - Y(X=0)]\)

It captures differences in effects.

An interaction is an effect on an effect:

  • \(\mathbb{E}[Y(X=1, W=1) - Y(X=0, W=1)] - \mathbb{E}[Y(X=1, W=0) - Y(X=0, W=0)]\)

Note in the latter the expectation is taken over the whole population.

Do not mix these up!
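
A sketch of how the two can diverge (all values invented): effects are driven by a background trait \(u\); \(W\) tracks \(u\) but has no interactive effect of its own.

```r
u <- c(0, 0, 1, 1)                     # unobserved effect modifier
W <- c(0, 0, 1, 1)                     # observed moderator, tracks u
Y <- function(x, w) x * u              # w itself does nothing

tau_i   <- Y(1, W) - Y(0, W)           # unit-level effects
d_cates <- mean(tau_i[W == 1]) - mean(tau_i[W == 0])         # 1
aie     <- mean(Y(1, 1) - Y(0, 1)) - mean(Y(1, 0) - Y(0, 0)) # 0
c(difference_in_CATEs = d_cates, interaction = aie)
```

The difference in CATEs is 1, yet intervening on \(W\) changes no one’s effect: the interaction is 0.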

2.2 Interaction estimands

Consider a binary treatment \(D\), randomized, and an interest in the way \(X\) (possibly randomized) moderates the effect of \(D\).

2.3 Interaction estimands

The estimands can be seen in the following graphs:

  • Black points show the actual marginal effects
  • The BLP (blue line) is positively sloped
  • The “average modification” effect is 0: this is the difference between the leftmost and rightmost marginal effects
  • The interprobe fitted values are in red
  • The CATEs within bins are marked in green
  • The effect at the lowest \(X\) is the leftmost black point

2.4 Interaction estimands

| Label | Inquiry | Meaning | Formal definition |
|-------|---------|---------|-------------------|
| ATE | \(\text{ATE}\) | Average treatment effect | \(\mathbb{E}[Y_1 - Y_0]\) |
| BLP | \(\text{BLP}\) | Best linear predictor | \(b \text{ from } \arg\min_{a, b} \mathbb{E}[((Y_1 - Y_0) - (a + bX))^2]\) |
| CATE_min | \(\text{CATE}_{\min}\) | CATE at min \(x\) | \(\mathbb{E}[Y_1 - Y_0 \mid X = \min(X)]\) |
| CATE_L | \(\text{CATE}_{\text{L}}\) | CATE for the low group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{L}]\) |
| CATE_M | \(\text{CATE}_{\text{M}}\) | CATE for the medium group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{M}]\) |
| CATE_H | \(\text{CATE}_{\text{H}}\) | CATE for the high group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{H}]\) |
| D_CATE | \(\Delta_{\text{CATE}}\) | Difference in CATEs (H v. L) | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{H}] - \mathbb{E}[Y_1 - Y_0 \mid X \in \text{L}]\) |
| tau_CATE | \(\tau_{\text{CATE}}\) | Effect of group on group CATEs (H v. L) | \(\mathbb{E}[(Y(1, x_H^*) - Y(0, x_H^*)) - (Y(1, x_L^*) - Y(0, x_L^*))]\) |
| ADC | \(\text{ADC}\) | Average difference in CATEs | \(\frac{\mathbb{E}[Y_1 - Y_0 \mid X = x + \delta] - \mathbb{E}[Y_1 - Y_0 \mid X = x]}{\delta}\) |
| AIE | \(\text{AIE}\) | Average interaction effect | \(\mathbb{E}\left[\frac{(Y(1, x+\delta) - Y(0, x+\delta)) - (Y(1, x) - Y(0, x))}{\delta}\right]\) |

2.4.1 Mediation estimands and complex counterfactuals

Say \(X\) can affect \(Y\) directly, or indirectly through \(M\). Then we can write potential outcomes as:

  • \(Y(X=x, M=m)\)
  • \(M(X=x)\)

We can then imagine inquiries of the form:

  • \(Y(X=1, M=M(X=1)) - Y(X=0, M=M(X=0))\)
  • \(Y(X=1, M=1) - Y(X=0, M=1)\)
  • \(Y(X=1, M=M(X=1)) - Y(X=1, M=M(X=0))\)

Interpret these. What others might be of interest?
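
A sketch with illustrative potential-outcome functions for a single unit, matching the three inquiries above in order:

```r
M <- function(x) x                      # M(X = x): X fully determines M
Y <- function(x, m) 0.5 * x + 0.5 * m   # Y(X = x, M = m)

Y(1, M(1)) - Y(0, M(0))  # total effect:               1
Y(1, 1)    - Y(0, 1)     # controlled direct effect:   0.5
Y(1, M(1)) - Y(1, M(0))  # indirect (mediated) effect: 0.5
```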

2.4.2 Mediation estimands and complex counterfactuals

Again we might imagine that these are defined with respect to some group:

  • \(A = \{i \mid Y_i(1, M(X=1)) > Y_i(0, M(X=0))\}\)
  • \(\frac{1}{|A|} \sum_{i\in A}\mathbb{1}\left(Y_i(1, 1) > Y_i(0, 1)\right)\)

Here, among those for whom \(X\) has a positive effect on \(Y\), for what share would there be a positive effect if \(M\) were fixed at 1?

2.4.3 Causes of effects and effects of causes

In qualitative research a particularly common inquiry is “did \(X=1\) cause \(Y=1\)?”

This is often given as a probability, the “probability of causation” (though at the case level we might better think of this probability as an estimate rather than an estimand):

\[\Pr(Y_i(0) = 0 | Y_i(1) = 1, X = 1)\]

2.4.4 Causes of effects and effects of causes

Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:

|     | Y=0  | Y=1  | All |
|-----|------|------|-----|
| X=0 | 1    | 0    | 1   |
| X=1 | 0.25 | 0.75 | 1   |

2.4.5 Causes of effects and effects of causes

Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:

|     | Y=0  | Y=1  | All |
|-----|------|------|-----|
| X=0 | 0.75 | 0.25 | 1   |
| X=1 | 0.25 | 0.75 | 1   |
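
Under the additional assumption of monotonicity (treatment never destroys the outcome), the probability of causation has a closed form; without monotonicity this is a lower bound. It matches the intuitions from the two tables:

```r
# Pr(Y(0)=0 | Y(1)=1, X=1) = 1 - Pr(Y=1|X=0) / Pr(Y=1|X=1), given monotonicity
prob_causation <- function(p1, p0) 1 - p0 / p1
prob_causation(0.75, 0)     # first table:  1
prob_causation(0.75, 0.25)  # second table: 2/3
```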

2.4.6 Actual causation

Other inquiries focus on distinguishing between causes.

For the Billy Suzy problem (Hall 2004), Halpern (2016) focuses on “actual causation” as a way to distinguish between Suzy and Billy:

Imagine Suzy and Billy, simultaneously throwing stones at a bottle. Both are excellent shots and hit whatever they aim at. Suzy’s stone hits first, knocks over the bottle, and the bottle breaks. However, Billy’s stone would have hit had Suzy’s not hit, and again the bottle would have broken. Did Suzy’s throw cause the bottle to break? Did Billy’s?

2.4.7 Actual causation

Actual Causation:

  1. \(X=x\) and \(Y=y\) both happened;
  2. there is some set of variables, \(\mathcal W\), such that if they were fixed at the levels that they actually took on in the case, and if \(X\) were to be changed, then \(Y\) would change (where \(\mathcal W\) can also be an empty set);
  3. no strict subset of \(X\) satisfies 1 and 2 (there is no redundant part of the condition, \(X=x\)).

2.4.8 Actual causation

  • Suzy: Condition 2 is met if Suzy’s throw made a difference, counterfactually speaking, with the important caveat that, in determining this, we are permitted to condition on Billy’s stone not hitting the bottle.
  • Billy: Condition 2 is not met.

An inquiry: for what share in a population is a possible cause an actual cause?

2.4.9 Pearl’s ladder

Pearl (e.g. Pearl and Mackenzie (2018)) describes three types of inquiry:

| Level          | Activity    | Inquiry |
|----------------|-------------|---------|
| Association    | “Seeing”    | If I see \(X=1\) should I expect \(Y=1\)? |
| Intervention   | “Doing”     | If I set \(X\) to \(1\) should I expect \(Y=1\)? |
| Counterfactual | “Imagining” | If \(X\) were \(0\) instead of \(1\), would \(Y\) then be \(0\) instead of \(1\)? |

2.4.10 Pearl’s ladder

We can understand these as asking different types of questions about a causal model

| Level          | Activity    | Inquiry |
|----------------|-------------|---------|
| Association    | “Seeing”    | \(\Pr(Y=1 \mid X=1)\) |
| Intervention   | “Doing”     | \(\mathbb{E}[\mathbb{I}(Y(1)=1)]\) |
| Counterfactual | “Imagining” | \(\Pr(Y(1)=1 \& Y(0)=0)\) |

The third is qualitatively different because it requires information about two mutually incompatible conditions for the same units. This is not (generally) recoverable directly from knowledge of \(\Pr(Y(1)=1)\) and \(\Pr(Y(0)=0)\).

2.4.11 Recap: Things you need to know about causal inference

  1. A causal claim is a statement about what didn’t happen.
  2. If you know that \(A\) causes \(B\) and that \(B\) causes \(C\), this does not mean that you know that \(A\) causes \(C\).
  3. There is no causation without manipulation.
  4. There is a fundamental problem of causal inference.
  5. You can estimate average causal effects even if you cannot observe any individual causal effects.
  6. Estimating average causal effects via differences in means does not require that treatment and control groups are identical.
  7. Estimating average causal effects via differences in means is fraught when you condition on post-treatment variables or on colliders.

3 References

Hall, Ned. 2004. “Two Concepts of Causation.” Causation and Counterfactuals, 225–76.
Halpern, Joseph Y. 2016. Actual Causality. MIT Press.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60.
Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.