| Visibility Treatment | % invested (average) | ...when 10% invested | ...when 50% invested |
|---|---|---|---|
| Control: Masked information on respondents | 30% | 20% | 40% |
| Treatment: Full information on respondents | 30% | 0% | 60% |
Causation as difference making
The intervention-based motivation for understanding causal effects:
The problem in 2 is that you need to know what would have happened if things were different. You need information on a counterfactual.
Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.
Now that we have a concept of causal effects available, let’s answer two questions:
TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?
A boulder is flying down a mountain. You duck. This saves your life.
So the boulder caused the ducking and the ducking caused you to survive.
So: did the boulder cause you to survive?
CONNECTEDNESS Say \(A\) causes \(B\) — does that mean that there is a spatiotemporally continuous sequence of causal intermediates?
The counterfactual model is about contribution and attribution in a very specific sense.
Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):
\[Y(0,0) = 0\] \[Y(1,0) = 0\] \[Y(0,1) = 0\] \[Y(1,1) = 1\]
What caused \(Y\)? Which cause was most important?
The counterfactual model is about attribution in a very conditional sense.
This is a problem for research programs that define “explanation” in terms of identifying the things that cause \(Y\).
There are real difficulties in conceptualizing what it means to say that one cause is more important than another. What does that mean?
Erdogan’s increasing authoritarianism was the most important reason for the attempted coup
More uncomfortably:
What does it mean to say that the tides are caused by the moon? What exactly do we have to imagine…
Jack exploited Jill
It’s Jill’s fault that the bucket fell
Jack is the most obstructionist member of Congress
Melania Trump stole from Michelle Obama’s speech
Activists need causal claims
This is sometimes called a “switching equation”
In DeclareDesign \(Y\) is realised from potential outcomes and assignment in this way using reveal_outcomes
Say \(Z\) is a random variable, then this is a sort of data generating process. BUT the key thing to note is that the randomness enters only through \(Z\): the potential outcomes themselves are fixed.
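In code, the switching equation \(Y_i = Z_iY_i(1) + (1-Z_i)Y_i(0)\) can be sketched as follows (the potential-outcome values and seed are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented potential outcomes for 6 units (fixed, not random)
Y0 = np.array([0, 0, 1, 1, 0, 1])
Y1 = np.array([1, 0, 1, 0, 1, 1])

# Random assignment: the only source of randomness
Z = rng.integers(0, 2, size=6)

# Switching equation: assignment reveals one potential outcome per unit
Y = Z * Y1 + (1 - Z) * Y0
print(Y)
```

This plays the same role as `reveal_outcomes` in DeclareDesign: observed outcomes are a deterministic function of fixed potential outcomes and a random assignment.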
Now for some magic. We really want to estimate: \[ \tau_i = Y_i(1) - Y_i(0)\]
BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)
Say we lower our sights and try to estimate an average treatment effect: \[ \tau = \mathbb{E} [Y(1)-Y(0)]\]
Now make use of the fact that \[\mathbb E[Y(1)-Y(0)] = \mathbb E[Y(1)]- \mathbb E [Y(0)] \]
In words: The average of differences is equal to the difference of averages.
The magic is that while we can’t hope to measure the individual differences, we are good at measuring averages.
This provides a positive argument for causal inference from randomization, rather than simply saying with randomization “everything else is controlled for”
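A small simulation can illustrate the argument: averaged over many random assignments, the difference-in-means estimate recovers the average treatment effect, even though no unit-level difference is ever observed (all numbers here are invented):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
Y0 = rng.normal(0, 1, n)
Y1 = Y0 + rng.normal(1, 1, n)          # heterogeneous unit-level effects
true_ate = (Y1 - Y0).mean()

# Difference-in-means estimate under many complete random assignments
estimates = []
for _ in range(5000):
    Z = rng.permutation(np.r_[np.ones(n // 2), np.zeros(n // 2)]).astype(int)
    estimates.append(Y1[Z == 1].mean() - Y0[Z == 0].mean())

# The average estimate is close to the true ATE (unbiasedness)
print(np.mean(estimates), true_ate)
```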
Let’s discuss:
Idea: random assignment is random sampling from potential worlds: to understand anything you find, you need to know the sampling weights
Idea: We now have a positive argument for claiming unbiased estimation of the average treatment effect following random assignment
But is the average treatment effect a quantity of social scientific interest?
The average of the differences \(\approx\) difference of averages
Question: \(\approx\) or \(=\)?
Consider the following potential outcomes table:
| Unit | Y(0) | Y(1) | \(\tau_i\) |
|---|---|---|---|
| 1 | 4 | 3 | |
| 2 | 2 | 3 | |
| 3 | 1 | 3 | |
| 4 | 1 | 3 | |
| 5 | 2 | 3 | |
Questions for us: What are the unit level treatment effects? What is the average treatment effect?
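A quick check of the answers, using the values in the table:

```python
# Potential outcomes from the table
Y0 = [4, 2, 1, 1, 2]
Y1 = [3, 3, 3, 3, 3]

# Unit-level treatment effects and their average
tau = [y1 - y0 for y0, y1 in zip(Y0, Y1)]
ate = sum(tau) / len(tau)
print(tau, ate)  # [-1, 1, 2, 2, 1] 1.0
```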
Consider the following potential outcomes table:
| In treatment? | Y(0) | Y(1) |
|---|---|---|
| Yes | | 2 |
| No | 3 | |
| No | 1 | |
| Yes | | 3 |
| Yes | | 3 |
| No | 2 | |
Questions for us: Fill in the blanks.
What is the actual treatment effect?
Take a short break!
Experiments often give rise to endogenous subgroups. The potential outcomes framework can make it clear why this can cause problems.
We are going to look at three examples in which you might be tempted to condition on an endogenous subgroup:
[We will split up and try to make sense of each of these in groups]
General problem:
is usually of the form: \(Y(X_1, X_2)\) — outcomes depend on multiple features, but you condition on the observed or revealed value of one
usually requires heterogeneity
involves conditioning on some post treatment feature that varies across types
Problems arise in analyses of subgroups when the categories themselves are affected by treatment
Example from our work:
| | V(0) | V(1) | R(0,1) | R(1,1) | R(0,0) | R(1,0) |
|---|---|---|---|---|---|---|
| Type 1 (reporter) | 1 | 1 | 1 | 1 | 0 | 0 |
| Type 2 (non reporter) | 1 | 0 | 0 | 0 | 0 | 0 |
Expected reporting given violence in control = Pr(Type 1) (explanation: both types see violence but only Type 1 reports)
Expected reporting given violence in treatment = 100% (explanation: only Type 1 sees violence and this type also reports)
So you might infer a large effect on violence reporting.
Question: What is the actual effect of treatment on the propensity to report violence?
It is possible that in truth no one’s reporting behavior has changed; what has changed is the propensity of people with different reporting propensities to experience violence:
| | Reporter | No Violence | Violence | % Report |
|---|---|---|---|---|
| Control | Yes | 25 | 25 | \(\frac{25}{25+25}=50\%\) |
| | No | 25 | 25 | |
| Treatment | Yes | 25 | 25 | \(\frac{25}{25+0}=100\%\) |
| | No | 50 | 0 | |
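A sketch that recovers the two reporting rates from the counts in the table above:

```python
# Counts from the table: (reporter?, experienced violence?) -> number of units
control   = {("yes", True): 25, ("yes", False): 25, ("no", True): 25, ("no", False): 25}
treatment = {("yes", True): 25, ("yes", False): 25, ("no", True): 0,  ("no", False): 50}

def share_reporting_given_violence(counts):
    # Among units that experienced violence, what share are reporters?
    violent = {k: v for k, v in counts.items() if k[1]}
    return violent[("yes", True)] / sum(violent.values())

print(share_reporting_given_violence(control))    # 0.5
print(share_reporting_given_violence(treatment))  # 1.0
```

The observed jump from 50% to 100% arises purely from who experiences violence, not from any change in reporting behavior.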
This problem can arise as easily in seemingly simple field experiments. Example:
What’s the problem?
Question for us:
Which problems face an endogenous subgroup issue?:
In such cases you can:
| Pair | I | I | II | II | |
|---|---|---|---|---|---|
| Unit | 1 | 2 | 3 | 4 | Average |
| Y(0) | 0 | 0 | 0 | 0 | |
| Y(1) | -3 | 1 | 1 | 1 | |
| \(\tau\) | -3 | 1 | 1 | 1 | 0 |
| Pair | I | I | II | II | |
|---|---|---|---|---|---|
| Unit | 1 | 2 | 3 | 4 | Average |
| Y(0) | 0 | 0 | 0 | ||
| Y(1) | 1 | 1 | 1 | ||
| \(\hat{\tau}\) | 1 |
| Pair | I | I | II | II | |
|---|---|---|---|---|---|
| Unit | 1 | 2 | 3 | 4 | Average |
| Y(0) | [0] | 0 | 0 | ||
| Y(1) | [-3] | 1 | 1 | ||
| \(\hat{\tau}\) | 1 |
Note: The right way to think about this is that bias is a property of the strategy over possible realizations of data and not normally a property of the estimator conditional on the data.
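The point can be checked by enumerating every possible pair-matched assignment for the potential outcomes in the first table: each realized estimate is wrong (-1 or 1), but the average over realizations equals the true ATE of 0:

```python
from itertools import product

# Potential outcomes from the pair table (pairs: units 0,1 and units 2,3)
Y0 = [0, 0, 0, 0]
Y1 = [-3, 1, 1, 1]
true_ate = sum(y1 - y0 for y0, y1 in zip(Y0, Y1)) / 4  # 0.0

# Enumerate all pair-matched assignments: one treated unit per pair
estimates = []
for t1, t2 in product([0, 1], [2, 3]):
    treated = {t1, t2}
    est = (sum(Y1[i] for i in treated) / 2
           - sum(Y0[i] for i in range(4) if i not in treated) / 2)
    estimates.append(est)

print(estimates, sum(estimates) / len(estimates))  # [-1.0, -1.0, 1.0, 1.0] 0.0
```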
Multistage games can also present an endogenous group problem since collections of late stage players facing a given choice have been created by early stage players.
Question: Does visibility alter the extent to which subjects follow norms to punish antisocial behavior (and reward prosocial behavior)? Consider a trust game in which we are interested in how information about receivers affects their actions.
Average % returned

| Visibility Treatment | % invested (average) | ...when 10% invested | ...when 50% invested |
|---|---|---|---|
| Control: Masked information on respondents | 30% | 20% | 40% |
| Treatment: Full information on respondents | 30% | 0% | 60% |
What do we think? Does visibility make people react more to investments?
Imagine you could see all the potential outcomes, and they looked like this:
Responder’s return decision (given type)

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|---|---|---|---|---|---|---|---|
| Invest 10% | 60% | 60% | 60% | 0% | 0% | 0% | 30% |
| Invest 50% | 60% | 60% | 60% | 0% | 0% | 0% | 30% |
Conclusion: Both the offer and the information condition are completely irrelevant for all subjects.
Unfortunately you only see a sample of the potential outcomes, and that looks like this:
Responder’s return decision (given type)

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|---|---|---|---|---|---|---|---|
| Invest 10% | | | | 0% | 0% | 0% | 0% |
| Invest 50% | 60% | 60% | 60% | | | | 60% |
False Conclusion: When not protected, responders condition behavior strongly on offers (because offerers can select on type accurately)
In fact: The nice types invest more because they are nice. The responders return more to the nice types because they are nice.
Unfortunately you only see a (noisier!) sample of the potential outcomes, and that looks like this:
Responder’s return decision (given type)

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg. |
|---|---|---|---|---|---|---|---|
| Invest 10% | 60% | | | | 0% | 0% | 20% |
| Invest 50% | | 60% | 60% | 0% | | | 40% |
False Conclusion: When protected, responders condition behavior less strongly on offers (because offerers can select on type less accurately)
What to do?
Solutions?
Take away: Proceed with extreme caution when estimating effects beyond the first stage.
The inquiries you have…
Say that units are randomly assigned to treatment in different strata (maybe just one); with fixed, though possibly different, shares assigned in each stratum. Then the key estimands and estimators are:
| Estimand | Estimator |
|---|---|
| \(\tau_{ATE} \equiv \mathbb{E}[\tau_i]\) | \(\widehat{\tau}_{ATE} = \sum\nolimits_{x} \frac{w_x}{\sum\nolimits_{j}w_{j}}\widehat{\tau}_x\) |
| \(\tau_{ATT} \equiv \mathbb{E}[\tau_i | Z_i = 1]\) | \(\widehat{\tau}_{ATT} = \sum\nolimits_{x} \frac{p_xw_x}{\sum\nolimits_{j}p_jw_j}\widehat{\tau}_x\) |
| \(\tau_{ATC} \equiv \mathbb{E}[\tau_i | Z_i = 0]\) | \(\widehat{\tau}_{ATC} = \sum\nolimits_{x} \frac{(1-p_x)w_x}{\sum\nolimits_{j}(1-p_j)w_j}\widehat{\tau}_x\) |
where \(x\) indexes strata, \(p_x\) is the share of units in each stratum that is treated, and \(w_x\) is the size of a stratum.
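A sketch of these weighted estimators with invented stratum-level numbers (the sizes `w`, treated shares `p`, and stratum estimates `tau` below are assumptions for illustration):

```python
# Hypothetical strata: sizes w_x, treated shares p_x, stratum-level estimates tau_x
w   = [100, 300]   # stratum sizes
p   = [0.5, 0.2]   # share of units treated in each stratum
tau = [2.0, 1.0]   # estimated effect within each stratum

def weighted(tau, weights):
    # Weighted average of stratum-level estimates
    return sum(t * wt for t, wt in zip(tau, weights)) / sum(weights)

ate = weighted(tau, w)                                        # weight by stratum size
att = weighted(tau, [pi * wi for pi, wi in zip(p, w)])        # weight by treated counts
atc = weighted(tau, [(1 - pi) * wi for pi, wi in zip(p, w)])  # weight by control counts
print(ate, att, atc)
```

Because treatment shares differ across strata, the three targets differ even though each stratum estimate is the same in all three.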
In addition, each of these can be targets of interest:
And for different subgroups,
The CATEs are conditional average treatment effects, for example the effect for men or for women. These are straightforward.
However we might also imagine conditioning on unobservable or counterfactual features.
\[LATE = \frac{1}{|C|}\sum_{j\in C}(Y_j(X=1) - Y_j(X=0))\] \[C:=\{j:X_j(Z=1) > X_j(Z=0) \}\]
We will return to these in the study of instrumental variables.
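A minimal illustration of the LATE definition, with invented potential outcomes in which the compliers \(C\) are the units whose treatment uptake \(X\) responds to the instrument \(Z\):

```python
# Hypothetical unit-level potential outcomes
units = [
    # (X if Z=0, X if Z=1, Y if X=0, Y if X=1)
    (0, 1, 0, 1),  # complier: X(1) > X(0)
    (0, 1, 1, 1),  # complier
    (0, 0, 0, 0),  # never-taker
    (1, 1, 0, 1),  # always-taker
]

# C := {j : X_j(Z=1) > X_j(Z=0)}
compliers = [(y0, y1) for x0, x1, y0, y1 in units if x1 > x0]

# LATE: average effect of X on Y among compliers
late = sum(y1 - y0 for y0, y1 in compliers) / len(compliers)
print(late)  # 0.5
```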
Other ways to condition on potential outcomes:
Many inquiries are averages of individual effects, even if the groups are not known, but they do not have to be:
Inquiries might relate to distributional quantities such as:
You might even be interested in \(\min(Y_i(1) - Y_i(0))\).
There are lots of interesting “spillover” estimands.
Imagine there are three individuals and each person’s outcome depends on the assignments of all others. For instance \(Y_1(Z_1, Z_2, Z_3)\), or more generally, \(Y_i(Z_i, Z_{i+1 (\text{mod }3)}, Z_{i+2 (\text{mod }3)})\).
Then three estimands might be:
Interpret these. What others might be of interest?
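One way such estimands can be written down concretely; the outcome function here is invented purely for illustration:

```python
# Hypothetical outcome function: own treatment helps; neighbors' treatments spill over
def Y(zi, z_others):
    return zi + 0.5 * sum(z_others)

# One possible spillover estimand: direct effect when no one else is treated
direct_no_spill = Y(1, (0, 0)) - Y(0, (0, 0))

# Another: spillover onto an untreated unit when the other two are treated
spill_on_untreated = Y(0, (1, 1)) - Y(0, (0, 0))

print(direct_no_spill, spill_on_untreated)
```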
Say our treatment \(X\) varies continuously over \([0,1]\) and is randomly assigned.
What estimand captures “the average effect of \(X\) on \(Y\)”?
In class exercise:
Write down potential outcomes for two units that can take on up to 5 values of X.
Assume a nonlinear but (largely?) homogeneous relationship (up to a constant, for example)
Define your estimand in terms of the potential outcomes
Imagine the multiple plot results you might get from observing two units.
A difference in CATEs is a well defined estimand that might involve interventions on one node only:
It captures differences in effects.
An interaction is an effect on an effect:
Note in the latter the expectation is taken over the whole population.
Do not mix these up!
Consider a binary treatment \(D\), randomized, and an interest in the way \(X\) – possibly randomized – moderates the effect of \(D\).
The estimands can be seen from the following graphs:
| labels | inquiry | meaning | formal_definition |
|---|---|---|---|
| ATE | \(\text{ATE}\) | Average treatment effect | \(\mathbb{E}[Y_1 - Y_0]\) |
| BLP | \(\text{BLP}\) | Best linear predictor | \(b \text{ from } \arg\min_{a, b} \mathbb{E}[((Y_1 - Y_0) - (a + bX))^2]\) |
| CATE_min | \(\text{CATE}_{\min}\) | CATE at min \(x\) | \(\mathbb{E}[Y_1 - Y_0 \mid X = \min(X)]\) |
| CATE_L | \(\text{CATE}_{\text{L}}\) | CATE for the lower bound group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{L}]\) |
| CATE_M | \(\text{CATE}_{\text{M}}\) | CATE for the medium group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{M}]\) |
| CATE_H | \(\text{CATE}_{\text{H}}\) | CATE for the high group | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{H}]\) |
| D_CATE | \(\Delta_{\text{CATE}}\) | Difference in CATEs (H v. L) | \(\mathbb{E}[Y_1 - Y_0 \mid X \in \text{H}] - \mathbb{E}[Y_1 - Y_0 \mid X \in \text{L}]\) |
| tau_CATE | \(\tau_{\text{CATE}}\) | Effect of group on group CATEs (H v. L) | \(\mathbb{E}[(Y(1, x_H^*) - Y(0, x_H^*)) - (Y(1, x_L^*) - Y(0, x_L^*))]\) |
| ADC | \(\text{ADC}\) | Average difference in CATEs | \(\frac{\mathbb{E}[Y_1 - Y_0 \mid X = x + \delta] - \mathbb{E}[Y_1 - Y_0 \mid X = x]}{\delta}\) |
| AIE | \(\text{AIE}\) | Average interaction effect | \(\mathbb{E}\left[\frac{(Y(1, x+\delta) - Y(0, x+\delta)) - (Y(1, x) - Y(0, x))}{\delta}\right]\) |
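A toy example of why the two should not be mixed up: below, groups with different natural values of \(X\) have different effects of \(D\) (a nonzero difference in CATEs), yet intervening on \(x\) never changes the effect of \(d\) (a zero interaction). All values are invented:

```python
# Hypothetical population: each unit has a natural X and potential outcomes Y(d, x).
# The effect of D depends on a latent type correlated with natural X, not on x itself.
units = [
    # (natural X, effect of D for this unit)
    (1, 2.0),
    (1, 2.0),
    (0, 0.0),
    (0, 0.0),
]

def Y(d, x, effect):
    # x enters additively, so it never moderates the effect of d
    return d * effect + x

def cate(xval):
    # Average effect of D among units whose *natural* X equals xval
    effects = [e for x, e in units if x == xval]
    return sum(effects) / len(effects)

d_cate = cate(1) - cate(0)  # difference in CATEs across natural X groups

# Interaction: effect of *setting* x on the effect of d, averaged over everyone
aie = sum((Y(1, 1, e) - Y(0, 1, e)) - (Y(1, 0, e) - Y(0, 0, e))
          for _, e in units) / len(units)

print(d_cate, aie)  # 2.0 0.0
```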
Say \(X\) can affect \(Y\) directly, or indirectly through \(M\). Then we can write potential outcomes as:
We can then imagine inquiries of the form:
Interpret these. What others might be of interest?
Again we might imagine that these are defined with respect to some group:
here, among those for whom \(X\) has a positive effect on \(Y\), for what share would there be a positive effect if \(M\) were fixed at 1?
In qualitative research a particularly common inquiry is “did \(X=1\) cause \(Y=1\)?”
This is often given as a probability, the “probability of causation” (though at the case level we might better think of this probability as an estimate rather than an estimand):
\[\Pr(Y_i(0) = 0 | Y_i(1) = 1, X = 1)\]
Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:
| Y=0 | Y=1 | All | |
|---|---|---|---|
| X=0 | 1 | 0 | 1 |
| X=1 | 0.25 | 0.75 | 1 |
Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:
| Y=0 | Y=1 | All | |
|---|---|---|---|
| X=0 | 0.75 | 0.25 | 1 |
| X=1 | 0.25 | 0.75 | 1 |
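These intuitions can be checked with the Tian–Pearl lower bound on the probability of causation under random assignment, \(\text{PC} \geq \frac{\Pr(Y=1\mid X=1) - \Pr(Y=1\mid X=0)}{\Pr(Y=1\mid X=1)}\) (the bound is exact under monotonicity):

```python
# Tian-Pearl lower bound on the probability that X=1 caused Y=1
def pc_lower_bound(p_y1_given_x1, p_y1_given_x0):
    return (p_y1_given_x1 - p_y1_given_x0) / p_y1_given_x1

# First table: Y=1 never occurs when X=0, so causation is certain
print(pc_lower_bound(0.75, 0.0))   # 1.0

# Second table: Y=1 sometimes occurs without X=1
print(pc_lower_bound(0.75, 0.25))  # 2/3
```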
Other inquiries focus on distinguishing between causes.
For the Billy Suzy problem (Hall 2004), Halpern (2016) focuses on “actual causation” as a way to distinguish between Suzy and Billy:
Imagine Suzy and Billy, simultaneously throwing stones at a bottle. Both are excellent shots and hit whatever they aim at. Suzy’s stone hits first, knocks over the bottle, and the bottle breaks. However, Billy’s stone would have hit had Suzy’s not hit, and again the bottle would have broken. Did Suzy’s throw cause the bottle to break? Did Billy’s?
Actual Causation:
An inquiry: for what share in a population is a possible cause an actual cause?
Pearl (e.g. Pearl and Mackenzie (2018)) describes three types of inquiry:
| Level | Activity | Inquiry |
|---|---|---|
| Association | “Seeing” | If I see \(X=1\) should I expect \(Y=1\)? |
| Intervention | “Doing” | If I set \(X\) to \(1\) should I expect \(Y=1\)? |
| Counterfactual | “Imagining” | If \(X\) were \(0\) instead of 1, would \(Y\) then be \(0\) instead of \(1\)? |
We can understand these as asking different types of questions about a causal model
| Level | Activity | Inquiry |
|---|---|---|
| Association | “Seeing” | \(\Pr(Y=1|X=1)\) |
| Intervention | “Doing” | \(\mathbb{E}[\mathbb{I}(Y(1)=1)]\) |
| Counterfactual | “Imagining” | \(\Pr(Y(1)=1 \& Y(0)=0)\) |
The third is qualitatively different because it requires information about two mutually incompatible conditions for units. This is not (generally) recoverable directly from knowledge of \(\Pr(Y(1)=1)\) and \(\Pr(Y(0)=0)\).