Causality

Macartan Humphreys

1 Causality. What’s a cause?

1.1 Potential outcomes and the counterfactual approach

Causation as difference-making

1.1.1 Motivation

The intervention-based motivation for understanding causal effects:

  • We want to know if a particular intervention (like aid) caused a particular outcome (like reduced corruption).
  • We need to know:
    1. What happened?
    2. What would the outcome have been if there were no intervention?
  • The problem:
    1. … this is hard
    2. … this is impossible

The problem in 2 is that you need to know what would have happened if things were different. You need information on a counterfactual.

1.1.2 Potential Outcomes

  • For each unit, we assume that there are two post-treatment outcomes: \(Y_i(1)\) and \(Y_i(0)\).
    • \(Y(1)\) is the outcome that would obtain if the unit received the treatment.
    • \(Y(0)\) is the outcome that would obtain if it did not.
  • The causal effect of Treatment (relative to Control) is: \(\tau_i = Y_i(1) - Y_i(0)\)
  • Note:
    • The causal effect is defined at the individual level.
    • There is no “data generating process” or functional form.
    • The causal effect is defined relative to something else, so a counterfactual must be conceivable (did Germany cause the second world war?).
    • Are there any substantive assumptions made here so far?

1.1.3 Potential Outcomes

Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.

1.1.4 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

  • A boulder is flying down a mountain. You duck. This saves your life.

  • So the boulder caused the ducking and the ducking caused you to survive.

  • So: did the boulder cause you to survive?

1.1.6 Potential Outcomes

CONNECTEDNESS: Say \(A\) causes \(B\). Does that mean that there is a spatiotemporally continuous sequence of causal intermediates?

  • Person A is planning some action \(Y\); Person B sets out to stop them; Person X intervenes and prevents Person B from stopping Person A. In this case Person A may complete their action, producing \(Y\), without any knowledge that B and X even exist; in particular, B and X need not be anywhere close to the action. So: did X cause \(Y\)?

1.1.8 Causal claims: Contribution or attribution?

The counterfactual model is about contribution and attribution in a very specific sense.

  • Focus is on non-rival contributions
  • Focus is on conditional attribution: not “what caused \(Y\)?” but “what is the effect of \(X\)?”

1.1.9 Causal claims: Contribution or attribution?

Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):

\[Y(0,0) = 0\] \[Y(1,0) = 0\] \[Y(0,1) = 0\] \[Y(1,1) = 1\]

What caused \(Y\)? Which cause was most important?
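A minimal Python sketch (the names `Y`, `effect_of_x1` and `effect_of_x2` are illustrative, not from any package) that stores the four potential outcomes above and computes the effect of each cause while holding the other fixed:

```python
# Potential outcomes Y(x1, x2) from the slide: Y = 1 only when both causes are present.
Y = {(0, 0): 0, (1, 0): 0, (0, 1): 0, (1, 1): 1}

def effect_of_x1(x2):
    """Effect of switching X1 from 0 to 1, holding X2 fixed at x2."""
    return Y[(1, x2)] - Y[(0, x2)]

def effect_of_x2(x1):
    """Effect of switching X2 from 0 to 1, holding X1 fixed at x1."""
    return Y[(x1, 1)] - Y[(x1, 0)]

for other in (0, 1):
    print(f"other cause = {other}: effect of X1 = {effect_of_x1(other)}, "
          f"effect of X2 = {effect_of_x2(other)}")
```

Each cause has an effect of 1 when the other is present and 0 when it is absent, so the counterfactual model answers “what is the effect of \(X_1\) given \(X_2\)?” but offers no unconditional ranking of the two causes.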

1.1.10 Causal claims: Contribution or attribution?

The counterfactual model is about attribution in a very conditional sense.

  • This is a problem for research programs that define “explanation” in terms of figuring out the things that cause \(Y\)

  • There are real difficulties in conceptualizing what it means to say that one cause is more important than another cause. What does that mean?

1.1.11 Causal claims: Contribution or attribution?

Erdogan’s increasing authoritarianism was the most important reason for the attempted coup

  • More important than Turkey’s history of coups?
  • What does that mean?

1.1.12 Causal claims: No causation without manipulation

  • Some seemingly causal claims are not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This makes thinking about the effects of race and gender difficult
  • What does it mean to say that Aunt Pat voted for Brexit because she is old?

1.1.13 Causal claims: No causation without manipulation

  • Some seemingly causal claims are not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This makes thinking about the effects of race and gender difficult
  • Compare: What does it mean to say that Southern counties voted for Brexit because they have many old people?

1.1.14 Causal claims: No causation without manipulation

More uncomfortably:

What does it mean to say that the tides are caused by the moon? What exactly do we have to imagine…

1.1.15 Causal claims: Causal claims are everywhere

  • Jack exploited Jill

  • It’s Jill’s fault that the bucket fell

  • Jack is the most obstructionist member of Congress

  • Melania Trump stole from Michelle Obama’s speech

  • Activists need causal claims

1.1.16 Causal claims: What is actually seen?

  • We have talked about what’s potential, now what do we observe?
  • Say \(Z_i\) indicates whether the unit \(i\) is assigned to treatment \((Z_i=1)\) or not \((Z_i=0)\). It describes the treatment process. Then what we observe is: \[ Y_i = Z_iY_i(1) + (1-Z_i)Y_i(0) \]

This is sometimes called a “switching equation.”

In DeclareDesign, \(Y\) is realised from potential outcomes and assignment in this way using reveal_outcomes.

1.1.17 Causal claims: What is actually seen?

  • If \(Z\) is a random variable, then this is a sort of data generating process. BUT the key thing to note is:

    • \(Y_i\) is random, but the randomness comes from \(Z_i\); the potential outcomes \(Y_i(1)\), \(Y_i(0)\) are fixed
    • Compare this to a regression approach in which \(Y\) is random but the \(X\)’s are fixed, e.g.: \[ Y \sim N(\beta X, \sigma^2) \text{ or } Y=\alpha+\beta X+\epsilon, \epsilon\sim N(0, \sigma^2) \]
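A sketch of the switching equation in Python with NumPy. The potential outcomes are made up for illustration, and `reveal` is a hypothetical helper that plays the role reveal_outcomes plays in DeclareDesign; it is not that function's API. Re-drawing \(Z\) changes the observed \(Y\), while `y0` and `y1` never change:

```python
import numpy as np

# Hypothetical fixed potential outcomes for five units.
y0 = np.array([0, 1, 0, 2, 1])
y1 = np.array([1, 1, 3, 2, 4])

def reveal(y0, y1, z):
    """Switching equation: Y_i = Z_i * Y_i(1) + (1 - Z_i) * Y_i(0)."""
    return z * y1 + (1 - z) * y0

rng = np.random.default_rng(2024)
for draw in range(3):
    # Complete random assignment: exactly 2 of the 5 units are treated.
    z = rng.permutation(np.array([1, 1, 0, 0, 0]))
    print(z, reveal(y0, y1, z))  # observed Y varies across draws; y0 and y1 do not
```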

1.1.18 Causal claims: The estimand and the rub

  • The causal effect of Treatment (relative to Control) is: \[\tau_i = Y_i(1) - Y_i(0)\]
  • This is what we want to estimate.
  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)!
  • This is the fundamental problem of causal inference (Holland 1986)

1.1.19 Causal claims: The rub and the solution

  • Now for some magic. We really want to estimate: \[ \tau_i = Y_i(1) - Y_i(0)\]

  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)

  • Say we lower our sights and try to estimate an average treatment effect: \[ \tau = \mathbb{E} [Y(1)-Y(0)]\]

  • Now make use of the fact that \[\mathbb E[Y(1)-Y(0)] = \mathbb E[Y(1)]- \mathbb E [Y(0)] \]

  • In words: The average of differences is equal to the difference of averages; here, the average treatment effect is equal to the difference in average outcomes in treatment and control units.

  • The magic is that while we can’t hope to measure the differences, we are good at measuring averages.

1.1.20 Causal claims: The rub and the solution

  • So we want to estimate \(\mathbb{E} [Y(1)]\) and \(\mathbb{E} [Y(0)]\).
  • We know that we can estimate averages of a quantity by taking the average value from a random sample of units
  • To do this here we need to select a random sample of the \(Y(1)\) values and a random sample of the \(Y(0)\) values, in other words, we randomly assign subjects to treatment and control conditions.
  • When we do that we can in fact estimate: \[ \mathbb{E}_N[Y_i(1) | Z_i = 1] - \mathbb{E}_N[Y_i(0) | Z_i = 0]\] which in expectation equals: \[ \mathbb{E} [Y_i(1) | Z_i = 1 \text{ or } Z_i = 0] - \mathbb{E} [Y_i(0) | Z_i = 1 \text{ or } Z_i = 0]\]
  • This highlights a deep connection between random assignment and random sampling: when we do random assignment we are in fact randomly sampling from different possible worlds.
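To see the argument at work, here is a sketch that enumerates every complete random assignment of 2 out of 5 units (the potential outcomes are hypothetical). It checks that the average of differences equals the difference of averages, and that the difference-in-means estimate, averaged over all possible assignments, equals the ATE exactly, even though single estimates miss it:

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean

# Hypothetical potential outcomes; Fractions keep the arithmetic exact.
y0 = [Fraction(v) for v in (0, 1, 0, 2, 1)]
y1 = [Fraction(v) for v in (1, 1, 3, 2, 4)]
n, m = len(y0), 2  # five units, two assigned to treatment

ate = mean(a - b for a, b in zip(y1, y0))
assert ate == mean(y1) - mean(y0)  # average of differences = difference of averages

def diff_in_means(treated):
    """Estimate from one assignment: mean Y(1) among treated minus mean Y(0) among controls."""
    t = [y1[i] for i in range(n) if i in treated]
    c = [y0[i] for i in range(n) if i not in treated]
    return mean(t) - mean(c)

# Every possible complete random assignment is equally likely.
estimates = [diff_in_means(set(s)) for s in combinations(range(n), m)]
print("ATE:", ate)
print("average estimate over all assignments:", mean(estimates))
print("range of single estimates:", min(estimates), "to", max(estimates))
```

This is the sense in which the estimate is unbiased: equality holds over the distribution of possible assignments, not for any one realized assignment.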

1.1.21 Causal claims: The rub and the solution

This provides a positive argument for causal inference from randomization, rather than simply saying that with randomization “everything else is controlled for.”

Let’s discuss:

  • Does the fact that an estimate is unbiased mean that it is right?
  • Can a randomization “fail”?
  • Where are the covariates?

1.1.22 Causal claims: The rub and the solution

Idea: random assignment is random sampling from potential worlds; to understand anything you find, you need to know the sampling weights

1.1.23 Reflection

Idea: We now have a positive argument for claiming unbiased estimation of the average treatment effect following random assignment

But is the average treatment effect a quantity of social scientific interest?

1.1.24 Potential outcomes: why randomization works

The average of the differences \(\approx\) difference of averages

1.1.25 Potential outcomes: heterogeneous effects

The average of the differences \(\approx\) difference of averages

1.1.26 Potential outcomes: heterogeneous effects

Question: \(\approx\) or \(=\)?

1.1.27 Exercise your potential outcomes 1

Consider the following potential outcomes table:

Unit Y(0) Y(1) \(\tau_i\)
1 4 3 ?
2 2 3 ?
3 1 3 ?
4 1 3 ?
5 2 3 ?

Questions for us: What are the unit level treatment effects? What is the average treatment effect?
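One way to check your answers after working them out by hand (a short Python sketch using the numbers in the table):

```python
from statistics import mean

# Potential outcomes from the exercise table (units 1 to 5).
y0 = [4, 2, 1, 1, 2]
y1 = [3, 3, 3, 3, 3]

tau = [a - b for a, b in zip(y1, y0)]  # unit-level treatment effects
ate = mean(tau)                         # average treatment effect
print(tau, ate)
```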

1.1.28 Exercise your potential outcomes 2

Consider the following potential outcomes table:

In treatment? Y(0) Y(1)
Yes ? 2
No 3 ?
No 1 ?
Yes ? 3
Yes ? 3
No 2 ?

Questions for us: Fill in the blanks.

  • Assuming a constant treatment effect of \(+1\)
  • Assuming a constant treatment effect of \(-1\)
  • Assuming an average treatment effect of \(0\)

What is the actual treatment effect?
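A sketch for the two constant-effect cases (the helper `impute_constant_effect` is hypothetical). Under a constant effect \(c\), each blank follows from the observed cell: \(Y_i(0) = Y_i(1) - c\) for treated units and \(Y_i(1) = Y_i(0) + c\) for control units. An average effect of 0 does not pin down the blanks; a constant effect of 0 is just one fill consistent with it.

```python
# Observed data from the exercise: (treated?, observed outcome).
observed = [(True, 2), (False, 3), (False, 1), (True, 3), (True, 3), (False, 2)]

def impute_constant_effect(data, effect):
    """Return (Y(0), Y(1)) per unit, assuming Y_i(1) - Y_i(0) = effect for every unit."""
    table = []
    for treated, y in data:
        if treated:
            table.append((y - effect, y))  # Y(1) is observed for treated units
        else:
            table.append((y, y + effect))  # Y(0) is observed for control units
    return table

for effect in (+1, -1):
    print(effect, impute_constant_effect(observed, effect))
```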

1.2 Pause

Take a short break!

1.3 Endogenous subgroups

1.3.1 Endogenous Subgroups

Experiments often give rise to endogenous subgroups. The potential outcomes framework can make it clear why this can cause problems.

1.3.2 Heterogeneous Effects with Endogenous Categories

  • Problems arise in analyses of subgroups when the categories themselves are affected by treatment

  • Example from our work:

    • You want to know if an intervention affects reporting on violence against women
    • You measure the share of all subjects that experienced violence that file reports
    • The problem is that which subjects experienced violence is itself a function of treatment

1.3.3 Heterogeneous Effects with Endogenous Categories

  • Violence(Treatment)
  • Reporting(Treatment, Violence)
V(0) V(1) R(0,1) R(1,1) R(0,0) R(1,0)
Type 1 (reporter) 1 1 1 1 0 0
Type 2 (non reporter) 1 0 0 0 0 0

Expected reporting given violence in control = Pr(Type 1)

Expected reporting given violence in treatment = 100%

Question: What is the actual effect of treatment on the propensity to report violence?

1.3.4 Heterogeneous Effects with Endogenous Categories

It is possible that in truth no one’s reporting behavior has changed; what has changed is how likely people with different propensities to report are to experience violence:

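Condition | Reporter | No violence | Violence | % reporting (among those experiencing violence)
Control | Yes | 25 | 25 | \(\frac{25}{25+25}=50\%\)
Control | No | 25 | 25 |
Treatment | Yes | 25 | 25 | \(\frac{25}{25+0}=100\%\)
Treatment | No | 50 | 0 |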
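A sketch that builds a hypothetical population of 100 units reproducing the counts in this table. Each unit's reporting disposition is fixed; treatment only changes who experiences violence. Conditioning on violence still makes the reporting share jump from 50% to 100%:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    reporter: bool  # would report violence, whatever the treatment
    v0: int         # experiences violence under control
    v1: int         # experiences violence under treatment

# Hypothetical population matching the counts in the table.
population = (
    [Unit(True, 0, 0)] * 25 + [Unit(True, 1, 1)] * 25
    + [Unit(False, 0, 0)] * 25 + [Unit(False, 1, 0)] * 25
)

def share_reporting_given_violence(pop, z):
    """Share of reporters among units that experience violence in condition z."""
    victims = [u for u in pop if (u.v1 if z else u.v0) == 1]
    return sum(u.reporter for u in victims) / len(victims)

control = share_reporting_given_violence(population, 0)
treated = share_reporting_given_violence(population, 1)
print(control, treated)
```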

1.3.5 Heterogeneous Effects with Endogenous Categories

This problem can arise as easily in seemingly simple field experiments. Example:

  • In one study we provided constituents with information about the performance of politicians
  • We told politicians in advance so that they could take action
  • We wanted to see whether voters punished poorly performing politicians

What’s the problem?

1.3.6 Endogenous Categories: Test yourself

Question for us:

  • Quotas for women are randomly placed in a set of constituencies in year 1. All winners in these areas are women; in other areas only some are.
  • In year 2 these quotas are then lifted.

Which problems face an endogenous subgroup issue?

1.3.7 Endogenous Categories: Test yourself

Which problems face an endogenous subgroup issue?

  1. You want to estimate the likelihood that a woman will stand for reelection in treatment versus control areas in year 2.
  2. You want to estimate how much incumbents are more likely to be reelected in treatment versus control areas in year 2.
  3. You want to estimate how much treatment areas have more re-elected incumbents in elections in year 2 compared to control.

1.3.8 Endogenous Categories: Responses

In such cases you can:

  • Examine the joint distribution of multiple outcomes
  • Condition on pretreatment features only
  • Engage in mediation analysis

1.3.9 Missing data can create an endogenous subgroup problem

  • It is well known that missing data can undo the magic of random assignment.
  • One seemingly promising approach is to match into pairs ex ante and drop pairs together ex post.
  • Say potential outcomes looked like this (2 pairs of 2 units):
Pair I I II II
Unit 1 2 3 4 Average
Y(0) 0 0 0 0
Y(1) -3 1 1 1
\(\tau\) -3 1 1 1 0

1.3.10 Missing data

  • Say though that treated cases are likely to drop out of the sample if things go badly (e.g. they get a negative score or die)
  • Then you might see no attrition if those would-be attritors are not treated.
  • You might assume you have no problem (after all, no attrition).
  • No missing data when the normal case happens to be selected for treatment
Pair I I II II
Unit 1 2 3 4 Average
Y(0) 0 ? 0 ? 0
Y(1) ? 1 ? 1 1
\(\hat{\tau}\) 1

1.3.11 Missing data

  • But in cases in which you have attrition, dropping the pair doesn’t necessarily help.
  • The problem is that potential missingness still depends on potential outcomes
  • The kicker is that the method can produce bias even if (in fact) there is no attrition!
  • But there is missing data when the vulnerable case happens to be selected for treatment
Pair I I II II
Unit 1 2 3 4 Average
Y(0) ? [0] 0 ? 0
Y(1) [-3] ? ? 1 1
\(\hat{\tau}\) 1

1.3.12 Missing data

Note: The right way to think about this is that bias is a property of the strategy over possible realizations of data and not normally a property of the estimator conditional on the data.
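A sketch of the pair-dropping strategy, enumerating the four equally likely assignments (one treated unit per pair) for the potential outcomes in the first table of this subsection. The attrition rule, that a unit drops out only if it is treated and its outcome is negative, is an assumption made for illustration. Every assignment yields an estimate of 1 while the true ATE is 0, including the assignments with no attrition at all:

```python
from itertools import product
from statistics import mean

y0 = {1: 0, 2: 0, 3: 0, 4: 0}
y1 = {1: -3, 2: 1, 3: 1, 4: 1}
pairs = [(1, 2), (3, 4)]

def attrits(unit, treated):
    # Assumed rule: a unit drops out only if treated and its outcome is negative.
    return treated and y1[unit] < 0

def pair_drop_estimate(treated_units):
    """Difference in means after dropping any pair that contains an attritor."""
    kept_t, kept_c = [], []
    for pair in pairs:
        if any(attrits(u, u in treated_units) for u in pair):
            continue  # drop the whole pair
        for u in pair:
            if u in treated_units:
                kept_t.append(y1[u])
            else:
                kept_c.append(y0[u])
    return mean(kept_t) - mean(kept_c)

true_ate = mean(y1[u] - y0[u] for u in y0)
# product(*pairs) lists each way of treating exactly one unit per pair.
estimates = [pair_drop_estimate(set(choice)) for choice in product(*pairs)]
print("true ATE:", true_ate, "estimates:", estimates, "bias:", mean(estimates) - true_ate)
```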

1.3.13 Multistage games

Multistage games can also present an endogenous group problem since collections of late stage players facing a given choice have been created by early stage players.

1.3.14 Multistage games

Question: Does visibility alter the extent to which subjects follow norms to punish antisocial behavior (and reward prosocial behavior)? Consider a trust game in which we are interested in how information on receivers affects their actions

Table 1: Return rates given investments under different conditions.
Visibility treatment | % invested (average) | Average % returned when 10% invested | Average % returned when 50% invested
Control: masked information on respondents | 30% | 20% | 40%
Treatment: full information on respondents | 30% | 0% | 60%

What do we think? Does visibility make people react more to investments?

1.3.15 Multistage games

Imagine you could see all the potential outcomes, and they looked like this:

Table 2: Potential outcomes with (and without) identity protection.
Responder’s return decision (given type):
Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg.
Invest 10% | 60% | 60% | 60% | 0% | 0% | 0% | 30%
Invest 50% | 60% | 60% | 60% | 0% | 0% | 0% | 30%

Conclusion: Both the offer and the information condition are completely irrelevant for all subjects.

1.3.16 Multistage games

Unfortunately you only see a sample of the potential outcomes, and that looks like this:

Table 3: Outcomes when respondent is visible.
Responder’s return decision (given type; ? = not observed):
Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg.
Invest 10% | ? | ? | ? | 0% | 0% | 0% | 0%
Invest 50% | 60% | 60% | 60% | ? | ? | ? | 60%

False Conclusion: When not protected, responders condition behavior strongly on offers (because offerers can select on type accurately)

In fact: The nice types invest more because they are nice. The responders return more to the nice types because they are nice.

1.3.17 Multistage games

Unfortunately you only see a (noisier!) sample of the potential outcomes, and that looks like this:

Table 4: Outcomes when respondent is not visible.
Responder’s return decision (given type; ? = not observed):
Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Avg.
Invest 10% | 60% | ? | ? | 0% | 0% | ? | 20%
Invest 50% | ? | 60% | 60% | ? | ? | 0% | 40%

False Conclusion: When protected, responders condition behavior less strongly on offers (because offerers can select on type less accurately)
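A sketch reproducing the averages in Tables 1, 3 and 4 from the fixed potential outcomes in Table 2. The masked-condition allocation of offers is hypothetical, chosen only to match Table 4's averages. No responder changes behavior, yet the observed return rates differ by offer and by information condition, because the information condition changes who receives which offer:

```python
# Potential returns (%) from Table 2: each responder returns the same share
# whatever the offer and whatever the information condition.
returns = {"Nice 1": 60, "Nice 2": 60, "Nice 3": 60,
           "Mean 1": 0, "Mean 2": 0, "Mean 3": 0}

# Visible condition: offerers sort perfectly on responder type.
visible_offers = {r: 50 if r.startswith("Nice") else 10 for r in returns}
# Masked condition: a hypothetical noisy sorting chosen to match Table 4's averages.
masked_offers = {"Nice 1": 10, "Mean 1": 10, "Mean 2": 10,
                 "Nice 2": 50, "Nice 3": 50, "Mean 3": 50}

def avg_return_by_offer(offers):
    """Average observed return among responders who received each offer."""
    result = {}
    for amount in (10, 50):
        got = [returns[r] for r, a in offers.items() if a == amount]
        result[amount] = sum(got) / len(got)
    return result

print("visible:", avg_return_by_offer(visible_offers))
print("masked: ", avg_return_by_offer(masked_offers))
```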

1.3.18 Multistage games

What to do?

Solutions?

  1. Analysis could focus on the effect of treatment on respondent behavior, directly.
    • This would get the correct answer but to a different question (Does information affect the share of contributions returned by subjects on average?)
  2. The strategy method can sometimes help address the problem, but note that this (a) changes the question and (b) puts demands on respondents’ imagination and honesty
  3. First mover action could be directly manipulated, but unless deception is used that is also changing the question
  4. First movers could be selected because they act in predictable ways (bordering on deception?)

Take away: Proceed with extreme caution when estimating effects beyond the first stage.

1.4 Pause

Take a short break!

1.5 DAGs

Directed Acyclic Graphs

1.5.1 Key insight

The most powerful results from the study of DAGs give procedures for figuring out when conditioning aids or hinders causal identification.

  • You can read off a confounding variable from a DAG.
    • You figure out what to condition on for causal identification.
  • You can read off “colliders” from a DAG
    • Sometimes you have to avoid conditioning on these
  • Sometimes a variable might be both, so
    • you have to condition on it
    • you have to avoid conditioning on it
    • Ouch.

1.5.2 Key resource

1.5.3 Challenge for us

  • Say you don’t like graphs. Fine.

  • Consider this causal structure:

    • \(Z = f_1(U_1, U_2)\)
    • \(X = f_2(U_2)\)
    • \(Y = f_3(X, U_1)\)

Say \(Z\) is temporally prior to \(X\); it is correlated with \(Y\) (because of \(U_1\)) and with \(X\) (because of \(U_2\)).

Question: Would it be useful to “control” for \(Z\) when trying to estimate the effect of \(X\) on \(Y\)?

Answer: Hopefully by the end of today you should see that the answer is obviously (or at least, plausibly) “no.”
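A simulation sketch, assuming linear-Gaussian versions of \(f_1\), \(f_2\), \(f_3\) with a true effect of \(X\) on \(Y\) equal to 1 (these functional forms are assumptions, not part of the structure above). Regressing \(Y\) on \(X\) alone recovers the effect; adding \(Z\) opens a path through \(U_2\) and \(U_1\) and biases the estimate (to 0.8 in expectation under these particular choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Assumed linear-Gaussian versions of the structural equations.
u1 = rng.normal(size=n)
u2 = rng.normal(size=n)
z = u1 + u2 + rng.normal(size=n)       # Z = f_1(U_1, U_2)
x = u2 + rng.normal(size=n)            # X = f_2(U_2)
y = 1.0 * x + u1 + rng.normal(size=n)  # Y = f_3(X, U_1); true effect of X is 1

def ols_slope_on_x(*covariates):
    """OLS coefficient on x from regressing y on a constant, x and any extra covariates."""
    design = np.column_stack([np.ones(n), x, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

slope_without_z = ols_slope_on_x()
slope_with_z = ols_slope_on_x(z)
print(f"without Z: {slope_without_z:.3f}, with Z: {slope_with_z:.3f}")
```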

1.7 Conditional independence and graph structure

  • What DAGs do is tell you when one variable is independent of another variable given some third variable.
  • Intuitively:
    • what variables “shield off” the influence of one variable on another
    • e.g. If inequality causes revolution via discontent, then inequality and revolution should be related to each other overall, but not related to each other among those that are content or among those that are discontent

1.7.1 Conditional independence

Variable sets \(A\) and \(B\) are conditionally independent, given \(C\), if for all \(a\), \(b\), \(c\) with \(\Pr(B = b, C = c) > 0\):

\[\Pr(A = a | C = c) = \Pr(A = a | B = b, C = c)\]

Informally: given \(C\), knowing \(B\) tells you nothing more about \(A\).

1.7.2 Conditional distributions: Factorization

  • Consider a situation with variables \(X_1, X_2, \dots X_n\)
  • The probability of outcome \(x\) can always be written in the form \(P(X_1 = x_1)P(X_2 = x_2|X_1=x_1)P(X_3 = x_3|X_1=x_1, X_2 = x_2)\dots\).
  • This can be done with any ordering of variables.
  • However the representation can be greatly simplified if you can make use of a set of “parentage” relationships
  • Given an ordering of variables, the Markovian parents of variable \(X_j\) are the minimal set of variables such that when you condition on these, \(X_j\) is independent of all other prior variables in the ordering
  • We want to get to the point where we can write: \(P(x) = \prod_j P(x_j | pa_j)\)

1.7.3 Conditional distributions: Factorization from graphs

  • We want to use causal graphs to represent these relations of conditional independence.
  • Informally, an arrow, \(A \rightarrow B\) means that \(A\) is a cause of \(B\): that is, under some conditions, a change in \(A\) produces a change in \(B\).
    • Arrows carry no information about the type of effect; e.g. sign, size, or whether different causes are complements or substitutes
  • We say that arrows point from parents to children, and by extension from ancestors to descendants.
  • These are parents on the graph; but we will connect them to Markovian parents in a probability distribution \(P\).

1.7.4 Conditional distributions: Markov condition

  • A DAG is just a graph in which some or all nodes are connected by directed edges (arrows) and there are no cyclical paths along these directed edges.
  • Consider a DAG, \(G\), and consider the ancestry relations implied by \(G\): the distribution \(P\) is Markov relative to the graph \(G\) if every variable is independent of its nondescendants (in \(G\)) conditional on its parents (in \(G\)).
    • Markov condition: conditional on its parents, a variable is independent of its non-descendants.

Now we have what we need to simplify: if the Markov condition is satisfied, then instead of writing the full probability as \(P(x) = P(x_1)P(x_2|x_1)P(x_3|x_1, x_2)\) we can write \(P(x) = \prod_i P(x_i |pa_i)\).

1.7.5 Illustration

If \(P(a,b,c)\) is Markov relative to this graph (\(A \rightarrow B \rightarrow C\)) then: \(C\) is independent of \(A\) given \(B\)

And instead of

\[\Pr(a,b,c) = \Pr(a)\Pr(b|a)\Pr(c|a, b)\]

we could now write:

\[\Pr(a,b,c) = \Pr(a)\Pr(b|a)\Pr(c|b)\]
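A numerical check of the illustration, with made-up conditional probability tables for a chain \(A \rightarrow B \rightarrow C\) of binary variables. The joint is built from the Markov factorization, and the loop confirms \(\Pr(c|a,b) = \Pr(c|b)\) in every cell, while \(A\) and \(C\) remain marginally dependent:

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C (binary).
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.4, 1: 0.6}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]

# Markov factorization: Pr(a, b, c) = Pr(a) Pr(b|a) Pr(c|b)
joint = {(a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
         for a, b, c in product((0, 1), repeat=3)}

def prob(**fixed):
    """Marginal probability of the named values, e.g. prob(a=1, b=0)."""
    names = ("a", "b", "c")
    return sum(p for values, p in joint.items()
               if all(values[names.index(k)] == v for k, v in fixed.items()))

# C is independent of A given B: Pr(c | a, b) == Pr(c | b) for all a, b, c.
for a, b, c in product((0, 1), repeat=3):
    lhs = prob(a=a, b=b, c=c) / prob(a=a, b=b)
    rhs = prob(b=b, c=c) / prob(b=b)
    assert abs(lhs - rhs) < 1e-12
```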

1.7.6 Conditional distributions and interventions

We want the graphs to be able to represent the effects of interventions.

Pearl uses do notation to capture this idea.

\[\Pr(X_1, X_2,\dots | do(X_j = x_j))\] or

\[\Pr(X_1, X_2,\dots | \hat{x}_j)\]

denotes the distribution of \(X\) when a particular node (or set of nodes) is intervened upon and forced to a particular level, \(x_j\).

1.7.7 Conditional distributions given do operations

Note, in general: \[\Pr(X_1, X_2,\dots | do(X_j = x_j')) \neq \Pr(X_1, X_2,\dots | X_j = x_j')\] as an example we might imagine a situation where:

  • for men, binary \(X\) always causes \(Y=1\)
  • for women, \(Y=0\) regardless of \(X\)
  • \(X=1\) for men only.

In that case, if half the population are men, \(\Pr(Y=1 | X = 1) = 1\) but \(\Pr(Y=1 | do(X = 1)) = .5\)
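A sketch of this example in Python. The 50/50 split between men and women is an assumption (it is what makes the interventional probability equal .5), and the helper names are illustrative:

```python
# Hypothetical population shares (the 50/50 split is an assumption).
shares = {"man": 0.5, "woman": 0.5}

def x_natural(sex):
    """X as it arises without intervention: X = 1 for men only."""
    return 1 if sex == "man" else 0

def y_of(sex, x):
    """Assumed: for men Y = 1 exactly when X = 1; for women Y = 0 regardless of X."""
    return 1 if (sex == "man" and x == 1) else 0

# Pr(Y = 1 | X = 1): look only at units whose naturally occurring X is 1.
seen = [s for s in shares if x_natural(s) == 1]
p_conditional = sum(shares[s] * y_of(s, 1) for s in seen) / sum(shares[s] for s in seen)

# Pr(Y = 1 | do(X = 1)): force X = 1 for everyone, leaving the population unchanged.
p_do = sum(shares[s] * y_of(s, 1) for s in shares)

print(p_conditional, p_do)
```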

1.7.8 Conditional distributions given do operations

A DAG is a “causal Bayesian network” or “causal DAG” if (and only if) the probability distribution resulting from setting some variable \(X_i\) to \(x_i'\) (i.e. \(do(X_i = x_i')\), written \(\hat{x}_i\)) is:

\[P_{\hat{x}_i}: \quad P(x_1, x_2, \dots, x_n | \hat{x}_i) = \mathbb{I}(x_i = x_i')\prod_{j \neq i}P(x_j | pa_j)\]

This means that there is only probability mass on vectors in which \(x_i = x_i'\) (reflecting the success of control) and all other variables are determined by their parents, given the values that have been set for \(x_i\).

1.7.9 Conditional distributions given do operations

Illustration: say binary \(X\) causes binary \(M\), which causes binary \(Y\), and we intervene to set \(M=1\). Then what is the distribution of \((x,m,y)\)?

It is:

\[\Pr(x,m,y) = \Pr(x)\mathbb{I}(m = 1)\Pr(y|m)\]
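A sketch of this truncated factorization with hypothetical conditional probability tables. Under \(do(M=1)\), \(X\) keeps its original distribution, \(M\) is fixed at 1, and \(Y\) follows \(\Pr(y|m=1)\); the observational marginal of \(Y\) differs:

```python
from itertools import product

# Hypothetical conditional probability tables for binary X -> M -> Y.
p_x = {0: 0.6, 1: 0.4}
p_m_given_x = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # p_m_given_x[x][m]
p_y_given_m = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.35, 1: 0.65}}  # p_y_given_m[m][y]

def observational():
    """Pr(x, m, y) = Pr(x) Pr(m|x) Pr(y|m)."""
    return {(x, m, y): p_x[x] * p_m_given_x[x][m] * p_y_given_m[m][y]
            for x, m, y in product((0, 1), repeat=3)}

def do_m(m_set):
    """Pr(x, m, y | do(M = m_set)): Pr(m|x) is replaced by the indicator I(m = m_set)."""
    return {(x, m, y): p_x[x] * (1.0 if m == m_set else 0.0) * p_y_given_m[m][y]
            for x, m, y in product((0, 1), repeat=3)}

def marginal(dist, index, value):
    """Probability that coordinate `index` of (x, m, y) equals `value`."""
    return sum(p for key, p in dist.items() if key[index] == value)

post = do_m(1)
print("Pr(X=1):", marginal(observational(), 0, 1), "->", marginal(post, 0, 1))
print("Pr(Y=1):", marginal(observational(), 2, 1), "->", marginal(post, 2, 1))
```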

1.8 Graphical reading of Conditional Independence

1.8.1 Conditional Independence and \(d\)-separation

  • We now have a well defined sense in which the arrows on a graph represent a causal structure and capture the conditional independence relations implied by the causal structure.

  • Of course any graph might represent many different probability distributions \(P\)

  • We can now start reading off from a graph when there is or is not conditional independence between sets of variables

1.8.2 Conditional independence on paths graphs

Three elemental relations of conditional independence.

1.8.3 Conditional independence from graphs

\(A\) and \(B\) are conditionally independent, given \(C\) if on every path between \(A\) and \(B\):

  • there is some chain (\(\bullet\rightarrow \bullet\rightarrow\bullet\) or \(\bullet\leftarrow \bullet\leftarrow\bullet\)) or fork (\(\bullet\leftarrow \bullet\rightarrow\bullet\)) with the central element in \(C\),

or

  • there is an inverted fork (\(\bullet\rightarrow \bullet\leftarrow\bullet\)) with the central element (and its descendants) not in \(C\)

Notes:

  • In this case we say that \(A\) and \(B\) are d-separated by \(C\).
  • \(A\), \(B\), and \(C\) can all be sets
  • Note that a path can involve arrows pointing any direction \(\bullet\rightarrow \bullet\rightarrow \bullet\leftarrow \bullet\rightarrow\bullet\)
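A brute-force sketch of these rules in Python, suitable for the small graphs used here: it lists every path between two nodes (ignoring arrow direction) and checks whether each path is blocked by the conditioning set. The graph encoding (a dict from each parent to its children) and the function names are assumptions for illustration; path enumeration grows quickly with graph size.

```python
def descendants(dag, node):
    """Every node reachable from `node` by following arrows forward."""
    seen, stack = set(), list(dag.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dag.get(current, ()))
    return seen

def paths(dag, a, b):
    """Yield every simple path from a to b, ignoring arrow directions."""
    neighbours = {}
    for parent, children in dag.items():
        for child in children:
            neighbours.setdefault(parent, set()).add(child)
            neighbours.setdefault(child, set()).add(parent)

    def extend(path):
        if path[-1] == b:
            yield path
            return
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                yield from extend(path + [nxt])

    yield from extend([a])

def blocked(dag, path, given):
    """Apply the chain/fork and inverted-fork rules to each interior node of the path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        is_collider = mid in dag.get(left, ()) and mid in dag.get(right, ())
        if is_collider:
            if mid not in given and not (descendants(dag, mid) & given):
                return True  # inverted fork with neither it nor its descendants in C
        elif mid in given:
            return True      # chain or fork with the central element in C
    return False

def d_separated(dag, a, b, given=()):
    given = set(given)
    return all(blocked(dag, p, given) for p in paths(dag, a, b))

# The "challenge" structure without the X -> Y arrow: U1 -> Z <- U2, U2 -> X, U1 -> Y.
dag = {"U1": {"Z", "Y"}, "U2": {"Z", "X"}}
print(d_separated(dag, "X", "Y"))         # nothing conditioned on
print(d_separated(dag, "X", "Y", {"Z"}))  # conditioning on the collider Z
```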

1.8.4 Test yourself

Are A and D independent:

  • if you do not condition on anything?
  • if you condition on B?
  • if you condition on C?
  • if you condition on B and C?

1.8.5 Back to this example

  • \(Z = f_1(U_1, U_2)\)
  • \(X = f_2(U_2)\)
  • \(Y = f_3(X, U_1)\)
  1. Let’s graph this
  2. Now: say we removed the arrow from \(X\) to \(Y\)
    • Would you expect to see a correlation between \(X\) and \(Y\) if you did not control for \(Z\)?
    • Would you expect to see a correlation between \(X\) and \(Y\) if you did control for \(Z\)?

1.9 Causal models

1.9.1 From graphs to Causal Models

A “causal model” is:

  1. Variables
    • An ordered list of \(n\) endogenous nodes, \(\mathcal{V}= (V^1, V^2,\dots, V^n)\), with a specification of a range for each of them
    • A list of \(n\) exogenous nodes, \(\Theta = (\theta^1, \theta^2,\dots , \theta^n)\)
  2. A list of \(n\) functions \(\mathcal{F}= (f^1, f^2,\dots, f^n)\), one for each element of \(\mathcal{V}\) such that each \(f^i\) takes as arguments \(\theta^i\) as well as elements of \(\mathcal{V}\) that are prior to \(V^i\) in the ordering
  3. A probability distribution over \(\Theta\)

1.9.2 From graphs to Causal Models

A simple causal model in which high inequality (\(I\)) affects democratization (\(D\)) via redistributive demands (\(R\)) and mass mobilization (\(M\)), which is also a function of ethnic homogeneity (\(E\)). Arrows show relations of causal dependence between variables.

1.9.3 Effects on a DAG

  • Learning about effects given a model means learning about \(\mathcal{F}\) and also the distribution of shocks (\(\Theta\)).

  • For discrete data this can be reduced to a question about learning about the distribution of \(\Theta\) only.

1.9.4 Effects on a DAG

For instance, the simplest model consistent with \(X \rightarrow Y\):

  • Endogenous Nodes = \(\{X, Y\}\), both with range \(\{0,1\}\)

  • Exogenous Nodes = \(\{\theta^X, \theta^Y\}\), with ranges \(\{\theta^X_0, \theta^X_1\}\) and \(\{\theta^Y_{00}, \theta^Y_{01}, \theta^Y_{10}, \theta^Y_{11}\}\)

  • Functional equations:

    • \(f_Y\): \(\theta^Y =\theta^Y_{ij} \rightarrow \{Y = i \text{ if } X=0; Y = j \text{ if } X=1\}\)
    • \(f_X\): \(\theta^X =\theta^X_{i} \rightarrow \{X = i\}\)
  • Distributions on \(\Theta\): \(\Pr(\theta^i = \theta^i_k) = \lambda^i_k\)

1.9.5 Effects as statements about exogenous variables

What is the probability that \(X\) has a positive causal effect on \(Y\)?

  • This is equivalent to: \(\Pr(\theta^Y =\theta^Y_{01}) = \lambda^Y_{01}\)

  • So we want to learn about the distributions of the exogenous nodes
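A sketch of the \(X \rightarrow Y\) model above, with hypothetical values for \(\lambda^Y\). The type label `"ij"` encodes \(\theta^Y_{ij}\), so the probability of a positive effect is the share of type 01, and the average effect works out to \(\lambda^Y_{01} - \lambda^Y_{10}\):

```python
# Hypothetical shares lambda^Y_{ij} of the four causal types (they must sum to 1).
lambda_y = {"00": 0.2, "01": 0.4, "10": 0.1, "11": 0.3}

def f_y(theta_y, x):
    """theta^Y_{ij}: Y = i if X = 0 and Y = j if X = 1."""
    return int(theta_y[x])

def prob_positive_effect(lam):
    """Pr(X has a positive effect on Y): total share of types with f_y(t, 1) > f_y(t, 0)."""
    return sum(p for t, p in lam.items() if f_y(t, 1) > f_y(t, 0))

def average_effect(lam):
    """Average effect of X on Y: sum over types of share times (Y(1) - Y(0))."""
    return sum(p * (f_y(t, 1) - f_y(t, 0)) for t, p in lam.items())

print(prob_positive_effect(lambda_y))  # equals lambda^Y_01
print(average_effect(lambda_y))        # equals lambda^Y_01 - lambda^Y_10
```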

1.9.6 Recap:

  • DAGs
  • Causality

1.9.7 Recap: Key features of graphs

  • Directed
  • Acyclic
  • The missing arcs are the really important ones
  • Implicitly there are shocks going into every node
  • These graphs represent nonparametric structural equation models (NPSEMs)
  • But you cannot read off the size or direction of effects from a DAG

1.9.8 Recap: Things you need to know about causal inference

  1. A causal claim is a statement about what didn’t happen.
  2. There is a fundamental problem of causal inference.
  3. You can estimate average causal effects even if you cannot observe any individual causal effects.
  4. If you know that \(A\) causes \(B\) and that \(B\) causes \(C\), this does not mean that you know that \(A\) causes \(C\).
  5. \(X\) can cause \(Y\) even if there is no “causal path” connecting \(X\) and \(Y\).
  6. \(X\) can cause \(Y\) even if \(X\) is not a necessary condition or a sufficient condition for \(Y\).
  7. Estimating average causal effects does not require that treatment and control groups are identical.
  8. There is no causation without manipulation.

http://egap.org/resources/guides/causality/

Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60.
Pearl, Judea. 2009. Causality. Cambridge University Press.
Pearl, Judea, and Azaria Paz. 1985. Graphoids: A Graph-Based Logic for Reasoning about Relevance Relations. University of California (Los Angeles). Computer Science Department.