Chapter 4 Causal Queries

We describe major families of causal queries and illustrate how each can be represented as a query about the values of nodes in a causal model.


Although scholars share a broad common interest in causality, there is tremendous heterogeneity in the kinds of causal questions that scholars ask. Consider the relationship between inequality and democratization. We might seek to know inequality’s average impact on democratization across some broad set of cases. Alternatively, we might be interested in a particular case—say, Mongolia in 1995—and want to know whether inequality would have an effect in this case. Or we might wonder whether the level of democracy in Mongolia in 1995 is due to the level of inequality in that case—yet another distinct question (in the same way that establishing that poison would make you sick does not imply that you are sick because of poison). In a different vein, we might be interested in how causal effects unfold, inquiring about the pathway or mechanism through which inequality affects democratization—a question we can also ask at two levels. We can ask whether inequality affected democratization in Mongolia through mobilization of the masses; or we can ask how commonly, across a broad set of cases, inequality affects democratization through mobilization of the masses. Pushing further, we might ask a counterfactual question of the form: would inequality have produced democratization had mobilization been prevented from occurring?

Distinct methodological literatures have been devoted to the study of average causal effects, the analysis of case-level causal effects and explanations, and the identification of causal pathways. Fortunately, each of these questions can be readily captured as specific queries asked of (and answerable from) a causal model. As described by Pearl (2010), the goal is to deploy an “algorithm that receives a model M as an input and delivers the desired quantity Q(M) as the output.” More specifically, we demonstrate how, given a model as described in Chapter 2, a causal query can be represented as a question about the exogenous nodes on a causal graph (\(\theta\)). When we assimilate our causal questions into a causal model, we are placing what we want to know in formal relation to both what we already know and what we can potentially observe. As we will see in later chapters, this move allows us then to deploy a model to generate strategies of inference: to determine which observations, if we made them, would be likely to yield the greatest leverage on our query, given our prior knowledge about the way the world works. And by the same logic, once we see the evidence, this integration allows us to “update” on our query—to figure out in systematic fashion what we have learned—in a manner that takes background knowledge into account.

In the remainder of this chapter, we walk through the conceptualization and causal-model interpretation of four key causal queries:

  • Case-level causal effects

  • Case-level causal attribution

  • Average causal effects

  • Causal pathways

In addition, we give a treatment of “actual causation” inquiries in the chapter appendix. These queries are in no way exhaustive of the causal questions that can be captured in causal graphs, but they are among the more common foci of social scientific investigation.

4.1 Case-level causal effects

The simplest causal question is whether some causal effect operates in an individual case. Does \(X\) have an effect on \(Y\) in this case? For instance, is Yemen in 1995 a case in which a change in economic inequality would produce a change in whether or not the country democratizes? We could put the question more specifically as a query about a causal effect in a particular direction, for instance: Does inequality have a positive effect on democratization in the case of Yemen in 1995?

In counterfactual terms, a query about case-level causation is a question about what would happen if we could manipulate a variable in the case: if we could hypothetically intervene to change \(X\)’s value in the case, (how) would \(Y\)’s value change? To ask, more specifically, whether a positive or negative effect operates for a case is to ask whether a particular counterfactual relation holds in that case. If we assume a setup with binary variables for simplicity, to ask whether inequality has a positive effect on democratization is to ask: if we set \(I\) to \(0\) would \(D\) take on a value of \(0\), and if we set \(I\) to \(1\), would \(D\) take on a value of \(1\)? (Both of these conditions must hold for \(I\) to have a positive effect on \(D\).)
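In standard potential-outcomes notation (used again in the footnotes below), writing \(Y(x)\) for the value \(Y\) would take if \(X\) were set to \(x\), this positive-effect query for a single case can be written as:

\[\begin{equation} Y(0) = 0 \text{ and } Y(1) = 1 \end{equation}\]

with a negative effect defined analogously as \(Y(0)=1\) and \(Y(1)=0\).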

We can easily represent this kind of query in the context of a causal model. We show the DAG for such a model in Figure 4.1. As introduced in Chapter 2, \(\theta^Y\) here represents the nodal type characterizing \(Y\)’s response to \(X\) and, if \(X\) and \(Y\) are binary, it can take on one of four values in this model: \(\theta^Y_{10}\), \(\theta^Y_{01}\), \(\theta^Y_{00}\), and \(\theta^Y_{11}\) (which map onto our \(a, b, c\) and \(d\) types, respectively). Importantly, this setup allows for \(\theta^Y\)—the causal effect of \(X\) on \(Y\)—to vary across cases. Thus, \(X\) may have a positive effect on \(Y\) in one case (with \(\theta^Y=\theta^Y_{01}\)), and a negative effect (\(\theta^Y=\theta^Y_{10}\)) or no effect (\(\theta^Y=\theta^Y_{00}\) or \(\theta^Y_{11}\)) on \(Y\) in other cases.


Figure 4.1: This DAG is a graphical representation of the simple causal setup in which the effect of \(X\) on \(Y\) in a given case depends on the case’s nodal type, represented by \(\theta^Y\). With a single binary causal variable of interest, we let \(\theta^Y\) take on values \(\theta^Y_{ij}\), with \(i\) representing the value \(Y\) takes on if \(X=0\) and \(j\) representing the value \(Y\) takes on if \(X=1\). With a binary outcome node, \(\theta^Y\) ranges over the four values: \(\theta^Y_{00}\), \(\theta^Y_{10}\), \(\theta^Y_{01}\) and \(\theta^Y_{11}\).

In this model, then, the query, “What is \(X\)’s causal effect in this case?” simply becomes a question about the value of the nodal type \(\theta^Y\). If \(\theta^Y=\theta^Y_{10}\), for instance, this implies that \(X\) has a negative effect on \(Y\) in this case. If \(\theta^Y=\theta^Y_{00}\), this implies that \(X\) has no effect on \(Y\) in this case and that \(Y\) will always be \(0\).

We can also pose probabilistic versions of a case-level causal effect query. For instance, we can ask, “What is the probability that \(X\) has a positive effect on \(Y\) in this case?” Answering this question requires estimating the probability that \(\theta^Y = \theta^Y_{01}\).35 We can also ask, “What is the probability that \(X\) matters for \(Y\) in this case?” Answering this question involves adding the probability that \(X\) has a positive effect to the probability that it has a negative effect. That is, it involves adding the probability that \(\theta^Y = \theta^Y_{01}\) to the probability that \(\theta^Y = \theta^Y_{10}\). We can also ask, “What is the expected effect of \(X\) on \(Y\) in this case?” To answer this question, we need to estimate the probability that \(X\) has a positive effect minus the probability that it has a negative effect.

In sum, when posing probabilistic questions about case-level causal effects, we are still asking about the value of a \(\theta\) term in our model—but we are asking about the probability of the \(\theta\) term taking on some value or set of values. In practice, we will most often be posing case-level causal-effect queries in probabilistic form.
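For concreteness, suppose (purely hypothetically) that our beliefs over the four nodal types were \(\Pr(\theta^Y = \theta^Y_{01}) = 0.5\), \(\Pr(\theta^Y = \theta^Y_{10}) = 0.1\), \(\Pr(\theta^Y = \theta^Y_{00}) = 0.3\), and \(\Pr(\theta^Y = \theta^Y_{11}) = 0.1\). The three probabilistic queries just described would then be answered as:

\[\begin{equation} \Pr(\text{positive effect}) = 0.5, \qquad \Pr(X \text{ matters}) = 0.5 + 0.1 = 0.6, \qquad \text{expected effect} = 0.5 - 0.1 = 0.4 \end{equation}\]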

We can conceptualize questions about case-level causal effects as questions about \(\theta\) terms even if our model involves more complex relations between \(X\) and \(Y\). The question itself does not depend on the model having any particular form. For instance, consider a mediation model of the form \(X\rightarrow M \rightarrow Y\). In this model, a positive effect of \(X\) on \(Y\) can emerge in two ways. A positive \(X \rightarrow Y\) effect can emerge from a positive effect of \(X\) on \(M\) followed by a positive effect of \(M\) on \(Y\). Yet we will also get a positive \(X \rightarrow Y\) effect from a sequence of negative intermediate effects. Consider: if an increase in \(X\) causes a decrease in \(M\), while a decrease in \(M\) causes an increase in \(Y\), then an increase in \(X\) will yield an increase in \(Y\).

Thus, there are two chains of intermediate effects that will generate a positive effect of \(X\) on \(Y\). So in this model, the question, “What is the probability that \(X\) has a positive effect on \(Y\) in this case?” is a question about whether either of those combinations of intermediate effects is operating. Specifically, we are asking about the following probability:

\[\begin{equation} \Pr((\theta^M = \theta^M_{01} \& \theta^Y = \theta^Y_{01}) \text{ OR } (\theta^M = \theta^M_{10} \& \theta^Y = \theta^Y_{10})) \end{equation}\]

A negative effect of \(X\) on \(Y\) can emerge from a chain of opposite-signed effects: either positive \(X \rightarrow M\) and then negative \(M \rightarrow Y\) or negative \(X \rightarrow M\) and then positive \(M \rightarrow Y\). Thus, to ask, “What is the probability that \(X\) has a negative effect on \(Y\) in this case?” is to ask about the following probability:

\[\begin{equation} \Pr((\theta^M = \theta^M_{01} \& \theta^Y = \theta^Y_{10}) \text{ OR } (\theta^M = \theta^M_{10} \& \theta^Y = \theta^Y_{01})) \end{equation}\]

To ask about the expected effect of \(X\) on \(Y\) in a case is to ask about the first probability minus the second.
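To illustrate with hypothetical numbers: suppose we believed that \(\Pr(\theta^M = \theta^M_{01}) = 0.6\), \(\Pr(\theta^M = \theta^M_{10}) = 0.1\), \(\Pr(\theta^Y = \theta^Y_{01}) = 0.5\), and \(\Pr(\theta^Y = \theta^Y_{10}) = 0.2\), and suppose we treated the two nodal types as independent (an assumption a model need not impose). Then:

\[\begin{equation} \Pr(\text{positive effect}) = 0.6 \times 0.5 + 0.1 \times 0.2 = 0.32, \qquad \Pr(\text{negative effect}) = 0.6 \times 0.2 + 0.1 \times 0.5 = 0.17 \end{equation}\]

and the expected effect of \(X\) on \(Y\) in the case would be \(0.32 - 0.17 = 0.15\).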

Notice that working with this more complex mediation model required us first to figure out which combinations of intermediate causal effects would generate the overall effect of \(X\) on \(Y\) that we were interested in. Mapping from sets of \(X \rightarrow M\) and \(M \rightarrow Y\) effects to the \(X \rightarrow Y\) effects that they yield allowed us to figure out which \(\theta^M\) and \(\theta^Y\) values correspond to the overall effect that we are asking about. We will make use of these kinds of mappings at many points in this book. But for now the key point is that, regardless of the complexity of a model, we can always pose questions about case-level causal effects as questions about a case’s nodal types or about the probability of it having a given set of nodal types.

4.2 Case-level causal attribution

A query about causal attribution is related to, but different from, a query about a case-level causal effect. When asking about \(X\)’s case-level effect, we are asking, “Would a change in \(X\) cause a change in \(Y\) in this case?” The question of causal attribution asks: “Did \(X\) cause \(Y\) to take on the value it did in this case?” More precisely, we are asking, “Given the values that \(X\) and \(Y\) in fact took on in this case, would \(Y\)’s value have been different if \(X\)’s value had been different?”

Consider an example. We know that inequality in Taiwan was relatively low and that Taiwan democratized in 1996, but was low inequality a cause of Taiwan’s democratization in 1996? Equivalently: Given low economic inequality and democratization in Taiwan in 1996, would the outcome in this case have been different if inequality had been high?

A query about causal attribution goes beyond asking whether Taiwan is a case in which inequality has a causal effect on democratization. Whereas a case-level causal effect is defined in terms of the \(\theta\) nodes on endogenous variables, we define a causal-attribution query in terms of a larger set of nodes. To attribute \(Y\)’s value in a case to \(X\), we need to know not only whether this is the kind of case in which \(X\) could have an effect on \(Y\) but also whether the context is such that \(X\)’s value in fact made a difference.

To see the distinction, consider the generic setup in Figure 4.2. Here, \(Y\) is a function of two variables, \(X_1\) and \(X_2\). This means that \(\theta^Y\) is somewhat more complicated than in a setup with one causal variable: \(\theta^Y\) must here define \(Y\)’s response to all possible combinations of \(X_1\) and \(X_2\), including interactions between them.


Figure 4.2: This DAG is a graphical representation of the simple causal setup in which \(Y\) depends on two variables, \(X_1\) and \(X_2\). How \(Y\) responds to \(X_1\) and \(X_2\) depends on \(\theta^Y\); the DAG itself does not provide information on whether or how \(X_1\) and \(X_2\) interact with each other.

We examined the set of nodal types for a setup like this in Chapter 2 (see Table 2.3). In the table, there are four column headings representing the four possible combinations of \(X_1\) and \(X_2\) values. Each row represents one possible pattern of \(Y\) values as \(X_1\) and \(X_2\) move through their four combinations.

One way to conceptualize the size of the nodal-type “space” is to note that \(X_1\) can have any of our four causal effects (the four binary types) on \(Y\) when \(X_2=0\); and \(X_1\) can have any of four causal effects when \(X_2=1\). Likewise, \(X_2\)’s effect on \(Y\) can be of any of the four types when \(X_1=0\) and of any of the four types when \(X_1=1\). This yields 16 possible response patterns to combinations of \(X_1\) and \(X_2\) values.
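More generally, a binary node with \(k\) binary parents must be assigned a response for each of the \(2^k\) combinations of parent values, so there are \(2^{(2^k)}\) possible nodal types: 4 for the single-parent setup of Figure 4.1 (\(k=1\)) and \(2^{(2^2)}=16\) here (\(k=2\)).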

A query about causal attribution—whether \(X_1 = 1\) caused \(Y=1\)—for the model in Figure 4.2 needs to be defined not just in terms of \(\theta^Y\), but also in terms of \(X_2\). Parallel to our Taiwan example, suppose that we have a case in which \(Y=1\) and in which \(X_1\) was also 1, and we want to know whether \(X_1\) caused \(Y\) to take on the value it did. Answering this question requires knowing whether the case’s type is such that \(X_1\) would have had a positive causal effect on \(Y\), given the value of \(X_2\)—which we can think of as part of the context. Recall also that \(X_2\)’s value is set by \(\theta^{X_2}\). Thus, given that we start with knowledge of \(X_1\)’s and \(Y\)’s values, our query about causal attribution amounts to a query about two nodal types: (a) \(\theta^{X_2}\) (which gives \(X_2\)’s value) and (b) \(\theta^Y\), specifically whether its value is such that \(X_1\) has a positive causal effect given \(X_2\)’s value.

Suppose, for instance, that we were to observe \(X_2=1\) (meaning that \(\theta^{X_2} = \theta^{X_2}_1\)). We then need to ask whether the nodal type, \(\theta^Y\), is such that \(X_1\) has a positive effect when \(X_2=1\). Consider \(\theta^Y_{0111}\) (type 8 in Table 2.3).36 This is a nodal type in which \(X_1\) has a positive effect when \(X_2=0\) but no effect when \(X_2=1\). Put differently, \(X_2=1\) is a sufficient condition for \(Y=1\), meaning that \(X_1\) makes no difference to the outcome when \(X_2=1\) under this nodal type.

In all, we have four qualifying nodal types for \(Y\): \(\theta^Y_{0001}\), \(\theta^Y_{1001}\), \(\theta^Y_{0101}\), and \(\theta^Y_{1101}\). In other words, we can attribute a \(Y=1\) outcome to \(X_1=1\) if we are in the context \(\theta^{X_2} = \theta^{X_2}_1\) and \(\theta^Y\) is one of these four nodal types.

Thus, a question about causal attribution is a question about the joint value of a set of nodal types: about whether the combination of context and the nodal type(s) governing effects is such that changing the causal factor of interest would have changed the outcome.
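As a preview of the software syntax introduced in the chapter appendix, a query like this can be posed directly to a model. The sketch below uses the CausalQueries package, with \(W\) standing in for the second cause (\(X_2\) in Figure 4.2); the model is illustrative rather than tied to a substantive application:

library(CausalQueries)

# Illustrative two-cause model; W plays the role of X2 in Figure 4.2
model <- make_model("X -> Y <- W")

# Attribution: probability that X = 1 caused Y = 1, conditioning on the
# values actually observed in the case (here X = 1, W = 1, and Y = 1)
query_model(model,
            query = "Y[X = 1] > Y[X = 0]",
            given = "X == 1 & W == 1 & Y == 1")

The result is the probability, under the model, of the qualifying combinations of nodal types among those consistent with the observed values.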

4.3 Average causal effects

While the queries we have considered so far operate at the case level, we can also pose causal queries at the level of populations. One of the most common population-level queries is a question about an average causal effect. In counterfactual terms, a question about average causal effects is: if we manipulated the value of \(X\) for all cases in the population—first setting \(X\) to one value for all cases, then changing it to another value for all cases—by how much would the average value of \(Y\) in the population change? Like other causal queries, a query about an average causal effect can be conceptualized as learning about a node in a causal model.

We can do this by conceiving of any given case as being a member of a population composed of different nodal types. When we seek to estimate an average causal effect, we seek information about the proportions or shares of these nodal types in the population.

More formally, and adapting notation from Humphreys and Jacobs (2015), we can use \(\lambda^Y_{ij}\) to refer to the share of cases in a population that have nodal type \(\theta^Y_{ij}\). Thus, given our four nodal types in a two-variable binary setup, \(\lambda^Y_{10}\) is the proportion of cases in the population with negative effects; \(\lambda^Y_{01}\) is the proportion of cases with positive effects; and so on. One nice feature of this setup, with both \(X\) and \(Y\) binary, is that the average causal effect can be calculated simply as the share of positive-effect cases minus the share of negative-effect cases: \(\lambda^Y_{01} - \lambda^Y_{10}\).
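For instance, if we believed (hypothetically) that the population shares were \(\lambda^Y_{10} = 0.1\), \(\lambda^Y_{01} = 0.5\), \(\lambda^Y_{00} = 0.3\), and \(\lambda^Y_{11} = 0.1\), then the average causal effect of \(X\) on \(Y\) would be \(\lambda^Y_{01} - \lambda^Y_{10} = 0.5 - 0.1 = 0.4\).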

Graphically, we can represent this setup by including \(\lambda^Y\) in a more complex causal graph as in Figure 4.3. As in our setup for case-level causal effects, \(X\)’s effect on \(Y\) in a case depends on (and only on) the case’s nodal type, \(\theta^Y\). The key difference is that we now model the case’s type not as exogenously given, but as a function of two additional variables: the distribution of nodal types in a population and a random process through which the case’s type is “drawn” from that distribution. We represent the type distribution as \(\lambda^Y\): a vector of values for the proportions \(\lambda^Y_{10}, \lambda^Y_{01}, \lambda^Y_{00}, \lambda^Y_{11}\). We represent the random process drawing a case’s \(\theta^Y\) value from that distribution as \(U^\theta\).

In practice, it is the components of \(\lambda^Y\)—the shares of different nodal types in the population—that will be of substantive interest. In this model, our causal query—about \(X\)’s average causal effect—is defined by the shares of negative- and positive-causal-effect cases, respectively, in the population. “What is \(X\)’s average effect on \(Y\)?” amounts to asking: what are the values of \(\lambda^Y_{10}\) and \(\lambda^Y_{01}\)? As with \(\theta^Y\), \(\lambda^Y\) is not directly observable. And so the empirical challenge—to which we devote later parts of this book—is to figure out what we can observe that would allow us to learn about \(\lambda^Y\)’s component values.37


Figure 4.3: This DAG is a graphical representation of a causal setup in which cases are drawn from a population composed of different nodal types. As before, \(X\)’s effect on \(Y\) is a function of a causal-type variable, \(\theta^Y\). Yet here we explicitly model the process through which the case’s type is drawn from a distribution of types in a population. Here \(\lambda\) represents the multinomial distribution of nodal types in the population while \(U^\theta\) is a random variable representing the draw of each case from the distribution defined by \(\lambda\). A case’s nodal type, \(\theta^Y\), is thus a joint function of \(\lambda^Y\) and \(U^{\theta^Y}\).

We can, of course, likewise pose queries about other population-level causal quantities. For instance, we could ask: for what proportion of cases in the population does \(X\) have a positive effect? This is equivalent to asking about the value of \(\lambda^Y_{01}\), one element of the \(\lambda^Y\) vector. Or we could ask about the proportion of cases in which \(X\) has no effect, which is to ask about the sum \(\lambda^Y_{00} + \lambda^Y_{11}\), the shares of the two types for which \(X\) has no effect on \(Y\).

4.4 Causal paths

To develop richer causal understandings, researchers often seek to describe the causal path or paths through which effects propagate. Consider the DAG in Figure 4.4, in which \(X\) can affect \(Y\) through two possible pathways: directly and via \(M\). Assume again that all variables are binary, taking on values of \(0\) or \(1\). Here we have nodal types defining \(M\)’s response to \(X\) (\(\theta^M\)) and nodal types defining \(Y\)’s response to both \(X\) (directly) and \(M\) (\(\theta^Y\)).

Suppose that we observe \(X=1\) and \(Y=1\) in a case. Suppose, further, that we have reasonable confidence that \(X\) has had a positive effect on \(Y\) in this case. We may nonetheless be interested in knowing whether that causal effect ran through \(M\). We will refer to this as a query about a causal path. Importantly, a causal-path query is not answered simply by asking whether some mediating event along the path occurred. We cannot, for instance, establish that the top path in Figure 4.4 was operative simply by determining the value of \(M\) in this case—though that will likely be useful information.

Rather, the question of whether the mediated (via \(M\)) causal path is operative is a composite question with two parts: First, does \(X\) have an effect on \(M\) in this case? Second, does that effect—the difference in \(M\)’s value caused by a change in \(X\)—in turn cause a change in \(Y\)’s value? In other words, what we want to know is whether the effect of \(X\) on \(Y\) depends on—that is, will not operate without—the effect of \(X\) on \(M\).38 Asking whether a causal effect operated via a given path is thus in fact asking about a specific set of causal effects lying along that path.


Figure 4.4: \(X\) affects \(Y\) both directly and indirectly through \(M\).

As we now show, we can define this causal-path query as a question about specific nodes on a causal graph. In particular, a causal path can be defined in terms of the values of \(\theta\) nodes: specifically, in the present example, in terms of \(\theta^M\) and \(\theta^Y\). To see why, let us first note that there are two sequences of effects that would allow \(X\)’s positive effect on \(Y\) to operate via \(M\): (1) \(X\) has a positive effect on \(M\), which in turn has a positive effect on \(Y\); or (2) \(X\) has a negative effect on \(M\), which in turn has a negative effect on \(Y\).

Thus, in establishing whether \(X\) affects \(Y\) through \(M\), the first question is whether \(X\) affects \(M\) in this case. Whether or not it does is a question about the value of \(\theta^M\). We know that \(\theta^M\) can take on four possible values corresponding to the four possible responses to \(X\): \(\theta^M_{10}, \theta^M_{01}, \theta^M_{00}, \theta^M_{11}\). For sequence (1) to operate, \(\theta^M\) must take on the value \(\theta^M_{01}\), representing a positive effect of \(X\) on \(M\). For sequence (2) to operate, \(\theta^M\) must take on the value \(\theta^M_{10}\), representing a negative effect of \(X\) on \(M\).

Next, note that \(\theta^Y\) defines \(Y\)’s response to different combinations of two other variables—here, \(X\) and \(M\)—since both of these variables point directly into \(Y\). Given that \(Y\) has two binary parents, there are 16 possible values for \(\theta^Y\)—again as shown earlier in Table 2.3, simply substituting \(M\) and \(X\) for \(X_1\) and \(X_2\). Note that these 16 nodal types capture the full range of causal possibilities. For instance, they allow for \(M\) to affect \(Y\) and, thus, to potentially pass on a mediated effect of \(X\). They allow for \(X\) to have a direct, unmediated effect on \(Y\). And there are nodal types in which \(X\) and \(M\) interact in affecting \(Y\).

Another way to think about this last point is that \(M\) is not just a possible mediator of \(X\)’s indirect effect; \(M\) is also a potential moderator of \(X\)’s direct effect. A mediator of an effect is a variable through which the effect operates: \(M\) mediates \(X\)’s effect if \(X\) has a causal effect on \(M\) that has a causal effect on \(Y\) (a more formal operationalization is provided below). A moderator is a variable that can change the effect of another variable on the outcome: \(M\) moderates \(X\)’s effect if \(M\)’s value can influence that effect. (In a regression context, we would equivalently say that there is an interaction between \(M\) and \(X\), but we will use the language of moderation in this book.) For instance, if \(X\) has a positive direct effect on \(Y\) when \(M=0\) but no effect when \(M=1\), then \(M\) moderates \(X\)’s effect. In this causal model, \(M\) can play both of these roles: mediating part of \(X\)’s effect (the part that runs through \(M\)) and moderating part of \(X\)’s effect (the part that runs directly from \(X\) to \(Y\)).

What values of \(\theta^Y\), then, are compatible with the operation of a causal path from \(X\) to \(Y\) via \(M\)? Let us first consider this question with respect to sequence (1), in which \(X\) has a positive effect on \(M\), and that positive effect is necessary for \(X\)’s positive effect on \(Y\) to occur. For this sequence to operate, as we have said, \(\theta^M\) must take on the value of \(\theta^M_{01}\). When it comes to \(\theta^Y\), then, we need to look for types in which \(X\)’s effect on \(Y\) depends on \(M\)’s taking on the values it does as a result of \(X\)’s positive effect on \(M\).

We are thus looking for nodal types for \(Y\) that capture two kinds of counterfactual causal relations. First, \(X\) must have a positive effect on \(Y\) when \(M\) undergoes the change that results from \(X\)’s positive effect on \(M\). This condition ensures simply that \(X\) has the required effect on \(Y\) in the presence of \(X\)’s effect on \(M\). Second, that change in \(M\), generated by a change in \(X\), must be necessary for \(X\)’s positive effect on \(Y\) to operate. This condition specifies the path, ensuring that \(X\)’s effect actually runs through (i.e., depends on) its effect on \(M\).

Assuming a positive effect of \(X\) on \(M\) (\(\theta^M=\theta^M_{01}\)), we check for these two conditions by applying the following set of queries to \(\theta^Y\):39

  1. Is \(X=1\) a counterfactual cause of \(Y=1\), given \(X\)’s positive effect on \(M\)? Establishing this positive effect of \(X\) involves two queries:

    1. Where \(X=0\), does \(Y=0\)? As we are assuming \(X\) has a positive effect on \(M\), if \(X=0\) then \(M=0\) as well. So, in the potential-outcomes table, the column of interest is the \(X=0, M=0\) column. We can look down this column and eliminate those types in which we do not observe \(Y=0\). This eliminates types \(9\) through \(16\).

    2. Where \(X=1\), does \(Y=1\)? Given \(X\)’s assumed positive effect on \(M\), \(M=1\) under this condition. So, looking down the \(X=1, M=1\) column, we eliminate those types where we do not see \(Y=1\). We retain only types \(2, 4, 6,\) and \(8\).

  2. Is \(X\)’s effect on \(M\) necessary for \(X\)’s positive effect on \(Y\)? That is, do we see \(Y=1\) only if \(M\) takes on the value that \(X=1\) generates (which is \(M=1\))? To determine this, we inspect the counterfactual condition in which \(X=1\) and \(M=0\), and we ask: does \(Y=0\)? It is only if \(Y=0\) when \(X=1\) but \(M=0\) that we know that \(M\) changing to \(1\) when \(X\) goes to \(1\) is necessary for \(X\)’s effect on \(Y\) to operate (i.e., that the effect operates through the \(M\) path). Of the four remaining types, only \(2\) and \(6\) pass this test.

Under these and only these two values of \(\theta^Y\), namely \(\theta^Y_{0001}\) and \(\theta^Y_{0101}\), we will see a positive effect of \(X\) on \(Y\) for which the \(M\)-mediated path is causally necessary, given a positive effect of \(X\) on \(M\). While both of these \(\theta^Y\) values meet the two conditions, they are also different from one another in a subtle and interesting way. For type \(\theta^Y_{0101}\), \(X\)’s effect on \(Y\) runs strictly through \(M\): if \(M\) were to change from \(0\) to \(1\) without \(X\) changing, \(Y\) would still change from \(0\) to \(1\). \(X\) is causally important for \(Y\) only insofar as it affects \(M\). In a case of type \(\theta^Y_{0101}\), then, anything else that similarly affects \(M\) would generate the same effect on \(Y\) as \(X\) does. In type \(\theta^Y_{0001}\), in contrast, both \(X\)’s change to \(1\) and the resulting change in \(M\) are necessary to generate \(Y\)’s change to \(1\): \(X\)’s causal effect thus requires the operation of both the mediated and the unmediated pathway. Moreover, here \(X\) itself matters in the counterfactual sense; for a case of type \(\theta^Y_{0001}\), some other cause of \(M\) would not generate the same effect on \(Y\).
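The filtering just described for sequence (1) can also be carried out mechanically. The sketch below, in plain R rather than the CausalQueries syntax introduced in the chapter appendix, enumerates all 16 response patterns for \(Y\) as a function of \(M\) and \(X\) and keeps those satisfying queries 1a, 1b, and 2; only the patterns corresponding to \(\theta^Y_{0001}\) and \(\theta^Y_{0101}\) survive:

# Enumerate the 16 nodal types for Y; Y_mx is the value Y takes when M = m and X = x,
# with columns ordered as in the theta^Y subscripts
types <- expand.grid(Y_00 = 0:1, Y_10 = 0:1, Y_01 = 0:1, Y_11 = 0:1)

keep <- with(types,
  Y_00 == 0 &   # 1a: Y = 0 when X = 0 (and so, given the positive X -> M effect, M = 0)
  Y_11 == 1 &   # 1b: Y = 1 when X = 1 (and so M = 1)
  Y_01 == 0)    # 2:  Y = 0 when X = 1 but M is held at 0

types[keep, ]   # two surviving patterns, corresponding to theta^Y_0001 and theta^Y_0101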

We can undertake the same exercise for sequence (2), in which \(X\) first has a negative effect on \(M\), or \(\theta^M=\theta^M_{10}\). Here we adjust the three queries for \(\theta^Y\) to take account of this negative effect. Thus, we adjust query 1a so that we are looking for \(Y=0\) when \(X=0\) and \(M=1\). In query 1b, we look for \(Y=1\) when \(X=1\) and \(M=0\). And for query 2, we want types in which \(Y\) fails to shift to \(1\) when \(X\) shifts to \(1\) but \(M\) stays at \(1\). Types \(\theta^Y_{0010}\) and \(\theta^Y_{1010}\) pass these three tests.

In sum, we can define a query about causal paths as a query about the value of \(\theta\) terms on the causal graph. For the graph in Figure 4.4, asking whether \(X\)’s effect runs via the \(M\)-mediated path is asking whether one of four combinations of \(\theta^M\) and \(\theta^Y\) holds in the case:

  • \(\theta^M=\theta^M_{01}\) and (\(\theta^Y=\theta^Y_{0001}\) or \(\theta^Y=\theta^Y_{0101}\))
  • \(\theta^M=\theta^M_{10}\) and (\(\theta^Y=\theta^Y_{0010}\) or \(\theta^Y=\theta^Y_{1010}\))

It is worth noting how different this formulation of the task of identifying causal pathways is from widespread understandings of process tracing. Scholars commonly characterize process tracing as a method in which we determine whether a mechanism was operating by establishing whether the events lying along that path occurred. As a causal-model framework makes clear, finding out that \(M=1\) (or \(M=0\), for that matter) does not establish what was going on causally. Observing this intervening step does not by itself tell us what value \(M\) would have taken on if \(X\) had taken on a different value, or whether this would have changed \(Y\)’s value. We need instead to conceive of the problem of identifying pathways as one of figuring out the counterfactual response patterns of the variables along the causal chain.

4.5 Conclusion

In this chapter, we have discussed several types of causal questions that social scientists often pose to their data and have shown how we can define these queries in terms of collections of nodal types on a causal graph. In the chapter appendix, we show how the mapping from causal questions to nodal types can be generalized (as the CausalQueries software package does). In making this analytic move, we prepare the ground for developing an empirical research design. Nodal types cannot themselves be directly observed. However, as we will demonstrate later in the book, defining causal queries as summaries of causal types links observable elements of a causal model to the unobservable objects of inquiry—allowing us to use the former to draw inferences about the latter.

4.6 Chapter appendix

4.6.1 Actual causes

In the main text we dealt with causes in the standard counterfactual sense: antecedent conditions a change in which would have produced a different outcome. Sometimes, however, we are interested in identifying antecedent conditions that were not counterfactual difference-makers but that nonetheless generated or produced the outcome.

Though conceptually complex, queries of this form may be quite important for historical and legal applications, and so we give an overview of them here, pointing to Halpern (2016) for an authoritative treatment of these ideas.

We will focus on situations in which an outcome was overdetermined: multiple conditions were present, each of which, on its own, could have generated the outcome. In such situations, none of these conditions caused the outcome in the counterfactual sense; yet one or more of them may have been distinctively important in producing the outcome. The concept of an actual cause can be useful in putting a finer point on this kind of causal question.

A motivating example used in much of the literature on actual causes (e.g. N. Hall 2004) imagines two characters, Suzy and Billy, simultaneously throwing stones at a bottle. Both are excellent shots and hit whatever they aim at. Suzy’s stone hits first, and so the bottle breaks. However, Billy’s stone would have hit had Suzy’s not hit, and would have broken the bottle. Did Suzy’s throw cause the bottle to break? Did Billy’s?

By the usual definition of causal effects, neither Suzy’s nor Billy’s action had a causal effect: without either throw, the bottle would still have broken. We commonly encounter similar situations in the social world. We observe, for instance, the onset of an economic crisis and the outbreak of war—either of which would be sufficient to cause the government’s downfall—but with (say) the economic crisis occurring first and toppling the government before the war could do so. In this situation, neither economic crisis nor war in fact made a difference to the outcome: take away either one and the outcome remains the same.

To return to the bottle example, while neither Suzy’s nor Billy’s throw is a counterfactual cause, it just seems obvious that Suzy broke the bottle, and Billy did not. We can formalize this intuition by defining Suzy’s throw as the actual cause of the outcome. Using the definition provided by Halpern (2015), building on Halpern and Pearl (2005) and others, we say that a condition (\(X\) taking on some value \(x\)) was an actual cause of an outcome (of \(Y\) taking on some value \(y\)), where \(x\) and \(y\) may be collections of events, if:

  1. \(X=x\) and \(Y=y\) both happened
  2. there is some set of variables, \(\mathcal W\), such that if they were fixed at the levels that they actually took on in the case, and if \(X\) were to be changed, then \(Y\) would change (where \(\mathcal W\) can also be an empty set)
  3. no strict subset of \(X\) satisfies 1 and 2 (there is no redundant part of the condition, \(X=x\))

The definition thus describes a condition that would have been a counterfactual cause of the outcome if we were to imagine holding constant some set of events that in fact occurred (and that, in reality, might not have been constant if the actual cause had not in fact occurred).

Let us now apply these three conditions to the Suzy and Billy example. Conditions 1 and 3 are easily satisfied: Suzy threw and the bottle broke (Condition 1), and “Suzy threw” has no strict subsets (Condition 3).

Condition 2 is met if Suzy’s throw made a difference, counterfactually speaking—with the important caveat that, in determining this, we are permitted to condition on (to fix in the counterfactual comparison) any event or set of events that actually happened (or on none at all). To see why Condition 2 is satisfied, we have to think of there being three steps in the process: (1) Suzy and Billy throw, (2) Suzy’s or Billy’s rock hits the bottle, and (3) the bottle breaks. In actuality, Billy’s stone did not hit the bottle, so we are allowed to condition on that fact in determining whether Suzy’s throw was a counterfactual cause (even though we know that Billy’s stone would have hit if Suzy’s hadn’t). Conditional on Billy’s stone not hitting, the bottle would not have broken had Suzy not thrown.

From the perspective of counterfactual causation, it may seem odd to condition on Billy’s stone not hitting the bottle when thinking about Suzy not throwing the stone—since Suzy’s throwing the stone was the very thing that prevented Billy’s from hitting the bottle. It feels close to conditioning on the bottle not being broken! Yet Halpern argues that this is an acceptable thought experiment for establishing the importance of Suzy’s throw since conditioning is constrained to the actual facts of the case. Moreover, the same logic shows why Billy is not an actual cause. The reason is that Billy’s throw is only a cause in those conditions in which Suzy did not hit the bottle. But because Suzy actually hit the bottle, we are not permitted to condition on Suzy not hitting the bottle in determining actual causation. We thus cannot—even through conditioning on actually occurring events—construct any counterfactual comparison in which Billy’s throw is a counterfactual cause of the bottle’s breaking.

The striking result here is that there can be grounds to claim that a condition was the actual cause of an outcome even though, under the counterfactual definition, the effect of that condition on the outcome is 0. (At the same time, all counterfactual causes are automatically actual causes; they meet Condition 2 by conditioning on nothing at all, an empty set \(\mathcal W\).) One immediate methodological implication follows: since actual causes need not be counterfactual causes, there are risks in research designs that seek to understand causal effects by tracing back actual causes—i.e., the way things actually happened. If we traced back from the breaking of the bottle, we might be tempted to identify Suzy’s throw as the cause of the outcome. We would be right only in an actual-causal sense, but wrong in the standard, counterfactual causal sense. Chains of events that appear to “generate” an outcome are not always causes in the counterfactual sense.40

As with other causal queries, the question “Was \(X=x\) the actual cause of \(Y=y\)?” can be redefined as a question about which combinations of nodal types produce conditions under which \(X\) could have made a difference. To see how, let us run through the Billy and Suzy example again, but formally in terms of a model. Consider Figure 4.5, where we represent Suzy’s throw (\(S\)), Billy’s throw (\(B\)), Suzy’s rock hitting the bottle (\(H^S\)), Billy’s rock hitting the bottle (\(H^B\)), and the bottle cracking (\(C\)). Each endogenous variable has a \(\theta\) term associated with it, capturing its nodal type. We capture the possible “preemption” effect with the arrow pointing from \(H^S\) to \(H^B\), allowing whether Suzy’s rock hits to affect whether Billy’s rock hits.41

For Suzy’s throw to be an actual cause of the bottle’s cracking, we need first to establish that Suzy threw (\(\theta^S=\theta^S_1\)) and that the bottle cracked (\(C=1\)) (Condition 1). Condition 3 is automatically satisfied in that \(\theta^S=\theta^S_1\) has no strict subsets. Turning now to Condition 2, we need Suzy’s throw to be a counterfactual cause of the bottle cracking if we condition on the value of some set of nodes remaining fixed at the values they in fact took on. As discussed above, we know that we can meet this criterion if we condition on Billy’s throw not hitting. To make this work, we need to ensure, first, that Suzy’s throw hits if and only if she throws: so \(\theta^{H^S}=\theta^{H^S}_{01}\). Next, we need to ensure that Billy’s throw does not hit whenever Suzy’s does: this corresponds to any of the four nodal types for \(H^B\) that take the form \(\theta^{H^B}_{xx00}\). Those last two zeroes in the subscript mean simply that \(H^B=0\) whenever \(H^S=1\). Note that the effect of Billy throwing on Billy hitting when Suzy has not thrown—the first two terms in the nodal-type’s subscript—does not matter since we have already assumed that Suzy does indeed throw.

Finally, we need \(\theta^C\) to take on a value such that \(H^S\) has a positive effect on \(C\) when \(H^B=0\) (Billy doesn’t hit) since this is the actual circumstance on which we will be conditioning. This is satisfied by any of the four nodal types of the form \(\theta^C_{0x1x}\). This includes, for instance, a \(\theta^C\) value in which Billy’s hitting has no effect on the bottle (perhaps Billy doesn’t throw hard enough!): e.g., \(\theta^C_{0011}\). Here, Suzy’s throw is a counterfactual cause of the bottle’s cracking. And, as we have said, all counterfactual causes are actual causes. They are, simply, counterfactual causes when we hold nothing fixed (\(\mathcal W\) in Condition 2 is just the empty set).

Notably, we do not need to specify the nodal type for \(B\): given the other nodal types identified, Suzy’s throw will be the actual cause regardless of whether or not Billy throws. If Billy does not throw, then Suzy’s throw is a simple counterfactual cause (given the other nodal types).

The larger point is that actual cause queries can, like all other causal queries, be defined as questions about the values of nodes in a causal model. When we pose the query, “Was Suzy’s throw an actual cause of the bottle cracking?”, we are in effect asking whether the case’s combination of nodal types (or its causal type) matches \(\theta^S_1, \theta^B_x, \theta^{H^B}_{xx00}, \theta^{H^S}_{01}, \theta^C_{0x1x}\).

Likewise, if we want to ask how often Suzy’s throw is an actual cause, in a population of throwing rounds, we can address this query as a question about the joint distribution of nodal types. We are then asking how common the qualifying combinations of nodal types are in the population, given the distribution of types at each node.


Figure 4.5: A DAG for the stone-throwing example. Suzy’s throw (\(S\)) and Billy’s throw (\(B\)) determine whether their rocks hit the bottle (\(H^S\) and \(H^B\)); whether Suzy’s rock hits also affects whether Billy’s does; and the bottle’s cracking (\(C\)) depends on \(H^S\) and \(H^B\). Each endogenous node has an associated \(\theta\) term capturing its nodal type.

Actual causes are conceptually useful whenever there are two sufficient causes for an outcome, but one preempts the operation of the other. For instance, we might posit that the United States’ development of the atomic bomb was a sufficient condition for U.S. victory over Japan in World War II and that U.S. conventional military superiority was also a sufficient condition, one that would have operated via a land invasion of Japan. Neither condition was a counterfactual cause of the outcome because both were present. However, holding constant the absence of a land invasion, the atomic bomb was a difference-maker, rendering it an actual cause. The concept of actual cause thus helps capture the sense in which the atomic bomb distinctively contributed to the outcome, even if it was not a counterfactual cause.

An extended notion (Halpern 2016, p. 81) of actual causes restricts the imagined counterfactual deviations to states that are more likely to arise (more “normal”) than the factual state. We will call this notion a “notable cause.” We can say that one cause, \(A\), is “more notable” than another cause, \(B\), if a deviation in \(A\) from its realized state is (believed to be) more likely than a deviation in \(B\) from its realized state.

For intuition, we might wonder why a Republican was elected to the presidency in a given election. In looking at some minimal winning coalition of states that voted Republican, we might distinguish between a set of states that always vote Republican and a set of states that usually go Democratic but voted Republican this time. If the coalition is minimal winning, then every state that voted Republican is a cause of the outcome in the standard (difference-making) sense. However, only the states that usually vote Democratic are notable causes since it is only for them that the counterfactual scenario (voting Democratic) was more likely to arise than the factual scenario. In a sense, we take the “red” states’ votes for the Republican as given—placing them, as it were, in the causal background—and identify as “notable” those conditions that mattered and easily could have gone differently. By the same token, we can say that, among those states that voted Republican this time, those that more commonly vote Democratic are more notable causes than those that less commonly vote Democratic.

How notable a counterfactual cause is can be expressed as a claim about the distribution of a set of nodal types. For instance, if we observe \(R_j=1\) for state \(j\) (it voted Republican), then the notability of this vote directly increases with our belief about the probability that \(\theta^{R_j}=\theta_0^{R_j}\)—that is, with the probability that the state’s vote could have gone the other way. The higher the probability that the state could have voted Democratic, the more notable a cause we consider its voting Republican.

4.6.2 General procedure for mapping queries to causal types

In the next parts of this appendix, we describe a general method for mapping from queries to causal types. In particular, we describe the algorithm used by the CausalQueries software package to define queries and then walk through how to use CausalQueries to identify the causal types associated with different queries.

The algorithm calculates the full set of outcomes on all nodes, given each possible causal type and a collection of controlled conditions (“do operations”). Then each causal type is marked as satisfying the query or not. This in turn tells us the set of types that satisfy a query. Quantitative queries, such as the probability of a query being satisfied, or the average treatment effect, can then be calculated by taking the measure of the set of causal types that satisfy the query.

First, some notation.

Let \(n\) denote the number of nodes. Label the nodes \(V_1, \dots V_n\) subject to the requirement that each node’s parents precede it in the ordering. Let \(pa_j\) denote the set of values of the parents of node \(j\) and let \(V_j(pa_j, \theta_t)\) denote the value of node \(j\) given the values of its parents and the causal type \(\theta_t\).

The primitives of a query are questions about the values of outcomes, \(V\), given some set of controlled operations \(x\).

  • let \(x = (x_1, \dots x_n)\) denote a set of do operations where each \(x_i\) takes on a value in \(\{-1,0,1\}\). Here \(-1\) indicates “not controlled,” \(0\) means set to 0, and \(1\) means set to 1 (this set can be expanded if \(V\) is not binary)
  • let \(V(x, \theta_t)\) denote the values \(V\) (the full set of nodes) takes given \(\theta_t\)
  • a “simple query” is a function \(q(V(x, \theta_t))\) which returns TRUE if \(V(x, \theta_t)\) satisfies some condition and FALSE otherwise.

Queries are summaries of simple queries. For instance, for nodes \(X\) and \(Y\):

  • Query \(Q_1: \mathbb{1}(Y(X=1)=1)\) asks whether \(Y=1\) when \(X\) is set to 1. This requires evaluating one simple query.
  • Query \(Q_2: \mathbb{1}(Y(X=1)=1) \& \mathbb{1}(Y(X=0)=0)\) is composed of two simple queries: the first returns true if \(Y\) is 1 when \(X\) is set to 1, the second returns true if \(Y\) is 0 when \(X\) is set to 0; both conditions holding corresponds to a positive effect on a unit.
  • Query \(Q_3: E[(\mathbb{1}(Y(X=1)=1) \& \mathbb{1}(Y(X=0)=0)) - (\mathbb{1}(Y(X=1)=0) \& \mathbb{1}(Y(X=0)=1))]\) asks for the average treatment effect, represented here using four simple queries: the expected difference between positive and negative effects. This query involves weighting by the probability of the causal types.

Then to calculate \(V(x, \theta_t)\):

  1. Calculate \(v_1\), the realized value of the first node, \(V_1\), given \(\theta_t\). This is given by \(v_1 = x_1\) if \(x_1 \neq -1\) and by \(\theta_t^{V_1}\) otherwise.
  2. For each \(j \in 2...n\), calculate \(v_j\): \(v_j = x_j\) if \(x_j \neq -1\), and \(v_j = V_{j}(pa_j, \theta_t)\) otherwise, where the values in \(pa_j\) are determined in the previous steps.

We now have the outcomes, \(V\), for all nodes given the operations \(x\) and so can determine \(q(V(x, \theta_t))\). From there we can calculate summaries of simple queries across causal types.
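A minimal sketch of these two steps, written in plain R rather than drawn from the CausalQueries internals, is given below for a binary chain model \(X \rightarrow M \rightarrow Y\); the function name and data structures are ours, for illustration only:

# theta gives X's exogenous value and, for M and Y, the value taken for each
# value of the single parent; x gives do operations, with -1 = "not controlled"
realise <- function(theta, x = c(X = -1, M = -1, Y = -1)) {
  vX <- unname(if (x["X"] != -1) x["X"] else theta$X)
  vM <- unname(if (x["M"] != -1) x["M"] else theta$M[vX + 1])
  vY <- unname(if (x["Y"] != -1) x["Y"] else theta$Y[vM + 1])
  c(X = vX, M = vM, Y = vY)
}

# Hypothetical causal type: X = 1; M and Y each respond positively to their parent
theta <- list(X = 1, M = c(0, 1), Y = c(0, 1))

# Simple queries evaluated under do(X = 1) and do(X = 0); together they
# establish a positive effect of X on Y for this causal type
realise(theta, c(X = 1, M = -1, Y = -1))["Y"]  # 1
realise(theta, c(X = 0, M = -1, Y = -1))["Y"]  # 0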

A last note on conditional queries. Say we are interested in an attribution query of the form: what is the probability that \(X\) causes \(Y\) in a case in which \(X=1\) and \(Y=1\)? In this case, we define a simple query \(q_1\) which assesses whether \(X\) causes \(Y\) for a given \(\theta_t\) and a simple query \(q_2\) which assesses whether \(X=1\) and \(Y=1\) under \(\theta_t\). We then calculate the conditional query by conditioning on the set of \(\theta\)s for which \(q_2\) is true and evaluating the share of these for which \(q_1\) is true (weighting by the probability of the causal types).

4.6.3 Identifying causal types for queries with CausalQueries

We first demonstrate how queries are calculated using the CausalQueries package for a chain model of the form \(X \rightarrow M \rightarrow Y\) and then generalize.

Imagine first a chain model of this form in which we assume no negative effects of \(X\) on \(M\) or of \(M\) on \(Y\). We will also suppose that in fact \(X=1\), always. Doing this keeps the parameter space a little smaller for this demonstration but also serves to demonstrate that a causal model can make use of the counterfactual possibility that a node takes on a particular value even if it never does in fact.

We then ask two questions:

  • Q1. What is the probability that \(X\) has a positive effect on \(Y\)? (“POS”)
  • Q2. What is the probability that \(X=1\) causes \(Y=1\) in cases in which \(X=1\) and \(Y=1\)? (“POC”)

To answer these two queries we define a simple query \(q_1\), which assesses whether \(X\) causes \(Y\) for each \(\theta\), and a second simple query \(q_2\), which assesses whether \(X=1\) and \(Y=1\) for each \(\theta\). In this example the first simple query involves some do operations; the second does not.

Code to answer these two simple queries is shown below and the output is shown in Table 4.1 (one row for each causal type).

library(CausalQueries)

# Chain model in which X = 1 always and negative X -> M and M -> Y effects are ruled out
model <- make_model("X -> M -> Y") |>
         set_restrictions("X[]==0") |>
         set_restrictions("M[X=1] < M[X=0]") |>
         set_restrictions("Y[M=1] < Y[M=0]")

# Simple queries: q1, a positive effect of X on Y; q2, observing X = 1 and Y = 1
q1 <- "Y[X = 1] > Y[X = 0]"
q2 <- "X == 1 & Y == 1"

# Evaluate each simple query for each causal type, alongside the type probabilities
df <- data.frame(
  a1 = get_query_types(model, q1)$types,
  a2 = get_query_types(model, q2)$types,
  p  = get_type_prob(model))
Table 4.1: Set of causal types in the model that satisfy q1 and q2 along with the probability of the type.
a1 a2 p
X1.M00.Y00 FALSE FALSE 0.111
X1.M01.Y00 FALSE FALSE 0.111
X1.M11.Y00 FALSE FALSE 0.111
X1.M00.Y01 FALSE FALSE 0.111
X1.M01.Y01 TRUE TRUE 0.111
X1.M11.Y01 FALSE TRUE 0.111
X1.M00.Y11 FALSE TRUE 0.111
X1.M01.Y11 FALSE TRUE 0.111
X1.M11.Y11 FALSE TRUE 0.111

The answers to the overall queries are then (1) the expected value of (the answers to) \(q_1\) given weights \(p\) and (2) the expected value of (the answers to) \(q_1\) given \(q_2\) and weights \(p\). See Table 4.2.

library(dplyr)

# POS: weighted share of types with a positive effect
# POC: the same share among types consistent with X = 1 and Y = 1
df |> summarize(
  POS = weighted.mean(a1, p),
  POC = weighted.mean(a1[a2], p[a2]))
Table 4.2: Calculated answers to two queries.
POS POC
0.111 0.2

Given the equal weighting on causal types, these answers reflect the fact that for five of the nine causal types we expect to see \(X=1\) and \(Y=1\), but that the causal effect is present for only one of the nine causal types and for one of the five causal types that exhibit \(X=1\) and \(Y=1\).
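Written out, with each of the nine causal types receiving weight \(1/9\):

\[\begin{equation} \text{POS} = \frac{1}{9} \approx 0.111, \qquad \text{POC} = \frac{\Pr(q_1 \,\&\, q_2)}{\Pr(q_2)} = \frac{1/9}{5/9} = \frac{1}{5} = 0.2 \end{equation}\]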

In practice querying is done in one step. Like this for an unconditional query:

#POS
query_model(model, query = "Y[X = 1] > Y[X = 0]")

Like this for a conditional query:

# POC
query_model(model, query = "Y[X = 1] > Y[X = 0]", 
              given = "X == 1 & Y == 1")

The same procedure can be used to identify any set of types that correspond to a particular query. Table 4.3 illustrates, showing syntax for model definition and queries along with implied types identified using get_query_types.

Table 4.3: Examples of queries and corresponding causal types. The probability of the query is the probability of the causal types that satisfy the query.
Model Query Given Interpretation Types
X -> Y Y[X=1] > Y[X=0] Probability that X has a positive effect on Y X0.Y01, X1.Y01
X -> Y Y[X=1] < Y[X=0] X == 1 Probability that X has a negative effect on Y among those for whom X=1 X1.Y10
X -> Y Y[X=1] > Y[X=0] X==1 & Y==1 Probability that Y=1 is due to X=1 (Attribution) X1.Y01
X -> Y <- W Y[X=1] > Y[X=0] W == 1 Probability that X has a positive effect on Y for a case in which W = 1 (where W is possibly defined post treatment) W1.X0.Y0001, W1.X1.Y0001, W1.X0.Y1001, W1.X1.Y1001, W1.X0.Y0011, W1.X1.Y0011, W1.X0.Y1011, W1.X1.Y1011
X -> Y <- W Y[X=1, W = 1] > Y[X=0, W = 1] W==0 Probability that X has a positive effect on Y if W were set to 1 for cases for which in fact W=0 W0.X0.Y0001, W0.X1.Y0001, W0.X0.Y1001, W0.X1.Y1001, W0.X0.Y0011, W0.X1.Y0011, W0.X0.Y1011, W0.X1.Y1011
X -> Y <- W Y[X=1] > Y[X=0] Y[W=1] > Y[W=0] Probability that X has a positive effect on Y for a case in which W has a positive effect on Y W0.X0.Y0110, W1.X1.Y0001, W1.X1.Y1001, W0.X0.Y0111
X -> Y <- W (Y[X=1, W = 1] > Y[X=0, W = 1]) > (Y[X=1, W = 0] > Y[X=0, W = 0]) W==1 & X==1 Probability of a positive interaction between W and X for Y; the probability that the effect of X on Y is stronger when W is larger W1.X1.Y0001, W1.X1.Y1001, W1.X1.Y1011
X -> M -> Y <- X Y[X = 1, M = M[X=1]] > Y[X = 0, M = M[X=1]] X==1 & M==1 & Y==1 The probability that X would have a positive effect on Y if M were controlled to be at the level it would take if X were 1, for units for which in fact M==1 X1.M01.Y0001, X1.M11.Y0001, X1.M01.Y1001, X1.M11.Y1001, X1.M01.Y0101, X1.M11.Y0101, X1.M01.Y1101, X1.M11.Y1101
X -> M -> Y <- X (Y[M = 1] > Y[M = 0]) & (M[X = 1] > M[X = 0]) Y[X=1] > Y[X=0] & M==1 The probability that X causes M and M causes Y among units for which M = 1 and X causes Y X1.M01.Y0001, X1.M01.Y0011

All of these queries correspond to the probability of some set of types. We might call these simple queries. Other complex queries (including the average treatment effect) can be thought of as operations on the simple queries.

For instance:

  • the average treatment effect, Y[X=1] - Y[X=0], is the difference between the simple queries Y[X=1] > Y[X=0] and Y[X=1] < Y[X=0], or, more simply, the difference between the queries Y[X=1]==1 and Y[X=0]==1

  • the interaction query Q = (Y[X = 1, W = 1] - Y[X = 0, W = 1]) - (Y[X = 1, W = 0] - Y[X = 0, W = 0]) is similarly a combination of the simple queries Y[X = 1, W = 1]==1, Y[X = 0, W = 1]==1, Y[X = 1, W = 0]==1, and Y[X = 0, W = 0]==1.

For linear complex queries like this, we can proceed by identifying a coefficient for each causal type; these coefficients can then be used to combine the probabilities of the types.
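That is, letting \(w_t\) denote the coefficient attached to causal type \(\theta_t\) and \(p_t\) its probability, the answer to a linear complex query is the weighted sum:

\[\begin{equation} \sum_t w_t p_t \end{equation}\]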

For instance, for the interaction query Q, get_query_types(model, Q) would identify a coefficient for each causal type, ranging from -2 to 2, with a 2, for instance, corresponding to a type for which a change in \(W\) changes the effect of \(X\) from -1 to 1. See Table 4.4 for weights on types when \(X=1\) and \(W=1\).

Table 4.4: Coefficients on causal types for an interaction query
weight cases
-2 W1.X1.Y0110
-1 W1.X1.Y0100, W1.X1.Y0010, W1.X1.Y1110, W1.X1.Y0111
0 W1.X1.Y0000, W1.X1.Y1100, W1.X1.Y1010, W1.X1.Y0101, W1.X1.Y0011, W1.X1.Y1111
1 W1.X1.Y1000, W1.X1.Y0001, W1.X1.Y1101, W1.X1.Y1011
2 W1.X1.Y1001

References

Hall, Ned. 2004. “Two Concepts of Causation.” Causation and Counterfactuals, 225–76.
Halpern, Joseph Y. 2015. “A Modification of the Halpern-Pearl Definition of Causality.” arXiv Preprint arXiv:1505.00162.
———. 2016. Actual Causality. MIT Press.
Halpern, Joseph Y, and Judea Pearl. 2005. “Causes and Explanations: A Structural-Model Approach. Part I: Causes.” The British Journal for the Philosophy of Science 56 (4): 843–87.
Humphreys, Macartan, and Alan M Jacobs. 2015. “Mixing Methods: A Bayesian Approach.” American Political Science Review 109 (04): 653–73.
Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. “A General Approach to Causal Mediation Analysis.” Psychological Methods 15 (4): 309.
Menzies, Peter. 1989. “Probabilistic Causation and Causal Processes: A Critique of Lewis.” Philosophy of Science, 642–63.
Pearl, Judea. 2010. “An Introduction to Causal Inference.” The International Journal of Biostatistics 6 (2): 1–62.

  35. A little more carefully: insofar as we believe the effect is either positive or it is not, the true answer to the question—the estimand—is a yes or a no; the probability is the answer we provide, which captures our beliefs about the estimand.

  36. A reminder that, with two-parent nodes, the nodal-type subscript ordering is \(Y|(X_1=0, X_2=0); Y|(X_1=1, X_2=0); Y|(X_1=0, X_2=1); Y|(X_1=1, X_2=1)\).

  37. Note also that \(\lambda^Y\) can be thought of as itself drawn from a distribution, such as a Dirichlet. The hyperparameters of this underlying distribution of \(\lambda\) would then represent our uncertainty over \(\lambda\) and hence over average causal effects in the population.

  38. A very similar question is taken up in work on mediation, where the focus is on understanding quantities such as the “indirect effect” of \(X\) on \(Y\) via \(M\). Formally, the indirect effect would be \[Y(X=1, M = M(X=1,\theta^M), \theta^Y) - Y(X = 1, M = M(X=0, \theta^M), \theta^Y),\] which captures the difference to \(Y\) if \(M\) were to change in the way that it would change due to a change in \(X\), but without an actual change in \(X\) (Imai, Keele, and Tingley 2010).

  39. Using standard potential outcomes notation, we can express the overall query, conditioning on a positive effect of \(X\) on \(M\), via the inequality \(Y(1, M(1)) - Y(0, M(0)) > Y(1, M(0)) - Y(0, M(0))\). The three specific queries formulated below simply correspond to the three unique elements of this expression. We can also readily map the path query that we are defining here—does the positive effect of \(X\) on \(Y\) depend on \(X\)’s effect on \(M\)—onto a query posed in terms of indirect effects. For instance, in our binary setup, conditioning our path query on a positive causal effect of \(X\) on \(Y\), a positive effect of \(X\) on \(M\), and an imagined change from \(X=0\) to \(X=1\) generates precisely the same result (identifies the same \(\theta^Y\) types) as asking which \(\theta^Y\) types are consistent with a positive indirect effect of \(X\) on \(Y\), conditioning on a positive total effect and \(X=1\).

  40. Perhaps more surprising, it is possible that the expected causal effect is negative but that \(X\) is an actual cause in expectation. For instance, suppose that 10% of the time Suzy’s shot intercepts Billy’s shot but without hitting the bottle. In that case the average causal effect of Suzy’s throw on bottle breaking is \(-0.1\) yet 90% of the time Suzy’s throw is an actual cause of bottle breaking (and 10% of the time it is an actual cause of non-breaking). For related discussions, see Menzies (1989).

  41. We do not need an arrow in the other direction because Suzy throws first.