Chapter 1 What and why

The CausalQueries package is designed to make it easy to build, update, and query causal models defined over binary variables.

The causal models we use are of the form described by Pearl (2009), with Bayesian updating on the causal models similar to that described by Cowell et al. (1999). Though drawing heavily on the Pearlian framework, the approach specifies parameters using potential outcomes (specifically, using principal stratification (Frangakis and Rubin 2002)) and uses the Stan framework (Carpenter et al. 2017) to implement updating.

We will illustrate how to use these models for a number of inferential tasks that are sometimes difficult to implement:

  • Bayesian process tracing: How to figure out what to believe about a case-level causal claim given one or more pieces of evidence about a case (Bennett and Checkel 2015).
  • Mixed methods: How to combine within-case (“qualitative”) evidence with cross-case (“quantitative”) evidence to make causal claims about cases or groups (Humphreys and Jacobs 2015).
  • Counterfactual reasoning: How to estimate the probability that an outcome is due to a cause and other counterfactual queries (effects of causes, probability of sufficiency, explanation) (Tian and Pearl 2000).
  • Inference in the absence of causal identification: What inferences can you draw when causal quantities are not identified? How can you figure out whether a causal quantity is identified (Manski 1995)?
  • Extrapolation, data fusion: How to draw inferences from mixtures of observational and experimental data? How to draw out-of-sample inferences given a theory of how one place differs from another (Bareinboim and Pearl 2016)?

The functions in the package allow these different kinds of questions to be answered using the same three basic steps — make_model, update_model, and query_model — without recourse to specific estimators for specific estimands.

The approach, however, requires thinking about causal inference differently from the way many in the experimental tradition are used to.

1.1 Two approaches to inference

We contrast the two approaches to causal inference using the simplest problem: the analysis of data from a two-arm experimental trial to determine the average effect of a treatment on an outcome.

In the canonical experimental trial, a treatment, \(X\), is randomly assigned and outcomes, \(Y\), are measured in both treatment and control groups. The usual interest is in understanding the average effect of \(X\) on \(Y\) (which we will assume are both binary).

The classic approach to answering this question is to take the difference between outcomes in treatment and control groups as an estimate of this average treatment effect. Estimates of uncertainty about this answer can also be generated using information on variation in the treatment and control groups.

It is also possible, however, to answer this question—along with a rich variety of other questions—using a causal models approach (Pearl 2009).

To build intuition for how a causal model approach works, suppose that instead of simply taking the difference between treatment and control outcomes, one were to:

  1. construct a simple model in which (a) \(X\) is randomly assigned, with 50% probability and (b) we specify some prior beliefs about how \(Y\) responds to \(X\), which could be positively, negatively, or not at all, and possibly different for each unit

  2. update beliefs about how \(Y\) responds to \(X\) given the data on \(X\) and \(Y\)

Though very simple, this \(X \rightarrow Y\) model adds a great deal of structure to the situation. It is, in fact, more of a model than you need if all you want to do is estimate average treatment effects. But it is nevertheless enough of a model to let you estimate quantities, such as causal attribution, that you could not estimate without a model, even given random assignment.
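To make that structure concrete, the sketch below (plain Python, not the package's own code) enumerates the four ways a binary \(Y\) can respond to a binary \(X\) and computes the average treatment effect implied by any set of type shares. The type labels and the function name are illustrative assumptions.

```python
# Illustrative sketch only: the names below are hypothetical, not CausalQueries API.
# Each response type is a pair (Y when X=0, Y when X=1).
RESPONSE_TYPES = {
    "never": (0, 0),     # Y = 0 regardless of X
    "negative": (1, 0),  # X shifts Y from 1 to 0
    "positive": (0, 1),  # X shifts Y from 0 to 1
    "always": (1, 1),    # Y = 1 regardless of X
}


def implied_ate(shares):
    """Average of Y(1) - Y(0) across units, given the share of each type."""
    return sum(
        share * (RESPONSE_TYPES[name][1] - RESPONSE_TYPES[name][0])
        for name, share in shares.items()
    )


# Equal weight on every type implies no average effect,
# even though half of all units are affected by X.
equal_shares = {name: 0.25 for name in RESPONSE_TYPES}
print(implied_ate(equal_shares))
```

The same average effect can arise from very different mixes of types, which is why questions about individual cases need the model and not just the average.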

1.2 Recovering the ATE with Difference in Means

To illustrate, imagine that in actuality (but unknown to us) \(X\) shifts \(Y\) from 0 to 1 in 50% of cases while in the rest \(Y\) is 0 or 1 regardless.

We imagine we have access to some data in which treatment, \(X\), has been randomly assigned:
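One way to generate data of this kind is sketched below in plain Python. The even split of the unaffected cases between \(Y\) always 0 and \(Y\) always 1, the sample size of 1,000, and the seed are assumptions made for illustration; the text fixes only the 50% share of cases with a positive effect.

```python
import random


def simulate_trial(n=1000, seed=1):
    """Simulate (x, y) pairs from a two-arm trial with binary X and Y."""
    rng = random.Random(seed)
    # (Y when X=0, Y when X=1) for each type, with assumed population shares.
    types = [(0, 1), (0, 0), (1, 1)]
    shares = [0.50, 0.25, 0.25]
    data = []
    for _ in range(n):
        y_if_control, y_if_treated = rng.choices(types, weights=shares)[0]
        x = 1 if rng.random() < 0.5 else 0  # random assignment
        # Only the potential outcome matching the assignment is observed.
        y = y_if_treated if x == 1 else y_if_control
        data.append((x, y))
    return data


data = simulate_trial()
print(data[:5])
```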

The classic experimental approach to causal inference is to estimate the effect of \(X\) on \(Y\) using differences in means: taking the difference between the average outcome in treatment and the average outcome in control. Thanks to randomization, in expectation that difference is equal to the average of the differences in outcomes that units would exhibit if they were in treatment versus if they were in control—that is, the causal effect.

Table 1.1: Inferences on the ATE from differences in means
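|   | Estimate | Std. Error | t value | Pr(>|t|) | CI Lower | CI Upper | DF    |
|---|----------|------------|---------|----------|----------|----------|-------|
| X | 0.484    | 0.028      | 17.48   | 0        | 0.43     | 0.538    | 996.4 |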

This approach gets us an accurate and precise answer, and it’s simply done (here using a function from the estimatr package).
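The calculation behind such an estimate is short. The sketch below, in plain Python and not using estimatr, computes the difference in means with an unequal-variance standard error and a normal-approximation 95% interval; estimatr's own output (for example, its degrees-of-freedom correction) may differ in detail. The simulated data are hypothetical: they assume that 50% of units have a positive effect and that the rest are split evenly between \(Y\) always 0 and \(Y\) always 1.

```python
import math
import random
import statistics

rng = random.Random(1)


def draw_unit():
    """One hypothetical unit: draw a response type, assign X at random, observe Y."""
    y_if_control, y_if_treated = rng.choices(
        [(0, 1), (0, 0), (1, 1)], weights=[0.50, 0.25, 0.25]
    )[0]
    x = int(rng.random() < 0.5)
    return x, (y_if_treated if x else y_if_control)


data = [draw_unit() for _ in range(1000)]


def difference_in_means(data):
    treated = [y for x, y in data if x == 1]
    control = [y for x, y in data if x == 0]
    estimate = statistics.mean(treated) - statistics.mean(control)
    # Unequal-variance standard error: sum of the two sampling variances.
    se = math.sqrt(
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    # Normal-approximation 95% confidence interval.
    return estimate, se, (estimate - 1.96 * se, estimate + 1.96 * se)


estimate, se, ci = difference_in_means(data)
print(f"estimate={estimate:.3f} se={se:.3f} ci=({ci[0]:.3f}, {ci[1]:.3f})")
```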

1.3 Recovering the ATE with a Causal Model

The model-based approach takes a few more lines of code and is implemented in CausalQueries as follows.

First we define a model, like this:

Implicit in the definition of the model is a set of parameters and priors over these parameters. We discuss these in much more detail later, but for now we will just say that priors are uniform over possible causal relations.

Second, we update the model, like this:

Third, we query the model like this:

Table 1.2: Inferences on the ATE from updated model
| Query | Given | Using      | mean | sd    |
|-------|-------|------------|------|-------|
| Q 1   | -     | posteriors | 0.48 | 0.028 |

We see that the answers we get from the differences-in-means approach and the causal-model approach are about the same, as one would hope.
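The logic of the three steps can also be sketched without the package. The Python below is not CausalQueries code and does not use Stan: it swaps in importance sampling from a flat Dirichlet prior over the four response types, which is workable only because the model is so small. The counts (375 of 500 treated units and 125 of 500 control units with \(Y=1\)) are hypothetical data, and all function names are illustrative.

```python
import math
import random

rng = random.Random(2)


# Step 1 (model): shares of the four response types, ordered as
# never (0,0), negative (1,0), positive (0,1), always (1,1),
# with a flat Dirichlet prior drawn via normalized gamma variates.
def draw_prior():
    g = [rng.gammavariate(1.0, 1.0) for _ in range(4)]
    total = sum(g)
    return [v / total for v in g]


# Step 2 (update): weight each prior draw by the likelihood of the data.
def log_likelihood(shares, n1, y1, n0, y0):
    never, negative, positive, always = shares
    p1 = positive + always  # Pr(Y=1 | X=1) under random assignment
    p0 = negative + always  # Pr(Y=1 | X=0) under random assignment
    return (
        y1 * math.log(p1) + (n1 - y1) * math.log(1 - p1)
        + y0 * math.log(p0) + (n0 - y0) * math.log(1 - p0)
    )


def weighted_draws(n1, y1, n0, y0, n_draws=100_000):
    draws = [draw_prior() for _ in range(n_draws)]
    log_w = [log_likelihood(s, n1, y1, n0, y0) for s in draws]
    top = max(log_w)  # subtract the maximum to avoid numerical underflow
    return draws, [math.exp(lw - top) for lw in log_w]


# Step 3 (query): weighted posterior mean and sd of any function of the shares.
def query(draws, weights, f):
    total = sum(weights)
    values = [f(s) for s in draws]
    mean = sum(w * v for w, v in zip(weights, values)) / total
    var = sum(w * (v - mean) ** 2 for w, v in zip(weights, values)) / total
    return mean, math.sqrt(var)


draws, weights = weighted_draws(n1=500, y1=375, n0=500, y0=125)
# ATE = share of positive-effect units minus share of negative-effect units.
ate_mean, ate_sd = query(draws, weights, lambda s: s[2] - s[1])
print(f"ATE: mean={ate_mean:.3f} sd={ate_sd:.3f}")
```

Because the query is just a function applied to the weighted draws, a different causal question needs only a different function, not a different estimator.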

1.4 Going further

In principle, however, the causal models approach can let you do more. The third section of this guide is full of examples. For a simple one, suppose that instead of wanting to know the average effect of \(X\) on \(Y\), you wanted to know: “What is the probability that \(X\) caused \(Y\) in a case in which \(X=Y=1\)?”

This is a harder question. Differences-in-means is an estimation strategy tied to a particular estimand, and it does not provide a good answer to this question. However, with a causal model in hand, we can ask whatever causal question we like, and get an answer plus estimates of uncertainty around the answer.

Here is the answer we get:

Table 1.3: Causes of effects estimand
| Query | Given        | Using      | mean  | sd    |
|-------|--------------|------------|-------|-------|
| Q 1   | X==1 & Y==1  | posteriors | 0.798 | 0.095 |

Here we are asking: among those cases in which \(X=1\) and \(Y=1\), what are the chances that \(Y\) would have been \(0\) had \(X\) been \(0\)?
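This query can be written in terms of response types. Write \(\lambda_{01}\) for the share of units in which \(X\) shifts \(Y\) from 0 to 1 and \(\lambda_{11}\) for the share in which \(Y=1\) regardless (notation assumed here for illustration). Under random assignment, a case with \(X=1\) and \(Y=1\) must be one of these two types, so:

\[
\Pr\big(Y(0)=0 \mid X=1, Y=1\big) = \frac{\lambda_{01}}{\lambda_{01} + \lambda_{11}}
\]

The data pin down the denominator, since \(\lambda_{01} + \lambda_{11} = \Pr(Y=1 \mid X=1)\), but not how it divides between the two types.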

Note, however, that while the model gave a precise answer to the ATE question, the answer for the causes-of-effects estimand is not precise. Moreover, more data won’t reduce the uncertainty substantially. The reason is that this estimand is not identified.
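Why more data does not help can be seen from the bounds on this quantity. Under random assignment the data identify only \(\Pr(Y=1 \mid X=1)\) and \(\Pr(Y=1 \mid X=0)\), and the probability of causation is then bounded as in Tian and Pearl (2000). The Python sketch below computes those bounds. The function name is illustrative, and the inputs 0.75 and 0.25 are the rates implied by the scenario in section 1.2 under the added assumption that the unaffected cases are split evenly between \(Y\) always 0 and \(Y\) always 1.

```python
def probability_of_causation_bounds(p1, p0):
    """Bounds on Pr(Y(0)=0 | X=1, Y=1) when X is randomly assigned.

    p1 = Pr(Y=1 | X=1), p0 = Pr(Y=1 | X=0). Requires p1 > 0.
    """
    if not (0 < p1 <= 1 and 0 <= p0 <= 1):
        raise ValueError("need 0 < p1 <= 1 and 0 <= p0 <= 1")
    lower = max(0.0, (p1 - p0) / p1)
    upper = min(1.0, (1 - p0) / p1)
    return lower, upper


lower, upper = probability_of_causation_bounds(p1=0.75, p0=0.25)
print(f"bounds: [{lower:.3f}, {upper:.3f}]")
```

Because the bounds depend only on the two outcome rates, a larger sample sharpens the estimates of the endpoints but leaves the interval between them in place.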

This then is a situation in which we can ask a question about a quantity that is not identified and still learn a lot. We will encounter numerous examples like this as we explore different causal models.


Bareinboim, Elias, and Judea Pearl. 2016. “Causal Inference and the Data-Fusion Problem.” Proceedings of the National Academy of Sciences 113 (27): 7345–52.

Bennett, Andrew, and Jeffrey T Checkel. 2015. Process Tracing. Cambridge University Press.

Carpenter, Bob, Andrew Gelman, Matthew D Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li, and Allen Riddell. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1).

Cowell, Robert G, Philip Dawid, Steffen L Lauritzen, and David J Spiegelhalter. 1999. Probabilistic Networks and Expert Systems. Springer.

Frangakis, Constantine E, and Donald B Rubin. 2002. “Principal Stratification in Causal Inference.” Biometrics 58 (1): 21–29.

Humphreys, Macartan, and Alan M Jacobs. 2015. “Mixing Methods: A Bayesian Approach.” American Political Science Review 109 (4): 653–73.

Manski, Charles F. 1995. Identification Problems in the Social Sciences. Harvard University Press.

Pearl, Judea. 2009. Causality. Cambridge University Press.

Tian, Jin, and Judea Pearl. 2000. “Probabilities of Causation: Bounds and Identification.” Annals of Mathematics and Artificial Intelligence 28 (1-4): 287–313.