Book on causal models for qualitative and mixed-methods research

Alan Jacobs and I have been working on principles for drawing inferences simultaneously from qualitative and quantitative data. The key idea is that when scholars use qualitative inference they update beliefs about causal effects (or, more generally, about their model of the world, \(M\)) by making inferences using data on many facts of a given case (\(D_1\)), estimating a posterior \(\Pr(M \mid D_1)\). Quantitative scholars update beliefs about causal effects by making inferences using data on a few facts about many cases (\(D_2\)), forming posterior \(\Pr(M \mid D_2)\). From there it is not such a big step to make integrated inferences of the form \(\Pr(M \mid D_1 \& D_2)\).
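As a toy numerical sketch of the integrated update (the likelihoods here are made up for illustration): if the two bodies of evidence are conditionally independent given \(M\), then \(\Pr(M \mid D_1 \& D_2)\) is obtained by multiplying the prior by both likelihoods and renormalizing.

```python
# Toy integrated inference over a binary hypothesis M.
# Hypothetical likelihoods; D1 and D2 are assumed conditionally
# independent given M.

prior = {"effect": 0.5, "no_effect": 0.5}
lik_D1 = {"effect": 0.7, "no_effect": 0.3}   # within-case (qualitative) evidence
lik_D2 = {"effect": 0.6, "no_effect": 0.4}   # cross-case (quantitative) evidence

def posterior(prior, *likelihoods):
    """Bayes' rule with conditionally independent data sources."""
    unnorm = dict(prior)
    for lik in likelihoods:
        for m in unnorm:
            unnorm[m] *= lik[m]
    total = sum(unnorm.values())
    return {m: v / total for m, v in unnorm.items()}

p1 = posterior(prior, lik_D1)            # Pr(M | D1)      -> 0.7 on "effect"
p2 = posterior(prior, lik_D2)            # Pr(M | D2)      -> 0.6 on "effect"
p12 = posterior(prior, lik_D1, lik_D2)   # Pr(M | D1 & D2) -> 7/9 on "effect"
print(p1["effect"], p2["effect"], p12["effect"])
```

With these (invented) numbers the integrated posterior, about 0.78, is sharper than what either body of evidence delivers alone.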

Simple as that sounds, this is rarely done in practice, but doing it opens up many insights about how we learn from cases and how we aggregate knowledge. The broad approach becomes considerably more powerful when causal models are used to justify beliefs about data patterns.

The R package CausalQueries can be used to make, update, and query causal models. Users provide a causal statement of the form X -> M <- Y; M <-> Y, which is interpreted as a structural causal model over a collection of binary variables. CausalQueries can then (1) identify the set of principal strata (causal types) needed to characterize all possible causal relations between nodes consistent with the causal statement; (2) determine a set of parameters needed to characterize distributions over these types; (3) update beliefs over the distribution of causal types, using a Stan model; and (4) pose a wide range of causal queries of the model, using the prior distribution, the posterior distribution, or a user-specified candidate vector of parameters.
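The combinatorics behind step (1) can be sketched directly. This is my own illustration of the idea, not CausalQueries' internal code, using the simplest model X -> Y with binary nodes: each node gets a set of "nodal types" (functions from its parents' values to its own value), and a causal type is one nodal type per node.

```python
from itertools import product

def nodal_types(n_parents):
    """All functions from parent value combinations to {0, 1}."""
    parent_settings = list(product([0, 1], repeat=n_parents))
    return [dict(zip(parent_settings, values))
            for values in product([0, 1], repeat=len(parent_settings))]

types_X = nodal_types(0)   # X has no parents: 2 types (X is 0, X is 1)
types_Y = nodal_types(1)   # Y has one parent: 4 types
                           # (never 1, Y = X, Y = 1 - X, always 1)

# A causal type picks one nodal type per node; these strata exhaust all
# possible causal relations consistent with X -> Y.
causal_types = list(product(types_X, types_Y))
print(len(types_X), len(types_Y), len(causal_types))  # 2 4 8
```

The four types for Y are the familiar principal strata of a binary cause and outcome; with more parents the count grows as \(2^{2^k}\), which is why richer models need many more parameters.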

Process tracing is a strategy for inferring within-case causal effects from observable implications of causal processes. Bayesian networks, developed by computer scientists and now used in many disciplines, provide a natural framework for describing such processes, characterizing causal estimands, and assessing the added value of new information for understanding different causal estimands. We describe how scholars engaged in process tracing can use these tools to justify inference strategies with reference to lower-level theories and to assess the probative value of new within-case information.
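In odds form, the probative value of a within-case clue is just its likelihood ratio. A minimal sketch with invented clue probabilities (not drawn from any particular application):

```python
# Hypothetical probative value of a within-case clue K for hypothesis H.
prior = 0.5
p_clue_if_H = 0.8       # assumed: clue likely if H holds
p_clue_if_not_H = 0.2   # assumed: clue unlikely otherwise

likelihood_ratio = p_clue_if_H / p_clue_if_not_H    # probative value: 4.0
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)
print(posterior)  # 0.8
```

The point of grounding such numbers in a causal model is that quantities like `p_clue_if_H` need not be asserted directly; they can be derived from a lower-level theory of the process.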

I give an illustration of a simple problem in which a Bayesian researcher can choose between random assignment of a treatment and delegating assignment to an informed, but motivated, agent. In effect she compares learning from an RCT with learning from an observational study. She evaluates designs according to an expected squared error (ESE) criterion. I show that for a small problem (n = 2), if she starts with a prior expectation of no treatment effect but believes that the agent is an advocate of treatment with probability q (and otherwise an opponent), then for all values of q she does at least as well delegating assignment as she does from an RCT, and she does strictly better as long as q ≠ 0.5. For other priors on treatment effects, randomization can dominate delegation or be dominated by it. As n grows, the expected errors from an RCT design fall but errors from delegated assignment do not. Although there is always some prior such that a delegated procedure beats randomized assignment, the converse is not true: for a given prior there may be no delegated procedure that trumps an RCT. With uniform priors, for example, the RCT dominates delegated assignment for all beliefs about agent motivations when n ≥ 4.
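To fix ideas on the ESE criterion, here is a small enumeration I wrote for the randomized design alone (not the paper's full comparison): n = 2 units, binary potential outcomes, a uniform prior over worlds, and the difference between the treated and control unit as the estimate.

```python
from itertools import product

# Expected squared error of an RCT with n = 2 under a uniform prior
# over binary potential outcomes (y0, y1) for each unit.
ese, n_terms = 0.0, 0
for y0_1, y1_1, y0_2, y1_2 in product([0, 1], repeat=4):
    true_ate = ((y1_1 + y1_2) - (y0_1 + y0_2)) / 2
    # One unit treated, one control; both pairings equally likely.
    for estimate in (y1_1 - y0_2, y1_2 - y0_1):
        ese += (estimate - true_ate) ** 2
        n_terms += 1
ese /= n_terms
print(ese)  # 0.25
```

Averaging over both the prior and the randomization like this is exactly the kind of computation the design comparison requires, with the delegated design differing only in how the assignment is chosen.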

In many contexts, treatment assignment probabilities differ across strata or are correlated with observable third variables. Regression with covariate adjustment is often used to account for these features. It is known, however, that in the presence of heterogeneous treatment effects this approach does not yield unbiased estimates of average treatment effects. But it is not well known how estimates generated in this way diverge from unbiased estimates of average treatment effects. Here I show that biases can be large, even in large samples. However, I also find conditions under which the usual approach provides interpretable estimates, and I identify a monotonicity condition that ensures that least squares estimates lie between estimates of the average treatment effect for the treated and the average treatment effect for the controls. The monotonicity condition can be satisfied, for example, with Roy-type selection and is guaranteed in the two-stratum case.
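A constructed example (my numbers, chosen for illustration) shows the divergence: two equal-sized strata with different treatment shares and different effects, where covariate-adjusted least squares misses the ATE but lands between the ATT and the ATC.

```python
import numpy as np

n = 100                                   # units per stratum
strata = np.repeat([0, 1], n)             # stratum indicator
y0 = np.zeros(2 * n)                      # control potential outcomes
y1 = np.where(strata == 0, 1.0, 3.0)      # effects: 1 in stratum 0, 3 in stratum 1
d = np.zeros(2 * n)
d[:50] = 1                                # stratum 0: 50 of 100 treated
d[n:n + 10] = 1                           # stratum 1: 10 of 100 treated
y = d * y1 + (1 - d) * y0                 # observed outcomes

# Least squares of y on treatment plus a stratum dummy.
X = np.column_stack([np.ones(2 * n), d, strata])
tau_ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

ate = (y1 - y0).mean()                    # 2.0
att = (y1 - y0)[d == 1].mean()            # 4/3  (about 1.33)
atc = (y1 - y0)[d == 0].mean()            # 16/7 (about 2.29)
print(tau_ols, ate, att, atc)             # tau_ols is 26/17, about 1.53
```

Here OLS effectively weights each stratum's effect by its share of treatment variance, n_s p_s(1 - p_s), which downweights the low-variance second stratum; the estimate is far from the ATE even though the "sample" is the full population, yet it stays between the ATT and the ATC.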