Chapter 10 Integrated Inferences Application
We extend the analysis of Chapter 8 using a model that lets us update our beliefs about causal processes from data on inequality and democratization from many cases and data on causal processes (mobilization and pressures) from a subset of cases. We then use the updated model to draw both population-level and case-level inferences.
In Chapter 8 we took the model as given and sought to draw inferences about individual cases given data on those cases. In this chapter the model becomes an object that we both learn from and learn about. In essence, we use the data on many cases to update our beliefs about the general model and then use this “trained” model to make inferences about cases.
Instead of positing a belief over the nodal types for a given case, \(\theta\), we now need to posit a belief over the distribution of nodal types, that is, over \(\lambda\). Whereas in the simple process-tracing model we specified the shares of nodal types for \(M\) (for instance), we now specify a prior distribution over the nodal type shares. We do the same, of course, for all nodes. Because we set a prior distribution over nodal types (rather than fixed proportions), we can now update on these population-level distributions as the model confronts data.
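To make the idea of a prior over nodal type shares concrete, the following minimal sketch draws candidate values of the shares for \(M\) from a flat Dirichlet prior over the four nodal types of a binary node with one binary parent. The function name, the type labels, and the choice of a Dirichlet(1, 1, 1, 1) prior are illustrative assumptions, not a description of the software used for the analysis in this chapter.

```python
import random

# Hypothetical labels for the four nodal types of M given a binary parent I:
# the subscripts give M's value when I = 0 and when I = 1.
NODAL_TYPES_M = ["theta_00", "theta_10", "theta_01", "theta_11"]

def draw_shares(alpha, rng):
    """Draw one vector of type shares from a Dirichlet(alpha) distribution."""
    # A Dirichlet draw is a vector of independent Gamma(alpha_k, 1) draws,
    # normalized to sum to one.
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(1)
flat_prior = [1.0, 1.0, 1.0, 1.0]  # assumed flat prior over the four types
draws = [draw_shares(flat_prior, rng) for _ in range(2000)]

# Under a flat prior each type has an expected share of 1/4, but individual
# draws vary widely around that value; data can narrow this spread.
prior_means = [sum(d[k] for d in draws) / len(draws) for k in range(4)]
for label, mean_share in zip(NODAL_TYPES_M, prior_means):
    print(label, round(mean_share, 2))
```

Each draw is one candidate population, described by the shares of its nodal types. Fixing the shares, as in the simple process-tracing setup, corresponds to collapsing this distribution onto a single point.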
The same applies to beliefs about confounding. Recall that we allow for unobserved confounding by allowing \(\lambda\) to include beliefs about the joint distributions of nodal types, and we set priors on these joint distributions as well. In the application below, we focus on potential confounding in the relationship between inequality and mobilization: the possibility that inequality may be more or less likely in places where inequality would induce mobilization. Here we do not express informed prior beliefs about the direction or magnitude of such confounding; we set up the parameter matrix to allow for the possibility of confounding and set a flat prior over its direction and magnitude. We can, in turn, learn about confounding from the data.
10.1 Training a Model of Inequality and Democratization
We begin with the same basic model as we used in Chapter 8, with inequality (\(I\)) potentially affecting democratization (\(D\)) both through a direct pathway and through an indirect pathway mediated by mobilization (\(M\)). International pressure (\(P\)) is also a “parent” of democratization.
Further, we impose the same set of qualitative restrictions, ruling out a negative effect of inequality on mobilization, a direct positive effect of inequality on democratization, a negative effect of mobilization on democracy, and a negative effect of pressure on democratization. Note that this setup allows for inequality to have a positive (through mobilization) effect on democratization, a negative (direct) effect on democratization, or no effect at all.
Finally, we allow for confounding. The theoretical intuition we want to capture in the model is that the level of inequality could be endogenous to inequality’s effect on mobilization. In particular, in places where inequality would pose a mobilizational threat, governments may work harder to reduce inequality. To allow for this possibility, we need to create distinct elements of \(\lambda\) representing the conditional distribution of \(I\)’s nodal types given \(M\)’s: one parameter for \(\theta^I\)’s distribution when \(M\)’s nodal type is \(\theta^M_{01}\), and another parameter for \(\theta^I\)’s distribution when \(M\)’s nodal type is something else.
This model, with confounding, is represented graphically as in Figure 10.1. The possibility of confounding is represented with the bidirected edge, connecting \(I\) and \(M\).
10.1.1 Data
To train the model, we add data.
As in Chapter 8, we will confront the model with data drawn from our coding of the case narratives in the Supplementary Material for Haggard and Kaufman (2012). However, rather than implementing the analysis case by case, we now derive leverage from the joint distribution of the data available across all cases.
Table 10.1 gives a snapshot of the data.
Case         P   I   M   D
Afghanistan  NA  1   NA  0
Albania      0   0   1   1
Algeria      NA  0   NA  0
Angola       NA  1   NA  0
Argentina    0   0   1   1
Bangladesh   0   0   0   1
(NA = not coded)
Note that this is not a rectangular dataset in that Haggard and Kaufman’s collection of clues was conditional on the outcome, \(D=1\): they gathered qualitative data on the presence of international pressure and the presence of mass mobilization only for those cases that democratized. This is not an uncommon case-selection principle. The analyst often reasons that more can be learned about how an outcome arises by focusing on cases where the outcome of interest has in fact occurred. (We assess this case-selection intuition, in the context of model-based inferences, in Chapter 13.)
The raw correlations between variables are shown in Table 10.2. Some correlations are missing because, as mentioned, data on some variables were only gathered conditional on the values of others. Where we do see correlations, they are not especially strong. There is, in particular, a weak overall relationship between inequality and democratization, though, of course, this is consistent with inequality having heterogeneous effects across the sample. The strongest correlation in the data is between \(P\) and \(M\), which are assumed to be uncorrelated in the model, though this correlation is also quite weak.
     P      I      M      D
P    1.000  0.157  0.177  NA
I    0.157  1.000  0.114  0.154
M    0.177  0.114  1.000  NA
D    NA     0.154  NA     1.000
(NA = correlation not available)
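The pattern of missing correlations follows mechanically from how the data were collected. The sketch below computes pairwise correlations using only the cases in which both variables are coded. The rows are toy values chosen to mimic the structure of the data (pressure and mobilization coded only where \(D=1\)); they are not the Haggard and Kaufman data, and the helper `pairwise_corr` is a hypothetical name introduced for illustration.

```python
from math import sqrt

# Toy rows in the layout (P, I, M, D); None marks a value that was not coded.
# Illustrative values only, not the actual data.
rows = [
    (None, 1, None, 0),
    (0,    0, 1,    1),
    (None, 0, None, 0),
    (None, 1, None, 0),
    (0,    0, 1,    1),
    (0,    0, 0,    1),
]
COLS = {"P": 0, "I": 1, "M": 2, "D": 3}

def pairwise_corr(rows, a, b):
    """Pearson correlation using only rows where both variables are observed.

    Returns None when the correlation is undefined: fewer than two complete
    rows, or no variation in one of the variables among the complete rows.
    """
    pairs = [(r[COLS[a]], r[COLS[b]]) for r in rows
             if r[COLS[a]] is not None and r[COLS[b]] is not None]
    if len(pairs) < 2:
        return None
    mean_a = sum(x for x, _ in pairs) / len(pairs)
    mean_b = sum(y for _, y in pairs) / len(pairs)
    ss_a = sum((x - mean_a) ** 2 for x, _ in pairs)
    ss_b = sum((y - mean_b) ** 2 for _, y in pairs)
    if ss_a == 0 or ss_b == 0:
        return None
    cov = sum((x - mean_a) * (y - mean_b) for x, y in pairs)
    return cov / sqrt(ss_a * ss_b)

corr_id = pairwise_corr(rows, "I", "D")  # defined: both coded for every case
corr_pd = pairwise_corr(rows, "P", "D")  # None: P is coded only where D = 1
corr_md = pairwise_corr(rows, "M", "D")  # None for the same reason
print(corr_id, corr_pd, corr_md)
```

Because \(D\) does not vary among the cases for which \(P\) and \(M\) were coded, no correlation between \(D\) and either clue can be computed, whatever the underlying relationship.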
10.1.2 Case level queries
With data and model in hand, we can now update our model to get posteriors on the distribution \(\lambda\) from which we can generate beliefs over all causal relations.
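As a deliberately simplified illustration of what updating on \(\lambda\) involves, consider only the share of cases with high inequality, treating \(I\) as a root node and ignoring confounding. These are assumptions made here for tractability; updating the full model is not available in closed form like this. With a flat Beta prior on that share, the posterior follows from simple counts. The data values below are hypothetical.

```python
# Illustrative only: conjugate updating for the share of cases with I = 1,
# treating I as a root node with no confounding.
inequality = [1, 0, 0, 1, 0, 0]  # hypothetical I values for six cases

prior_a, prior_b = 1.0, 1.0      # flat Beta(1, 1) prior on the share with I = 1
n_high = sum(inequality)
n_low = len(inequality) - n_high

# Beta-binomial updating: add the observed counts to the prior parameters.
post_a = prior_a + n_high
post_b = prior_b + n_low

prior_mean = prior_a / (prior_a + prior_b)
post_mean = post_a / (post_a + post_b)
print(prior_mean, post_mean)
```

The same logic, applied jointly to all nodal-type shares and to the parameters that capture confounding, is what yields the posterior on \(\lambda\) used in the rest of this section.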
What do we find?
10.1.2.1 Did inequality cause democratization?
We have used the data to update on \(\lambda\): our beliefs about the distributions of nodal types, including about their joint distributions (i.e., confounding). We first use this to make claims about types, similar to what we did in Chapter 8 but now with a model that has been trained on data.
The results are shown in Figure 10.2. Looking at the \(I=0, D=1\) column first, we ask whether low inequality caused democratization. Looking next at the \(I=1, D=1\) column, we ask whether high inequality caused democratization. In each case we assess how our answers would differ based on what we might observe regarding mobilization and international pressure.
Overall we are more convinced that inequality mattered in those cases in which inequality was low. Knowledge regarding mobilization and pressure affects these beliefs, as before: we have the greatest confidence that low inequality mattered in those cases in which we do not also see mobilization or pressure, since in essence the alternative explanations are ruled out. Thus lower inequality seems a better explanation for Mexico than for Nicaragua. For those cases in which we do see inequality and democratization, we again are more confident of a relationship when there is low external pressure, but also where there is mobilization. Absence of mobilization rules out this cause in the model (as before). Of the examples, Sierra Leone 1996 is the most likely case where democratization was due to inequality, but even in this case the chances are quite low.
10.1.2.2 Did inequality prevent democracy?
As before we can also ask questions about cases that have not democratized, even though we have no additional data about these cases in particular (and so, from this data set, no exemplars for these cases).
We see answers in the last columns of Figure 10.2. Overall we think inequality was more likely to have mattered in those cases in which there is inequality and no democratization. Among cases with low inequality, we infer that low inequality was a cause of non-democratization especially if we do see external pressure but we do not see mobilization (second row). If we do see mobilization we do not attribute the failure to democratize to the low inequality: greater inequality would not have made a difference via mobilization since populations were mobilized already. Among cases with high inequality and no democratization we think that the inequality was preventing the democratization especially in cases where we see mobilization and pressure, factors that would otherwise have favored democratization.
Overall the patterns are very similar here to what we saw in Chapter 8, though slightly stronger in the case of low inequality and slightly weaker in the case of higher inequality.
10.1.3 Population level queries
One set of questions we can ask of the updated model is about the probability that high inequality causes democratization. We can pose this question at different levels of conditioning. For instance, we can ask:
For all cases. For what proportion of cases in the population does inequality have a positive effect on democratization?
For all cases displaying a given causal state and outcome. Looking specifically at those cases that in fact had high inequality and democratized, for what proportion was the high inequality a cause of democratization?
For cases displaying a given causal state and outcome, and with additional clues present or absent. What if we have also collected clues on mediating or moderating nodes? For instance, for what proportion of high-inequality, democratizing cases with mobilization did inequality cause the outcome? For what proportion without mobilization? Likewise for the presence or absence of international pressure? Importantly, comparing a given estimate with and without a given clue amounts to an assessment of the clue’s probative value.
We answer these questions using the posterior distributions for each of these quantities. We can define our queries quite simply in terms of the causal types that correspond to the effect of interest and then take the conditional probability of these. In Figure 10.3, we graph posterior distributions for these queries.
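As an illustration of what taking the conditional probability involves, consider a stripped-down model with a single binary cause \(X\), a binary outcome \(Y\), and no confounding (a simplifying assumption; the inequality model has more nodes and allows confounding). Write \(\lambda_a, \lambda_b, \lambda_c, \lambda_d\) for the population shares of cases in which \(X\) has a negative effect, has a positive effect, has no effect with \(Y=0\), and has no effect with \(Y=1\), respectively. These labels are shorthand introduced for this example. The unconditional and conditional versions of the positive-effect query are then:

\[
\Pr\big(Y[X=1] > Y[X=0]\big) = \lambda_b, \qquad
\Pr\big(Y[X=1] > Y[X=0] \mid X=1, Y=1\big) = \frac{\lambda_b}{\lambda_b + \lambda_d}.
\]

The second expression follows because a case with \(X=1\) and \(Y=1\) must be either a positive-effect case or a case in which \(Y=1\) regardless. Since \(\lambda_b + \lambda_d \leq 1\), the conditional share can never be smaller than the unconditional one. Each draw of \(\lambda\) from the posterior yields one value of each expression, and the collection of values across draws forms the posterior distribution of the query.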
We can see that the share of cases overall in which inequality causes democratization is estimated to be very low, with a good deal of confidence. The proportion is considerably higher for those cases that in fact experienced high inequality and democratization. The proportion of positive causal effects is believed to be even higher for those in which mobilization occurred. Moreover, the proportion of \(I=1, D=1\) cases with a positive effect of inequality on democratization is even higher when an alternative cause—international pressure—is absent, though our uncertainty about this share is also very high.
We also see that the absence of mobilization tells us for certain that democratization was not caused by inequality. Interestingly, however, this result derives purely from the model restrictions, rather than from the data: under the restrictions we imposed, a positive effect of inequality can operate only through mobilization.
Turning now to the cases in which democratization did not occur, the second column of Figure 10.3 asks for what proportion of cases overall inequality has a negative effect on democratization; for what proportion of \(I=1, D=0\) cases inequality prevented democratization; and this latter query conditional on different clue realizations.
We see that inequality appears, overall, more commonly to prevent democratization than to cause it. We are, moreover, most confident that inequality played a preventive role in those cases in which there was mobilization and international pressure—both of which could have generated democratization—but still no democratization occurred.
10.1.4 Explorations: How much do we get from the model vs. the data?
We might wonder, at the same time, how much we are in fact learning from the data, as compared to what we built into the model at the outset, including through the monotonicity restrictions that we imposed. To examine this, in Figure 10.4 we plot our prior and our posterior on the same set of queries. We see that there is almost no shift in beliefs for the positive-effect queries (lower set), with larger, but nevertheless modest, shifts in means for the negative-effect queries (upper set). However, our uncertainty about negative effects shrinks somewhat, though it remains quite high.
By comparing the prior and posterior estimates given \(M=1\) and given \(M=0\), we can also assess whether we have learned about \(M\)’s informativeness. We see that, in this example, the difference in beliefs between each estimate given \(M=1\) and that estimate given \(M=0\) remains about the same in our posteriors as it was in our priors, especially for positive effects.
10.2 Training a Model of Institutions and Growth
We now return to our model of institutions and growth from Chapter 8. Rather than presupposing the probability of different causal types, however, we seek to build up those beliefs from data from a large set of cases, using the trained model to then answer a set of both population- and case-level queries.
The structural causal model that we use (shown in Figure 10.5) is the same model that we used in Chapter 8. However, we build in weaker assumptions given that we aim to learn about our model from the data. Specifically, we drop two of the monotonicity assumptions: we no longer assume that growth (\(Y\)) is monotonic in institutions or in mortality. The only monotonicity assumption that we retain is with respect to the instrument, mortality (\(M\)): its effect on institutions (\(R\)) cannot be positive. Otherwise, we form flat priors over all nodal types in the model — building in no assumptions other than the causal structure and monotonicity of \(M\)’s effects. Moreover, as in Chapter 8, we allow for confounding between institutions and growth, allowing for other unobserved common causes of these variables.
10.2.1 Data
We draw our data from the supplementary material for Rodrik, Subramanian, and Trebbi (2004)’s paper on the long-run economic effects of institutions. We dichotomize all variables at their sample median, and so are working with somewhat coarser data than used in the original paper. Table 10.3 provides a snippet of the dataset.
Country       Distance (D)  Mortality (M)  Institutions (R)  Growth (Y)
Angola        0             1              0                 0
Argentina     1             0              1                 1
Australia     1             0              1                 1
Burundi       0             1              0                 0
Benin         0             1              0                 0
Burkina Faso  0             1              0                 0
Unlike in the inequality application, the data here form a rectangular dataset: Rodrik, Subramanian, and Trebbi (2004) collected measures for all variables for all cases, rather than gathering more detailed evidence only on a subset of cases (as Haggard and Kaufman (2012) did in process-tracing only the democratizing cases).
The raw correlations between variables are shown in Table 10.4. These basic correlations are in general much stronger (despite the coarsening) than in the data used in the inequality and democracy application. One thing you might notice is that \(M\) is, in fact, more strongly correlated with \(Y\) than \(R\) is, which might give pause about the exclusion restriction (which assumes \(M\)’s effect on \(Y\) runs only through \(R\)). Also, \(M\) and \(D\) are quite strongly correlated. We will return to this issue when we consider model evaluation in Chapter 16.
     D      M      R      Y
D    1.000  0.373  0.240  0.291
M    0.373  1.000  0.369  0.572
R    0.240  0.369  1.000  0.494
Y    0.291  0.572  0.494  1.000
10.2.2 Queries
With the data in hand, we now update our model to get posteriors on the distribution of model parameters, from which we can generate beliefs over all causal relations in the model and answer any causal query about the model.^{76}
Before looking at more specific case- and population-level queries, we first ask whether our beliefs on the effect of institutions on growth (possibly conditional on mortality and distance from the equator) have shifted. The results in Table 10.5 suggest that, based on the data alone, we think that \(R\) has a positive effect on \(Y\), raising the probability of good development outcomes by around 15 percentage points. Knowledge of \(M\) does not affect these beliefs (since the ATE query does not condition on \(R\) and so \(M\) is separated from \(Y\) in the model). Knowledge of \(D\) does, however, as we are now more confident that \(R\) matters in cases that are distant from the equator. In this sense \(D\) and \(R\) are complements, a feature that can be seen immediately from regression analysis also.
Using       Given  mean  sd    conf.low  conf.high
priors      -      0.00  0.10  -0.19     0.19
posteriors  -      0.15  0.07  0.01      0.30
posteriors  M==0   0.15  0.07  0.01      0.30
posteriors  M==1   0.15  0.07  0.01      0.30
posteriors  D==0   0.13  0.10  -0.06     0.32
posteriors  D==1   0.18  0.10  0.00      0.36
10.2.2.1 Case level queries
We now turn to case-level inference and look at the four different possible combinations of growth and institutional quality, in each case asking whether the cause plausibly explains the outcome and how beliefs about effects would change when we learn about settler mortality (\(M\)) or distance from the equator (\(D\)) for different cases. As we did before, we imagine each of the 16 data types and report beliefs conditional on the clues that we might incorporate.
The figure includes the prior beliefs we would have from our model before updating, where now, unlike in Chapter 8, the base model does not impose monotonicity restrictions, other than that between \(M\) and \(R\). These are uniformly uninformative but we include them to make clearer how for this application the variation in inferences derives from data rather than from theoretical assumptions.
We see now that we are more confident than before that the good outcomes were due to institutions; moreover, both \(M\) and \(D\) are informative. First, when we see that a state is distant from the equator we become less confident that institutions did the work in this case. This is in line with what we had before but may seem surprising given our beliefs that \(R\) and \(D\) are complements. \(R\) and \(D\) may well be complements for the average treatment effect, but, conditional on \(R=1\) (and \(Y=1\)), knowing that \(D=1\) reduces confidence that \(R\) did the work.
Second, unlike in the base process tracing model, high settler mortality is informative (even though it is not for average treatment effects). Low mortality is now believed to be more likely to induce institutions in places with lower mortality. This is in line with the confounding logic we discussed in Chapter 8 but here we have not imposed beliefs about confounding, rather we have updated about confounding from the data. Figure 10.7 illustrates how our posteriors on nodal types on \(R\) and \(Y\) are now correlated.
10.2.2.2 Population level queries
Again the updated model can be used not just to inform inferences about cases but also to make population-level claims. In Figure 10.8, we graph the posteriors for a set of queries, conditional on observed data on institutional quality, distance, and settler mortality.
In all cases the posterior distribution has quite wide variance.
10.2.3 Explorations: Direct and indirect paths from \(M\) to \(Y\)
Our glance at the raw data suggested that the correlation between mortality and growth is high, even relative to the correlation between institutions and growth. This might lead us to wonder whether our model is correct, and in particular whether we should allow for a direct path from \(M\) to \(Y\). In this subsection, we make and update a model in which we allow for a direct arrow from \(M\) to \(Y\), as well as the mediated path that runs from mortality to institutions to growth. We can then pose queries about how settler mortality affects long-run growth, asking how much of the effect runs through institutions and how much of this effect runs through all other channels (i.e., “directly”).
To maintain simplicity here, we exclude \(D\) from the new model and work with a DAG of the form: \[M \rightarrow R \rightarrow Y \leftarrow M; Y \leftrightarrow R\]
So we now have both a direct path from \(M\) to \(Y\) and the mediated path from \(M\) to \(Y\) that runs through \(R\). We maintain the possibility of unobserved confounding between \(R\) and \(Y\). Note that dropping \(D\) represents a permissible reduction of the original model since \(D\) was a parent to only one node in that model.
Effect      Query                                      mean   sd     conf.low  conf.high
Total       Y[M = 0] - Y[M=1]                          0.272  0.079  0.117     0.425
Direct 0    Y[M = 0, R = R[M=0]] - Y[M=1, R = R[M=0]]  0.195  0.088  0.023     0.367
Direct 1    Y[M = 0, R = R[M=1]] - Y[M=1, R = R[M=1]]  0.211  0.087  0.041     0.380
Indirect 0  Y[M = 0, R = R[M=0]] - Y[M=0, R = R[M=1]]  0.061  0.070  -0.052    0.223
Indirect 1  Y[M = 1, R = R[M=0]] - Y[M=1, R = R[M=1]]  0.077  0.072  -0.030    0.246
In our pathway analysis, we will distinguish between “indirect” and “direct” effects of settler mortality on growth. We define these quantities more formally below, but first we give a basic intuition for the difference. By an “indirect” effect, we mean an effect that runs along the \(M \rightarrow R \rightarrow Y\) pathway: an effect that, for its operation, depends both on mortality’s effect on institutions and on institutions’ effect on growth. By a “direct” effect, we mean an effect that operates via the direct \(M \rightarrow Y\) pathway. Importantly, labeling this effect “direct” does not imply that there are no mediating steps in this causal pathway. It means only that we have not included any of this pathway’s mediating steps in the DAG. Thus, the “direct” effect does not represent a specific alternative mechanism to the institutional one. Rather, it captures a residual: the effect of settler mortality on long-run growth that operates through all mechanisms other than the one mediated by institutions.
In Table 10.6, we report results for a pathway analysis at the population level. First, we report our posterior belief about the total average effect of settler mortality on long-run growth, with a posterior mean of 0.272. Then we report the portion of these effects that run through each pathway.
First, we pose two versions of the direct-effects query, intended to get at the effect of settler mortality that does not run through mortality’s effect on institutions. To frame a direct-effects query, we need to imagine a manipulation in which the mortality level is changed but institutions remain fixed. There are two versions of such a query, however. In the first version, labeled “Direct 0”, we report the expected change in long-run growth under an imagined manipulation in which we change mortality from \(0\) to \(1\) while fixing institutions at the value they would take on if settler mortality were set to \(0\). In the second version (“Direct 1”), we imagine the same change in mortality but fix institutions at the value they would take on if settler mortality were set to \(1\). The difference between these queries is potentially important since mortality’s direct effect might depend on institutional conditions. As we can see, we get quite similar posterior means from these two direct-effect queries (\(0.195\) vs. \(0.211\)).
We turn then to estimating the effect operating through institutions. This indirect-effects query asks the following: what change in growth occurs if we change institutions as they would change if there were a change in settler mortality, but with settler mortality in fact held constant (so that no direct effect can be operating)? Again, there are two versions of this query: the first (“Indirect 0”) holds mortality fixed at \(0\) while the second (“Indirect 1”) holds mortality fixed at \(1\). For both, we posit the change in institutions that would happen if mortality were changed from \(0\) to \(1\). As we can see from the fourth and fifth rows of Table 10.6, we get similar estimates of this indirect effect from the two queries (\(0.061\) and \(0.077\)).
Overall, Table 10.6 suggests that both causal pathways are in operation. Yet direct effects appear far stronger than indirect effects. That is to say, we estimate that more of settler mortality’s effect on long-run growth runs through channels other than the institutional mechanism than runs through that mechanism. The strongest effect is estimated to be the direct effect with institutions fixed at whatever they would take on if mortality were high. We estimate the weakest pathway to be the indirect effect in places with low mortality. Note that the first query, the total effect, is equal to the sum of “Direct 0” and “Indirect 1” and (equivalently) to the sum of “Direct 1” and “Indirect 0”; this decomposition is documented, for instance, in Imai, Keele, and Tingley (2010).
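The decomposition can be checked directly from the query definitions in Table 10.6. Write \(Y(m, r)\) for the value of growth when mortality is set to \(m\) and institutions to \(r\), and \(R(m)\) for the value of institutions when mortality is set to \(m\); this is shorthand introduced here for the bracket notation used in the table. In each sum the intermediate terms cancel:

\[
\underbrace{Y(0, R(0)) - Y(1, R(0))}_{\text{Direct 0}} + \underbrace{Y(1, R(0)) - Y(1, R(1))}_{\text{Indirect 1}} = Y(0, R(0)) - Y(1, R(1)),
\]

\[
\underbrace{Y(0, R(1)) - Y(1, R(1))}_{\text{Direct 1}} + \underbrace{Y(0, R(0)) - Y(0, R(1))}_{\text{Indirect 0}} = Y(0, R(0)) - Y(1, R(1)).
\]

The right-hand side is the total effect of moving mortality from \(1\) to \(0\). The identities hold for every case, and so also for averages over cases. The posterior means in the table satisfy both: \(0.195 + 0.077 = 0.272\) and \(0.211 + 0.061 = 0.272\).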
With our updated model of the population in hand, we can now ask similar questions at the case level. Suppose, for instance, that we see a case that had high settler mortality and low growth; we also observe a suspected mediator of mortality’s effect, seeing that the case has weak institutions. One question we can ask about this case is the total case-level effect: what is the probability that high settler mortality caused low growth, through any mechanism, in this case given our observations in this case? We can then delve further to ask about the pathway operating in the case: about the probability that settler mortality caused low growth through institutions or through an alternative pathway.
The results of these case-level pathway queries, drawn from a model informed by the large-\(N\) data, are reported in Table 10.7. In the top row, we see that the probability that high mortality was a cause of low growth in the case is estimated to be 0.648. We estimate the probability that high settler mortality caused the low growth through a non-institutional pathway to be somewhat lower, at 0.542. And the probability that high settler mortality caused low growth specifically via the institutional pathway is much lower, at 0.252.
This result is quite striking: even when institutions take precisely the form we expect them to take if the institutional mechanism is operating (i.e., they are weak in a high-mortality, low-growth case), our trained model tells us that we should still believe it to be about twice as likely that high mortality mattered through a non-institutional mechanism than that it mattered via institutions. The results in Table 10.7 also have implications for the effects of alternative hypothetical manipulations. They suggest that changing mortality in this kind of case from high to low, while keeping institutions weak, would be more likely to improve outcomes than would keeping mortality high but changing institutions to whatever value they would take on if mortality were low.
Overall, these results suggest that any analysis of the long-run effects of settler mortality on economic growth that constrains such effects to run through institutions will likely get the story wrong. Notably, these findings also pose a challenge to the instrumental-variable strategy underlying the analyses of Rodrik, Subramanian, and Trebbi (2004) and Acemoglu, Johnson, and Robinson (2001), which (via the exclusion restriction) involves the assumption that settler mortality affects growth only via institutions.
Query     Formula                                          Given                     Estimate
ATE       Y[M = 0] > Y[M = 1]                              M==1 & Y ==0 & R == 0     0.648
Direct    Y[M = 0, R = R[M=1]] > Y[M = 1, R = R[M = 1]]    M==1 & Y ==0 & R == 0     0.542
Indirect  Y[M = 1, R = R[M=0]] > Y[M = 1, R = R[M = 1]]    M==1 & Y ==0 & R == 0     0.256
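To see how the three case-level queries are evaluated, the sketch below enumerates every combination of nodal types for \(R\) and \(Y\) in the \(M \rightarrow R \rightarrow Y \leftarrow M\) model, keeps those consistent with the observed case (\(M=1\), \(R=0\), \(Y=0\)), and computes the weighted share that satisfies each formula. The encoding of types as response functions, the function name `case_probability`, and the default of equal weights are assumptions made for illustration; the estimates in the table come from posterior weights learned from the data, not from equal weights.

```python
from itertools import product

# Nodal types written as response functions (illustrative encoding):
#   r = (R when M=0, R when M=1)
#   y[(m, r)] = Y when M=m and R=r
R_TYPES = list(product([0, 1], repeat=2))
Y_TYPES = [dict(zip(product([0, 1], repeat=2), vals))
           for vals in product([0, 1], repeat=4)]

def case_probability(query, weight=lambda r, y: 1.0):
    """P(query | M=1, R=0, Y=0) under the given weights on type pairs."""
    num = den = 0.0
    for r, y in product(R_TYPES, Y_TYPES):
        # Keep only type pairs consistent with the observed case:
        # with M=1 we must have R(1)=0 and Y(1, 0)=0.
        if r[1] != 0 or y[(1, 0)] != 0:
            continue
        w = weight(r, y)
        den += w
        if query(r, y):
            num += w
    return num / den

# The three formulas from the table, with R[M=m] written as r[m].
total = case_probability(lambda r, y: y[(0, r[0])] > y[(1, r[1])])
direct = case_probability(lambda r, y: y[(0, r[1])] > y[(1, r[1])])
indirect = case_probability(lambda r, y: y[(1, r[0])] > y[(1, r[1])])
print(total, direct, indirect)
```

With equal weights the three probabilities are 0.5, 0.5, and 0.25. Passing a different `weight` function, for instance one built from posterior type shares that allows the types of \(R\) and \(Y\) to be correlated, changes the answers without changing the logic of the calculation.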
76. With CausalQueries, this is done using update_model(model, data).