Chapter 8 Process Tracing Applications

We apply the causal-model-based approach to process tracing to two major substantive issues in comparative politics: the relationship between inequality and democratization and the relationship between institutions and economic growth. Drawing on case level data, we use qualitative restrictions on causal types together with flat priors to draw inferences about a range of causal queries. The applications are both very simple but they are sufficiently complex to illustrate key features of process tracing with causal models: the different types of learning that can be gleaned from evidence on moderators and mediators, the dependence of inference from some clues on the values of other clues, and the scope for learning from more distal historical data when researchers have beliefs over confounding processes.

In this chapter, we illustrate how causal-model-based process tracing works using two substantive applications that have been of central interest to students of comparative politics for decades: the causes of democratization and the determinants of economic growth. In both cases we develop simple models to demonstrate the logic of process tracing with causal models. In Chapter 10 we push the analysis further, illustrating the integration of process tracing with cross-case correlational analysis. The key difference is that, in this chapter, we assume (consistent with the process-tracing approach outlined in Chapter 7) that the researcher comes to a case with a theoretical model in hand, including a set of beliefs about the shares of nodal types in the population. In Chapter 10, we will return to the same case-level queries but use models that have been directly informed by data from that broader population of cases.

8.1 Inequality and Democratization

8.1.1 The Debate

Sociologists, economists, and political scientists have long theorized and empirically examined the relationship between inequality and democracy (e.g., Dahl (1973), Bollen and Jackman (1985), Acemoglu and Robinson (2005), Boix (2003), Ansell and Samuels (2014)). In recent years, the work of Boix (2003), Acemoglu and Robinson (2005), and Ansell and Samuels (2014) represents major theoretical advances in specifying when and how inequality might generate transitions to democracy (as well as its persistence, which we bracket here). The first and third of these books also provide large-n cross-national and historical tests of their theories’ key correlational predictions. Haggard and Kaufman (2012), moreover, derive causal process observations from a large number of “Third Wave” cases of democratization in order to examine these theories’ claims about the centrality of distributional issues to regime change. We provide a very condensed summary of the core logic of Boix (2003) and Acemoglu and Robinson (2005) before seeking to translate that logic into a causal model for the purposes of process tracing, using a transformed version of Haggard and Kaufman’s causal-process data.

We briefly summarize the core logics of and differences among these three sets of arguments here, bracketing many of their moving parts to focus on the basic theorized relationship between inequality and democracy. Both Boix’s and Acemoglu and Robinson’s theories operate within a Meltzer-Richard (Meltzer and Richard (1981)) framework in which, in a democracy, the median voter sets the level of taxation-and-transfer and, since mean income is higher than median income, benefits from and votes for a positive tax rate, implying redistribution from rich to poor. The poorer the median voter, the more redistribution she will prefer. Democracy, with its poorer median voter, thus implies greater redistribution than (rightwing) authoritarianism: a better material position for the poor at the expense of the rich elite. Thus, in each of these approaches, struggles over political regimes are conflicts over the distribution of material resources.

In Boix’s model, the poor generally prefer democracy for its material benefits. When they mobilize to demand regime change, the rich face a choice as to whether to repress or concede, and they are more likely to repress as inequality is higher since, all else equal, they have more to lose from democracy. Thus, with the poor always preferring democracy over rightwing authoritarianism, inequality reduces the prospects for democratization.

In Acemoglu and Robinson’s model, inequality simultaneously affects the expected net gains to democracy for both rich and poor. At low levels of inequality, democracy is relatively unthreatening to the elite, as in Boix, but likewise of little benefit to the poor. Since regime change is costly, the poor do not mobilize for democracy when inequality is low, and democratization does not occur. At high levels of inequality, democracy is of great benefit to the poor but has high expected costs for the elite; thus, democratization does not occur because the elite repress popular demands for regime change. In Acemoglu and Robinson’s model, democracy emerges only when inequality is at middling levels: high enough for the poor to demand it and low enough for the rich to be willing to concede it.

Ansell and Samuels, finally, extend the distributive politics of regime change in three key ways. First, they allow for a two-sector economy, with a governing elite comprising the landed aristocracy and an urban industrial elite excluded from political power under authoritarian institutions. Total inequality in the economy is a function of inequality in the landed sector, inequality in the industrial sector, and the relative size of each. Second, authoritarian (landed) elites can tax the industrial bourgeoisie, thus giving the industrial elite an incentive to seek constraints on autocratic rule. Third, in Ansell and Samuels’ model, rising industrial inequality means a rising industrial elite, generating a larger gap between them and industrial workers, though the industrial masses are richer than the peasantry. A number of results follow, of which we highlight just a couple. Rising land inequality reduces the likelihood of bourgeois rebellion by giving the landed elite greater repressive capacities and increasing their expected losses under democracy. As industrial inequality rises, however, the industrial elite have more to lose to confiscatory taxation and thus greater incentive to push for partial democracy (in which they have the ability to constrain the government, though the poor remain politically excluded) as well as greater resources with which to mobilize and achieve it. Full democracy, brought on by joint mass and bourgeois rebellion, is most likely as the industrial sector grows in relative size, giving the urban masses more to lose to autocratic expropriation and more resources with which to mobilize and rebel.

These three theoretical frameworks thus posit rather differing relationships between inequality and democracy. Taking these theoretical logics as forms of background knowledge, we would consider it possible that inequality reduces the likelihood of democracy or that it increases the likelihood of democracy. Yet one feature that all three theories have in common is a claim that distributional grievances drive demands for regime change. Moreover, in both Boix and Acemoglu and Robinson, less economically advantaged groups are, all else equal, more likely to demand democracy the worse their relative economic position. Ansell and Samuels’ model, on the other hand, suggests that relative deprivation may cut both ways: while poorer groups may have more to gain from redistribution under democracy, better-off groups have more to fear from confiscatory taxation under autocracy. In all three frameworks, mobilization by groups with material grievances is critical to transitions to democracy: elites do not voluntarily cede power.

In their qualitative analysis of “Third Wave” democratizations, Haggard and Kaufman point to additional factors, aside from inequality, that may generate transitions. Drawing on previous work on 20th century democratic transitions (e.g., Huntington (1993), Linz and Stepan (1996)), they pay particular attention to international pressures to democratize and to elite defections.

8.1.2 A Structural Causal Model

We now need to express this background knowledge in the form of a structural causal model. Suppose that we are interested in the case-level causal effect of inequality on democratization of a previously autocratic political system. Suppose further, to simplify the illustration, that we conceptualize both variables in binary terms: inequality is either high or low, and democratization either occurs or does not occur. This means that we want to know, for a given case of interest, whether high inequality (as opposed to low inequality) causes democracy to emerge, prevents democracy from emerging, or has no effect (i.e., with democratization either occurring or not occurring independent of inequality). We can represent this query in the simple, high-level causal model shown in Figure 8.1. Here, the question, “What is the causal effect of high inequality on democratization in this case?” is equivalent to asking what the value of \(\theta^D\) is in the case, where the possible values are \(\theta_{00}^D, \theta_{01}^D, \theta_{10}^D\), and \(\theta_{11}^D\). We assume here that the case’s nodal type, \(\theta^D\), is not itself observable, and thus we are in the position of having to make inferences about it.

Drawing on the grammar of causal graphs discussed in Chapter 2, we can already identify possibilities for learning about \(\theta^D\) from the other nodes represented in this high-level graph. Merely observing the level of inequality in a case will tell us nothing since \(I\) is not \(d\)-connected to \(\theta^D\) if we have observed nothing else. On the other hand, observing only the outcome (regime type) in a case can give us information about \(\theta^D\) since \(D\) is \(d\)-connected to \(\theta^D\). For instance, if we observe \(D=1\) (that a case democratized), then we can immediately rule out \(\theta_{00}^D\) as a value of \(\theta^D\) since this type does not permit democratization to occur. Further, conditional on observing \(D\), \(I\) is now \(d\)-connected to \(\theta^D\): in other words, having observed the outcome, we can additionally learn about the case’s type from observing the status of the causal variable. For example, if \(D=1\), then observing \(I=1\) allows us additionally to rule out the value \(\theta_{10}^D\) (a negative causal effect).

Now, observing just \(I\) and \(D\) alone will always leave two nodal types in contention. For instance, seeing \(I=D=1\) (the case had high inequality and democratized) would leave us unsure whether high inequality caused the democratization in this case (\(\theta^D=\theta_{01}^D\)) or the democratization would have happened anyway (\(\theta^D=\theta_{11}^D\)). This is a limitation of \(X, Y\) data that we refer to in Humphreys and Jacobs (2015) as the “fundamental problem of type ambiguity.” Note that this does not mean that we will be left indifferent between the two remaining types. Learning from \(X, Y\) data alone—narrowing the types down to two—can be quite significant, depending on our priors over the distribution of types. For example, if we previously believed that a \(\theta_{00}^D\) type (cases in which democracy will never occur, regardless of inequality) was much more likely than a \(\theta_{11}^D\) type (democracy will always occur, regardless of inequality) and that positive and negative effects of inequality were about equally likely, then ruling out the \(\theta_{00}^D\) and \(\theta_{10}^D\) values for a case will shift us toward the belief that inequality caused democratization in the case. This is because we are ruling out both a negative effect and the type of null effect that we had considered the most likely, leaving a null effect that we consider relatively unlikely.
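The elimination logic in the last two paragraphs can be sketched in a few lines of Python. This is only an illustration: the prior weights below are hypothetical numbers chosen to match the verbal example (a \(\theta_{00}^D\) type much more likely than a \(\theta_{11}^D\) type, positive and negative effects equally likely), and the function names are our own.

```python
from fractions import Fraction

# Nodal type "ab" means D = a when I = 0 and D = b when I = 1.
TYPES = {"00": (0, 0), "01": (0, 1), "10": (1, 0), "11": (1, 1)}

def consistent_types(i, d):
    """Return the nodal types that produce outcome d when inequality is i."""
    return [name for name, response in TYPES.items() if response[i] == d]

def posterior(prior, i, d):
    """Renormalise the prior over the types that survive observing I=i, D=d."""
    kept = consistent_types(i, d)
    total = sum(prior[t] for t in kept)
    return {t: prior[t] / total for t in kept}

# Hypothetical prior: theta_00 far more likely than theta_11,
# equal weight on positive (01) and negative (10) effects.
prior = {
    "00": Fraction(4, 10),
    "01": Fraction(1, 4),
    "10": Fraction(1, 4),
    "11": Fraction(1, 10),
}

post = posterior(prior, i=1, d=1)
print(consistent_types(1, 1))                     # the two types left in contention
print({t: float(p) for t, p in post.items()})     # renormalised beliefs
```

With these illustrative weights, observing \(I=D=1\) leaves only \(\theta_{01}^D\) and \(\theta_{11}^D\), and the belief that inequality caused democratization rises from 1/4 to 5/7.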

Figure 8.1: Simple democracy, inequality model

Nonetheless, we can increase the prospects for learning by theorizing the relationship between inequality and democratization. Given causal logics and empirical findings in the existing literature, we can say more than is contained in Figure 8.1 about the possible structure of the causal linkages between inequality and democratization. And we can embed this prior knowledge of the possible causal relations in this domain in a lower-level model that is consistent with the high-level model that most simply represents our query.

If we were to seek to fully capture them, the models developed by Boix, Acemoglu and Robinson, and Ansell and Samuels would, each individually, suggest causal graphs with a large number of nodes and edges connecting them. Representing all variables and relationships jointly contained in these three models would take an extremely complex graph. Yet there is no need to go down to the lowest possible level—to generate the most detailed graph—in order to increase our empirical leverage on the problem.

We represent in Figure 8.2 one possible lower-level model consistent with our high-level model. Drawing on causal logics in the existing literature, we unpack the nodes in the high-level model in two ways:

  1. We interpose a mediator between inequality and democratization: mobilization (\(M\)) by economically disadvantaged groups expressing material grievances. \(M\) is a function of both \(I\) and its nodal type, \(\theta^M\), which defines its response to \(I\). In inserting this mediator, we have extracted \(\theta^M\) from \(\theta^D\), pulling out that part of \(D\)’s response to \(I\) that depends on \(M\)’s response to \(I\).

  2. We specify a second influence on democratization, international pressure (\(P\)). Like \(\theta^M\), \(P\) has also been extracted from \(\theta^D\); it represents that part of \(D\)’s response to \(I\) that is conditioned by international pressures.

Figure 8.2: A lower-level model of democratization in which inequality may affect regime type both directly and through mobilization of the lower classes, and international pressure may also affect regime type.

In representing the causal dependencies in this graph, we allow for inequality to have (in the language of mediation analysis) both an “indirect” effect on democratization via mobilization and a “direct” effect. The arrow running directly from \(I\) to \(D\) allows for effects of inequality on democratization beyond any effects running via mobilization of the poor, including effects that might run in the opposite direction. (For instance, it is possible that inequality has a positive effect on democratization via mobilization but a negative effect via any number of processes that are not explicitly specified in the model.) The graph also implies that there is no confounding: since there is no arrow running from another variable in the graph to \(I\), \(I\) is modeled as exogenous.

The lower-level graph thus has two exogenous \(\theta\) nodes that will be relevant to assessing causal effects: \(\theta^M\) and \(\theta^{D_{lower}}\). \(\theta^M\), capturing \(I\)’s effect on \(M\), ranges across the usual four values for a single-cause, binary setup: \(\theta_{00}^M, \theta_{01}^M, \theta_{10}^M\), and \(\theta_{11}^M\).

\(\theta^{D_{lower}}\) is considerably more complicated, however, because this node represents \(D\)’s response to three causal variables: \(I\), \(M\), and \(P\). One way to put this is that the values of \(\theta^{D_{lower}}\) indicate how inequality’s direct effect will depend on mobilization (and vice versa), conditional on whether or not there is international pressure. We need more complex notation than that introduced in Chapter 5 in order to represent the possible nodal types here.

The result is \(2^8=256\) possible nodal types for \(D\). With 4 nodal types for \(M\), we thus have 1024 possible combinations of causal effects between named variables in the lower-level graph. How do these lower-level nodal types map onto the higher-level nodal types that are of interest? In other words, which combinations of lower-level types represent a positive, negative, or zero causal effect of inequality on democratization? When working with the CausalQueries package, the software figures this out for us automatically once we define our model and our query, but we work through the logic “by hand” here to help convey the intuition.
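The counts above follow from the fact that a binary node with \(k\) binary parents has \(2^{2^k}\) possible response functions. A minimal Python sketch that enumerates them (the function name and the tuple representation are illustrative choices, not CausalQueries conventions):

```python
from itertools import product

def nodal_types(n_parents):
    """Enumerate every response function of a binary node with n binary parents.

    Each nodal type is a tuple giving the node's value for each of the
    2**n_parents combinations of parent values, in lexicographic order.
    """
    n_rows = 2 ** n_parents
    return list(product((0, 1), repeat=n_rows))

m_types = nodal_types(1)   # M responds to I only
d_types = nodal_types(3)   # D responds to I, M, and P

print(len(m_types), len(d_types), len(m_types) * len(d_types))  # 4 256 1024
```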

To define a causal effect of \(I\) in this setup, we need to define the “joint effect” of two variables as being the effect of changing both variables simultaneously (in the same direction, unless otherwise specified). Thus, the joint effect of \(I\) and \(M\) on \(D\) is positive if changing both \(I\) and \(M\) from \(0\) to \(1\) changes \(D\) from \(0\) to \(1\). We can likewise refer to the joint effect of an increase in one variable and a decrease in another. Given this definition, a positive causal effect of inequality on democratization emerges for any of the following three sets of lower-level response patterns:

  1. Linked positive mediated effects: \(I\) has a positive effect on \(M\), and \(I\) and \(M\) have a joint positive effect on \(D\), when \(P\) takes on whatever value it takes on in the case.

  2. Linked negative mediated effects: \(I\) has a negative effect on \(M\), and \(I\) and \(M\) have a joint negative effect on \(D\), when \(P\) takes on whatever value it takes on in the case.

  3. Positive direct effect: \(I\) has no effect on \(M\), and \(I\) has a positive effect on \(D\), when we fix \(M\)’s value (at 0 or at 1), and at whatever value \(P\) takes on in the case.

If we start out with a case in which inequality is high and democratization has not occurred (or inequality is low and democratization has occurred), we will be interested in the possibility of a negative causal effect. A negative causal effect of inequality on democratization emerges for any of the following three sets of lower-level response patterns:

  4. Positive, then negative mediated effects: \(I\) has a positive effect on \(M\), and \(I\) and \(M\) have a joint negative effect on \(D\), when \(P\) takes on whatever value it takes on in the case.

  5. Negative, then joint negative mediated effects: \(I\) has a negative effect on \(M\), and jointly increasing \(I\) while decreasing \(M\) generates a decrease in \(D\), when \(P\) takes on whatever value it takes on in the case.

  6. Negative direct effects: \(I\) has no effect on \(M\), and \(I\) has a negative effect on \(D\), when we fix \(M\)’s value (at 0 or at 1), and at whatever value \(P\) takes on in the case.

Finally, all other response patterns yield no effect of inequality on democratization.
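The mapping from lower-level to higher-level types can also be checked by enumeration. The following Python sketch works with the unrestricted nodal types (before any priors or restrictions are imposed) and classifies each combination of an \(M\) type and a \(D\) type by the effect of \(I\) on \(D\), letting \(M\) respond to \(I\) and holding \(P\) fixed. The tuple encoding and function names are assumptions made for the illustration.

```python
from itertools import product

PARENT_ROWS = list(product((0, 1), repeat=3))               # (I, M, P) combinations
D_TYPES = list(product((0, 1), repeat=len(PARENT_ROWS)))    # 256 response functions
M_TYPES = list(product((0, 1), repeat=2))                   # (M when I=0, M when I=1)

def d_value(d_type, i, m, p):
    """Look up D's value under a given nodal type and parent values."""
    return d_type[PARENT_ROWS.index((i, m, p))]

def effect_of_inequality(m_type, d_type, p):
    """D under I=1 minus D under I=0, with M responding to I and P held fixed."""
    d_low = d_value(d_type, 0, m_type[0], p)
    d_high = d_value(d_type, 1, m_type[1], p)
    return d_high - d_low

def tally(p):
    """Count type combinations with positive (1), negative (-1), and no (0) effect."""
    counts = {1: 0, -1: 0, 0: 0}
    for m_type, d_type in product(M_TYPES, D_TYPES):
        counts[effect_of_inequality(m_type, d_type, p)] += 1
    return counts

print(tally(p=0))
print(tally(p=1))
```

At either value of \(P\), 256 of the 1,024 combinations imply a positive effect, 256 a negative effect, and 512 no effect. This reflects the absence of restrictions: the two parent rows being compared always differ in \(I\), so the four possible pairs of outcomes are equally common across the 256 \(D\) types.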

Thus, for a case in which \(I=D=1\), our query amounts to assessing the probability that \(\theta^M\) and \(\theta^{D_{lower}}\) jointly take on values falling into conditions 1, 2, or 3. And for a case in which \(I \neq D\), where we entertain the possibility of a negative effect, our query is an assessment of the probability of conditions 4, 5, or 6 arising.

Forming Priors

We now need to express prior beliefs about the probability distribution from which values of \(\theta^M\) and \(\theta^{D_{lower}}\) are drawn. We place structure on this problem by drawing a set of beliefs about the likelihood or monotonicity of effects and interactions among variables from the theories in Boix, Acemoglu and Robinson, and Ansell and Samuels. As a heuristic device, we weight more heavily those propositions that are more widely shared across the three works than those that are consistent with only one of the frameworks. We intend this part of the exercise to be merely illustrative of how one might go about forming priors from an existing base of knowledge; there are undoubtedly other ways in which one could do so from the inequality and democracy literature.

Specifically, the belief that we embed in our priors about \(\theta^M\) is:

  • Monotonicity of \(I\)’s effect on \(M\): In Acemoglu and Robinson, inequality should generally increase the chances of mobilization by the poor and, in Boix, should never prevent it. Only in Ansell and Samuels’ model does inequality have a partial downward effect on the poor’s demand for democracy insofar as improved material welfare for the poor increases the chances of autocratic expropriation; and this effect is countervailed by the greater redistributive gains that the poor will enjoy under democracy as inequality rises. Consistent with the weight of prior theory on this effect, in our initial run of the analysis, we rule out negative effects of \(I\) on \(M\). We are indifferent in our priors between positive and null effects and between the two types of null effects (mobilization always occurring or never occurring, regardless of the level of inequality). We thus set our prior on \(\theta^M\) as: \(p(\theta^M=\theta^M_{10})=0.0\), \(p(\theta^M=\theta^M_{00})=0.25\), \(p(\theta^M=\theta^M_{11})=0.25\), and \(p(\theta^M=\theta^M_{01})=0.5\). We relax this monotonicity assumption, to account for the Ansell and Samuels logic, in a second run of the analysis.

For our prior on democracy’s responses to inequality, mobilization, and international pressure (\(\theta^{D_{lower}}\)), we extract the following beliefs from the literature:

  • Monotonicity of direct \(I\) effect: no positive effect: In none of the three theories does inequality promote democratization via a pathway other than via the poor’s rising demand for it. In all three theories, inequality has a distinct negative effect on democratization via an increase in the elite’s expected losses under democracy and thus its willingness to repress. In Ansell and Samuels, the distribution of resources also affects the probability of success of rebellion; thus higher inequality also reduces the prospects for democratization by strengthening the elite’s hold on power. We thus set a zero prior probability on all types in which \(I\)’s direct effect on \(D\) is positive for any value of \(P\).

  • Monotonicity of \(M\)’s effect: no negative effect: In none of the three theories does mobilization reduce the prospects of democratization. We thus set a zero probability on all types in which \(M\)’s effect on \(D\) is negative at any value of \(I\) or \(P\).

  • Monotonicity of \(P\)’s effect: no negative effect: While international pressures are only discussed in Haggard and Kaufman’s study, none of the studies considers the possibility that international pressures to democratize might prevent democratization that would otherwise have occurred. We thus set a zero probability on all types in which \(P\)’s effect is negative at any value of \(I\) or \(M\).

In all, this reduces the number of nodal types for \(D\) from 256 to just 20.

For all remaining, allowable types, we set flat priors.
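The reduction from 256 to 20 nodal types can be verified by filtering the full set of response functions with the three monotonicity restrictions. A minimal Python sketch, using an illustrative encoding of each \(D\) type as a tuple of outcomes indexed by \((I, M, P)\):

```python
from itertools import product

ROWS = list(product((0, 1), repeat=3))        # parent combinations (I, M, P)
D_TYPES = list(product((0, 1), repeat=8))     # all 256 response functions

def value(d_type, i, m, p):
    """D's value under this nodal type for the given parent values."""
    return d_type[ROWS.index((i, m, p))]

def allowed(d_type):
    """Apply the three monotonicity restrictions in every context."""
    for a, b in product((0, 1), repeat=2):
        # No positive direct effect of I at any values of M (a) and P (b).
        if value(d_type, 1, a, b) > value(d_type, 0, a, b):
            return False
        # No negative effect of M at any values of I (a) and P (b).
        if value(d_type, a, 1, b) < value(d_type, a, 0, b):
            return False
        # No negative effect of P at any values of I (a) and M (b).
        if value(d_type, a, b, 1) < value(d_type, a, b, 0):
            return False
    return True

allowed_types = [t for t in D_TYPES if allowed(t)]
flat_prior = 1 / len(allowed_types)

print(len(allowed_types))   # 20
print(flat_prior)           # 0.05 on each remaining type
```

Up to relabeling \(I\) as its complement, the surviving types are exactly the monotone Boolean functions of three variables, of which there are 20.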

The remaining 20 allowable types can involve a rich range of interactions between international pressure, inequality, and mobilization, including:

  1. Types for which \(P\) has no moderating effect

  2. Types for which \(P=1\) creates an “opportunity” for \(X\) to have an effect that it does not have at \(P=0\); at \(P=1\) and \(X=0\), \(D\) takes on the value it does when \(X=0\) and \(X\) has an effect, but does not take on this value when \(P=0\) and \(X=0\)

  3. Types for which \(P=1\) is a causal “complement” to \(X\), allowing \(X\) to have an effect it did not have at \(P=0\); at \(P=1\) and \(X=1\), \(D\) takes on the value it does when \(X=1\) and \(X\) has an effect, but does not take on this value when \(P=0\) and \(X=1\)

  4. Types for which \(P=1\) “substitutes” for \(X\), generating the outcome that \(X=1\) was necessary to generate at \(P=0\); at \(P=1\) and \(X=0\), \(D\) takes on the value it does when \(X=1\) and \(X\) has an effect, but does not take on this value when \(P=0\) and \(X=0\)

  5. Types for which \(P\) “eliminates” \(X\)’s effect, preventing \(X=1\) from generating the outcome it generates when \(P=0\); at \(P=1\) and \(X=1\), \(D\) does not take on the value it does when \(X=1\) and \(X\) has an effect, but does take on this value when \(P=0\) and \(X=1\)

Since \(P\) conditions the effect of \(I\), we must also establish a prior on the distribution of \(P\). In this analysis, we set the prior probability of \(P=1\) to 0.5, implying that before seeing the data we think that international pressures to democratize are present half the time.

8.1.3 Results

We can now choose nodes in addition to \(I\) and \(D\) to observe from the lower-level model. Recall that our query is about the joint values of \(\theta^M\) and \(\theta^{D_{lower}}\). By the logic of \(d\)-separation, we can immediately see that both \(M\) and \(P\) may be informative about these nodes when \(D\) has already been observed. Conditional on \(D\), both \(M\) and \(P\) are \(d\)-connected to both \(\theta^M\) and \(\theta^{D_{lower}}\). Let us see what we learn, then, if we search for either mobilization of the lower classes or international pressure or both, and find either clue either present or absent.

We consider four distinct situations, corresponding to four possible combinations of inequality and democratization values that we might be starting with. In each situation, the nature of the query changes. Where we start with a case with low inequality and no democratization, asking if inequality caused the outcome is to ask if the lack of inequality caused the lack of democratization. Where we have high inequality and no democratization, we want to know if democratization was prevented by high inequality (as high inequality does in Boix’s account). For cases in which democratization occurred, we want to know whether the lack or presence of inequality (whichever was the case) generated the democratization.

Inference is done by applying Bayes’ rule to the observed data given the priors. Different “causal types” are consistent or inconsistent with possible data observations. Observing data therefore lets us shift weight towards causal types that are consistent with the data and away from those that are not. As a simple illustration, if we observe \(D=1\), then we shift weight from types for which \(D\) is always 0, given the observed data, to types for which \(D\) can be 1 given the observed data.
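A minimal sketch of this updating by brute-force enumeration, written in plain Python rather than with the CausalQueries package. It assumes the restricted model described above: the 20 monotone nodal types for \(D\), the three permitted nodal types for \(M\), and \(P=1\) with probability 0.5. It also assumes equal prior weight on every configuration, including equal weight on the three \(M\) types; that is a choice made for this sketch, and putting unequal weights on the \(M\) types would give different numbers. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

ROWS = list(product((0, 1), repeat=3))  # parent combinations (I, M, P)

def d_value(d_type, i, m, p):
    return d_type[ROWS.index((i, m, p))]

def monotone(d_type):
    """No positive direct effect of I; no negative effect of M or of P."""
    return all(
        d_value(d_type, 1, a, b) <= d_value(d_type, 0, a, b)
        and d_value(d_type, a, 1, b) >= d_value(d_type, a, 0, b)
        and d_value(d_type, a, b, 1) >= d_value(d_type, a, b, 0)
        for a, b in product((0, 1), repeat=2)
    )

D_TYPES = [t for t in product((0, 1), repeat=8) if monotone(t)]  # 20 types
M_TYPES = [(0, 0), (0, 1), (1, 1)]  # (M at I=0, M at I=1); negative effect excluded

def prob_inequality_mattered(i, d, m=None, p=None):
    """Posterior probability that D would have differed had I differed.

    Every (M type, D type, P) configuration gets equal prior weight (an
    assumption of this sketch), so Bayes' rule reduces to counting the
    configurations consistent with the observed data.
    """
    consistent = 0
    mattered = 0
    for m_type, d_type, p_val in product(M_TYPES, D_TYPES, (0, 1)):
        if p is not None and p_val != p:
            continue                      # inconsistent with observed P
        if m is not None and m_type[i] != m:
            continue                      # inconsistent with observed M
        if d_value(d_type, i, m_type[i], p_val) != d:
            continue                      # inconsistent with observed D
        consistent += 1
        counterfactual = d_value(d_type, 1 - i, m_type[1 - i], p_val)
        mattered += int(counterfactual != d)
    return Fraction(mattered, consistent)

print(float(prob_inequality_mattered(i=0, d=1)))
print(float(prob_inequality_mattered(i=0, d=1, m=0, p=0)))
```

For a case with \(I=0, D=1\) and no clue data, the function returns 32/73, which rounds to the 0.438 reported below for low inequality democracies; additionally observing \(M=0\) and \(P=0\) gives 2/3.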

Figure 8.3: Case level inferences given possible observations of mobilization and pressure (untrained model).

Inferences for cases with observed democratization

We first turn to cases in which democratization has occurred—the category of cases that Haggard and Kaufman examine.

For these cases we use data from Haggard and Kaufman (2012) to show the inferences we would draw using this procedure and the actual observations made for a set of 8 cases.

Haggard and Kaufman consider only cases that democratized, so all cases in this table have the value \(D=1\). We show here how confident we would be that the level of inequality caused democratization if (a) we observed only the cause and effect (\(I\) and \(D\)); (b) we additionally observed either the level of mobilization by disadvantaged classes or the level of international pressure; and (c) we observed both, in addition to \(I\) and \(D\). Note that country labels are marked in the “full data” cells in the lower right quadrant, but their corresponding partial data cells can be read by moving to the left column or the top row (or to the top left cell for the case with no clue data).

In coding countries’ level of inequality, we rely on Haggard and Kaufman’s codings using the Gini coefficient from the Texas Inequality dataset. In selecting cases of democratization, we use the codings in Cheibub, Gandhi, and Vreeland (2010), one of two measures used by Haggard and Kaufman. Our codings of the \(M\) and \(P\) clues come from close readings of the country-specific transition accounts in Haggard, Kaufman, and Teo (2012), the publicly shared qualitative dataset associated with Haggard and Kaufman (2012). We code \(M\) as \(1\) where the transition account refers to anti-government or anti-regime political mobilization by economically disadvantaged groups, and as \(0\) otherwise. For \(P\), we code \(P=0\) if international pressures to democratize are not mentioned in the transition account. The main estimates refer to analyses with only qualitative, monotonicity restrictions on our priors. We also show in square brackets the estimates if we allow for a negative effect of inequality on mobilization but believe it to be relatively unlikely.

\(I=0, D=1\): Low inequality democracies

In a case that had low inequality and democratized, did low inequality cause democratization, as Boix’s thesis would suggest? Looking at the first set of cases in Table ??, did Mexico, Albania, Taiwan, and Nicaragua democratize because they had relatively low inequality? Based only on observing the level of inequality and the outcome of democratization, we would place a 0.438 probability on inequality having been a cause. What can we learn, then, from our two clues?

We are looking here for a negative effect of \(I\) on \(D\), which in our model can only run via a direct effect, not through mobilization. Thus, the learning from \(M\) is limited for the same reason as in an \(I=1, D=0\) case. And \(M\) is modestly informative as a moderator for the same reasons and in the same direction, with observing mobilization generally reducing our confidence in inequality’s negative effect relative to observing no mobilization. In our four cases, if we observe the level of mobilization, our confidence that inequality mattered goes up slightly (to 0.475) in Mexico and Taiwan, where mobilization did not occur, and goes down slightly in Albania and Nicaragua (to 0.394) where mobilization did occur.

Looking for the international pressure clue is, however, highly informative, though the effect runs in the opposite direction from that in an \(I=1, D=0\) case. It is observing the absence of international pressure that makes us more confident in low inequality’s effect. Since democratization did occur, the presence of international pressure makes it less likely for low inequality to have generated the outcome since international pressure could have generated democratization by itself. Once we bring this second clue into the analysis, Mexico and Taiwan sharply part ways: seeing no international pressure in Mexico, we are now much more confident that inequality mattered for the Mexican transition (0.667); seeing international pressure in Taiwan, we are now substantially less confident that inequality mattered to the Taiwanese transition (0.393). Similarly, observing \(P\) sharply differentiates the Albanian and Nicaraguan cases: seeing no international pressure in the Albanian transition considerably boosts our confidence in inequality’s causal role there (0.571), while observing international pressure in the Nicaraguan transition strongly undermines our belief in an inequality effect there (0.263).

\(I=1, D=1\): High inequality democracies

Where we see both high inequality and democratization, the question is whether high inequality caused democratization via a positive effect. Considering the second set of cases in Table ??, did high inequality cause Mongolia, Sierra Leone, Paraguay, and Malawi to democratize?

Observing only the level of inequality and the democratization outcome, we would have fairly low confidence that inequality mattered, with a belief of 0.128. Let us see what we can learn if we also observe the level of mobilization and international pressure.

As in an \(I=0, D=0\) case, \(M\) can now be highly informative since this positive effect has to run through mobilization. Here it is the observation of a lack of mobilization that is most telling: high inequality cannot have caused democratization, given our model, if inequality did not cause mobilization to occur. Thus, when we observe no mobilization by the lower classes in Mongolia and Paraguay, we can be certain (given our model) that high inequality did not cause democratization in these cases. There is then no point in looking for international pressure, since doing so will have no effect on our beliefs: neither seeing pressure nor seeing its absence shifts our posterior away from 0.

If we do see mobilization, on the other hand (as in Sierra Leone and Malawi), we are slightly more confident that high inequality was the cause of democratization (I1D1['M only'][[1]][3]). Moreover, if we first see \(M=1\), then observing international pressure can add much more information; and it substantially differentiates our conclusions about the causes of Sierra Leone’s and Malawi’s transitions. Just as in an \(I=0, D=1\) case, it is the absence of international pressure that leaves the most “space” for inequality to have generated the democratization outcome. When we see the absence of pressure in Sierra Leone, our confidence that high inequality was a cause of the transition increases to I1D1['M and P'][[1]][3]; seeing pressure present in Malawi reduces our confidence in inequality’s effect to I1D1['M and P'][[1]][4].

We next examine causal relations for cases that did not democratize. These cases are not included in Haggard and Kaufman (2012) (and so are not labelled in the figure) but our model nevertheless characterizes our beliefs for these cases also.

\(I=0, D=0\): Non democracy with low inequality

To begin with \(I=0, D=0\) cases, did the lack of inequality cause the lack of democratization (as, for instance, at the lefthand end of the Acemoglu and Robinson inverted \(U\)-curve)?

We start out, based on the \(I\) and \(D\) values and our model, believing that there is a I0D0['No clues'][[1]][1] chance that low inequality prevented democratization. We then see that our beliefs shift most dramatically if we go looking for mobilization and find that it was present. The reason is that any positive effect of \(I\) on \(D\) has to run through the pathway mediated by \(M\) because we have excluded a positive direct effect of \(I\) on \(D\) in our priors. Moreover, since we do not allow \(I\) to have a negative effect on \(M\), observing \(M=1\) when \(I=0\) must mean that \(I\) has no effect on \(M\) in this case, and thus \(I\) cannot have a positive effect on \(D\) (regardless also of what we find if we look for \(P\)). If we do not observe mobilization when we look for it, we now think it is somewhat more likely that \(I=0\) caused \(D=0\) since it is still possible that high inequality could cause mobilization.
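The mediation argument can be written out in potential-outcomes notation. The symbols \(M(i)\) (mobilization when inequality is set to \(i\)) and \(D(i, m, p)\) (democratization when inequality, mobilization, and pressure are set to \(i\), \(m\), \(p\)) are introduced here only for this sketch; the two inequalities restate the restrictions described above.

\[
M(1) \geq M(0), \qquad D(1, m, p) \leq D(0, m, p) \quad \text{for all } m, p.
\]

Suppose \(I\) has no effect on \(M\), so that \(M(1) = M(0) = m\). Then the total effect of \(I\) on \(D\) is

\[
D(1, M(1), p) - D(0, M(0), p) = D(1, m, p) - D(0, m, p) \leq 0,
\]

so a positive effect of \(I\) on \(D\) requires \(M(0) = 0\) and \(M(1) = 1\). In an \(I=0\) case, observing \(M=1\) reveals \(M(0) = 1\), which rules this out. By the same reasoning, in an \(I=1\) case, observing \(M=0\) reveals \(M(1) = 0\), which also rules it out.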

We also see that observing whether there is international pressure has a substantial effect on our beliefs. When we observe \(M=0\) (or don’t look for \(M\) at all), the presence of international pressure increases the likelihood that low inequality prevented democratization. Intuitively, this is because international pressure, on average across types, has a positive effect on democratization; so pressure’s presence creates a greater opportunity for low inequality to counteract international pressure’s effect and prevent democratization from occurring that otherwise would have (if there had been high inequality and the resulting mobilization).

\(I=1, D=0\): Non democracy with high inequality

In cases with high inequality and no democratization, the question is whether high inequality prevented democratization via a negative effect, as theorized by Boix. That negative effect has to have operated via inequality’s direct effect on democratization since our monotonicity restrictions allow only positive effects via mobilization. Here, the consequence of observing \(P\) is similar to what we see in the \(I=0, D=0\) case: seeing international pressure greatly increases our confidence that high inequality prevented democratization, while seeing no international pressure moderately reduces that confidence. There is, returning to the same intuition, more opportunity for high inequality to exert a negative effect on democratization when international pressures are present, pushing toward democratization.

Here, however, looking for \(M\) has a more modest effect than it does in an \(I=0, D=0\) case. This is because we learn less about the indirect pathway from \(I\) to \(D\) by observing \(M\): as we have said, we already know from seeing high inequality and no democratization (and under our monotonicity assumptions) that any effect could not have run through the presence or absence of mobilization.

However, \(M\) provides some information because it, like \(P\), acts as a moderator for \(I\)’s direct effect on \(D\) (since \(M\) is also pointing into \(D\)). As we know, learning about moderators tells us something about (a) the rules governing a case’s response to its context (i.e., one or more nodal types in the case) and (b) the context it is in. Thus, in the first instance, observing \(M\) together with \(I\) and \(D\) helps us eliminate types inconsistent with these three data points. For instance, if we see \(M=0\), then we eliminate any type in which \(D\) is 1, regardless of \(P\)’s value, when \(M=0\) and \(I=1\). Second, we learn from observing \(M\) about the value of \(M\) under which \(D\) will be responding to \(I\). Now, because \(M\) is itself potentially affected by \(I\), the learning here is somewhat complicated. What we learn most directly from observing \(M\) is the effect of \(I\) on \(M\) in this case. If we observe \(M=1\), then we know that \(I\) has no effect on \(M\) in this case; whereas if we observe \(M=0\), \(I\) might or might not have a positive effect on \(M\). Learning about this \(I \rightarrow M\) effect then allows us to form a belief about how likely \(M\) would be to be 0 or 1 if \(I\) changed from \(0\) to \(1\); that is, it allows us to learn about the context under which \(D\) would be responding to this change in \(I\) (would mobilization be occurring or not?). This belief, in turn, allows us to form a belief about how \(D\) will respond to \(I\) given our posterior beliefs across the possible types that the case is.

The net effect, assuming that we have not observed \(P\), is a small upward effect in our confidence that inequality mattered if we see no mobilization, and a small downward effect if we see mobilization. Interestingly, if we do observe \(P\), the effect of observing \(M\) reverses: observing mobilization increases our confidence in inequality’s effect, while observing no mobilization reduces it.

8.1.4 Considerations: Theory dependence

Haggard and Kaufman set out to use causal process observations to test inequality-based theories of democratization against the experiences of “Third Wave” democratizations. Their principal test is to examine whether they see evidence of distributive conflict in the process of democratization, defined largely as the presence or absence of mobilization prior to the transition. They secondarily look for other possible causes, specifically international pressure and splits in the elite.

In interpreting the evidence, Haggard and Kaufman generally treat the absence of mobilization as evidence against inequality-based theories of democratization as a whole (p. 7). They also see the presence of distributive mobilization in cases with high inequality and democratization as evidence against the causal role of inequality (p. 7). These inferences, however, seem only loosely connected to the logic of the causal theories under examination. Haggard and Kaufman express concern that inequality-oriented arguments point to “cross-cutting effects” (p. 1) of inequality, but do not systematically work through the implications of these multiple pathways for empirical strategy. Our analysis suggests that a systematic engagement with the underlying models can shift that interpretation considerably. Under the model we have formulated, where inequality is high, the absence of mobilization in a country that democratized is indeed damning to the notion that inequality mattered. However, where inequality is low (precisely the situation in which Boix’s theory predicts that we will see democratization), things are more complicated. If we assume that inequality cannot prevent mobilization, then observing no mobilization does not work against the claim that inequality mattered for the transition; indeed, it slightly supports it, at least given what we think is a plausible model-representation of arguments in the literature. Observing the absence of mobilization in such a case, however, can undercut an inequality-based explanation if (and only if) we believe it is possible that inequality might prevent mobilization that would otherwise have occurred. Further, in cases with high inequality and democratization, it is the absence of mobilization by the lower classes that would be least consistent with the claim that inequality mattered. Observing mobilization, in contrast, pushes in favor of an inequality-based explanation.

Moreover, it is striking that Haggard and Kaufman lean principally on a mediator clue, turning to evidence of international pressure and elite splits (moderators, or alternative causes) largely as secondary clues to identify “ambiguous” cases. As we have shown, under a plausible model given prior theory, it is the moderator clue that is likely to be much more informative.

Of course, the model that we have written down is only one possible interpretation of existing theoretical knowledge. It is very possible that Haggard and Kaufman and other scholars in this domain hold beliefs that diverge from those encoded in our working model. The larger point, however, is that our process tracing inferences will inevitably depend, and could depend greatly, on our background knowledge of the domain under examination. Moreover, formalizing that knowledge as a causal model can help ensure that we are taking that prior knowledge systematically into account, so that the inferences we draw from new data are consistent with the knowledge that we bring to the table.

The analysis also has insights regarding case selection. Haggard and Kaufman justify their choice of only \(D=1\) cases as a strategy “designed to test a particular theory and thus rests on identification of the causal mechanism leading to regime change” (p. 4). Ultimately, however, the authors seem centrally concerned with assessing whether inequality, as opposed to something else, played a key causal role in generating the outcome. As the results above demonstrate, however, there is nothing special about the \(D=1\) cases in generating leverage on this question. The tables for \(D=0\) show that, given the model, the same clues can shift beliefs about as much for \(D=0\) as for \(D=1\) cases. We leave a more detailed discussion of this kind of issue in model-based case-selection for Chapter 13.

Finally we emphasize that all of the inference in this chapter depends on a model that is constrained by theoretical insights but not one that is trained by data. Although we are able to make many inferences using this model, given the characteristics of a case of interest, we have no empirical grounds to justify these inferences. In Chapter 10 we show how this model can be trained with broader data from multiple cases and in Chapter 16 we illustrate how the model itself can be put into question.

8.2 Institutions and growth

We now consider a second application, again connecting to a major debate in political economy. This time we use the application to illustrate inference given a focus on rival explanations, rather than mediation, and the scope for case level inference that arises specifically from beliefs regarding unobserved confounding.

8.2.1 The debate

Just as there exists a long-running debate about the causes of democratization, a similar macro-level debate surrounds the causes of economic growth. Two of the main proposed explanations are geographic location and the quality of institutions. Geography is a fairly straightforward argument about location relative to the equator. Countries more distant from the equator experience cooler temperatures, climates less prone to disease, and other environmental benefits (Sachs 2001). The institutional argument is also quite simple. Going back to Adam Smith, scholars have argued that protections against expropriation and state abuse are key to prosperity. An important contribution by Acemoglu, Johnson, and Robinson (2001) highlighted the difficulty of separating out cause and effect in studies of income and institutions and argued that a plausibly exogenous feature, settler mortality, might usefully help disentangle the causal effects of institutions.

Rodrik, Subramanian, and Trebbi (2004) pitted these ideas against each other (and against a third focused on trade policy) and concluded that “institutions rule” in the sense that they have a larger average effect. We use the Rodrik, Subramanian, and Trebbi (2004) data and couple it with a causal model in the hopes of being able to use case data to address case level questions: were good institutions plausibly a cause of wealth in a particular country? Does knowing about the location of a country make us more or less confident that institutions mattered?71

8.2.2 A Structural Causal Model

We now construct the model. We are interested in a single outcome: economic productivity (Y) as measured by real per capita GDP in 1995.

We have two causes of interest: rule of law (R) and distance from the equator (D). We also include settler mortality (M) as an instrument for institutional quality. In doing so we allow for the possibility that institutions are not exogenous in our model but assume that (lower) settler mortality has an effect on rule of law but is not related to wealth except via its effect on rule of law.

The model is formed using the CausalQueries package like this:

library(CausalQueries)
model <- make_model("M -> R -> Y <- D; R <-> Y")

The model statement includes two causes for \(Y\) (\(R\) and \(D\)) and one cause for \(R\) (\(M\)) that is otherwise unrelated to \(Y\). In addition it allows for arbitrary confounding between \(R\) and \(Y\). The model is represented graphically in Figure 8.4.

Figure 8.4: Institutions and growth DAG

To make case level inferences on causal effects from this model we need informative beliefs over causal relations. As with the last application we will set priors based on three monotonicity assumptions. We return to this question in Chapter 10 where we seek to use data to structure beliefs.

We first adopt the monotonicity assumption built in to RST’s instrumental variables analysis: that \(M\) has a monotonic effect on \(R\). More settler mortality never leads to greater institutional strength. From work on geography and growth (e.g. Sachs (2001)) we adopt the assumption that proximity to the equator does not bolster growth. This makes sense given that geography determines climate, access to natural resources, and the ease of diffusion of important ideas and resources from other areas. We also suppose that strong institutions do not have a negative effect on national income. Whether institutions are conceived of as the broader rule of law or as a more specific protection against expropriation risk, few argue that weak protection of property rights is beneficial. For a discussion of “greasing” and “sanding” arguments see Méon and Sekkat (2005).

These restrictions dramatically reduce the number of possible nodal types. In the base model without any restrictions, there are 16 nodal types for \(Y\) (one for each binary function of its two binary parents, \(R\) and \(D\)), \(4\) for \(R\) and \(2\) each for \(M\) and \(D\). Following the restrictions there are 6 nodal types for \(Y\) (the functions that are monotonic in both \(R\) and \(D\)) and \(3\) for \(R\). Allowing confounding however means that we have more parameters than nodal types; specifically we now have \(3 \times 6 = 18\) parameters for \(Y\) reflecting the dependence between the nodal types of \(Y\) and the three nodal types of \(R\).

We highlight that we have imposed no assumption regarding whether \(R\) and \(D\) are substitutes or complements for \(Y\).

8.2.3 Results

We consider first inferences on whether institutions caused good economic outcomes for different cases. We then adjust the model to show how inferences can change qualitatively when patterns of confounding are specified. Finally we use the same model to demonstrate a shift in queries that switches the roles of explanatory variable and clue.

Basic results

We proceed in a similar manner as in the inequality and democratization example, focusing now on questions of the form “did good institutions cause high income” and assessing how our answer changes as we learn different facts about a case.

Figure 8.5: Case level inferences given possible observations of distance and mortality (untrained model).

Each of these four sets of cases is a particular combination of economic growth and institutional quality. Within each combination we have the 4 possible permutations of settler mortality and distance from the equator. This makes for a total of 16 types of countries. For fifteen of these there is a real world example, with the sole exception coming from when growth is high, institutional quality is low, settler mortality is high and distance from the equator is high. As with the inequality and democratization application we highlight that the case level inferences shown here are based purely on the model and do not incorporate richer information about cases that experts will surely have available to them.

Consider first the cases with good growth and strong institutions. Do the good institutions explain the good economic outcomes? We see here that observing that a case is distant from the equator makes you less likely to think that the good economic outcome is due to the institutions. Strikingly, one explanation results in less weight placed on another explanation even though we placed no prior weight on the idea that causes are substitutes.

We return to unpack this feature at the end of this chapter.

Beliefs about confounding

If researchers are willing to specify beliefs about selection, then learning about the instrument, \(M\), can become informative about the case level causal effect. In the following example we assume that there is a relation between the effect of \(M\) on \(R\) and \(R\) mattering for \(Y\): in particular, we imagine that \(M\) is more likely to have produced \(R\) in cases where \(R\) would make a difference. Substantively we might think that settlers were more likely to create institutions in places in which institutions would be likely to help; thus knowing that settlers plausibly caused good institutions is informative for the effectiveness of institutions. We can think of this more broadly as strong effects for compliers.

In this case, observing \(M=0\) indicates that \(R\) is more likely to have been selected strategically on account of its effect on income. Conversely, learning that there was high mortality suggests that the good institutions were not due to low mortality and so—because selection effects are not in operation—observing good institutions is less informative about the effectiveness of institutions.

Table 8.1: Learning from an instrument when patterns of confounding are specified.
Case                  M   D   No clues   M only   D only   M and D
Malaysia              0   0   0.703      0.785    0.826    0.88
Brazil                0   1   0.703      0.785    0.612    0.709
Papua New Guinea      1   0   0.703      0.5      0.826    0.667
Dominican Republic    1   1   0.703      0.5      0.612    0.4

Long run effects

Finally we note that although \(M\) is motivated as an instrument for studying the effect of recent institutions, the model can also be used to understand the effects of \(M\) itself. Indeed the title used by Acemoglu, Johnson, and Robinson (2001) (“The Colonial Origins of Comparative Development”) suggests an interest in the long run processes. For this analysis—using the same model—\(R\) can be thought of as a mediator of the effect of \(M\) on \(Y\) and so serve as a clue for this query. Inferences are shown in the next table.

Table 8.2: Inferences on whether low mortality caused high growth given observations on distance and institutions.
Distance (D)   Institutions (R)   Posterior
0              0                  0.000
1              0                  0.000
0              1                  0.333
1              1                  0.200

We see that the strong monotonicity assumption makes \(R=1\) a hoop test for the proposition. Inferences also depend on distance, for reasons similar to those we saw above: we see a substitution-type logic in operation, where we have greater confidence that settlers caused good outcomes for countries for which good outcomes cannot be explained by distance from the equator.
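The entries in Table 8.2 can be checked by brute-force enumeration. The sketch below is not the CausalQueries computation itself. It assumes flat weights over the nodal types that survive the three monotonicity restrictions, treats the nodal types of \(R\) and \(Y\) as independent (that is, no structure on the confounding), and codes \(M=1\) as high mortality, \(R=1\) as strong institutions, and \(D=1\) as far from the equator. The helper names (`r_types`, `y_types`, `posterior`) are ours.

```r
# Nodal types for R: value of R at M = 0 and at M = 1 (M = 1 is high mortality).
# Assumed restriction: higher mortality never strengthens institutions,
# so the type (0, 1) is dropped.
r_types <- list(c(0, 0), c(1, 0), c(1, 1))

# Nodal types for Y: value of Y at (R, D) = (0,0), (1,0), (0,1), (1,1).
y_all <- as.matrix(expand.grid(y00 = 0:1, y10 = 0:1, y01 = 0:1, y11 = 0:1))
monotone <-
  y_all[, "y10"] >= y_all[, "y00"] & y_all[, "y11"] >= y_all[, "y01"] &  # R never lowers Y
  y_all[, "y01"] >= y_all[, "y00"] & y_all[, "y11"] >= y_all[, "y10"]    # D never lowers Y
y_types <- y_all[monotone, , drop = FALSE]

# Value of Y for one nodal type at given R and D.
y_val <- function(y_type, r, d) unname(y_type[1 + r + 2 * d])

# Belief that M = 0 caused Y = 1, given M = 0, Y = 1 and observed D = d, R = r.
# Flat weights: count consistent type combinations (assumption, see lead-in).
posterior <- function(d, r) {
  num <- 0
  den <- 0
  for (rt in r_types) {
    for (i in seq_len(nrow(y_types))) {
      yt <- y_types[i, ]
      r_obs <- rt[1]  # R realized under the observed M = 0
      if (r_obs != r || y_val(yt, r_obs, d) != 1) next  # inconsistent with the data
      den <- den + 1
      # M = 0 caused Y = 1 if Y would have been 0 under M = 1
      if (y_val(yt, rt[2], d) == 0) num <- num + 1
    }
  }
  if (den == 0) NA else num / den
}

cases <- expand.grid(D = 0:1, R = 0:1)
cases$posterior <- mapply(posterior, cases$D, cases$R)
print(round(cases, 3))
```

Under these assumptions the enumeration returns 0, 0, 1/3, and 1/5 for the four rows, in line with the table. When \(R=0\), the only surviving nodal type for \(R\) is the one in which mortality has no effect on institutions, so the posterior is zero. When \(R=1\), the half of the weight on \(M\) having caused \(R\) is multiplied by the chance that \(R\) caused \(Y\), which is 2/3 when \(D=0\) and 2/5 when \(D=1\).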

8.2.4 Considerations: Interactions between clues

We saw in the last application that inferences from one clue were affected by data on another clue even though the model did not specify either complementarity or confounding. So where does this interaction come from?

The key idea is that observing \(D=1\) rules out some possible types we were entertaining in which \(R\) makes a difference when \(D=0\) (such as when either \(R\) or \(D\) would be sufficient) without ruling out types in which \(R\) does not make a difference when \(D=0\) (such as cases in which \(R\) and \(D\) are complements but \(D=0\)).

To see the logic more explicitly, imagine we had flat priors over just four possible nodal types for \(Y\) (and two for \(X_2\)). We label the causal types thus:

Nodal type for \(Y\)                                      \(X_2=0\)   \(X_2=1\)
\(X_1\) alone (\(\theta^Y = 0101\))                       1           5
\(X_2\) alone (\(\theta^Y = 0011\))                       2           6
Either \(X_1\) or \(X_2\) (\(\theta^Y = 0111\))           3           7
\(X_1\) and \(X_2\) jointly (\(\theta^Y = 0001\))         4           8

We are interested in whether \(X_1\) causes \(Y\), conditional perhaps on knowing \(X_1\), \(X_2\), and \(Y\). The next table summarizes inferences. For instance, \(X_1\) causes \(Y\) in cells 1, 3, 5, 8, and so our prior is 50%. If we learn that \(X_2=1\) our belief remains at 50% (two possibilities, 5 and 8, from four, all equally weighted).

Information                      Cases consistent with data   Subset consistent with \(X_1=1\) causes \(Y=1\)   Belief
None (Prior)                     \(1,2,3,4,5,6,7,8\)          \(1,3,5,8\)                                       1/2
\(X_2 = 1\)                      \(5, 6, 7, 8\)               \(5, 8\)                                          1/2
\(X_1 = 1, Y = 1\)               \(1, 3, 5, 6, 7, 8\)         \(1,3,5,8\)                                       2/3
\(X_1 = 1, Y =1, X_2=1\)         \(5,6,7,8\)                  \(5,8\)                                           1/2

Note that comparing the last line to the second last line, observation of \(X_2=1\) rules out two types (1,3) in which \(X_1\) could have had a causal effect, without ruling out any types that are consistent with the already available data, in which it does not (such as 6 or 7). The conclusion is that without information on \(X_1\) and \(Y\), \(X_2\) can be uninformative for the effect of \(X_1\) on \(Y\) but can still lead to a reduction in beliefs if \(X_1\) and \(Y\) are known. In some cases it is even possible that \(X_2=1\) could lead you to revise upwards your belief that \(X_1\) causes \(Y=1\) but downwards your belief that \(X_1\) caused \(Y=1\) in a case in which \(X_1 = 1\) and \(Y=1\).72
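The counting in the table above can be reproduced in a few lines of base R. This is only an illustrative sketch: it assumes equal weight on each of the eight cells, and the names `theta_Y`, `types`, `y_at`, and `belief` are ours rather than part of any package.

```r
# Four nodal types for Y; each string gives Y at (X1, X2) = (0,0), (1,0), (0,1), (1,1).
theta_Y <- c(X1_alone = "0101", X2_alone = "0011", either = "0111", jointly = "0001")

# Eight causal types: nodal type crossed with X2. Cells 1-4 have X2 = 0, cells 5-8 have X2 = 1.
types <- expand.grid(theta = names(theta_Y), X2 = 0:1, stringsAsFactors = FALSE)
types$cell <- 1:8

# Value of Y for a nodal type at given X1 and X2.
y_at <- function(theta, x1, x2) {
  pos <- 1 + x1 + 2 * x2
  as.integer(substr(theta_Y[theta], pos, pos))
}

types$y1 <- y_at(types$theta, 1, types$X2)       # Y if X1 = 1
types$y0 <- y_at(types$theta, 0, types$X2)       # Y if X1 = 0
types$causal <- types$y1 == 1 & types$y0 == 0    # X1 = 1 causes Y = 1

# Share of causal cells among those consistent with the data (flat weights assumed).
# `cells` restricts the set of cells entertained at the outset.
belief <- function(consistent, cells = 1:8) {
  ok <- consistent & types$cell %in% cells
  mean(types$causal[ok])
}

seen_x1y <- types$y1 == 1   # consistent with observing X1 = 1 and Y = 1
seen_x2  <- types$X2 == 1   # consistent with observing X2 = 1
no_data  <- rep(TRUE, 8)

beliefs <- c(prior   = belief(no_data),
             x2      = belief(seen_x2),
             x1_y    = belief(seen_x1y),
             x1_y_x2 = belief(seen_x1y & seen_x2))
print(beliefs)

# Variant: rule out substitutes (cells 3 and 7) before seeing any data.
no_subs <- setdiff(1:8, c(3, 7))
beliefs_no_subs <- c(belief(no_data, no_subs), belief(seen_x2, no_subs),
                     belief(seen_x1y, no_subs), belief(seen_x1y & seen_x2, no_subs))
print(beliefs_no_subs)
```

The first vector gives 1/2, 1/2, 2/3, 1/2, matching the Belief column. The second gives 1/2, 2/3, 3/4, 2/3 for the case in which substitutes are excluded, so that learning \(X_2=1\) raises the belief with no other data but lowers it once \(X_1=1\) and \(Y=1\) are known.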

Consider next the four \(R=0, Y=0\) cases. For these cases we ask: did the lack of high quality institutions cause the lack of growth? We see here that we again learn nothing from \(M\) (regardless of distance from the equator). But, following the same logic as we discussed above, we are now more likely to think that the poor economic outcomes are due to the weak institutions when we learn that a case is far from the equator. We are more confident that poor institutions were the culprit in Pakistan and Vietnam than in Surinam or Nigeria.

For the off-diagonal cases we have nothing to learn since our monotonicity assumption implies that a change would make no difference for these cases. If you are sick when you have an effective medicine you will be sick without it.

In all cases \(M\) makes no difference to inference. Why is that? Looking to the graph it is not because \(R\) separates \(M\) from \(Y\). Without confounding it would separate them, but with confounding, \(R\) is a collider for \(M\) and \(Y\). The reason is rather that although we have allowed for confounding in the model we have given no structure to confounding. Some structure would make a difference here.


Acemoglu, Daron, Simon Johnson, and James A Robinson. 2001. “The Colonial Origins of Comparative Development: An Empirical Investigation.” American Economic Review 91 (5): 1369–1401.
Acemoglu, Daron, and James A Robinson. 2005. Economic Origins of Dictatorship and Democracy. New York: Cambridge University Press.
Ansell, Ben W, and David J Samuels. 2014. Inequality and Democratization. New York: Cambridge University Press.
Boix, Carles. 2003. Democracy and Redistribution. New York: Cambridge University Press.
Bollen, Kenneth A, and Robert W Jackman. 1985. “Political Democracy and the Size Distribution of Income.” American Sociological Review 50 (4): 438–57.
Cheibub, José Antonio, Jennifer Gandhi, and James Raymond Vreeland. 2010. “Democracy and Dictatorship Revisited.” Public Choice 143 (1-2): 67–101.
Dahl, Robert Alan. 1973. Polyarchy: Participation and Opposition. New Haven: Yale University Press.
Haggard, Stephan, and Robert R Kaufman. 2012. “Inequality and Regime Change: Democratic Transitions and the Stability of Democratic Rule.” American Political Science Review 106 (03): 495–516.
Haggard, Stephan, Robert R Kaufman, and Terence Teo. 2012. “Distributive Conflict and Regime Change: A Qualitative Dataset.” Coding Document to Accompany Haggard and Kaufman.
Humphreys, Macartan, and Alan M Jacobs. 2015. “Mixing Methods: A Bayesian Approach.” American Political Science Review 109 (04): 653–73.
Huntington, Samuel P. 1993. The Third Wave: Democratization in the Late Twentieth Century. Norman, OK: University of Oklahoma Press.
Linz, Juan J, and Alfred Stepan. 1996. Problems of Democratic Transition and Consolidation: Southern Europe, South America, and Post-Communist Europe. Baltimore: Johns Hopkins University Press.
Meltzer, Allan H, and Scott F Richard. 1981. “A Rational Theory of the Size of Government.” Journal of Political Economy 89 (5): 914–27.
Méon, Pierre-Guillaume, and Khalid Sekkat. 2005. “Does Corruption Grease or Sand the Wheels of Growth?” Public Choice 122 (1): 69–97.
Rodrik, Dani, Arvind Subramanian, and Francesco Trebbi. 2004. “Institutions Rule: The Primacy of Institutions over Geography and Integration in Economic Development.” Journal of Economic Growth 9 (2): 131–65.
Sachs, Jeffrey D. 2001. “Tropical Underdevelopment.”

  1. In addition, as the industrial bourgeoisie become richer, which increases the Gini, this group faces a greater risk of autocratic expropriation. If we consider the rising bourgeoisie’s mobilization to be mobilization by a materially disadvantaged group, then this constitutes an additional positive effect of inequality on mobilization.↩︎

  2. In what follows we ignore the trade openness argument both for reasons of parsimony and because little evidence was found for its importance in Rodrik, Subramanian, and Trebbi (2004).↩︎

  3. Returning to the table, if we were quite sure that \(X_1\) and \(X_2\) were not substitutes (and so remove cases 3 and 7), the last column would be 1/2, 2/3, 3/4, 2/3 and so \(X_2=1\) would lead you to increase your beliefs in the ATE but still reduce your beliefs in POC. If we were quite sure that they were not complements (and so remove cases 4 and 8) then \(X_2=1\) would lead you to reduce your beliefs in both the ATE and the POC. Sometimes however learning that cause \(X_2\) is present can lead you to increase your beliefs that \(X_1\) mattered even given \(X_1=1, Y=1\). For instance say you were unsure whether a case was one in which \(Y=1\) regardless of \(X_1, X_2\) or if \(Y=1\) only if both \(X_1=1\) and \(X_2=1\). Your prior on causal effect is 1/4. If you learn that \(X_1=1\) and \(Y=1\) this increases to \(1/3\) (as you rule out the possibility of joint determination and \(X_2=0\)). However if you just learn that \(X_2=1\) then your belief goes up to \(1/2\) (for both cases where you do and do not know \(X_1=1\) and \(Y=1\)).↩︎