- 1 How you can get serious underestimation of many quantities by looking at outcomes only among those that are stopped
- 2 But for some quantities the naive estimate can be an *over*estimate.
  - 2.1 An example in which the naive estimate overestimates discrimination overall but not discrimination *among the stopped*
  - 2.2 An example in which the naive estimate overestimates the discrimination *among the stopped* (\(CDE_{M=1}\))
  - 2.3 The lazy racist: An example where the naive estimate overestimates the *average* effect overall *and* among the stopped
  - 2.4 The lazy racist: An example where the naive estimate overestimates the *average* effect overall *and* among the stopped

Following publication of Knox, Lowe, and Mummolo there has been a lively debate in threads (e.g. Mummolo, Sharad, Blackwell) regarding the estimation of race-based discrimination using data that is only available for people who are stopped by the police in the first place. See also this nice explainer by Laura Bronner, which motivates this note.

The question of interest is whether, and how, using data only on people who were stopped throws off your estimates of discrimination. It’s a hard question and there are deep methodological and substantive issues at play. The paper and the issues it deals with should be required reading in all classes on causal inference.

The clear takeaway from most accounts of this issue is that the estimate of discrimination from this type of data is likely doing a bad job of estimating something. It is a little harder to see, however:

1. what exactly it is *meant* to be estimating, i.e. what the “estimand” is;^{1}
2. where the problem is coming from (for instance, this great explainer by Laura Bronner describes a form of selection bias somewhat distinct from what is studied in Knox, Lowe, and Mummolo); and
3. whether the naive approach necessarily produces an overestimate or an underestimate of the quantities of interest.

This note tries to clarify some of this.

- For 1, it distinguishes six different possible quantities of interest (many of them in Knox, Lowe, and Mummolo).
- For 2, I walk through an example where you can see, I hope, where the “collider” bias is coming from.
- The answer to 3 really depends on your answer to 1 and on the underlying causal relations. Knox et al. provide a set of conditions under which we are essentially guaranteed (“essentially” because of this discussion) to have an *under*estimate for some important causal quantities. But important as this is, more generally whether the naive approach produces an underestimate depends (of course) on what you are trying to estimate and what else you are willing to assume about causal processes.

Let’s start with a simple example in which there is discrimination in stopping and discrimination in the use of force given that you are stopped. The example yields (at least) six causal quantities of interest. These range in value from 20% for the average effect of race on force among those that are stopped, to 67% for the share among minorities that experience violence for whom the violence was due to their race.

Although all six causal quantities of interest are positive, the estimate you get from the simple approach is actually *negative*.

The summary results are in the Table below. Then there’s a walk through.

| Quantity | Value |
|---|---|
| 1 Effect of race on probability of being stopped | +60% |
| 2 Effect of race on use of force if you were to be stopped | +20% |
| 3 Effect of race on probability of experiencing force | +40% |
| 4 Effect of race on probability of experiencing force among those that are stopped | +40% |
| 5 Effect of race on use of force if you were to be stopped among those that are stopped | +20% |
| 6 Probability that force used on a minority is due to race | +67% |
| A Naive estimate from data | -25% |

These numbers are generated from the following simple scenario. There are 10 people that differ in terms of (a) whether they are minority or not (\(M=1\) or \(M=0\)) and (b) how they score on a “suspiciousness” scale (\(S\)) ranging from 0 to 4.

Importantly I will assume that people are equally divided into the possible \(S,M\) combinations and in doing so we will assume that *race is completely unrelated to suspiciousness*. I represent the 10 people and their values on \(M\) and \(S\) in the table below.

| | M=0 | M=1 |
|---|---|---|
| S = 0 | a | A |
| S = 1 | b | B |
| S = 2 | c | C |
| S = 3 | d | D |
| S = 4 | e | E |

I have labeled people with letters which I will use later, but the key idea is that person \(a\) is comparable to person \(A\) *except* for race. We think of discrimination as the differences in outcomes for person \(A\) and person \(a\).
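As a quick check on this setup, the sketch below builds the ten hypothetical people in R and confirms that \(M\) and \(S\) are unrelated in the full population. The object name `people` and the `id` column are illustrative labels, not part of the original analysis.

```r
# Ten hypothetical people: one for each combination of S (0 to 4) and M (0 or 1).
people <- expand.grid(S = 0:4, M = 0:1)

# Lower-case letters for non minorities, upper-case for minorities, as in the table.
people$id <- ifelse(people$M == 1, LETTERS[people$S + 1], letters[people$S + 1])

# One person per cell, so M carries no information about S.
cell_counts <- table(S = people$S, M = people$M)
print(cell_counts)
print(cor(people$M, people$S))  # 0 by construction
```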

**Racist stopping rules.** We imagine now that there are racist stopping rules. Police stop people using information not just on suspiciousness but also on whether or not they are minorities. We will assume in particular that the stopping rule is “Stop if \(3M + S \geq 4\)”. Under this rule the following people are stopped (shown in bold):

| | M=0 | M=1 |
|---|---|---|
| S = 0 | a | A |
| S = 1 | b | **B** |
| S = 2 | c | **C** |
| S = 3 | d | **D** |
| S = 4 | **e** | **E** |

The key thing is that a different stopping rule is used depending on \(M\). This introduces a statistical relationship between race and suspiciousness. Whereas I noted above that race and suspiciousness are unrelated to each other, you can see here that *among those stopped* \(S\) is higher, on average, among non minorities than among minorities. Since \(S\) is also related to the use of force this means that, even if there were no race-based discrimination in the use of force, we might expect, all else equal, to observe more force among non minorities *among the set stopped*, but only because we have selected out the less suspicious non minorities.
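The induced relationship can be seen directly by applying the stopping rule to the ten people. This is a minimal sketch in base R; the variable names are illustrative.

```r
people <- expand.grid(S = 0:4, M = 0:1)

# Stopping rule from the text: stop if 3M + S >= 4.
people$stopped <- 3 * people$M + people$S >= 4
stopped <- people[people$stopped, ]

# Average suspiciousness by race, in the population and among the stopped.
mean_S_all     <- tapply(people$S, people$M, mean)    # equal across groups
mean_S_stopped <- tapply(stopped$S, stopped$M, mean)  # higher for M = 0
print(mean_S_all)
print(mean_S_stopped)

# Conditioning on being stopped induces a negative correlation between M and S.
cor_stopped <- cor(stopped$M, stopped$S)
print(cor_stopped)
```

Among the stopped, the single non minority has \(S = 4\) while the four minorities average \(S = 2.5\), even though the two groups have identical \(S\) distributions before selection.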

**Racist use of force, if stopped.** We imagine that *were* individuals to be stopped, the use of force would also be racist, a function of race and not just suspiciousness. We will assume in particular that the use of force is determined by “Use force if \(M + S \geq 3\)”. Under this rule the following people would experience force if they were stopped (shown in bold):

| | M=0 | M=1 |
|---|---|---|
| S = 0 | a | A |
| S = 1 | b | B |
| S = 2 | c | **C** |
| S = 3 | **d** | **D** |
| S = 4 | **e** | **E** |

**Racist use of force.** From the above we can figure out who would be stopped *and* be subjected to the use of force. These people are shown in bold below.

| | M=0 | M=1 |
|---|---|---|
| S = 0 | a | A |
| S = 1 | b | B |
| S = 2 | c | **C** |
| S = 3 | d | **D** |
| S = 4 | **e** | **E** |

From this table you can see for instance that \(E\) experiences force but not because of race, since if they were not minority they would be like \(e\), and \(e\) also experiences force. \(C\) and \(D\) do experience force because of race. \(A\) and \(B\) do not experience force at all, and nor would they if they were like \(a\) and \(b\).
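The three sets of people described above follow mechanically from the two rules. The sketch below lists them by letter; the object names are illustrative.

```r
people <- expand.grid(S = 0:4, M = 0:1)
people$id <- ifelse(people$M == 1, LETTERS[people$S + 1], letters[people$S + 1])

stopped          <- 3 * people$M + people$S >= 4  # stop if 3M + S >= 4
force_if_stopped <- people$M + people$S >= 3      # force if M + S >= 3, were they stopped
force            <- stopped & force_if_stopped    # stopped and force used

ids_stopped          <- people$id[stopped]
ids_force_if_stopped <- people$id[force_if_stopped]
ids_force            <- people$id[force]

print(ids_stopped)
print(ids_force_if_stopped)
print(ids_force)
```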

We are interested in the effects of race. Gaebler et al. (2020) and Knox, Lowe, and Mummolo both highlight the conceptual difficulties in this idea, but for this discussion I will run with the idea that we can imagine an experiment in which we can keep everything constant about an individual except their race, or perhaps keep everything constant but imagine how outcomes would be different had police made different inferences regarding race.

Here are six quantities of interest, all of which could be of substantive importance:

1. **The effect of race on the probability of being stopped.** We see in the stopped table that race makes a difference to being stopped when \(S\) is 1, 2, or 3. So for three fifths of people: +60%.
2. **The effect of race on the use of force if you were to be stopped.** Here we imagine the counterfactual case in which everyone is stopped. We see in the “Force if stopped” table that race makes a difference to the use of force when \(S=2\). So for one fifth of people: +20%.
3. **Effect of race on probability of experiencing force.** We see in the “Force” table that race makes a difference to experiencing force when \(S=2\) or \(S=3\). So for two fifths of people: +40%.
4. **Effect of race on probability of experiencing force among those that are stopped.** Here we are interested specifically in the effect of race among people who are, in fact, stopped. From the stopped table we see that these are people in groups e, B, C, D, E. Of these, from the “Force” table, we see that C and D experience force *because* of race. So, also +40%. Note that the logics are slightly different for C and D. Although both are stopped, C would not have experienced force if she had \(M=0\), even if she were stopped. D would experience force even if she had \(M=0\) if she were stopped, but were she of type \(M=0\) she wouldn’t have been stopped in the first place, and for that reason she would have avoided experiencing force.
5. **The effect of race on the use of force if you were to be stopped, among those that are in fact stopped.** This is the “controlled direct effect among the observed” estimand. Unlike the last query, this time we keep “stopped” counterfactually fixed when we imagine changing race. Again we consider e, B, C, D, E, the people that were stopped. Of these, *conditional on being stopped*, only \(C\) would have had a different outcome had their race been different. So, 1 in 5: +20%.
6. **Probability that force used on a minority is due to race.** Here we are interested in the effect of race specifically among C, D, and E. Of these there is an effect of race for C and D but not E. So +67%.

The naive **estimate** meanwhile compares the share among stopped minorities that experience force (3 out of 4) to the share among stopped non minorities that experience force (1 out of 1), yielding 75% - 100% = -25%.

It’s really terribly wrong.
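The six quantities and the naive estimate can be reproduced from the two rules by writing down each person's potential outcomes under \(M=0\) and \(M=1\). This is a sketch in base R under the deterministic rules stated above; the names (`stop0`, `fis1`, and so on) are illustrative.

```r
stop_rule  <- function(M, S) 3 * M + S >= 4  # stopping rule from the text
force_rule <- function(M, S) M + S >= 3      # force-if-stopped rule from the text

people <- expand.grid(S = 0:4, M = 0:1)
S <- people$S
M <- people$M

# Potential outcomes for every person under M = 0 and under M = 1.
stop0  <- stop_rule(0, S);   stop1  <- stop_rule(1, S)
fis0   <- force_rule(0, S);  fis1   <- force_rule(1, S)   # force if stopped
force0 <- stop0 & fis0;      force1 <- stop1 & fis1       # stopped and force used

# Realized outcomes given each person's actual M.
stopped <- ifelse(M == 1, stop1, stop0)
force   <- ifelse(M == 1, force1, force0)

results <- c(
  q1 = mean(stop1 - stop0),                       # effect on being stopped
  q2 = mean(fis1 - fis0),                         # effect on force if stopped
  q3 = mean(force1 - force0),                     # effect on experiencing force
  q4 = mean((force1 - force0)[stopped]),          # same, among the stopped
  q5 = mean((fis1 - fis0)[stopped]),              # force if stopped, among the stopped
  q6 = mean((force1 - force0)[M == 1 & force]),   # share of minority force due to race
  naive = mean(force[stopped & M == 1]) - mean(force[stopped & M == 0])
)
print(round(results, 2))
```

Only the last entry uses quantities that are observable in stop data; the first six all require the counterfactual columns.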

Let’s take the same example but imagine that the use of force, given stopped, is even more racist. In particular now we will assume that force is never used on whites. How does our estimate compare with the estimands? People who experience force are shown in bold.

| | M=0 | M=1 |
|---|---|---|
| S = 0 | a | A |
| S = 1 | b | B |
| S = 2 | c | **C** |
| S = 3 | d | **D** |
| S = 4 | e | **E** |

| Quantity | Value |
|---|---|
| 1 Effect of race on probability of being stopped (\(S=1,2,3\)) | +60% |
| 2 Effect of race on use of force if you were to be stopped (\(S = 2,3,4\)) | +60% |
| 3 Effect of race on probability of experiencing force (\(S = 2,3,4\)) | +60% |
| 4 Effect of race on probability of experiencing force among those that are stopped ((e, C, D, E)/(e, B, C, D, E)) | +80% |
| 5 Effect of race on use of force if you were to be stopped among those that are stopped ((e, C, D, E)/(e, B, C, D, E)) | +80% |
| 6 Probability that force used on a minority is due to race | +100% |
| A Naive estimate from data | +75% |

So in this last case the estimate is quite high, and it is *higher* than some of the target quantities. For instance, it is an *over*estimate of the effect of race on use of force if you were to be stopped. But it still gives an underestimate of the effect of race on the probability of experiencing force among those that are stopped. (Indeed, the supplementary materials of Knox et al. have proofs for quantities 4 and 5.)
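To see which quantities the naive estimate overshoots in this variant, the calculation can be wrapped in a small function that takes the two rules as arguments. `quantities()` is a hypothetical helper written for this note. The force rule used here assumes that minorities still face force when \(S \geq 2\), which is what the \(S = 2,3,4\) entries in the table imply, while non minorities never face force.

```r
# Hypothetical helper: the six quantities and the naive estimate for any
# deterministic stopping rule and force-if-stopped rule.
quantities <- function(stop_rule, force_rule) {
  people <- expand.grid(S = 0:4, M = 0:1)
  S <- people$S
  M <- people$M
  # Potential outcomes under M = 0 and M = 1.
  stop0  <- stop_rule(0, S);   stop1  <- stop_rule(1, S)
  fis0   <- force_rule(0, S);  fis1   <- force_rule(1, S)
  force0 <- stop0 & fis0;      force1 <- stop1 & fis1
  # Realized outcomes.
  stopped <- ifelse(M == 1, stop1, stop0)
  force   <- ifelse(M == 1, force1, force0)
  c(q1 = mean(stop1 - stop0),
    q2 = mean(fis1 - fis0),
    q3 = mean(force1 - force0),
    q4 = mean((force1 - force0)[stopped]),
    q5 = mean((fis1 - fis0)[stopped]),
    q6 = mean((force1 - force0)[M == 1 & force]),
    naive = mean(force[stopped & M == 1]) - mean(force[stopped & M == 0]))
}

res <- quantities(
  stop_rule  = function(M, S) 3 * M + S >= 4,
  force_rule = function(M, S) M == 1 & S >= 2   # assumed: never force on non minorities
)
print(round(res, 2))

# Positive gap: naive overestimates that quantity; negative: underestimates.
gap <- res[["naive"]] - res[paste0("q", 1:6)]
print(round(gap, 2))
```

The gap is positive for quantities 1 to 3 and negative for quantities 4 to 6, matching the over- and underestimation described above.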

So: whether you are over- or underestimating discrimination using the naive approach depends on precisely what quantity you are aiming for. Which of these quantities is most important for what kind of policy intervention?

In the figure below, black entries are “stopped” and grey entries are not; a 1 indicates a use of force if stopped and a 0 indicates no use of force if stopped.

So a black 1 means stopped and force employed. As before we will assume a deterministic process and a switch in \(M\) moves a unit between columns but maintains position otherwise.

As before, \(M\) indicates minority status and \(S\) “suspiciousness” (now binary). These are not correlated with each other. Both being stopped and use of force if stopped are increasing in both \(M\) and \(S\).

- The effect of \(M\) on \(Y\) among the stopped is 9/21: 43%.
- The *controlled direct effect* of \(M\) on \(Y\) among the stopped is 7/21: 33%.
- The naive estimate for the effect of \(M\) is: (8/15) - (1/6) = 37%.

So the naive estimate overestimates the controlled direct effect but not the average effect, among the stopped.

This seems to contradict the results in Knox et al., but in fact the example violates one of their assumptions, “relative non severity of racial stops” (Assumption 3), since here:

- Force is used for (3/6) minorities that are stopped regardless of race.
- Force is used for (5/9) minorities that are stopped *because* of race.
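The arithmetic behind these comparisons is short enough to check directly. The counts below are taken from the text only (the figure itself is not reproduced here), and the variable names are illustrative.

```r
# Counts as stated in the text.
n_stopped <- 21                       # 15 minorities + 6 non minorities
effect_among_stopped <- 9 / n_stopped # total effect of M on Y among the stopped
cde_among_stopped    <- 7 / n_stopped # controlled direct effect among the stopped
naive <- 8 / 15 - 1 / 6               # difference in observed force rates

print(round(c(effect = effect_among_stopped,
              cde    = cde_among_stopped,
              naive  = naive), 2))

# Force rates for the two kinds of minority stops.
force_stopped_regardless <- 3 / 6     # stopped whatever their race
force_stopped_by_race    <- 5 / 9     # stopped only because of race

# The two groups pool to the 8 of 15 used in the naive estimate.
pooled <- (3 + 5) / (6 + 9)
print(pooled == 8 / 15)

# Racial stops are the more severe ones here, contrary to Assumption 3.
print(force_stopped_by_race > force_stopped_regardless)
```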

If you think that racist stops are possibly going to involve *more* force (and you care about controlled direct effects) then you might be worried about this example.

Imagine a world in which there are two types of police officers. One type (\(U=0\)) is diligent but non violent: they stop often but don’t use force so often. The other type (\(U=1\)) is lazier, stopping less overall, but likely to employ violence among minorities they stop.

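Probability of being stopped, by officer type (\(U\)) and minority status (\(D\)):

| | U=0 | U=1 |
|---|---|---|
| D=0 | 0.5 | 0.1 |
| D=1 | 0.5 | 0.2 |

Probability of force if stopped, by officer type (\(U\)) and minority status (\(D\)):

| | U=0 | U=1 |
|---|---|---|
| D=0 | 0.2 | 0.1 |
| D=1 | 0.2 | 0.9 |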

With such an underlying data generating process we might observe data like this: force used among 280 out of 700 minorities and among 110 out of 600 non-minorities.
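These counts follow from the two probability tables if we assume 1,000 encounters in each of the four \((D, U)\) cells. That cell size is an assumption chosen because it reproduces the reported numbers; it is not stated in the original. The matrix names are illustrative.

```r
cells <- list(D = c("0", "1"), U = c("0", "1"))

# Rows are D = 0, 1; columns are U = 0, 1 (matrix() fills column by column).
p_stop  <- matrix(c(0.5, 0.5, 0.1, 0.2), nrow = 2, dimnames = cells)
p_force <- matrix(c(0.2, 0.2, 0.1, 0.9), nrow = 2, dimnames = cells)

n_cell <- 1000                      # assumed number of people per (D, U) cell

n_stopped <- n_cell * p_stop        # expected stops in each cell
n_force   <- n_stopped * p_force    # expected uses of force in each cell

stops_by_race <- rowSums(n_stopped)
force_by_race <- rowSums(n_force)
print(rbind(stopped = stops_by_race, force = force_by_race))
```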

These values imply the following estimands and (large N) estimates:

```
library(magrittr)    # %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# The estimand functions (cde_m1, ate_m1, ate, est, A3_b, A3_w) and the
# parameters p00, ..., y11 are not shown here; they must be defined beforehand.
data.frame(quantity = c("CDE | M=1", "ATE | M=1", "ATE", "Estimate", "Assumption 3 (blacks)", "Assumption 3 (whites)"),
           value = c(CDE_M1 = cde_m1(p00, p10, p01, p11, y00, y10, y01, y11),
                     ATE_M1 = ate_m1(p00, p10, p01, p11, y00, y10, y01, y11),
                     ATE = ate(p00, p10, p01, p11, y00, y10, y01, y11),
                     estimate = est(p00, p10, p01, p11, y00, y10, y01, y11),
                     A3_blacks = A3_b(p00, p10, p01, p11, y00, y10, y01, y11),
                     A3_whites = A3_w(p00, p10, p01, p11, y00, y10, y01, y11))) %>%
  kable(row.names = FALSE, digits = 2) %>%
  kable_styling(full_width = FALSE)
```

| quantity | value |
|---|---|
| CDE \| M=1 | 0.18 |
| ATE \| M=1 | 0.19 |
| ATE | 0.09 |
| Estimate | 0.22 |
| Assumption 3 (blacks) | 0.00 |
| Assumption 3 (whites) | 1.00 |
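The helper functions called in the code above (`cde_m1`, `ate_m1`, and so on) are not shown in this note. The sketch below is one way to recover the first four reported values, not the original implementation. `lazy_racist()` is a hypothetical function, and it rests on three assumptions: the four \((D, U)\) cells are equally large; anyone who would be stopped as a non minority would also be stopped as a minority; and within a cell the use of force does not depend on which stops occur. Here "M=1" in the table labels is read as "among the stopped".

```r
# Hypothetical helper, not the original implementation.
# p_stop[d, u]: probability of being stopped; p_force[d, u]: probability of
# force if stopped. Rows are D = 0, 1 and columns are U = 0, 1.
lazy_racist <- function(p_stop, p_force) {
  # Equal cell sizes assumed: the stopped are spread across cells in
  # proportion to p_stop.
  w <- p_stop / sum(p_stop)

  # Controlled direct effect among the stopped: change race, keep "stopped" fixed.
  cde_u <- p_force[2, ] - p_force[1, ]
  cde_stopped <- sum(colSums(w) * cde_u)

  # Total effect among the stopped. A stopped minority would, as a non
  # minority, be stopped only with probability p_stop[1, ] / p_stop[2, ].
  te_minority     <- p_force[2, ] - (p_stop[1, ] / p_stop[2, ]) * p_force[1, ]
  te_non_minority <- p_force[2, ] - p_force[1, ]
  ate_stopped <- sum(w[2, ] * te_minority + w[1, ] * te_non_minority)

  # Average effect on experiencing force in the whole population.
  ate <- mean(p_stop[2, ] * p_force[2, ] - p_stop[1, ] * p_force[1, ])

  # Naive estimate: difference in force rates among the stopped.
  naive <- sum(w[2, ] * p_force[2, ]) / sum(w[2, ]) -
    sum(w[1, ] * p_force[1, ]) / sum(w[1, ])

  c(cde_stopped = cde_stopped, ate_stopped = ate_stopped, ate = ate, naive = naive)
}

p_stop  <- matrix(c(0.5, 0.5, 0.1, 0.2), nrow = 2)  # filled column by column
p_force <- matrix(c(0.2, 0.2, 0.1, 0.9), nrow = 2)
res <- lazy_racist(p_stop, p_force)
print(round(res, 3))
```

Under these assumptions the naive estimate exceeds all three estimands, because the violent officers make up a larger share of minority stops than of non minority stops.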

Imagine a world in which there are two types of police officers. One type (\(U=0\)) is diligent but non violent: they stop often but don’t use force so often. The other type (\(U=1\)) is lazier, stopping less overall, but likely to employ violence among minorities they stop.

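Probability of being stopped, by officer type (\(U\)) and minority status (\(D\)):

| | U=0 | U=1 |
|---|---|---|
| D=0 | 0.5 | 0.10 |
| D=1 | 0.7 | 0.15 |

Probability of force if stopped, by officer type (\(U\)) and minority status (\(D\)):

| | U=0 | U=1 |
|---|---|---|
| D=0 | 0.2 | 0.1 |
| D=1 | 0.2 | 0.9 |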

With such an underlying data generating process we might observe data like this: force used among 275 out of 850 minorities and among 110 out of 600 non-minorities.

These values imply the following estimands and (large N) estimates:

```
library(magrittr)    # %>% pipe
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# The estimand functions (cde_m1, ate_m1, ate, est, A3_b, A3_w) and the
# parameters p00, ..., y11 are not shown here; they must be defined beforehand.
data.frame(quantity = c("CDE | M=1", "ATE | M=1", "ATE", "Estimate", "Assumption 3 (blacks)", "Assumption 3 (whites)"),
           value = c(CDE_M1 = cde_m1(p00, p10, p01, p11, y00, y10, y01, y11),
                     ATE_M1 = ate_m1(p00, p10, p01, p11, y00, y10, y01, y11),
                     ATE = ate(p00, p10, p01, p11, y00, y10, y01, y11),
                     estimate = est(p00, p10, p01, p11, y00, y10, y01, y11),
                     A3_blacks = A3_b(p00, p10, p01, p11, y00, y10, y01, y11),
                     A3_whites = A3_w(p00, p10, p01, p11, y00, y10, y01, y11))) %>%
  kable(row.names = FALSE, digits = 2) %>%
  kable_styling(full_width = FALSE)
```

| quantity | value |
|---|---|
| CDE \| M=1 | 0.14 |
| ATE \| M=1 | 0.17 |
| ATE | 0.08 |
| Estimate | 0.14 |
| Assumption 3 (blacks) | 0.00 |
| Assumption 3 (whites) | 1.00 |
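In this second table the estimate and the controlled direct effect both round to 0.14, so it helps to look at more digits. The sketch below recomputes just those two numbers from the probability tables, assuming equally sized \((D, U)\) cells (an assumption, as the original helper functions are not shown). Under that assumption the naive estimate is roughly 0.140 and the controlled direct effect among the stopped roughly 0.138.

```r
cells <- list(D = c("0", "1"), U = c("0", "1"))

# Rows are D = 0, 1; columns are U = 0, 1 (matrix() fills column by column).
p_stop  <- matrix(c(0.5, 0.7, 0.10, 0.15), nrow = 2, dimnames = cells)
p_force <- matrix(c(0.2, 0.2, 0.1, 0.9),   nrow = 2, dimnames = cells)

# Naive estimate: observed force rate among stopped minorities minus the
# observed force rate among stopped non minorities.
naive <- sum(p_stop["1", ] * p_force["1", ]) / sum(p_stop["1", ]) -
  sum(p_stop["0", ] * p_force["0", ]) / sum(p_stop["0", ])

# Controlled direct effect among the stopped: the race gap in force for each
# officer type, weighted by that type's share of all stops.
cde_stopped <- sum(colSums(p_stop) * (p_force["1", ] - p_force["0", ])) / sum(p_stop)

print(round(c(naive = naive, cde_stopped = cde_stopped), 4))
```

So the naive estimate still sits slightly above the controlled direct effect here, while falling below the reported effect among the stopped (0.17).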

^{1} Knox, Lowe, and Mummolo mostly focus on the effect of race on force among those that are stopped; Gaebler et al. (2020) mostly focus on what would be the average effect of race on force were someone to be stopped (the controlled direct effect); Bronner focuses on the effect of race on force experienced.