Design declaration used in the code illustration (Section D):

```r
library(DeclareDesign)
library(dplyr)

prob <- c(.067, .5, .933)

design <-
  declare_model(
    block = add_level(N = 3, p = prob, tau = c(3, -3, 3)),
    unit = add_level(N = 1000, Y0 = 10*(p + rnorm(N)), Y1 = Y0 + tau)) +
  declare_inquiry(ATE = mean(Y1 - Y0)) +
  declare_assignment(Z = block_ra(blocks = block, block_prob = prob)) +
  declare_measurement(
    ipw = 1/(Z*p + (1-Z)*(1-p)),
    Y = Z*Y1 + (1-Z)*Y0) +
  declare_estimator(Y ~ Z, .method = lm_robust,
                    label = "Pooled") +
  declare_estimator(Y ~ Z + block, .method = lm_robust,
                    label = "Fixed effects") +
  declare_estimator(Y ~ Z, blocks = block, .method = difference_in_means,
                    label = "Blocked differences in means") +
  declare_estimator(Y ~ Z, covariates = ~ block, .method = lm_lin,
                    label = "Interactions (Lin approach)") +
  declare_estimator(Y ~ Z, .method = lm_robust, weights = ipw,
                    label = "Inverse propensity weights")
```
Supplementary Material for ‘Bounds on the fixed effects estimand in the presence of heterogeneous assignment propensities’
Main article: JCI open access
A. Derivation of \(\hat{\tau}_{FE}\)
Proposition
Let \(y\) denote a vector of outcomes and \(X\) a matrix in which the first column is the treatment assignment and columns 2 to \(s+1\) are dummy variables for each of \(s\) strata. Let \(n_j\), \(p_j\), \(n^1_j\), \(\overline{y}_j\), \(\overline{y}^0_j\), and \(\overline{y}^1_j\) denote, respectively, the size of stratum \(j\), the share of units in stratum \(j\) in treatment, the number of treated units in stratum \(j\), the average outcome in stratum \(j\), and the stratum-\(j\) average (observed) outcomes among control and treated units.
Then, OLS regression of \(y\) on \(X\) yields:
\[ \begin{bmatrix} \hat{\tau}_{FE} \\ \hat{\alpha}^1_{FE} \\ \vdots \\ \hat{\alpha}^s_{FE} \\ \end{bmatrix} = (X'X)^{-1}X'y = \begin{bmatrix} \frac{\sum_{j} n_j p_j (1 - p_j) (\overline{y}^1_j - \overline{y}^0_j)}{\sum_{j} n_j p_j (1 - p_j)} \\ \overline{y}_1 - p_1 \frac{\sum_{j} n_j p_j (1 - p_j) (\overline{y}^1_j - \overline{y}^0_j)}{\sum_{j} n_j p_j (1 - p_j)} \\ \vdots \\ \overline{y}_s - p_s \frac{\sum_{j} n_j p_j (1 - p_j) (\overline{y}^1_j - \overline{y}^0_j)}{\sum_{j} n_j p_j (1 - p_j)} \\ \end{bmatrix}. \]
Proof
Note first that the matrix \(X'X\) can be represented as a block matrix:
\[ X'X = \begin{bmatrix} n^1 & \mathbf{n^1}' \\ \mathbf{n^1} & M \\ \end{bmatrix} \]
where \(n^1\) is the total number of treated units, \(\mathbf{n^1} = [n^1_1, n^1_2, \ldots, n^1_s]'\) collects the number of treated units in each stratum, and \(M\) is a diagonal matrix with the stratum sizes \(n_1, \dots, n_s\) on its diagonal.
From the inversion of block matrices (see Eqn 2.8.17 in Bernstein (2009)):
\[ {\left(X'X\right)}^{-1} = \begin{bmatrix} (n^1 - \mathbf{n^1}' M^{-1} \mathbf{n^1})^{-1} & -(n^1 - \mathbf{n^1}' M^{-1} \mathbf{n^1})^{-1} \mathbf{n^1}' M^{-1} \\ -M^{-1} \mathbf{n^1} (n^1 - \mathbf{n^1}' M^{-1} \mathbf{n^1})^{-1} & M^{-1} + M^{-1} \mathbf{n^1} (n^1 - \mathbf{n^1}' M^{-1} \mathbf{n^1})^{-1} \mathbf{n^1}' M^{-1} \\ \end{bmatrix} \]
Observing that:
\[ \mathbf{n^1}' M^{-1} = (p_1, p_2, \dots, p_s) \]
and defining:
\[ w := (n^1 - \mathbf{n^1}' M^{-1} \mathbf{n^1})^{-1} = \frac{1}{\sum_j p_j n_j - \sum_j p_j^2 n_j} = \frac{1}{\sum_j n_j p_j (1-p_j)} \]
we have:
\[ {\left(X'X\right)}^{-1} = w \cdot \begin{bmatrix} 1 & -p_1 & -p_2 & \cdots & -p_s \\ -p_1 & \frac{1}{n_1 w} + p_1^2 & p_1 p_2 & \cdots & p_1 p_s \\ -p_2 & p_2 p_1 & \frac{1}{n_2 w} + p_2^2 & \cdots & p_2 p_s \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -p_s & p_s p_1 & p_s p_2 & \cdots & \frac{1}{n_s w} + p_s^2 \\ \end{bmatrix} \]
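As a sanity check on this closed form, it can be compared against a direct numerical inverse. The sketch below uses Python/NumPy rather than R, and the stratum sizes and propensities are arbitrary illustrative choices, not values from the paper.

```python
# Numerical check of the closed form for (X'X)^{-1} above (illustrative
# stratum sizes and propensities, chosen so n_j * p_j is an integer).
import numpy as np

n = np.array([40, 60, 80])       # stratum sizes n_j
p = np.array([0.25, 0.5, 0.75])  # treatment shares p_j

d, strata = [], []
for j in range(3):
    n1 = int(n[j] * p[j])        # treated count n^1_j
    d.append(np.r_[np.ones(n1), np.zeros(n[j] - n1)])
    strata.append(np.full(n[j], j))
d, strata = np.concatenate(d), np.concatenate(strata)

# X: treatment in column 1, stratum dummies in columns 2..s+1
X = np.column_stack([d] + [(strata == j).astype(float) for j in range(3)])

w = 1 / (n * p * (1 - p)).sum()  # w = 1 / sum_j n_j p_j (1 - p_j)
closed = w * np.vstack([
    np.r_[1.0, -p],
    np.column_stack([-p, np.outer(p, p) + np.diag(1 / (n * w))]),
])
assert np.allclose(np.linalg.inv(X.T @ X), closed)
```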
Similarly:
\[ X'y = \left(\sum_{i:d_i = 1} y_i, \ \sum_{i\in X_1}y_i, \ \dots, \ \sum_{i\in X_s}y_i \right)' \]
where \(X_j\) denotes the set of units in stratum \(j\).
\(\hat{\tau}_{FE}\) is then the inner product of the first row of \({\left(X'X\right)}^{-1}\) and \(X'y\):
\[ \hat{\tau}_{FE} = w\left(\sum_{i:d_i = 1} y_i - p_1\sum_{i\in X_1}y_i- p_2\sum_{i\in X_2}y_i- \dots-p_s \sum_{i\in X_s}y_i \right) \]
To simplify, observe that \(\sum_{i: d_i = 1} y_i = \sum_j n_j p_j \overline{y}^1_j\) and \(\sum_{i\in X_j} y_i = n_j(p_j \overline{y}_j^1 + (1-p_j) \overline{y}_j^0)\), and so:
\[\begin{eqnarray*} \hat{\tau}_{FE} &=& w\left(\sum_j n_j p_j \overline{y}^1_j - \sum_j n_jp_j(p_j \overline{y}_j^1 + (1-p_j) \overline{y}_j^0) \right)\\ &=& \frac{\sum_{j}n_j p_j(1- p_j) (\overline{y}^1_j- \overline{y}_j^0)}{\sum_j n_jp_j(1-p_j)}\\ \end{eqnarray*}\]
In the same way:
\[\begin{eqnarray*} \hat{\alpha}^j_{FE} &=& -wp_j\left(\sum_{i:d_i = 1} y_i - p_1\sum_{i\in X_1}y_i -p_2\sum_{i\in X_2}y_i- \dots - p_s \sum_{i\in X_s}y_i \right) -w\left( - \frac{1}{n_jw} \right)\sum_{i\in X_j}y_i \\ &=&\overline{y}_j - p_j\hat{\tau}_{FE} \end{eqnarray*}\]
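The Proposition can also be verified numerically. The sketch below (Python/NumPy rather than R; stratum sizes, propensities, and outcome values are arbitrary illustrative choices) regresses \(y\) on the treatment and stratum dummies and compares the OLS coefficients with the closed-form expressions.

```python
# Numerical check of the Proposition: OLS of y on (d, stratum dummies)
# reproduces the weighted difference-in-means formula for tau_FE and
# alpha_j = ybar_j - p_j * tau_hat. All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = np.array([40, 60, 80])        # stratum sizes n_j
p = np.array([0.25, 0.5, 0.75])   # treatment shares p_j (n_j * p_j integer)

d, strata, y = [], [], []
for j in range(3):
    n1 = int(n[j] * p[j])                           # treated count n^1_j
    dj = np.r_[np.ones(n1), np.zeros(n[j] - n1)]    # treatment indicator
    d.append(dj)
    strata.append(np.full(n[j], j))
    y.append(rng.normal(size=n[j]) + 2.0 * dj + j)  # arbitrary outcomes
d, strata, y = np.concatenate(d), np.concatenate(strata), np.concatenate(y)

# X: treatment in column 1, stratum dummies in columns 2..s+1 (no intercept)
X = np.column_stack([d] + [(strata == j).astype(float) for j in range(3)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
tau_ols = coef[0]

# Closed-form tau_FE: stratum differences in means, weighted by n_j p_j (1 - p_j)
diff = np.array([y[(strata == j) & (d == 1)].mean()
                 - y[(strata == j) & (d == 0)].mean() for j in range(3)])
w = n * p * (1 - p)
tau_formula = (w * diff).sum() / w.sum()
assert np.isclose(tau_ols, tau_formula)

# Stratum intercepts: alpha_j = ybar_j - p_j * tau_hat
ybar = np.array([y[strata == j].mean() for j in range(3)])
assert np.allclose(coef[1:], ybar - p * tau_ols)
```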
B. Omitted steps in proof of Proposition 1
The proof for Proposition 1 relies on an equivalence between:
\[ \sum\nolimits_j\frac{p_j(1-p_j)w_j}{\sum_j{p_j(1-p_j)w_j}}\tau_j \leq \sum\nolimits_j\frac{p_jw_j}{\sum_j{p_jw_j}}\tau_j \]
and:
\[ \sum\nolimits_j{\left(\frac{p_jw_j}{\sum_j{p_jw_j}} -\frac{p_j^2w_j}{\sum_j{p_j^2w_j}}\right)} \tau_j \leq 0 \]
To see this equivalence, define \(\alpha = \sum_j{p_jw_j}\) and \(\beta = \sum_j{p_j^2w_j}\).
The first condition can then be written:
\[ \sum\nolimits_j{p_jw_j}\tau_j\frac{1-p_j}{\alpha-\beta} \leq {\sum_j{p_jw_j}}\tau_j \frac1\alpha \]
This is equivalent to:
\[ \sum\nolimits_j p_jw_j\tau_j\left(\frac{1-p_j}{\alpha-\beta} - \frac1\alpha\right) \leq 0 \]
\[ \sum\nolimits_j p_jw_j\tau_j\left(\frac{\beta-\alpha p_j}{(\alpha-\beta)\alpha}\right) \leq 0 \]
\[ \sum\nolimits_j p_jw_j\tau_j\left(\frac{1}\alpha-\frac{p_j}{\beta}\right) \leq 0 \]
where the last step results from multiplying through by \(\frac{\alpha-\beta}{\beta}\), which is positive since \(\beta = \sum_j p_j^2 w_j > 0\) and \(\alpha - \beta = \sum_j p_j(1-p_j)w_j > 0\) whenever \(0 < p_j < 1\).
Resubstituting for \(\alpha\) and \(\beta\) yields the result.
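The equivalence can be checked numerically: the gap between the two sides of the first condition equals the left-hand side of the second, up to the positive factor \(\frac{\alpha-\beta}{\beta}\), so the two conditions always agree in sign. The sketch below is in Python/NumPy with randomly drawn illustrative parameter values.

```python
# Numerical check of the equivalence of the two inequalities: verify that
# lhs1 * (alpha - beta) / beta == lhs2 across random illustrative draws.
import numpy as np

rng = np.random.default_rng(1)
for _ in range(200):
    p = rng.uniform(0.05, 0.95, size=4)   # propensities p_j
    w = rng.uniform(0.5, 2.0, size=4)     # stratum weights w_j
    tau = rng.normal(size=4)              # stratum effects tau_j
    alpha = (p * w).sum()                 # alpha = sum_j p_j w_j
    beta = (p**2 * w).sum()               # beta  = sum_j p_j^2 w_j

    # LHS minus RHS of the first condition (note sum_j p_j(1-p_j)w_j = alpha - beta)
    lhs1 = ((p * (1 - p) * w * tau).sum() / (alpha - beta)
            - (p * w * tau).sum() / alpha)
    # LHS of the second condition
    lhs2 = ((p * w / alpha - p**2 * w / beta) * tau).sum()

    assert np.isclose(lhs1 * (alpha - beta) / beta, lhs2)
```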
C. Derivation of Equation 11
In the linear case, monotonicity is satisfied and we can therefore write:
\[ \tau_{FE} = \lambda \tau_{ATT} + (1-\lambda) \tau_{ATC}\]
Substituting from Equations 5, 6, and 8:
\[ \frac{\sum\nolimits_{j}p_{j}(1-p_{j})w_{j}\tau_j}{\sum\nolimits_{j}p_{j}(1-p_{j})w_{j}} = \lambda \frac{\sum\nolimits_{j}p_j w_j \tau_j}{\sum\nolimits_{j} p_j w_j} + (1-\lambda) \frac{\sum\nolimits_{j}(1 - p_j) w_j \tau_j}{\sum\nolimits_{j} (1 - p_j) w_j}.\]
With treatment effects linear in \(p_j\) we have for some \(\beta\):
\[ \frac{\sum\nolimits_{j}p_{j}(1-p_{j})w_{j}\beta p_j}{\sum\nolimits_{j}p_{j}(1-p_{j})w_{j}} = \lambda \frac{\sum\nolimits_{j}p_j w_j\beta p_j}{\sum\nolimits_{j} p_j w_j} + (1-\lambda) \frac{\sum\nolimits_{j}(1 - p_j) w_j\beta p_j}{\sum\nolimits_{j} (1 - p_j) w_j}\]
Dividing through by \(\beta\) and gathering terms gives:
\[\frac{\sum\nolimits_{j}p_{j}^2(1-p_{j})w_{j}}{\sum\nolimits_{j}p_{j}(1-p_{j})w_{j}} = \lambda \frac{\sum\nolimits_{j}p_j^2 w_j}{\sum\nolimits_{j} p_j w_j} + (1-\lambda) \frac{\sum\nolimits_{j}p_j(1 - p_j) w_j}{\sum\nolimits_{j} (1 - p_j) w_j}\]
Solving for \(\lambda\) then yields:
\[\begin{align*} \lambda = \frac{\frac{\sum_j p_j^2 (1 - p_j)w_j}{\sum_j p_j (1 - p_j)w_j} - \frac{\sum_j p_j (1 - p_j)w_j}{\sum_j (1 - p_j)w_j}}{\frac{\sum_j p_j^2w_j}{\sum_j p_jw_j} - \frac{\sum_j p_j (1 - p_j)w_j}{\sum_j (1 - p_j)w_j}} \end{align*}\]
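A quick numerical check of this expression (Python/NumPy; the values of \(p_j\), \(w_j\), and the slope are arbitrary illustrations): with \(\tau_j = \beta p_j\), the \(\lambda\) above recovers \(\tau_{FE}\) as the stated combination of \(\tau_{ATT}\) and \(\tau_{ATC}\).

```python
# Numerical check of the solution for lambda under effects linear in p_j.
# All parameter values below are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
p = rng.uniform(0.1, 0.9, size=5)    # propensities p_j
w = rng.uniform(0.5, 2.0, size=5)    # stratum weights w_j
tau = 1.7 * p                        # effects linear in p_j (slope 1.7)

# Weighted averages of tau_j defining each estimand
tau_fe = (p * (1 - p) * w * tau).sum() / (p * (1 - p) * w).sum()
tau_att = (p * w * tau).sum() / (p * w).sum()
tau_atc = ((1 - p) * w * tau).sum() / ((1 - p) * w).sum()

# lambda from the displayed expression
num = ((p**2 * (1 - p) * w).sum() / (p * (1 - p) * w).sum()
       - (p * (1 - p) * w).sum() / ((1 - p) * w).sum())
den = ((p**2 * w).sum() / (p * w).sum()
       - (p * (1 - p) * w).sum() / ((1 - p) * w).sum())
lam = num / den

assert np.isclose(tau_fe, lam * tau_att + (1 - lam) * tau_atc)
```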
D. Code illustration
There are many approaches that can be used to generate unbiased estimates in the presence of heterogeneous but known propensities across strata. I illustrate by using the R package DeclareDesign to simulate data from a version of Example 1 and show the performance of five estimation strategies: pooled OLS; OLS with stratum dummies (fixed effects); inverse propensity weighting (IPW); OLS with interactions between treatment and demeaned stratum dummies, following Lin (2013); and blocked differences in means. This code draws from material in Blair, Coppock, and Humphreys (2018).
Simulation, summary and output:
```r
simulate_design(design) |>
  group_by(estimator) |>
  summarize(
    SE_bias = mean(std.error - sd(estimate)),
    ATE_bias = mean(estimate - estimand)) |>
  knitr::kable(digits = 2)
```
| estimator | SE_bias | ATE_bias |
|---|---|---|
| Blocked differences in means | 0.01 | 0.01 |
| Fixed effects | 0.03 | -2.01 |
| Interactions (Lin approach) | 0.01 | 0.01 |
| Inverse propensity weights | 0.04 | 0.01 |
| Pooled | 0.03 | 5.02 |
The results highlight the poor performance of both the pooled approach and the fixed effects approach in this setting. IPW, the interaction model, and blocked differences in means are all unbiased, though they differ in the performance of their standard errors, assessed here as the difference between the average estimated standard error and the standard deviation of the sampling distribution of estimates under each estimator.