Causal Inference

Topics I

Macartan Humphreys

1 Topics 1

1.1 Covariate Adjustment

1.1.1 When to condition? What to condition on?

The key idea from Pearl is that you want to find a set of variables such that when you condition on these you get what you would get if you had random assignment, that is, if you used a do operation.

1.1.2 When to condition? What to condition on?

Intuition:

  • You could imagine creating a “mutilated” graph by removing all the arrows leading out of X
  • Then select a set of variables, \(Z\), such that \(X\) and \(Y\) are d-separated by \(Z\) on the mutilated graph
  • When you condition on these you are making sure that any covariation between \(X\) and \(Y\) is covariation that is due to the effects of \(X\)

1.1.3 Illustration

1.1.4 Illustration: Remove paths out

1.1.5 Illustration: Block backdoor path

1.1.6 Illustration: Why not like this?

1.1.7 Backdoor Criterion: (Pearl 1995)

The backdoor criterion is satisfied by \(Z\) (relative to \(X\), \(Y\)) if:

  1. No node in \(Z\) is a descendant of \(X\)
  2. \(Z\) blocks every backdoor path from \(X\) to \(Y\) (i.e. every path that contains an arrow into \(X\))

In that case you can identify the effect of \(X\) on \(Y\) by conditioning on \(Z\):

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\] (This is eqn 3.19 in Pearl (2000))

1.1.8 Backdoor Criterion: (Pearl 1995)

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\]

  • There is no notion of a linear control or anything like that; the idea really is like blocking: think of lots of discrete data with no missing patterns
  • Note this is a formula for a (possibly counterfactual) level; a counterfactual difference would be given in the obvious way by:

\[P(Y=y | \hat{x}) - P(Y=y | \hat{x}')\]
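For intuition, here is a minimal sketch (simulated data, not from Pearl) computing the adjustment formula by direct summation, with a binary confounder \(Z\), treatment \(X\), and outcome \(Y\):

library(dplyr)

set.seed(1)
dd <- tibble(
  Z = rbinom(10000, 1, .4),              # confounder
  X = rbinom(10000, 1, .2 + .5*Z),       # treatment depends on Z
  Y = rbinom(10000, 1, .2 + .3*X + .3*Z))

# P(Y = 1 | do(X = x)) = sum_z P(Y = 1 | X = x, Z = z) P(Z = z)
p_do <- function(x)
  dd |>
    group_by(Z) |>
    summarize(p_y = mean(Y[X == x]), p_z = n()/nrow(dd)) |>
    with(sum(p_y * p_z))

p_do(1) - p_do(0)   # close to the true effect of .3

The naive difference, mean(dd$Y[dd$X == 1]) - mean(dd$Y[dd$X == 0]), overstates the effect because \(Z\) raises both \(X\) and \(Y\).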

1.1.9 Backdoor Proof

Following Pearl (2009), Chapter 11. Let \(T\) denote the set of parents of \(X\): \(T := pa(X)\), with (possibly vector valued) realizations \(t\). These might not all be observed.

If the backdoor criterion is satisfied, we have:

  1. \(Y\) is independent of \(T\), given \(X\) and observed data \(Z\) (since \(Z\) blocks backdoor paths)
  2. \(X\) is independent of \(Z\) given \(T\) (since \(Z\) includes only nondescendants of \(X\))
  • Key idea: The intervention level relates to the observational level as follows: \[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • Think of this as fully accounting for the (possibly unobserved) causes of \(X\), \(T\)

1.1.10 Backdoor Proof

We want to get to:

\[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • But of course we do not observe \(T\), rather we observe \(Z\). So we now need to write everything in terms of \(Z\) rather than \(T\).

We bring \(Z\) into the picture by writing:

\[p(y|\hat{x}) = \sum_{t\in T} p(t) \sum_z p(y|x, t, z)p(z|x, t)\]

Now we want to get rid of \(T\).

1.1.11 Backdoor Proof

Now we want to get rid of \(T\).

  • Using the two conditions from the backdoor definition above:

    1. replace \(p(y|x, t, z)\) with \(p(y | x, z)\)
    2. replace \(p(z|x, t)\) with \(p(z|t)\)

This gives: \[p(y|\hat x) = \sum_{t \in T} p(t) \sum_z p(y|x, z)p(z|t)\]

Cleaning up, we can get rid of \(T\):

\[p(y|\hat{x}) = \sum_z p(y|x, z)\sum_{t\in T} p(z|t)p(t) = \sum_z p(y| x, z)p(z)\]

1.1.12 Backdoor proof figure

For intuition:

We would be happy if we could condition on the parent \(T\), but \(T\) is not observed. However, we can use \(Z\) instead, making use of the fact that:

  1. \(p(y|x, t, z) = p(y | x, z)\) (since \(Z\) blocks)
  2. \(p(z|x, t) = p(z|t)\) (since \(Z\) is upstream and blocked by parents, \(T\))

1.1.13 Guidelines

1.1.14 Example

Consider for example this data.

  • You randomly pair offerers and receivers in a dictator game (in which offerers decide how much of $1 to give to receivers).
  • Your population comes from two groups (80% Baganda and 20% Banyankole) so in randomly assigning partners you are randomly determining whether a partner is a coethnic or not.
  • You find that in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.

Should you believe it?

1.1.15 Covariate Adjustment

  • Population: randomly matched Baganda (80% of pop) and Banyankole (20% of pop)
  • You find: in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.
  • But a closer look at the data reveals…
                       To: Baganda   To: Banyankole
Offers by Baganda          64%             16%
Offers by Banyankole       16%              4%

Figure 1: Number of Games

                       To: Baganda   To: Banyankole
Offers by Baganda          50              50
Offers by Banyankole       20              20

Figure 2: Average Offers

So that’s a problem
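A minimal sketch reconstructing data like this (assuming offers depend only on the offerer's group):

library(dplyr)

set.seed(1)
games <- tibble(
  offerer  = sample(c("Baganda", "Banyankole"), 5000, replace = TRUE, prob = c(.8, .2)),
  receiver = sample(c("Baganda", "Banyankole"), 5000, replace = TRUE, prob = c(.8, .2)),
  coethnic = offerer == receiver,
  offer    = if_else(offerer == "Baganda", 50, 20))

# An apparent coethnicity effect...
games |> group_by(coethnic) |> summarize(mean(offer))            # roughly 35 vs 48

# ...that vanishes within offerer groups
games |> group_by(offerer, coethnic) |> summarize(mean(offer))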

1.1.16 Covariate Adjustment

Control?

  • With such data you might be tempted to ‘control’ for the covariate (here: ethnic group), using regression.
  • But, perhaps surprisingly, it turns out that regression with covariates does not estimate average treatment effects.
  • It does estimate an average of treatment effects, but one with weights chosen to minimize variance, which need not correspond to your estimand.

1.1.17 Covariate Adjustment

Compare:

  • \(\hat{\tau}_{ATE} =\sum_{x} \frac{w_x}{\sum_{j}w_{j}}\hat{\tau}_x\)
  • \(\hat{\tau}_{OLS} =\sum_{x} \frac{w_xp_x(1-p_x)}{\sum_{j}w_j{p_j(1-p_j)}}\hat{\tau}_x\)

Instead you can use the formula above for \(\hat{\tau}_{ATE}\) to estimate the ATE
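Here is a quick sketch with simulated data showing that OLS with an additive control recovers the variance-weighted average, not the ATE:

library(dplyr)
library(estimatr)

set.seed(1)
df1 <- tibble(
  X = rep(0:1, each = 500),
  Z = rbinom(1000, 1, .2 + .3*X),   # assignment propensity differs by stratum
  Y = rnorm(1000) + Z*X)            # stratum effects are 0 and 1, so ATE = .5

strata <- df1 |>
  group_by(X) |>
  summarize(tau = mean(Y[Z == 1]) - mean(Y[Z == 0]), w = n(), p = mean(Z))

weighted.mean(strata$tau, strata$w)                              # ATE weighting: near .5
weighted.mean(strata$tau, strata$w * strata$p * (1 - strata$p))  # OLS weighting
lm_robust(Y ~ Z + X, data = df1)$coefficients[["Z"]]             # matches the OLS weighting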

alternatively…

1.1.18 Covariate adjustment via saturated regression

  • Alternatively you can use propensity weights.
  • Alternatively you can use a regression that includes both the treatment and the treatment interacted with the covariates.
    • In practice this is best done by demeaning the covariates; doing this lets you read off the average effect from the main term. Key resource: Lin (2012)

You should have noticed that the logic for controlling for a covariate here is equivalent to the logic we saw for heterogeneous assignment propensities. These are really the same thing.

1.1.19 Covariate adjustment via saturated regression

Returning to prior example:

library(DeclareDesign) # also attaches fabricatr, randomizr, and estimatr
library(tidyverse)
library(knitr)

df <- fabricatr::fabricate(
  N = 500, 
  X = rep(0:1, N/2), 
  Z = rbinom(N, 1, .2 + .3*X),
  Y = rnorm(N) + Z*X)

lm_robust(Y ~ Z*X_c, data = df |> mutate(X_c = X - mean(X))) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.05 0.06 -0.89 0.37 -0.17 0.06 496 Y
Z 0.47 0.11 4.32 0.00 0.26 0.68 496 Y
X_c 0.08 0.12 0.65 0.51 -0.15 0.30 496 Y
Z:X_c 0.85 0.22 3.91 0.00 0.42 1.28 496 Y

1.1.20 Covariate adjustment via saturated regression

Returning to prior example:

lm_lin(Y ~ Z, ~ X, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.05 0.06 -0.89 0.37 -0.17 0.06 496 Y
Z 0.47 0.11 4.32 0.00 0.26 0.68 496 Y
X_c 0.08 0.12 0.65 0.51 -0.15 0.30 496 Y
Z:X_c 0.85 0.22 3.91 0.00 0.42 1.28 496 Y

1.1.21 Demeaning and saturating

Demeaning interactions

  • Say you have a factorial design with treatments X1 and X2 (or observational data with two covariates)
  • You analyse with a model that has main terms and interaction terms
  • Interpreting coefficients can be confusing, but sometimes demeaning can help. What does demeaning do?

1.1.22 Demeaning and saturating

Let’s:

  • Declare a factorial design in which Y is generated according to

f_Y <- function(X1, X2, u) .1 + .2*X1 + .3*X2 + u*X1*X2

where u is distributed \(U[0,1]\).

  • Specify estimands carefully
  • Run analyses in which we do and do not demean the treatments; compare and explain results

1.1.23 Demeaning interactions

f_Y <- function(X1, X2, u) .1 + .2*X1 + .3*X2 + u*X1*X2

design <-
  declare_model(N = 1000, u = runif(N),
                X1 = complete_ra(N), X2 = block_ra(blocks = X1),
                X1_demeaned = X1 - mean(X1),
                X2_demeaned = X2 - mean(X2),
                Y = f_Y(X1, X2, u)) +
  declare_inquiry(
    base = mean(f_Y(0, 0, u)),
    average = mean(f_Y(0, 0, u) + f_Y(0, 1, u)  + f_Y(1, 0, u)  + f_Y(1, 1, u))/4,
    CATE_X1_given_0 = mean(f_Y(1, 0, u) - f_Y(0, 0, u)),
    CATE_X2_given_0 = mean(f_Y(0, 1, u) - f_Y(0, 0, u)),
    ATE_X1 = mean(f_Y(1, X2, u) - f_Y(0, X2, u)),
    ATE_X2 = mean(f_Y(X1, 1, u) - f_Y(X1, 0, u)),
    I_X1_X2 = mean((f_Y(1, 1, u) - f_Y(0, 1, u)) - (f_Y(1, 0, u) - f_Y(0, 0, u)))
  ) +
  declare_estimator(Y ~ X1*X2, 
                    inquiry = c("base", "CATE_X1_given_0", "CATE_X2_given_0", "I_X1_X2"), 
                    term = c("(Intercept)", "X1", "X2", "X1:X2"),
                    label = "natural") +
  declare_estimator(Y ~ X1_demeaned*X2_demeaned, 
                    inquiry = c("average", "ATE_X1", "ATE_X2", "I_X1_X2"), 
                    term = c("(Intercept)", "X1_demeaned", "X2_demeaned", "X1_demeaned:X2_demeaned"),
                    label = "demeaned")

1.1.24 Demeaning interactions: Solution

Estimator Inquiry Term Mean Estimand Mean Estimate
demeaned ATE_X1 X1_demeaned 0.45 0.45
demeaned ATE_X2 X2_demeaned 0.55 0.55
demeaned I_X1_X2 X1_demeaned:X2_demeaned 0.50 0.50
demeaned average (Intercept) 0.48 0.48
natural CATE_X1_given_0 X1 0.20 0.20
natural CATE_X2_given_0 X2 0.30 0.30
natural I_X1_X2 X1:X2 0.50 0.50
natural base (Intercept) 0.10 0.10

It’s all good. But you need to match the estimator to the inquiry: demean for average marginal effects; do not demean for conditional marginal effects.

1.1.25 Recap

If you have different groups with different assignment propensities you can do any or all of these (sketched below):

  1. Blocked differences in means
  2. Inverse propensity weighting
  3. Saturated regression (Lin)
  4. More… (coming)

You cannot (reliably):

  1. Ignore the groups
  2. Include them in a regression (without interactions)
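As a minimal sketch, here are the first three strategies applied to the df simulated above (the propensities, .2 + .3X, are known by design):

library(dplyr)
library(estimatr)

# 1. Blocked difference in means
df |>
  group_by(X) |>
  summarize(tau = mean(Y[Z == 1]) - mean(Y[Z == 0]), n = n()) |>
  summarize(ATE = weighted.mean(tau, n))

# 2. Inverse propensity weighting, using the known propensities
df_w <- df |> mutate(p = .2 + .3*X, ipw = Z/p + (1 - Z)/(1 - p))
lm_robust(Y ~ Z, weights = ipw, data = df_w)

# 3. Saturated (Lin) regression
lm_lin(Y ~ Z, ~ X, data = df)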

1.2 To control or not control

Even “good” controls might not pay their way in practice: when (and how) does controlling for covariates improve things, and when does it make things worse?

1.2.1 Considerations

  • Even though randomization ensures no bias, you may sometimes want to “control” for covariates in order to improve efficiency (see the discussion of blocking above).
  • Or you may have to take account of the fact that the assignment to treatment is correlated with a covariate (as above).
  • In observational work you might also figure out that you have to control for a covariate to justify inferences (refer to our discussion of the backdoor criterion)

1.2.2 Observational work

  • Observational motivation: Controls can provide grounds for identification
  • But recall – they can also destroy identification

For a great walk through of what you can draw from graphical models for the decision to control see:

A Crash Course in Good and Bad Controls by Cinelli, Forney, and Pearl (2022)

Aside: these implications generally refer to using controls as covariates, e.g. by implementing blocked differences in means or similar. For a Bayesian model of the form used in CausalQueries the information from “bad controls” is used wisely.

1.2.3 Experimental work

  • Conditional Bias and Precision Gains from Controls

  • Experimental motivation: Controls can reduce noise and improve precision. This is an argument for using variables that are correlated with the outcome (not with the treatment).

1.2.4 Precision Gains from Controls

However: Introducing controls can create complications

  • As argued by Freedman (summary from Lin (2012)), we can get: “worsened asymptotic precision, invalid measures of precision, and small-sample bias”\(^*\)

  • These adverse effects are essentially removed with an interacted model

  • See discussions in Imbens and Rubin (2015) (7.6, 7.7) and especially Theorem 7.2 for the asymptotic variance of the estimator

\(^*\) though note that the precision concern does not hold when treatment and control groups are equally sized

1.2.5 To control or not

We will illustrate by comparing:

  • DIM
  • OLS (linear controls)
  • Lin (saturated)

With:

  • Varying fraction assigned to treatment
  • Varying relation between \(Y(0)\) and \(Y(1)\)

1.2.6 Declaration (from RDSS)

# https://book.declaredesign.org/library/experimental-causal.html

prob <- 0.5
control_slope <- -1

declaration_18.3 <-
  declare_model(N = 100, X = runif(N, 0, 1),
                U = rnorm(N, sd = 0.1),
                Y_Z_1 = 1*X + U, Y_Z_0 = control_slope*X + U
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N = N, prob = prob)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE", label = "DIM") +
  declare_estimator(Y ~ Z + X, .method = lm_robust, inquiry = "ATE", label = "OLS") +
  declare_estimator(Y ~ Z, covariates = ~X, .method = lm_lin, 
                    inquiry = "ATE", label = "Lin")

1.2.7 Implied potential outcomes

The variances and covariance of potential outcomes depend on the slope parameter

1.2.8 To control or not

Diagnosis:

declaration_18.3 |> 
  redesign(
    control_slope = seq(-1, 1, 0.5), 
    prob = seq(0.1, 0.9, 0.1)) |> 
  diagnose_designs()

1.2.9 To control or not

  • Lin always does well
  • OLS does fine with equal probability assignments
  • Otherwise the ranking of DIM and OLS is design dependent

1.2.10 Conditional Bias and Precision Gains from Controls

  • Treatment correlated with covariates can induce “conditional bias.”
  • Including controls that are correlated with treatment can introduce inefficiencies
  • Including controls can change your estimates so be sure not to fish!

1.2.11 Declaration

a <- 0
b <- 0

design <- 
  declare_model(N = 100,
                X = rnorm(N),
                Z = complete_ra(N),
                Y_Z_0 = a*X + rnorm(N),
                Y_Z_1 = a*X + correlate(given = X, rho = b, rnorm) + 1,
                Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, covariates = ~X, .method = lm_lin, label = "Lin") +
  declare_estimator(Y ~ Z, label = "No controls") +
  declare_estimator(Z ~ X, label = "Condition")

The design implements estimation controlling and not controlling for \(X\) and also keeps track of the results of a test for the relation between \(Z\) and \(X\).

1.2.12 Simulations

We run many simulations over a range of designs:

simulations <- 
  list(design |> redesign(a = 0, b = 0),  design |> redesign(a = 1, b = 0),  design |> redesign(a = 0, b = 1)) |>
  simulate_design(sims = 20000) 

1.2.13 Standard errors

We see that standard errors are larger when you control for a covariate that is not predictive of the outcome but is correlated with the treatment. Otherwise they can be smaller.

See Mutz et al

1.2.14 Errors

We also see “conditional bias” when we do not control: where the distribution of errors depends on the correlation with the covariate.


1.2.15 Estimands

Puzzle: Does the sample average treatment effect on the treated depend on the covariate balance?

1.3 Doubly robust estimation

1.3.1 Doubly robust estimation

Doubly robust estimation combines:

  1. A model for how the covariates predict the potential outcomes
  2. A model for how the covariates predict assignment propensities

Using both together, estimating potential outcomes with propensity weighting, lets you do well even if one (but not both) of the models is wrong.

Each part can be done using nonparametric methods, resulting in an overall semiparametric procedure.

  • \(\pi(X) = \Pr(Z=1|X)\): Estimate \(\hat\pi\)
  • \(Y_z = \mathbb{E}[Y|Z=z, X]\): Estimate \(\hat{Y}_z\)
  • Estimate of causal effect: \(\frac{1}{n}\sum_{i=1}^n\left(\frac{Z_i}{\hat{\pi}_i}\left(Y_i - \hat{Y}_{i1}\right) - \frac{1-Z_i}{1-\hat{\pi}_i}\left(Y_i - \hat{Y}_{i0}\right) + \left(\hat{Y}_{i1} - \hat{Y}_{i0}\right) \right)\)

1.3.2 Doubly robust estimation

  • Estimate of causal effect: \(\frac{1}{n}\sum_{i=1}^n\left(\frac{Z_i}{\hat{\pi}_i}\left(Y_i - \hat{Y}_{i1}\right) - \frac{1-Z_i}{1-\hat{\pi}_i}\left(Y_i - \hat{Y}_{i0}\right) + \left(\hat{Y}_{i1} - \hat{Y}_{i0}\right) \right)\)

  • Note that if the \(\hat{Y}_{iz}\) are correct then, in expectation, the first parts drop out and we get the right answer.

  • So if you can impute the potential outcomes, you are good (though this is hardly surprising)

1.3.3 Doubly robust estimation

  • More subtly, say the \(\hat{\pi}\)s are correct but your imputations are wrong; then we again have an unbiased estimator.

To see this, imagine that with probability \(\pi\) we assign unit 1 to treatment and unit 2 to control (otherwise unit 1 to control and unit 2 to treatment).

Then our expected estimate is:

\[\frac12\left[\pi\left(\frac{1}{\pi}\left(Y_{11} - \hat{Y}_{11}\right) - \frac{1}{\pi}\left(Y_{20} - \hat{Y}_{20}\right)\right) + (1-\pi)\left(\frac{1}{1-\pi}\left(Y_{21} - \hat{Y}_{21}\right) - \frac{1}{1-\pi}\left(Y_{10} - \hat{Y}_{10}\right)\right) + \left(\hat{Y}_{11} - \hat{Y}_{10}\right) + \left(\hat{Y}_{21} - \hat{Y}_{20}\right)\right]\]

The \(\pi\)s cancel:

\[\frac12\left[\left(Y_{11} - \hat{Y}_{11}\right) - \left(Y_{20} - \hat{Y}_{20}\right) + \left(Y_{21} - \hat{Y}_{21}\right) - \left(Y_{10} - \hat{Y}_{10}\right) + \left(\hat{Y}_{11} - \hat{Y}_{10}\right) + \left(\hat{Y}_{21} - \hat{Y}_{20}\right)\right]\]

and the imputed values then cancel in turn, leaving:

\[\frac12\left(Y_{11} - Y_{10} + Y_{21}- Y_{20}\right)\]

Robins, Rotnitzky, and Zhao (1994)

1.3.4 Doubly robust estimation illustration

Consider this data (with confounding):

# df with true treatment effect of 1 
# (0.5 if race = 0; 1.5 if race = 1)

df <- fabricatr::fabricate(
  N = 5000,
  class = sample(1:3, N, replace = TRUE),
  race = rbinom(N, 1, .5),
  Z = rbinom(N, 1, .2 + .3*race),
  Y = .5*Z + race*Z + class + rnorm(N),
  qsmk = factor(Z),
  class = factor(class),
  race = factor(race)
)

1.3.5 Simple approaches

Naive regression produces biased estimates, even with controls. Lin regression, however, recovers the right answer.

# Naive
lm_robust(Y ~ Z, data = df)$coefficients[["Z"]]
[1] 1.229234
# OLS with controls
lm_robust(Y ~ Z + class + race, data = df)$coefficients[["Z"]]
[1] 1.139115
# Lin
lm_lin(Y ~ Z,  ~ class + race, data = df)$coefficients[["Z"]]
[1] 1.017271
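As a sketch, the doubly robust formula from above can also be computed by hand on the same data, with a logit model for the propensities and linear models for the outcomes:

# Propensity model and outcome models
pi_hat <- glm(Z ~ race + class, family = binomial, data = df)$fitted.values
Y1_hat <- predict(lm(Y ~ race + class, data = df, subset = Z == 1), newdata = df)
Y0_hat <- predict(lm(Y ~ race + class, data = df, subset = Z == 0), newdata = df)

# AIPW estimate
with(df, mean(Z/pi_hat * (Y - Y1_hat) -
              (1 - Z)/(1 - pi_hat) * (Y - Y0_hat) +
              (Y1_hat - Y0_hat)))   # close to the true effect of 1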

1.3.6 Doubly robust estimation

drtmle is an R package that uses doubly robust estimation to compute “marginal means of an outcome under fixed levels of a treatment.”

library(SuperLearner)
library(drtmle)
drtmle_fit <- drtmle(
  W = df |> select(race, class), 
  A = df$Z, 
  Y = df$Y, 
  SL_Q = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_g = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_Qr = "SL.glm",
  SL_gr = "SL.glm", 
  maxIter = 1
)

1.3.7 Doubly robust estimation

# "Marginal means"
drtmle_fit$drtmle$est
[1] 2.990577 1.972552
# Effects
ci(drtmle_fit, contrast = c(-1,1))
$drtmle
                   est    cil    ciu
E[Y(0)]-E[Y(1)] -1.018 -1.082 -0.954
wald_test(drtmle_fit, contrast = c(-1,1))
$drtmle
                       zstat pval
H0:E[Y(0)]-E[Y(1)]=0 -31.203    0

Resource: https://muse.jhu.edu/article/883477

1.3.8 Assessing performance

Challenge: Use DeclareDesign to compare performance of drtmle and lm_lin

1.4 Matching

1.4.1 Core idea

The matching idea can be seen as an application of the backdoor criterion:

Seek features \(X\) that block backdoor paths from \(Z\) to \(Y\) and estimate:

\[\sum_x p(x) \left(\mathbb E[Y | Z = 1, X = x] - \mathbb E[Y | Z = 0, X = x]\right)\]

This, fundamentally, is a blocked difference in means. The intuitions from the discussion of Lin estimation apply here.

1.4.2 ATT Wrinkle

In practice matching differs from simply controlling for covariates (with interactions) insofar as matches are (often) sought specifically for treated units.

Thus for example in data like this:

Z X Y
0 1 1
0 2 2
1 1 3
1 1 4

we would not make use of the second observation. Indeed it is not even part of our estimand.

1.4.3 ATT Wrinkle

More formally, if \(M_i\) is the set of matches for treated unit \(i\) (of \(n_1\) treated units), our estimand is:

\[\tau_{ATT} = \frac{1}{n_1} \sum_{i: Z_i = 1}\left(Y_i(1) - Y_i(0)\right) \]

and we estimate:

\[\hat\tau_{ATT} =\frac{1}{n_1} \sum_{i: Z_i = 1}\left(Y_i - \frac{1}{|M_i|}\sum_{j \in M_i}Y_j\right) \]
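A minimal sketch of this estimator, by hand, for the four-row table above (exact matches on \(X\)):

d <- data.frame(Z = c(0, 0, 1, 1), X = c(1, 2, 1, 1), Y = c(1, 2, 3, 4))

tau_i <- sapply(which(d$Z == 1), function(i) {
  M_i <- d$Z == 0 & d$X == d$X[i]   # exact matches on X for treated unit i
  d$Y[i] - mean(d$Y[M_i])
})
mean(tau_i)   # ((3 - 1) + (4 - 1))/2 = 2.5; the control row with X = 2 is never used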

1.4.4 ATE?

In principle however one could maintain a focus on ATE and find matches both for treated and for control units.

The commitment to the ATT then is not cooked in. See Stuart and Green (2008)

Example below.

1.4.5 Variations

  • Exact matching. Great if possible: use as matches units that exactly agree on all covariates
  • Coarsened exact matching. Transparent and easy: use as matches units that exactly agree on all coarsened covariates (i.e. nearly agree on covariates)
  • Distance matching e.g. Mahalanobis, nearest neighbor: use as matches units that are close in the multidimensional covariate space
  • Propensity matching. Clever but more model dependent: use as matches units that, based on covariates, have similar estimated probabilities of being in treatment

Exact matching illustration

exact_matching <- 
  function(data) { 
    matched <- MatchIt::matchit(Z ~ X, method = "exact", data = data) 
    MatchIt::match.data(matched) 
  }

design <-
  declare_model(
    N = 20, 
    U = rnorm(N), 
    X = sample(c(.2, .4, .6, .8), N, replace = TRUE),
    Z = rbinom(N, 1, prob = X),
    Y_Z_0 = 0.2 * X + U,
    Y_Z_1 = Y_Z_0 + 0.5,
    Y = reveal_outcomes(Y~Z)
  ) + 
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_step(handler = exact_matching) +
  declare_estimator(Y ~ Z, weights = weights, label = "Matched")

1.4.6 Investigate

set.seed(1000)
data <- draw_data(design)

dim(data)
[1] 15  9
data |> group_by(X, Z) |> summarize(Y = mean(Y), n = n()) |> kable()
X Z Y n
0.2 0 -0.1203946 2
0.2 1 -0.7960410 1
0.4 0 -0.4513312 5
0.4 1 1.2997507 1
0.6 0 -0.1495231 4
0.6 1 -0.0433644 2

1.4.7 Investigate

get_estimates(design,  data) |> kable()
estimator term estimate std.error statistic p.value conf.low conf.high df outcome
Matched Z 0.3219382 0.5401371 0.5960305 0.5613898 -0.8449571 1.488833 13 Y

By hand

sets <- 
  data |> group_by(X, Z) |> summarize(Y = mean(Y), n = n()) |>
  group_by(X) |> summarize(diff = Y[2] - Y[1], n = n[2])

sets |> kable()
X diff n
0.2 -0.6756465 1
0.4 1.7510819 1
0.6 0.1061586 2
weighted.mean(sets$diff, sets$n)
[1] 0.3219382

Estimand considerations

Here there is no actual confounding and there are efficiency losses from matching. Why?

design <-
  declare_model(
    N = 20, 
    U = rnorm(N), 
    X = sample(c(.2, .4, .6, .8), N, replace = TRUE),
    Z = rbinom(N, 1, prob = .5),
    Y_Z_0 = 0.2 * X + U,
    Y_Z_1 = Y_Z_0 + 0.5,
    Y = reveal_outcomes(Y~Z)
  ) + 
  declare_inquiry(
    ATE = mean(Y_Z_1 - Y_Z_0),
    ATT = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "Unmatched", inquiry = "ATE") +
  declare_step(handler = exact_matching) +
  declare_estimator(Y ~ Z, weights = weights, label = "Matched", inquiry = "ATT")

diagnose_design(design) |> reshape_diagnosis() |> select(Estimator, RMSE) |> kable()
Estimator RMSE
Unmatched 0.46
(0.01)
Matched 0.56
(0.02)

1.4.8 Puzzle

Say that in fact \(X\) does not affect \(Z\); then is there anything at stake in going from the ATE to the ATT?

1.4.9 For real

The MatchIt package (Ho et al. 2011) provides multiple methods and diagnostic tools to implement different strategies.

The example of full propensity matching below is from the package vignettes

Steps:

  • Choose covariates (close the backdoor)
  • Choose matching method
  • Check matches
  • Maybe revise method
  • Implement

1.4.10 For real

The “Lalonde” data has been used by many researchers to assess the merits of different strategies; here the question is the effect of job training on earnings. We have before and after earnings data and various background characteristics.

The basic analysis would yield:

library(MatchIt) # provides the lalonde data and matching functions

lm_robust(re78 ~ treat, data = lalonde) |> tidy() |> kable()
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) 6984.1697 352.1654 19.8320697 0.0000000 6292.570 7675.7691 612 re78
treat -635.0262 677.1954 -0.9377297 0.3487532 -1964.935 694.8824 612 re78

1.4.11 For real

But there are clear concerns regarding confounding:

lm_robust(treat ~ age + educ + race + married + nodegree + re74 +  re75, data = lalonde) |> 
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) 0.34 0.13 2.67 0.01 0.09 0.59 605 treat
age 0.00 0.00 1.34 0.18 0.00 0.01 605 treat
educ 0.02 0.01 2.62 0.01 0.01 0.04 605 treat
racehispan -0.44 0.05 -8.10 0.00 -0.55 -0.34 605 treat
racewhite -0.53 0.04 -13.95 0.00 -0.60 -0.45 605 treat
married -0.09 0.04 -2.60 0.01 -0.16 -0.02 605 treat
nodegree 0.10 0.04 2.20 0.03 0.01 0.19 605 treat
re74 0.00 0.00 -2.26 0.02 0.00 0.00 605 treat
re75 0.00 0.00 0.75 0.46 0.00 0.00 605 treat

1.4.12 For real: Match

Here we match using a probit model to predict assignment probabilities:

pm <- 
  matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
          data = lalonde,
          method = "full",
          distance = "glm",
          link = "probit")
pm_data <- MatchIt::match.data(pm) 

pm_data |> arrange(subclass) |> head() |> kable(digits = 2)
treat age educ race married nodegree re74 re75 re78 distance weights subclass
NSW1 1 37 11 black 1 1 0.00 0.00 9930.05 0.64 1.00 1
PSID69 0 30 17 black 0 0 17827.37 5546.42 14421.13 0.63 2.32 1
NSW10 1 33 12 white 1 0 0.00 0.00 12418.07 0.04 1.00 2
PSID13 0 34 8 white 1 1 8038.87 11404.35 5486.80 0.04 0.06 2
PSID16 0 42 0 hispan 1 1 2797.83 10929.92 9922.93 0.05 0.06 2
PSID26 0 27 7 white 1 1 3064.29 8461.07 11149.45 0.04 0.06 2

“matching as pre-processing”

1.4.13 For real: Inspect

plot(summary(pm))

Huge gains in balance

1.4.14 For real: Estimate

lm(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75),
          data = pm_data, weights = weights) |>
  
  marginaleffects::avg_comparisons(
    variables = "treat",
    vcov = ~subclass, 
    newdata = subset(treat == 1))

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     1977        704 2.81  0.00501 7.6   596   3357

Term: treat
Type:  response 
Comparison: 1 - 0
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 

1.4.15 Intuition: What are these weights?

pm_data <- pm_data |> 
     mutate(
       propensity = glm(
         treat ~ age + educ + race + married + nodegree + re74 + re75,
         data = pm_data, family = binomial(link = "probit"))$fitted.values)

pm_data |> 
  ggplot(aes(propensity, weights, color = subclass, shape = factor(treat))) +  geom_point() +
  scale_y_continuous(trans = "log") + theme(legend.position = "none")

Weights on treated units equal 1. If the propensity is low, treated units are weighted more than control units.

1.4.16 Intuition: Comparison with IPW (by hand)

lalonde <- lalonde |> 
  mutate(
    propensity = glm(
      treat ~ age + educ + race + married + nodegree + re74 + re75,
      data = lalonde, family = binomial(link = "probit"))$fitted.values,
    ipw = 1/(treat*propensity + (1-treat)*(1-propensity)))

lm_robust(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75 + ipw),
   data = lalonde |> filter(ipw <= 20), 
   weights = ipw) |>
  
  marginaleffects::avg_comparisons(
    variables = "treat",
    newdata = subset(treat == 1))

1.5 Diff in diff

Key idea: the evolution of units in the control group allows you to impute what the evolution of units in the treatment group would have been had they not been treated

1.5.1 Logic

We have group \(A\) that enters treatment at some point and group \(B\) that never does

The estimate:

\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in \(A\) compared to the change in \(B\)?)

can be written:

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

1.5.2 Logic

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

Adding and subtracting \(\mathbb{E}[Y_0^A | post]\):

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | post]) + ((\mathbb{E}[Y_0^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre]))\]

Cleaning up, in expectation:

\[\mathbb{E}[\hat\tau] = \tau_{ATT} + \text{Difference in trends}\]

1.5.3 Simplest case

n_units <- 2
design <- 
  declare_model(
    unit = add_level(N = n_units, I = 1:N),
    time = add_level(N = 6, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(potential_outcomes(Y ~ I + T^.5 + Z*T)) +
  declare_assignment(Z = 1*(I>(n_units/2))*(T>3)) +
  declare_measurement(Y = reveal_outcomes(Y~Z)) + 
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0),
                  ATT = mean(Y_Z_1[Z==1] - Y_Z_0[Z==1])) +
  declare_estimator(Y ~ Z, label = "naive") + 
  declare_estimator(Y ~ Z + I, label = "FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")  

1.5.4 Diagnosis

Here only the two-way fixed effects estimator is unbiased, and only for the ATT.

The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!

design |> diagnose_design() 
Inquiry Estimator Bias
ATE FE1 2.25
ATE FE2 6.50
ATE FE3 1.50
ATE naive 5.40
ATT FE1 0.75
ATT FE2 5.00
ATT FE3 0.00
ATT naive 3.90

1.5.5 The classic graph

design |> 
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

1.5.6 Extends to multiple units easily

design |> redesign(n_units = 10) |> diagnose_design() 
Inquiry Estimator Bias
ATE FE1 2.25
ATE FE2 6.50
ATE FE3 1.50
ATE naive 5.40
ATT FE1 0.75
ATT FE2 5.00
ATT FE3 0.00
ATT naive 3.90

1.5.7 Extends to multiple units easily

design |> 
  redesign(n_units = 10) |>  
  draw_data() |> 
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

1.5.8 In practice

  • Need to defend parallel trends
  • Most typically using an event study: predicting trends in a unit based on its history
  • Sometimes: report balance between treatment and control groups in covariates
  • Placebo leads and lags

1.5.9 Heterogeneity

  • Things get much more complicated when there is

    1. heterogeneous timing in treatment take up and
    2. heterogeneous effects
  • It’s only recently been appreciated how tricky things can get

  • But we already have an intuition from our analysis of trials with heterogeneous assignment and heterogeneous effects:

  • In such cases fixed effects analysis weights stratum-level treatment effects by the variance of assignment to treatment

  • Something similar happens here

1.5.10 Staggered assignments

Just two units assigned at different times:

trend <- 0

design <- 
  declare_model(
    unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
    time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(
    potential_outcomes(Y ~ trend*T + (1+Z)*(I == 2))) +
  declare_assignment(Z = 1*((I == 1) * (T>3) + (I == 2) * (T>5))) +
  declare_measurement(Y = reveal_outcomes(Y~Z), 
                      I_c = I - mean(I)) +
  declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "1. naive") + 
  declare_estimator(Y ~ Z + I, label = "2. FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "4. TWFE") + 
  declare_estimator(Y ~ Z*I_c + as.factor(T), label = "5. Sat")  

1.5.11 Staggered assignments diagnosis

draw_data(design) |> mutate(I = factor(I)) |>
  ggplot(aes(T, Y, color = I, label = Z)) + geom_text()

1.5.12 Staggered assignments diagnosis

Estimator Mean Estimand Mean Estimate
1. naive 0.50 -0.12
(0.00) (0.00)
2. FE1 0.50 0.36
(0.00) (0.00)
3. FE2 0.50 -1.00
(0.00) (0.00)
4. TWFE 0.50 0.25
(0.00) (0.00)
5. Sat 0.50 0.50
(0.00) (0.00)

1.5.13 Where do these numbers come from?

  • The estimand is .5 – this comes from weighting the effect for unit 1 (0) and the effect for unit 2 (1) equally

  • The naive estimate is wildly off because it does not take into account that units with different treatment shares have different average levels in outcomes

1.5.14 Where do these numbers come from?

  • The estimate when we control for unit is 0.36: this comes from weighting the unit-stratum level effects according to the variance of assignment to each stratum:
design |> draw_data() |> group_by(unit) |> summarize(var = mean(Z)*(1-mean(Z))) |>
  mutate(weight = var/sum(var)) |> kable(digits = 2)
unit var weight
1 0.25 0.64
2 0.14 0.36

1.5.15 Where do these numbers come from?

  • The estimate when we control for time is -1: this comes from weighting the time-stratum level effects according to the variance of assignment within each stratum
  • Only periods 4 and 5 get (equal) weight, yielding the difference in outcomes between unit 1 in treatment (0) and unit 2 in control (1)
design |> draw_data()  |> group_by(time) |> summarize(var = mean(Z)*(1-mean(Z))) |>
  mutate(weight = var/sum(var)) |> kable(digits = 2)
time var weight
1 0.00 0.0
2 0.00 0.0
3 0.00 0.0
4 0.25 0.5
5 0.25 0.5
6 0.00 0.0

1.5.16 Where do these numbers come from?

  • The estimate when we control for time and unit is 0.25

  • This is actually a lot harder to interpret:

  • We can figure out what it is from the “Goodman-Bacon decomposition” in Goodman-Bacon (2021)

1.5.17 Where do these numbers come from?

1.5.18 Where do these numbers come from?

In this case we can think of our data having the following structure:

y1 y2
pre 0 1
mid 0 1
post 0 2
  • the mid to pre diff in diff is (0-0) - (1-1) = 0 (group 2 serves as control)
  • the post to mid diff in diff is (2-1) - (0-0) = 1 (group 1 serves as control though already in treatment!)

TWFE gives a weighted average of these two, putting a 3/4 weight on the first and a 1/4 weight on the second

1.5.19 Two way fixed effects

Specifically:

\[\hat\beta^{DD} = \mu_{12}\hat\beta^{2 \times 2, 1}_{12} + (1-\mu_{12})\hat\beta^{2 \times 2, 2}_{12}\]

where \(\mu_{12} = \frac{1-\overline{Z_1}}{1-(\overline{Z_1}-\overline{Z_2})} = \frac{1-\frac36}{1-\left(\frac36-\frac16\right)} = \frac34\)

\[\frac34\hat\beta^{2 \times 2, 1}_{12} + \frac14 \hat\beta^{2 \times 2, 2}_{12}\] (weights formula from WP version)

1.5.20 Two way fixed effects

And:

  • \(\hat\beta^{2 \times 2, 1}_{12}\) is \((\overline{y}_1^{MID(1,2)} - \overline{y}_1^{PRE(1)}) - (\overline{y}_2^{MID(1,2)} - \overline{y}_2^{PRE(1)})\)

which in the simple example without time trends is \((0 - 0) - (1-1) = 0\)

  • \(\hat\beta^{2 \times 2, 2}_{12}\) is \((\overline{y}_2^{POST(2)} - \overline{y}_2^{MID(1,2)}) - (\overline{y}_1^{POST(2)} - \overline{y}_1^{MID(1,2)})\)

which is \((2 - 1) - (0 - 0) = 1\)
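Putting the pieces together, a quick check of the arithmetic:

Zbar_1 <- 3/6                                    # unit 1 treated in periods 4-6
Zbar_2 <- 1/6                                    # unit 2 treated in period 6 only
mu_12  <- (1 - Zbar_1)/(1 - (Zbar_1 - Zbar_2))   # = 3/4
mu_12 * 0 + (1 - mu_12) * 1                      # = 0.25, the TWFE estimate above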

1.5.21 A problem

So quite complex weighting of different comparisons

  • Units effectively get counted as both treatment and control units for different comparisons
  • Treated units counted as control units
  • Effectively negative weights on some quantities
  • Possible to have very poorly performing estimates

1.5.22 Solutions

  • Involve better specification of estimands
  • Use of comparisons directly relevant for the estimands
  • Imputation of control outcomes in treated units using data from appropriate control units only and then targeting estimands directly (Borusyak, Jaravel, and Spiess 2021)
  • Particular solution: focus on the effect of treatment at the time of first treatment or time of switching; this usually involves a no-carryover assumption (De Chaisemartin and d’Haultfoeuille 2020); see also Imai and Kim (2021)

1.5.23 Solutions

See excellent review by Roth et al. (2023)

knitr::include_graphics("assets/didtools.jpg")

1.5.24 De Chaisemartin and d’Haultfoeuille (2020)

library(DIDmultiplegt)
library(rdss)
design <- 
  declare_model(
    unit = add_level(N = 4, I = 1:N),
    time = add_level(N = 8, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time),
                       potential_outcomes(Y ~ T + (1 + Z)*I))) +
  declare_assignment(Z = 1*(T > (I + 4))) +
  declare_measurement(
    Y = reveal_outcomes(Y~Z),
    Z_lag = lag_by_group(Z, groups = unit, n = 1, order_by = T)) +
  declare_inquiry(
    ATT_switchers = mean(Y_Z_1 - Y_Z_0), 
    subset = Z == 1 & Z_lag == 0 & !is.na(Z_lag)) +
  declare_estimator(
    Y = "Y",  G = "unit",  T = "T",  D = "Z", mode = "old", 
    handler = label_estimator(rdss::did_multiplegt_tidy),
    inquiry = c("ATT_switchers"),
    label = "chaisemartin"
  ) 

1.5.25 De Chaisemartin and d’Haultfoeuille (2020)

Note the inquiry

run_design(design) |> kable(digits = 2)
inquiry estimand estimator estimate
ATT_switchers 2 chaisemartin 2

1.5.26 Triple differencing

  • A response to concerns that double differencing is not enough is to triple difference

  • This is useful when you think that there may be a violation of parallel trends but you have other outcomes that would pick up the same difference in trends

  • See Olden and Møen (2022)

1.5.27 Triple differencing

You are interested in the effects of influx of refugees on right wing voting

  • You have (say more conservative) states with no refugees at any period
  • You have (say more liberal) states with refugees post 2016 only

You want to do differences in differences comparing these states before and after

  • However you worry that things change differentially in the conservative and liberal states: no parallel trends

  • But you can identify areas within states that are more or less likely to be exposed and compare differences in differences across the exposed and unexposed groups.

1.5.28 Triple differencing

So:

  • Two types of states: \(L \in \{0,1\}\), only \(L=1\) types get refugee influx
  • Two time periods: \(Post \in \{0,1\}\), refugee influx occurs in period \(Post = 1\)
  • Two groups: \(B \in \{0,1\}\), \(B=1\) types affected by refugee influx

\[Y = \beta_0 + \beta_1 L + \beta_2 B + \beta_3 Post + \beta_4 LB + \beta_5 L Post + \beta_6 B Post + \beta_7L B Post + \epsilon\]

1.5.29 Triple differencing

\[\frac{\partial ^3Y}{\partial L \partial B \partial Post} = \beta_7\]
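A minimal sketch (with an assumed data-generating process) in which a differential trend in the \(L=1\) states breaks the simple diff in diff but the three-way interaction recovers the effect:

library(estimatr)

set.seed(1)
ddd <- expand.grid(L = 0:1, B = 0:1, Post = 0:1, i = 1:250)
ddd$Y <- with(ddd,
  .5*L + .2*B + .3*Post +
  .4*L*Post +        # differential trend in L = 1 states: violates parallel trends
  1*L*B*Post +       # true treatment effect of 1
  rnorm(nrow(ddd)))

lm_robust(Y ~ L*B*Post, data = ddd)$coefficients[["L:B:Post"]]   # approximately 1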

1.5.30 Can we not just condition on the \(B=1\) types?

The level among the \(B=1\) types is:

\[Y = \beta_0 + \beta_1 L + \beta_2 + \beta_3 Post + \beta_4 L + \beta_5 L Post + \beta_6 Post + \beta_7L Post + \epsilon\]

If you did simple before/after differences among the \(B=1\) types you would get

\[\Delta Y| L = 1, B = 1 = \beta_3 + \beta_5 + \beta_6 + \beta_7\] \[\Delta Y| L = 0, B = 1 = \beta_3 + \beta_6\]

1.5.31 Can we not just condition on the \(B=1\) types?

And so if you differenced again you would get:

\[\Delta^2 Y| B = 1 = \beta_5 + \beta_7\]

So the problem is that you have \(\beta_5\) in here, which corresponds exactly to how \(L\) states change over time.

1.5.32 Triple

But we can figure out \(\beta_5\) by doing a diff in diff among the \(B=0\)s.

\[Y|B = 0 = \beta_0 + \beta_1 L + \beta_3 Post + \beta_5 L Post\]

\[\Delta^2 Y| B = 0 = \beta_5\]

1.5.33 Easier to swallow?

  • The identifying assumption is that absent treatment the differences in trends between \(L=0\) and \(L=1\) would be the same for units with \(B=0\) and \(B=1\).

  • See equation 5.4 in Olden and Møen (2022)

\[ \left(E[Y_0|L=1, B=1, {\textit {Post}}=1] - E[Y_0|L=1, B=1, {\textit {Post}}=0]\right) \\ \quad - \ \left(E[Y_0|L=1, B=0, {\textit {Post}}=1] - E[Y_0|L=1, B=0, {\textit {Post}}=0]\right) \\ = \nonumber \\ \left(E[Y_0|L=0, B=1, {\textit {Post}}=1] - E[Y_0|L=0, B=1, {\textit {Post}}=0]\right) \\ \quad - \ \left(E[Y_0|L=0, B=0, {\textit {Post}}=1] - E[Y_0|L=0, B=0, {\textit {Post}}=0]\right)\]

1.5.34 Easier to swallow?

  • In a sense this is one parallel trends assumption, not two

  • But there are four counterfactual quantities in this expression.

Puzzle: Is it possible to have an effect identified by a difference in difference but incorrectly by a triple difference design?

1.6 Noncompliance and the LATE estimand

1.6.1 Local Average Treatment Effects

Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

X=0 X=1
Z=0 \(\overline{y}_{00}\) (\(n_{00}\)) \(\overline{y}_{01}\) (\(n_{01}\))
Z=1 \(\overline{y}_{10}\) (\(n_{10}\)) \(\overline{y}_{11}\) (\(n_{11}\))

Say that people are one of 3 types:

  1. \(n_a\) “always takers” have \(X=1\) no matter what and have average outcome \(\overline{y}_a\)
  2. \(n_n\) “never takers” have \(X=0\) no matter what with outcome \(\overline{y}_n\)
  3. \(n_c\) “compliers” have \(X=Z\) and average outcomes \(\overline{y}^1_c\) if treated and \(\overline{y}^0_c\) if not.

1.6.2 Local Average Treatment Effects

Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

X=0 X=1
Z=0 \(\overline{y}_{00}\) (\(n_{00}\)) \(\overline{y}_{01}\) (\(n_{01}\))
Z=1 \(\overline{y}_{10}\) (\(n_{10}\)) \(\overline{y}_{11}\) (\(n_{11}\))

We can figure out something about the types:

\(X=0\) \(X=1\)
\(Z=0\) \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}\) \(\overline{y}_{a}\)
\(Z=1\) \(\overline{y}_{n}\) \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}\)

1.6.3 Local Average Treatment Effects

You give a medicine to 50% but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

\(X=0\) \(X=1\)
\(Z=0\) \(\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n\) \(\overline{y}_{a}\)
(n) (\(\frac{1}{2}(n_c + n_n)\)) (\(\frac{1}{2}n_a\))
\(Z=1\) \(\overline{y}_{n}\) \(\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}\)
(n) (\(\frac{1}{2}n_n\)) (\(\frac{1}{2}(n_a+n_c)\))

Key insight: the contributions of the \(a\)s and \(n\)s are the same in the \(Z=0\) and \(Z=1\) groups so if you difference you are left with the changes in the contributions of the \(c\)s.

1.6.4 Local Average Treatment Effects

Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)

Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)

So, the difference is the ITT: \(({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\)

Last step:

\[ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\]

\[\leftrightarrow\]

\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]
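A minimal sketch (with assumed types and effects) showing the ratio at work, and its two-stage least squares equivalent:

library(estimatr)

set.seed(1)
N    <- 10000
type <- sample(c("complier", "always", "never"), N, replace = TRUE, prob = c(.5, .2, .3))
dat  <- data.frame(
  Z = rbinom(N, 1, .5))                                # random encouragement
dat$X <- ifelse(type == "always", 1, ifelse(type == "never", 0, dat$Z))
dat$Y <- rnorm(N) + dat$X * (type == "complier")       # effect of 1 for compliers only

ITT         <- with(dat, mean(Y[Z == 1]) - mean(Y[Z == 0]))
first_stage <- with(dat, mean(X[Z == 1]) - mean(X[Z == 0]))
ITT/first_stage                                        # approximately 1

iv_robust(Y ~ X | Z, data = dat)                       # same estimate via 2SLS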

1.6.5 The good and the bad of LATE

  • You get a well-defined estimate even when there is non-random take-up
  • May sometimes be used to assess mediation or knock-on effects
  • But:
    • You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
    • Your estimate is only for a subpopulation
    • The subpopulation is not chosen by you and is unknown
    • Different encouragements may yield different estimates since they may encourage different subgroups

1.6.6 Pearl and Chickering again

With and without an imposition of monotonicity

data("lipids_data")

models <- 
  list(unrestricted =  make_model("Z -> X -> Y; X <-> Y"),
       restricted =  make_model("Z -> X -> Y; X <-> Y") |>
         set_restrictions("X[Z=1] < X[Z=0]")) |> 
  lapply(update_model,  data = lipids_data, refresh = 0) 

models |>
  query_model(query = list(CATE = "Y[X=1] - Y[X=0]", 
                           Nonmonotonic = "X[Z=1] < X[Z=0]"),
              given = list("X[Z=1] > X[Z=0]", TRUE),
              using = "posteriors") 

1.6.7 Pearl and Chickering again

With and without an imposition of monotonicity:

model query mean sd
unrestricted CATE 0.70 0.05
restricted CATE 0.71 0.05
unrestricted Nonmonotonic 0.01 0.01
restricted Nonmonotonic 0.00 0.00

In one case we assume monotonicity; in the other we update on it (easy in this case because of the empirically verifiable nature of one-sided noncompliance)

1.7 Regression discontinuity

Errors and diagnostics

1.7.1 Intuition

  • The core idea of an RDD design is that if a decision rule assigns almost identical units to treatment and control conditions, then we can infer effects for those units by comparing them.

See excellent introduction: Lee and Lemieux (2010)

1.7.2 Intuition

  • Kids born on 31 August start school a year younger than kids born on 1 September: does starting younger help or hurt?

  • Kids born on 12 September 1983 are more likely to register Republican than kids born on 10 September 1983: can this identify the effects of registration on long term voting?

  • A district in which Republicans got 50.1% of the vote gets a Republican representative while districts in which Republicans got 49.9% of the vote do not: does having a Republican representative make a difference for these districts?

1.7.3 Argument for identification

Setting:

  • Typically the decision is based on the value of a “running variable”, \(X\): e.g. treatment if \(X > 0\)
  • The estimand is \(\mathbb{E}[Y(1) - Y(0)|X=0]\)

Two arguments:

  1. Continuity: \(\mathbb{E}[Y(1)|X=x]\) and \(\mathbb{E}[Y(0)|X=x]\) are continuous (at \(x=0\)) in \(x\): so \(\lim_{\hat x \rightarrow 0}\mathbb{E}[Y(0)|X=\hat x] = \mathbb{E}[Y(0)|X=0]\)

  2. Local randomization: tiny things that determine exact values of \(x\) are as if random and so we can think of a local experiment around \(X=0\).

1.7.4 Argument for identification

Note:

  • the continuity argument requires a continuous \(x\): granularity matters
  • it also builds off a conditional expectation function defined at \(X=0\)

An exclusion restriction is implicit in continuity: if something else happens at the threshold then the conditional expectation functions jump at the threshold

Implicit: \(X\) is exogenous in the sense that units cannot adjust \(X\) in order to be on one or the other side of the threshold

1.7.5 Evidence

Typically researchers show:

  1. “First stage” results: assignment to treatment does indeed jump at the threshold
  2. “ITT”: outcomes jump at the threshold
  3. LATE (if fuzzy / imperfect compliance) using IV

1.7.6 Evidence

Typically researchers show:

In addition:

  1. Arguments for no other treatments at the threshold
  2. Arguments for no “sorting” at the threshold
  3. Evidence for no “heaping” at the threshold (McCrary density test)

Sometimes:

  • argue for why estimates extend beyond the threshold
  • exclude points at the threshold (!)

1.7.7 Design

library(rdss) # for helper functions
library(rdrobust)

cutoff <- 0.5
bandwidth <- 0.5

control <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))}
treatment <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25}

rdd_design <-
  declare_model(
    N = 1000,
    U = rnorm(N, 0, 0.1),
    X = runif(N, 0, 1) + U - cutoff,
    D = 1 * (X > 0),
    Y_D_0 = control(X) + U,
    Y_D_1 = treatment(X) + U
  ) +
  declare_inquiry(LATE = treatment(0) - control(0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ D)) + 
  declare_sampling(S = X > -bandwidth & X < bandwidth) +
  declare_estimator(Y ~ D*X, term = "D", label = "lm") + 
  declare_estimator(
    Y, X, 
    term = "Bias-Corrected",
    .method = rdrobust_helper,
    label = "optimal"
  )

1.7.8 RDD Data plotted

Note rdrobust implements:

  • local polynomial Regression Discontinuity (RD) point estimators
  • robust bias-corrected confidence intervals

See Calonico, Cattaneo, and Titiunik (2014) and related papers, and ?rdrobust::rdrobust for details

1.7.9 RDD Data plotted

rdd_design  |> draw_data() |> ggplot(aes(X, Y, color = factor(D))) + 
  geom_point(alpha = .3) + theme_bw() +
  geom_smooth(aes(X, Y_D_0)) + geom_smooth(aes(X, Y_D_1)) + theme(legend.position = "none")

1.7.10 RDD diagnosis

rdd_design |> diagnose_design()
Estimator Mean Estimate Bias SD Estimate Coverage
lm 0.23 -0.02 0.01 0.64
(0.00) (0.00) (0.00) (0.02)
optimal 0.25 0.00 0.03 0.89
(0.00) (0.00) (0.00) (0.01)

1.7.11 Bandwidth tradeoff

rdd_design |> 
  redesign(bandwidth = seq(from = 0.05, to = 0.5, by = 0.05)) |> 
  diagnose_designs()
  • As we increase the bandwidth, the lm bias gets worse, but slowly, while the error falls.
  • The best bandwidth is relatively wide.
  • This is more true for the optimal estimator.

1.7.12 Geographic RDs

Are popular in political science:

  • Put a lot of pressure on the assumption of no alternative treatment, including “random” country-level shocks!
  • Put a lot of pressure on no sorting assumptions (why was the border put where it was; why did units settle here or there?)
  • Put a lot of pressure on SUTVA: people on one side are literally proximate to people on another

See Keele and Titiunik (2015)

Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting Event Study Designs: Robust and Efficient Estimation.” arXiv Preprint arXiv:2108.12419.
Calonico, Sebastian, Matias D Cattaneo, and Rocio Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2022. “A Crash Course in Good and Bad Controls.” Sociological Methods & Research, 00491241221099552.
De Chaisemartin, Clément, and Xavier d’Haultfoeuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Ho, Daniel, Kosuke Imai, Gary King, and Elizabeth A Stuart. 2011. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” Journal of Statistical Software 42: 1–28.
Imai, Kosuke, and In Song Kim. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.
Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Keele, Luke J, and Rocio Titiunik. 2015. “Geographic Boundaries as Regression Discontinuities.” Political Analysis 23 (1): 127–55.
Lee, David S, and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48 (2): 281–355.
Lin, Winston. 2012. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” arXiv Preprint arXiv:1208.2301.
Olden, Andreas, and Jarle Møen. 2022. “The Triple Difference Estimator.” The Econometrics Journal 25 (3): 531–53.
Pearl, Judea. 1995. “Causal Diagrams for Empirical Research.” Biometrika 82 (4): 669–88.
Pearl, Judea. 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press.
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge University Press.
Robins, James M, Andrea Rotnitzky, and Lue Ping Zhao. 1994. “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846–66.
Roth, Jonathan, Pedro HC Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics.
Stuart, Elizabeth A, and Kerry M Green. 2008. “Using Full Matching to Estimate Causal Effects in Nonexperimental Studies: Examining the Relationship Between Adolescent Marijuana Use and Adult Outcomes.” Developmental Psychology 44 (2): 395.