The key idea from Pearl is that you want to find a set of variables such that, when you condition on them, you get what you would get under random assignment (that is, under a do operation).
1.1.2 When to condition? What to condition on?
Intuition:
You could imagine creating a “mutilated” graph by removing all the arrows leading out of X
Then select a set of variables, \(Z\), such that \(X\) and \(Y\) are d-separated by \(Z\) on the mutilated graph
Conditioning on \(Z\) then ensures that any remaining covariation between \(X\) and \(Y\) is covariation due to the effects of \(X\)
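For intuition about what this means in practice, adjustment sets can be read directly off a DAG; a minimal sketch using the dagitty package (the DAG here is illustrative, not one from the lecture):

```r
library(dagitty)

# Illustrative DAG: Z confounds X and Y; M is a mediator (a "bad control")
g <- dagitty("dag {
  Z -> X
  Z -> Y
  X -> M
  M -> Y
}")

# Sets that block every backdoor path from X to Y (here: {Z});
# M is a descendant of X and is correctly excluded
adjustmentSets(g, exposure = "X", outcome = "Y")
```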
1.1.3 Illustration
1.1.4 Illustration: Remove paths out
1.1.5 Illustration: Block backdoor path
1.1.6 Illustration: Why not like this?
1.1.7 Backdoor Criterion: (Pearl 1995)
The backdoor criterion is satisfied by \(Z\) (relative to \(X\), \(Y\)) if:
No node in \(Z\) is a descendant of \(X\)
\(Z\) blocks every backdoor path from \(X\) to \(Y\) (i.e. every path that contains an arrow into \(X\))
In that case you can identify the effect of \(X\) on \(Y\) by conditioning on \(Z\):
\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\] (This is eqn 3.19 in Pearl (2000))
1.1.8 Backdoor Criterion: (Pearl 1995)
\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\]
There is no notion of a linear control or anything like that here; the idea really is blocking: think of lots of discrete data and no missing patterns
Note this is a formula for a (possibly counterfactual) level; a counterfactual difference would be given in the obvious way by:
\[P(Y=y | \hat{x}) - P(Y=y | \hat{x}')\]
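A minimal numerical sketch of the adjustment formula (all names and values here are illustrative, not from the lecture): simulate discrete data in which \(Z\) confounds \(X\) and \(Y\), then average the stratum-specific conditional probabilities over the marginal distribution of \(Z\).

```r
set.seed(1)

n <- 1e5
Z <- rbinom(n, 1, 0.5)                      # confounder
X <- rbinom(n, 1, 0.2 + 0.6 * Z)            # treatment depends on Z
Y <- rbinom(n, 1, 0.1 + 0.3 * X + 0.4 * Z)  # outcome depends on X and Z

# P(Y = 1 | do(X = x)) via the backdoor adjustment formula
p_do <- function(x)
  sum(sapply(0:1, function(z) mean(Y[X == x & Z == z]) * mean(Z == z)))

p_do(1) - p_do(0)                  # close to the true effect of 0.3
mean(Y[X == 1]) - mean(Y[X == 0])  # the naive difference is confounded upward
```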
1.1.9 Backdoor Proof
Following Pearl (2009), Chapter 11. Let \(T\) denote the set of parents of \(X\): \(T := pa(X)\), with (possibly vector valued) realizations \(t\). These might not all be observed.
If the backdoor criterion is satisfied, we have:
\(Y\) is independent of \(T\), given \(X\) and observed data, \(Z\) (since \(Z\) blocks backdoor paths)
\(X\) is independent of \(Z\) given \(T\) (since \(Z\) includes only nondescendants of \(X\))
Key idea: The intervention level relates to the observational level as follows: \[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]
Think of this as fully accounting for the (possibly unobserved) causes of \(X\), \(T\)
1.1.10 Backdoor Proof
We want to get to:
\[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]
But of course we do not observe \(T\), rather we observe \(Z\). So we now need to write everything in terms of \(Z\) rather than \(T\).
\(p(z|x, t) = p(z|t)\) (since \(Z\) is upstream and blocked by parents, \(T\))
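Putting the pieces together, the remaining steps (conditioning on \(Z\) and applying the two independence conditions above) are:

\[
\begin{aligned}
p(y|\hat{x}) &= \sum_t p(t)\, p(y|x, t) \\
             &= \sum_t p(t) \sum_z p(y|x, z, t)\, p(z|x, t) \\
             &= \sum_t p(t) \sum_z p(y|x, z)\, p(z|t) \\
             &= \sum_z p(y|x, z) \sum_t p(z|t)\, p(t) \\
             &= \sum_z p(y|x, z)\, p(z)
\end{aligned}
\]

which is the backdoor adjustment formula.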
1.1.13 Guidelines
1.1.14 Example
Consider for example this data.
You randomly pair offerers and receivers in a dictator game (in which offerers decide how much of $1 to give to receivers).
Your population comes from two groups (80% Baganda and 20% Banyankole) so in randomly assigning partners you are randomly determining whether a partner is a coethnic or not.
You find that in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.
Should you believe it?
1.1.15 Covariate Adjustment
Population: randomly matched Baganda (80% of pop) and Banyankole (20% of pop)
You find: in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.
But a closer look at the data reveals…
| Offers by  | To: Baganda | To: Banyankole |
|------------|-------------|----------------|
| Baganda    | 64%         | 16%            |
| Banyankole | 16%         | 4%             |

Figure 1: Number of Games

| Offers by  | To: Baganda | To: Banyankole |
|------------|-------------|----------------|
| Baganda    | 50          | 50             |
| Banyankole | 20          | 20             |

Figure 2: Average Offers
So that’s a problem
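To see where the aggregate numbers come from, weight the average offers by the share of games of each type:

\[
\text{Coethnic: } \frac{0.64 \times 50 + 0.04 \times 20}{0.64 + 0.04} \approx 48, \qquad
\text{Non-coethnic: } \frac{0.16 \times 50 + 0.16 \times 20}{0.16 + 0.16} = 35
\]

Within each offering group, offers do not depend on whether the partner is a coethnic; the gap arises only because most coethnic pairings have Baganda offerers.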
1.1.16 Covariate Adjustment
Control?
With such data you might be tempted to ‘control’ for the covariate (here: ethnic group), using regression.
But, perhaps surprisingly, it turns out that regression with covariates does not estimate average treatment effects.
It does estimate an average of treatment effects, but one that weights stratum-level effects to minimize variance; this is not necessarily your estimand.
Instead you can use the formula above for \(\hat{\tau}_{ATE}\) to estimate the ATE
alternatively…
1.1.18 Covariate adjustment via saturated regression
Alternatively you can use propensity weights.
Alternatively you can use a regression that includes both the treatment and the treatment interacted with the covariates.
In practice this is best done by demeaning the covariates; doing this lets you read off the average effect from the main term. Key resource: Lin (2012)
You should have noticed that the logic for controlling for a covariate here is equivalent to the logic we saw for heterogeneous assignment propensities. These are really the same thing.
1.1.19 Covariate adjustment via saturated regression
Returning to prior example:
```r
df <- fabricatr::fabricate(
  N = 500,
  X = rep(0:1, N / 2),
  Z = rbinom(N, 1, .2 + .3 * X),
  Y = rnorm(N) + Z * X
)

lm_robust(Y ~ Z * X_c, data = df |> mutate(X_c = X - mean(X))) |>
  tidy() |>
  kable(digits = 2)
```
| term        | estimate | std.error | statistic | p.value | conf.low | conf.high | df  | outcome |
|-------------|----------|-----------|-----------|---------|----------|-----------|-----|---------|
| (Intercept) | -0.05    | 0.06      | -0.89     | 0.37    | -0.17    | 0.06      | 496 | Y       |
| Z           | 0.47     | 0.11      | 4.32      | 0.00    | 0.26     | 0.68      | 496 | Y       |
| X_c         | 0.08     | 0.12      | 0.65      | 0.51    | -0.15    | 0.30      | 496 | Y       |
| Z:X_c       | 0.85     | 0.22      | 3.91      | 0.00    | 0.42     | 1.28      | 496 | Y       |
1.1.20 Covariate adjustment via saturated regression
It’s all good. But you need to match the estimator to the inquiry: demean for average marginal effects; do not demean for conditional marginal effects.
1.1.25 Recap
If you have different groups with different assignment propensities you can do any or all of these (a sketch comparing them follows this list):
Blocked differences in means
Inverse propensity weighting
Saturated regression (Lin)
More… (coming)
You cannot (reliably):
Ignore the groups
Include them in a regression (without interactions)
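A minimal sketch comparing the three workable approaches on illustrative data with group-varying assignment propensities (all numbers and names here are hypothetical):

```r
library(estimatr)
library(fabricatr)

set.seed(2)

# Two groups (X) with different assignment propensities and different effects
df <- fabricate(
  N = 1000,
  X = rep(0:1, N / 2),
  Z = rbinom(N, 1, 0.2 + 0.5 * X),
  Y = rnorm(N) + Z * (1 + X)
)

# 1. Blocked difference in means
difference_in_means(Y ~ Z, blocks = X, data = df)

# 2. Inverse propensity weighting
df$p <- ave(df$Z, df$X)  # within-group share treated
df$w <- ifelse(df$Z == 1, 1 / df$p, 1 / (1 - df$p))
lm_robust(Y ~ Z, weights = w, data = df)

# 3. Saturated (Lin) regression with demeaned covariates
lm_lin(Y ~ Z, covariates = ~ X, data = df)
```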
1.2 To control or not control
Even “good” controls might not pay their way in practice:
When (and how) does controlling for covariates improve things, and when does it make them worse?
1.2.1 Considerations
Even though randomization ensures no bias, you may sometimes want to “control” for covariates in order to improve efficiency (see the discussion of blocking above).
Or you may have to take account of the fact that the assignment to treatment is correlated with a covariate (as above).
In observational work you might also figure out that you have to control for a covariate to justify inferences (refer to our discussion of the backdoor criterion)
1.2.2 Observational work
Observational motivation: Controls can provide grounds for identification
But recall – they can also destroy identification
For a great walkthrough of what you can draw from graphical models for the decision to control, see Cinelli, Forney, and Pearl (2022).
Aside: these implications generally refer to using controls as covariates – e.g. by implementing blocked differences in means or similar. For a Bayesian model of the form used in CausalQueries the information from “bad controls” is used wisely.
1.2.3 Experimental work
Conditional Bias and Precision Gains from Controls
Experimental motivation: Controls can reduce noise and improve precision. This is an argument for using variables that are correlated with the outcome (not with the treatment).
1.2.4 Precision Gains from Controls
However: Introducing controls can create complications
As argued by Freedman (summary from Lin (2012)), we can get: “worsened asymptotic precision, invalid measures of precision, and small-sample bias”\(^*\)
These adverse effects are essentially removed with an interacted model
See discussions in Imbens and Rubin (2015) (7.6, 7.7) and especially Theorem 7.2 for the asymptotic variance of the estimator
\(^*\) though note that the precision concern does not hold when treatment and control groups are equally sized
The design implements estimation controlling and not controlling for \(X\) and also keeps track of the results of a test for the relation between \(Z\) and \(X\).
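A minimal sketch of such a design (assuming, for illustration, that \(a\) scales the \(X\)–\(Z\) relation and \(b\) scales the \(X\)–\(Y\) relation; the original design may differ):

```r
library(DeclareDesign)

a <- 0  # assumed: strength of the X-Z relation (assignment related to X)
b <- 0  # assumed: strength of the X-Y relation (X prognostic for Y)

design <-
  declare_model(
    N = 200,
    X = rnorm(N),
    U = rnorm(N),
    Z = rbinom(N, 1, prob = plogis(a * X)),
    potential_outcomes(Y ~ 0.5 * Z + b * X + U)
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, label = "No control") +
  declare_estimator(Y ~ Z + X, label = "Control") +
  # Assumed to work like declare_estimator: keep track of the X-Z relation
  declare_test(X ~ Z, .method = lm_robust, label = "X-Z test")
```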
1.2.12 Simulations
We run many simulations over a range of designs
```r
simulations <-
  list(
    design |> redesign(a = 0, b = 0),
    design |> redesign(a = 1, b = 0),
    design |> redesign(a = 0, b = 1)
  ) |>
  simulate_design(sims = 20000)
```
1.2.13 Standard errors
We see that standard errors are larger when you control in cases in which the control is not predictive of the outcome but is correlated with the treatment. Otherwise they can be smaller.
Challenge: Use DeclareDesign to compare performance of drtmle and lm_lin
1.4 Matching
1.4.1 Core idea
The matching idea can be seen as an application of the backdoor criterion:
Seek features \(X\) that block backdoor paths from \(Z\) to \(Y\) and estimate:
\[\sum_x p(x) \left(\mathbb E[Y | Z = 1, X = x] - \mathbb E[Y | Z = 0, X = x]\right)\]
This, fundamentally, is a blocked difference in means. The intuitions from the discussion of Lin estimation apply here.
1.4.2 ATT Wrinkle
In practice matching differs from simply controlling for covariates (with interactions) insofar as matches are (often) sought specifically for treated units.
Thus for example in data like this:
| Z | X | Y |
|---|---|---|
| 0 | 1 | 1 |
| 0 | 2 | 2 |
| 1 | 1 | 3 |
| 1 | 1 | 4 |
we would not make use of the second observation. Indeed it is not even part of our estimand.
1.4.3 ATT Wrinkle
More formally, if \(M_i\) denotes the set of matches for each unit \(i\) among the \(n_1\) treated units, the estimand is the ATT.
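A standard way to write the corresponding matching estimator (assuming each treated unit \(i\) is compared with the control units in its matched set \(M_i\)) is:

\[\hat\tau_{ATT} = \frac{1}{n_1}\sum_{i: Z_i = 1}\left(Y_i - \frac{1}{|M_i|}\sum_{j \in M_i} Y_j\right)\]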
In principle however one could maintain a focus on ATE and find matches both for treated and for control units.
The commitment to the ATT is thus not baked in. See Stuart and Green (2008)
Example below.
1.4.5 Variations
Exact matching. Great if possible: use as matches units that exactly agree on all covariates
Coarsened exact matching. Transparent and easy: use as matches units that exactly agree on all coarsened covariates (i.e. nearly agree on covariates)
Distance matching e.g. Mahalanobis, nearest neighbor: use as matches units that are close in the multidimensional covariate space
Propensity matching. Clever but more model dependent: use as matches units that, based on covariates, have similar estimated probabilities of being in treatment
The “Lalonde” data have been used by many researchers to assess the merits of different strategies. The substantive question is the effect of job training on earnings. We have before and after earnings data and various background characteristics.
The basic analysis would yield:
```r
lm_robust(re78 ~ treat, data = lalonde) |>
  tidy() |>
  kable()
```
| term        | estimate  | std.error | statistic  | p.value   | conf.low  | conf.high | df  | outcome |
|-------------|-----------|-----------|------------|-----------|-----------|-----------|-----|---------|
| (Intercept) | 6984.1697 | 352.1654  | 19.8320697 | 0.0000000 | 6292.570  | 7675.7691 | 612 | re78    |
| treat       | -635.0262 | 677.1954  | -0.9377297 | 0.3487532 | -1964.935 | 694.8824  | 612 | re78    |
1.4.11 For real
But there are clear concerns regarding confounding:
```r
lm_robust(treat ~ age + educ + race + married + nodegree + re74 + re75, data = lalonde) |>
  tidy() |>
  kable(digits = 2)
```
| term        | estimate | std.error | statistic | p.value | conf.low | conf.high | df  | outcome |
|-------------|----------|-----------|-----------|---------|----------|-----------|-----|---------|
| (Intercept) | 0.34     | 0.13      | 2.67      | 0.01    | 0.09     | 0.59      | 605 | treat   |
| age         | 0.00     | 0.00      | 1.34      | 0.18    | 0.00     | 0.01      | 605 | treat   |
| educ        | 0.02     | 0.01      | 2.62      | 0.01    | 0.01     | 0.04      | 605 | treat   |
| racehispan  | -0.44    | 0.05      | -8.10     | 0.00    | -0.55    | -0.34     | 605 | treat   |
| racewhite   | -0.53    | 0.04      | -13.95    | 0.00    | -0.60    | -0.45     | 605 | treat   |
| married     | -0.09    | 0.04      | -2.60     | 0.01    | -0.16    | -0.02     | 605 | treat   |
| nodegree    | 0.10     | 0.04      | 2.20      | 0.03    | 0.01     | 0.19      | 605 | treat   |
| re74        | 0.00     | 0.00      | -2.26     | 0.02    | 0.00     | 0.00      | 605 | treat   |
| re75        | 0.00     | 0.00      | 0.75      | 0.46    | 0.00     | 0.00      | 605 | treat   |
1.4.12 For real: Match
Here we match using a probit model to predict assignment probabilities:
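A minimal sketch of this step, reconstructing the approach as probit-based inverse propensity weighting (the same propensity scores could instead be used for matching, e.g. via MatchIt, Ho et al. 2011; this is a sketch, not the original code):

```r
library(estimatr)
library(dplyr)

# Probit model for the propensity to be treated
propensity_model <- glm(
  treat ~ age + educ + race + married + nodegree + re74 + re75,
  family = binomial(link = "probit"),
  data = lalonde
)

# Inverse propensity weights: 1/p for treated units, 1/(1 - p) for controls
lalonde_w <- lalonde |>
  mutate(
    propensity = predict(propensity_model, type = "response"),
    ipw = 1 / (treat * propensity + (1 - treat) * (1 - propensity))
  )

# Weighted difference in means
lm_robust(re78 ~ treat, data = lalonde_w, weights = ipw) |>
  tidy() |>
  kable(digits = 2)
```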
1.5 Diff in diff
Key idea: the evolution of units in the control group allows you to impute what the evolution of units in the treatment group would have been had they not been treated
1.5.1 Logic
We have group \(A\) that enters treatment at some point and group \(B\) that never does
The estimate:
\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in \(A\) compared to the change in \(B\)?)
\[\hat\tau_{ATT} = \tau_{ATT} + \text{Difference in trends}\]
1.5.3 Simplest case
```r
n_units <- 2

design <-
  declare_model(
    unit = add_level(N = n_units, I = 1:N),
    time = add_level(N = 6, T = 1:N, nest = FALSE),
    obs  = cross_levels(by = join_using(unit, time))
  ) +
  declare_model(potential_outcomes(Y ~ I + T^.5 + Z * T)) +
  declare_assignment(Z = 1 * (I > (n_units / 2)) * (T > 3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(
    ATE = mean(Y_Z_1 - Y_Z_0),
    ATT = mean(Y_Z_1[Z == 1] - Y_Z_0[Z == 1])
  ) +
  declare_estimator(Y ~ Z, label = "naive") +
  declare_estimator(Y ~ Z + I, label = "FE1") +
  declare_estimator(Y ~ Z + as.factor(T), label = "FE2") +
  declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")
```
1.5.4 Diagnosis
Here only the two-way fixed effects estimator is unbiased, and only for the ATT.
The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!
```r
design |> diagnose_design()
```
| Inquiry | Estimator | Bias |
|---------|-----------|------|
| ATE     | FE1       | 2.25 |
| ATE     | FE2       | 6.50 |
| ATE     | FE3       | 1.50 |
| ATE     | naive     | 5.40 |
| ATT     | FE1       | 0.75 |
| ATT     | FE2       | 5.00 |
| ATT     | FE3       | 0.00 |
| ATT     | naive     | 3.90 |
1.5.5 The classic graph
```r
design |>
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) +
  geom_line() +
  geom_point(aes(T, Y_Z_0)) +
  theme_bw()
```

```r
design |>
  redesign(n_units = 10) |>
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) +
  geom_line() +
  geom_point(aes(T, Y_Z_0)) +
  theme_bw()
```
1.5.8 In practice
Need to defend parallel trends
Most typically using an event study: predicting trends in a unit based on its history
Sometimes: report balance between treatment and control groups in covariates
Placebo leads and lags
1.5.9 Heterogeneity
Things get much more complicated when there is
heterogeneous timing in treatment take up and
heterogeneous effects
It’s only recently been appreciated how tricky things can get
But we already have an intuition from our analysis of trials with heterogeneous assignment and heterogeneous effects:
in such cases fixed effects analysis weights stratum level treatment effects by the variance in assignment to treatment
something similar here
1.5.10 Staggered assignments
Just two units assigned at different times:
```r
trend <- 0

design <-
  declare_model(
    unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
    time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
    obs  = cross_levels(by = join_using(unit, time))
  ) +
  declare_model(potential_outcomes(Y ~ trend * T + (1 + Z) * (I == 2))) +
  declare_assignment(Z = 1 * ((I == 1) * (T > 3) + (I == 2) * (T > 5))) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z), I_c = I - mean(I)) +
  declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "1. naive") +
  declare_estimator(Y ~ Z + I, label = "2. FE1") +
  declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") +
  declare_estimator(Y ~ Z + I + as.factor(T), label = "4. TWFE") +
  declare_estimator(Y ~ Z * I_c + as.factor(T), label = "5. Sat")
```
1.5.11 Staggered assignments diagnosis
```r
draw_data(design) |>
  mutate(I = factor(I)) |>
  ggplot(aes(T, Y, color = I, label = Z)) +
  geom_text()
```
1.5.12 Staggered assignments diagnosis
| Estimator | Mean Estimand | Mean Estimate |
|-----------|---------------|---------------|
| 1. naive  | 0.50 (0.00)   | -0.12 (0.00)  |
| 2. FE1    | 0.50 (0.00)   | 0.36 (0.00)   |
| 3. FE2    | 0.50 (0.00)   | -1.00 (0.00)  |
| 4. TWFE   | 0.50 (0.00)   | 0.25 (0.00)   |
| 5. Sat    | 0.50 (0.00)   | 0.50 (0.00)   |
1.5.13 Where do these numbers come from?
The estimand is .5 – this comes from weighting the effect for unit 1 (0) and the effect for unit 2 (1) equally
The naive estimate is wildly off because it does not take into account that units with different treatment shares have different average levels in outcomes
1.5.14 Where do these numbers come from?
The estimate when we control for unit is 0.36: this comes from weighting the unit-stratum level effects according to the variance of assignment to each stratum:
The estimate when we control for time is -1: this comes from weighting the time-stratum level effects according to the variance of assignment to each stratum
it weights periods 4 and 5 only and equally, yielding the difference between the outcome for unit 1 in treatment (0) and the outcome for unit 2 in control (1)
So quite complex weighting of different comparisons
Units effectively get counted as both treatment and control units for different comparisons
Treated units counted as control units
Effectively negative weights on some quantities
Possible to have very poorly performing estimates
1.5.22 Solutions
Involve better specification of estimands
Use of comparisons directly relevant for the estimands
Imputation of control outcomes in treated units using data from appropriate control units only and then targeting estimands directly (Borusyak, Jaravel, and Spiess 2021)
Particular solution: focus on the effect of treatment at the time of first treatment / or time of switching: this usually involves a no carryover assumption (De Chaisemartin and d’Haultfoeuille 2020) also Imai and Kim (2021)
You are interested in the effects of influx of refugees on right wing voting
You have (say more conservative) states with no refugees at any period
You have (say more liberal) states with refugees post 2016 only
You want to do differences in differences comparing these states before and after
However you worry that things change differentially in the conservative and liberal states: no parallel trends
But you can identify areas within states that are more or less likely to be exposed and compare differences in differences in the exposed and unexposed groups.
1.5.28 Triple differencing
So:
Two types of states: \(L \in \{0,1\}\), only \(L=1\) types get refugee influx
Two time periods: \(Post \in \{0,1\}\), refugee influx occurs in period \(Post = 1\)
Two groups: \(B \in \{0,1\}\), \(B=1\) types affected by refugee influx
\[Y = \beta_0 + \beta_1 L + \beta_2 B + \beta_3 Post + \beta_4 LB + \beta_5 L Post + \beta_6 B Post + \beta_7L B Post + \epsilon\]
1.5.29 Triple differencing
\[\frac{\partial ^3Y}{\partial L \partial B \partial Post} = \beta_7\]
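A minimal sketch of estimating \(\beta_7\) with a fully interacted regression (assuming a hypothetical data frame df with columns Y, L, B, and Post):

```r
library(estimatr)

# The coefficient on L:B:Post is the triple-difference estimate of beta_7
lm_robust(Y ~ L * B * Post, data = df) |>
  tidy() |>
  kable(digits = 2)
```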
1.5.30 Can we not just condition on the \(B=1\) types?
The level among the \(B=1\) types is:
\[Y = \beta_0 + \beta_1 L + \beta_2 + \beta_3 Post + \beta_4 L + \beta_5 L Post + \beta_6 Post + \beta_7 L Post + \epsilon\]
If you did simple before/after differences among the \(B=1\) types you would get:
\[\Delta Y \mid L = 1, B = 1 = \beta_3 + \beta_5 + \beta_6 + \beta_7\]
\[\Delta Y \mid L = 0, B = 1 = \beta_3 + \beta_6\]
1.5.31 Can we not just condition on the \(B=1\) types?
And so if you differenced again you would get:
\[\Delta^2 Y \mid B = 1 = \beta_5 + \beta_7\]
So the problem is that you have \(\beta_5\) in here, which corresponds exactly to how the \(L=1\) states change over time.
1.5.32 Triple
But we can figure out \(\beta_5\) by doing a diff in diff among the \(B=0\) units.
\[Y|B = 0 = \beta_0 + \beta_1 L + \beta_3 Post + \beta_5 L Post\]
\[\Delta^2 Y| B = 0 = \beta_5\]
1.5.33 Easier to swallow?
The identifying assumption is that absent treatment the differences in trends between \(L=0\) and \(L=1\) would be the same for units with \(B=0\) and \(B=1\).
Key insight: the contributions of the \(a\)s (always-takers) and \(n\)s (never-takers) are the same in the \(Z=0\) and \(Z=1\) groups, so if you difference you are left with the changes in the contributions of the \(c\)s (compliers).
1.6.4 Local Average Treatment Effects
Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)
Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)
So, the difference is the ITT: \(({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\)
\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]
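A minimal sketch of the corresponding estimation (assuming a hypothetical data frame df with encouragement Z, take-up D, and outcome Y):

```r
library(estimatr)

# Wald / IV estimate of the LATE: two-stage least squares with Z instrumenting D
iv_robust(Y ~ D | Z, data = df) |> tidy()

# Equivalently by hand: the ITT divided by the first stage
with(df, (mean(Y[Z == 1]) - mean(Y[Z == 0])) /
         (mean(D[Z == 1]) - mean(D[Z == 0])))
```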
1.6.5 The good and the bad of LATE
You get a well-defined estimate even when there is non-random take-up
May sometimes be used to assess mediation or knock-on effects
But:
You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
Your estimate is only for a subpopulation
The subpopulation is not chosen by you and is unknown
Different encouragements may yield different estimates since they may encourage different subgroups
1.6.6 Pearl and Chickering again
With and without an imposition of monotonicity
```r
data("lipids_data")

models <-
  list(
    unrestricted = make_model("Z -> X -> Y; X <-> Y"),
    restricted   = make_model("Z -> X -> Y; X <-> Y") |>
      set_restrictions("X[Z=1] < X[Z=0]")
  ) |>
  lapply(update_model, data = lipids_data, refresh = 0)

models |>
  query_model(
    query = list(CATE = "Y[X=1] - Y[X=0]", Nonmonotonic = "X[Z=1] < X[Z=0]"),
    given = list("X[Z=1] > X[Z=0]", TRUE),
    using = "posteriors"
  )
```
1.6.7 Pearl and Chickering again
With and without an imposition of monotonicity:
| model        | query        | mean | sd   |
|--------------|--------------|------|------|
| unrestricted | CATE         | 0.70 | 0.05 |
| restricted   | CATE         | 0.71 | 0.05 |
| unrestricted | Nonmonotonic | 0.01 | 0.01 |
| restricted   | Nonmonotonic | 0.00 | 0.00 |
In one case we assume monotonicity, in the other we update on it (easy in this case because of the empirically verifiable nature of one-sided noncompliance)
1.7 Regression discontinuity
Errors and diagnostics
1.7.1 Intuition
The core idea of an RDD design is that if a decision rule assigns nearly identical units to treatment and control conditions, then we can infer effects for those units by comparing outcomes on either side of the threshold.
See excellent introduction: Lee and Lemieux (2010)
1.7.2 Intuition
Kids born on 31 August start school a year younger than kids born on 1 September: does starting younger help or hurt?
Kids born on 12 September 1983 are more likely to register Republican than kids born on 10 September 1983: can this identify the effects of registration on long term voting?
A district in which Republicans got 50.1% of the vote gets a Republican representative while districts in which Republicans got 49.9% of the vote do not: does having a Republican representative make a difference for these districts?
1.7.3 Argument for identification
Setting:
Typically the decision is based on a value on a “running variable”, \(X\). e.g. Treatment if \(X > 0\)
The estimand is \(\mathbb{E}[Y(1) - Y(0)|X=0]\)
Two arguments:
Continuity: \(\mathbb{E}[Y(1)|X=x]\) and \(\mathbb{E}[Y(0)|X=x]\) are continuous (at \(x=0\)) in \(x\): so \(\lim_{\hat x \rightarrow 0}\mathbb{E}[Y(0)|X=\hat x] = \mathbb{E}[Y(0)|X=0]\)
Local randomization: tiny things that determine exact values of \(x\) are as if random and so we can think of a local experiment around \(X=0\).
also builds off a conditional expectation function defined at \(X=0\)
Exclusion restriction is implicit in continuity: If something else happens at the threshold then the conditional expectation functions jump at the thresholds
Implicit: \(X\) is exogenous in the sense that units cannot adjust \(X\) in order to be on one or the other side of the threshold
1.7.5 Evidence
Typically researchers show:
“First stage” results: assignment to treatment does indeed jump at the threshold
“ITT”: outcomes jump at the threshold
LATE (if fuzzy / imperfect compliance) using IV
1.7.6 Evidence
Typically researchers show:
In addition:
Arguments for no other treatments at the threshold
Arguments for no “sorting” at the threshold
Evidence for no “heaping” at the threshold (McCrary density test)
Sometimes:
argue for why estimates extend beyond the threshold
exclude points at the threshold (!)
1.7.7 Design
```r
library(rdss) # for helper functions
library(rdrobust)

cutoff <- 0.5
bandwidth <- 0.5

control <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))
}
treatment <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25
}

rdd_design <-
  declare_model(
    N = 1000,
    U = rnorm(N, 0, 0.1),
    X = runif(N, 0, 1) + U - cutoff,
    D = 1 * (X > 0),
    Y_D_0 = control(X) + U,
    Y_D_1 = treatment(X) + U
  ) +
  declare_inquiry(LATE = treatment(0) - control(0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ D)) +
  declare_sampling(S = X > -bandwidth & X < bandwidth) +
  declare_estimator(Y ~ D * X, term = "D", label = "lm") +
  declare_estimator(
    Y, X,
    term = "Bias-Corrected",
    .method = rdrobust_helper,
    label = "optimal"
  )
```
1.7.8 RDD Data plotted
Note rdrobust implements:
local polynomial Regression Discontinuity (RD) point estimators
robust bias-corrected confidence intervals
See Calonico, Cattaneo, and Titiunik (2014) and related papers; see also `?rdrobust::rdrobust`
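A minimal direct call (assuming a data frame df with outcome Y and a running variable X centered at the cutoff):

```r
library(rdrobust)

# Local-polynomial RD estimate with robust bias-corrected confidence intervals
with(df, rdrobust(y = Y, x = X, c = 0)) |> summary()
```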
1.7.9 RDD Data plotted
```r
rdd_design |>
  draw_data() |>
  ggplot(aes(X, Y, color = factor(D))) +
  geom_point(alpha = .3) +
  theme_bw() +
  geom_smooth(aes(X, Y_D_0)) +
  geom_smooth(aes(X, Y_D_1)) +
  theme(legend.position = "none")
```
1.7.10 RDD diagnosis
```r
rdd_design |> diagnose_design()
```
| Estimator | Mean Estimate | Bias         | SD Estimate | Coverage    |
|-----------|---------------|--------------|-------------|-------------|
| lm        | 0.23 (0.00)   | -0.02 (0.00) | 0.01 (0.00) | 0.64 (0.02) |
| optimal   | 0.25 (0.00)   | 0.00 (0.00)  | 0.03 (0.00) | 0.89 (0.01) |
1.7.11 Bandwidth tradeoff
```r
rdd_design |>
  redesign(bandwidth = seq(from = 0.05, to = 0.5, by = 0.05)) |>
  diagnose_designs()
```
As we increase the bandwidth, the lm bias gets worse, but slowly, while the error falls.
The best bandwidth is relatively wide.
This is especially true for the optimal estimator.
1.7.12 Geographic RDs
Are popular in political science:
Put a lot of pressure on assumption of no alternative treatment—including “random” country level shocks!
Put a lot of pressure on no sorting assumptions (why was the border put where it was; why did units settle here or there?)
Put a lot of pressure on SUTVA: people on one side are literally proximate to people on another
Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting Event Study Designs: Robust and Efficient Estimation.” arXiv Preprint arXiv:2108.12419.
Calonico, Sebastian, Matias D Cattaneo, and Rocio Titiunik. 2014. “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica 82 (6): 2295–2326.
Cinelli, Carlos, Andrew Forney, and Judea Pearl. 2022. “A Crash Course in Good and Bad Controls.” Sociological Methods & Research, 00491241221099552.
De Chaisemartin, Clément, and Xavier d’Haultfoeuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Ho, Daniel, Kosuke Imai, Gary King, and Elizabeth A Stuart. 2011. “MatchIt: Nonparametric Preprocessing for Parametric Causal Inference.” Journal of Statistical Software 42: 1–28.
Imai, Kosuke, and In Song Kim. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.
Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press.
Keele, Luke J, and Rocio Titiunik. 2015. “Geographic Boundaries as Regression Discontinuities.” Political Analysis 23 (1): 127–55.
Lee, David S, and Thomas Lemieux. 2010. “Regression Discontinuity Designs in Economics.” Journal of Economic Literature 48 (2): 281–355.
Lin, Winston. 2012. “Agnostic Notes on Regression Adjustments to Experimental Data: Reexamining Freedman’s Critique.” arXiv Preprint arXiv:1208.2301.
Olden, Andreas, and Jarle Møen. 2022. “The Triple Difference Estimator.” The Econometrics Journal 25 (3): 531–53.
Robins, James M, Andrea Rotnitzky, and Lue Ping Zhao. 1994. “Estimation of Regression Coefficients When Some Regressors Are Not Always Observed.” Journal of the American Statistical Association 89 (427): 846–66.
Roth, Jonathan, Pedro HC Sant’Anna, Alyssa Bilinski, and John Poe. 2023. “What’s Trending in Difference-in-Differences? A Synthesis of the Recent Econometrics Literature.” Journal of Econometrics.
Stuart, Elizabeth A, and Kerry M Green. 2008. “Using Full Matching to Estimate Causal Effects in Nonexperimental Studies: Examining the Relationship Between Adolescent Marijuana Use and Adult Outcomes.” Developmental Psychology 44 (2): 395.