Experiments are investigations in which an intervention, in all its essential elements, is under the control of the investigator. (Cox & Reid)
Two major types of control:
1. Control over assignment to treatment: this is at the heart of many field experiments.
2. Control over the treatment itself: this is at the heart of many lab experiments.
Main focus today is on 1 and on the question: how does control over assignment to treatment allow you to make reasonable statements about causal effects?
1.1.2 Experiments
1.1.3 Basic randomization
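Basic randomization is very simple. For example, say you want to assign each of 10 units to treatment with probability 0.5, so that about 5 of 10 are treated on average (independent draws do not fix the number treated). Here is simple code:
sample(0:1, 10, replace = TRUE)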
[1] 0 1 0 1 1 1 1 1 1 0
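The draw above treats 7 units rather than 5, because each unit is assigned independently. If exactly 5 of 10 must be treated, one option is to shuffle a fixed vector of five 0s and five 1s. This is a minimal sketch; the name `Z` is only illustrative.

```r
# Complete randomization: permute a vector that already has exactly 5 ones
Z <- sample(rep(0:1, each = 5))

# The number treated is fixed by construction, whatever the permutation
n_treated <- sum(Z)
n_treated
```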
1.1.4 …should be replicable
In general you might want to set things up so that your randomization is replicable. You can do this by setting a seed:
set.seed(20111112)
sample(0:1, 10, replace = TRUE)
[1] 1 0 1 1 1 0 1 1 1 1
and again:
set.seed(20111112)
sample(0:1, 10, replace = TRUE)
[1] 1 0 1 1 1 0 1 1 1 1
1.1.5 Basic randomization
Even better is to set it up so that it can reproduce lots of possible draws so that you can check the propensities for each unit.
Here the \(P\) matrix gives 1000 possible ways of allocating 5 of 10 units to treatment. We can then confirm that the average propensity is 0.5.
A huge advantage of this approach is that if you make a mess of the random assignment, you can still generate the P matrix and use that for all analyses!
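The code that builds the \(P\) matrix is not shown here. A minimal sketch of one way to do it, assuming complete assignment of exactly 5 of 10 units and using the illustrative names `P` and `propensities`:

```r
set.seed(20111112)

# Each column is one possible assignment of 5 of 10 units to treatment
P <- replicate(1000, sample(rep(0:1, each = 5)))
dim(P)  # 10 units by 1000 draws

# Unit-level propensities: share of draws in which each unit is treated
propensities <- rowMeans(P)

# Every column has exactly 5 treated units, so the average propensity is 0.5
mean(propensities)
```

Individual propensities will differ slightly from 0.5 because of simulation error; only their average is pinned down exactly.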
1.1.6 Do it in advance
Unless you need them to keep subjects at ease, leave your spinners and your dice and your cards behind.
Especially when you have multiple or complex randomizations, you are generally much better off doing it with a computer in advance.
Figure: A survey dictionary with results from a complex randomization, presented in a simple way for enumerators.
1.1.7 Did the randomization “work”?
People often wonder: did randomization work? Common practice is to implement a set of \(t\)-tests to see if there is balance. This makes no sense.
If you doubt whether it was implemented properly, do an \(F\) test. If you worry about variance, specify controls in advance as a function of their relation with outcomes (more on this later). If you worry about conditional bias, then look at substantive differences between groups, not \(t\)-tests.
If you want realizations to have particular properties: build it into the scheme in advance.
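One way to read the \(F\)-test suggestion is as a single joint test of whether the covariates predict assignment, rather than a series of covariate-by-covariate \(t\)-tests. The sketch below uses simulated data; the covariates `X1` and `X2` and the sample size are assumptions made for illustration.

```r
set.seed(1)
n <- 200

# Hypothetical data: two covariates and a complete random assignment
df <- data.frame(
  X1 = rnorm(n),
  X2 = rnorm(n),
  Z  = sample(rep(0:1, each = n / 2))
)

# Regress assignment on covariates; the overall F statistic tests
# whether the covariates jointly predict treatment
fit <- lm(Z ~ X1 + X2, data = df)
f <- summary(fit)$fstatistic
p_value <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)
p_value
```

A small p-value here would be a reason to check how the assignment was implemented. It says nothing about whether a correctly implemented randomization "worked".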
1.2 Cluster Randomization
1.2.1 Cluster Randomization
Simply place units into groups (clusters) and then randomly assign the groups to treatment and control.
All units in a given group get the same treatment
Note: clusters are part of your design, not part of the world.
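A minimal base-R sketch of cluster assignment, assuming 12 units in 4 hypothetical clusters of 3, with 2 clusters treated:

```r
set.seed(1)

# Hypothetical data: 12 units grouped into 4 clusters
df <- data.frame(id = 1:12, cluster = rep(1:4, each = 3))

# Randomize at the cluster level, then pass the assignment down to units
treated_clusters <- sample(unique(df$cluster), 2)
df$Z <- as.integer(df$cluster %in% treated_clusters)

# Every cluster is either fully treated or fully untreated
table(df$cluster, df$Z)
```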
1.2.2 Cluster Randomization
Often used if the intervention has to function at the cluster level or if the outcome is defined at the cluster level.
Disadvantage: loss of statistical power.
However: it is perfectly possible to assign some treatments at the cluster level and other treatments at the individual level.
Principle: (unless you are worried about spillovers) generally make clusters as small as possible
Principle: Surprisingly, variability in cluster size makes analysis harder. Try to control assignment so that cluster sizes are similar in treatment and in control.
Be clear about whether you believe effects are operating at the cluster level or at the individual level. This matters for power calculations.
Be clear about whether spillover effects operate only within clusters or also across them. If within only you might be able to interpret treatment as the effect of being in a treated cluster…
1.2.3 Cluster Randomization: Block by cluster size
Surprisingly, if clusters are of different sizes the difference in means estimator is not unbiased, even if all units are assigned to treatment with the same probability.
Here’s the intuition. Say there are two clusters, each with homogeneous treatment effects:

| Cluster | Size | Y0 | Y1 |
|---|---|---|---|
| 1 | 1000000 | 0 | 1 |
| 2 | 1 | 0 | 0 |
Then: What is the true average treatment effect? What do you expect to estimate from cluster random assignment?
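The two questions can be worked through directly. The sketch below assumes one of the two clusters is assigned to treatment with equal probability and that the estimator is the difference in means across individuals.

```r
# The two clusters from the example above
clusters <- data.frame(size = c(1e6, 1), Y0 = c(0, 0), Y1 = c(1, 0))

# True average treatment effect, averaging over all individuals
true_ate <- with(clusters, sum(size * (Y1 - Y0)) / sum(size))

# Difference in means under each of the two possible assignments
est_if_1_treated <- clusters$Y1[1] - clusters$Y0[2]  # cluster 1 treated
est_if_2_treated <- clusters$Y1[2] - clusters$Y0[1]  # cluster 2 treated

# Expected estimate when each assignment is equally likely
expected_estimate <- mean(c(est_if_1_treated, est_if_2_treated))

c(true_ate = true_ate, expected_estimate = expected_estimate)
```

The true effect is very close to 1, but the estimator returns 1 or 0 with equal probability, so its expectation is 0.5.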
1.3 Blocked assignments and other restricted randomizations
1.3.1 Blocking
There are more or less efficient ways to randomize.
Randomization helps ensure good balance on all covariates (observed and unobserved) in expectation.
But balance may not be so great in realization
Blocking can help ensure balance ex post on observables
1.3.2 Blocking
Consider a case with four units and two strata. There are 6 possible assignments of 2 units to treatment:
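| ID | X | Y(0) | Y(1) | R1 | R2 | R3 | R4 | R5 | R6 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 3 | 2 | 1 | 2 | 0 | 1 | 0 | 1 | 0 | 1 |
| 4 | 2 | 1 | 2 | 0 | 0 | 1 | 0 | 1 | 1 |
| \(\widehat{\tau}\): | | | | 0 | 1 | 1 | 1 | 1 | 2 |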
Even with a constant treatment effect and everything uniform within blocks, there is variance in the estimation of \(\widehat{\tau}\). This can be eliminated by excluding R1 and R6.
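The estimates can be checked by enumerating all six assignments. This is a sketch in base R; the object names are illustrative.

```r
# Potential outcomes and strata for the four units
X  <- c(1, 1, 2, 2)
Y0 <- c(0, 0, 1, 1)
Y1 <- c(1, 1, 2, 2)

# Each column lists the two treated units; column order matches R1 to R6
assignments <- combn(4, 2)

# Difference-in-means estimate under each assignment
tau_hat <- apply(assignments, 2, function(treated) {
  mean(Y1[treated]) - mean(Y0[-treated])
})
tau_hat

# Blocked assignments treat exactly one unit in each stratum (R2 to R5)
blocked <- apply(assignments, 2, function(treated) sum(X[treated] == 1) == 1)
var(tau_hat[blocked])
```

Across all six assignments the estimates range from 0 to 2. Across the four blocked assignments every estimate equals the true effect of 1, so the variance is zero.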
1.3.3 Blocking
Simple blocking in R (5 pairs):
sapply(1:5, function(i) sample(0:1))
| 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 0 |
1.3.4 Of blocks and clusters
1.3.5 Blocking
Blocking is a case of restricted randomization. Although each unit is assigned to treatment with equal probability, the possible assignment profiles are not all equally likely: some are ruled out entirely.
You have to take account of this when doing analysis
There are many other approaches.
Matched Pairs are a particularly fine approach to blocking
You could also randomize and then replace the randomization if you do not like the balance. This sounds tricky (and it is) but it is OK as long as you understand the true lottery process you are employing and incorporate that into analysis
It is even possible to block on covariates for which you don’t have data ex ante, by using methods in which you allocate treatment over time as a function of features of your sample (also tricky)
1.3.6 Other types of restricted randomization
Really you can set whatever criterion you want for your set of treated units to have (e.g., no treated unit beside another treated unit; at least 5 from the north and 10 from the south; guaranteed balance on some continuous variable, etc.).
You just have to be sure that you understand the random process that was used and that you can use it in the analysis stage
But here be dragons
The more complex your design, the more complex your analysis.
In general you should make sure that a given randomization procedure coupled with a given estimation procedure will produce an unbiased estimate. DeclareDesign can help with this.
1.3.7 Challenges with re-randomization
We can see that blocked and clustered assignments are actually types of restricted randomizations: they limit the set of acceptable randomizations to those with good properties
You could therefore implement the equivalent distribution of assignments by specifying an acceptance rule and then re-randomizing until the rule is met.
That’s fine but you would then have to take account of clustering and blocking just as you do when you actually cluster or block
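A minimal sketch of re-randomization, assuming a single covariate `x` and an illustrative rule that accepts a draw only if the treatment and control means of `x` differ by less than 0.2. The covariate, the threshold, and the function names are all assumptions.

```r
set.seed(1)
n <- 20
x <- rnorm(n)  # hypothetical pre-treatment covariate

# Acceptance rule: covariate means are close across arms
balanced <- function(Z) abs(mean(x[Z == 1]) - mean(x[Z == 0])) < 0.2

# Redraw complete assignments until one satisfies the rule
draw_assignment <- function() {
  repeat {
    Z <- sample(rep(0:1, each = n / 2))
    if (balanced(Z)) return(Z)
  }
}

Z <- draw_assignment()

# Simulate the actual lottery; these draws are what the analysis
# (for example randomization inference) has to be based on
sims <- replicate(500, draw_assignment())
propensities <- rowMeans(sims)
range(propensities)
```

Because this rule is symmetric in treatment and control, individual propensities stay at 0.5 up to simulation error. What the rule changes is which joint assignments are possible, and that is what has to be carried into the analysis.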
1.4 Factorial Designs
1.4.1 Factorial Designs
Often when you set up an experiment you want to look at more than one treatment.
Should you do this or not? How should you use your power?
1.4.2 Factorial Designs
Load up:

| | \(T2=0\) | \(T2=1\) |
|---|---|---|
| T1 = 0 | \(50\%\) | \(0\%\) |
| T1 = 1 | \(50\%\) | \(0\%\) |

Spread out:

| | \(T2=0\) | \(T2=1\) |
|---|---|---|
| T1 = 0 | \(25\%\) | \(25\%\) |
| T1 = 1 | \(25\%\) | \(25\%\) |
1.4.3 Factorial Designs
Three arm it?:

| | \(T2=0\) | \(T2=1\) |
|---|---|---|
| T1 = 0 | \(33.3\%\) | \(33.3\%\) |
| T1 = 1 | \(33.3\%\) | \(0\%\) |

Bunch it?:

| | \(T2=0\) | \(T2=1\) |
|---|---|---|
| T1 = 0 | \(40\%\) | \(20\%\) |
| T1 = 1 | \(20\%\) | \(20\%\) |
1.4.4 Factorial Designs
Surprisingly, adding multiple treatments does not generally eat into your power (unless you are decomposing a complex treatment – then it can. Why?)
Especially when you use a fully crossed design like the “spread out” design above.
Fisher: “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or, ideally, one question, at a time. The writer is convinced that this view is wholly mistaken.”
However – adding multiple treatments does alter the interpretation of your average treatment effects. If T2 is an unusual treatment for example, then half the T1 effect is measured for unusual situations.
This speaks to “spreading out.” Note: the “bunching” example may not pay off and has the undesirable feature of introducing a correlation between treatment assignments.
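The power claim can be illustrated by comparing the sampling variance of the T1 coefficient under “load up” and “spread out”. The sketch assumes a linear model with no interaction and homoskedastic errors, so the variance is proportional to the relevant diagonal entry of \((X'X)^{-1}\). The sample size of 400 is arbitrary.

```r
n <- 400

# Load up: only T1 varies, half of the units are treated
T1_load <- rep(0:1, each = n / 2)
X_load <- cbind(1, T1_load)

# Spread out: fully crossed 2x2 with n/4 units per cell
cells <- expand.grid(T1 = 0:1, T2 = 0:1)
spread <- cells[rep(1:4, each = n / 4), ]
X_spread <- cbind(1, spread$T1, spread$T2)

# Variance factor for the T1 coefficient (multiply by the error variance)
var_T1 <- function(X) solve(crossprod(X))[2, 2]

c(load_up = var_T1(X_load), spread_out = var_T1(X_spread))
```

Both designs give the same factor, 4 / n, under these assumptions: crossing in T2 costs nothing for estimating the T1 main effect. If the model includes the interaction, this no longer holds.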
1.4.5 Factorial Designs
Two ways to do factorial assignments in DeclareDesign:
# Block the second assignment
declare_assignment(Z1 = complete_ra(N)) +
  declare_assignment(Z2 = block_ra(blocks = Z1)) +

# Recode four arms
declare_assignment(Z = complete_ra(N, num_arms = 4)) +
  declare_measurement(Z1 = (Z == "T2" | Z == "T4"),
                      Z2 = (Z == "T3" | Z == "T4"))
1.4.6 Factorial Designs: In practice
In practice if you have a lot of treatments it can be hard to do full factorial designs – there may be too many combinations.
In such cases people use fractional factorial designs, like the one below (5 treatments but only 8 units!)
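| Variation | T1 | T2 | T3 | T4 | T5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 | 1 |
| 4 | 0 | 1 | 1 | 1 | 0 |
| 5 | 1 | 0 | 0 | 1 | 0 |
| 6 | 1 | 0 | 1 | 0 | 1 |
| 7 | 1 | 1 | 0 | 0 | 0 |
| 8 | 1 | 1 | 1 | 1 | 1 |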
1.4.7 Factorial Designs: In practice
Then randomly assign units to rows. Note columns might also be blocking covariates.
In R, look at library(survey)
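A short base-R sketch of what makes the eight-row design usable, and of assigning units to rows. The design matrix is copied from the table above; the 16 units and the object names are assumptions.

```r
# The eight variations from the fractional factorial table
design <- matrix(c(
  0, 0, 0, 1, 1,
  0, 0, 1, 0, 0,
  0, 1, 0, 0, 1,
  0, 1, 1, 1, 0,
  1, 0, 0, 1, 0,
  1, 0, 1, 0, 1,
  1, 1, 0, 0, 0,
  1, 1, 1, 1, 1
), ncol = 5, byrow = TRUE, dimnames = list(NULL, paste0("T", 1:5)))

# Each treatment appears in half of the rows
colMeans(design)

# The treatment columns are pairwise uncorrelated
max_offdiag_cor <- max(abs(cor(design)[upper.tri(diag(5))]))
max_offdiag_cor

# Randomly assign 16 hypothetical units to rows, two units per row
set.seed(1)
row_assignment <- sample(rep(1:8, length.out = 16))
unit_treatments <- design[row_assignment, ]
head(unit_treatments)
```

Balance and orthogonality let each main effect be estimated from only eight combinations, provided interactions are negligible.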
1.4.8 Factorial Designs: In practice
But be careful: you have to be comfortable with possibly not having any simple counterfactual unit for any unit (invoke sparsity-of-effects principle).
Factorial designs are widely used to study multiple treatments in one experiment. While t-tests using a fully-saturated “long” model provide valid inferences, “short” model t-tests (that ignore interactions) yield higher power if interactions are zero, but incorrect inferences otherwise. Of 27 factorial experiments published in top-5 journals (2007–2017), 19 use the short model. After including interactions, over half of their results lose significance. […] (Muralidharan, Romero, and Wüthrich 2023)
1.5 External Validity: Can randomization strategies help?
1.5.1 Principle: Address external validity at the design stage
Anything to be done on randomization to address external validity concerns?
Note 1: There is little or nothing about field experiments that makes the external validity problem greater for these than for other forms of “sample based” research.
Note 2: Studies that use up the available universe (cross-national studies) actually have a distinct external validity problem.
Two ways to think about external validity issues:
1. Are things likely to operate in other units like they operate in these units? (even with the same intervention)
2. Are the processes in operation in this treatment likely to operate in other treatments? (even in this population)
1.5.2 Principle: Address external validity at the design stage
Two ways to think about external validity issues:
1. Are things likely to operate in other units like they operate in these units? (even with the same intervention)
2. Are the processes in operation in this treatment likely to operate in other treatments? (even in this population)
Two approaches for 1.
Try to sample cases and estimate population average treatment effects
Exploit internal variation: block on features that make the case unusual and assess the importance of these (e.g., is the unit poor? Assess how effects differ in poor and wealthy components).
2 is harder and requires a sharp identification of context free primitives, if there are such things.
1.6 Assignments with DeclareDesign
1.6.1 A design: Multilevel data
A design with hierarchical data and different assignment schemes.
Note that subjects are sorted here after the assignment to make it easier to see that in this case blocking ensures that exactly 5 students within each classroom are assigned to treatment.
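The design code and its output are not shown here. A base-R stand-in for the blocked step, assuming 4 hypothetical classrooms of 10 students with exactly 5 treated in each:

```r
set.seed(1)

# Hypothetical hierarchical data: students nested in classrooms
students <- data.frame(
  classroom = rep(1:4, each = 10),
  student   = 1:40
)

# Blocked assignment: within each classroom, permute five 0s and five 1s
students$Z <- ave(
  rep(0, nrow(students)),
  students$classroom,
  FUN = function(z) sample(rep(0:1, each = 5))
)

# Sort after assignment so the within-classroom pattern is easy to see
students <- students[order(students$classroom, -students$Z), ]
table(students$classroom, students$Z)
```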
1.6.10 Clustering
But what if all students in a given class have to be assigned the same treatment?
In many designs you seek to assign an integer number of subjects to treatment from some set.
Sometimes however your assignment targets are not integers.
Example:
I have 12 subjects in four blocks of 3 and I want to assign each subject to treatment with a 50% probability.
Two strategies:
1. I randomly set a target of either 1 or 2 for each block and then do complete assignment in each block. This can result in the number treated varying from 4 to 8.
2. I randomly assign a target of 1 for two blocks and 2 for the other two blocks. Intuition: set a floor for the minimal target and then distribute the residual probability across blocks.
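The second strategy can be sketched in base R. This is not the `probra` implementation; `assign_once` is a hypothetical helper written for this example.

```r
set.seed(1)
blocks <- rep(1:4, each = 3)

# Strategy 2: two blocks get a target of 1, the other two a target of 2,
# then complete assignment within each block
assign_once <- function(blocks) {
  block_ids <- unique(blocks)
  targets <- sample(rep(1:2, each = 2))
  Z <- integer(length(blocks))
  for (i in seq_along(block_ids)) {
    idx <- which(blocks == block_ids[i])
    Z[idx[sample.int(length(idx), targets[i])]] <- 1L
  }
  Z
}

# Simulate the procedure to check totals and unit-level propensities
sims <- replicate(2000, assign_once(blocks))
table(colSums(sims))   # total treated is 6 in every draw
round(rowMeans(sims), 2)
```

Each unit is treated with probability 0.5 × 1/3 + 0.5 × 2/3 = 0.5, and the total number treated is fixed at 6 rather than varying between 4 and 8.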
1.6.18 Nasty integer issues
# remotes::install_github("macartan/probra")
library(probra)
set.seed(1)
blocks <- rep(1:4, each = 3)
table(blocks, prob_ra(blocks = blocks))
blocks 0 1
     1 1 2
     2 1 2
     3 2 1
     4 2 1
table(blocks, block_ra(blocks = blocks))
blocks 0 1
     1 1 2
     2 2 1
     3 1 2
     4 1 2
1.6.19 Nasty integer issues
Can also be used to set targets
# remotes::install_github("macartan/probra")
library(probra)
set.seed(1)
fabricate(N = 4, size = c(47, 53, 87, 25), n_treated = prob_ra(.5 * size)) %>%
  janitor::adorn_totals("row") |>
  kable(caption = "Setting targets to get 50% targets with minimal variance")

Setting targets to get 50% targets with minimal variance

| ID | size | n_treated |
|---|---|---|
| 1 | 47 | 23 |
| 2 | 53 | 27 |
| 3 | 87 | 43 |
| 4 | 25 | 13 |
| Total | 212 | 106 |
1.6.20 Nasty integer issues
Can also be used for complete assignment with heterogeneous propensities
set.seed(1)
df <- fabricate(N = 100, p = seq(.1, .9, length = 100), Z = prob_ra(p))
mean(df$Z)
[1] 0.5
df |> ggplot(aes(p, Z)) + geom_point() + theme_bw()
1.7 Indirect assignments
Indirect control
1.7.1 Indirect assignments
Indirect assignments are generally generated by applying a direct assignment and then figuring out the implied indirect assignment.
Looks better: but there are trade offs between the direct and indirect distributions
Figuring out the optimal procedure requires full diagnosis
Muralidharan, Karthik, Mauricio Romero, and Kaspar Wüthrich. 2023. “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments.” Review of Economics and Statistics, 1–44.