Lectures on causal inference and experimental methods

Macartan Humphreys

1 Roadmap

  1. Intro | Course outline, tools | DeclareDesign
  2. Causality | Fundamental problems and solutions | Inquiries and identification
  3. Estimation and Inference 1: Frequentist
  4. Estimation and Inference 2: Bayesian
  5. Design | Experimental Design | Design evaluation
  6. Topics 1 | Topics and techniques 1
  7. Topics 2 | Topics and techniques 2

1.1 Getting started

  • General aims and structure
  • Expectations
  • Pointers for exercises
  • Quick DeclareDesign intro

1.2 Aims and items

  • Deep understanding of key ideas in causal inference
  • Transportable tools for understanding how to evaluate and improve design
  • Applied skills for design and analysis
  • Exposure to open science practices
  • Deeper dive into some specific topics (see survey)

1.2.1 Syllabus and resources

1.2.2 The topics: Fundamentals

Day 1: Intro

Day 2: Causality

1.2.3 Estimation, Inference, and Design

Day 3: Estimation and Inference 1

Day 4: Estimation and Inference 2

Day 5: Design

1.2.4 Topics

Day 6: Topics 1

Day 7: Topics 2

1.3 Responsibilities

1.3.1 Expectations

  • 5 tasks
  • (Required) Work in four “exercise teams”: 1 team (and typically 2 exercises) per session \(\times 4\)
  • (Optional) Prepare a research design or short paper, perhaps building on existing work. Typically this contains:
    • a problem statement
    • a description of a method to address the problem
    • analytic or simulation based results describing properties of the solution
    • a discussion of implications for practice.

A passing paper will illustrate subtle features of a method; a good paper will identify unknown properties of a method; an excellent paper will develop a new method.

  • Plus general reading and participation.

1.3.2 Exercise team job

Teams should prepare 15 - 20 minute presentations on set puzzles. Typically the task is to:

  • Take a puzzle

  • Declare and diagnose a design that shows the issue under study (e.g. some estimator produces unbiased estimates under some condition)

  • Modify the design to show behavior when conditions are violated

  • Share a report with the class, ideally as a self-contained document for easy third-party viewing, e.g. .html rendered from .qmd or .Rmd

  • Presentations should be about 10 minutes for a given puzzle.

1.4 Tips

1.4.1 Good coding rules

  • Metadata first
  • Call packages at the beginning: use pacman
  • Put options at the top
  • Call all data files once, at the top. Best to call directly from a public archive, when possible.
  • Use functions and define them at the top: comment them; useful sometimes to illustrate what they do
  • Replicate first, re-analyze second. Use sections.
  • (For replications) Have subsections named after specific tables, figures or analyses

1.4.3 Aim

  • First best: if someone has access to your .Rmd/.qmd file, they can hit render or compile and the whole thing reproduces first time. So nothing local, everything relative: please do not include hardcoded paths to your computer.

  • But often you need ancillary files for data and code. That’s OK, but the aim should still be that, given a self-contained folder, someone can open a main.Rmd file, hit compile, and get everything. I usually have an input and an output subfolder.
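A minimal sketch of the "everything relative" idea, in base R. The folder names input and output follow the convention mentioned above; the helper names path_in and path_out and the file names are hypothetical and only for illustration.

```r
# Assumed layout (hypothetical): main.Rmd sits next to an input/ folder
# (raw data, never edited) and an output/ folder (generated results).
in_dir  <- "input"
out_dir <- "output"

# Helpers that build paths relative to the project folder,
# so no path points at a particular computer
path_in  <- function(file) file.path(in_dir, file)
path_out <- function(file) file.path(out_dir, file)

# Example use (file names are placeholders):
# df <- read.csv(path_in("survey.csv"))
# write.csv(results, path_out("table_1.csv"), row.names = FALSE)

path_in("survey.csv")
```

Because every path is built from the project folder, the same file should render on any machine that has the folder, wherever the folder happens to live.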

1.4.4 Collaborative coding / writing

  • Do not get in the business of passing attachments around
  • Documents in some cloud: git, osf, Dropbox, Drive, Nextcloud
  • General rule: only post non sensitive, non proprietary material
  • Share self contained folders; folders contain a small set of live documents plus an archive. Old versions of documents are in archive. Only one version of the most recent document is in a main folder.
  • Data is in a self contained folder (in) and is never edited directly
  • Push updates to GitHub frequently

2 DeclareDesign

How to define and assess research designs

2.1 Roadmap

  1. The MIDA framework and the declaration-diagnosis-redesign cycle
  2. DeclareDesign: key resources
  3. The Declare-Diagnose-Redesign life cycle
  4. Using designs
  5. Hands-on declaration and diagnosis
  6. Illustration using power calculations
  7. A deeper dive into declaration functionality

2.2 The MIDA Framework

2.2.1 Four elements of any research design

  • Model: set of models of what causes what and how
  • Inquiry: a question stated in terms of the model
  • Data strategy: the set of procedures we use to gather information from the world (sampling, assignment, measurement)
  • Answer strategy: how we summarize the data produced by the data strategy

2.2.3 Declaration

Design declaration is telling the computer (and readers) what M, I, D, and A are.
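As an illustration, here is a sketch of a declaration for a simple two-arm experiment, with each step labelled M, I, D, or A. The sample size (100), the effect size (0.25), and the variable names U, Z, and Y are assumptions chosen for the example, not part of the course material.

```r
library(DeclareDesign)

two_arm_design <-
  # M: a model with noise U and potential outcomes that depend on treatment Z
  declare_model(N = 100, U = rnorm(N),
                potential_outcomes(Y ~ 0.25 * Z + U)) +
  # I: the average treatment effect, defined on the potential outcomes
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # D: assign half the units to treatment, then measure the realized outcome
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # A: regress the outcome on treatment to estimate the ATE
  declare_estimator(Y ~ Z, inquiry = "ATE")

two_arm_design
```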

2.2.4 Diagnosis

  • Design diagnosis is figuring out how the design will perform under imagined conditions.

  • Estimating “diagnosands” like power, bias, rmse, error rates, ethical harm, “amount learned”.

  • Diagnosis takes account of model uncertainty: it aims to identify models for which the design works well and models for which it does not
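A sketch of what a diagnosis can look like in code, using a deliberately tiny design (100 draws from a standard normal, with the mean as the inquiry). The choice of diagnosands and the number of simulations (200) are assumptions for illustration only.

```r
library(DeclareDesign)

design <-
  declare_model(N = 100, Y = rnorm(N, mean = 0)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1, inquiry = "Q")

# Diagnosands are summaries of the simulated estimates and estimands
diagnosands <- declare_diagnosands(
  bias  = mean(estimate - estimand),
  rmse  = sqrt(mean((estimate - estimand)^2)),
  # the true mean is 0 here, so this rejection rate is a false positive rate
  power = mean(p.value <= 0.05)
)

set.seed(1)
diagnosis <- diagnose_design(design, diagnosands = diagnosands, sims = 200)
diagnosis
```

Changing the mean in the model step and diagnosing again is one simple way to see how the same design performs under a different model.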

2.2.5 Redesign

Redesign is the fine-tuning of features of the data and answer strategies to understand how changing them affects the diagnosands:

  • Different sample sizes
  • Different randomization procedures
  • Different estimation strategies
  • Implementation: effort into compliance versus more effort into sample size
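A sketch of redesign over sample sizes. The two-arm design, the effect size (0.25), and the candidate values of N are assumptions for illustration. N is declared as a variable before the design so that redesign() has a parameter to change.

```r
library(DeclareDesign)

N <- 100  # a design parameter that redesign() can later overwrite

design <-
  declare_model(N = N, U = rnorm(N),
                potential_outcomes(Y ~ 0.25 * Z + U)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")

# One design per candidate sample size
designs <- redesign(design, N = c(100, 400))

# Diagnose all variants side by side
set.seed(2)
diagnose_designs(designs, sims = 200)
```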

2.2.6 Very often you have to simulate!

  • Doing all this is often too hard to work out from rules of thumb or power calculators
  • Specialized formulas exist for some diagnosands, but not all
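To make the logic of simulation concrete, here is a base R sketch that estimates power for a two-arm experiment by brute force. The sample size, effect size, and number of simulations are assumptions for illustration, and one_run is a hypothetical helper name.

```r
set.seed(3)

# One simulated experiment: returns the p-value for the treatment effect
one_run <- function(N = 100, effect = 0.25) {
  Z <- sample(rep(c(0, 1), length.out = N))  # complete random assignment
  Y <- effect * Z + rnorm(N)                 # outcomes under the assumed model
  summary(lm(Y ~ Z))$coefficients["Z", "Pr(>|t|)"]
}

# Repeat many times; power is the share of runs with p <= 0.05
p_values <- replicate(1000, one_run())
power <- mean(p_values <= 0.05)
power
```

The same loop can report any other diagnosand (bias, rmse, coverage) by returning different quantities from each run.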

2.3 DeclareDesign: Overview of key functions and resources

2.3.1 Key commands for making a design

  • declare_model()

  • declare_inquiry()

  • declare_sampling()

  • declare_assignment()

  • declare_measurement()

  • declare_estimator()

and there are more declare_ functions!

2.3.2 Key commands for using a design

  • draw_data(design)
  • draw_estimands(design)
  • draw_estimates(design)
  • get_estimates(design, data)
  • run_design(design), simulate_design(design)
  • diagnose_design(design)
  • redesign(design, N = 200)
  • compare_designs(), compare_diagnoses()
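A short sketch of some of these commands applied to a tiny design. The design itself is an assumption chosen to keep the example small.

```r
library(DeclareDesign)

design <-
  declare_model(N = 100, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1, inquiry = "Q")

set.seed(4)
dat       <- draw_data(design)       # one simulated dataset
estimands <- draw_estimands(design)  # the value of the inquiry
estimates <- draw_estimates(design)  # estimates from a fresh draw of data
from_data <- get_estimates(design, data = dat)  # estimates from supplied data
one_run   <- run_design(design)      # estimands and estimates from one run

head(dat)
estimands
estimates
```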

2.3.3 Pipeable commands

design |> 
  redesign(N = c(200, 400)) |>
  diagnose_designs() |> 
  tidy()

2.3.4 Cheat sheet


2.3.5 Other resources

  • Full slide deck: https://macartan.github.io/dd_bootcamp/
  • The website: https://declaredesign.org/
  • The book: https://book.declaredesign.org
  • The console: ?DeclareDesign

2.4 Design declaration-diagnosis-redesign workflow: Design

2.4.1 The simplest possible (diagnosable) design?

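library(DeclareDesign)

mean <- 0  # population mean, used by the model and the inquiry
simplest_design <- 
  declare_model(N = 100, Y = rnorm(N, mean)) +  # model: 100 draws from N(mean, 1)
  declare_inquiry(Q = mean) +                   # inquiry: the population mean
  declare_estimator(Y ~ 1)                      # answer: intercept-only regression
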
  • we draw 100 units from a standard normal distribution
  • we define our inquiry as the population expectation
  • we estimate the average using a regression with a constant term

2.4.2 The simplest possible design?

simplest_design <- 
  declare_model(N = 100, Y = rnorm(N, mean)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

  • This design has three steps, with steps connected by a +
  • The design itself is just a list of steps and has class design

List of 3
 $ model    :design_step:    declare_model(N = 100, Y = rnorm(N, mean)) 
 $ Q        :design_step:    declare_inquiry(Q = 0) 
 $ estimator:design_step:    declare_estimator(Y ~ 1) 
 - attr(*, "call")= language construct_design(steps = steps)
 - attr(*, "class")= chr [1:2] "design" "dd"

2.4.3 The simplest possible design? It’s a pipe

Each step is a function (or rather: a function that generates functions) and each function presupposes what is created by previous functions.

  • The ordering of steps is quite important
  • Most steps take the main data frame in and push the main dataframe out; this data frame normally builds up as you move along the pipe.

  • declare_estimator steps take the main data frame in and send out an estimator_df dataframe
  • declare_inquiry steps take the main data frame in and send out an estimand_df dataframe.
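A small sketch of the "function that generates functions" point: each declare_ call returns a function, and these functions can be called outside any design. The object names model_step and estimator_step and the tiny sample size are assumptions for illustration.

```r
library(DeclareDesign)

# Each declare_* call returns a step, and a step is itself a function
model_step     <- declare_model(N = 5, Y = rnorm(N))
estimator_step <- declare_estimator(Y ~ 1)

is.function(model_step)

# The model step takes no data and returns a data frame
df <- model_step()
df

# The estimator step takes that data frame and returns estimates
estimator_step(df)
```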

2.4.5 The simplest possible design? It’s a pipe

  • You can run these functions one at a time if you like.
  • For instance the third step presupposes the data from the first step:
df <- simplest_design[[1]]()
A  <- simplest_design[[3]](df)

A |> kable(digits = 2) |> kable_styling(font_size = 20)

estimator  term         estimate  std.error  statistic  p.value  conf.low  conf.high  df  outcome
estimator  (Intercept)  -0.1      0.09       -1.2       0.23     -0.27     0.07       99  Y

Estimand  <- simplest_design[[2]](df)

Estimand |> kable(digits = 2) |> kable_styling(font_size = 20)

inquiry  estimand
Q        0

2.4.6 The simplest possible design? Run it once

You can also just run through the whole design once by typing the name of the design:

simplest_design

Research design declaration summary

Step 1 (model): declare_model(N = 100, Y = rnorm(N, mean)) ---------------------

Step 2 (inquiry): declare_inquiry(Q = 0) ---------------------------------------

Step 3 (estimator): declare_estimator(Y ~ 1) -----------------------------------

Run of the design:
 inquiry estimand estimator        term estimate std.error statistic p.value
       Q        0 estimator (Intercept)    0.155     0.108      1.43   0.155
 conf.low conf.high df outcome
  -0.0597     0.371 99       Y

2.4.7 The simplest possible design? Run it again

Or by asking for a run of the design

one_run <- simplest_design |> run_design()
one_run |> kable(digits = 2) |> kable_styling(font_size = 18)
inquiry estimand estimator term estimate std.error statistic p.value conf.low conf.high df outcome
Q 0 estimator (Intercept) 0.08 0.1 0.8 0.43 -0.12 0.28 99 Y

A single run creates data, calculates estimands (the answer to inquiries) and calculates estimates plus ancillary statistics.

2.4.8 The simplest possible design?: Simulation

Or by asking for many runs of the design:

some_runs <- simplest_design |> simulate_design(sims = 1000)
some_runs |> kable(digits = 2) |> kable_styling(font_size = 16)
design sim_ID inquiry estimand estimator term estimate std.error statistic p.value conf.low conf.high df outcome
simplest_design 1 Q 0 estimator (Intercept) 0.03 0.09 0.32 0.75 -0.15 0.21 99 Y
simplest_design 2 Q 0 estimator (Intercept) 0.00 0.10 0.03 0.98 -0.19 0.20 99 Y
simplest_design 3 Q 0 estimator (Intercept) -0.13 0.09 -1.46 0.15 -0.31 0.05 99 Y
simplest_design 4 Q 0 estimator (Intercept) -0.14 0.10 -1.36 0.18 -0.35 0.06 99 Y
simplest_design 5 Q 0 estimator (Intercept) 0.02 0.12 0.16 0.88 -0.22 0.26 99 Y
simplest_design 6 Q 0 estimator (Intercept) -0.07 0.09 -0.69 0.49 -0.25 0.12 99 Y
simplest_design 7 Q 0 estimator (Intercept) -0.09 0.09 -0.99 0.33 -0.28 0.09 99 Y
simplest_design 8 Q 0 estimator (Intercept) -0.10 0.09 -1.16 0.25 -0.27 0.07 99 Y
simplest_design 9 Q 0 estimator (Intercept) -0.01 0.10 -0.12 0.90 -0.22 0.19 99 Y
simplest_design 10 Q 0 estimator (Intercept) 0.02 0.11 0.14 0.89 -0.21 0.24 99 Y
simplest_design 11 Q 0 estimator (Intercept) -0.06 0.09 -0.70 0.49 -0.25 0.12 99 Y
simplest_design 12 Q 0 estimator (Intercept) 0.08 0.10 0.83 0.41 -0.11 0.28 99 Y
simplest_design 13 Q 0 estimator (Intercept) -0.26 0.10 -2.52 0.01 -0.46 -0.06 99 Y
simplest_design 14 Q 0 estimator (Intercept) 0.08 0.09 0.81 0.42 -0.11 0.26 99 Y
simplest_design 15 Q 0 estimator (Intercept) 0.07 0.09 0.72 0.47 -0.12 0.25 99 Y
simplest_design 16 Q 0 estimator (Intercept) 0.15 0.10 1.49 0.14 -0.05 0.35 99 Y
simplest_design 17 Q 0 estimator (Intercept) -0.14 0.09 -1.54 0.13 -0.32 0.04 99 Y
simplest_design 18 Q 0 estimator (Intercept) -0.10 0.10 -0.98 0.33 -0.30 0.10 99 Y
simplest_design 19 Q 0 estimator (Intercept) 0.06 0.09 0.62 0.54 -0.13 0.24 99 Y
simplest_design 20 Q 0 estimator (Intercept) 0.03 0.09 0.29 0.77 -0.15 0.20 99 Y
simplest_design 21 Q 0 estimator (Intercept) -0.08 0.09 -0.93 0.35 -0.27 0.10 99 Y
simplest_design 22 Q 0 estimator (Intercept) -0.09 0.11 -0.81 0.42 -0.29 0.12 99 Y
simplest_design 23 Q 0 estimator (Intercept) -0.11 0.10 -1.06 0.29 -0.31 0.10 99 Y
simplest_design 24 Q 0 estimator (Intercept) 0.10 0.09 1.07 0.29 -0.08 0.28 99 Y
simplest_design 25 Q 0 estimator (Intercept) 0.01 0.09 0.16 0.88 -0.17 0.20 99 Y
simplest_design 26 Q 0 estimator (Intercept) -0.03 0.10 -0.27 0.79 -0.23 0.18 99 Y
simplest_design 27 Q 0 estimator (Intercept) 0.22 0.10 2.17 0.03 0.02 0.42 99 Y
simplest_design 28 Q 0 estimator (Intercept) -0.35 0.10 -3.35 0.00 -0.55 -0.14 99 Y
simplest_design 29 Q 0 estimator (Intercept) -0.04 0.09 -0.39 0.69 -0.22 0.15 99 Y
simplest_design 30 Q 0 estimator (Intercept) 0.02 0.10 0.25 0.81 -0.17 0.22 99 Y
simplest_design 31 Q 0 estimator (Intercept) 0.21 0.10 2.05 0.04 0.01 0.40 99 Y
simplest_design 32 Q 0 estimator (Intercept) -0.01 0.10 -0.09 0.93 -0.20 0.18 99 Y
simplest_design 33 Q 0 estimator (Intercept) 0.15 0.09 1.64 0.10 -0.03 0.34 99 Y
simplest_design 34 Q 0 estimator (Intercept) -0.04 0.10 -0.42 0.68 -0.25 0.16 99 Y
simplest_design 35 Q 0 estimator (Intercept) 0.05 0.09 0.53 0.60 -0.13 0.23 99 Y
simplest_design 36 Q 0 estimator (Intercept) -0.15 0.11 -1.35 0.18 -0.36 0.07 99 Y
simplest_design 37 Q 0 estimator (Intercept) 0.14 0.11 1.27 0.21 -0.08 0.35 99 Y
simplest_design 38 Q 0 estimator (Intercept) 0.03 0.10 0.26 0.80 -0.18 0.23 99 Y
simplest_design 39 Q 0 estimator (Intercept) 0.22 0.09 2.38 0.02 0.04 0.40 99 Y
simplest_design 40 Q 0 estimator (Intercept) -0.05 0.11 -0.43 0.67 -0.26 0.17 99 Y
simplest_design 41 Q 0 estimator (Intercept) -0.01 0.10 -0.06 0.95 -0.19 0.18 99 Y
simplest_design 42 Q 0 estimator (Intercept) -0.01 0.10 -0.14 0.89 -0.21 0.18 99 Y
simplest_design 43 Q 0 estimator (Intercept) -0.12 0.10 -1.23 0.22 -0.33 0.08 99 Y
simplest_design 44 Q 0 estimator (Intercept) -0.01 0.10 -0.05 0.96 -0.20 0.19 99 Y
simplest_design 45 Q 0 estimator (Intercept) 0.05 0.10 0.48 0.63 -0.15 0.24 99 Y
simplest_design 46 Q 0 estimator (Intercept) 0.04 0.11 0.39 0.70 -0.17 0.26 99 Y
simplest_design 47 Q 0 estimator (Intercept) 0.06 0.11 0.58 0.56 -0.15 0.28 99 Y
simplest_design 48 Q 0 estimator (Intercept) -0.16 0.10 -1.52 0.13 -0.36 0.05 99 Y
simplest_design 49 Q 0 estimator (Intercept) 0.03 0.10 0.28 0.78 -0.17 0.22 99 Y
simplest_design 50 Q 0 estimator (Intercept) -0.08 0.10 -0.80 0.42 -0.28 0.12 99 Y
simplest_design 51 Q 0 estimator (Intercept) -0.07 0.11 -0.64 0.52 -0.29 0.15 99 Y
simplest_design 52 Q 0 estimator (Intercept) 0.09 0.11 0.80 0.43 -0.13 0.30 99 Y
simplest_design 53 Q 0 estimator (Intercept) -0.10 0.11 -0.96 0.34 -0.31 0.11 99 Y
simplest_design 54 Q 0 estimator (Intercept) -0.04 0.09 -0.42 0.68 -0.22 0.14 99 Y
simplest_design 55 Q 0 estimator (Intercept) 0.12 0.10 1.17 0.25 -0.08 0.31 99 Y
simplest_design 56 Q 0 estimator (Intercept) -0.02 0.11 -0.16 0.88 -0.23 0.19 99 Y
simplest_design 57 Q 0 estimator (Intercept) -0.11 0.10 -1.12 0.26 -0.31 0.09 99 Y
simplest_design 58 Q 0 estimator (Intercept) 0.12 0.11 1.06 0.29 -0.10 0.33 99 Y
simplest_design 59 Q 0 estimator (Intercept) 0.17 0.10 1.67 0.10 -0.03 0.37 99 Y
simplest_design 60 Q 0 estimator (Intercept) -0.10 0.09 -1.15 0.25 -0.28 0.08 99 Y
simplest_design 61 Q 0 estimator (Intercept) 0.16 0.10 1.68 0.10 -0.03 0.35 99 Y
simplest_design 62 Q 0 estimator (Intercept) 0.08 0.10 0.84 0.40 -0.11 0.27 99 Y
simplest_design 63 Q 0 estimator (Intercept) -0.05 0.10 -0.45 0.66 -0.25 0.16 99 Y
simplest_design 64 Q 0 estimator (Intercept) 0.17 0.11 1.58 0.12 -0.04 0.38 99 Y
simplest_design 65 Q 0 estimator (Intercept) -0.14 0.11 -1.26 0.21 -0.36 0.08 99 Y
simplest_design 66 Q 0 estimator (Intercept) 0.06 0.10 0.61 0.54 -0.13 0.25 99 Y
simplest_design 67 Q 0 estimator (Intercept) -0.05 0.10 -0.52 0.60 -0.24 0.14 99 Y
simplest_design 68 Q 0 estimator (Intercept) -0.08 0.09 -0.83 0.41 -0.26 0.11 99 Y
simplest_design 69 Q 0 estimator (Intercept) 0.04 0.10 0.42 0.68 -0.16 0.24 99 Y
simplest_design 70 Q 0 estimator (Intercept) 0.00 0.09 0.02 0.99 -0.17 0.18 99 Y
simplest_design 71 Q 0 estimator (Intercept) -0.05 0.09 -0.49 0.63 -0.23 0.14 99 Y
simplest_design 72 Q 0 estimator (Intercept) 0.13 0.11 1.14 0.26 -0.09 0.35 99 Y
simplest_design 73 Q 0 estimator (Intercept) 0.03 0.09 0.38 0.71 -0.15 0.21 99 Y
simplest_design 74 Q 0 estimator (Intercept) 0.05 0.10 0.45 0.66 -0.16 0.25 99 Y
simplest_design 75 Q 0 estimator (Intercept) -0.06 0.10 -0.56 0.58 -0.26 0.15 99 Y
simplest_design 76 Q 0 estimator (Intercept) -0.15 0.10 -1.54 0.13 -0.34 0.04 99 Y
simplest_design 77 Q 0 estimator (Intercept) -0.06 0.11 -0.58 0.56 -0.28 0.15 99 Y
simplest_design 78 Q 0 estimator (Intercept) -0.02 0.09 -0.25 0.80 -0.20 0.16 99 Y
simplest_design 79 Q 0 estimator (Intercept) 0.12 0.11 1.02 0.31 -0.11 0.34 99 Y
simplest_design 80 Q 0 estimator (Intercept) -0.07 0.11 -0.68 0.50 -0.28 0.14 99 Y
simplest_design 81 Q 0 estimator (Intercept) -0.02 0.09 -0.21 0.84 -0.20 0.17 99 Y
simplest_design 82 Q 0 estimator (Intercept) 0.00 0.11 -0.04 0.97 -0.22 0.21 99 Y
simplest_design 83 Q 0 estimator (Intercept) -0.11 0.11 -1.03 0.31 -0.33 0.11 99 Y
simplest_design 84 Q 0 estimator (Intercept) -0.11 0.11 -1.00 0.32 -0.33 0.11 99 Y
simplest_design 85 Q 0 estimator (Intercept) -0.02 0.11 -0.18 0.86 -0.24 0.20 99 Y
simplest_design 86 Q 0 estimator (Intercept) 0.17 0.08 2.00 0.05 0.00 0.33 99 Y
simplest_design 87 Q 0 estimator (Intercept) -0.17 0.11 -1.55 0.13 -0.38 0.05 99 Y
simplest_design 88 Q 0 estimator (Intercept) -0.23 0.11 -2.02 0.05 -0.45 0.00 99 Y
simplest_design 89 Q 0 estimator (Intercept) -0.06 0.09 -0.62 0.54 -0.23 0.12 99 Y
simplest_design 90 Q 0 estimator (Intercept) 0.16 0.09 1.78 0.08 -0.02 0.33 99 Y
simplest_design 91 Q 0 estimator (Intercept) 0.12 0.10 1.18 0.24 -0.08 0.33 99 Y
simplest_design 92 Q 0 estimator (Intercept) 0.12 0.10 1.22 0.22 -0.08 0.33 99 Y
simplest_design 93 Q 0 estimator (Intercept) -0.06 0.09 -0.69 0.49 -0.24 0.12 99 Y
simplest_design 94 Q 0 estimator (Intercept) 0.10 0.10 1.03 0.30 -0.09 0.29 99 Y
simplest_design 95 Q 0 estimator (Intercept) 0.10 0.10 0.98 0.33 -0.10 0.31 99 Y
simplest_design 96 Q 0 estimator (Intercept) -0.21 0.10 -2.16 0.03 -0.40 -0.02 99 Y
simplest_design 97 Q 0 estimator (Intercept) 0.00 0.10 0.01 1.00 -0.20 0.20 99 Y
simplest_design 98 Q 0 estimator (Intercept) 0.03 0.09 0.37 0.72 -0.14 0.21 99 Y
simplest_design 99 Q 0 estimator (Intercept) -0.01 0.10 -0.09 0.93 -0.21 0.19 99 Y
simplest_design 100 Q 0 estimator (Intercept) -0.03 0.09 -0.35 0.73 -0.22 0.15 99 Y
simplest_design 101 Q 0 estimator (Intercept) -0.15 0.11 -1.38 0.17 -0.36 0.06 99 Y
simplest_design 102 Q 0 estimator (Intercept) 0.06 0.09 0.63 0.53 -0.13 0.24 99 Y
simplest_design 103 Q 0 estimator (Intercept) 0.00 0.10 -0.02 0.98 -0.21 0.20 99 Y
simplest_design 104 Q 0 estimator (Intercept) -0.24 0.10 -2.29 0.02 -0.44 -0.03 99 Y
simplest_design 105 Q 0 estimator (Intercept) -0.05 0.11 -0.51 0.61 -0.27 0.16 99 Y
simplest_design 106 Q 0 estimator (Intercept) -0.16 0.09 -1.71 0.09 -0.35 0.03 99 Y
simplest_design 107 Q 0 estimator (Intercept) -0.11 0.10 -1.11 0.27 -0.31 0.09 99 Y
simplest_design 108 Q 0 estimator (Intercept) 0.02 0.11 0.18 0.86 -0.19 0.23 99 Y
simplest_design 109 Q 0 estimator (Intercept) 0.07 0.09 0.77 0.45 -0.11 0.26 99 Y
simplest_design 110 Q 0 estimator (Intercept) 0.03 0.10 0.30 0.77 -0.17 0.23 99 Y
simplest_design 111 Q 0 estimator (Intercept) 0.25 0.09 2.82 0.01 0.08 0.43 99 Y
simplest_design 112 Q 0 estimator (Intercept) 0.02 0.10 0.25 0.80 -0.17 0.22 99 Y
simplest_design 113 Q 0 estimator (Intercept) 0.01 0.11 0.13 0.90 -0.21 0.24 99 Y
simplest_design 114 Q 0 estimator (Intercept) -0.05 0.10 -0.54 0.59 -0.25 0.14 99 Y
simplest_design 115 Q 0 estimator (Intercept) -0.17 0.09 -1.87 0.06 -0.35 0.01 99 Y
simplest_design 116 Q 0 estimator (Intercept) 0.02 0.10 0.21 0.83 -0.18 0.22 99 Y
simplest_design 117 Q 0 estimator (Intercept) -0.10 0.10 -0.96 0.34 -0.30 0.10 99 Y
simplest_design 118 Q 0 estimator (Intercept) -0.14 0.10 -1.39 0.17 -0.34 0.06 99 Y
simplest_design 119 Q 0 estimator (Intercept) -0.09 0.10 -0.95 0.34 -0.28 0.10 99 Y
simplest_design 120 Q 0 estimator (Intercept) 0.06 0.09 0.70 0.48 -0.12 0.25 99 Y
simplest_design 121 Q 0 estimator (Intercept) 0.01 0.10 0.09 0.93 -0.19 0.20 99 Y
simplest_design 122 Q 0 estimator (Intercept) 0.01 0.10 0.14 0.89 -0.18 0.21 99 Y
simplest_design 123 Q 0 estimator (Intercept) 0.25 0.09 2.68 0.01 0.06 0.44 99 Y
simplest_design 124 Q 0 estimator (Intercept) 0.07 0.10 0.73 0.46 -0.13 0.27 99 Y
simplest_design 125 Q 0 estimator (Intercept) -0.14 0.10 -1.46 0.15 -0.33 0.05 99 Y
simplest_design 126 Q 0 estimator (Intercept) -0.07 0.10 -0.73 0.47 -0.27 0.13 99 Y
simplest_design 127 Q 0 estimator (Intercept) -0.18 0.10 -1.82 0.07 -0.38 0.02 99 Y
simplest_design 128 Q 0 estimator (Intercept) 0.03 0.10 0.25 0.80 -0.18 0.23 99 Y
simplest_design 129 Q 0 estimator (Intercept) -0.10 0.09 -1.07 0.29 -0.28 0.08 99 Y
simplest_design 130 Q 0 estimator (Intercept) 0.02 0.10 0.23 0.82 -0.17 0.22 99 Y
simplest_design 131 Q 0 estimator (Intercept) 0.00 0.11 -0.03 0.98 -0.22 0.21 99 Y
simplest_design 132 Q 0 estimator (Intercept) -0.05 0.11 -0.49 0.62 -0.26 0.16 99 Y
simplest_design 133 Q 0 estimator (Intercept) 0.17 0.09 1.83 0.07 -0.01 0.35 99 Y
simplest_design 134 Q 0 estimator (Intercept) -0.11 0.10 -1.11 0.27 -0.30 0.08 99 Y
simplest_design 135 Q 0 estimator (Intercept) 0.13 0.10 1.33 0.19 -0.06 0.32 99 Y
simplest_design 136 Q 0 estimator (Intercept) -0.05 0.11 -0.46 0.65 -0.26 0.16 99 Y
simplest_design 137 Q 0 estimator (Intercept) -0.05 0.09 -0.50 0.62 -0.23 0.14 99 Y
simplest_design 138 Q 0 estimator (Intercept) -0.11 0.10 -1.16 0.25 -0.31 0.08 99 Y
simplest_design 139 Q 0 estimator (Intercept) -0.12 0.09 -1.32 0.19 -0.31 0.06 99 Y
simplest_design 140 Q 0 estimator (Intercept) -0.06 0.10 -0.66 0.51 -0.26 0.13 99 Y
simplest_design 141 Q 0 estimator (Intercept) 0.05 0.09 0.64 0.53 -0.12 0.22 99 Y
simplest_design 142 Q 0 estimator (Intercept) 0.00 0.12 -0.04 0.97 -0.24 0.23 99 Y
simplest_design 143 Q 0 estimator (Intercept) 0.04 0.09 0.44 0.66 -0.14 0.22 99 Y
simplest_design 144 Q 0 estimator (Intercept) -0.01 0.11 -0.08 0.94 -0.23 0.21 99 Y
simplest_design 145 Q 0 estimator (Intercept) -0.05 0.12 -0.43 0.67 -0.28 0.18 99 Y
simplest_design 146 Q 0 estimator (Intercept) -0.11 0.10 -1.02 0.31 -0.31 0.10 99 Y
simplest_design 147 Q 0 estimator (Intercept) 0.03 0.10 0.28 0.78 -0.17 0.23 99 Y
simplest_design 148 Q 0 estimator (Intercept) -0.03 0.10 -0.34 0.74 -0.22 0.16 99 Y
simplest_design 149 Q 0 estimator (Intercept) -0.06 0.10 -0.53 0.59 -0.26 0.15 99 Y
simplest_design 150 Q 0 estimator (Intercept) -0.19 0.10 -1.88 0.06 -0.39 0.01 99 Y
simplest_design 151 Q 0 estimator (Intercept) 0.08 0.10 0.77 0.44 -0.12 0.28 99 Y
simplest_design 152 Q 0 estimator (Intercept) 0.02 0.11 0.16 0.88 -0.20 0.23 99 Y
simplest_design 153 Q 0 estimator (Intercept) -0.15 0.09 -1.58 0.12 -0.33 0.04 99 Y
simplest_design 154 Q 0 estimator (Intercept) -0.18 0.09 -1.94 0.06 -0.37 0.00 99 Y
simplest_design 155 Q 0 estimator (Intercept) 0.12 0.11 1.09 0.28 -0.10 0.33 99 Y
simplest_design 156 Q 0 estimator (Intercept) -0.18 0.10 -1.72 0.09 -0.38 0.03 99 Y
simplest_design 157 Q 0 estimator (Intercept) -0.04 0.11 -0.37 0.71 -0.26 0.18 99 Y
simplest_design 158 Q 0 estimator (Intercept) 0.01 0.11 0.07 0.94 -0.21 0.22 99 Y
simplest_design 159 Q 0 estimator (Intercept) -0.08 0.10 -0.89 0.38 -0.27 0.10 99 Y
simplest_design 160 Q 0 estimator (Intercept) -0.19 0.10 -1.92 0.06 -0.39 0.01 99 Y
simplest_design 161 Q 0 estimator (Intercept) 0.17 0.10 1.79 0.08 -0.02 0.36 99 Y
simplest_design 162 Q 0 estimator (Intercept) 0.07 0.10 0.72 0.47 -0.12 0.26 99 Y
simplest_design 163 Q 0 estimator (Intercept) 0.12 0.12 1.04 0.30 -0.11 0.35 99 Y
simplest_design 164 Q 0 estimator (Intercept) 0.13 0.09 1.44 0.15 -0.05 0.32 99 Y
simplest_design 165 Q 0 estimator (Intercept) 0.08 0.09 0.98 0.33 -0.09 0.25 99 Y
simplest_design 166 Q 0 estimator (Intercept) 0.23 0.11 2.12 0.04 0.01 0.44 99 Y
simplest_design 167 Q 0 estimator (Intercept) 0.03 0.10 0.28 0.78 -0.17 0.22 99 Y
simplest_design 168 Q 0 estimator (Intercept) 0.09 0.10 0.88 0.38 -0.12 0.30 99 Y
simplest_design 169 Q 0 estimator (Intercept) 0.04 0.11 0.33 0.74 -0.19 0.26 99 Y
simplest_design 170 Q 0 estimator (Intercept) 0.12 0.09 1.32 0.19 -0.06 0.29 99 Y
simplest_design 171 Q 0 estimator (Intercept) -0.03 0.09 -0.34 0.74 -0.20 0.14 99 Y
simplest_design 172 Q 0 estimator (Intercept) 0.07 0.10 0.69 0.49 -0.13 0.27 99 Y
simplest_design 173 Q 0 estimator (Intercept) -0.06 0.08 -0.73 0.47 -0.23 0.10 99 Y
simplest_design 174 Q 0 estimator (Intercept) 0.01 0.09 0.14 0.89 -0.17 0.20 99 Y
simplest_design 175 Q 0 estimator (Intercept) -0.09 0.10 -0.88 0.38 -0.29 0.11 99 Y
simplest_design 176 Q 0 estimator (Intercept) 0.18 0.11 1.62 0.11 -0.04 0.39 99 Y
simplest_design 177 Q 0 estimator (Intercept) -0.12 0.11 -1.14 0.26 -0.33 0.09 99 Y
simplest_design 178 Q 0 estimator (Intercept) -0.14 0.11 -1.23 0.22 -0.36 0.09 99 Y
simplest_design 179 Q 0 estimator (Intercept) -0.06 0.09 -0.71 0.48 -0.24 0.11 99 Y
simplest_design 180 Q 0 estimator (Intercept) -0.05 0.11 -0.41 0.68 -0.27 0.18 99 Y
simplest_design 181 Q 0 estimator (Intercept) -0.04 0.11 -0.40 0.69 -0.25 0.17 99 Y
simplest_design 182 Q 0 estimator (Intercept) -0.03 0.10 -0.32 0.75 -0.23 0.16 99 Y
simplest_design 183 Q 0 estimator (Intercept) 0.14 0.10 1.37 0.17 -0.06 0.35 99 Y
simplest_design 184 Q 0 estimator (Intercept) 0.01 0.11 0.14 0.89 -0.20 0.23 99 Y
simplest_design 185 Q 0 estimator (Intercept) 0.11 0.11 0.98 0.33 -0.12 0.34 99 Y
simplest_design 186 Q 0 estimator (Intercept) -0.04 0.10 -0.41 0.68 -0.24 0.16 99 Y
simplest_design 187 Q 0 estimator (Intercept) 0.17 0.10 1.63 0.11 -0.04 0.37 99 Y
simplest_design 188 Q 0 estimator (Intercept) 0.05 0.10 0.53 0.60 -0.14 0.24 99 Y
simplest_design 189 Q 0 estimator (Intercept) 0.11 0.11 1.03 0.31 -0.10 0.32 99 Y
simplest_design 190 Q 0 estimator (Intercept) 0.00 0.08 -0.01 0.99 -0.17 0.17 99 Y
simplest_design 191 Q 0 estimator (Intercept) 0.04 0.10 0.44 0.66 -0.15 0.24 99 Y
simplest_design 192 Q 0 estimator (Intercept) 0.17 0.10 1.60 0.11 -0.04 0.37 99 Y
simplest_design 193 Q 0 estimator (Intercept) -0.09 0.09 -1.09 0.28 -0.27 0.08 99 Y
simplest_design 194 Q 0 estimator (Intercept) -0.05 0.09 -0.60 0.55 -0.24 0.13 99 Y
simplest_design 195 Q 0 estimator (Intercept) -0.02 0.10 -0.23 0.82 -0.22 0.17 99 Y
simplest_design 196 Q 0 estimator (Intercept) -0.03 0.11 -0.28 0.78 -0.26 0.19 99 Y
simplest_design 197 Q 0 estimator (Intercept) 0.06 0.11 0.54 0.59 -0.16 0.28 99 Y
simplest_design 198 Q 0 estimator (Intercept) -0.07 0.09 -0.78 0.44 -0.25 0.11 99 Y
simplest_design 199 Q 0 estimator (Intercept) 0.18 0.09 1.94 0.05 0.00 0.37 99 Y
simplest_design 200 Q 0 estimator (Intercept) 0.06 0.09 0.75 0.46 -0.11 0.23 99 Y
simplest_design 201 Q 0 estimator (Intercept) 0.06 0.11 0.53 0.59 -0.15 0.26 99 Y
simplest_design 202 Q 0 estimator (Intercept) 0.03 0.09 0.30 0.77 -0.15 0.20 99 Y
simplest_design 203 Q 0 estimator (Intercept) 0.02 0.09 0.27 0.79 -0.16 0.21 99 Y
simplest_design 204 Q 0 estimator (Intercept) -0.09 0.09 -1.04 0.30 -0.27 0.09 99 Y
simplest_design 205 Q 0 estimator (Intercept) -0.05 0.11 -0.46 0.65 -0.26 0.16 99 Y
simplest_design 206 Q 0 estimator (Intercept) -0.04 0.10 -0.44 0.66 -0.23 0.15 99 Y
simplest_design 207 Q 0 estimator (Intercept) -0.08 0.09 -0.84 0.40 -0.25 0.10 99 Y
simplest_design 208 Q 0 estimator (Intercept) -0.14 0.10 -1.34 0.18 -0.35 0.07 99 Y
simplest_design 209 Q 0 estimator (Intercept) -0.04 0.10 -0.41 0.68 -0.24 0.16 99 Y
simplest_design 210 Q 0 estimator (Intercept) 0.12 0.11 1.02 0.31 -0.11 0.34 99 Y
simplest_design 211 Q 0 estimator (Intercept) -0.02 0.11 -0.19 0.85 -0.24 0.19 99 Y
simplest_design 212 Q 0 estimator (Intercept) 0.00 0.09 0.03 0.98 -0.17 0.18 99 Y
simplest_design 213 Q 0 estimator (Intercept) -0.11 0.10 -1.14 0.26 -0.31 0.08 99 Y
simplest_design 214 Q 0 estimator (Intercept) 0.01 0.09 0.06 0.95 -0.18 0.19 99 Y
simplest_design 215 Q 0 estimator (Intercept) 0.10 0.11 0.91 0.36 -0.12 0.32 99 Y
simplest_design 216 Q 0 estimator (Intercept) -0.02 0.10 -0.19 0.85 -0.22 0.19 99 Y
simplest_design 217 Q 0 estimator (Intercept) -0.06 0.10 -0.54 0.59 -0.26 0.15 99 Y
simplest_design 218 Q 0 estimator (Intercept) -0.16 0.09 -1.77 0.08 -0.33 0.02 99 Y
simplest_design 219 Q 0 estimator (Intercept) 0.05 0.10 0.54 0.59 -0.14 0.25 99 Y
simplest_design 220 Q 0 estimator (Intercept) 0.13 0.09 1.50 0.14 -0.04 0.31 99 Y
simplest_design 221 Q 0 estimator (Intercept) -0.01 0.10 -0.10 0.92 -0.20 0.18 99 Y
simplest_design 222 Q 0 estimator (Intercept) 0.10 0.10 1.05 0.30 -0.09 0.29 99 Y
simplest_design 223 Q 0 estimator (Intercept) -0.08 0.10 -0.79 0.43 -0.28 0.12 99 Y
simplest_design 224 Q 0 estimator (Intercept) -0.13 0.11 -1.17 0.24 -0.35 0.09 99 Y
simplest_design 225 Q 0 estimator (Intercept) 0.13 0.10 1.26 0.21 -0.07 0.33 99 Y
simplest_design 226 Q 0 estimator (Intercept) -0.14 0.11 -1.27 0.21 -0.36 0.08 99 Y
simplest_design 227 Q 0 estimator (Intercept) -0.13 0.10 -1.26 0.21 -0.33 0.07 99 Y
simplest_design 228 Q 0 estimator (Intercept) -0.20 0.11 -1.90 0.06 -0.41 0.01 99 Y
simplest_design 229 Q 0 estimator (Intercept) -0.01 0.09 -0.06 0.96 -0.19 0.18 99 Y
simplest_design 230 Q 0 estimator (Intercept) 0.21 0.10 2.07 0.04 0.01 0.42 99 Y
simplest_design 231 Q 0 estimator (Intercept) -0.21 0.10 -2.03 0.05 -0.42 0.00 99 Y
simplest_design 232 Q 0 estimator (Intercept) 0.12 0.09 1.35 0.18 -0.06 0.30 99 Y
simplest_design 233 Q 0 estimator (Intercept) -0.12 0.11 -1.14 0.26 -0.33 0.09 99 Y
simplest_design 234 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 235 Q 0 estimator (Intercept) 0.01 0.10 0.11 0.91 -0.18 0.20 99 Y
simplest_design 236 Q 0 estimator (Intercept) -0.05 0.09 -0.54 0.59 -0.24 0.14 99 Y
simplest_design 237 Q 0 estimator (Intercept) -0.12 0.10 -1.22 0.22 -0.31 0.07 99 Y
simplest_design 238 Q 0 estimator (Intercept) 0.10 0.09 1.12 0.26 -0.08 0.29 99 Y
simplest_design 239 Q 0 estimator (Intercept) 0.15 0.09 1.68 0.10 -0.03 0.34 99 Y
simplest_design 240 Q 0 estimator (Intercept) 0.04 0.10 0.45 0.65 -0.15 0.23 99 Y
simplest_design 241 Q 0 estimator (Intercept) 0.00 0.11 0.00 1.00 -0.21 0.21 99 Y
simplest_design 242 Q 0 estimator (Intercept) 0.03 0.10 0.33 0.74 -0.17 0.24 99 Y
simplest_design 243 Q 0 estimator (Intercept) -0.14 0.10 -1.44 0.15 -0.34 0.05 99 Y
simplest_design 244 Q 0 estimator (Intercept) 0.10 0.09 1.07 0.29 -0.09 0.29 99 Y
simplest_design 245 Q 0 estimator (Intercept) -0.01 0.10 -0.14 0.89 -0.21 0.18 99 Y
simplest_design 246 Q 0 estimator (Intercept) 0.02 0.10 0.26 0.80 -0.17 0.22 99 Y
simplest_design 247 Q 0 estimator (Intercept) -0.12 0.10 -1.23 0.22 -0.32 0.08 99 Y
simplest_design 248 Q 0 estimator (Intercept) 0.09 0.09 0.95 0.35 -0.10 0.27 99 Y
simplest_design 249 Q 0 estimator (Intercept) 0.03 0.10 0.32 0.75 -0.17 0.24 99 Y
simplest_design 250 Q 0 estimator (Intercept) 0.18 0.09 1.93 0.06 -0.01 0.36 99 Y
simplest_design 251 Q 0 estimator (Intercept) -0.01 0.10 -0.06 0.96 -0.19 0.18 99 Y
simplest_design 252 Q 0 estimator (Intercept) 0.08 0.09 0.92 0.36 -0.09 0.25 99 Y
simplest_design 253 Q 0 estimator (Intercept) 0.01 0.10 0.13 0.90 -0.19 0.22 99 Y
simplest_design 254 Q 0 estimator (Intercept) 0.09 0.09 0.96 0.34 -0.09 0.26 99 Y
simplest_design 255 Q 0 estimator (Intercept) -0.15 0.10 -1.44 0.15 -0.35 0.05 99 Y
simplest_design 256 Q 0 estimator (Intercept) -0.11 0.10 -1.14 0.26 -0.31 0.08 99 Y
simplest_design 257 Q 0 estimator (Intercept) 0.03 0.09 0.33 0.74 -0.15 0.21 99 Y
simplest_design 258 Q 0 estimator (Intercept) -0.04 0.09 -0.40 0.69 -0.23 0.15 99 Y
simplest_design 259 Q 0 estimator (Intercept) 0.22 0.10 2.15 0.03 0.02 0.42 99 Y
simplest_design 260 Q 0 estimator (Intercept) -0.01 0.10 -0.06 0.95 -0.21 0.20 99 Y
simplest_design 261 Q 0 estimator (Intercept) 0.18 0.10 1.75 0.08 -0.02 0.39 99 Y
simplest_design 262 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.85 -0.22 0.18 99 Y
simplest_design 263 Q 0 estimator (Intercept) 0.05 0.10 0.50 0.61 -0.14 0.24 99 Y
simplest_design 264 Q 0 estimator (Intercept) 0.05 0.12 0.45 0.66 -0.18 0.28 99 Y
simplest_design 265 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.84 -0.22 0.18 99 Y
simplest_design 266 Q 0 estimator (Intercept) 0.02 0.11 0.23 0.82 -0.19 0.24 99 Y
simplest_design 267 Q 0 estimator (Intercept) 0.02 0.08 0.19 0.85 -0.15 0.18 99 Y
simplest_design 268 Q 0 estimator (Intercept) 0.01 0.11 0.08 0.94 -0.20 0.22 99 Y
simplest_design 269 Q 0 estimator (Intercept) -0.28 0.10 -2.72 0.01 -0.48 -0.07 99 Y
simplest_design 270 Q 0 estimator (Intercept) 0.08 0.11 0.80 0.42 -0.12 0.29 99 Y
simplest_design 271 Q 0 estimator (Intercept) 0.08 0.09 0.90 0.37 -0.09 0.25 99 Y
simplest_design 272 Q 0 estimator (Intercept) 0.05 0.10 0.47 0.64 -0.15 0.24 99 Y
simplest_design 273 Q 0 estimator (Intercept) 0.09 0.09 0.99 0.32 -0.09 0.27 99 Y
simplest_design 274 Q 0 estimator (Intercept) -0.24 0.11 -2.15 0.03 -0.46 -0.02 99 Y
simplest_design 275 Q 0 estimator (Intercept) 0.05 0.11 0.49 0.62 -0.16 0.27 99 Y
simplest_design 276 Q 0 estimator (Intercept) -0.13 0.10 -1.26 0.21 -0.33 0.07 99 Y
simplest_design 277 Q 0 estimator (Intercept) -0.04 0.09 -0.40 0.69 -0.22 0.14 99 Y
simplest_design 278 Q 0 estimator (Intercept) 0.12 0.10 1.28 0.21 -0.07 0.31 99 Y
simplest_design 279 Q 0 estimator (Intercept) -0.24 0.11 -2.22 0.03 -0.45 -0.02 99 Y
simplest_design 280 Q 0 estimator (Intercept) -0.04 0.09 -0.42 0.68 -0.23 0.15 99 Y
simplest_design 281 Q 0 estimator (Intercept) 0.20 0.11 1.90 0.06 -0.01 0.41 99 Y
simplest_design 282 Q 0 estimator (Intercept) 0.04 0.09 0.47 0.64 -0.14 0.22 99 Y
simplest_design 283 Q 0 estimator (Intercept) 0.18 0.11 1.68 0.10 -0.03 0.40 99 Y
simplest_design 284 Q 0 estimator (Intercept) 0.02 0.11 0.19 0.85 -0.19 0.23 99 Y
simplest_design 285 Q 0 estimator (Intercept) -0.18 0.09 -2.01 0.05 -0.37 0.00 99 Y
simplest_design 286 Q 0 estimator (Intercept) -0.07 0.10 -0.69 0.49 -0.26 0.13 99 Y
simplest_design 287 Q 0 estimator (Intercept) 0.09 0.11 0.81 0.42 -0.13 0.31 99 Y
simplest_design 288 Q 0 estimator (Intercept) -0.02 0.11 -0.20 0.84 -0.24 0.19 99 Y
simplest_design 289 Q 0 estimator (Intercept) -0.24 0.09 -2.71 0.01 -0.42 -0.06 99 Y
simplest_design 290 Q 0 estimator (Intercept) -0.08 0.10 -0.75 0.46 -0.28 0.12 99 Y
simplest_design 291 Q 0 estimator (Intercept) 0.12 0.11 1.05 0.30 -0.11 0.35 99 Y
simplest_design 292 Q 0 estimator (Intercept) 0.06 0.11 0.58 0.57 -0.16 0.28 99 Y
simplest_design 293 Q 0 estimator (Intercept) -0.13 0.10 -1.37 0.17 -0.33 0.06 99 Y
simplest_design 294 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.83 -0.22 0.18 99 Y
simplest_design 295 Q 0 estimator (Intercept) -0.08 0.09 -0.85 0.40 -0.27 0.11 99 Y
simplest_design 296 Q 0 estimator (Intercept) -0.17 0.11 -1.60 0.11 -0.38 0.04 99 Y
simplest_design 297 Q 0 estimator (Intercept) -0.09 0.10 -0.90 0.37 -0.28 0.11 99 Y
simplest_design 298 Q 0 estimator (Intercept) 0.14 0.11 1.23 0.22 -0.08 0.36 99 Y
simplest_design 299 Q 0 estimator (Intercept) -0.01 0.12 -0.07 0.94 -0.24 0.23 99 Y
simplest_design 300 Q 0 estimator (Intercept) 0.15 0.12 1.28 0.20 -0.08 0.39 99 Y
simplest_design 301 Q 0 estimator (Intercept) -0.11 0.09 -1.25 0.21 -0.30 0.07 99 Y
simplest_design 302 Q 0 estimator (Intercept) -0.02 0.09 -0.20 0.84 -0.20 0.16 99 Y
simplest_design 303 Q 0 estimator (Intercept) 0.08 0.10 0.78 0.44 -0.13 0.29 99 Y
simplest_design 304 Q 0 estimator (Intercept) -0.09 0.10 -0.84 0.40 -0.29 0.12 99 Y
simplest_design 305 Q 0 estimator (Intercept) -0.09 0.10 -0.93 0.36 -0.28 0.10 99 Y
simplest_design 306 Q 0 estimator (Intercept) -0.09 0.10 -0.98 0.33 -0.28 0.10 99 Y
simplest_design 307 Q 0 estimator (Intercept) 0.05 0.10 0.48 0.63 -0.15 0.25 99 Y
simplest_design 308 Q 0 estimator (Intercept) -0.06 0.11 -0.60 0.55 -0.27 0.15 99 Y
simplest_design 309 Q 0 estimator (Intercept) -0.10 0.10 -0.92 0.36 -0.30 0.11 99 Y
simplest_design 310 Q 0 estimator (Intercept) 0.01 0.10 0.10 0.92 -0.19 0.21 99 Y
simplest_design 311 Q 0 estimator (Intercept) -0.03 0.11 -0.32 0.75 -0.24 0.17 99 Y
simplest_design 312 Q 0 estimator (Intercept) 0.13 0.10 1.24 0.22 -0.08 0.34 99 Y
simplest_design 313 Q 0 estimator (Intercept) 0.08 0.09 0.89 0.38 -0.09 0.25 99 Y
simplest_design 314 Q 0 estimator (Intercept) 0.05 0.09 0.49 0.62 -0.14 0.23 99 Y
simplest_design 315 Q 0 estimator (Intercept) -0.03 0.09 -0.37 0.71 -0.20 0.14 99 Y
simplest_design 316 Q 0 estimator (Intercept) -0.03 0.10 -0.30 0.76 -0.23 0.17 99 Y
simplest_design 317 Q 0 estimator (Intercept) 0.00 0.09 0.01 0.99 -0.17 0.17 99 Y
simplest_design 318 Q 0 estimator (Intercept) -0.10 0.10 -0.98 0.33 -0.31 0.11 99 Y
simplest_design 319 Q 0 estimator (Intercept) -0.05 0.09 -0.55 0.58 -0.22 0.12 99 Y
simplest_design 320 Q 0 estimator (Intercept) 0.04 0.10 0.44 0.66 -0.15 0.24 99 Y
simplest_design 321 Q 0 estimator (Intercept) -0.03 0.10 -0.34 0.74 -0.23 0.16 99 Y
simplest_design 322 Q 0 estimator (Intercept) 0.06 0.08 0.66 0.51 -0.11 0.22 99 Y
simplest_design 323 Q 0 estimator (Intercept) -0.04 0.11 -0.40 0.69 -0.25 0.17 99 Y
simplest_design 324 Q 0 estimator (Intercept) -0.10 0.10 -1.05 0.29 -0.29 0.09 99 Y
simplest_design 325 Q 0 estimator (Intercept) 0.00 0.10 -0.01 0.99 -0.20 0.20 99 Y
simplest_design 326 Q 0 estimator (Intercept) -0.05 0.09 -0.60 0.55 -0.22 0.12 99 Y
simplest_design 327 Q 0 estimator (Intercept) 0.03 0.10 0.26 0.79 -0.17 0.23 99 Y
simplest_design 328 Q 0 estimator (Intercept) 0.18 0.09 1.86 0.07 -0.01 0.36 99 Y
simplest_design 329 Q 0 estimator (Intercept) -0.03 0.11 -0.27 0.79 -0.24 0.19 99 Y
simplest_design 330 Q 0 estimator (Intercept) 0.00 0.10 0.03 0.98 -0.20 0.20 99 Y
simplest_design 331 Q 0 estimator (Intercept) 0.02 0.11 0.22 0.83 -0.19 0.24 99 Y
simplest_design 332 Q 0 estimator (Intercept) 0.00 0.09 0.04 0.97 -0.18 0.19 99 Y
simplest_design 333 Q 0 estimator (Intercept) -0.05 0.10 -0.47 0.64 -0.25 0.15 99 Y
simplest_design 334 Q 0 estimator (Intercept) 0.01 0.10 0.14 0.89 -0.19 0.22 99 Y
simplest_design 335 Q 0 estimator (Intercept) -0.01 0.09 -0.16 0.87 -0.19 0.16 99 Y
simplest_design 336 Q 0 estimator (Intercept) 0.21 0.11 1.97 0.05 0.00 0.43 99 Y
simplest_design 337 Q 0 estimator (Intercept) 0.14 0.10 1.38 0.17 -0.06 0.35 99 Y
simplest_design 338 Q 0 estimator (Intercept) 0.09 0.10 0.88 0.38 -0.11 0.29 99 Y
simplest_design 339 Q 0 estimator (Intercept) -0.17 0.11 -1.52 0.13 -0.39 0.05 99 Y
simplest_design 340 Q 0 estimator (Intercept) 0.05 0.11 0.49 0.62 -0.16 0.27 99 Y
simplest_design 341 Q 0 estimator (Intercept) -0.05 0.09 -0.51 0.61 -0.23 0.13 99 Y
simplest_design 342 Q 0 estimator (Intercept) -0.06 0.08 -0.79 0.43 -0.22 0.10 99 Y
simplest_design 343 Q 0 estimator (Intercept) 0.03 0.09 0.34 0.73 -0.15 0.22 99 Y
simplest_design 344 Q 0 estimator (Intercept) 0.06 0.10 0.59 0.55 -0.14 0.27 99 Y
simplest_design 345 Q 0 estimator (Intercept) 0.12 0.09 1.30 0.20 -0.06 0.31 99 Y
simplest_design 346 Q 0 estimator (Intercept) -0.16 0.09 -1.73 0.09 -0.35 0.02 99 Y
simplest_design 347 Q 0 estimator (Intercept) -0.18 0.10 -1.79 0.08 -0.38 0.02 99 Y
simplest_design 348 Q 0 estimator (Intercept) 0.07 0.11 0.66 0.51 -0.14 0.29 99 Y
simplest_design 349 Q 0 estimator (Intercept) -0.08 0.10 -0.85 0.40 -0.28 0.11 99 Y
simplest_design 350 Q 0 estimator (Intercept) -0.03 0.09 -0.37 0.71 -0.21 0.14 99 Y
simplest_design 351 Q 0 estimator (Intercept) -0.06 0.10 -0.61 0.54 -0.26 0.14 99 Y
simplest_design 352 Q 0 estimator (Intercept) -0.12 0.09 -1.25 0.21 -0.30 0.07 99 Y
simplest_design 353 Q 0 estimator (Intercept) 0.00 0.10 0.02 0.98 -0.20 0.20 99 Y
simplest_design 354 Q 0 estimator (Intercept) 0.09 0.11 0.75 0.45 -0.14 0.31 99 Y
simplest_design 355 Q 0 estimator (Intercept) 0.03 0.10 0.29 0.77 -0.17 0.22 99 Y
simplest_design 356 Q 0 estimator (Intercept) 0.05 0.10 0.47 0.64 -0.15 0.24 99 Y
simplest_design 357 Q 0 estimator (Intercept) 0.02 0.10 0.20 0.84 -0.18 0.22 99 Y
simplest_design 358 Q 0 estimator (Intercept) 0.06 0.09 0.73 0.47 -0.11 0.23 99 Y
simplest_design 359 Q 0 estimator (Intercept) 0.15 0.09 1.63 0.11 -0.03 0.32 99 Y
simplest_design 360 Q 0 estimator (Intercept) 0.09 0.11 0.83 0.41 -0.13 0.31 99 Y
simplest_design 361 Q 0 estimator (Intercept) 0.05 0.09 0.57 0.57 -0.13 0.24 99 Y
simplest_design 362 Q 0 estimator (Intercept) 0.01 0.11 0.11 0.91 -0.21 0.24 99 Y
simplest_design 363 Q 0 estimator (Intercept) -0.13 0.09 -1.49 0.14 -0.31 0.04 99 Y
simplest_design 364 Q 0 estimator (Intercept) -0.04 0.10 -0.39 0.70 -0.23 0.15 99 Y
simplest_design 365 Q 0 estimator (Intercept) 0.03 0.10 0.35 0.73 -0.16 0.23 99 Y
simplest_design 366 Q 0 estimator (Intercept) 0.07 0.09 0.74 0.46 -0.12 0.25 99 Y
simplest_design 367 Q 0 estimator (Intercept) -0.08 0.12 -0.72 0.47 -0.31 0.15 99 Y
simplest_design 368 Q 0 estimator (Intercept) 0.02 0.11 0.18 0.86 -0.19 0.23 99 Y
simplest_design 369 Q 0 estimator (Intercept) 0.15 0.10 1.52 0.13 -0.05 0.35 99 Y
simplest_design 370 Q 0 estimator (Intercept) -0.06 0.10 -0.61 0.54 -0.25 0.13 99 Y
simplest_design 371 Q 0 estimator (Intercept) 0.07 0.10 0.75 0.46 -0.12 0.27 99 Y
simplest_design 372 Q 0 estimator (Intercept) -0.12 0.11 -1.13 0.26 -0.34 0.09 99 Y
simplest_design 373 Q 0 estimator (Intercept) -0.06 0.09 -0.62 0.54 -0.24 0.13 99 Y
simplest_design 374 Q 0 estimator (Intercept) -0.08 0.09 -0.85 0.40 -0.27 0.11 99 Y
simplest_design 375 Q 0 estimator (Intercept) -0.05 0.12 -0.44 0.66 -0.28 0.18 99 Y
simplest_design 376 Q 0 estimator (Intercept) -0.08 0.10 -0.84 0.40 -0.27 0.11 99 Y
simplest_design 377 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.47 -0.27 0.13 99 Y
simplest_design 378 Q 0 estimator (Intercept) -0.01 0.11 -0.11 0.91 -0.23 0.21 99 Y
simplest_design 379 Q 0 estimator (Intercept) -0.05 0.11 -0.48 0.63 -0.26 0.16 99 Y
simplest_design 380 Q 0 estimator (Intercept) -0.01 0.10 -0.11 0.91 -0.22 0.19 99 Y
simplest_design 381 Q 0 estimator (Intercept) 0.09 0.09 0.90 0.37 -0.10 0.27 99 Y
simplest_design 382 Q 0 estimator (Intercept) -0.18 0.10 -1.78 0.08 -0.38 0.02 99 Y
simplest_design 383 Q 0 estimator (Intercept) 0.21 0.09 2.50 0.01 0.04 0.38 99 Y
simplest_design 384 Q 0 estimator (Intercept) -0.14 0.10 -1.37 0.17 -0.34 0.06 99 Y
simplest_design 385 Q 0 estimator (Intercept) -0.16 0.10 -1.59 0.11 -0.37 0.04 99 Y
simplest_design 386 Q 0 estimator (Intercept) -0.07 0.10 -0.66 0.51 -0.26 0.13 99 Y
simplest_design 387 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.83 -0.22 0.17 99 Y
simplest_design 388 Q 0 estimator (Intercept) -0.06 0.10 -0.58 0.56 -0.27 0.15 99 Y
simplest_design 389 Q 0 estimator (Intercept) -0.15 0.10 -1.52 0.13 -0.35 0.05 99 Y
simplest_design 390 Q 0 estimator (Intercept) -0.08 0.11 -0.75 0.46 -0.29 0.13 99 Y
simplest_design 391 Q 0 estimator (Intercept) 0.03 0.09 0.34 0.74 -0.15 0.22 99 Y
simplest_design 392 Q 0 estimator (Intercept) 0.08 0.10 0.78 0.44 -0.12 0.28 99 Y
simplest_design 393 Q 0 estimator (Intercept) 0.06 0.11 0.59 0.56 -0.15 0.28 99 Y
simplest_design 394 Q 0 estimator (Intercept) 0.09 0.08 1.06 0.29 -0.08 0.26 99 Y
simplest_design 395 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 396 Q 0 estimator (Intercept) -0.04 0.08 -0.52 0.61 -0.21 0.12 99 Y
simplest_design 397 Q 0 estimator (Intercept) 0.06 0.11 0.55 0.58 -0.15 0.27 99 Y
simplest_design 398 Q 0 estimator (Intercept) -0.10 0.10 -1.04 0.30 -0.29 0.09 99 Y
simplest_design 399 Q 0 estimator (Intercept) 0.01 0.10 0.07 0.95 -0.19 0.21 99 Y
simplest_design 400 Q 0 estimator (Intercept) 0.09 0.10 0.96 0.34 -0.10 0.29 99 Y
simplest_design 401 Q 0 estimator (Intercept) 0.12 0.11 1.16 0.25 -0.09 0.33 99 Y
simplest_design 402 Q 0 estimator (Intercept) 0.09 0.10 0.93 0.35 -0.11 0.29 99 Y
simplest_design 403 Q 0 estimator (Intercept) -0.15 0.12 -1.30 0.20 -0.38 0.08 99 Y
simplest_design 404 Q 0 estimator (Intercept) -0.08 0.10 -0.73 0.47 -0.28 0.13 99 Y
simplest_design 405 Q 0 estimator (Intercept) 0.02 0.09 0.20 0.84 -0.17 0.20 99 Y
simplest_design 406 Q 0 estimator (Intercept) 0.07 0.10 0.67 0.51 -0.14 0.28 99 Y
simplest_design 407 Q 0 estimator (Intercept) 0.04 0.09 0.46 0.65 -0.14 0.22 99 Y
simplest_design 408 Q 0 estimator (Intercept) 0.06 0.09 0.61 0.55 -0.13 0.24 99 Y
simplest_design 409 Q 0 estimator (Intercept) 0.01 0.09 0.15 0.88 -0.17 0.20 99 Y
simplest_design 410 Q 0 estimator (Intercept) -0.05 0.09 -0.55 0.58 -0.23 0.13 99 Y
simplest_design 411 Q 0 estimator (Intercept) 0.03 0.09 0.36 0.72 -0.15 0.22 99 Y
simplest_design 412 Q 0 estimator (Intercept) -0.18 0.10 -1.87 0.06 -0.37 0.01 99 Y
simplest_design 413 Q 0 estimator (Intercept) 0.13 0.10 1.23 0.22 -0.08 0.33 99 Y
simplest_design 414 Q 0 estimator (Intercept) 0.06 0.11 0.51 0.61 -0.17 0.28 99 Y
simplest_design 415 Q 0 estimator (Intercept) -0.14 0.09 -1.58 0.12 -0.33 0.04 99 Y
simplest_design 416 Q 0 estimator (Intercept) -0.11 0.10 -1.10 0.27 -0.30 0.09 99 Y
simplest_design 417 Q 0 estimator (Intercept) 0.17 0.10 1.74 0.09 -0.02 0.36 99 Y
simplest_design 418 Q 0 estimator (Intercept) 0.20 0.09 2.15 0.03 0.02 0.39 99 Y
simplest_design 419 Q 0 estimator (Intercept) -0.01 0.10 -0.05 0.96 -0.20 0.19 99 Y
simplest_design 420 Q 0 estimator (Intercept) -0.09 0.09 -0.92 0.36 -0.27 0.10 99 Y
simplest_design 421 Q 0 estimator (Intercept) -0.06 0.10 -0.65 0.52 -0.26 0.13 99 Y
simplest_design 422 Q 0 estimator (Intercept) 0.26 0.10 2.66 0.01 0.07 0.45 99 Y
simplest_design 423 Q 0 estimator (Intercept) -0.01 0.11 -0.05 0.96 -0.21 0.20 99 Y
simplest_design 424 Q 0 estimator (Intercept) -0.03 0.10 -0.34 0.73 -0.23 0.16 99 Y
simplest_design 425 Q 0 estimator (Intercept) 0.02 0.12 0.20 0.84 -0.21 0.25 99 Y
simplest_design 426 Q 0 estimator (Intercept) 0.06 0.10 0.61 0.54 -0.13 0.25 99 Y
simplest_design 427 Q 0 estimator (Intercept) 0.17 0.11 1.58 0.12 -0.04 0.39 99 Y
simplest_design 428 Q 0 estimator (Intercept) -0.03 0.09 -0.35 0.73 -0.22 0.15 99 Y
simplest_design 429 Q 0 estimator (Intercept) 0.27 0.09 2.93 0.00 0.09 0.46 99 Y
simplest_design 430 Q 0 estimator (Intercept) -0.19 0.09 -1.98 0.05 -0.38 0.00 99 Y
simplest_design 431 Q 0 estimator (Intercept) -0.10 0.11 -0.89 0.38 -0.33 0.13 99 Y
simplest_design 432 Q 0 estimator (Intercept) 0.04 0.10 0.38 0.71 -0.16 0.23 99 Y
simplest_design 433 Q 0 estimator (Intercept) 0.01 0.09 0.10 0.92 -0.17 0.19 99 Y
simplest_design 434 Q 0 estimator (Intercept) -0.04 0.10 -0.43 0.67 -0.23 0.15 99 Y
simplest_design 435 Q 0 estimator (Intercept) 0.00 0.10 0.01 0.99 -0.20 0.20 99 Y
simplest_design 436 Q 0 estimator (Intercept) 0.05 0.11 0.49 0.63 -0.16 0.26 99 Y
simplest_design 437 Q 0 estimator (Intercept) -0.14 0.11 -1.28 0.20 -0.35 0.07 99 Y
simplest_design 438 Q 0 estimator (Intercept) 0.00 0.11 0.02 0.98 -0.21 0.22 99 Y
simplest_design 439 Q 0 estimator (Intercept) -0.01 0.10 -0.14 0.89 -0.22 0.19 99 Y
simplest_design 440 Q 0 estimator (Intercept) -0.01 0.09 -0.17 0.86 -0.19 0.16 99 Y
simplest_design 441 Q 0 estimator (Intercept) 0.09 0.10 0.86 0.39 -0.11 0.29 99 Y
simplest_design 442 Q 0 estimator (Intercept) 0.01 0.09 0.16 0.87 -0.17 0.20 99 Y
simplest_design 443 Q 0 estimator (Intercept) 0.01 0.11 0.11 0.91 -0.20 0.22 99 Y
simplest_design 444 Q 0 estimator (Intercept) 0.04 0.10 0.38 0.70 -0.16 0.23 99 Y
simplest_design 445 Q 0 estimator (Intercept) 0.19 0.09 2.12 0.04 0.01 0.37 99 Y
simplest_design 446 Q 0 estimator (Intercept) -0.06 0.09 -0.71 0.48 -0.24 0.11 99 Y
simplest_design 447 Q 0 estimator (Intercept) -0.07 0.11 -0.62 0.53 -0.28 0.15 99 Y
simplest_design 448 Q 0 estimator (Intercept) 0.13 0.11 1.19 0.24 -0.09 0.34 99 Y
simplest_design 449 Q 0 estimator (Intercept) 0.06 0.10 0.58 0.56 -0.14 0.25 99 Y
simplest_design 450 Q 0 estimator (Intercept) -0.06 0.11 -0.61 0.54 -0.27 0.14 99 Y
simplest_design 451 Q 0 estimator (Intercept) 0.00 0.09 -0.03 0.98 -0.19 0.18 99 Y
simplest_design 452 Q 0 estimator (Intercept) -0.07 0.11 -0.62 0.53 -0.28 0.14 99 Y
simplest_design 453 Q 0 estimator (Intercept) 0.11 0.10 1.11 0.27 -0.09 0.32 99 Y
simplest_design 454 Q 0 estimator (Intercept) -0.13 0.10 -1.32 0.19 -0.32 0.07 99 Y
simplest_design 455 Q 0 estimator (Intercept) -0.15 0.10 -1.59 0.12 -0.35 0.04 99 Y
simplest_design 456 Q 0 estimator (Intercept) 0.05 0.10 0.51 0.61 -0.15 0.25 99 Y
simplest_design 457 Q 0 estimator (Intercept) -0.14 0.11 -1.29 0.20 -0.36 0.08 99 Y
simplest_design 458 Q 0 estimator (Intercept) -0.04 0.10 -0.36 0.72 -0.23 0.16 99 Y
simplest_design 459 Q 0 estimator (Intercept) 0.01 0.10 0.05 0.96 -0.18 0.19 99 Y
simplest_design 460 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 461 Q 0 estimator (Intercept) -0.08 0.11 -0.77 0.44 -0.29 0.13 99 Y
simplest_design 462 Q 0 estimator (Intercept) -0.04 0.09 -0.46 0.65 -0.22 0.14 99 Y
simplest_design 463 Q 0 estimator (Intercept) 0.03 0.10 0.30 0.77 -0.16 0.22 99 Y
simplest_design 464 Q 0 estimator (Intercept) 0.13 0.10 1.35 0.18 -0.06 0.33 99 Y
simplest_design 465 Q 0 estimator (Intercept) -0.13 0.10 -1.37 0.17 -0.33 0.06 99 Y
simplest_design 466 Q 0 estimator (Intercept) -0.25 0.11 -2.26 0.03 -0.46 -0.03 99 Y
simplest_design 467 Q 0 estimator (Intercept) -0.17 0.10 -1.70 0.09 -0.38 0.03 99 Y
simplest_design 468 Q 0 estimator (Intercept) 0.16 0.10 1.68 0.10 -0.03 0.35 99 Y
simplest_design 469 Q 0 estimator (Intercept) -0.02 0.09 -0.20 0.84 -0.21 0.17 99 Y
simplest_design 470 Q 0 estimator (Intercept) -0.04 0.09 -0.45 0.65 -0.22 0.14 99 Y
simplest_design 471 Q 0 estimator (Intercept) -0.03 0.11 -0.28 0.78 -0.24 0.18 99 Y
simplest_design 472 Q 0 estimator (Intercept) 0.02 0.09 0.19 0.85 -0.17 0.20 99 Y
simplest_design 473 Q 0 estimator (Intercept) 0.07 0.11 0.67 0.50 -0.14 0.29 99 Y
simplest_design 474 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.84 -0.21 0.17 99 Y
simplest_design 475 Q 0 estimator (Intercept) 0.00 0.12 -0.02 0.98 -0.23 0.23 99 Y
simplest_design 476 Q 0 estimator (Intercept) -0.04 0.09 -0.48 0.63 -0.22 0.13 99 Y
simplest_design 477 Q 0 estimator (Intercept) 0.05 0.10 0.48 0.63 -0.16 0.26 99 Y
simplest_design 478 Q 0 estimator (Intercept) -0.02 0.11 -0.20 0.84 -0.24 0.19 99 Y
simplest_design 479 Q 0 estimator (Intercept) -0.03 0.11 -0.27 0.79 -0.25 0.19 99 Y
simplest_design 480 Q 0 estimator (Intercept) 0.03 0.09 0.29 0.77 -0.16 0.22 99 Y
simplest_design 481 Q 0 estimator (Intercept) -0.06 0.09 -0.61 0.54 -0.24 0.13 99 Y
simplest_design 482 Q 0 estimator (Intercept) 0.12 0.10 1.15 0.25 -0.09 0.32 99 Y
simplest_design 483 Q 0 estimator (Intercept) 0.01 0.10 0.12 0.91 -0.19 0.22 99 Y
simplest_design 484 Q 0 estimator (Intercept) -0.19 0.09 -1.99 0.05 -0.38 0.00 99 Y
simplest_design 485 Q 0 estimator (Intercept) -0.02 0.09 -0.21 0.83 -0.20 0.16 99 Y
simplest_design 486 Q 0 estimator (Intercept) 0.02 0.11 0.17 0.86 -0.20 0.23 99 Y
simplest_design 487 Q 0 estimator (Intercept) 0.13 0.08 1.56 0.12 -0.04 0.30 99 Y
simplest_design 488 Q 0 estimator (Intercept) 0.08 0.10 0.82 0.42 -0.11 0.27 99 Y
simplest_design 489 Q 0 estimator (Intercept) -0.03 0.10 -0.29 0.77 -0.24 0.18 99 Y
simplest_design 490 Q 0 estimator (Intercept) 0.06 0.09 0.63 0.53 -0.12 0.24 99 Y
simplest_design 491 Q 0 estimator (Intercept) 0.28 0.08 3.24 0.00 0.11 0.44 99 Y
simplest_design 492 Q 0 estimator (Intercept) -0.16 0.10 -1.59 0.12 -0.36 0.04 99 Y
simplest_design 493 Q 0 estimator (Intercept) -0.12 0.10 -1.18 0.24 -0.32 0.08 99 Y
simplest_design 494 Q 0 estimator (Intercept) 0.00 0.11 0.01 0.99 -0.22 0.22 99 Y
simplest_design 495 Q 0 estimator (Intercept) 0.05 0.10 0.52 0.60 -0.14 0.24 99 Y
simplest_design 496 Q 0 estimator (Intercept) -0.03 0.11 -0.28 0.78 -0.25 0.19 99 Y
simplest_design 497 Q 0 estimator (Intercept) 0.02 0.09 0.18 0.86 -0.17 0.20 99 Y
simplest_design 498 Q 0 estimator (Intercept) 0.08 0.11 0.73 0.47 -0.13 0.29 99 Y
simplest_design 499 Q 0 estimator (Intercept) -0.11 0.09 -1.22 0.23 -0.30 0.07 99 Y
simplest_design 500 Q 0 estimator (Intercept) 0.06 0.09 0.67 0.51 -0.12 0.24 99 Y
simplest_design 501 Q 0 estimator (Intercept) 0.13 0.11 1.17 0.25 -0.09 0.36 99 Y
simplest_design 502 Q 0 estimator (Intercept) -0.12 0.11 -1.15 0.25 -0.33 0.09 99 Y
simplest_design 503 Q 0 estimator (Intercept) -0.04 0.09 -0.44 0.66 -0.22 0.14 99 Y
simplest_design 504 Q 0 estimator (Intercept) -0.02 0.10 -0.22 0.82 -0.21 0.17 99 Y
simplest_design 505 Q 0 estimator (Intercept) -0.06 0.10 -0.55 0.59 -0.26 0.15 99 Y
simplest_design 506 Q 0 estimator (Intercept) 0.03 0.11 0.30 0.76 -0.18 0.24 99 Y
simplest_design 507 Q 0 estimator (Intercept) -0.08 0.11 -0.75 0.45 -0.29 0.13 99 Y
simplest_design 508 Q 0 estimator (Intercept) -0.03 0.11 -0.24 0.81 -0.25 0.20 99 Y
simplest_design 509 Q 0 estimator (Intercept) -0.10 0.10 -0.95 0.35 -0.30 0.10 99 Y
simplest_design 510 Q 0 estimator (Intercept) -0.02 0.11 -0.21 0.84 -0.24 0.19 99 Y
simplest_design 511 Q 0 estimator (Intercept) -0.16 0.10 -1.59 0.11 -0.36 0.04 99 Y
simplest_design 512 Q 0 estimator (Intercept) 0.15 0.10 1.48 0.14 -0.05 0.34 99 Y
simplest_design 513 Q 0 estimator (Intercept) 0.09 0.10 0.90 0.37 -0.11 0.28 99 Y
simplest_design 514 Q 0 estimator (Intercept) 0.02 0.11 0.19 0.85 -0.19 0.23 99 Y
simplest_design 515 Q 0 estimator (Intercept) 0.17 0.10 1.79 0.08 -0.02 0.36 99 Y
simplest_design 516 Q 0 estimator (Intercept) -0.04 0.10 -0.43 0.67 -0.25 0.16 99 Y
simplest_design 517 Q 0 estimator (Intercept) -0.07 0.10 -0.69 0.49 -0.27 0.13 99 Y
simplest_design 518 Q 0 estimator (Intercept) 0.04 0.10 0.44 0.66 -0.16 0.25 99 Y
simplest_design 519 Q 0 estimator (Intercept) 0.06 0.10 0.57 0.57 -0.15 0.26 99 Y
simplest_design 520 Q 0 estimator (Intercept) 0.15 0.09 1.69 0.09 -0.03 0.34 99 Y
simplest_design 521 Q 0 estimator (Intercept) -0.04 0.11 -0.34 0.73 -0.26 0.19 99 Y
simplest_design 522 Q 0 estimator (Intercept) 0.10 0.10 1.00 0.32 -0.10 0.31 99 Y
simplest_design 523 Q 0 estimator (Intercept) 0.07 0.10 0.71 0.48 -0.12 0.26 99 Y
simplest_design 524 Q 0 estimator (Intercept) -0.05 0.11 -0.43 0.67 -0.26 0.17 99 Y
simplest_design 525 Q 0 estimator (Intercept) 0.01 0.10 0.12 0.90 -0.19 0.22 99 Y
simplest_design 526 Q 0 estimator (Intercept) 0.02 0.12 0.21 0.84 -0.21 0.25 99 Y
simplest_design 527 Q 0 estimator (Intercept) 0.16 0.10 1.49 0.14 -0.05 0.36 99 Y
simplest_design 528 Q 0 estimator (Intercept) -0.22 0.10 -2.18 0.03 -0.42 -0.02 99 Y
simplest_design 529 Q 0 estimator (Intercept) -0.02 0.10 -0.19 0.85 -0.21 0.18 99 Y
simplest_design 530 Q 0 estimator (Intercept) 0.08 0.11 0.76 0.45 -0.13 0.30 99 Y
simplest_design 531 Q 0 estimator (Intercept) -0.04 0.11 -0.38 0.71 -0.26 0.17 99 Y
simplest_design 532 Q 0 estimator (Intercept) -0.01 0.10 -0.12 0.90 -0.21 0.19 99 Y
simplest_design 533 Q 0 estimator (Intercept) 0.02 0.11 0.22 0.82 -0.19 0.24 99 Y
simplest_design 534 Q 0 estimator (Intercept) -0.04 0.11 -0.36 0.72 -0.27 0.19 99 Y
simplest_design 535 Q 0 estimator (Intercept) -0.14 0.10 -1.36 0.18 -0.33 0.06 99 Y
simplest_design 536 Q 0 estimator (Intercept) -0.16 0.09 -1.67 0.10 -0.34 0.03 99 Y
simplest_design 537 Q 0 estimator (Intercept) -0.08 0.10 -0.82 0.42 -0.28 0.12 99 Y
simplest_design 538 Q 0 estimator (Intercept) -0.06 0.09 -0.63 0.53 -0.24 0.12 99 Y
simplest_design 539 Q 0 estimator (Intercept) 0.02 0.10 0.23 0.82 -0.17 0.22 99 Y
simplest_design 540 Q 0 estimator (Intercept) 0.02 0.11 0.20 0.84 -0.20 0.24 99 Y
simplest_design 541 Q 0 estimator (Intercept) 0.00 0.10 0.03 0.98 -0.19 0.20 99 Y
simplest_design 542 Q 0 estimator (Intercept) -0.05 0.10 -0.55 0.58 -0.25 0.14 99 Y
simplest_design 543 Q 0 estimator (Intercept) -0.02 0.10 -0.15 0.88 -0.22 0.19 99 Y
simplest_design 544 Q 0 estimator (Intercept) -0.03 0.10 -0.35 0.72 -0.22 0.16 99 Y
simplest_design 545 Q 0 estimator (Intercept) -0.13 0.09 -1.54 0.13 -0.30 0.04 99 Y
simplest_design 546 Q 0 estimator (Intercept) 0.00 0.10 0.04 0.97 -0.20 0.21 99 Y
simplest_design 547 Q 0 estimator (Intercept) 0.03 0.10 0.26 0.80 -0.17 0.22 99 Y
simplest_design 548 Q 0 estimator (Intercept) 0.07 0.11 0.69 0.49 -0.14 0.29 99 Y
simplest_design 549 Q 0 estimator (Intercept) -0.06 0.10 -0.64 0.53 -0.26 0.13 99 Y
simplest_design 550 Q 0 estimator (Intercept) -0.06 0.10 -0.62 0.54 -0.27 0.14 99 Y
simplest_design 551 Q 0 estimator (Intercept) -0.07 0.09 -0.75 0.45 -0.25 0.11 99 Y
simplest_design 552 Q 0 estimator (Intercept) 0.01 0.10 0.13 0.89 -0.19 0.21 99 Y
simplest_design 553 Q 0 estimator (Intercept) -0.02 0.08 -0.20 0.84 -0.18 0.15 99 Y
simplest_design 554 Q 0 estimator (Intercept) -0.08 0.10 -0.72 0.47 -0.28 0.13 99 Y
simplest_design 555 Q 0 estimator (Intercept) 0.08 0.10 0.78 0.44 -0.12 0.27 99 Y
simplest_design 556 Q 0 estimator (Intercept) 0.00 0.09 -0.02 0.98 -0.18 0.18 99 Y
simplest_design 557 Q 0 estimator (Intercept) -0.04 0.10 -0.45 0.65 -0.23 0.15 99 Y
simplest_design 558 Q 0 estimator (Intercept) 0.02 0.10 0.21 0.83 -0.17 0.21 99 Y
simplest_design 559 Q 0 estimator (Intercept) 0.16 0.09 1.76 0.08 -0.02 0.35 99 Y
simplest_design 560 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.47 -0.26 0.12 99 Y
simplest_design 561 Q 0 estimator (Intercept) 0.08 0.10 0.81 0.42 -0.12 0.28 99 Y
simplest_design 562 Q 0 estimator (Intercept) -0.23 0.10 -2.15 0.03 -0.43 -0.02 99 Y
simplest_design 563 Q 0 estimator (Intercept) 0.01 0.11 0.13 0.90 -0.20 0.23 99 Y
simplest_design 564 Q 0 estimator (Intercept) -0.07 0.11 -0.63 0.53 -0.29 0.15 99 Y
simplest_design 565 Q 0 estimator (Intercept) -0.02 0.09 -0.20 0.84 -0.19 0.16 99 Y
simplest_design 566 Q 0 estimator (Intercept) -0.13 0.11 -1.20 0.23 -0.34 0.08 99 Y
simplest_design 567 Q 0 estimator (Intercept) -0.18 0.10 -1.86 0.07 -0.37 0.01 99 Y
simplest_design 568 Q 0 estimator (Intercept) -0.16 0.10 -1.59 0.12 -0.37 0.04 99 Y
simplest_design 569 Q 0 estimator (Intercept) 0.06 0.10 0.59 0.56 -0.14 0.26 99 Y
simplest_design 570 Q 0 estimator (Intercept) 0.09 0.09 1.01 0.31 -0.09 0.28 99 Y
simplest_design 571 Q 0 estimator (Intercept) 0.21 0.10 2.05 0.04 0.01 0.41 99 Y
simplest_design 572 Q 0 estimator (Intercept) -0.07 0.10 -0.69 0.49 -0.26 0.12 99 Y
simplest_design 573 Q 0 estimator (Intercept) 0.18 0.09 1.87 0.06 -0.01 0.36 99 Y
simplest_design 574 Q 0 estimator (Intercept) 0.12 0.10 1.25 0.22 -0.07 0.31 99 Y
simplest_design 575 Q 0 estimator (Intercept) 0.09 0.09 1.03 0.30 -0.09 0.27 99 Y
simplest_design 576 Q 0 estimator (Intercept) 0.04 0.09 0.45 0.66 -0.14 0.22 99 Y
simplest_design 577 Q 0 estimator (Intercept) 0.07 0.10 0.65 0.52 -0.14 0.27 99 Y
simplest_design 578 Q 0 estimator (Intercept) 0.12 0.10 1.18 0.24 -0.08 0.31 99 Y
simplest_design 579 Q 0 estimator (Intercept) -0.17 0.11 -1.64 0.10 -0.39 0.04 99 Y
simplest_design 580 Q 0 estimator (Intercept) -0.16 0.09 -1.71 0.09 -0.35 0.03 99 Y
simplest_design 581 Q 0 estimator (Intercept) 0.13 0.12 1.02 0.31 -0.12 0.37 99 Y
simplest_design 582 Q 0 estimator (Intercept) 0.11 0.11 1.04 0.30 -0.10 0.32 99 Y
simplest_design 583 Q 0 estimator (Intercept) -0.01 0.10 -0.05 0.96 -0.20 0.19 99 Y
simplest_design 584 Q 0 estimator (Intercept) 0.07 0.11 0.67 0.51 -0.14 0.29 99 Y
simplest_design 585 Q 0 estimator (Intercept) -0.02 0.10 -0.24 0.81 -0.22 0.17 99 Y
simplest_design 586 Q 0 estimator (Intercept) -0.07 0.10 -0.70 0.48 -0.26 0.12 99 Y
simplest_design 587 Q 0 estimator (Intercept) 0.01 0.12 0.05 0.96 -0.23 0.25 99 Y
simplest_design 588 Q 0 estimator (Intercept) -0.25 0.11 -2.32 0.02 -0.46 -0.04 99 Y
simplest_design 589 Q 0 estimator (Intercept) -0.13 0.10 -1.34 0.18 -0.33 0.06 99 Y
simplest_design 590 Q 0 estimator (Intercept) -0.04 0.09 -0.46 0.64 -0.23 0.14 99 Y
simplest_design 591 Q 0 estimator (Intercept) -0.03 0.10 -0.28 0.78 -0.22 0.17 99 Y
simplest_design 592 Q 0 estimator (Intercept) 0.02 0.09 0.25 0.80 -0.16 0.21 99 Y
simplest_design 593 Q 0 estimator (Intercept) 0.23 0.10 2.21 0.03 0.02 0.43 99 Y
simplest_design 594 Q 0 estimator (Intercept) 0.05 0.11 0.40 0.69 -0.18 0.27 99 Y
simplest_design 595 Q 0 estimator (Intercept) 0.03 0.10 0.30 0.77 -0.17 0.23 99 Y
simplest_design 596 Q 0 estimator (Intercept) 0.10 0.09 1.10 0.27 -0.08 0.29 99 Y
simplest_design 597 Q 0 estimator (Intercept) -0.11 0.09 -1.18 0.24 -0.30 0.08 99 Y
simplest_design 598 Q 0 estimator (Intercept) 0.08 0.11 0.77 0.44 -0.13 0.30 99 Y
simplest_design 599 Q 0 estimator (Intercept) -0.02 0.09 -0.18 0.86 -0.20 0.16 99 Y
simplest_design 600 Q 0 estimator (Intercept) 0.08 0.10 0.76 0.45 -0.12 0.27 99 Y
simplest_design 601 Q 0 estimator (Intercept) -0.03 0.09 -0.36 0.72 -0.21 0.14 99 Y
simplest_design 602 Q 0 estimator (Intercept) -0.14 0.10 -1.44 0.15 -0.34 0.05 99 Y
simplest_design 603 Q 0 estimator (Intercept) -0.03 0.10 -0.33 0.74 -0.23 0.17 99 Y
simplest_design 604 Q 0 estimator (Intercept) -0.10 0.09 -1.05 0.30 -0.28 0.09 99 Y
simplest_design 605 Q 0 estimator (Intercept) 0.00 0.10 0.04 0.97 -0.20 0.21 99 Y
simplest_design 606 Q 0 estimator (Intercept) 0.08 0.08 0.93 0.36 -0.09 0.24 99 Y
simplest_design 607 Q 0 estimator (Intercept) -0.07 0.10 -0.70 0.48 -0.26 0.13 99 Y
simplest_design 608 Q 0 estimator (Intercept) 0.08 0.10 0.73 0.47 -0.13 0.28 99 Y
simplest_design 609 Q 0 estimator (Intercept) -0.07 0.11 -0.65 0.51 -0.28 0.14 99 Y
simplest_design 610 Q 0 estimator (Intercept) 0.19 0.10 1.91 0.06 -0.01 0.40 99 Y
simplest_design 611 Q 0 estimator (Intercept) -0.11 0.10 -1.11 0.27 -0.32 0.09 99 Y
simplest_design 612 Q 0 estimator (Intercept) -0.22 0.11 -2.04 0.04 -0.43 -0.01 99 Y
simplest_design 613 Q 0 estimator (Intercept) 0.01 0.11 0.12 0.90 -0.20 0.23 99 Y
simplest_design 614 Q 0 estimator (Intercept) 0.01 0.09 0.08 0.94 -0.18 0.19 99 Y
simplest_design 615 Q 0 estimator (Intercept) 0.08 0.10 0.88 0.38 -0.11 0.27 99 Y
simplest_design 616 Q 0 estimator (Intercept) -0.13 0.12 -1.11 0.27 -0.37 0.10 99 Y
simplest_design 617 Q 0 estimator (Intercept) -0.08 0.11 -0.74 0.46 -0.30 0.14 99 Y
simplest_design 618 Q 0 estimator (Intercept) 0.02 0.10 0.18 0.85 -0.18 0.22 99 Y
simplest_design 619 Q 0 estimator (Intercept) 0.03 0.11 0.31 0.76 -0.18 0.24 99 Y
simplest_design 620 Q 0 estimator (Intercept) 0.10 0.09 1.11 0.27 -0.08 0.28 99 Y
simplest_design 621 Q 0 estimator (Intercept) -0.11 0.10 -1.13 0.26 -0.30 0.08 99 Y
simplest_design 622 Q 0 estimator (Intercept) 0.03 0.10 0.32 0.75 -0.17 0.23 99 Y
simplest_design 623 Q 0 estimator (Intercept) -0.17 0.09 -1.83 0.07 -0.35 0.01 99 Y
simplest_design 624 Q 0 estimator (Intercept) 0.17 0.10 1.68 0.10 -0.03 0.36 99 Y
simplest_design 625 Q 0 estimator (Intercept) 0.16 0.09 1.75 0.08 -0.02 0.35 99 Y
simplest_design 626 Q 0 estimator (Intercept) 0.02 0.10 0.17 0.87 -0.19 0.22 99 Y
simplest_design 627 Q 0 estimator (Intercept) -0.15 0.11 -1.42 0.16 -0.37 0.06 99 Y
simplest_design 628 Q 0 estimator (Intercept) -0.08 0.10 -0.83 0.41 -0.27 0.11 99 Y
simplest_design 629 Q 0 estimator (Intercept) 0.01 0.10 0.08 0.93 -0.18 0.20 99 Y
simplest_design 630 Q 0 estimator (Intercept) -0.15 0.10 -1.45 0.15 -0.36 0.06 99 Y
simplest_design 631 Q 0 estimator (Intercept) -0.01 0.09 -0.11 0.91 -0.19 0.17 99 Y
simplest_design 632 Q 0 estimator (Intercept) 0.01 0.10 0.13 0.90 -0.18 0.21 99 Y
simplest_design 633 Q 0 estimator (Intercept) -0.11 0.10 -1.16 0.25 -0.31 0.08 99 Y
simplest_design 634 Q 0 estimator (Intercept) -0.06 0.10 -0.59 0.56 -0.25 0.13 99 Y
simplest_design 635 Q 0 estimator (Intercept) 0.06 0.10 0.62 0.54 -0.13 0.26 99 Y
simplest_design 636 Q 0 estimator (Intercept) -0.04 0.09 -0.46 0.65 -0.22 0.14 99 Y
simplest_design 637 Q 0 estimator (Intercept) -0.05 0.10 -0.47 0.64 -0.26 0.16 99 Y
simplest_design 638 Q 0 estimator (Intercept) 0.08 0.10 0.73 0.47 -0.13 0.28 99 Y
simplest_design 639 Q 0 estimator (Intercept) 0.07 0.11 0.67 0.51 -0.15 0.30 99 Y
simplest_design 640 Q 0 estimator (Intercept) -0.03 0.10 -0.34 0.74 -0.23 0.16 99 Y
simplest_design 641 Q 0 estimator (Intercept) 0.10 0.10 0.96 0.34 -0.10 0.30 99 Y
simplest_design 642 Q 0 estimator (Intercept) -0.01 0.09 -0.11 0.92 -0.19 0.17 99 Y
simplest_design 643 Q 0 estimator (Intercept) -0.09 0.10 -0.90 0.37 -0.30 0.11 99 Y
simplest_design 644 Q 0 estimator (Intercept) -0.04 0.08 -0.52 0.60 -0.21 0.12 99 Y
simplest_design 645 Q 0 estimator (Intercept) 0.00 0.11 0.04 0.97 -0.21 0.22 99 Y
simplest_design 646 Q 0 estimator (Intercept) -0.17 0.11 -1.53 0.13 -0.40 0.05 99 Y
simplest_design 647 Q 0 estimator (Intercept) -0.02 0.10 -0.18 0.86 -0.22 0.19 99 Y
simplest_design 648 Q 0 estimator (Intercept) -0.03 0.11 -0.28 0.78 -0.24 0.18 99 Y
simplest_design 649 Q 0 estimator (Intercept) 0.10 0.11 0.89 0.38 -0.12 0.31 99 Y
simplest_design 650 Q 0 estimator (Intercept) 0.02 0.10 0.24 0.81 -0.17 0.22 99 Y
simplest_design 651 Q 0 estimator (Intercept) -0.06 0.11 -0.52 0.60 -0.27 0.16 99 Y
simplest_design 652 Q 0 estimator (Intercept) 0.07 0.10 0.66 0.51 -0.13 0.26 99 Y
simplest_design 653 Q 0 estimator (Intercept) -0.14 0.09 -1.57 0.12 -0.32 0.04 99 Y
simplest_design 654 Q 0 estimator (Intercept) -0.01 0.10 -0.11 0.91 -0.21 0.19 99 Y
simplest_design 655 Q 0 estimator (Intercept) 0.10 0.10 1.05 0.29 -0.09 0.29 99 Y
simplest_design 656 Q 0 estimator (Intercept) -0.23 0.09 -2.50 0.01 -0.41 -0.05 99 Y
simplest_design 657 Q 0 estimator (Intercept) -0.19 0.12 -1.63 0.11 -0.42 0.04 99 Y
simplest_design 658 Q 0 estimator (Intercept) 0.00 0.10 0.05 0.96 -0.19 0.19 99 Y
simplest_design 659 Q 0 estimator (Intercept) 0.11 0.10 1.18 0.24 -0.08 0.30 99 Y
simplest_design 660 Q 0 estimator (Intercept) 0.00 0.09 -0.01 0.99 -0.18 0.17 99 Y
simplest_design 661 Q 0 estimator (Intercept) 0.11 0.10 1.09 0.28 -0.09 0.32 99 Y
simplest_design 662 Q 0 estimator (Intercept) -0.03 0.11 -0.30 0.77 -0.25 0.18 99 Y
simplest_design 663 Q 0 estimator (Intercept) 0.15 0.10 1.48 0.14 -0.05 0.35 99 Y
simplest_design 664 Q 0 estimator (Intercept) 0.07 0.11 0.64 0.52 -0.15 0.29 99 Y
simplest_design 665 Q 0 estimator (Intercept) 0.11 0.10 1.13 0.26 -0.08 0.30 99 Y
simplest_design 666 Q 0 estimator (Intercept) -0.01 0.10 -0.08 0.94 -0.22 0.20 99 Y
simplest_design 667 Q 0 estimator (Intercept) 0.11 0.10 1.10 0.27 -0.09 0.32 99 Y
simplest_design 668 Q 0 estimator (Intercept) -0.03 0.11 -0.32 0.75 -0.25 0.18 99 Y
simplest_design 669 Q 0 estimator (Intercept) -0.03 0.11 -0.29 0.77 -0.25 0.19 99 Y
simplest_design 670 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.48 -0.26 0.12 99 Y
simplest_design 671 Q 0 estimator (Intercept) -0.13 0.10 -1.36 0.18 -0.33 0.06 99 Y
simplest_design 672 Q 0 estimator (Intercept) -0.06 0.09 -0.60 0.55 -0.24 0.13 99 Y
simplest_design 673 Q 0 estimator (Intercept) 0.07 0.09 0.71 0.48 -0.12 0.25 99 Y
simplest_design 674 Q 0 estimator (Intercept) -0.03 0.11 -0.28 0.78 -0.24 0.18 99 Y
simplest_design 675 Q 0 estimator (Intercept) -0.05 0.11 -0.47 0.64 -0.27 0.16 99 Y
simplest_design 676 Q 0 estimator (Intercept) -0.05 0.10 -0.56 0.58 -0.24 0.14 99 Y
simplest_design 677 Q 0 estimator (Intercept) -0.11 0.10 -1.12 0.27 -0.31 0.09 99 Y
simplest_design 678 Q 0 estimator (Intercept) 0.14 0.11 1.25 0.22 -0.08 0.36 99 Y
simplest_design 679 Q 0 estimator (Intercept) 0.12 0.09 1.26 0.21 -0.07 0.31 99 Y
simplest_design 680 Q 0 estimator (Intercept) 0.06 0.10 0.67 0.51 -0.13 0.25 99 Y
simplest_design 681 Q 0 estimator (Intercept) -0.06 0.10 -0.53 0.59 -0.26 0.15 99 Y
simplest_design 682 Q 0 estimator (Intercept) 0.23 0.10 2.31 0.02 0.03 0.43 99 Y
simplest_design 683 Q 0 estimator (Intercept) -0.08 0.10 -0.72 0.47 -0.28 0.13 99 Y
simplest_design 684 Q 0 estimator (Intercept) -0.09 0.10 -0.94 0.35 -0.29 0.10 99 Y
simplest_design 685 Q 0 estimator (Intercept) 0.01 0.10 0.13 0.90 -0.19 0.22 99 Y
simplest_design 686 Q 0 estimator (Intercept) 0.08 0.10 0.83 0.41 -0.12 0.29 99 Y
simplest_design 687 Q 0 estimator (Intercept) -0.05 0.10 -0.47 0.64 -0.25 0.16 99 Y
simplest_design 688 Q 0 estimator (Intercept) 0.01 0.11 0.05 0.96 -0.21 0.22 99 Y
simplest_design 689 Q 0 estimator (Intercept) 0.03 0.09 0.28 0.78 -0.16 0.21 99 Y
simplest_design 690 Q 0 estimator (Intercept) 0.07 0.10 0.67 0.50 -0.13 0.26 99 Y
simplest_design 691 Q 0 estimator (Intercept) -0.10 0.10 -0.94 0.35 -0.31 0.11 99 Y
simplest_design 692 Q 0 estimator (Intercept) -0.15 0.10 -1.55 0.12 -0.34 0.04 99 Y
simplest_design 693 Q 0 estimator (Intercept) 0.00 0.09 -0.01 0.99 -0.19 0.19 99 Y
simplest_design 694 Q 0 estimator (Intercept) -0.11 0.11 -1.07 0.29 -0.32 0.10 99 Y
simplest_design 695 Q 0 estimator (Intercept) -0.03 0.10 -0.34 0.73 -0.23 0.16 99 Y
simplest_design 696 Q 0 estimator (Intercept) -0.03 0.09 -0.36 0.72 -0.22 0.15 99 Y
simplest_design 697 Q 0 estimator (Intercept) 0.09 0.09 0.99 0.32 -0.09 0.28 99 Y
simplest_design 698 Q 0 estimator (Intercept) 0.00 0.12 -0.04 0.97 -0.23 0.22 99 Y
simplest_design 699 Q 0 estimator (Intercept) -0.01 0.10 -0.11 0.91 -0.21 0.19 99 Y
simplest_design 700 Q 0 estimator (Intercept) 0.04 0.10 0.42 0.68 -0.15 0.23 99 Y
simplest_design 701 Q 0 estimator (Intercept) -0.33 0.10 -3.20 0.00 -0.53 -0.12 99 Y
simplest_design 702 Q 0 estimator (Intercept) -0.07 0.10 -0.74 0.46 -0.27 0.12 99 Y
simplest_design 703 Q 0 estimator (Intercept) -0.09 0.10 -0.89 0.37 -0.30 0.11 99 Y
simplest_design 704 Q 0 estimator (Intercept) -0.05 0.10 -0.48 0.63 -0.25 0.15 99 Y
simplest_design 705 Q 0 estimator (Intercept) 0.22 0.09 2.41 0.02 0.04 0.40 99 Y
simplest_design 706 Q 0 estimator (Intercept) 0.02 0.09 0.21 0.83 -0.17 0.21 99 Y
simplest_design 707 Q 0 estimator (Intercept) 0.23 0.09 2.49 0.01 0.05 0.40 99 Y
simplest_design 708 Q 0 estimator (Intercept) 0.12 0.10 1.20 0.23 -0.08 0.31 99 Y
simplest_design 709 Q 0 estimator (Intercept) -0.02 0.10 -0.15 0.88 -0.21 0.18 99 Y
simplest_design 710 Q 0 estimator (Intercept) -0.14 0.09 -1.55 0.13 -0.33 0.04 99 Y
simplest_design 711 Q 0 estimator (Intercept) -0.04 0.09 -0.45 0.65 -0.21 0.13 99 Y
simplest_design 712 Q 0 estimator (Intercept) 0.00 0.10 -0.04 0.96 -0.20 0.19 99 Y
simplest_design 713 Q 0 estimator (Intercept) 0.13 0.10 1.27 0.21 -0.07 0.33 99 Y
simplest_design 714 Q 0 estimator (Intercept) -0.05 0.09 -0.53 0.60 -0.23 0.14 99 Y
simplest_design 715 Q 0 estimator (Intercept) 0.00 0.11 -0.04 0.97 -0.23 0.22 99 Y
simplest_design 716 Q 0 estimator (Intercept) 0.14 0.11 1.30 0.20 -0.07 0.36 99 Y
simplest_design 717 Q 0 estimator (Intercept) 0.03 0.09 0.39 0.70 -0.14 0.21 99 Y
simplest_design 718 Q 0 estimator (Intercept) 0.06 0.09 0.61 0.54 -0.13 0.25 99 Y
simplest_design 719 Q 0 estimator (Intercept) -0.04 0.10 -0.46 0.65 -0.24 0.15 99 Y
simplest_design 720 Q 0 estimator (Intercept) 0.09 0.09 0.97 0.34 -0.09 0.27 99 Y
simplest_design 721 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 722 Q 0 estimator (Intercept) 0.07 0.10 0.71 0.48 -0.12 0.26 99 Y
simplest_design 723 Q 0 estimator (Intercept) -0.12 0.10 -1.16 0.25 -0.32 0.08 99 Y
simplest_design 724 Q 0 estimator (Intercept) -0.04 0.09 -0.47 0.64 -0.21 0.13 99 Y
simplest_design 725 Q 0 estimator (Intercept) 0.04 0.11 0.35 0.73 -0.17 0.25 99 Y
simplest_design 726 Q 0 estimator (Intercept) -0.09 0.10 -0.89 0.37 -0.30 0.11 99 Y
simplest_design 727 Q 0 estimator (Intercept) 0.07 0.10 0.68 0.50 -0.13 0.26 99 Y
simplest_design 728 Q 0 estimator (Intercept) 0.11 0.10 1.09 0.28 -0.09 0.32 99 Y
simplest_design 729 Q 0 estimator (Intercept) -0.03 0.09 -0.37 0.71 -0.22 0.15 99 Y
simplest_design 730 Q 0 estimator (Intercept) 0.08 0.11 0.77 0.44 -0.13 0.30 99 Y
simplest_design 731 Q 0 estimator (Intercept) -0.12 0.09 -1.37 0.17 -0.30 0.05 99 Y
simplest_design 732 Q 0 estimator (Intercept) 0.02 0.09 0.27 0.79 -0.16 0.21 99 Y
simplest_design 733 Q 0 estimator (Intercept) -0.16 0.10 -1.64 0.10 -0.36 0.03 99 Y
simplest_design 734 Q 0 estimator (Intercept) -0.01 0.09 -0.08 0.94 -0.19 0.18 99 Y
simplest_design 735 Q 0 estimator (Intercept) 0.01 0.10 0.09 0.93 -0.18 0.20 99 Y
simplest_design 736 Q 0 estimator (Intercept) 0.03 0.10 0.32 0.75 -0.16 0.23 99 Y
simplest_design 737 Q 0 estimator (Intercept) -0.05 0.10 -0.49 0.63 -0.24 0.14 99 Y
simplest_design 738 Q 0 estimator (Intercept) -0.17 0.10 -1.76 0.08 -0.36 0.02 99 Y
simplest_design 739 Q 0 estimator (Intercept) -0.33 0.10 -3.35 0.00 -0.52 -0.13 99 Y
simplest_design 740 Q 0 estimator (Intercept) -0.08 0.10 -0.81 0.42 -0.28 0.12 99 Y
simplest_design 741 Q 0 estimator (Intercept) 0.00 0.11 0.01 0.99 -0.22 0.22 99 Y
simplest_design 742 Q 0 estimator (Intercept) 0.01 0.10 0.06 0.96 -0.18 0.19 99 Y
simplest_design 743 Q 0 estimator (Intercept) 0.06 0.10 0.66 0.51 -0.13 0.26 99 Y
simplest_design 744 Q 0 estimator (Intercept) 0.00 0.12 -0.04 0.97 -0.23 0.22 99 Y
simplest_design 745 Q 0 estimator (Intercept) 0.12 0.09 1.27 0.21 -0.07 0.30 99 Y
simplest_design 746 Q 0 estimator (Intercept) 0.00 0.10 0.03 0.98 -0.19 0.20 99 Y
simplest_design 747 Q 0 estimator (Intercept) 0.02 0.10 0.18 0.85 -0.18 0.22 99 Y
simplest_design 748 Q 0 estimator (Intercept) -0.17 0.09 -1.83 0.07 -0.35 0.01 99 Y
simplest_design 749 Q 0 estimator (Intercept) -0.18 0.10 -1.83 0.07 -0.38 0.01 99 Y
simplest_design 750 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.47 -0.27 0.13 99 Y
simplest_design 751 Q 0 estimator (Intercept) 0.10 0.10 0.99 0.32 -0.10 0.29 99 Y
simplest_design 752 Q 0 estimator (Intercept) 0.14 0.11 1.32 0.19 -0.07 0.36 99 Y
simplest_design 753 Q 0 estimator (Intercept) -0.12 0.10 -1.23 0.22 -0.31 0.07 99 Y
simplest_design 754 Q 0 estimator (Intercept) -0.02 0.10 -0.18 0.86 -0.23 0.19 99 Y
simplest_design 755 Q 0 estimator (Intercept) 0.01 0.10 0.11 0.92 -0.18 0.20 99 Y
simplest_design 756 Q 0 estimator (Intercept) 0.00 0.11 -0.02 0.99 -0.21 0.21 99 Y
simplest_design 757 Q 0 estimator (Intercept) 0.00 0.10 0.04 0.97 -0.20 0.21 99 Y
simplest_design 758 Q 0 estimator (Intercept) -0.23 0.10 -2.30 0.02 -0.42 -0.03 99 Y
simplest_design 759 Q 0 estimator (Intercept) 0.18 0.10 1.78 0.08 -0.02 0.38 99 Y
simplest_design 760 Q 0 estimator (Intercept) 0.01 0.10 0.09 0.93 -0.20 0.22 99 Y
simplest_design 761 Q 0 estimator (Intercept) -0.12 0.10 -1.17 0.24 -0.31 0.08 99 Y
simplest_design 762 Q 0 estimator (Intercept) 0.13 0.10 1.30 0.20 -0.07 0.33 99 Y
simplest_design 763 Q 0 estimator (Intercept) -0.07 0.10 -0.70 0.49 -0.28 0.14 99 Y
simplest_design 764 Q 0 estimator (Intercept) -0.07 0.09 -0.76 0.45 -0.26 0.12 99 Y
simplest_design 765 Q 0 estimator (Intercept) 0.00 0.09 0.03 0.98 -0.17 0.18 99 Y
simplest_design 766 Q 0 estimator (Intercept) -0.13 0.10 -1.21 0.23 -0.33 0.08 99 Y
simplest_design 767 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.83 -0.22 0.18 99 Y
simplest_design 768 Q 0 estimator (Intercept) -0.04 0.11 -0.37 0.72 -0.25 0.17 99 Y
simplest_design 769 Q 0 estimator (Intercept) -0.24 0.10 -2.45 0.02 -0.43 -0.05 99 Y
simplest_design 770 Q 0 estimator (Intercept) -0.06 0.11 -0.50 0.62 -0.28 0.16 99 Y
simplest_design 771 Q 0 estimator (Intercept) 0.13 0.10 1.31 0.19 -0.07 0.33 99 Y
simplest_design 772 Q 0 estimator (Intercept) -0.05 0.11 -0.43 0.67 -0.27 0.17 99 Y
simplest_design 773 Q 0 estimator (Intercept) -0.01 0.10 -0.07 0.94 -0.20 0.19 99 Y
simplest_design 774 Q 0 estimator (Intercept) -0.06 0.11 -0.60 0.55 -0.28 0.15 99 Y
simplest_design 775 Q 0 estimator (Intercept) 0.00 0.09 0.02 0.98 -0.18 0.18 99 Y
simplest_design 776 Q 0 estimator (Intercept) 0.03 0.11 0.28 0.78 -0.18 0.24 99 Y
simplest_design 777 Q 0 estimator (Intercept) 0.03 0.11 0.27 0.79 -0.18 0.24 99 Y
simplest_design 778 Q 0 estimator (Intercept) 0.04 0.10 0.37 0.71 -0.16 0.23 99 Y
simplest_design 779 Q 0 estimator (Intercept) 0.05 0.11 0.49 0.63 -0.17 0.28 99 Y
simplest_design 780 Q 0 estimator (Intercept) 0.06 0.09 0.61 0.54 -0.13 0.24 99 Y
simplest_design 781 Q 0 estimator (Intercept) -0.08 0.10 -0.83 0.41 -0.28 0.11 99 Y
simplest_design 782 Q 0 estimator (Intercept) 0.04 0.09 0.44 0.66 -0.13 0.21 99 Y
simplest_design 783 Q 0 estimator (Intercept) 0.12 0.11 1.17 0.25 -0.09 0.33 99 Y
simplest_design 784 Q 0 estimator (Intercept) 0.00 0.10 0.01 0.99 -0.20 0.20 99 Y
simplest_design 785 Q 0 estimator (Intercept) 0.10 0.10 1.02 0.31 -0.09 0.29 99 Y
simplest_design 786 Q 0 estimator (Intercept) -0.07 0.10 -0.78 0.44 -0.27 0.12 99 Y
simplest_design 787 Q 0 estimator (Intercept) 0.02 0.11 0.22 0.82 -0.20 0.25 99 Y
simplest_design 788 Q 0 estimator (Intercept) 0.08 0.10 0.82 0.41 -0.12 0.28 99 Y
simplest_design 789 Q 0 estimator (Intercept) -0.12 0.10 -1.13 0.26 -0.32 0.09 99 Y
simplest_design 790 Q 0 estimator (Intercept) -0.31 0.10 -3.03 0.00 -0.51 -0.11 99 Y
simplest_design 791 Q 0 estimator (Intercept) -0.04 0.09 -0.43 0.67 -0.22 0.14 99 Y
simplest_design 792 Q 0 estimator (Intercept) 0.20 0.09 2.14 0.03 0.02 0.39 99 Y
simplest_design 793 Q 0 estimator (Intercept) 0.16 0.09 1.83 0.07 -0.01 0.34 99 Y
simplest_design 794 Q 0 estimator (Intercept) -0.01 0.10 -0.11 0.91 -0.20 0.18 99 Y
simplest_design 795 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 796 Q 0 estimator (Intercept) -0.04 0.10 -0.44 0.66 -0.24 0.16 99 Y
simplest_design 797 Q 0 estimator (Intercept) -0.01 0.10 -0.06 0.95 -0.20 0.19 99 Y
simplest_design 798 Q 0 estimator (Intercept) 0.10 0.10 0.93 0.36 -0.11 0.30 99 Y
simplest_design 799 Q 0 estimator (Intercept) 0.19 0.10 1.94 0.06 0.00 0.38 99 Y
simplest_design 800 Q 0 estimator (Intercept) -0.01 0.10 -0.13 0.89 -0.21 0.18 99 Y
simplest_design 801 Q 0 estimator (Intercept) 0.09 0.10 0.96 0.34 -0.10 0.28 99 Y
simplest_design 802 Q 0 estimator (Intercept) 0.01 0.11 0.07 0.95 -0.20 0.22 99 Y
simplest_design 803 Q 0 estimator (Intercept) -0.07 0.10 -0.75 0.46 -0.26 0.12 99 Y
simplest_design 804 Q 0 estimator (Intercept) 0.09 0.10 0.94 0.35 -0.10 0.29 99 Y
simplest_design 805 Q 0 estimator (Intercept) -0.09 0.11 -0.86 0.39 -0.30 0.12 99 Y
simplest_design 806 Q 0 estimator (Intercept) 0.05 0.12 0.46 0.64 -0.18 0.28 99 Y
simplest_design 807 Q 0 estimator (Intercept) -0.22 0.11 -2.00 0.05 -0.44 0.00 99 Y
simplest_design 808 Q 0 estimator (Intercept) -0.28 0.09 -3.21 0.00 -0.45 -0.11 99 Y
simplest_design 809 Q 0 estimator (Intercept) -0.16 0.11 -1.49 0.14 -0.37 0.05 99 Y
simplest_design 810 Q 0 estimator (Intercept) 0.02 0.11 0.15 0.88 -0.20 0.24 99 Y
simplest_design 811 Q 0 estimator (Intercept) 0.00 0.10 -0.05 0.96 -0.20 0.19 99 Y
simplest_design 812 Q 0 estimator (Intercept) 0.05 0.10 0.47 0.64 -0.16 0.25 99 Y
simplest_design 813 Q 0 estimator (Intercept) -0.05 0.11 -0.46 0.65 -0.26 0.16 99 Y
simplest_design 814 Q 0 estimator (Intercept) 0.09 0.10 0.98 0.33 -0.10 0.29 99 Y
simplest_design 815 Q 0 estimator (Intercept) 0.09 0.10 0.89 0.38 -0.11 0.28 99 Y
simplest_design 816 Q 0 estimator (Intercept) 0.00 0.11 -0.02 0.99 -0.23 0.23 99 Y
simplest_design 817 Q 0 estimator (Intercept) -0.03 0.10 -0.25 0.80 -0.23 0.18 99 Y
simplest_design 818 Q 0 estimator (Intercept) 0.10 0.10 1.03 0.31 -0.09 0.30 99 Y
simplest_design 819 Q 0 estimator (Intercept) 0.20 0.09 2.26 0.03 0.02 0.38 99 Y
simplest_design 820 Q 0 estimator (Intercept) -0.01 0.09 -0.15 0.88 -0.20 0.17 99 Y
simplest_design 821 Q 0 estimator (Intercept) -0.07 0.10 -0.74 0.46 -0.26 0.12 99 Y
simplest_design 822 Q 0 estimator (Intercept) -0.05 0.11 -0.42 0.68 -0.26 0.17 99 Y
simplest_design 823 Q 0 estimator (Intercept) 0.07 0.10 0.77 0.44 -0.12 0.26 99 Y
simplest_design 824 Q 0 estimator (Intercept) 0.16 0.09 1.71 0.09 -0.03 0.34 99 Y
simplest_design 825 Q 0 estimator (Intercept) -0.04 0.10 -0.43 0.67 -0.24 0.16 99 Y
simplest_design 826 Q 0 estimator (Intercept) -0.04 0.10 -0.36 0.72 -0.24 0.17 99 Y
simplest_design 827 Q 0 estimator (Intercept) 0.09 0.11 0.84 0.40 -0.13 0.31 99 Y
simplest_design 828 Q 0 estimator (Intercept) 0.20 0.10 2.05 0.04 0.01 0.40 99 Y
simplest_design 829 Q 0 estimator (Intercept) 0.10 0.09 1.12 0.27 -0.07 0.26 99 Y
simplest_design 830 Q 0 estimator (Intercept) 0.04 0.10 0.41 0.68 -0.16 0.25 99 Y
simplest_design 831 Q 0 estimator (Intercept) 0.08 0.09 0.85 0.40 -0.11 0.27 99 Y
simplest_design 832 Q 0 estimator (Intercept) -0.12 0.10 -1.18 0.24 -0.31 0.08 99 Y
simplest_design 833 Q 0 estimator (Intercept) -0.04 0.13 -0.32 0.75 -0.29 0.21 99 Y
simplest_design 834 Q 0 estimator (Intercept) 0.02 0.11 0.16 0.87 -0.20 0.24 99 Y
simplest_design 835 Q 0 estimator (Intercept) -0.01 0.09 -0.07 0.94 -0.19 0.18 99 Y
simplest_design 836 Q 0 estimator (Intercept) -0.04 0.11 -0.33 0.74 -0.25 0.18 99 Y
simplest_design 837 Q 0 estimator (Intercept) -0.16 0.10 -1.61 0.11 -0.35 0.04 99 Y
simplest_design 838 Q 0 estimator (Intercept) -0.21 0.11 -1.81 0.07 -0.43 0.02 99 Y
simplest_design 839 Q 0 estimator (Intercept) -0.05 0.10 -0.54 0.59 -0.25 0.14 99 Y
simplest_design 840 Q 0 estimator (Intercept) 0.04 0.11 0.38 0.70 -0.18 0.26 99 Y
simplest_design 841 Q 0 estimator (Intercept) -0.02 0.10 -0.24 0.81 -0.21 0.17 99 Y
simplest_design 842 Q 0 estimator (Intercept) -0.08 0.10 -0.76 0.45 -0.27 0.12 99 Y
simplest_design 843 Q 0 estimator (Intercept) 0.16 0.10 1.51 0.13 -0.05 0.37 99 Y
simplest_design 844 Q 0 estimator (Intercept) -0.05 0.10 -0.45 0.65 -0.26 0.16 99 Y
simplest_design 845 Q 0 estimator (Intercept) -0.12 0.11 -1.07 0.29 -0.34 0.10 99 Y
simplest_design 846 Q 0 estimator (Intercept) 0.13 0.10 1.29 0.20 -0.07 0.33 99 Y
simplest_design 847 Q 0 estimator (Intercept) 0.02 0.10 0.17 0.87 -0.18 0.21 99 Y
simplest_design 848 Q 0 estimator (Intercept) -0.03 0.11 -0.31 0.75 -0.25 0.18 99 Y
simplest_design 849 Q 0 estimator (Intercept) 0.13 0.11 1.26 0.21 -0.08 0.34 99 Y
simplest_design 850 Q 0 estimator (Intercept) 0.26 0.10 2.58 0.01 0.06 0.45 99 Y
simplest_design 851 Q 0 estimator (Intercept) -0.16 0.11 -1.51 0.13 -0.38 0.05 99 Y
simplest_design 852 Q 0 estimator (Intercept) 0.05 0.10 0.56 0.58 -0.14 0.24 99 Y
simplest_design 853 Q 0 estimator (Intercept) 0.07 0.09 0.80 0.43 -0.11 0.26 99 Y
simplest_design 854 Q 0 estimator (Intercept) -0.04 0.10 -0.36 0.72 -0.25 0.17 99 Y
simplest_design 855 Q 0 estimator (Intercept) -0.08 0.10 -0.73 0.47 -0.28 0.13 99 Y
simplest_design 856 Q 0 estimator (Intercept) 0.05 0.09 0.54 0.59 -0.14 0.24 99 Y
simplest_design 857 Q 0 estimator (Intercept) -0.10 0.11 -0.89 0.38 -0.31 0.12 99 Y
simplest_design 858 Q 0 estimator (Intercept) 0.11 0.11 1.03 0.30 -0.10 0.32 99 Y
simplest_design 859 Q 0 estimator (Intercept) -0.02 0.10 -0.17 0.86 -0.22 0.18 99 Y
simplest_design 860 Q 0 estimator (Intercept) 0.06 0.11 0.58 0.57 -0.15 0.27 99 Y
simplest_design 861 Q 0 estimator (Intercept) -0.09 0.10 -0.88 0.38 -0.29 0.11 99 Y
simplest_design 862 Q 0 estimator (Intercept) -0.12 0.11 -1.11 0.27 -0.33 0.09 99 Y
simplest_design 863 Q 0 estimator (Intercept) 0.02 0.08 0.27 0.79 -0.15 0.19 99 Y
simplest_design 864 Q 0 estimator (Intercept) -0.09 0.10 -0.86 0.39 -0.30 0.12 99 Y
simplest_design 865 Q 0 estimator (Intercept) 0.08 0.09 0.84 0.40 -0.11 0.26 99 Y
simplest_design 866 Q 0 estimator (Intercept) 0.02 0.10 0.17 0.87 -0.17 0.21 99 Y
simplest_design 867 Q 0 estimator (Intercept) 0.03 0.09 0.27 0.79 -0.16 0.21 99 Y
simplest_design 868 Q 0 estimator (Intercept) -0.06 0.10 -0.59 0.56 -0.27 0.15 99 Y
simplest_design 869 Q 0 estimator (Intercept) 0.11 0.09 1.18 0.24 -0.07 0.29 99 Y
simplest_design 870 Q 0 estimator (Intercept) -0.06 0.10 -0.58 0.56 -0.24 0.13 99 Y
simplest_design 871 Q 0 estimator (Intercept) 0.21 0.10 2.06 0.04 0.01 0.41 99 Y
simplest_design 872 Q 0 estimator (Intercept) 0.15 0.09 1.58 0.12 -0.04 0.33 99 Y
simplest_design 873 Q 0 estimator (Intercept) 0.19 0.09 2.08 0.04 0.01 0.37 99 Y
simplest_design 874 Q 0 estimator (Intercept) -0.11 0.10 -1.16 0.25 -0.31 0.08 99 Y
simplest_design 875 Q 0 estimator (Intercept) 0.01 0.10 0.07 0.94 -0.19 0.21 99 Y
simplest_design 876 Q 0 estimator (Intercept) 0.10 0.10 1.02 0.31 -0.10 0.30 99 Y
simplest_design 877 Q 0 estimator (Intercept) -0.07 0.11 -0.67 0.50 -0.28 0.14 99 Y
simplest_design 878 Q 0 estimator (Intercept) 0.04 0.08 0.53 0.60 -0.12 0.21 99 Y
simplest_design 879 Q 0 estimator (Intercept) 0.13 0.11 1.17 0.24 -0.09 0.34 99 Y
simplest_design 880 Q 0 estimator (Intercept) -0.04 0.11 -0.37 0.71 -0.25 0.17 99 Y
simplest_design 881 Q 0 estimator (Intercept) 0.08 0.10 0.84 0.40 -0.11 0.27 99 Y
simplest_design 882 Q 0 estimator (Intercept) -0.10 0.09 -1.08 0.28 -0.28 0.08 99 Y
simplest_design 883 Q 0 estimator (Intercept) 0.00 0.11 -0.01 0.99 -0.22 0.21 99 Y
simplest_design 884 Q 0 estimator (Intercept) -0.21 0.10 -2.16 0.03 -0.40 -0.02 99 Y
simplest_design 885 Q 0 estimator (Intercept) -0.01 0.10 -0.07 0.94 -0.20 0.18 99 Y
simplest_design 886 Q 0 estimator (Intercept) 0.02 0.10 0.24 0.81 -0.18 0.23 99 Y
simplest_design 887 Q 0 estimator (Intercept) -0.08 0.10 -0.81 0.42 -0.27 0.12 99 Y
simplest_design 888 Q 0 estimator (Intercept) 0.04 0.08 0.47 0.64 -0.13 0.21 99 Y
simplest_design 889 Q 0 estimator (Intercept) 0.16 0.09 1.68 0.10 -0.03 0.34 99 Y
simplest_design 890 Q 0 estimator (Intercept) 0.01 0.10 0.09 0.93 -0.19 0.20 99 Y
simplest_design 891 Q 0 estimator (Intercept) 0.02 0.09 0.26 0.79 -0.16 0.20 99 Y
simplest_design 892 Q 0 estimator (Intercept) -0.04 0.10 -0.38 0.70 -0.24 0.16 99 Y
simplest_design 893 Q 0 estimator (Intercept) 0.02 0.10 0.22 0.82 -0.17 0.22 99 Y
simplest_design 894 Q 0 estimator (Intercept) -0.11 0.10 -1.12 0.27 -0.31 0.09 99 Y
simplest_design 895 Q 0 estimator (Intercept) 0.06 0.10 0.61 0.54 -0.13 0.25 99 Y
simplest_design 896 Q 0 estimator (Intercept) -0.07 0.10 -0.68 0.50 -0.27 0.13 99 Y
simplest_design 897 Q 0 estimator (Intercept) -0.14 0.10 -1.38 0.17 -0.35 0.06 99 Y
simplest_design 898 Q 0 estimator (Intercept) 0.12 0.10 1.27 0.21 -0.07 0.31 99 Y
simplest_design 899 Q 0 estimator (Intercept) 0.02 0.11 0.19 0.85 -0.20 0.25 99 Y
simplest_design 900 Q 0 estimator (Intercept) 0.09 0.10 0.89 0.38 -0.11 0.28 99 Y
simplest_design 901 Q 0 estimator (Intercept) -0.01 0.10 -0.06 0.95 -0.20 0.19 99 Y
simplest_design 902 Q 0 estimator (Intercept) -0.11 0.11 -1.06 0.29 -0.32 0.10 99 Y
simplest_design 903 Q 0 estimator (Intercept) -0.08 0.09 -0.91 0.36 -0.26 0.10 99 Y
simplest_design 904 Q 0 estimator (Intercept) 0.07 0.10 0.70 0.48 -0.13 0.28 99 Y
simplest_design 905 Q 0 estimator (Intercept) -0.06 0.11 -0.59 0.55 -0.28 0.15 99 Y
simplest_design 906 Q 0 estimator (Intercept) 0.03 0.10 0.33 0.74 -0.16 0.22 99 Y
simplest_design 907 Q 0 estimator (Intercept) -0.01 0.11 -0.14 0.89 -0.23 0.20 99 Y
simplest_design 908 Q 0 estimator (Intercept) -0.17 0.10 -1.73 0.09 -0.38 0.03 99 Y
simplest_design 909 Q 0 estimator (Intercept) -0.06 0.11 -0.52 0.60 -0.28 0.16 99 Y
simplest_design 910 Q 0 estimator (Intercept) -0.02 0.10 -0.21 0.84 -0.21 0.17 99 Y
simplest_design 911 Q 0 estimator (Intercept) -0.02 0.10 -0.24 0.81 -0.23 0.18 99 Y
simplest_design 912 Q 0 estimator (Intercept) -0.04 0.11 -0.36 0.72 -0.25 0.18 99 Y
simplest_design 913 Q 0 estimator (Intercept) 0.17 0.09 1.80 0.08 -0.02 0.36 99 Y
simplest_design 914 Q 0 estimator (Intercept) 0.00 0.10 -0.04 0.96 -0.21 0.20 99 Y
simplest_design 915 Q 0 estimator (Intercept) -0.05 0.10 -0.52 0.61 -0.25 0.15 99 Y
simplest_design 916 Q 0 estimator (Intercept) -0.07 0.10 -0.68 0.50 -0.27 0.13 99 Y
simplest_design 917 Q 0 estimator (Intercept) 0.20 0.09 2.15 0.03 0.02 0.39 99 Y
simplest_design 918 Q 0 estimator (Intercept) -0.09 0.09 -1.00 0.32 -0.26 0.08 99 Y
simplest_design 919 Q 0 estimator (Intercept) -0.05 0.11 -0.41 0.68 -0.27 0.18 99 Y
simplest_design 920 Q 0 estimator (Intercept) 0.03 0.09 0.37 0.71 -0.15 0.22 99 Y
simplest_design 921 Q 0 estimator (Intercept) 0.13 0.10 1.30 0.20 -0.07 0.33 99 Y
simplest_design 922 Q 0 estimator (Intercept) -0.04 0.10 -0.36 0.72 -0.24 0.17 99 Y
simplest_design 923 Q 0 estimator (Intercept) 0.11 0.10 1.11 0.27 -0.08 0.30 99 Y
simplest_design 924 Q 0 estimator (Intercept) 0.05 0.09 0.48 0.63 -0.14 0.23 99 Y
simplest_design 925 Q 0 estimator (Intercept) 0.02 0.11 0.18 0.86 -0.20 0.24 99 Y
simplest_design 926 Q 0 estimator (Intercept) -0.05 0.09 -0.54 0.59 -0.22 0.12 99 Y
simplest_design 927 Q 0 estimator (Intercept) -0.18 0.10 -1.78 0.08 -0.38 0.02 99 Y
simplest_design 928 Q 0 estimator (Intercept) -0.03 0.10 -0.30 0.76 -0.23 0.17 99 Y
simplest_design 929 Q 0 estimator (Intercept) 0.08 0.11 0.69 0.49 -0.15 0.30 99 Y
simplest_design 930 Q 0 estimator (Intercept) -0.03 0.10 -0.25 0.80 -0.23 0.18 99 Y
simplest_design 931 Q 0 estimator (Intercept) 0.04 0.11 0.39 0.70 -0.18 0.27 99 Y
simplest_design 932 Q 0 estimator (Intercept) -0.02 0.10 -0.20 0.84 -0.22 0.18 99 Y
simplest_design 933 Q 0 estimator (Intercept) 0.02 0.09 0.26 0.80 -0.16 0.21 99 Y
simplest_design 934 Q 0 estimator (Intercept) 0.05 0.09 0.48 0.63 -0.14 0.23 99 Y
simplest_design 935 Q 0 estimator (Intercept) 0.02 0.10 0.22 0.82 -0.17 0.21 99 Y
simplest_design 936 Q 0 estimator (Intercept) 0.24 0.10 2.42 0.02 0.04 0.43 99 Y
simplest_design 937 Q 0 estimator (Intercept) -0.21 0.09 -2.30 0.02 -0.39 -0.03 99 Y
simplest_design 938 Q 0 estimator (Intercept) -0.28 0.10 -2.86 0.01 -0.47 -0.08 99 Y
simplest_design 939 Q 0 estimator (Intercept) 0.06 0.12 0.52 0.60 -0.17 0.29 99 Y
simplest_design 940 Q 0 estimator (Intercept) 0.04 0.10 0.43 0.67 -0.15 0.23 99 Y
simplest_design 941 Q 0 estimator (Intercept) 0.17 0.09 1.83 0.07 -0.01 0.36 99 Y
simplest_design 942 Q 0 estimator (Intercept) -0.12 0.10 -1.24 0.22 -0.31 0.07 99 Y
simplest_design 943 Q 0 estimator (Intercept) -0.16 0.09 -1.85 0.07 -0.33 0.01 99 Y
simplest_design 944 Q 0 estimator (Intercept) 0.16 0.10 1.55 0.13 -0.05 0.37 99 Y
simplest_design 945 Q 0 estimator (Intercept) 0.13 0.11 1.18 0.24 -0.09 0.36 99 Y
simplest_design 946 Q 0 estimator (Intercept) -0.10 0.10 -0.97 0.33 -0.30 0.10 99 Y
simplest_design 947 Q 0 estimator (Intercept) 0.16 0.10 1.56 0.12 -0.04 0.36 99 Y
simplest_design 948 Q 0 estimator (Intercept) 0.18 0.10 1.84 0.07 -0.01 0.37 99 Y
simplest_design 949 Q 0 estimator (Intercept) 0.06 0.09 0.62 0.54 -0.13 0.25 99 Y
simplest_design 950 Q 0 estimator (Intercept) -0.02 0.09 -0.22 0.83 -0.20 0.16 99 Y
simplest_design 951 Q 0 estimator (Intercept) -0.07 0.10 -0.67 0.50 -0.28 0.14 99 Y
simplest_design 952 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.47 -0.26 0.12 99 Y
simplest_design 953 Q 0 estimator (Intercept) 0.18 0.10 1.74 0.08 -0.02 0.38 99 Y
simplest_design 954 Q 0 estimator (Intercept) -0.14 0.10 -1.43 0.16 -0.35 0.06 99 Y
simplest_design 955 Q 0 estimator (Intercept) -0.12 0.10 -1.28 0.20 -0.31 0.07 99 Y
simplest_design 956 Q 0 estimator (Intercept) -0.26 0.10 -2.54 0.01 -0.45 -0.06 99 Y
simplest_design 957 Q 0 estimator (Intercept) 0.00 0.09 -0.04 0.97 -0.18 0.17 99 Y
simplest_design 958 Q 0 estimator (Intercept) -0.05 0.09 -0.53 0.60 -0.24 0.14 99 Y
simplest_design 959 Q 0 estimator (Intercept) -0.11 0.09 -1.23 0.22 -0.29 0.07 99 Y
simplest_design 960 Q 0 estimator (Intercept) -0.25 0.11 -2.37 0.02 -0.47 -0.04 99 Y
simplest_design 961 Q 0 estimator (Intercept) 0.00 0.09 -0.04 0.97 -0.19 0.18 99 Y
simplest_design 962 Q 0 estimator (Intercept) -0.03 0.10 -0.27 0.78 -0.23 0.18 99 Y
simplest_design 963 Q 0 estimator (Intercept) -0.12 0.11 -1.10 0.28 -0.33 0.10 99 Y
simplest_design 964 Q 0 estimator (Intercept) -0.06 0.10 -0.60 0.55 -0.26 0.14 99 Y
simplest_design 965 Q 0 estimator (Intercept) 0.09 0.10 0.83 0.41 -0.12 0.29 99 Y
simplest_design 966 Q 0 estimator (Intercept) -0.10 0.09 -1.10 0.27 -0.27 0.08 99 Y
simplest_design 967 Q 0 estimator (Intercept) -0.01 0.09 -0.13 0.90 -0.20 0.17 99 Y
simplest_design 968 Q 0 estimator (Intercept) 0.10 0.10 1.04 0.30 -0.09 0.30 99 Y
simplest_design 969 Q 0 estimator (Intercept) -0.07 0.10 -0.72 0.47 -0.28 0.13 99 Y
simplest_design 970 Q 0 estimator (Intercept) -0.02 0.09 -0.20 0.85 -0.20 0.17 99 Y
simplest_design 971 Q 0 estimator (Intercept) -0.09 0.10 -0.91 0.36 -0.29 0.11 99 Y
simplest_design 972 Q 0 estimator (Intercept) 0.05 0.09 0.54 0.59 -0.13 0.23 99 Y
simplest_design 973 Q 0 estimator (Intercept) 0.03 0.11 0.30 0.76 -0.19 0.25 99 Y
simplest_design 974 Q 0 estimator (Intercept) 0.00 0.10 -0.05 0.96 -0.20 0.19 99 Y
simplest_design 975 Q 0 estimator (Intercept) -0.11 0.09 -1.15 0.25 -0.29 0.08 99 Y
simplest_design 976 Q 0 estimator (Intercept) 0.24 0.09 2.55 0.01 0.05 0.43 99 Y
simplest_design 977 Q 0 estimator (Intercept) -0.08 0.10 -0.80 0.43 -0.28 0.12 99 Y
simplest_design 978 Q 0 estimator (Intercept) 0.04 0.11 0.42 0.68 -0.16 0.25 99 Y
simplest_design 979 Q 0 estimator (Intercept) 0.00 0.10 0.05 0.96 -0.20 0.21 99 Y
simplest_design 980 Q 0 estimator (Intercept) 0.18 0.10 1.74 0.08 -0.02 0.38 99 Y
simplest_design 981 Q 0 estimator (Intercept) -0.04 0.09 -0.45 0.65 -0.21 0.13 99 Y
simplest_design 982 Q 0 estimator (Intercept) -0.11 0.10 -1.04 0.30 -0.31 0.10 99 Y
simplest_design 983 Q 0 estimator (Intercept) -0.10 0.11 -0.94 0.35 -0.31 0.11 99 Y
simplest_design 984 Q 0 estimator (Intercept) 0.02 0.09 0.22 0.83 -0.16 0.20 99 Y
simplest_design 985 Q 0 estimator (Intercept) -0.08 0.10 -0.81 0.42 -0.27 0.11 99 Y
simplest_design 986 Q 0 estimator (Intercept) 0.11 0.10 1.12 0.26 -0.09 0.31 99 Y
simplest_design 987 Q 0 estimator (Intercept) 0.03 0.10 0.28 0.78 -0.17 0.22 99 Y
simplest_design 988 Q 0 estimator (Intercept) -0.11 0.10 -1.16 0.25 -0.31 0.08 99 Y
simplest_design 989 Q 0 estimator (Intercept) 0.05 0.10 0.55 0.59 -0.14 0.25 99 Y
simplest_design 990 Q 0 estimator (Intercept) -0.13 0.11 -1.26 0.21 -0.34 0.08 99 Y
simplest_design 991 Q 0 estimator (Intercept) -0.06 0.09 -0.67 0.50 -0.24 0.12 99 Y
simplest_design 992 Q 0 estimator (Intercept) -0.14 0.10 -1.35 0.18 -0.34 0.06 99 Y
simplest_design 993 Q 0 estimator (Intercept) 0.05 0.10 0.53 0.60 -0.15 0.26 99 Y
simplest_design 994 Q 0 estimator (Intercept) -0.07 0.10 -0.68 0.50 -0.28 0.14 99 Y
simplest_design 995 Q 0 estimator (Intercept) 0.14 0.11 1.29 0.20 -0.07 0.35 99 Y
simplest_design 996 Q 0 estimator (Intercept) -0.01 0.10 -0.13 0.89 -0.20 0.18 99 Y
simplest_design 997 Q 0 estimator (Intercept) -0.02 0.10 -0.23 0.82 -0.23 0.18 99 Y
simplest_design 998 Q 0 estimator (Intercept) 0.21 0.11 1.97 0.05 0.00 0.42 99 Y
simplest_design 999 Q 0 estimator (Intercept) 0.16 0.10 1.67 0.10 -0.03 0.35 99 Y
simplest_design 1000 Q 0 estimator (Intercept) 0.13 0.11 1.17 0.24 -0.09 0.34 99 Y

2.4.9 The simplest possible design?: Diagnosis

Once you have simulated many times you can “diagnose”.

This is the next topic.

2.5 Design declaration-diagnosis-redesign workflow: Diagnosis

2.5.1 Diagnosis by hand

Once you have simulated many times you can “diagnose”.

For instance, we can ask about bias: the average difference between the estimate and the estimand:

some_runs |> mutate(error = estimate - estimand) |>
  summarize(mean_estimate = mean(estimate), 
            mean_estimand = mean(estimand), 
            bias = mean(error)) 
mean_estimate mean_estimand bias
0 0 0
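The same calculation can be sketched in base R without any packages. This is only an illustration: the data-generating process (N = 100 draws of Y from a standard normal, estimand 0, intercept of lm(Y ~ 1) as the estimator) is an assumption chosen to resemble the output above, and the names one_run, some_runs_by_hand and bias_by_hand are hypothetical.

```r
# Assumed stand-in for the design: Y ~ N(0, 1), N = 100, estimand Q = 0,
# estimator = intercept of lm(Y ~ 1). Not the author's declaration.
set.seed(1)

one_run <- function(N = 100) {
  Y   <- rnorm(N)
  fit <- summary(lm(Y ~ 1))$coefficients  # 1 x 4 coefficient matrix
  data.frame(estimand  = 0,
             estimate  = fit[1, "Estimate"],
             std.error = fit[1, "Std. Error"],
             p.value   = fit[1, "Pr(>|t|)"])
}

# Simulate the design 500 times and stack the runs into one data frame
some_runs_by_hand <- do.call(rbind, lapply(1:500, function(i) one_run()))

# Bias: mean of (estimate - estimand) across runs
bias_by_hand <- mean(some_runs_by_hand$estimate - some_runs_by_hand$estimand)
bias_by_hand
```

With 500 runs the simulated bias should be close to, but not exactly, zero; the remaining gap is simulation error.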

2.5.2 The simplest possible design?

diagnose_design() does this in one step for a set of common “diagnosands”:

diagnosis <-
  simplest_design |>
  diagnose_design()
Design N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
simplest_design 500 0.00 -0.00 -0.00 0.10 0.10 0.05 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.01) (0.01)

2.5.3 What is the diagnosis object?

The diagnosis object is also a list, of class diagnosis:

names(diagnosis)
[1] "simulations_df"       "diagnosands_df"       "diagnosand_names"    
[4] "group_by_set"         "parameters_df"        "bootstrap_replicates"
[7] "bootstrap_sims"       "duration"            
class(diagnosis)
[1] "diagnosis"

2.5.4 What is the diagnosis object?

diagnosis$simulations_df |> head()
design sim_ID inquiry estimand estimator term estimate std.error statistic p.value conf.low conf.high df outcome
simplest_design 1 Q 0 estimator (Intercept) 0.03 0.09 0.31 0.76 -0.16 0.21 99 Y
simplest_design 2 Q 0 estimator (Intercept) 0.10 0.09 1.07 0.29 -0.09 0.29 99 Y
simplest_design 3 Q 0 estimator (Intercept) -0.16 0.10 -1.54 0.13 -0.37 0.05 99 Y
simplest_design 4 Q 0 estimator (Intercept) -0.08 0.11 -0.72 0.48 -0.30 0.14 99 Y
simplest_design 5 Q 0 estimator (Intercept) -0.14 0.10 -1.34 0.18 -0.34 0.07 99 Y
simplest_design 6 Q 0 estimator (Intercept) -0.08 0.09 -0.90 0.37 -0.26 0.10 99 Y

2.5.5 What is the diagnosis object?

diagnosis$diagnosands_df
design inquiry estimator outcome term mean_estimand se(mean_estimand) mean_estimate se(mean_estimate) bias se(bias) sd_estimate se(sd_estimate) rmse se(rmse) power se(power) coverage se(coverage) n_sims
simplest_design Q estimator Y (Intercept) 0 0 0 0 0 0 0.1 0 0.1 0 0.05 0.01 0.95 0.01 500

2.5.6 What is the diagnosis object?

diagnosis$bootstrap_replicates |> head()
design bootstrap_id inquiry estimator outcome term mean_estimand mean_estimate bias sd_estimate rmse power coverage
simplest_design 1 Q estimator Y (Intercept) 0 0.00 0.00 0.1 0.10 0.05 0.95
simplest_design 2 Q estimator Y (Intercept) 0 -0.01 -0.01 0.1 0.11 0.06 0.94
simplest_design 3 Q estimator Y (Intercept) 0 -0.01 -0.01 0.1 0.10 0.05 0.95
simplest_design 4 Q estimator Y (Intercept) 0 -0.01 -0.01 0.1 0.10 0.05 0.95
simplest_design 5 Q estimator Y (Intercept) 0 0.00 0.00 0.1 0.10 0.05 0.95
simplest_design 6 Q estimator Y (Intercept) 0 0.00 0.00 0.1 0.10 0.05 0.95

2.5.7 Diagnosis: Bootstraps

  • The bootstraps dataframe is produced by resampling from the simulations dataframe and producing a diagnosis dataframe from each resampling.

  • This lets us generate estimates of uncertainty around our diagnosands.

  • It can be controlled thus:

diagnose_design(simplest_design, bootstrap_sims = 100)
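The resampling idea can be sketched in base R. This is only an illustration under stated assumptions, not the DeclareDesign internals: the data frame `sims`, the helper `get_diagnosands()`, and the seed are hypothetical stand-ins for a simulations data frame.

```r
set.seed(1)

# Hypothetical stand-in for a simulations data frame: 500 runs of a
# design with N = 100, Y ~ N(0, 1), and estimand 0
sims <- data.frame(estimand = 0,
                   estimate = replicate(500, mean(rnorm(100))))

# Diagnosands computed from one (re)sample of the simulations
get_diagnosands <- function(df)
  c(bias = mean(df$estimate - df$estimand),
    rmse = sqrt(mean((df$estimate - df$estimand)^2)))

# Resample simulation rows with replacement and recompute each time
boot <- replicate(100,
                  get_diagnosands(sims[sample(nrow(sims), replace = TRUE), ]))

# The spread across bootstrap replicates estimates each diagnosand's standard error
boot_se <- apply(boot, 1, sd)
boot_se
```

The bootstrap standard errors shrink as the number of simulations grows; they describe simulation error in the diagnosands, not sampling error in the study itself.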

2.5.8 After Diagnosis

It’s reshapeable: as a tidy dataframe, ready for graphing

diagnosis |> tidy()
design inquiry estimator outcome term diagnosand estimate std.error conf.low conf.high
simplest_design Q estimator Y (Intercept) mean_estimand 0.00 0.00 0.00 0.00
simplest_design Q estimator Y (Intercept) mean_estimate 0.00 0.00 -0.01 0.00
simplest_design Q estimator Y (Intercept) bias 0.00 0.00 -0.01 0.00
simplest_design Q estimator Y (Intercept) sd_estimate 0.10 0.00 0.10 0.11
simplest_design Q estimator Y (Intercept) rmse 0.10 0.00 0.10 0.11
simplest_design Q estimator Y (Intercept) power 0.05 0.01 0.03 0.07
simplest_design Q estimator Y (Intercept) coverage 0.95 0.01 0.93 0.97

2.5.9 After Diagnosis

It’s reshapeable: as a tidy dataframe, ready for graphing

diagnosis |> 
  tidy() |> 
  ggplot(aes(estimate, diagnosand)) + geom_point() + 
  geom_errorbarh(aes(xmax = conf.high, xmin = conf.low, height = .2))

2.5.10 After Diagnosis: Tables

Or turn into a formatted table:

diagnosis |> reshape_diagnosis()
Design Inquiry Estimator Outcome Term N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
simplest_design Q estimator Y (Intercept) 500 0.00 -0.00 -0.00 0.10 0.10 0.05 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.01) (0.01)

2.5.11 Advanced Diagnosis: Variations

    mean_estimand <- mean(estimand)
    mean_estimate <- mean(estimate)
    bias <- mean(estimate - estimand)
    sd_estimate <- sd(estimate)
    rmse <- sqrt(mean((estimate - estimand)^2))
    power <- mean(p.value <= alpha)
    coverage <- mean(estimand <= conf.high & estimand >= conf.low)

2.5.12 Advanced Diagnosis: Other diagnosands

    mean_se = mean(std.error)
    type_s_rate = mean((sign(estimate) != sign(estimand))[p.value <= alpha])
    exaggeration_ratio = mean((estimate/estimand)[p.value <= alpha])
    var_estimate = pop.var(estimate)
    mean_var_hat = mean(std.error^2)
    prop_pos_sig = mean(estimate > 0 & p.value <= alpha)
    mean_ci_length = mean(conf.high - conf.low)
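To see what the conditional diagnosands measure, here is a rough base R sketch for a low-powered setting. The setup (true mean 0.1, N = 50, a one-sample t test, 2000 runs) and the names `one_run` and `other_diagnosands` are assumptions made for illustration only.

```r
set.seed(3)
alpha <- 0.05

# Hypothetical low-powered setting: true mean 0.1, N = 50, SD 1
one_run <- function(N = 50, m = 0.1) {
  fit <- t.test(rnorm(N, mean = m))
  data.frame(estimand = m,
             estimate = unname(fit$estimate),
             p.value  = fit$p.value)
}
sims <- do.call(rbind, replicate(2000, one_run(), simplify = FALSE))

other_diagnosands <- with(sims, c(
  power              = mean(p.value <= alpha),
  # share of significant estimates that have the wrong sign
  type_s_rate        = mean((sign(estimate) != sign(estimand))[p.value <= alpha]),
  # average ratio of estimate to estimand among significant estimates
  exaggeration_ratio = mean((estimate / estimand)[p.value <= alpha])
))
round(other_diagnosands, 2)
```

Both quantities condition on significance, so they are undefined when no run is significant and are noisy when power is very low.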

2.5.13 Advanced Diagnosis: Custom diagnosands

my_diagnosands <-
  declare_diagnosands(median_bias = median(estimate - estimand))

diagnose_design(simplest_design, diagnosands = my_diagnosands, sims = 10) |>
  reshape_diagnosis() |> kable() |> kable_styling(font_size = 20)
Design Inquiry Estimator Outcome Term N Sims Median Bias
simplest_design Q estimator Y (Intercept) 10 -0.04

2.5.14 Advanced Diagnosis: Adding diagnosands to a design

simplest_design <- 
  set_diagnosands(simplest_design, my_diagnosands)

simplest_design |> diagnose_design(sims = 10)|>
  reshape_diagnosis() |> kable() |> kable_styling(font_size = 20)
Design Inquiry Estimator Outcome Term N Sims Median Bias
simplest_design Q estimator Y (Intercept) 10 0.03

2.5.15 Advanced Diagnosis: Diagnosing multiple designs

You can diagnose multiple designs or a list of designs

list(dum = simplest_design, dee = simplest_design) |>
  diagnose_design(sims = 5) |>
  reshape_diagnosis() |> 
  kable() |> 
  kable_styling(font_size = 20)
Design Inquiry Estimator Outcome Term N Sims Median Bias
dum Q estimator Y (Intercept) 5 0.02
dee Q estimator Y (Intercept) 5 -0.03

2.5.16 Advanced Diagnosis: Diagnosing in groups

You can partition the simulations data frame into groups before calculating diagnosands.

grouped_diagnosis <- 
  simplest_design |>
  diagnose_design(
    make_groups = vars(significant = p.value <= 0.05),
    sims = 500
  )
Design Significant N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
design_1 FALSE 474 0.00 -0.00 -0.00 0.09 0.09 0.00 1.00
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
design_1 TRUE 26 0.00 -0.02 -0.02 0.23 0.23 1.00 0.00
(0.00) (0.04) (0.04) (0.01) (0.01) (0.00) (0.00)

Note especially the mean estimate, the power, the coverage, the RMSE, and the bias. (Bias is not large because the significant runs include both underestimates and overestimates.)

2.5.17 Significance filter

grouped_diagnosis$simulations_df |>
  ggplot(aes(estimate, p.value, color = significant)) + geom_point()

2.5.18 Advanced Diagnosis: Multistage simulation

  • Usually a design simulation simulates “from the top”: going from the beginning to the end of the design in each run and repeating.
  • But sometimes you might want to follow a tree-like structure and simulate different steps a different number of times.

2.5.19 Advanced Diagnosis: Multistage simulation

Consider for instance this sampling design:

sampling_design <- 
  declare_model(N = 500, Y = 1 + rnorm(N, sd = 10)) +
  declare_inquiry(Q = mean(Y)) +
  declare_sampling(S = complete_rs(N = N, n = 100)) + 
  declare_estimator(Y ~ 1)

2.5.20 Advanced Diagnosis: Multistage simulation

Compare these two diagnoses:

diagnosis_1 <- diagnose_design(sampling_design, sims = c(5000, 1, 1, 1)) 
diagnosis_2 <- diagnose_design(sampling_design, sims = c(1, 5000, 1, 1))
diagnosis N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
diagnosis_1 5000 1.00 1.00 -0.00 1.01 0.90 0.17 0.97
diagnosis_1 (0.01) (0.01) (0.01) (0.01) (0.01) (0.01) (0.00)
diagnosis_2 5000 1.22 1.22 -0.00 0.91 0.91 0.20 0.97
diagnosis_2 (0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

In the second the population, and so the estimand, is drawn just once. The SD of the estimate is lower. But the RMSE is not very different.
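The contrast between the two simulation schemes can be mimicked in base R. This is a sketch under assumptions, not what `diagnose_design` does internally; `draw_population`, `draw_estimate`, and `summarize_runs` are hypothetical helpers and 2000 runs are used to keep it quick.

```r
set.seed(4)

draw_population <- function() 1 + rnorm(500, sd = 10)   # Y in the model step
draw_estimate   <- function(Y) mean(sample(Y, 100))     # sample 100 units, take the mean

# (a) Redraw everything on each run: the estimand varies across runs
runs_a <- t(replicate(2000, {
  Y <- draw_population()
  c(estimand = mean(Y), estimate = draw_estimate(Y))
}))

# (b) Draw the population once, then redraw only the sample
Y_fixed <- draw_population()
runs_b <- cbind(estimand = mean(Y_fixed),
                estimate = replicate(2000, draw_estimate(Y_fixed)))

summarize_runs <- function(r) c(
  sd_estimand = sd(r[, "estimand"]),
  sd_estimate = sd(r[, "estimate"]),
  rmse        = sqrt(mean((r[, "estimate"] - r[, "estimand"])^2)))

res <- rbind(redraw_all = summarize_runs(runs_a),
             fixed_population = summarize_runs(runs_b))
round(res, 2)
```

With a fixed population the estimate varies only because of sampling, so its SD is smaller; the RMSE compares each estimate with its own estimand and so is similar in both schemes.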

2.5.21 Spotting design problems with diagnosis

Diagnosis can alert you to problems in a design. Consider the following simple alternative design.

simplest_design_2 <- 
  declare_model(N = 100, Y = rnorm(N)) +
  declare_inquiry(Q = mean(Y)) +
  declare_estimator(Y ~ 1)

Here we define the inquiry as the sample average \(Y\) (instead of the population mean). But otherwise things stay the same.

What do we think of this design?

2.5.22 Spotting design problems with diagnosis

Here is the diagnosis

Design N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
simplest_design_2 500 -0.00 -0.00 0.00 0.10 0.00 0.04 1.00
(0.00) (0.00) (0.00) (0.00) (0.00) (0.01) (0.00)
  • Why is coverage so high? Is that OK?
  • Why is the RMSE 0 but the SD of the estimate > 0? Is that OK?
    • Is it because the RMSE is too low?
    • Or because the standard error is too large?

2.5.23 It depends on the inquiry

  • If we are really interested in the sample average then our standard error is off: we should have no error at all!
  • If we are really interested in the population average then our inquiry is badly defined: it should not be redefined on each run!
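A small base R sketch makes the two readings concrete. It assumes the same data-generating process as the design above (N = 100, Y standard normal); the column names and the `rmse` helper are hypothetical.

```r
set.seed(5)

runs <- t(replicate(500, {
  Y <- rnorm(100)
  c(estimate            = mean(Y),
    estimand_sample     = mean(Y),  # inquiry recomputed from the realized data
    estimand_population = 0)        # inquiry fixed at the population mean
}))

rmse <- function(estimate, estimand) sqrt(mean((estimate - estimand)^2))

rmse_sample     <- rmse(runs[, "estimate"], runs[, "estimand_sample"])
rmse_population <- rmse(runs[, "estimate"], runs[, "estimand_population"])
c(rmse_sample = rmse_sample, rmse_population = rmse_population)
```

When the estimand is the sample mean the estimator hits it exactly on every run, so the usual standard error overstates the uncertainty; against the fixed population mean the RMSE is close to the SD of the estimate.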

2.6 Design declaration-diagnosis-redesign workflow: Redesign

2.6.1 Redesign

Redesign is the process of taking a design and modifying it in some way.

There are a few ways to do this:

  1. Just make a new design using modified code
  2. Take a design and alter some steps using replace_step, insert_step or delete_step
  3. Modify a design parameter using redesign

We will focus on the third approach.

2.6.2 Redesign

  • A design parameter is a modifiable quantity of a design.

  • These quantities are objects that were in your global environment when you made your design, are referred to explicitly in your design, and were scooped up when the design was formed.

  • In our simplest design above we had a fixed N, but we could make N a modifiable quantity like this:

N <- 100

simplest_design_N <- 
  declare_model(N = N, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

2.6.3 Redesign

N <- 100

simplest_design_N <- 
  declare_model(N = N, Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

Note that N is defined in memory; and it gets called in one of the steps. It has now become a parameter of the design and it can be modified using redesign.

2.6.4 Simple Redesign

Here is a version of the design with N = 200:

design_200 <- simplest_design_N |> redesign(N = 200)
design_200 |> draw_data() |> nrow()
[1] 200

2.6.5 Redesigning to a list

Here is a list of three different designs with different Ns.

design_Ns <- simplest_design_N |> redesign(N = c(200, 400, 800))

design_Ns |> lapply(draw_data) |> lapply(nrow)
[1] 200

[1] 400

[1] 800

2.6.6 Redesigning to a list

The good thing here is that it is now easy to diagnose over multiple designs and compare diagnoses. The parameter names then end up in the diagnosands_df.

Consider this:

N <- 100
m <- 0

design <- 
  declare_model(N = N, Y = rnorm(N, m)) +
  declare_inquiry(Q = m) +
  declare_estimator(Y ~ 1) 


designs <-  redesign(design, N = c(100, 200, 300), m = c(0, .1, .2))
designs |> diagnose_design() |> tidy() 

2.6.7 Redesigning to a list


designs |> diagnose_design() |> tidy() 
N m diagnosand estimate std.error conf.low conf.high
100 0.0 mean_estimand 0.00 0.00 0.00 0.00
100 0.0 mean_estimate 0.00 0.00 -0.01 0.01
100 0.0 bias 0.00 0.00 -0.01 0.01
100 0.0 sd_estimate 0.10 0.00 0.10 0.11
200 0.0 mean_estimand 0.00 0.00 0.00 0.00
200 0.0 mean_estimate 0.00 0.00 -0.01 0.00
200 0.1 mean_estimand 0.10 0.00 0.10 0.10
200 0.1 mean_estimate 0.10 0.00 0.09 0.10
300 0.2 bias 0.00 0.00 0.00 0.00
300 0.2 sd_estimate 0.06 0.00 0.05 0.06
300 0.2 rmse 0.06 0.00 0.05 0.06
300 0.2 power 0.93 0.01 0.91 0.95
300 0.2 coverage 0.95 0.01 0.92 0.97

2.6.8 Redesigning to a list

Graphing after redesign is especially easy:

designs |> diagnose_design() |> 
  tidy() |>
  filter(diagnosand %in% c("power", "rmse")) |> 
  ggplot(aes(N, estimate, color = factor(m))) + 
  geom_line() + 
  facet_wrap(~diagnosand)

Power depends on N and m; RMSE depends on N only.

2.6.9 Redesign with vector arguments

When redesigning with arguments that are vectors, use list() in redesign, with each list item representing a design you wish to create

prob_each <- c(.1, .5, .4)

design_multi  <- 
  declare_model(N = 10) +
  declare_assignment(Z = complete_ra(N = N, prob_each = prob_each))

## returns two designs

designs <- design_multi |> 
  redesign(prob_each = list(c(.2, .5, .3), c(0, .5, .5)))
designs |> lapply(draw_data)

2.6.10 Redesign warnings

A parameter has to be named correctly, and you get no warning if you misname it.

simplest_design_N  |> redesign(n = 200) |> draw_data() |> nrow()
[1] 100

Why not 200?

2.6.11 Redesign warnings

A parameter has to be called explicitly

N <- 100

my_N <- function(n = N) n

simplest_design_N2 <- 
  declare_model(N = my_N(), Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()
[1] 100

Why not 200?

2.6.12 Redesign warnings

A parameter has to be called explicitly

N <- 100

my_N <- function(n = N) n

simplest_design_N2 <- 
  declare_model(N = my_N(N), Y = rnorm(N)) +
  declare_inquiry(Q = 0) +
  declare_estimator(Y ~ 1)

simplest_design_N2 |> redesign(N = 200) |> draw_data() |> nrow()
[1] 200


2.6.13 Redesign with a function

Here is an example of redesigning where the “parameter” is a function

new_N <- function(n, factor = 1.31) n*factor

simplest_design_N2 |> redesign(my_N = new_N) |> draw_data() |> nrow()
[1] 131

2.7 Using a design

What can you do with a design once you have it?

We will start with a very simple experimental design (more on the components of this later)

b <- 1
N <- 100
design <- 
  declare_model(N = N, U = rnorm(N), potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

2.7.1 Make data from the design

data <- draw_data(design)

data |> head() |> kable() |> kable_styling(font_size = 20)
ID U Y_Z_0 Y_Z_1 Z Y
001 0.8939241 0.8939241 1.8939241 1 1.8939241
002 1.3350334 1.3350334 2.3350334 1 2.3350334
003 0.8329075 0.8329075 1.8329075 1 1.8329075
004 -0.2886946 -0.2886946 0.7113054 0 -0.2886946
005 -0.3062044 -0.3062044 0.6937956 1 0.6937956
006 0.6443779 0.6443779 1.6443779 1 1.6443779

2.7.2 Make data from the design

Play with the data:

lm_robust(Y ~ Z, data = data) |>
  tidy() |>
  kable(digits = 2) |> 
  kable_styling(font_size = 20)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.21 0.14 -1.50 0.14 -0.48 0.07 98 Y
Z 1.27 0.19 6.69 0.00 0.89 1.65 98 Y

2.7.3 Draw estimands

draw_estimands(design) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 20)
inquiry estimand
ate 1

2.7.4 Draw estimates

draw_estimates(design) |> 
  kable(digits = 2) |> 
  kable_styling(font_size = 20)
estimator term estimate std.error statistic p.value conf.low conf.high df outcome inquiry
estimator Z 1.06 0.19 5.57 0 0.69 1.44 98 Y ate

2.7.5 Get estimates

Using your actual data:

get_estimates(design, data) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 20)
estimator term estimate std.error statistic p.value conf.low conf.high df outcome inquiry
estimator Z 1.27 0.19 6.69 0 0.89 1.65 98 Y ate

2.7.6 Simulate design

simulate_design(design, sims = 3) |>
  kable(digits = 2) |> 
  kable_styling(font_size = 16)
design sim_ID inquiry estimand estimator term estimate std.error statistic p.value conf.low conf.high df outcome
design 1 ate 1 estimator Z 1.13 0.19 5.99 0 0.76 1.51 98 Y
design 2 ate 1 estimator Z 1.18 0.20 5.88 0 0.78 1.57 98 Y
design 3 ate 1 estimator Z 1.21 0.19 6.21 0 0.82 1.60 98 Y

2.7.7 Diagnose design

design |> 
  diagnose_design(sims = 100) 
Mean Estimate Bias SD Estimate RMSE Power Coverage
0.99 -0.01 0.20 0.20 1.00 0.95
(0.02) (0.02) (0.02) (0.02) (0.00) (0.02)

2.7.8 Redesign

new_design <-
  design |> redesign(b = 0)
  • Modify any arguments that are explicitly called on by design steps.
  • Or add, remove, or replace steps

2.7.9 Compare designs

redesign(design, N = 50) |>
  compare_diagnoses(design)
diagnosand mean_1 mean_2 mean_difference conf.low conf.high
mean_estimand 0.50 0.50 0.00 0.00 0.00
mean_estimate 0.48 0.50 0.02 -0.01 0.04
bias -0.02 0.00 0.02 -0.01 0.04
sd_estimate 0.28 0.20 -0.08 -0.10 -0.06
rmse 0.28 0.20 -0.08 -0.10 -0.06
power 0.38 0.71 0.32 0.26 0.37
coverage 0.97 0.96 -0.01 -0.04 0.01

2.7.10 Illustration of power calculation

Recall: the power of a design is the probability that you will reject the null hypothesis.

N <- 100
b <- .5

design <- 
  declare_model(N = N, 
    U = rnorm(N),
    potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N),
                     Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

2.7.11 “Run” the design once

Summary of a single 'run' of the design
inquiry estimand estimator term estimate std.error statistic p.value conf.low conf.high df outcome
ate 0.5 estimator Z 0.28 0.2 1.39 0.17 -0.12 0.69 98 Y

2.7.12 Run it many times

sims_1 <- simulate_design(design) 

sims_1 |> select(sim_ID, estimate, p.value)
sim_ID estimate p.value
1 0.81 0.00
2 0.40 0.04
3 0.88 0.00
4 0.72 0.00
5 0.38 0.05
6 0.44 0.02

2.7.13 Power is mass of the sampling distribution of decisions under the model

sims_1 |>
  ggplot(aes(p.value)) + 
  geom_histogram(boundary = 0) +
  geom_vline(xintercept = .05, color = "red")

2.7.14 Power is mass of the sampling distribution of decisions under the model

redesign(design, b = 0) |> 
  simulate_design(sims = 10000) 

2.7.15 Design diagnosis does it all (over multiple designs)

Mean Estimate Bias SD Estimate RMSE Power Coverage
0.50 0.00 0.20 0.20 0.70 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

2.7.16 Design diagnosis does it all

design |>
  redesign(b = c(0, 0.25, 0.5, 1)) |>
  diagnose_design()
b Mean Estimate Bias SD Estimate RMSE Power Coverage
0 -0.00 -0.00 0.20 0.20 0.05 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
0.25 0.25 -0.00 0.20 0.20 0.23 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
0.5 0.50 0.00 0.20 0.20 0.70 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
1 1.00 0.00 0.20 0.20 1.00 0.95
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
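The power column can be approximated without any design machinery. The sketch below is a base R illustration under assumptions: it uses `lm` with classical standard errors rather than `lm_robust`, 1000 runs per effect size, and hypothetical names `run_once` and `power_by_b`.

```r
set.seed(6)

# One run of a two-arm experiment: returns the p value on the treatment coefficient
run_once <- function(b, N = 100) {
  Z <- rbinom(N, 1, 0.5)   # simple random assignment
  Y <- b * Z + rnorm(N)    # outcome with constant effect b
  summary(lm(Y ~ Z))$coefficients["Z", "Pr(>|t|)"]
}

# Power: share of runs with p <= 0.05, for each effect size
effect_sizes <- c(0, 0.25, 0.5, 1)
power_by_b <- sapply(effect_sizes, function(b)
  mean(replicate(1000, run_once(b)) <= 0.05))
names(power_by_b) <- effect_sizes
power_by_b
```

At b = 0 the rejection rate is the false positive rate of the test, which should sit near 0.05; as b grows the rejection rate climbs toward 1.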

2.8 Declaration: a deeper dive (Reference)

We start with a simple experimental design and then show ways to extend.

  • Variations to M and I are supported by the fabricatr package (and others)
  • Variations to D are supported by the randomizr package (and others)
  • Variations to A are supported by the estimatr package (and others)

2.8.1 Steps: A simple experimental design

N <- 100
b <- .5

design <- 
  declare_model(N = N, U = rnorm(N), 
                potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

A few new elements here:

  • declare_model can be used much like mutate with multiple columns created in sequence
  • the potential_outcomes function is a special function that creates potential outcome columns
  • when you assign a treatment that affects an outcome you can use reveal_outcomes to reveal the outcome; Z and Y are default

2.8.2 Steps: A simple experimental design

N <- 100
b <- .5

design <-
  declare_model(N = N, U = rnorm(N),
                potential_outcomes(Y ~ b * Z + U)) +
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z,
                    inquiry = "ate",
                    .method = lm_robust,
                    label = "estimator 1")

A few new elements here:

  • when you declare an estimator you should normally associate an inquiry with the estimator and provide the method to be used; lm_robust is default
  • you should generally label estimators as you may have many

2.8.3 Steps: Order matters

For example, if you sample before defining the inquiry you get a different inquiry from the one you get if you sample after defining it.

design_1 <- 
  declare_model(N = 1000, X = rep(0:1, N/2), Y = X + rnorm(N)) + 
  declare_sampling(S = strata_rs(strata = X, strata_prob = c(.2, .8))) +
  declare_inquiry(m = mean(Y))

design_1 |> draw_estimands()
  inquiry  estimand
1       m 0.7907839

2.8.4 Steps: Order matters

For example, if you sample before defining the inquiry you get a different inquiry from the one you get if you sample after defining it.

design_2 <- 
  declare_model(N = 1000, X = rep(0:1, N/2), Y = X + rnorm(N)) + 
  declare_inquiry(m = mean(Y)) +
  declare_sampling(S = strata_rs(strata = X, strata_prob = c(.2, .8))) 

design_2 |> draw_estimands()
  inquiry  estimand
1       m 0.5467558
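The gap between the two estimands follows from who is in the data when the inquiry is evaluated. A minimal base R sketch, assuming independent Bernoulli sampling within strata as a stand-in for `strata_rs` (which may fix the number sampled per stratum):

```r
set.seed(7)
N <- 1000
X <- rep(0:1, N / 2)
Y <- X + rnorm(N)

# Inquiry evaluated before sampling: mean over the full population (about 0.5)
estimand_population <- mean(Y)

# Sample with probability 0.2 when X = 0 and 0.8 when X = 1
S <- rbinom(N, 1, ifelse(X == 1, 0.8, 0.2)) == 1

# Inquiry evaluated after sampling: mean over sampled units only (about 0.8)
estimand_sample <- mean(Y[S])

c(population = estimand_population, sample = estimand_sample)
```

Units with X = 1 make up about 80% of the sample but only 50% of the population, which is why the post-sampling mean is pulled toward 1.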

2.8.5 M: Key extensions to model declaration

You can generate hierarchical data like this:

M <- 
  declare_model(
    households = add_level(
      N = 100, 
      N_members = sample(c(1, 2, 3, 4), N, 
                         prob = c(0.2, 0.3, 0.25, 0.25), replace = TRUE)
    ),
    individuals = add_level(
      N = N_members, 
      age = sample(18:90, N, replace = TRUE)
    )
  )

2.8.6 M: Key extensions to model declaration

You can generate hierarchical data like this:

M() |> head() |> kable(digits = 2) |> kable_styling(font_size = 20)
households N_members individuals age
001 1 001 57
002 4 002 34
002 4 003 40
002 4 004 57
002 4 005 31
003 3 006 41

2.8.7 M: Key extensions to model declaration

You can generate panel data like this:

M <- 
  declare_model(
    countries = add_level(
      N = 196, 
      country_shock = rnorm(N)
    ),
    years = add_level(
      N = 100, 
      time_trend = 1:N,
      year_shock = runif(N, 1, 10), 
      nest = FALSE
    ),
    observation = cross_levels(
      by = join_using(countries, years),
      observation_shock = rnorm(N),
      Y = 0.01 * time_trend + country_shock + year_shock + observation_shock 
    )
  )

2.8.8 M: Key extensions to model declaration

You can generate panel data like this:

M() |> head() |> kable(digits = 2) |> kable_styling(font_size = 20)
countries country_shock years time_trend year_shock observation observation_shock Y
001 -1.01 001 1 7.24 00001 0.14 6.38
002 1.59 001 1 7.24 00002 1.10 9.94
003 0.18 001 1 7.24 00003 0.94 8.37
004 -2.07 001 1 7.24 00004 0.21 5.40
005 0.22 001 1 7.24 00005 1.08 8.55
006 -0.37 001 1 7.24 00006 1.22 8.11

2.8.9 M: You can pull in preexisting data

M <- 
  declare_model(
    data = baseline_data,
    attitudes = sample(1:5, N, replace = TRUE)
  )

2.8.10 M: A simple experimental design

You can repeat steps and play with the order, always conscious of the direction of the pipe

design <- 
  declare_model(N = N, X = rep(0:1, N/2)) +
  declare_model(U = rnorm(N), potential_outcomes(Y ~ b * Z * X + U)) + 
  declare_assignment(Z = block_ra(blocks = X), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_inquiry(cate = mean(Y_Z_1[X==0] - Y_Z_0[X==0])) + 
  declare_estimator(Y ~ Z, inquiry = "ate", label = "ols") + 
  declare_estimator(Y ~ Z*X, inquiry = "cate", label = "fe")

2.8.11 M: You can generate multiple columns together

M2 <-
  declare_model(
    draw_multivariate(c(X1, X2) ~ MASS::mvrnorm(
      n = 1000,
      mu = c(0, 0),
      Sigma = matrix(c(1, 0.3, 0.3, 1), nrow = 2)
    ))
  )

2.8.12 M: You can generate multiple columns together

M2() |> head() |> kable(digits = 2) |> kable_styling(font_size = 28) 
X1 X2
-1.37 -0.97
1.33 -0.04
-0.56 1.33
0.29 -1.29
0.41 -0.58
-1.22 -1.16

2.8.13 M: Cluster structures with cluster correlations

M <-
  declare_model(households = add_level(N = 1000),
                individuals = add_level(
                  N = 4,
                  X = draw_normal_icc(
                    mean = 0,
                    clusters = households,
                    ICC = 0.65
                  )
                ))

2.8.14 M: Cluster structures with cluster correlations

model <- lm_robust(X ~ households, data = M())
model$adj.r.squared
[1] 0.6709427

2.8.15 I: Inquiries

Many causal inquiries are simple summaries of potential outcomes:

Inquiry | Units | Code
Average treatment effect in a finite population (PATE) | Units in the population | mean(Y_D_1 - Y_D_0)
Conditional average treatment effect (CATE) for X = 1 | Units for whom X = 1 | mean(Y_D_1[X == 1] - Y_D_0[X == 1])
Complier average causal effect (CACE) | Complier units | mean(Y_D_1[D_Z_1 > D_Z_0] - Y_D_0[D_Z_1 > D_Z_0])
Causal interactions of \(D_1\) and \(D_2\) | Units in the population | mean((Y_D1_1_D2_1 - Y_D1_0_D2_1) - (Y_D1_1_D2_0 - Y_D1_0_D2_0))

Generating potential outcomes columns gets you far

2.8.16 I: Inquiries

Often though we need to define inquiries as a function of continuous variables. For this generating a potential outcomes function can make life easier. This helps for:

  • Continuous quantities
  • Spillover quantities
  • Complex counterfactuals

2.8.17 I: Inquiries: Complex counterfactuals

Here is an example of using functions to define complex counterfactuals:

f_M <- function(X, UM) 1*(UM < X)
f_Y <- function(X, M, UY) X + M - .4*X*M + UY

design <- 
  declare_model(N = 100,
                X = simple_rs(N),
                UM = runif(N),
                UY = rnorm(N),
                M = f_M(X, UM),
                Y = f_Y(X, M, UY)) +
  declare_inquiry(Q1 = mean(f_Y(1, f_M(0, UM), UY) - f_Y(0, f_M(0, UM), UY)))

design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)
inquiry estimand
Q1 1

2.8.18 I: Inquiries: Complex counterfactuals

Here is an example of using functions to define effects of continuous treatments.

f_Y <- function(X, UY) X - .25*X^2 + UY

design <- 
  declare_model(N = 100,
                X  = rnorm(N),
                UY = rnorm(N),
                Y = f_Y(X, UY)) +
  declare_inquiries(
    Q1 = mean(f_Y(X+1, UY) - f_Y(X, UY)),
    Q2 = mean(f_Y(1, UY) - f_Y(0, UY)),
    Q3 = (lm_robust(Y ~ X)|> tidy())[2,2]
  )

design |> draw_estimands() |> kable() |> kable_styling(font_size = 20)
inquiry estimand
Q1 0.857143
Q2 0.750000
Q3 1.363886

Which one is the ATE?
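A short derivation may help when comparing the first two inquiries. It assumes only the outcome function \(f_Y(X, U_Y) = X - 0.25X^2 + U_Y\) given above; the noise term cancels in each difference.

\[f_Y(x+1, u) - f_Y(x, u) = (x + 1) - 0.25(x+1)^2 - x + 0.25x^2 = 0.75 - 0.5x\]

\[f_Y(1, u) - f_Y(0, u) = 1 - 0.25 = 0.75\]

So Q1 equals \(0.75 - 0.5\,\bar{X}\), where \(\bar{X}\) is the mean of \(X\) in the drawn data, while Q2 is 0.75 in every draw. The two coincide only when \(\bar{X} = 0\).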

2.8.19 D: Assignment schemes

The randomizr package has a set of functions for different types of block and cluster assignments.

  • Simple random assignment: “Coin flip” or Bernoulli random assignment. All units have the same probability of assignment: simple_ra(N = 100, prob = 0.25)
  • Complete random assignment: Exactly m of N units are assigned to treatment, and all units have the same probability of assignment, m/N: complete_ra(N = 100, m = 40)

2.8.20 D: Assignment schemes

  • Block random assignment: Complete random assignment within pre-defined blocks. Units within the same block have the same probability of assignment, m_b / N_b: block_ra(blocks = regions)
  • Cluster random assignment: Whole groups of units are assigned to the same treatment condition: cluster_ra(clusters = households)
  • Block-and-cluster assignment: Cluster random assignment within blocks of clusters: block_and_cluster_ra(blocks = regions, clusters = villages)
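The logic of these schemes can be sketched in base R without randomizr. This is an illustration only: the block labels, cluster sizes, and variable names are hypothetical, and the randomizr functions handle many cases (unequal blocks, multiple arms) that this sketch ignores.

```r
set.seed(8)
N <- 100

# Simple: independent coin flips, so the number treated varies across draws
Z_simple <- rbinom(N, 1, 0.25)

# Complete: exactly m = 40 of N units treated
Z_complete <- sample(rep(c(1, 0), c(40, N - 40)))

# Block: complete assignment of half the units within each block
blocks <- rep(c("north", "south"), each = N / 2)   # hypothetical block labels
Z_block <- ave(rep(0, N), blocks,
               FUN = function(z) sample(rep(0:1, length.out = length(z))))

# Cluster: every unit in a cluster shares its cluster's assignment
clusters <- rep(1:20, each = 5)                    # hypothetical clusters of 5 units
Z_cluster <- sample(rep(0:1, 10))[clusters]

table(blocks, Z_block)
```

Only the simple scheme leaves the number of treated units random; the others fix it overall, within blocks, or at the cluster level.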

2.8.21 D: Assignment schemes

You can combine these in various ways. For example, with saturation random assignment, clusters are first assigned to a saturation level and then units within clusters are assigned to treatment conditions according to that saturation level:

saturation = cluster_ra(clusters = villages, conditions = c(0, 0.25, 0.5, 0.75))
block_ra(blocks = villages, prob_unit = saturation)

2.8.22 A: Answers: terms

By default declare_estimator() assumes you are interested in the first term after the constant from the output of an estimation procedure.

But you can say what you are interested in directly using term and you can also associate different terms with different quantities of interest using inquiry.

design <-
  declare_model(
    N = 100,
    X1 = rnorm(N),
    X2 = rnorm(N),
    X3 = rnorm(N),
    Y = X1 - X2 + X3 + rnorm(N)
  ) +
  declare_inquiries(ate_2 = -1, ate_3 = 1) +
  declare_estimator(Y ~ X1 + X2 + X3,
                    term = c("X2", "X3"),
                    inquiry = c("ate_2", "ate_3"))

design  |> run_design()  |> kable(digits = 2) |> kable_styling(font_size = 20)
inquiry estimand term estimator estimate std.error statistic p.value conf.low conf.high df outcome
ate_2 -1 X2 estimator -1.11 0.09 -11.83 0 -1.29 -0.92 96 Y
ate_3 1 X3 estimator 0.91 0.11 8.63 0 0.70 1.12 96 Y

2.8.23 A: Answers: terms

Sometimes it can be unclear what the name of a term is, but you can figure this out by running the estimation strategy directly. Here’s an example where the names of the terms might be confusing.

lm_robust(Y ~ A*B, 
          data = data.frame(A = rep(c("a",  "b"), 3), 
                            B = rep(c("p", "q"), each = 3), 
                            Y = rnorm(6))) |>
  coef() |> kable() |> kable_styling(font_size = 20)
(Intercept) 0.984547
Ab -1.172676
Bq -1.976603
Ab:Bq 2.115862

The names as they appear in the output here are the term names that declare_estimator will look for.

2.8.24 A: Answers: other packages

DeclareDesign works natively with estimatr but you can use whatever packages you like. You do have to make sure though that DeclareDesign gets as input a nice tidy dataframe of estimates, and that might require some tidying.

design <- 
  declare_model(N = 1000, U = runif(N), 
                potential_outcomes(Y ~ as.numeric(U < .5 + Z/3))) + 
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", 
                    .method = glm, 
                    family = binomial(link = "probit"))

Note that we passed additional arguments to glm; that’s easy.

It’s not a good design though. Just look at the diagnosis:

2.8.25 A: Answers: other packages

  diagnose_design(design) |> write_rds("saved/probit.rds")

read_rds("saved/probit.rds") |> 
  reshape_diagnosis() |>
  kable() |> 
  kable_styling(font_size = 20)
Design Inquiry Estimator Term N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
design ate estimator Z 500 0.33 0.97 0.64 0.09 0.64 1.00 0.00
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

Why is it so terrible?

2.8.26 A: Answers: other packages

Because the probit estimate does not target the ATE directly; you need to do more work to get there.

You essentially have to write a function to get the estimates, calculate the quantity of interest and other stats, and turn these into a nice dataframe.

Luckily you can use the margins package with tidy to create a .summary function which you can pass to declare_estimator to do all this for you:

tidy_margins <- function(x) 
  broom::tidy(margins::margins(x, data = x$data), conf.int = TRUE)

design <- design +  
  declare_estimator(Y ~ Z, inquiry = "ate", 
                    .method = glm, 
                    family = binomial(link = "probit"),
                    .summary = tidy_margins,
                    label = "margins")

2.8.27 A: Answers: other packages

  diagnose_design(design) |> write_rds("saved/probit_2.rds")

read_rds("saved/probit_2.rds") |> reshape_diagnosis() |> kable() |> 
  kable_styling(font_size = 20)
Design Inquiry Estimator Term N Sims Mean Estimand Mean Estimate Bias SD Estimate RMSE Power Coverage
design ate estimator Z 500 0.33 0.97 0.64 0.09 0.64 1.00 0.00
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
design ate margins Z 500 0.33 0.31 -0.02 0.02 0.03 1.00 0.90
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.01)

Much better
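To see why the raw probit coefficient misses the ATE, one can convert the fit to the probability scale by hand. The sketch below uses base R only and the same outcome model as the design; it takes a difference in predicted probabilities, which is an assumption made for illustration and is not necessarily the calculation the margins package performs.

```r
set.seed(9)
N <- 1000
U <- runif(N)
Z <- rbinom(N, 1, 0.5)
Y <- as.numeric(U < 0.5 + Z / 3)   # same outcome model as the design above

fit <- glm(Y ~ Z, family = binomial(link = "probit"))

# The raw probit coefficient is on the latent index scale, not the probability scale
coef_Z <- unname(coef(fit)["Z"])

# Difference in predicted probabilities with Z set to 1 and to 0 for everyone
p1 <- predict(fit, newdata = data.frame(Z = rep(1, N)), type = "response")
p0 <- predict(fit, newdata = data.frame(Z = rep(0, N)), type = "response")
ate_hat <- mean(p1 - p0)

c(probit_coefficient = coef_Z, ate_hat = ate_hat)
```

Under this model the true ATE is 1/3, since the probability of Y = 1 rises from 1/2 to 5/6 under treatment; the coefficient itself estimates a difference in normal quantiles, which is a different quantity.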

3 Causality. What’s a cause?

3.1 Potential outcomes and the counterfactual approach

Causation as difference making

3.1.1 Motivation

The intervention based motivation for understanding causal effects:

  • We want to know if a particular intervention (like aid) caused a particular outcome (like reduced corruption).
  • We need to know:
    1. What happened?
    2. What would the outcome have been if there were no intervention?
  • The problem:
    1. … this is hard
    2. … this is impossible

The problem in 2 is that you need to know what would have happened if things were different. You need information on a counterfactual.

3.1.2 Potential Outcomes

  • For each unit, we assume that there are two post-treatment outcomes: \(Y_i(1)\) and \(Y_i(0)\).
    • \(Y(1)\) is the outcome that would obtain if the unit received the treatment.
    • \(Y(0)\) is the outcome that would obtain if it did not.
  • The causal effect of Treatment (relative to Control) is: \(\tau_i = Y_i(1) - Y_i(0)\)
  • Note:
    • The causal effect is defined at the individual level.
    • There is no “data generating process” or functional form.
    • The causal effect is defined relative to something else, so a counterfactual must be conceivable (did Germany cause the second world war?).
    • Are there any substantive assumptions made here so far?
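The definitions can be laid out as a small table in base R. The six units, their effects, and the variable names are hypothetical; the point is only that each unit has two potential outcomes and the data reveal one of them.

```r
set.seed(10)
N <- 6

# Hypothetical potential outcomes for six units
Y0  <- rnorm(N)
Y1  <- Y0 + c(1, 1, 0, 0, -1, 2)   # heterogeneous unit-level effects
tau <- Y1 - Y0                      # individual causal effects

# Treatment reveals only one potential outcome per unit
Z <- sample(rep(0:1, N / 2))
Y <- ifelse(Z == 1, Y1, Y0)

data.frame(Y0, Y1, tau, Z, Y_observed = Y)
```

In real data the columns Y0, Y1, and tau are never all observed for the same unit; only Z and the observed outcome are.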

3.1.3 Potential Outcomes

Idea: A causal claim is (in part) a claim about something that did not happen. This makes it metaphysical.

3.1.4 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

3.1.5 Potential Outcomes

Now that we have a concept of causal effects available, let’s answer two questions:

  • TRANSITIVITY: If for a given unit \(A\) causes \(B\) and \(B\) causes \(C\), does that mean that \(A\) causes \(C\)?

  • A boulder is flying down a mountain. You duck. This saves your life.

  • So the boulder caused the ducking and the ducking caused you to survive.

  • So: did the boulder cause you to survive?

3.1.6 Potential Outcomes

CONNECTEDNESS Say \(A\) causes \(B\) — does that mean that there is a spatiotemporally continuous sequence of causal intermediates?

3.1.7 Potential Outcomes

CONNECTEDNESS Say \(A\) causes \(B\) — does that mean that there is a spatiotemporally continuous sequence of causal intermediates?

  • Person A is planning some action \(Y\); Person B sets out to stop them; person X intervenes and prevents person B from stopping person A. In this case Person A may complete their action, producing Y, without any knowledge that B and X even exist; in particular B and X need not be anywhere close to the action. So: did X cause Y?

3.1.8 Causal claims: Contribution or attribution?

The counterfactual model is about contribution and attribution in a very specific sense.

  • Focus is on non-rival contributions
  • Focus is on conditional attribution. Not: “what caused \(Y\)?” or “What is the cause of \(Y\)?”, but “did \(X\) cause \(Y\) given all other factors were what they were?”

3.1.9 Causal claims: Contribution or attribution?

Consider an outcome \(Y\) that might depend on two causes \(X_1\) and \(X_2\):

\[Y(0,0) = 0\] \[Y(1,0) = 0\] \[Y(0,1) = 0\] \[Y(1,1) = 1\]

What caused \(Y\)? Which cause was most important?

3.1.10 Causal claims: Contribution or attribution?

The counterfactual model is about attribution in a very conditional sense.

  • This is a problem for research programs that define “explanation” in terms of figuring out the things that cause \(Y\)

  • Real difficulties conceptualizing what it means to say one cause is more important than another cause. What does that mean?

3.1.11 Causal claims: Contribution or attribution?

Erdogan’s increasing authoritarianism was the most important reason for the attempted coup

  • More important than Turkey’s history of coups?
  • What does that mean?

3.1.12 Causal claims: No causation without manipulation

  • Some seemingly causal claims not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This renders thinking about effects of race and gender difficult
  • What does it mean to say that Aunt Pat voted for Brexit because she is old?

3.1.13 Causal claims: No causation without manipulation

  • Some seemingly causal claims not admissible.
  • To get the definition off the ground, manipulation must be imaginable (whether practical or not)
  • This renders thinking about effects of race and gender difficult
  • Compare: What does it mean to say that Southern counties voted for Brexit because they have many old people?

3.1.14 Causal claims: No causation without manipulation

More uncomfortably:

What does it mean to say that the tides are caused by the moon? What exactly do we have to imagine…

3.1.15 Causal claims: Causal claims are everywhere

  • Jack exploited Jill

  • It’s Jill’s fault that the bucket fell

  • Jack is the most obstructionist member of Congress

  • Melania Trump stole from Michelle Obama’s speech

  • Activists need causal claims

3.1.16 Causal claims: What is actually seen?

  • We have talked about what’s potential, now what do we observe?
  • Say \(Z_i\) indicates whether the unit \(i\) is assigned to treatment \((Z_i=1)\) or not \((Z_i=0)\). It describes the treatment process. Then what we observe is: \[ Y_i = Z_iY_i(1) + (1-Z_i)Y_i(0) \]

This is sometimes called a “switching equation”

In DeclareDesign, \(Y\) is realised from the potential outcomes and the assignment in this way using reveal_outcomes.
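The switching equation can be sketched in a few lines of plain Python. This is an illustration of the logic only, not of DeclareDesign itself; the potential outcome values and the `reveal` helper are hypothetical.

```python
import random

def reveal(z, y0, y1):
    """Switching equation: Y_i = Z_i * Y_i(1) + (1 - Z_i) * Y_i(0)."""
    return [zi * y1i + (1 - zi) * y0i for zi, y0i, y1i in zip(z, y0, y1)]

# Hypothetical fixed potential outcomes for four units
y0 = [0, 1, 2, 3]
y1 = [1, 1, 4, 2]

# Complete random assignment: exactly two of the four units are treated
rng = random.Random(1)
z = [1, 1, 0, 0]
rng.shuffle(z)

# Only one potential outcome per unit is revealed
y_obs = reveal(z, y0, y1)
print(z, y_obs)
```

The potential outcomes never change across runs; only `z`, and therefore which column is revealed, is random.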

3.1.17 Causal claims: What is actually seen?

  • Say \(Z\) is a random variable, then this is a sort of data generating process. BUT the key thing to note is

    • \(Y_i\) is random but the randomness comes from \(Z_i\) — the potential outcomes, \(Y_i(1)\), \(Y_i(0)\) are fixed
    • Compare this to a regression approach in which \(Y\) is random but the \(X\)’s are fixed, e.g.: \[ Y \sim N(\beta X, \sigma^2) \text{ or } Y=\alpha+\beta X+\epsilon, \epsilon\sim N(0, \sigma^2) \]

3.1.18 Causal claims: The estimand and the rub

  • The causal effect of Treatment (relative to Control) is: \[\tau_i = Y_i(1) - Y_i(0)\]
  • This is what we want to estimate.
  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)!
  • This is the fundamental problem (Holland (1986))

3.1.19 Causal claims: The rub and the solution

  • Now for some magic. We really want to estimate: \[ \tau_i = Y_i(1) - Y_i(0)\]

  • BUT: We never can observe both \(Y_i(1)\) and \(Y_i(0)\)

  • Say we lower our sights and try to estimate an average treatment effect: \[ \tau = \mathbb{E} [Y(1)-Y(0)]\]

  • Now make use of the fact that \[\mathbb E[Y(1)-Y(0)] = \mathbb E[Y(1)]- \mathbb E [Y(0)] \]

  • In words: The average of differences is equal to the difference of averages.

  • The magic is that while we can’t hope to measure the differences; we are good at measuring averages.

3.1.20 Causal claims: The rub and the solution

  • So we want to estimate \(\mathbb{E} [Y(1)]\) and \(\mathbb{E} [Y(0)]\).
  • We know that we can estimate averages of a quantity by taking the average value from a random sample of units
  • To do this here we need to select a random sample of the \(Y(1)\) values and a random sample of the \(Y(0)\) values, in other words, we randomly assign subjects to treatment and control conditions.
  • When we do that we can in fact estimate: \[ \mathbb{E}_N[Y_i(1) | Z_i = 1] - \mathbb{E}_N[Y_i(0) | Z_i = 0]\] which in expectation equals: \[ \mathbb{E} [Y_i(1) | Z_i = 1 \text{ or } Z_i = 0] - \mathbb{E} [Y_i(0) | Z_i = 1 \text{ or } Z_i = 0]\]

3.1.21 Causal claims: The rub and the solution

  • This highlights a deep connection between random assignment and random sampling: when we do random assignment we are in fact randomly sampling from different possible worlds.
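The claim that the difference in means equals the average treatment effect in expectation can be checked exactly by enumerating every possible random assignment. The sketch below assumes complete random assignment of three out of six units; the potential outcomes are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean

# Hypothetical potential outcomes for six units (fixed, not random)
y0 = [Fraction(v) for v in (2, 3, 1, 2, 4, 0)]
y1 = [Fraction(v) for v in (3, 3, 4, 1, 6, 1)]
n, m = len(y0), 3  # m units are treated in every assignment

true_ate = mean(y1) - mean(y0)

# One difference-in-means estimate per possible assignment ("possible world")
estimates = []
for treated in combinations(range(n), m):
    control = [i for i in range(n) if i not in treated]
    estimates.append(mean(y1[i] for i in treated) - mean(y0[i] for i in control))

# All assignments are equally likely, so the expectation is a simple average
expected_estimate = mean(estimates)
print(true_ate, expected_estimate, min(estimates), max(estimates))
```

Individual estimates differ from the truth, but their average over the sampling distribution of assignments matches it.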

3.1.22 Causal claims: The rub and the solution

This provides a positive argument for causal inference from randomization, rather than simply saying with randomization “everything else is controlled for”

Let’s discuss:

  • Does the fact that an estimate is unbiased mean that it is right?
  • Can a randomization “fail”?
  • Where are the covariates?

3.1.23 Causal claims: The rub and the solution

Idea: random assignment is random sampling from potential worlds: to understand anything you find, you need to know the sampling weights

3.1.24 Reflection

Idea: We now have a positive argument for claiming unbiased estimation of the average treatment effect following random assignment

But is the average treatment effect a quantity of social scientific interest?

3.1.25 Potential outcomes: why randomization works

The average of the differences \(\approx\) difference of averages

3.1.26 Potential outcomes: heterogeneous effects

The average of the differences \(\approx\) difference of averages

3.1.27 Potential outcomes: heterogeneous effects

Question: \(\approx\) or \(=\)?

3.1.28 Exercise your potential outcomes 1

Consider the following potential outcomes table:

| Unit | Y(0) | Y(1) | \(\tau_i\) |
|---|---|---|---|
| 1 | 4 | 3 | |
| 2 | 2 | 3 | |
| 3 | 1 | 3 | |
| 4 | 1 | 3 | |
| 5 | 2 | 3 | |

Questions for us: What are the unit level treatment effects? What is the average treatment effect?

3.1.29 Exercise your potential outcomes 2

Consider the following potential outcomes table:

| In treatment? | Y(0) | Y(1) |
|---|---|---|
| Yes | | 2 |
| No | 3 | |
| No | 1 | |
| Yes | | 3 |
| Yes | | 3 |
| No | 2 | |

Questions for us: Fill in the blanks.

  • Assuming a constant treatment effect of \(+1\)
  • Assuming a constant treatment effect of \(-1\)
  • Assuming an average treatment effect of \(0\)

What is the actual treatment effect?

3.2 Pause

Take a short break!

3.3 Endogenous subgroups

3.3.1 Endogenous Subgroups

Experiments often give rise to endogenous subgroups. The potential outcomes framework can make it clear why this can cause problems.

3.3.2 Heterogeneous Effects with Endogenous Categories

  • Problems arise in analyses of subgroups when the categories themselves are affected by treatment

  • Example from our work:

    • You want to know if an intervention affects reporting on violence against women
    • You measure the share of all subjects that experienced violence that file reports
    • The problem is that which subjects experienced violence is itself a function of treatment

3.3.3 Heterogeneous Effects with Endogenous Categories

  • V(t): Violence(Treatment)
  • R(t, v): Reporting(Treatment, Violence)

| | V(0) | V(1) | R(0,1) | R(1,1) | R(0,0) | R(1,0) |
|---|---|---|---|---|---|---|
| Type 1 (reporter) | 1 | 1 | 1 | 1 | 0 | 0 |
| Type 2 (non reporter) | 1 | 0 | 0 | 0 | 0 | 0 |

  • Expected reporting given violence in control = Pr(Type 1) (explanation: both types see violence but only Type 1 reports)

  • Expected reporting given violence in treatment = 100% (explanation: only Type 1 sees violence and this type also reports)

So you might infer a large effect on violence reporting.

Question: What is the actual effect of treatment on the propensity to report violence?

3.3.4 Heterogeneous Effects with Endogenous Categories

It is possible that in truth no one’s reporting behavior has changed, what has changed is the propensity of people with different propensities to report to experience violence:

Reporter No Violence Violence % Report
Control Yes
Treatment Yes
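The gap between the naive comparison and the actual effect on reporting can be worked through directly from the two types defined earlier. In this sketch the equal population shares are an assumption made for illustration; the potential outcomes follow the V(t) and R(t, v) table.

```python
from fractions import Fraction

# Potential outcomes by type: V[t] is violence under treatment status t,
# R[(t, v)] is reporting under treatment status t and violence status v
types = {
    "Type 1 (reporter)": {
        "V": {0: 1, 1: 1},
        "R": {(0, 1): 1, (1, 1): 1, (0, 0): 0, (1, 0): 0},
    },
    "Type 2 (non reporter)": {
        "V": {0: 1, 1: 0},
        "R": {(0, 1): 0, (1, 1): 0, (0, 0): 0, (1, 0): 0},
    },
}
# Hypothetical population shares (assumption for illustration)
share = {"Type 1 (reporter)": Fraction(1, 2), "Type 2 (non reporter)": Fraction(1, 2)}

def reporting_given_violence(t):
    """Share reporting among those who experience violence under status t."""
    violent = {k: s for k, s in share.items() if types[k]["V"][t] == 1}
    reports = sum(s * types[k]["R"][(t, 1)] for k, s in violent.items())
    return reports / sum(violent.values())

# Naive comparison across the endogenous subgroup "experienced violence"
naive_effect = reporting_given_violence(1) - reporting_given_violence(0)

# Effect of treatment on reporting with violence held fixed, for every type
direct_effects = {(k, v): types[k]["R"][(1, v)] - types[k]["R"][(0, v)]
                  for k in types for v in (0, 1)}
print(naive_effect, set(direct_effects.values()))
```

With these shares the naive comparison suggests a 50 point increase in reporting even though treatment changes no one's reporting behavior at any fixed level of violence.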

3.3.5 Heterogeneous Effects with Endogenous Categories

This problem can arise as easily in seemingly simple field experiments. Example:

  • In one study we provided constituents with information about performance of politicians
  • We told politicians in advance so that they could take action
  • We wanted to see whether voters punished poorly performing politicians

What’s the problem?

3.3.6 Endogenous Categories: Test yourself

Question for us:

  • Quotas for women are randomly placed in a set of constituencies in year 1. All winners in these areas are women; in other areas only some are.
  • In year 2 these quotas are then lifted.

Which problems face an endogenous subgroup issue?:

3.3.7 Endogenous Categories: Test yourself

Which problems face an endogenous subgroup issue?:

  1. You want to estimate the likelihood that a woman will stand for reelection in treatment versus control areas in year 2.
  2. You want to estimate whether incumbents are more likely to be reelected in treatment versus control areas in year 2
  3. You want to estimate how much treatment areas have more re-elected incumbents in elections in year 2 compared to control

3.3.8 Endogenous Categories: Responses

In such cases you can:

  • Examine the joint distribution of multiple outcomes
  • Condition on pretreatment features only
  • Engage in mediation analysis

3.3.9 Missing data can create an endogenous subgroup problem

  • It is well known that missing data can undo the magic of random assignment.
  • One seemingly promising approach is to match into pairs ex ante and drop pairs together ex post.
  • Say potential outcomes looked like this (2 pairs of 2 units):

| Pair | I | I | II | II | |
|---|---|---|---|---|---|
| Unit | 1 | 2 | 3 | 4 | Average |
| Y(0) | 0 | 0 | 0 | 0 | |
| Y(1) | -3 | 1 | 1 | 1 | |
| \(\tau\) | -3 | 1 | 1 | 1 | 0 |

3.3.10 Missing data

  • Say though that treated cases are likely to drop out of the sample if things go badly (e.g. they get a negative score or die)
  • Then you might see no attrition if those would-be attritors are not treated.
  • You might assume you have no problem (after all, no attrition).
  • No missing data when the normal cases happen to be selected
Pair I I II II
Unit 1 2 3 4 Average
Y(0) 0 0 0
Y(1) 1 1 1
\(\hat{\tau}\) 1

3.3.11 Missing data

  • But in cases in which you have attrition, dropping the pair doesn’t necessarily help.
  • The problem is potential missingness still depends on potential outcomes
  • The kicker is that the method can produce bias even if (in fact) there is no attrition!
  • But missing data when the vulnerable case happens to be selected
Pair I I II II
Unit 1 2 3 4 Average
Y(0) [0] 0 0
Y(1) [-3] 1 1
\(\hat{\tau}\) 1

3.3.12 Missing data

Note: The right way to think about this is that bias is a property of the strategy over possible realizations of data and not normally a property of the estimator conditional on the data.
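To see bias as a property of the strategy, one can enumerate every assignment of the matched-pair design and apply the "drop the pair" rule each time. The sketch below uses the potential outcomes from the 2-pair example; the attrition rule (a treated unit drops out when its outcome is negative) is the assumption described in the bullets above.

```python
from fractions import Fraction
from itertools import product

y0 = {1: 0, 2: 0, 3: 0, 4: 0}
y1 = {1: -3, 2: 1, 3: 1, 4: 1}
pairs = [(1, 2), (3, 4)]

def attrits(unit):
    # Assumed rule: a treated unit leaves the sample if its outcome is negative
    return y1[unit] < 0

def estimate(treated_units):
    """Difference in means after dropping any pair whose treated unit attrits."""
    kept_t, kept_c = [], []
    for pair in pairs:
        t = next(u for u in pair if u in treated_units)
        c = next(u for u in pair if u not in treated_units)
        if attrits(t):
            continue  # drop the whole pair
        kept_t.append(y1[t])
        kept_c.append(y0[c])
    return Fraction(sum(kept_t), len(kept_t)) - Fraction(sum(kept_c), len(kept_c))

# One treated unit per pair: four equally likely assignments
assignments = list(product(*pairs))
estimates = {a: estimate(a) for a in assignments}
expected_estimate = sum(estimates.values()) / len(estimates)
true_ate = Fraction(sum(y1[u] - y0[u] for u in y0), len(y0))
print(estimates, expected_estimate, true_ate)
```

Every assignment returns an estimate of 1, including the assignments with no attrition at all, while the true average effect is 0.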

3.3.13 Multistage games

Multistage games can also present an endogenous group problem since collections of late stage players facing a given choice have been created by early stage players.

3.3.14 Multistage games

Question: Does visibility alter the extent to which subjects follow norms to punish antisocial behavior (and reward prosocial behavior)? Consider a trust game in which we are interested in how information on receivers affects their actions

Table 1: Return rates given investments under different conditions.

| Visibility treatment | % invested (average) | Average % returned when 10% invested | Average % returned when 50% invested |
|---|---|---|---|
| Control: Masked information on respondents | 30% | 20% | 40% |
| Treatment: Full information on respondents | 30% | 0% | 60% |

What do we think? Does visibility make people react more to investments?

3.3.15 Multistage games

Imagine you could see all the potential outcomes, and they looked like this:

Table 2: Potential outcomes with (and without) identity protection.

Responder’s return decision (given type)

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Average |
|---|---|---|---|---|---|---|---|
| Invest 10% | 60% | 60% | 60% | 0% | 0% | 0% | 30% |
| Invest 50% | 60% | 60% | 60% | 0% | 0% | 0% | 30% |

Conclusion: Both the offer and the information condition are completely irrelevant for all subjects.

3.3.16 Multistage games

Unfortunately you only see a sample of the potential outcomes, and that looks like this:

Table 3: Outcomes when respondent is visible.

Responder’s return decision (given type)

| Offered behavior | Nice 1 | Nice 2 | Nice 3 | Mean 1 | Mean 2 | Mean 3 | Average |
|---|---|---|---|---|---|---|---|
| Invest 10% | | | | 0% | 0% | 0% | 0% |
| Invest 50% | 60% | 60% | 60% | | | | 60% |

False Conclusion: When not protected, responders condition behavior strongly on offers (because offerers can select on type accurately)

In fact: The nice types invest more because they are nice. The responders return more to the nice types because they are nice.

3.3.17 Multistage games

Unfortunately you only see a (noisier!) sample of the potential outcomes, and that looks like this:

Table 4: Outcomes when respondent is not visible.
Responder’s return decision (given type)
Offered behavior Nice 1 Nice 2 Nice 3 Mean 1 Mean 2 Mean 3
Invest 10% 60% 0% 0% 20%
Invest 50% 60% 60% 0% 40%

False Conclusion: When protected, responders condition behavior less strongly on offers (because offerers can select on type less accurately)
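The pattern in Table 1 can be reproduced with return behavior that never responds to offers or to the information condition. In this sketch the return rate attached to each type is taken from Table 2; which types end up in which offer group under each condition is an assumption chosen to be consistent with Tables 3 and 4.

```python
from statistics import mean

# Return rate attached to each type is fixed (Table 2): it depends on neither
# the offer nor the information condition
RETURN = {"nice": 60, "mean": 0}
types = ["nice", "nice", "nice", "mean", "mean", "mean"]

# Assumed investment levels (percent) for the six types in each condition:
# perfect sorting when visible, noisier sorting when masked
invest = {
    "visible": [50, 50, 50, 10, 10, 10],
    "masked": [10, 50, 50, 10, 10, 50],
}

def avg_return_by_offer(condition):
    """Average % returned within each observed offer level."""
    return {
        offer: mean(RETURN[t] for t, inv in zip(types, invest[condition]) if inv == offer)
        for offer in (10, 50)
    }

for condition in invest:
    print(condition, mean(invest[condition]), avg_return_by_offer(condition))
```

The apparent responsiveness to offers comes entirely from who lands in each offer group, not from any change in return behavior.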

3.3.18 Multistage games

What to do?


  1. Analysis could focus on the effect of treatment on respondent behavior, directly.
    • This would get the correct answer but to a different question (Does information affect the share of contributions returned by subjects on average?)
  2. Strategy method can sometimes help address the problem, but note that that is (a) changing the question and (b) putting demands on respondent imagination and honesty
  3. First mover action could be directly manipulated, but unless deception is used that is also changing the question
  4. First movers could be selected because they act in predictable ways (bordering on deception?)

Take away: Proceed with extreme caution when estimating effects beyond the first stage.

3.4 Pause

Take a short break!

3.5 DAGs

Directed Acyclic Graphs

3.5.1 Key insight

The most powerful results from the study of DAGs give procedures for figuring out when conditioning aids or hinders causal identification.

  • You can read off a confounding variable from a DAG.
    • You figure out what to condition on for causal identification.
  • You can read off “colliders” from a DAG
    • Sometimes you have to avoid conditioning on these
  • Sometimes a variable might be both, so
    • you have to condition on it
    • you have to avoid conditioning on it
    • Ouch.

3.5.2 Key resource

3.5.3 Challenge for us

  • Say you don’t like graphs. Fine.

  • Consider this causal structure:

    • \(Z = f_1(U_1, U_2)\)
    • \(X = f_2(U_2)\)
    • \(Y = f_3(X, U_1)\)

Say \(Z\) is temporally prior to \(X\); it is correlated with \(Y\) (because of \(U_1\)) and with \(X\) (because of \(U_2\)).

Question: Would it be useful to “control” for \(Z\) when trying to estimate the effect of \(X\) on \(Y\)?

3.6 Challenge for us

  • Say you don’t like graphs. Fine.

  • Consider this causal structure:

    • \(Z = f_1(U_1, U_2)\)
    • \(X = f_2(U_2)\)
    • \(Y = f_3(X, U_1)\)

Question: Would it be useful to “control” for \(Z\) when trying to estimate the effect of \(X\) on \(Y\)?

Answer: Hopefully by the end of today you should see that the answer is obviously (or at least, plausibly) “no.”

3.7 Conditional independence and graph structure

  • What DAGs do is tell you when one variable is independent of another variable given some third variable.
  • Intuitively:
    • what variables “shield off” the influence of one variable on another
    • e.g. If inequality causes revolution via discontent, then inequality and revolution should be related to each other overall, but not related to each other among those that are content or among those that are discontent

3.7.1 Conditional independence

Variable sets \(A\) and \(B\) are conditionally independent, given \(C\) if for all \(a\), \(b\), \(c\):

\[\Pr(A = a | C = c) = \Pr(A = a | B = b, C = c)\]

Informally; given \(C\), knowing \(B\) tells you nothing more about \(A\).

3.7.2 Conditional independence on paths graphs

Three elemental relations of conditional independence.

3.7.3 Conditional independence from graphs

\(A\) and \(B\) are conditionally independent, given \(C\) if on every path between \(A\) and \(B\):

  • there is some chain (\(\bullet\rightarrow \bullet\rightarrow\bullet\) or \(\bullet\leftarrow \bullet\leftarrow\bullet\)) or fork (\(\bullet\leftarrow \bullet\rightarrow\bullet\)) with the central element in \(C\),

or

  • there is an inverted fork (\(\bullet\rightarrow \bullet\leftarrow\bullet\)) with the central element (and its descendants) not in \(C\)


  • In this case we say that \(A\) and \(B\) are d-separated by \(C\).
  • \(A\), \(B\), and \(C\) can all be sets
  • Note that a path can involve arrows pointing any direction \(\bullet\rightarrow \bullet\rightarrow \bullet\leftarrow \bullet\rightarrow\bullet\)

3.7.4 Test yourself

Are A and D independent:

  • if you do not condition on anything?
  • if you condition on B?
  • if you condition on C?
  • if you condition on B and C?

3.7.5 Back to this example

  • \(Z = f_1(U_1, U_2)\)
  • \(X = f_2(U_2)\)
  • \(Y = f_3(X, U_1)\)
  1. Let’s graph this
  2. Now: say we removed the arrow from \(X\) to \(Y\)
    • Would you expect to see a correlation between \(X\) and \(Y\) if you did not control for \(Z\)?
    • Would you expect to see a correlation between \(X\) and \(Y\) if you did control for \(Z\)?
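The d-separation rule can be applied mechanically to this example. The sketch below is a small brute-force checker for tiny graphs (it enumerates all paths, so it is not meant for large DAGs); the function names are hypothetical, and the graph encodes the structure above with the arrow from \(X\) to \(Y\) removed.

```python
def descendants(edges, node):
    """All nodes reachable from `node` by following arrows."""
    found, stack = set(), [node]
    while stack:
        cur = stack.pop()
        for parent, child in edges:
            if parent == cur and child not in found:
                found.add(child)
                stack.append(child)
    return found

def simple_paths(edges, a, b):
    """All paths from a to b that ignore arrow direction and repeat no node."""
    nbrs = {}
    for parent, child in edges:
        nbrs.setdefault(parent, set()).add(child)
        nbrs.setdefault(child, set()).add(parent)

    def walk(path):
        if path[-1] == b:
            yield path
            return
        for nxt in nbrs.get(path[-1], ()):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([a])

def blocked(edges, path, cond):
    """A path is blocked by a conditioned non-collider or an unconditioned collider."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, mid) in edges and (nxt, mid) in edges
        if collider:
            if mid not in cond and not (descendants(edges, mid) & cond):
                return True
        elif mid in cond:
            return True
    return False

def d_separated(edges, a, b, cond=()):
    cond = set(cond)
    return all(blocked(edges, p, cond) for p in simple_paths(edges, a, b))

# Z = f1(U1, U2), X = f2(U2), Y = f3(U1): the arrow from X to Y is removed
edges = {("U1", "Z"), ("U2", "Z"), ("U2", "X"), ("U1", "Y")}
print(d_separated(edges, "X", "Y"))         # no conditioning
print(d_separated(edges, "X", "Y", {"Z"}))  # conditioning on the collider Z
```

The only path between \(X\) and \(Y\) runs through \(Z\), which is an inverted fork on that path, so conditioning on \(Z\) opens the path rather than closing it.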

3.7.7 Conditional distributions given do operations

We now formalize things a little more:

  • The probability of outcome \(x\) can always be written in the form \[P(X_1 = x_1)P(X_2 = x_2|X_1=x_1)P(X_3 = x_3|X_1=x_1, X_2 = x_2)\dots\]

  • This can be done with any ordering of variables.

  • We want to describe the distribution in a simpler way that takes account of parent-child relations and that can be used to capture interventions.

3.7.8 Conditional distributions given do operations

  • Given an ordering of variables, the Markovian parents of variable \(X_j\) are the minimal set of variables such that when you condition on these, \(X_j\) is independent of all other prior variables in the ordering

\[P: P(x_1,x_2,\dots x_n) = \prod_{j}P(x_j|pa_j)\]

  • A DAG is a “causal Bayesian network” or “causal DAG” if (and only if) the probability distribution resulting from setting some set \(X_i\) to \(x'_i\) (i.e. \(do(X_i=x'_i)\), written \(\hat{x}_i\)) is:

\[P_{\hat{x}_i}: P(x_1,x_2,\dots x_n|\hat{x}_i) = \mathbb{I}(x_i = x_i')\prod_{-i}P(x_j|pa_j)\]

where the parents \(pa_j\) are the parents on the graph (the parents of \(x_j\) are the nodes with arrows pointing into \(x_j\)).

3.7.9 Conditional distributions given do operations


\[P_{\hat{x}_i}: P(x_1,x_2,\dots x_n|\hat{x}_i) = \mathbb{I}(x_i = x_i')\prod_{-i}P(x_j|pa_j)\]

  • This means that there is only probability mass on vectors in which \(x_i = x_i'\) (reflecting the success of control) and all other variables are determined by their parents, given the values that have been set for \(x_i\).

  • Such expressions will be critical later when we want to consider identification.

  • They let us assess whether the probability of an outcome \(y\), say, depends on the value of some other node, given some other node, or given interventions on some other node.

3.7.10 Conditional distributions given do operations

Illustration: say we have binary \(X\) which causes binary \(M\) which causes binary \(Y\); say we intervene and set \(M=1\). Then what is the distribution of \((x,m,y)\)?

It is:

\[\Pr(x,m,y) = \Pr(x)\mathbb I(M = 1)\Pr(y|m)\]
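The truncated factorization for this illustration can be computed directly. In the sketch below the conditional probabilities are hypothetical numbers chosen for illustration; the function `do_m` drops the factor for \(M\) and puts all mass on the value that was set.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical conditional distributions for binary X -> M -> Y
p_x = {0: F(1, 2), 1: F(1, 2)}
p_m_given_x = {0: {0: F(3, 4), 1: F(1, 4)}, 1: {0: F(1, 4), 1: F(3, 4)}}  # [x][m]
p_y_given_m = {0: {0: F(9, 10), 1: F(1, 10)}, 1: {0: F(2, 5), 1: F(3, 5)}}  # [m][y]

def observational(x, m, y):
    """Markov factorization: P(x) P(m | x) P(y | m)."""
    return p_x[x] * p_m_given_x[x][m] * p_y_given_m[m][y]

def do_m(x, m, y, m_set=1):
    """Truncated factorization: P(x) I(m = m_set) P(y | m)."""
    return p_x[x] * (1 if m == m_set else 0) * p_y_given_m[m][y]

cells = list(product((0, 1), repeat=3))
total_do = sum(do_m(*c) for c in cells)  # should be a proper distribution
p_y1_do = sum(do_m(x, m, 1) for x, m in product((0, 1), repeat=2))

# Observational counterparts for comparison
p_m1 = sum(observational(x, 1, y) for x, y in product((0, 1), repeat=2))
p_y1_given_m1 = sum(observational(x, 1, 1) for x in (0, 1)) / p_m1
p_x1_given_m1 = sum(observational(1, 1, y) for y in (0, 1)) / p_m1
p_x1_do = sum(do_m(1, m, y) for m, y in product((0, 1), repeat=2))
print(total_do, p_y1_do, p_y1_given_m1, p_x1_given_m1, p_x1_do)
```

In this graph, setting \(M\) and observing \(M\) imply the same distribution for \(Y\), but they differ for \(X\): observing \(M=1\) is informative about \(X\), while setting \(M=1\) is not.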

3.7.11 Application

We will use these ideas to motivate a general procedure for learning about, updating over, and querying, causal models.

3.8 Causal models

3.8.1 From graphs to Causal Models

A “causal model” is:


  • An ordered list of \(n\) endogenous nodes, \(\mathcal{V}= (V^1, V^2,\dots, V^n)\), with a specification of a range for each of them
  • A list of \(n\) exogenous nodes, \(\Theta = (\theta^1, \theta^2,\dots , \theta^n)\)
  • A list of \(n\) functions \(\mathcal{F}= (f^1, f^2,\dots, f^n)\), one for each element of \(\mathcal{V}\) such that each \(f^i\) takes as arguments \(\theta^i\) as well as elements of \(\mathcal{V}\) that are prior to \(V^i\) in the ordering
  • A probability distribution over \(\Theta\)

3.8.2 From graphs to Causal Models

A simple causal model in which high inequality (\(I\)) affects democratization (\(D\)) via redistributive demands (\(R\)) and mass mobilization (\(M\)), which is also a function of ethnic homogeneity (\(E\)). Arrows show relations of causal dependence between variables.

3.8.3 Effects on a DAG

  • Learning about effects given a model means learning about \(F\) and also the distribution of shocks (\(\Theta\)).

  • For discrete data this can be reduced to a question about learning about the distribution of \(\Theta\) only.

3.8.4 Effects on a DAG

For instance the simplest model consistent with \(X \rightarrow Y\):

  • Endogenous Nodes = \(\{X, Y\}\), both with range \(\{0,1\}\)

  • Exogenous Nodes = \(\{\theta^X, \theta^Y\}\), with ranges \(\{\theta^X_0, \theta^X_1\}\) and \(\{\theta^Y_{00}, \theta^Y_{01}, \theta^Y_{10}, \theta^Y_{11}\}\)

  • Functional equations:

    • \(f_Y\): \(\theta^Y =\theta^Y_{ij} \rightarrow \{Y = i \text{ if } X=0; Y = j \text{ if } X=1\}\)
    • \(f_X\): \(\theta^X =\theta^X_{i} \rightarrow \{X = i\}\)
  • Distributions on \(\Theta\): \(\Pr(\theta^i = \theta^i_k) = \lambda^i_k\)

3.8.5 Effects as statements about exogenous variables

What is the probability that \(X\) has a positive causal effect on \(Y\)?

  • This is equivalent to: \(\Pr(\theta^Y =\theta^Y_{01}) = \lambda^Y_{01}\)

  • So we want to learn about the distributions of the exogenous nodes

  • This general principle extends to a vast class of causal models
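The equivalence between an effect and a statement about \(\theta^Y\) can be checked by drawing nodal types and applying the functional equation. In this sketch the \(\lambda^Y\) values are hypothetical and type labels are written as strings such as "01" for \(\theta^Y_{01}\).

```python
import random

# Hypothetical distribution over theta^Y_{ij}: Y = i if X = 0, Y = j if X = 1
lam_Y = {"00": 0.2, "01": 0.4, "10": 0.1, "11": 0.3}

def f_Y(theta_Y, x):
    """Functional equation for Y: the entry of the type label indexed by X."""
    return int(theta_Y[x])

# Queries written directly as summaries of the distribution of theta^Y
prob_positive_effect = lam_Y["01"]
ate = lam_Y["01"] - lam_Y["10"]

# Check by simulation: draw types and compute each unit's Y(1) - Y(0)
rng = random.Random(0)
draws = rng.choices(list(lam_Y), weights=list(lam_Y.values()), k=20_000)
effects = [f_Y(t, 1) - f_Y(t, 0) for t in draws]
sim_share_positive = sum(e == 1 for e in effects) / len(effects)
sim_ate = sum(effects) / len(effects)
print(prob_positive_effect, sim_share_positive, ate, sim_ate)
```

The average effect is also a function of the type distribution only: the share of positive-effect types minus the share of negative-effect types.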

3.8.6 Recap: Things you need to know about causal inference

  1. A causal claim is a statement about what didn’t happen.
  2. If you know that \(A\) causes \(B\) and that \(B\) causes \(C\), this does not mean that you know that \(A\) causes \(C\).
  3. There is no causation without manipulation.
  4. There is a fundamental problem of causal inference.
  5. You can estimate average causal effects even if you cannot observe any individual causal effects.
  6. Estimating average causal effects via differences in means does not require that treatment and control groups are identical.
  7. Estimating average causal effects via differences in means is fraught when you condition on post treatment variables or on colliders.

4 Inquiries

Well posed questions

4.1 Outline

  • Types of estimands
  • Principal strata
  • Identification
  • Backdoor
  • Frontdoor
  • dagitty

4.2 Estimands and inquiries

  • Your inquiry is your question and the estimand is the true (generally unknown) answer to the inquiry
  • The estimand is the thing you want to estimate
  • If you are estimating something you should be able to say what your estimand is
  • You are responsible for your estimand. Your estimator will not tell you what your estimand is
  • Just because you can calculate something does not mean that you have an estimand
  • You can test a hypothesis without having an estimand

Read: II ch 4, DD, ch 7

4.2.1 Estimands: ATE, ATT, ATC, S-, P-

  • ATE is Average Treatment Effect (all units)
  • ATT is Average Treatment Effect on the Treated
  • ATC is Average Treatment Effect on the Controls

4.2.2 Estimands: ATE, ATT, ATC, S-, P-

Say that units are randomly assigned to treatment in different strata (maybe just one); with fixed, though possibly different, shares assigned in each stratum. Then the key estimands and estimators are:

Estimand Estimator
| Estimand | Estimator |
|---|---|
| \(\tau_{ATE} \equiv \mathbb{E}[\tau_i]\) | \(\widehat{\tau}_{ATE} = \sum\nolimits_{x} \frac{w_x}{\sum\nolimits_{j}w_{j}}\widehat{\tau}_x\) |
| \(\tau_{ATT} \equiv \mathbb{E}[\tau_i \mid Z_i = 1]\) | \(\widehat{\tau}_{ATT} = \sum\nolimits_{x} \frac{p_xw_x}{\sum\nolimits_{j}p_jw_j}\widehat{\tau}_x\) |
| \(\tau_{ATC} \equiv \mathbb{E}[\tau_i \mid Z_i = 0]\) | \(\widehat{\tau}_{ATC} = \sum\nolimits_{x} \frac{(1-p_x)w_x}{\sum\nolimits_{j}(1-p_j)w_j}\widehat{\tau}_x\) |

where \(x\) indexes strata, \(p_x\) is the share of units in each stratum that is treated, and \(w_x\) is the size of a stratum.
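The three estimators differ only in how the stratum-level estimates are weighted. A minimal sketch, assuming two hypothetical strata with made-up sizes, treatment shares, and within-stratum estimates:

```python
# Hypothetical strata: size w_x, share treated p_x, within-stratum estimate tau_hat_x
strata = [
    {"w": 100, "p": 0.5, "tau_hat": 1.0},
    {"w": 300, "p": 0.2, "tau_hat": 3.0},
]

def weighted(weights):
    """Weighted average of stratum estimates with weights normalized to sum to 1."""
    total = sum(weights)
    return sum(wt * s["tau_hat"] for wt, s in zip(weights, strata)) / total

tau_ate = weighted([s["w"] for s in strata])                  # weight: w_x
tau_att = weighted([s["p"] * s["w"] for s in strata])         # weight: p_x w_x
tau_atc = weighted([(1 - s["p"]) * s["w"] for s in strata])   # weight: (1 - p_x) w_x
print(tau_ate, tau_att, tau_atc)
```

When the treated share differs across strata and effects differ across strata, the three estimates differ; with a constant treated share the three sets of weights coincide.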

4.2.3 Estimands: ATE, ATT, ATC, S-, P-, C-

In addition, each of these can be targets of interest:

  • for the population, in which case we refer to PATE, PATT, PATC and \(\widehat{PATE}, \widehat{PATT}, \widehat{PATC}\)
  • for a sample, in which case we refer to SATE, SATT, SATC, and \(\widehat{SATE}, \widehat{SATT}, \widehat{SATC}\)

And for different subgroups,

  • given some value on a covariate, in which case we refer to CATE (conditional average treatment effect)

4.2.4 Broader classes of estimands: LATE/CATE

The CATEs are conditional average treatment effects, for example the effect for men or for women. These are straightforward.

However we might also imagine conditioning on unobservable or counterfactual features.

  • The LATE (or CACE: “complier average causal effect”) asks about the effect of a treatment (\(X\)) on an outcome (\(Y\)) for people that are responsive to an encouragement (\(Z\))

\[LATE = \frac{1}{|C|}\sum_{j\in C}(Y_j(X=1) - Y_j(X=0))\] \[C:=\{j:X_j(Z=1) > X_j(Z=0) \}\]

We will return to these in the study of instrumental variables.
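The LATE definition can be read as a two-step computation: find the complier set \(C\) from the potential take-up values, then average unit effects within it. The sketch below uses four hypothetical units; in real data the complier set is not observed because only one of \(X_j(Z=1)\) and \(X_j(Z=0)\) is seen for each unit.

```python
# Hypothetical potential outcomes: X[z] is take-up under encouragement z,
# Y[x] is the outcome under treatment x
units = [
    {"X": {0: 0, 1: 1}, "Y": {0: 1, 1: 3}},  # complier
    {"X": {0: 0, 1: 1}, "Y": {0: 0, 1: 1}},  # complier
    {"X": {0: 1, 1: 1}, "Y": {0: 2, 1: 7}},  # always-taker
    {"X": {0: 0, 1: 0}, "Y": {0: 1, 1: 1}},  # never-taker
]

# C := {j : X_j(Z=1) > X_j(Z=0)}
compliers = [u for u in units if u["X"][1] > u["X"][0]]

late = sum(u["Y"][1] - u["Y"][0] for u in compliers) / len(compliers)
ate = sum(u["Y"][1] - u["Y"][0] for u in units) / len(units)
print(len(compliers), late, ate)
```

Here the LATE and the ATE differ because the always-taker has an unusually large effect that the LATE, by construction, ignores.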

4.2.5 Quantile estimands

Other ways to condition on potential outcomes:

  • A quantile treatment effect: You might be interested in the difference between the median \(Y(1)\) and the median \(Y(0)\) (Imbens and Rubin (2015) 20.3.1)
  • or even be interested in the median \(Y(1) - Y(0)\). Similarly for other quantiles.

4.2.6 Model estimands

Many inquiries are averages of individual effects, even if the groups are not known, but they do not have to be:

  • The RDD estimand is a statement about what effects would be at a threshold; it can be defined under a model even if no actual individuals are at the threshold. We imagine average potential outcomes as a function of treatment \(Z\) and running variable \(X\), \(f(z, x)\) and define: \[\tau_{RDD} := f(1, x^*) - f(0, x^*)\]

4.2.7 Distribution estimands

Many inquiries are averages of individual effects, even if the groups are not known,

But they do not have to be:

  • Inquiries might relate to distributional quantities such as:

    • The effect of treatment on the variance in outcomes: \(var(Y(1)) - var(Y(0))\)
    • The variance of treatment effects: \(var(Y(1) - Y(0))\)
    • Other inequality measures (e.g. Ginis; (Imbens and Rubin (2015) 20.3.2))

You might even be interested in \(\min(Y_i(1) - Y_i(0))\).

4.2.8 Spillover estimands

There are lots of interesting “spillover” estimands.

Imagine there are three individuals and each person’s outcomes depend on the assignments of all others. For instance \(Y_1(Z_1, Z_2, Z_3)\), or more generally, \(Y_i(Z_i, Z_{i+1 (\text{mod }3)}, Z_{i+2 (\text{mod }3)})\).

Then three estimands might be:

  • \(\frac13\left(\sum_{i}{Y_i(1,0,0) - Y_i(0,0,0)}\right)\)
  • \(\frac13\left(\sum_{i}{Y_i(1,1,1) - Y_i(0,0,0)}\right)\)
  • \(\frac13\left(\sum_{i}{Y_i(0,1,1) - Y_i(0,0,0)}\right)\)

Interpret these. What others might be of interest?

4.2.9 Differences in CATEs and interaction estimands

A difference in CATEs is a well defined estimand that might involve interventions on one node only:

  • \(\mathbb{E}_{\{W=1\}}[Y(X=1) - Y(X=0)] - \mathbb{E}_{\{W=0\}}[Y(X=1) - Y(X=0)]\)

It captures differences in effects.

An interaction is an effect on an effect:

  • \(\mathbb{E}[Y(X=1, W=1) - Y(X=0, W=1)] - \mathbb{E}[Y(X=1, W=0) - Y(X=0, W=0)]\)

Note in the latter the expectation is taken over the whole population.

4.2.10 Mediation estimands and complex counterfactuals

Say \(X\) can affect \(Y\) directly, or indirectly through \(M\). Then we can write potential outcomes as:

  • \(Y(X=x, M=m)\)
  • \(M(X=x)\)

We can then imagine inquiries of the form:

  • \(Y(X=1, M=M(X=1)) - Y(X=0, M=M(X=0))\)
  • \(Y(X=1, M=1) - Y(X=0, M=1)\)
  • \(Y(X=1, M=M(X=1)) - Y(X=1, M=M(X=0))\)

Interpret these. What others might be of interest?

4.2.11 Mediation estimands and complex counterfactuals

Again we might imagine that these are defined with respect to some group:

  • \(A = \{i|Y_i(1, M_i(X=1)) > Y_i(0, M_i(X=0))\}\)
  • \(\frac{1}{|A|} \sum_{i\in A}\mathbb{I}(Y_i(1, 1) > Y_i(0, 1))\)

Here the question is: among those for whom \(X\) has a positive effect on \(Y\), for what share would there be a positive effect if \(M\) were fixed at 1?

4.2.12 Causes of effects and effects of causes

In qualitative research a particularly common inquiry is “did \(X=1\) cause \(Y=1\)?”

This is often given as a probability, the “probability of causation” (though at the case level we might better think of this probability as an estimate rather than an estimand):

\[\Pr(Y_i(0) = 0 | Y_i(1) = 1, X = 1)\]

4.2.13 Causes of effects and effects of causes

Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:

| | Y=0 | Y=1 | All |
|---|---|---|---|
| X=0 | 1 | 0 | 1 |
| X=1 | 0.25 | 0.75 | 1 |

4.2.14 Causes of effects and effects of causes

Intuition: What’s the probability \(X=1\) caused \(Y=1\) in an \(X=1, Y=1\) case drawn from a large population with the following experimental distribution:

| | Y=0 | Y=1 | All |
|---|---|---|---|
| X=0 | 0.75 | 0.25 | 1 |
| X=1 | 0.25 | 0.75 | 1 |
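One way to build intuition for these two tables is to ask what range of values for the probability of causation is compatible with the experimental margins. The sketch below assumes random assignment (so that conditioning on \(X=1\) does not change the distribution of potential outcomes) and uses only the constraint that the share of positive-effect units must be consistent with both margins; the helper name `pc_bounds` is hypothetical.

```python
from fractions import Fraction as F

def pc_bounds(p1, p0):
    """Bounds on Pr(Y(0)=0 | Y(1)=1) given p1 = Pr(Y(1)=1) and p0 = Pr(Y(0)=1).

    The share b with Y(1)=1 and Y(0)=0 must satisfy
    max(0, p1 - p0) <= b <= min(p1, 1 - p0); the probability is b / p1.
    """
    lower = max(F(0), p1 - p0) / p1
    upper = min(p1, 1 - p0) / p1
    return lower, upper

first = pc_bounds(p1=F(3, 4), p0=F(0))      # first table: Pr(Y=1 | X=0) = 0
second = pc_bounds(p1=F(3, 4), p0=F(1, 4))  # second table: Pr(Y=1 | X=0) = 0.25
print(first, second)
```

With the first distribution the experimental data pin the probability down exactly; with the second they only bound it.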

4.2.15 Actual causation

Other inquiries focus on distinguishing between causes.

For the Billy Suzy problem (Hall 2004), Halpern (2016) focuses on “actual causation” as a way to distinguish between Suzy and Billy:

Imagine Suzy and Billy, simultaneously throwing stones at a bottle. Both are excellent shots and hit whatever they aim at. Suzy’s stone hits first, knocks over the bottle, and the bottle breaks. However, Billy’s stone would have hit had Suzy’s not hit, and again the bottle would have broken. Did Suzy’s throw cause the bottle to break? Did Billy’s?

4.2.16 Actual causation

Actual Causation:

  1. \(X=x\) and \(Y=y\) both happened;
  2. there is some set of variables, \(\mathcal W\), such that if they were fixed at the levels that they actually took on in the case, and if \(X\) were to be changed, then \(Y\) would change (where \(\mathcal W\) can also be an empty set);
  3. no strict subset of \(X\) satisfies 1 and 2 (there is no redundant part of the condition, \(X=x\)).

4.2.17 Actual causation

  • Suzy: Condition 2 is met if Suzy’s throw made a difference, counterfactually speaking, with the important caveat that, in determining this, we are permitted to condition on Billy’s stone not hitting the bottle.
  • Billy: Condition 2 is not met.

An inquiry: for what share in a population is a possible cause an actual cause?

4.2.18 Pearl’s ladder

Pearl (e.g. Pearl and Mackenzie (2018)) describes three types of inquiry:

| Level | Activity | Inquiry |
|---|---|---|
| Association | “Seeing” | If I see \(X=1\) should I expect \(Y=1\)? |
| Intervention | “Doing” | If I set \(X\) to \(1\) should I expect \(Y=1\)? |
| Counterfactual | “Imagining” | If \(X\) were \(0\) instead of \(1\), would \(Y\) then be \(0\) instead of \(1\)? |

4.2.19 Pearl’s ladder

We can understand these as asking different types of questions about a causal model

| Level | Activity | Inquiry |
|---|---|---|
| Association | “Seeing” | \(\Pr(Y=1 \mid X=1)\) |
| Intervention | “Doing” | \(\mathbb{E}[\mathbb{I}(Y(1)=1)]\) |
| Counterfactual | “Imagining” | \(\Pr(Y(1)=1 \& Y(0)=0)\) |

The third is qualitatively different because it requires information about two mutually incompatible conditions for units. This is not (generally) recoverable directly from knowledge of \(\Pr(Y(1)=1)\) and \(\Pr(Y(0)=0)\).
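A small numerical illustration of why the margins are not enough: the two hypothetical type distributions below agree on \(\Pr(Y(1)=1)\) and \(\Pr(Y(0)=0)\) but disagree on the counterfactual quantity.

```python
from fractions import Fraction as F

# Two hypothetical distributions over types, keyed by (Y(0), Y(1))
model_a = {(0, 0): F(1, 4), (0, 1): F(1, 2), (1, 0): F(0), (1, 1): F(1, 4)}
model_b = {(0, 0): F(0), (0, 1): F(3, 4), (1, 0): F(1, 4), (1, 1): F(0)}

def margins(model):
    """Return (Pr(Y(1)=1), Pr(Y(0)=0)), the quantities an experiment reveals."""
    pr_y1_is_1 = sum(p for (y0, y1), p in model.items() if y1 == 1)
    pr_y0_is_0 = sum(p for (y0, y1), p in model.items() if y0 == 0)
    return pr_y1_is_1, pr_y0_is_0

def counterfactual(model):
    """Pr(Y(1)=1 & Y(0)=0): the share for whom X has a positive effect."""
    return model[(0, 1)]

print(margins(model_a), margins(model_b))
print(counterfactual(model_a), counterfactual(model_b))
```

Both models would generate the same experimental data, so those data alone cannot distinguish them.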

4.3 Inquiries as statements about principal strata

Given a causal model over nodes with discrete ranges, inquiries can generally be described as summaries of the distributions of exogenous nodes.

We already saw two instances of this:

  • The probability that \(X\) has a positive effect on \(Y\) in an \(X \rightarrow Y\) model is \(\lambda^Y_{01}\) (last lecture)
  • The share of “compliers” in an IV model \(Z \rightarrow X \rightarrow Y \leftrightarrow X\) is \(\lambda^X_{01}\)

4.4 Identification

What it is. When you have it. What it’s worth.

4.4.1 Identification

Informally a quantity is “identified” if it can be “recovered” once you have enough data.

Say for example average wage is \(x\) in some very large population. If I gather lots and lots of data on the wages of individuals and take the average, then my estimate will ultimately let me figure out \(x\).

  • If \(x\) is $1 then my estimate will end up centered on $1.
  • If it is $2 it will end up centered on $2.

4.4.2 Identification (Definition)

  • Identifiability Let \(Q(M)\) be a query defined over a class of models \(\mathcal M\), then \(Q\) is identifiable if, for any pair of models \(M_1, M_2\) in \(\mathcal M\), \(P(M_1) = P(M_2) \rightarrow Q(M_1) = Q(M_2)\).

  • Identifiability with constrained data Let \(Q(M)\) be a query defined over a class of models \(\mathcal M\), then \(Q\) is identifiable from features \(F(M)\) if, for any pair of models \(M_1, M_2\) in \(\mathcal M\), \(F(M_1) = F(M_2) \rightarrow Q(M_1) = Q(M_2)\).

Based on Defn 3.2.3 in Pearl.

  • Essentially: Each underlying value produces a unique data distribution. When you see that distribution you recover the parameter.

4.4.3 Example without identification 1

Informally a quantity is “identified” if it can be “recovered” once you have enough data.

  • Say for example average wage is \(x^m\) for men and \(x^w\) for women (in some very large population).
  • If I gather lots and lots of data on the wages of (male and female) couples, e.g. \(x^c_i = x^m_i + x^w_i\), then, although this will be informative, it will never be sufficient to recover \(x^m\) and \(x^w\) separately.
  • I can recover \(x^c\), but there are too many combinations of possible values of \(x^m\) and \(x^w\) consistent with the observed data.

4.4.4 Example without identification 2

Suppose that:

  • a share \(a\) of units have negative effects,
  • a share \(b\) have positive effects,
  • a share \(c\) always have \(Y=0\), and
  • a share \(d\) always have \(Y=1\).

Then with very large (experimental) data we observe:

 | Y = 0 | Y = 1
X = 0 | \(\alpha_{00} \rightarrow b/2 + c/2\) | \(\alpha_{01} \rightarrow a/2 + d/2\)
X = 1 | \(\alpha_{10} \rightarrow a/2 + c/2\) | \(\alpha_{11} \rightarrow b/2 + d/2\)

What quantities are identified?

  • \(b-a\)?
  • \(d-c\)?
  • \(b\)?

4.4.5 Example without identification 2

What if we:

  • knew that \(a=0\)
  • observed that \(\alpha_{01}=0\)

 | Y = 0 | Y = 1
X = 0 | \(\alpha_{00} \rightarrow b/2 + c/2\) | \(\alpha_{01} \rightarrow a/2 + d/2\)
X = 1 | \(\alpha_{10} \rightarrow a/2 + c/2\) | \(\alpha_{11} \rightarrow b/2 + d/2\)

What quantities are now identified?

  • \(b-a\)?
  • \(d-c\)?
  • \(b\)?

4.4.6 Identification : Goal

Our goal in causal inference is to estimate quantities such as:

\[\Pr(Y|\hat{x})\]

where \(\hat{x}\) is interpreted as \(X\) set to \(x\) by “external” control. Equivalently: \(do(X=x)\) or sometimes \(X \leftarrow x\).

  • If this quantity is identifiable then we can recover it with infinite data.

  • If it is not identifiable, then, even in the best case, we are not guaranteed to get the right answer.

Are there general rules for determining whether this quantity can be identified? Yes.

4.4.7 Identification : Goal

Note first, identifying

\[\Pr(Y|x)\]

is easy.

But we are not always interested in identifying the distribution of \(Y\) given observed values of \(x\), but rather, the distribution of \(Y\) if \(X\) is set to \(x\).

4.5 Levels and effects

If we can identify the controlled distribution we can calculate other causal quantities of interest.

For example for a binary \(X, Y\) the causal effect of \(X\) on the probability that \(Y=1\) is:

\[\Pr(Y=1|\hat{x}=1) - \Pr(Y=1|\hat{x}=0)\]

Again, this is not the same as:

\[\Pr(Y=1|x=1) - \Pr(Y=1|x=0)\]

It’s the difference between seeing and doing.

4.5.1 When to condition? What to condition on?

The key idea is that you want to find a set of variables such that when you condition on these you get what you would get if you used a do operation.


  • You could imagine creating a “mutilated” graph by removing all the arrows leading out of \(X\)
  • Then select a set of variables, \(Z\), such that \(X\) and \(Y\) are d-separated by \(Z\) on the mutilated graph
  • When you condition on these you are making sure that any covariation between \(X\) and \(Y\) is covariation that is due to the effects of \(X\)

4.5.2 Illustration

4.5.3 Illustration: Remove paths out

4.5.4 Illustration: Block backdoor path

4.5.5 Illustration: Why not like this?

4.5.6 Identification

  • Three results (“Graphical Identification Criteria”)
    • Backdoor criterion
    • Adjustment criterion
    • Frontdoor criterion
  • There are more

4.5.7 Backdoor Criterion: (Pearl 1995)

The backdoor criterion is satisfied by \(Z\) (relative to \(X\), \(Y\)) if:

  1. No node in \(Z\) is a descendant of \(X\)
  2. \(Z\) blocks every backdoor path from \(X\) to \(Y\) (i.e. every path that contains an arrow into \(X\))

In that case you can identify the effect of \(X\) on \(Y\) by conditioning on \(Z\):

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\] (This is eqn 3.19 in Pearl (2000))

4.5.8 Backdoor Criterion: (Pearl 1995)

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\]

  • No notion of a linear control or anything like that; idea really is like blocking: think lots of discrete data and no missing patterns
  • Note this is a formula for a (possibly counterfactual) level; a counterfactual difference would be given in the obvious way by:

\[P(Y=y | \hat{x}) - P(Y=y | \hat{x}')\]
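A minimal sketch of the adjustment formula at work, assuming a hypothetical binary model \(Z \rightarrow X\), \(Z \rightarrow Y\), \(X \rightarrow Y\) in which \(Z\) satisfies the backdoor criterion. All probabilities are made up for illustration. The code computes exact quantities (no sampling) and contrasts the "seeing" difference with the "doing" difference.

```r
# Hypothetical model: Z -> X, Z -> Y, X -> Y (all binary; values are illustrative)
p_z1          <- 0.5
p_x1_given_z  <- function(z) 0.2 + 0.6 * z               # Pr(X = 1 | Z = z)
p_y1_given_xz <- function(x, z) 0.1 + 0.3 * x + 0.4 * z  # Pr(Y = 1 | X = x, Z = z)

p_z         <- function(z) ifelse(z == 1, p_z1, 1 - p_z1)
p_x_given_z <- function(x, z) ifelse(x == 1, p_x1_given_z(z), 1 - p_x1_given_z(z))

# "Seeing": Pr(Y = 1 | X = x), which averages over Pr(z | x) (via Bayes rule)
seeing <- function(x) {
  w <- p_x_given_z(x, 0:1) * p_z(0:1)
  sum(p_y1_given_xz(x, 0:1) * w / sum(w))
}

# "Doing": the backdoor formula, which averages over Pr(z) instead
doing <- function(x) sum(p_y1_given_xz(x, 0:1) * p_z(0:1))

c(seeing = seeing(1) - seeing(0),   # 0.54
  doing  = doing(1)  - doing(0))    # 0.30
```

The only difference between the two functions is the weighting of the strata of \(Z\): by \(\Pr(z|x)\) when seeing and by \(\Pr(z)\) when doing.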

4.5.9 Backdoor Proof

Following Pearl (2009), Chapter 11. Let \(T\) denote the set of parents of \(X\): \(T := pa(X)\), with (possibly vector valued) realizations \(t\). These might not all be observed.

If the backdoor criterion is satisfied, we have:

  1. \(Y\) is independent of \(T\), given \(X\) and observed data, \(Z\) (since \(Z\) blocks backdoor paths)
  2. \(X\) is independent of \(Z\) given \(T\). (Since \(Z\) includes only nondescendants)
  • Key idea: The intervention level relates to the observational level as follows: \[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • Think of this as fully accounting for the (possibly unobserved) causes of \(X\), \(T\)

4.5.10 Backdoor Proof

We want to get to:

\[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • But of course we do not observe \(T\), rather we observe \(Z\). So we now need to write everything in terms of \(Z\) rather than \(T\).

We bring \(Z\) into the picture by writing:

\[p(y|\hat{x}) = \sum_{t\in T} p(t) \sum_z p(y|x, t, z)p(z|x, t)\]

now we want to get rid of \(T\)

4.5.11 Backdoor Proof

now we want to get rid of \(T\)

  • Using the two conditions from the backdoor definition above:

    1. replace \(p(y|x, t, z)\) with \(p(y | x, z)\)
    2. replace \(p(z|x, t)\) with \(p(z|t)\)

This gives: \[p(y|\hat x) = \sum_{t \in T} p(t) \sum_z p(y|x, z)p(z|t)\]

Cleaning up, we can get rid of \(T\):

\[p(y|\hat{x}) = \sum_z p(y|x, z)\sum_{t\in T} p(z|t)p(t) = \sum_z p(y| x, z)p(z)\]

4.5.12 Backdoor proof figure

For intuition:

We would be happy if we could condition on the parent \(T\), but \(T\) is not observed. However we can use \(Z\) instead making use of the fact that:

  1. \(p(y|x, t, z) = p(y | x, z)\) (since \(Z\) blocks)
  2. \(p(z|x, t) = p(z|t)\) (since \(Z\) is upstream and blocked by parents, \(T\))

4.5.13 Adjustment criterion

See Shpitser, VanderWeele, and Robins (2012)

The adjustment criterion is satisfied by \(Z\) (relative to \(X\), \(Y\)) if:

  1. no element of \(Z\) is a descendant in the mutilated graph of any variable \(W\not\in X\) which lies on a proper causal path from \(X\) to \(Y\)
  2. \(Z\) blocks all noncausal paths from \(X\) to \(Y\)


  • mutilated graph: remove arrows pointing into \(X\)
  • proper pathway: A proper causal pathway from \(X\) to \(Y\) only intersects \(X\) at the endpoint

4.5.14 These are different. Simple illustration.

Here \(Z\) satisfies the adjustment criterion but not the backdoor criterion:

\(Z\) is a descendant of \(X\) but it is not a descendant of a node on a path from \(X\) to \(Y\). No harm adjusting for \(Z\) here, but not necessary either.

4.5.15 Frontdoor criterion (Pearl)

Consider this DAG:

  • The relationship between \(X\) and \(Y\) is confounded by \(U\).
  • However the \(X\rightarrow Y\) effect is the product of the \(X\rightarrow M\) effect and the \(M\rightarrow Y\) effect


4.5.16 Identification through the front door


If:

  • \(M\) (possibly a set) blocks all directed paths from \(X\) to \(Y\)
  • there is no backdoor path \(X\) to \(M\)
  • \(X\) blocks all backdoor paths from \(M\) to \(Y\) and
  • all (\(x,m\)) combinations arise with positive probability

Then \(\Pr(y| \hat x)\) is identifiable and given by:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

4.5.17 Frontdoor criterion (Proof)

We want to get \(\Pr(y | \hat x)\)

From the graph the joint distribution of variables is:

\[\Pr(x,m,y,u) = \Pr(u)\Pr(x|u)\Pr(m|x)\Pr(y|m,u)\]

If we intervened on \(X\) we would have \(\Pr(X = x |u)=1\), and so:

\[\Pr(m,y,u | \hat x) = \Pr(u)\Pr(m|x)\Pr(y|m,u)\]

If we sum over \(u\), and then over \(m\), we get:

\[\Pr(m,y| \hat x) = \Pr(m|x)\sum_u\left(\Pr(y|m,u)\Pr(u)\right)\]

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_u\left(\Pr(y|m,u)\Pr(u)\right)\]

The first part is fine; the second part however involves \(u\) which is unobserved. So we need to get the \(u\) out of \(\sum_u\left(\Pr(y|m,u)\Pr(u)\right)\).

4.5.18 Frontdoor criterion

Now, from the graph:

  1. \(M\) is d-separated from \(U\) by \(X\):

\[\Pr(u|m, x) = \Pr(u|x)\]

  2. \(X\) is d-separated from \(Y\) by \(M\), \(U\):

\[\Pr(y|x, m, u) = \Pr(y|m,u)\]

That’s enough to get \(u\) out of \(\sum_u\left(\Pr(y|m,u)\Pr(u)\right)\).

4.5.19 Frontdoor criterion

\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\sum_u\left(\Pr(y|m,u)\Pr(u|x)\Pr(x)\right)\]

Using the 2 equalities we got from the graph:

\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\sum_u\left(\Pr(y|x,m,u)\Pr(u|x,m)\Pr(x)\right)\]


\[\sum_u\left(\Pr(y|m,u)\Pr(u)\right) = \sum_x\left(\Pr(y|m,x)\Pr(x)\right)\]

Intuitively: \(X\) blocks the back door between \(M\) and \(Y\) just as well as \(U\) does

4.5.20 Frontdoor criterion

Substituting we are left with:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

(The \('\) is to distinguish the \(x\) in the summation from the value of \(x\) of interest)

It’s interesting that \(x\) remains in the right hand side in the calculation of the \(m \rightarrow y\) effect, but this is because \(x\) blocks a backdoor from \(m\) to \(y\)
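A minimal numerical check of the front-door formula, assuming a hypothetical binary model \(U \rightarrow X \rightarrow M \rightarrow Y \leftarrow U\) with made-up probabilities. The "truth" uses the unobserved \(U\); the front-door estimate uses only the joint distribution of the observables \((X, M, Y)\).

```r
# Hypothetical parameters (illustrative values only)
p_u     <- c(0.5, 0.5)                           # Pr(U = 0), Pr(U = 1)
p_x1_u  <- c(0.2, 0.8)                           # Pr(X = 1 | U = u)
p_m1_x  <- c(0.1, 0.7)                           # Pr(M = 1 | X = x)
p_y1_mu <- matrix(c(0.1, 0.5, 0.4, 0.9), 2)      # Pr(Y = 1 | M, U): rows m, columns u

bern <- function(p1, v) ifelse(v == 1, p1, 1 - p1)

# Full joint distribution over (u, x, m, y) implied by the graph
g <- expand.grid(u = 0:1, x = 0:1, m = 0:1, y = 0:1)
g$p <- p_u[g$u + 1] *
  bern(p_x1_u[g$u + 1], g$x) *
  bern(p_m1_x[g$x + 1], g$m) *
  bern(p_y1_mu[cbind(g$m + 1, g$u + 1)], g$y)

pr <- function(cond) sum(g$p[cond])              # probability of an event

# Truth, using U: sum_m Pr(m | x) sum_u Pr(y = 1 | m, u) Pr(u)
truth <- function(x) {
  sum(sapply(0:1, function(m)
    bern(p_m1_x[x + 1], m) * sum(p_y1_mu[m + 1, ] * p_u)))
}

# Front-door formula, using only the observables (x, m, y)
frontdoor <- function(x) {
  sum(sapply(0:1, function(m) {
    p_m_x <- pr(g$m == m & g$x == x) / pr(g$x == x)
    inner <- sum(sapply(0:1, function(x2)
      pr(g$y == 1 & g$m == m & g$x == x2) / pr(g$m == m & g$x == x2) * pr(g$x == x2)))
    p_m_x * inner
  }))
}

naive <- pr(g$y == 1 & g$x == 1) / pr(g$x == 1)  # Pr(Y = 1 | X = 1), confounded by U
c(truth = truth(1), frontdoor = frontdoor(1), naive = naive)
```

In this model the front-door value matches the truth exactly, while the naive conditional probability does not.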

4.5.21 Front door

Bringing all this together into a claim we have:


  • \(M\) (possibly a set) blocks all directed paths from \(X\) to \(Y\)
  • there is no backdoor path \(X\) to \(M\)
  • \(X\) blocks all backdoor paths from \(M\) to \(Y\) and
  • all (\(x,m\)) combinations arise with positive probability

Then \(\Pr(y| \hat x)\) is identifiable and given by:

\[\Pr(y| \hat x) = \sum_m\Pr(m|x)\sum_{x'}\left(\Pr(y|m,x')\Pr(x')\right)\]

4.5.22 Front door

  • This is a very elegant and surprising result
  • There are not many obvious applications of it however
  • The conditions would be violated for example if unobserved third things caused both \(M\) and \(Y\)

4.6 In code: Dagitty

The dagitty package (Textor et al. 2016) helps with figuring out what to condition on.


4.6.1 In code: Dagitty

Define a dag using dagitty syntax:

library(dagitty)
g <- dagitty("dag{X -> M -> Y ; Z -> X ; Z -> R -> Y}")

There is then a simple command to check whether two sets are d-separated by a third set:

dseparated(g, "X", "Y", "M")
[1] FALSE
dseparated(g, "X", "Y", c("Z","M"))
[1] TRUE

4.6.2 Dagitty: Find adjustment sets

And a simple command to identify the adjustments needed to identify the effect of one variable on another:

adjustmentSets(g, exposure = "X", outcome = "Y")
{ R }
{ Z }

4.6.3 Important Examples : Confounding

Example where \(Z\) is correlated with \(X\) and \(Y\) and is a confounder

4.6.4 Confounding

Example where \(Z\) is correlated with \(X\) and \(Y\) but it is not a confounder

4.6.5 Important Examples : Collider

But controlling can also cause problems. In fact conditioning on a temporally pre-treatment variable could cause problems. Who’d have thunk? Here is an example from Pearl (2005):

4.6.6 Illustration of identification failure from conditioning on a collider

U1 <- rnorm(10000);  U2 <- rnorm(10000)
Z  <- U1+U2
X  <- U2 + rnorm(10000)/2
Y  <- U1*2 + X

lm_robust(Y ~ X) |> tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.02 0.02 -1.21 0.23 -0.06 0.01 9998 Y
X 1.02 0.02 56.52 0.00 0.98 1.05 9998 Y
lm_robust(Y ~ X + Z) |> tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.01 0.01 -1.13 0.26 -0.03 0.01 9997 Y
X -0.34 0.01 -34.98 0.00 -0.36 -0.32 9997 Y
Z 1.67 0.01 220.37 0.00 1.65 1.68 9997 Y

4.6.7 Let’s look at that in dagitty

g <- dagitty("dag{U1 -> Z  ; U1 -> y ; U2 -> Z ; U2 -> x  -> y}")
adjustmentSets(g, exposure = "x", outcome = "y")
isAdjustmentSet(g, "Z", exposure = "x", outcome = "y")
isAdjustmentSet(g, NULL, exposure = "x", outcome = "y")
[1] TRUE

Which means, no need to condition on anything.

4.6.8 Collider & Confounder

A bind: from Pearl 1995.

For a solution for a class of related problems see Robins, Hernan, and Brumback (2000)

4.6.9 Let’s look at that in dagitty

g <- dagitty("dag{U1 -> Z  ; U1 -> y ; 
             U2 -> Z ; U2 -> x  -> y; 
             Z -> x}")
adjustmentSets(g, exposure = "x", outcome = "y")
{ U1 }
{ U2, Z }

Which means you have to adjust on an unobservable. Here we double check whether including or excluding “Z” is enough:

isAdjustmentSet(g, "Z", exposure = "x", outcome = "y")
isAdjustmentSet(g, NULL, exposure = "x", outcome = "y")

4.6.10 Collider & Confounder

So we cannot identify the effect here. But can we still learn about it?

5 Frequentist Analysis

Estimation and testing

5.1 Outline

  • Simple estimates from experimental data
  • Weighting, blocking
  • Design-based variance estimates
  • Design-based \(p\) values
  • Reporting

See topics for Controls and doubly robust estimation

5.1.1 ATE: DIM

Unbiased estimates of the (sample) average treatment effect can be obtained (whether or not there is imbalance on covariates) using:

\[ \widehat{ATE} = \frac{1}{n_T}\sum_TY_i - \frac{1}{n_C}\sum_CY_i \]

5.1.2 ATE: DIM in practice

df <- fabricatr::fabricate(N = 100, Z = rep(0:1, N/2), Y = rnorm(N) + Z)

# by hand
df |>
  summarize(Y1 = mean(Y[Z==1]), 
            Y0 = mean(Y[Z==0]), 
            diff = Y1 - Y0) |> kable(digits = 2)
Y1 Y0 diff
1.07 -0.28 1.35
# with estimatr
estimatr::difference_in_means(Y ~ Z, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
Z 1.35 0.17 7.94 0 1.01 1.68 97.98 Y

5.1.3 ATE: DIM in practice

We can also do this with regression:

estimatr::lm_robust(Y ~ Z, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.28 0.12 -2.33 0.02 -0.51 -0.04 98 Y
Z 1.35 0.17 7.94 0.00 1.01 1.68 98 Y

See Freedman (2008) on why regression is fine here

5.1.4 ATE: Blocks

Say now different strata or blocks \(\mathcal{S}\) had different assignment probabilities. Then you could estimate:

\[ \widehat{ATE} = \sum_{S\in \mathcal{S}}\frac{n_{S}}{n} \left(\frac{1}{n_{S1}}\sum_{S\cap T}y_i - \frac{1}{n_{S0}}\sum_{S\cap C}y_i \right) \]

Note: you cannot just ignore the blocks because assignment is no longer independent of potential outcomes: you might be sampling units with different potential outcomes with different probabilities.

However, the formula above works fine because assignment is random conditional on blocks.

5.1.5 ATE: Blocks

As a DAG this is just classic confounding:

make_model("Block -> Z ->Y <- Block") |> 
  plot(x_coord = c(2,1,3), y_coord = c(2, 1, 1))

5.1.6 ATE: Blocks in practice

Data with heterogeneous assignments:

df <- fabricatr::fabricate(
  N = 500, X = rep(0:1, N/2), 
  prob = .2 + .3*X,
  Z = rbinom(N, 1, prob),
  ip = 1/(Z*prob + (1-Z)*(1-prob)), # discuss
  Y = rnorm(N) + Z*X)

True effect is 0.5, but:

estimatr::difference_in_means(Y ~ Z, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
Z 0.9 0.1 9.32 0 0.71 1.09 377.93 Y

5.1.7 ATE: Blocks in practice

Averaging over effects in blocks

# by hand
estimates <- 
  df |>
  group_by(X) |>
  summarize(Y1 = mean(Y[Z==1]), 
            Y0 = mean(Y[Z==0]), 
            diff = Y1 - Y0,
            W = n())

estimates$diff |> weighted.mean(estimates$W)
[1] 0.7236939
# with estimatr
estimatr::difference_in_means(Y ~ Z, blocks = X, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
Z 0.72 0.11 6.66 0 0.51 0.94 496 Y

5.1.8 ATE with IPW

This also corresponds to the difference in the weighted average of treatment outcomes (with weights given by the inverse of the probability that each unit is assigned to treatment) and control outcomes (with weights given by the inverse of the probability that each unit is assigned to control).

  • The average difference in means estimator is the same as what you would get if you weighted inversely by shares of units in different conditions inside blocks.

5.1.9 ATE with IPW in practice

# by hand
df |>
  summarize(Y1 = weighted.mean(Y[Z==1], ip[Z==1]), 
            Y0 = weighted.mean(Y[Z==0],  ip[Z==0]), # note !
            diff = Y1 - Y0)|> 
  kable(digits = 2)
Y1 Y0 diff
0.59 -0.15 0.74
# with estimatr
estimatr::difference_in_means(Y ~ Z, weights = ip, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
Z 0.74 0.11 6.65 0 0.52 0.96 498 Y

5.1.10 ATE with IPW

  • But inverse propensity weighting is a more general principle, which can be used even if you do not have blocks.

  • The intuition for it comes straight from sampling weights — you weight up in order to recover an unbiased estimate of the potential outcomes for all units, whether or not they are assigned to treatment.

  • With sampling weights, however, you can include units even if they were sampled with probability 1 (and so have a weight of 1). Why can you not include units that are assigned to treatment with probability 1 when doing inverse propensity weighting?

5.1.11 Illustration: Estimating treatment effects with terrible treatment assignments: Fixer

Say you made a mess and used a randomization that was correlated with some variable, \(U\). For example:

  • The randomization is done in a way that introduces a correlation between Treatment Assignment and Potential Outcomes
  • Then possibly, even though there is no true causal effect, we naively estimate a large one — enormous bias
  • However since we know the assignment procedure we can fully correct for the bias

5.1.12 Illustration: Estimating treatment effects with terrible treatment assignments: Fixer

  • In the next example, we do this using “inverse propensity score weighting.”
  • This is exactly analogous to standard survey weighting — since we selected different units for treatment with different probabilities, we weight them differently to recover the average outcome among treated units (same for control).

5.1.13 Basic randomization: Fixer

Bad assignment, some randomization process you can’t understand (but can replicate) that results in unequal probabilities.

N <- 400
U <- runif(N, .1, .9)

 design <- 
   declare_model(N = N,
                 Y_Z_0 = U + rnorm(N, 0, .1),
                 Y_Z_1 = U + rnorm(N, 0, .1),
                 Z = rbinom(N, 1, U)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "naive") 

5.1.14 Basic randomization: Fixer

The result is a sampling distribution not centered on the true effect (0)

diagnosis <- diagnose_design(design)
diagnosis$simulations_df |>
  ggplot(aes(estimate)) + geom_histogram() + facet_grid(~estimator) +
  geom_vline(xintercept = 0, color = "red")

5.1.15 A fix

To fix you can estimate the assignment probabilities by replicating the assignment many times:

probs <- replicate(1000, design |> draw_data() |> pull(Z)) |> apply(1, mean)

and then use these assignment probabilities in your estimator

design_2 <-
  design +
  declare_measurement(weights = Z/probs + (1-Z)/(1-probs)) +
  declare_estimator(Y ~ Z, weights = weights, label = "smart") 

5.1.16 Basic randomization: Fixer

Implied weights

draw_data(design_2) |> 
  ggplot(aes(probs, weights, color = factor(Z))) + 
  geom_point()  # one point per unit: weight against estimated assignment probability

5.1.17 Basic randomization: Fixer

Improved results

diagnosis <- diagnose_design(design_2)
diagnosis$simulations_df |>
  ggplot(aes(estimate)) + geom_histogram() + facet_grid(estimator~.)+
  geom_vline(xintercept = 0, color = "red")

5.1.18 IPW with one unit!

This example is surprising but it helps you see the logic of why inverse weighting gets unbiased estimates (and why that might not guarantee a reasonable answer)

Imagine there is one unit with potential outcomes \(Y(1) = 2, Y(0) = 1\). So the unit level treatment effect is 1.

You toss a coin.

  • If you assign to treatment you estimate: \(\hat\tau = \frac{2}{0.5} = 4\)
  • If you assign to control you estimate: \(\hat\tau = -\frac{1}{0.5} = -2\)

So your expected estimate is: \[0.5 \times 4 + 0.5 \times (-2) = 1\]

Great on average but always lousy

5.1.19 Generalization: why IPW works

  • Say a given unit is assigned to treatment with probability \(\pi_i\)
  • We estimate the average \(Y(1)\) using

\[\hat{\overline{Y_1}} = \frac{1}n\left(\sum_i \frac{Z_iY_i(1)}{\pi_i}\right)\]

With independent assignment the expected value of \(\hat{\overline{Y_1}}\) is just:

\[\mathbb{E}[\hat{\overline{Y_1}}] =\frac1n\left( \left(\pi_1 \frac{1\times Y_1(1)}{\pi_1} + (1-\pi_1) \frac{0\times Y_1(1)}{\pi_1}\right) + \left(\pi_2 \frac{1\times Y_2(1)}{\pi_2} + (1-\pi_2) \frac{0\times Y_2(1)}{\pi_2}\right) + \dots\right)\]

\[\mathbb{E}[\hat{\overline{Y_1}}] =\frac1n\left( Y_1(1) + Y_2(1) + \dots\right) = \overline{Y_1}\]

and similarly for \(\mathbb{E}[\hat{\overline{Y_0}}]\) and so using linearity of expectations:

\[\mathbb{E}[\widehat{\overline{Y_1 - Y_0}}] = \overline{Y_1 - Y_0}\]
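A minimal sketch that checks this unbiasedness claim exactly, by enumerating every possible assignment for three hypothetical units with unequal, independent assignment probabilities. The potential outcomes and probabilities are made up for illustration.

```r
# Hypothetical potential outcomes and assignment probabilities (illustrative only)
y1   <- c(2, 5, 3)
y0   <- c(1, 1, 4)
pi_i <- c(0.2, 0.5, 0.9)
n    <- length(y1)

# All 2^n assignment vectors and their probabilities under independent assignment
assignments <- as.matrix(expand.grid(rep(list(0:1), n)))
prob <- apply(assignments, 1, function(z) prod(ifelse(z == 1, pi_i, 1 - pi_i)))

# Inverse-probability-weighted estimate of mean(Y(1)) - mean(Y(0)) for each assignment
ipw <- apply(assignments, 1, function(z)
  mean(z * y1 / pi_i) - mean((1 - z) * y0 / (1 - pi_i)))

c(expected_estimate = sum(prob * ipw),   # expectation over the randomization
  true_ate          = mean(y1 - y0),
  smallest_estimate = min(ipw),
  largest_estimate  = max(ipw))
```

The expectation matches the true average effect, even though individual estimates range widely, echoing the one-unit example.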

5.1.20 Generalization: why IPW works

  • Note we needed \(\pi_i >0\) and also \(\pi_i <1\) everywhere. Why?
  • We used independence here; sampling theory is used to show similar results for e.g. complete randomization
  • For blocked randomization this is easy to see

5.2 Design-based Estimation of Variance

Let’s talk about “inference”

5.2.1 Var(ATE)

  • Recall that the treatment effect is gotten by taking a sample of outcomes under treatment and comparing them to a sample of outcomes under control
  • Say that there is no “error”
  • Why would this procedure produce uncertainty?

5.2.2 Var(ATE)

  • Why would this procedure produce uncertainty?
  • The uncertainty comes from being uncertain about the average outcome under control from observations of the control units, and from being uncertain about the average outcome under treatment from observation of the treated units
  • In other words, it comes from the variance in the treatment outcomes and variance in the control outcomes (and not, for example, from variance in the treatment effect)

5.2.3 Var(ATE)

  • In classical statistics we characterize our uncertainty over an estimate using an estimate of variance of the sampling distribution of the estimator.

  • Key idea is we want to be able to say: how likely are we to have gotten such an estimate if the distribution of estimates associated with our design looked a given way.

  • More specifically we want to estimate “standard error” or the “standard deviation of the sampling distribution”

(See Wooldridge (2023) where the standard error is understood as the “estimate of the standard deviation of the sampling distribution”)

5.2.4 Variance and standard errors


  • \(\hat\tau\) is an estimate for \(\tau\)
  • \(\overline{x}\) is the average value of \(x\)

The variance of the estimator over \(n\) repeated ‘runs’ of a design is: \(Var(\hat{\tau}) = \frac{1}n\sum_i(\hat\tau_i - \overline{\hat\tau})^2\)

And the standard error is:

\(se(\hat{\tau}) = \sqrt{\frac{1}n\sum_i(\hat\tau_i - \overline{\hat\tau})^2}\)

5.2.5 Variance and standard errors

If we have a good measure for the shape of the sampling distribution we can start to make statements of the form:

  • What are the chances that an estimate would be this large or larger?

If the sampling distribution is roughly normal, as it may be with large samples, then we can use procedures such as: “there is a 5% probability that an estimate would be more than 1.96 standard errors away from the mean of the sampling distribution”

5.2.6 Var(ATE)

  • Key idea: You can estimate variance straight from the data, given knowledge of the assignment process and assuming well defined potential outcomes.

  • Recall that in general \(Var(x) = \frac{1}n\sum_i(x_i - \overline{x})^2\). Here the \(x_i\)s are the treatment effect estimates we might get under different random assignments, \(n\) is the number of different assignments (assumed here all equally likely, but otherwise we can weight), and \(\overline{x}\) is their average (which equals the truth when the estimator is unbiased).

  • For intuition imagine we have just two units \(A\), \(B\), with potential outcomes \(A_1\), \(A_0\), \(B_1\), \(B_0\).

  • When there are two units with outcomes \(x_1, x_2\), the variance simplifies like this:

\[Var(x) = \frac{1}2\left(x_1 - \frac{x_1 + x_2}{2}\right)^2 + \frac{1}2\left(x_2 - \frac{x_1 + x_2}{2}\right)^2 = \left(\frac{x_1 - x_2}{2}\right)^2\]

5.2.7 Var(ATE)

In the two unit case the two possible treatment estimates are: \(\hat{\tau}_1=A_1 - B_0\) and \(\hat{\tau}_2=B_1 - A_0\), depending on what gets put into treatment. So the variance is:

\[Var(\hat{\tau}) = \left(\frac{\hat{\tau}_1 - \hat{\tau}_2}{2}\right)^2 = \left(\frac{(A_1 - B_0) - (B_1 - A_0)}{2}\right)^2 =\left(\frac{(A_1 - B_1) + (A_0 - B_0)}{2}\right)^2 \]

which we can re-write as:

\[Var(\hat{\tau}) = \left(\frac{A_1 - B_1}{2}\right)^2 + \left(\frac{A_0 - B_0}{2}\right)^2+ 2\frac{(A_1 - B_1)(A_0-B_0)}{4}\]

The first two terms correspond to the variance of \(Y(1)\) and the variance of \(Y(0)\). The last term is a bit pesky though: it corresponds to twice the covariance of \(Y(1)\) and \(Y(0)\).

5.2.8 Var(ATE)

How can we go about estimating this?

\[Var(\hat{\tau}) = \left(\frac{A_1 - B_1}{2}\right)^2 + \left(\frac{A_0 - B_0}{2}\right)^2+ 2\frac{(A_1 - B_1)(A_0-B_0)}{4}\]

In the two unit case it is quite challenging because we do not have an estimate for any of the three terms: we do not have an estimate for the variance in the treatment group or in the control group because we have only one observation in each case; and we do not have an estimate for the covariance because we don’t observe both potential outcomes for any case.

Things do look a bit better however with more units…

5.2.9 Var(ATE): Generalizing

From Freedman Prop 1 / Example 1 (using combinatorics!) we have:

\(V(\widehat{ATE}) = \frac{1}{n-1}\left[\frac{n_C}{n_T}V_1 + \frac{n_T}{n_C}V_0 + 2C_{01}\right]\)

… where \(V_0, V_1\) denote variances and \(C_{01}\) covariance

This is usefully rewritten as:

\[ \begin{split} V(\widehat{ATE}) & = \frac{1}{n-1}\left[\frac{n - n_T}{n_T}V_1 + \frac{n - n_C}{n_C}V_0 + 2C_{01}\right] \\ & = \frac{n}{n-1}\left[\frac{V_1}{n_T} + \frac{V_0}{n_C}\right] - \frac{1}{n-1}\left[V_1 + V_0 - 2C_{01}\right] \end{split} \]

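where the final bracketed term, \(V_1 + V_0 - 2C_{01}\), is the variance of the unit-level treatment effects and so is nonnegative; dropping it can only overstate the variance.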
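A minimal sketch that checks the variance expression by brute force, assuming six hypothetical units with made-up potential outcomes and complete random assignment of three units to treatment. Here \(V_1\), \(V_0\) and \(C_{01}\) are computed with \(1/n\) denominators, which is the convention under which the expression holds.

```r
# Hypothetical potential outcomes (illustrative only)
y1 <- c(3, 5, 2, 8, 6, 4)
y0 <- c(1, 4, 2, 3, 5, 1)
n <- 6; n_t <- 3; n_c <- n - n_t

# Difference in means under every possible complete random assignment
treated_sets <- combn(n, n_t)                      # each column: indices of treated units
dim_est <- apply(treated_sets, 2, function(tr) mean(y1[tr]) - mean(y0[-tr]))
exact_var <- mean((dim_est - mean(dim_est))^2)     # true variance of the estimator

# Finite-population variances and covariance (1/n denominators)
pvar <- function(x) mean((x - mean(x))^2)
V1  <- pvar(y1)
V0  <- pvar(y0)
C01 <- mean((y1 - mean(y1)) * (y0 - mean(y0)))

formula_var  <- (n_c / n_t * V1 + n_t / n_c * V0 + 2 * C01) / (n - 1)
neyman_bound <- n / (n - 1) * (V1 / n_t + V0 / n_c)   # drops the final (nonnegative) term

c(exact = exact_var, formula = formula_var, conservative = neyman_bound)
```

The enumerated variance agrees with the formula, and the version that omits the final term is at least as large.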

5.2.10 Var(ATE)


  • With more than two units we can use the sample estimates \(s^2(\{Y_i\}_{i \in C})\) and \(s^2(\{Y_i\}_{i \in T})\) for the first part.
  • But \(C_{01}\) still cannot be estimated from data.
  • The Neyman estimator ignores the second part (and so is conservative).
  • Tip: for STATA users, use , robust (see Samii and Aronow (2012))

5.2.11 ATE and Var(ATE)

For the case with blocking, the conservative estimator is:

\(V(\widehat{ATE}) = {\sum_{S\in \mathcal{S}}{\left(\frac{n_{S}}{n}\right)^2} \left({\frac{s^2_{S1}}{n_{S1}}} + {\frac{s^2_{S0}}{n_{S0}}} \right)}\)

5.2.12 Illustration of Neyman Conservative Estimator

An illustration of how conservative the conservative estimator of variance really is (numbers in plot are correlations between \(Y(1)\) and \(Y(0)\)).

We confirm that:

  1. the estimator is conservative
  2. the estimator is more conservative for negative correlations between \(Y(0)\) and \(Y(1)\), e.g. if those cases that do particularly badly in control are the ones that do particularly well in treatment, and
  3. with \(\tau\) and \(V(Y(0))\) fixed, high positive correlations are associated with highest variance.

5.2.13 Illustration of Neyman Conservative Estimator

\(\tau\) | \(\rho\) | \(\sigma^2_{Y(1)}\) | \(\Delta\) | \(\sigma^2_{\tau}\) | \(\widehat{\sigma}^2_{\tau}\) | \(\widehat{\sigma}^2_{\tau(\text{Neyman})}\)
1.00 | -1.00 | 1.00 | -0.04 | 0.00 | -0.00 | 0.04
1.00 | -0.67 | 1.00 | -0.03 | 0.01 | 0.01 | 0.04
1.00 | -0.33 | 1.00 | -0.03 | 0.01 | 0.01 | 0.04
1.00 | 0.00 | 1.00 | -0.02 | 0.02 | 0.02 | 0.04
1.00 | 0.33 | 1.00 | -0.01 | 0.03 | 0.03 | 0.04
1.00 | 0.67 | 1.00 | -0.01 | 0.03 | 0.03 | 0.04
1.00 | 1.00 | 1.00 | 0.00 | 0.04 | 0.04 | 0.04

Here \(\rho\) is the unobserved correlation between \(Y(1)\) and \(Y(0)\); and \(\Delta\) is the final term in the sample variance equation that we cannot estimate.

5.2.14 Illustration of Neyman Conservative Estimator

5.2.15 Tighter Bounds On Variance Estimate

The conservative variance comes from the fact that you do not know the covariance between \(Y(1)\) and \(Y(0)\).

  • But as Aronow, Green, and Lee (2014) point out, you do know something.
  • Intuitively, if you know that the variance of \(Y(1)\) is 0, then the covariance also has to be zero.
  • This basic insight opens a way of calculating bounds on the variance of the sample average treatment effect.

5.2.16 Tighter Bounds On Variance Estimate


  • Take a million-observation dataset, with treatment randomly assigned
  • Assume \(Y(0)=0\) for everyone and \(Y(1)\) distributed normally with mean 0 and standard deviation of 1000.
  • Note here the covariance of \(Y(1)\) and \(Y(0)\) is 0.
  • Note the true variance of the estimated sample average treatment effect should be (approx) \(\frac{Var(Y(1))}{{1000000}} + \frac{Var(Y(0))}{{1000000}} = 1+0=1\), for an se of \(1\).
  • But using the Neyman estimator (or OLS!) we estimate (approx) \(\frac{Var(Y(1))}{({1000000/2})} + \frac{Var(Y(0))}{({1000000/2})} = 2\), for an se of \(\sqrt{2}\).
  • But we can recover the truth knowing the covariance between \(Y(1)\) and \(Y(0)\) is 0.

5.2.17 Tighter Bounds On Variance Estimate: Code

sharp_var <- function(yt, yc, N=length(c(yt,yc)), upper=TRUE){
  m <- length(yt)
  n <- m + length(yc)
  V <- function(x,N) (N-1)/(N*(length(x)-1)) * sum((x - mean(x))^2)
  yt <- sort(yt)
  if(upper) {yc <- sort(yc)
  } else {
    yc <- sort(yc,decreasing=TRUE)}
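  # quantile grid, shifted just below each breakpoint; the first point is reset below
  p_i <- unique(sort(c(seq(0,n-m,1)/(n-m),seq(0,m,1)/m))) - 
    .Machine$double.eps^.5
  p_i[1] <- .Machine$double.eps^.5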
  yti <- yt[ceiling(p_i*m)]
  yci <- yc[ceiling(p_i*(n-m))]
  p_i_minus <- c(NA,p_i[1: (length(p_i)-1)])
 ((N-m)/m * V(yt,N) + (N-(n-m))/(n-m)*V(yc,N) + 
     2*sum(((p_i-p_i_minus)*yti*yci)[2:length(p_i)]) - 2*mean(yt)*mean(yc))/(N-1)}

5.2.18 Illustration

n   <- 1000000
Y   <- c(rep(0,n/2), 1000*rnorm(n/2))
X   <- c(rep(0,n/2), rep(1, n/2))

lm_robust(Y~X) |> tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) 0.00 0.00 0.63 0.53 0.00 0.00 999998 Y
X 1.21 1.41 0.86 0.39 -1.56 3.98 999998 Y
c(sharp_var(Y[X==1], Y[X==0], upper = FALSE),
  sharp_var(Y[X==1], Y[X==0], upper = TRUE)) |> 
  round(2)
[1] 1 1

The sharp bounds on the variance are \([1,1]\), implying a standard error of 1, but the conservative (Neyman) standard error is \(\sqrt{2} \approx 1.41\).

5.2.19 Asymptotics

  • It is a remarkable thing that you can estimate the standard error straight from the data
  • However, once you want to use the standard error to do hypothesis testing you generally end up looking up distributions (\(t\)-distributions or normal distributions)
  • That’s a little disappointing and has been one of the criticisms made by Deaton and Cartwright (2018)

However you can do hypothesis testing even without an estimate of the standard error.

Up next

5.3 Randomization Inference

A procedure for using the randomization distribution to calculate \(p\) values

5.3.1 Calculate a \(p\) value in your head

  • Illustrating \(p\) values via “randomization inference”

  • Say you randomized assignment to treatment and your data looked like this.

Unit 1 2 3 4 5 6 7 8 9 10
Treatment 0 0 0 0 0 0 0 1 0 0
Health score 4 2 3 1 2 3 4 8 7 6


  • Does the treatment improve your health?
  • What’s the \(p\) value for the null that treatment had no effect on anybody?

5.3.2 Calculate a \(p\) value in your head

  • Illustrating \(p\) values via “randomization inference”
  • Say you randomized assignment to treatment and your data looked like this.
Unit 1 2 3 4 5 6 7 8 9 10
Treatment 0 0 0 0 0 0 0 0 1 0
Health score 4 2 3 1 2 3 4 8 7 6


  • Does the treatment improve your health?
  • What’s the \(p\) value for the null that treatment had no effect on anybody?
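
Once you have tried the two puzzles in your head, the answers can be checked with a few lines of base R. This is a sketch under stated assumptions: exactly one of the ten units is treated, each unit is equally likely to be the treated one, the test statistic is the difference in means, and the test is one sided. The function name `ri_p` is hypothetical.

```r
health <- c(4, 2, 3, 1, 2, 3, 4, 8, 7, 6)

# p value under the sharp null when exactly one of the ten units is treated
ri_p <- function(y, treated) {
  # difference in means for each of the ten possible assignments
  stat <- sapply(seq_along(y), function(i) y[i] - mean(y[-i]))
  # share of assignments with a statistic at least as large as the observed one
  mean(stat >= stat[treated])
}

p_unit_8 <- ri_p(health, treated = 8)   # first data set: unit 8 treated
p_unit_9 <- ri_p(health, treated = 9)   # second data set: unit 9 treated
c(p_unit_8, p_unit_9)
```

In the first case the treated unit has the highest score of all ten, so only one assignment in ten is as extreme; in the second case two assignments are.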

5.3.3 Randomization Inference: Some code

  • In principle it is very easy.
  • These few lines generate data, produce the regression estimate and then an ri estimate of \(p\):
# data
df <- fabricate(N = 1000, Z = rep(c(0,1), N/2), Y=  .1*Z + rnorm(N))

# test stat
test.stat <- function(df) with(df, mean(Y[Z==1])- mean(Y[Z==0]))

# test stat distribution
ts <- replicate(4000, df |> mutate(Z = sample(Z)) |> test.stat())

# test
mean(ts >= test.stat(df))   # One sided p value
[1] 0.025

5.3.4 Randomization Inference: Some code

The \(p\) value is the mass to the right of the vertical line.

hist(ts); abline(v = test.stat(df), col = "red") 

5.3.5 Using ri2

You can do the same using Alex Coppock’s ri2 package


# Declare the assignment
assignment <- declare_ra(N = 1000, m = 500)

# Implement
ri2_out <- conduct_ri(
  formula = Y ~ Z,
  declaration = assignment,
  sharp_hypothesis = 0,
  data = df, 
  p = "upper",
  sims = 4000)

5.3.6 Using ri2

term estimate upper_p_value
Z 0.1321367 0.02225

You’ll notice a slightly different answer. This is because, although the procedure is “exact”, it is subject to simulation error.

5.3.7 Randomization Inference

  • Randomization inference can get more complicated when you want to test a null other than the sharp null of no effect.
  • Say you wanted to test the null that the effect is 2 for all units. How do you do it?
  • Say you wanted to test the null that an interaction effect is zero. How do you do it?
  • In both cases by filling in a potential outcomes schedule given the hypothesis in question and then generating a test statistic
Observed          Under null that     Under null that
                  effect is 0         effect is 2
Y(0)    Y(1)      Y(0)    Y(1)        Y(0)    Y(1)
1       NA        1       1           1       3
2       NA        2       2           2       4
NA      4         4       4           2       4
NA      3         3       3           1       3
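
The schedule-filling logic can be sketched in base R using the four units in the table. This is an illustration only: it assumes two of the four units are treated under complete randomization and uses a two sided test based on the difference in means; `ri_p_sharp` is a hypothetical name.

```r
Z <- c(0, 0, 1, 1)   # observed assignment in the table
Y <- c(1, 2, 4, 3)   # observed outcomes in the table

ri_p_sharp <- function(tau0) {
  Y0 <- Y - tau0 * Z             # fill in Y(0) under the null
  Y1 <- Y0 + tau0                # fill in Y(1) under the null
  treated_sets <- combn(4, 2)    # all 6 ways to treat 2 of 4 units
  stat <- apply(treated_sets, 2, function(treated) {
    z <- as.numeric(1:4 %in% treated)
    y <- ifelse(z == 1, Y1, Y0)  # outcomes this assignment would reveal
    mean(y[z == 1]) - mean(y[z == 0])
  })
  observed <- mean(Y[Z == 1]) - mean(Y[Z == 0])
  # two sided: distance of each estimate from the hypothesized effect
  mean(abs(stat - tau0) >= abs(observed - tau0))
}

p_null_0 <- ri_p_sharp(0)   # sharp null of no effect
p_null_2 <- ri_p_sharp(2)   # sharp null of a constant effect of 2
c(p_null_0, p_null_2)
```

The observed difference in means is exactly 2, so the null of a constant effect of 2 cannot be rejected at any level with these data.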

5.3.8 ri and CIs

It is possible to use this procedure to generate confidence intervals with a natural interpretation.

  • The key idea is that we can use the same procedure to assess the probability of the data given a sharp null of no effect, but also a sharp null of any other constant effect.
  • We can then see what set of effects we reject and what set we accept
  • We are left with a set of values that we cannot reject at the 0.05 level.
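
The inversion idea can be sketched without any packages. The data, the grid of candidate effects, and the number of permutations below are all assumptions made for illustration; `ri_p` and `diff_means` are hypothetical helper names.

```r
set.seed(1)
N <- 100
Z <- sample(rep(0:1, N / 2))      # complete random assignment
Y <- 0.5 * Z + rnorm(N)           # hypothetical data with a true effect of 0.5

diff_means <- function(y, z) mean(y[z == 1]) - mean(y[z == 0])
est <- diff_means(Y, Z)

# two sided p value for the sharp null that the effect is tau for every unit
ri_p <- function(tau, sims = 500) {
  y0 <- Y - tau * Z               # control outcomes implied by the null
  null_stats <- replicate(sims, {
    z <- sample(Z)                # re-randomize
    diff_means(y0 + tau * z, z)   # estimate under this assignment and null
  })
  mean(abs(null_stats - tau) >= abs(est - tau))
}

candidates <- seq(-0.5, 1.5, by = 0.1)
ps <- sapply(candidates, ri_p)
ci <- range(candidates[ps > 0.05])   # candidate effects we cannot reject
ci
```

The interval is only as fine as the grid of candidates, and its endpoints carry simulation error from the permutations.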

5.3.9 ri and CIs in practice

candidates <- seq(-.05, .3, length = 50)

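# two-tailed p value for the sharp null that the effect is j for all units
get_p <- function(j)
  (conduct_ri(
      formula = Y ~ Z,
      declaration = assignment,
      sharp_hypothesis = j,
      data = df,
      sims = 5000
    ) |> summary())$two_tailed_p_value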

# Implement
ps <- candidates |> sapply(get_p)

5.3.10 ri and CIs in practice

Warning: calculating confidence intervals this way can be computationally intensive

5.3.11 ri with DeclareDesign

  • DeclareDesign can do randomization inference natively
  • The trick is to ensure that, when calculating the \(p\) values, the only stochastic component is the assignment to treatment

5.3.12 ri with DeclareDesign (advanced)

Here we get minimal detectable effects by using a design that has two stage simulations so we can estimate the sampling distribution of summaries of the sampling distribution generated from reassignments.

test_stat <- function(data)
  with(data, data.frame(estimate = mean(Y[Z==1]) - mean(Y[Z==0])))

b <- 0

design <- 
  declare_model(N = 100,  Z = complete_ra(N), Y = b*Z + rnorm(N)) +
  declare_estimator(handler = label_estimator(test_stat), label = "actual")+
  declare_measurement(Z = sample(Z))  + # this is the permutation step
  declare_estimator(handler = label_estimator(test_stat), label = "null")

5.3.13 ri with DeclareDesign (advanced)

Simulations data frame from two step simulation. Note computational intensity as number of runs is the product of the sims vector. I speed things up by using a simple estimation function and also using parallelization.

options(parallelly.fork.enable = TRUE) 

simulations <- 
  design |> redesign(b = c(0, .25, .5, .75, 1)) |> 
  simulate_design(sims = c(200, 1, 1000, 1)) 

5.3.14 ri with DeclareDesign (advanced)

A snapshot of the simulations data frame: we have multiple step 3 draws for each design and step 1 draw.

simulations |> head() |> kable(digits = 2)
design b sim_ID estimator estimate step_1_draw step_3_draw
design_1 0 1 actual -0.02 1 1
design_1 0 1 null 0.06 1 1
design_1 0 2 actual -0.02 1 2
design_1 0 2 null 0.03 1 2
design_1 0 3 actual -0.02 1 3
design_1 0 3 null -0.05 1 3

5.3.15 ri with DeclareDesign (advanced)

Power for each value of b.

simulations |> group_by(b, step_1_draw) |> 
  summarize(p = mean(abs(estimate[estimator == "null"]) >= abs(estimate[estimator == "actual"]))) |>
  group_by(b) |> summarize(power = mean(p <= .05)) |>
  ggplot(aes(b, power)) + geom_line() + geom_hline(yintercept = .8, color = "red")

If you want to figure out more precisely what b gives 80% or 90% power you can narrow down the b range.

5.3.16 ri interactions

Let’s now imagine a world with two treatments in which we are interested in using ri to assess the interaction. (Code from Coppock, ri2)


N <- 100
declaration <- randomizr::declare_ra(N = N, m = 50)

data <- fabricate(
  N = N,
  Z = conduct_ra(declaration),
  X = rnorm(N),
  Y = .9 * X + .2 * Z + .1 * X * Z + rnorm(N))

5.3.17 ri interactions

The approach is to declare a null model that is nested in the full model. The \(F\) statistic from the model comparison is then taken as the test statistic, and its distribution is built up under re-randomizations.

  conduct_ri(
    model_1 = Y ~ Z + X,
    model_2 = Y ~ Z + X + Z * X,
    declaration = declaration,
    assignment = "Z",
    sharp_hypothesis = coef(lm(Y ~ Z, data = data))[2],
    data = data, 
    sims = 1000
  )  |> summary() |> kable()
term estimate two_tailed_p_value
F-statistic 1.954396 0.171

5.3.18 ri interactions with DeclareDesign

Let’s imagine a true model with interactions. We take an estimate. We then ask how likely that estimate is from a null model with constant effects

Note: this is quite a sharp hypothesis

df <- fabricate(N = 1000, Z1 = rep(c(0,1), N/2), Z2 = sample(Z1), Y = Z1 + Z2 - .15*Z1*Z2 + rnorm(N))

my_estimate <- (lm(Y ~ Z1*Z2, data = df) |> coef())[4]

null_model <-  function(df) {
  M0 <- lm(Y ~ Z1 + Z2, data = df) 
  d1 <- coef(M0)[2]
  d2 <- coef(M0)[3]
  df |> mutate(
    Y_Z1_0_Z2_0 = Y - Z1*d1 - Z2*d2,
    Y_Z1_1_Z2_0 = Y + (1-Z1)*d1 - Z2*d2,
    Y_Z1_0_Z2_1 = Y - Z1*d1 + (1-Z2)*d2,
    Y_Z1_1_Z2_1 = Y + (1-Z1)*d1 + (1-Z2)*d2)
}

5.3.19 ri interactions with DeclareDesign

Let’s imagine a true model with interactions. We take an estimate. We then ask how likely that estimate is from a null model with constant effects

Imputed potential outcomes look like this:

df <- df |> null_model()

df |>  head() |> kable(digits = 2, align = "c")
ID Z1 Z2 Y Y_Z1_0_Z2_0 Y_Z1_1_Z2_0 Y_Z1_0_Z2_1 Y_Z1_1_Z2_1
0001 0 0 -0.18 -0.18 0.76 0.68 1.61
0002 1 0 0.20 -0.73 0.20 0.12 1.06
0003 0 0 2.56 2.56 3.50 3.42 4.36
0004 1 0 -0.27 -1.21 -0.27 -0.35 0.59
0005 0 1 -2.13 -2.99 -2.05 -2.13 -1.19
0006 1 1 3.52 1.72 2.66 2.58 3.52

5.3.20 ri interactions with DeclareDesign

design <- 
  declare_model(data = df) +
  declare_measurement(Z1 = sample(Z1), Z2 = sample(Z2),
                      Y = reveal_outcomes(Y ~ Z1 + Z2)) +
  declare_estimator(Y ~ Z1*Z2, term = "Z1:Z2")
diagnose_design(design, sims = 1000, diagnosands = ri_ps(my_estimate))
Design Estimator Outcome Term N Sims One Sided Pos One Sided Neg Two Sided
design estimator Y Z1:Z2 1000 0.95 0.05 0.10
(0.01) (0.01) (0.01)

5.3.21 ri in practice

  • In practice (unless you have a design declaration), it is a good idea to create a \(P\) matrix when you do your randomization.
  • This records the set of possible randomizations you might have had: or a sample of these.
  • So, again: assignments have to be replicable
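
A \(P\) matrix is easy to build at the moment of randomization. The sketch below is a minimal base R illustration with hypothetical sizes (20 units, 10 treated, 1000 stored assignments) and made-up outcome data; the first column is taken to be the assignment actually implemented.

```r
set.seed(2)
N <- 20

# Each column is one possible assignment: 10 of 20 units treated
P <- replicate(1000, sample(rep(0:1, N / 2)))

Z <- P[, 1]                       # the assignment actually used
Y <- 0.3 * Z + rnorm(N)           # hypothetical outcomes

stat <- function(z) mean(Y[z == 1]) - mean(Y[z == 0])

# one sided p value: share of stored assignments at least as extreme
p <- mean(apply(P, 2, stat) >= stat(Z))

# the same matrix gives each unit's assignment propensity
propensities <- rowMeans(P)
```

Because the matrix is saved alongside the data, anyone can later reproduce both the \(p\) value and the propensities.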

5.3.22 ri Applications

  • Recall that silly randomization procedure from this slide.
  • Say you forgot to take account of the wacky assignment in your estimates and you estimate 0.15.
  • Does the treatment improve your health?: \(p=?\)

5.3.23 ri Applications

  • Randomization procedures are sometimes funky in lab experiments
  • Using randomization inference would force a focus on the true assignment of individuals to treatments
  • Fake (but believable) example follows

5.3.24 ri Applications

Session     Capacity   T1   T2   T3
Thursday    40         10   30   0
Friday      40         10   0    30
Saturday    10         10   0    0

Optimal assignment to treatment given constraints due to facilities

Subject type N Available
A 30 Thurs, Fri
B 30 Thurs, Sat
C 30 Fri, Sat

Constraints due to subjects

5.3.25 ri Applications

If you think hard about assignment you might come up with an allocation like this.

Subject type N Available Thurs Fri Sat
A 30 Thurs, Fri 15 15 NA
B 30 Thurs, Sat 25 NA 5
C 30 Fri, Sat NA 25 5

Assignment of people to days

5.3.26 ri Applications

That allocation balances as much as possible. Given the allocation you might randomly assign individuals to different days as well as randomly assigning them to treatments within days. If you then figure out assignment propensities, this is what you would get:

Assignment Probabilities
Subject type N Available T1 T2 T3
A 30 Thurs, Fri 0.250 0.375 0.375
B 30 Thurs, Sat 0.375 0.625 0.000
C 30 Fri, Sat 0.375 NA 0.625

5.3.27 ri Applications

Even under the assumption that the day of measurement does not matter, these assignment probabilities have big implications for analysis.

Assignment Probabilities
Subject type N Available T1 T2 T3
A 30 Thurs, Fri 0.250 0.375 0.375
B 30 Thurs, Sat 0.375 0.625 0.000
C 30 Fri, Sat 0.375 NA 0.625
  • Only the type \(A\) subjects could have received any of the three treatments.

  • There are no two treatments for which it is possible to compare outcomes for subpopulations \(B\) and \(C\)

  • A comparison of \(T1\) versus \(T2\) can only be made for population \(A \cup B\)

  • However subpopulation \(A\) is assigned to \(T1\) (versus \(T2\)) with probability 2/5, while subpopulation \(B\) is assigned to \(T1\) with probability 3/8
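
The assignment probabilities follow mechanically from the two earlier tables: the allocation of subject types to days and the treatment slots available on each day. A minimal base R sketch (the object names are hypothetical, and it assumes treatments are assigned uniformly at random within a day):

```r
# Pr(day | subject type), from the allocation of people to days
day_given_type <- rbind(
  A = c(Thurs = 15, Fri = 15, Sat = 0),
  B = c(Thurs = 25, Fri = 0,  Sat = 5),
  C = c(Thurs = 0,  Fri = 25, Sat = 5)) / 30

# Pr(treatment | day), from the session capacities
treat_given_day <- rbind(
  Thurs = c(T1 = 10, T2 = 30, T3 = 0) / 40,
  Fri   = c(T1 = 10, T2 = 0,  T3 = 30) / 40,
  Sat   = c(T1 = 10, T2 = 0,  T3 = 0) / 10)

# Pr(treatment | subject type): sum over days
propensities <- day_given_type %*% treat_given_day
propensities

# Pr(T1 | T1 or T2) by subject type
t1_vs_t2 <- propensities[, "T1"] / (propensities[, "T1"] + propensities[, "T2"])
t1_vs_t2
```

These propensities are what an inverse probability weighted analysis, or a randomization inference procedure, would need as inputs.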

5.3.28 ri Applications

  • Implications for design: need to uncluster treatment delivery

  • Implications for analysis: need to take account of propensities

Idea: Wacky assignments happen but if you know the propensities you can do the analysis.

5.3.29 ri Applications: Indirect assignments

A particularly interesting application is where a random assignment combines with existing features to determine an assignment to an “indirect” treatment.

  • For instance: \(n\) of \(N\) are assigned to a treatment.
  • You are interested in whether “having a friend assigned to treatment” makes a difference to a subject. Or maybe “a friend of a friend”
  • That means the subject has a complex clustered assignment that depends on how many friends they have
  • A bit mind-boggling, but:
    • Rerun your assignment many times and each time figure out whether a subject is assigned to an indirect treatment or not
    • Calculate the implied quantity of interest for each assignment
    • Assess the place of the actual quantity in the sampling distribution
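
The first step, figuring out indirect assignment propensities by rerunning the assignment, can be sketched in base R. The friendship network here is invented for illustration (a random symmetric adjacency matrix), as are the sizes and the helper name `has_treated_friend`.

```r
set.seed(3)
N <- 30   # units
n <- 10   # number directly treated

# hypothetical friendship network: symmetric adjacency matrix, no self-ties
A <- matrix(rbinom(N * N, 1, 0.1), N, N)
A[lower.tri(A)] <- t(A)[lower.tri(A)]
diag(A) <- 0

# indirect treatment: at least one friend is directly treated
has_treated_friend <- function(z) as.numeric(A %*% z > 0)

# rerun the direct assignment many times and record indirect status
draws <- replicate(2000, has_treated_friend(sample(rep(c(1, 0), c(n, N - n)))))

# propensity of indirect treatment for each unit
propensity <- rowMeans(draws)
round(propensity, 2)
```

Units with more friends have higher propensities, and units with no friends have a propensity of zero, so they carry no information about the indirect effect.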

5.4 Principle: Keep the reporting close to the design

5.4.1 Design-based analysis

Report the analysis that is implied by the design.

                T2
                N                        Y                        All                      Diff
T1    N         \(\overline{y}_{00}\)    \(\overline{y}_{01}\)    \(\overline{y}_{0x}\)    \(d_2|T1=0\)
                (sd)                     (sd)                     (sd)                     (sd)
      Y         \(\overline{y}_{10}\)    \(\overline{y}_{11}\)    \(\overline{y}_{1x}\)    \(d_2|T1=1\)
                (sd)                     (sd)                     (sd)                     (sd)
      All       \(\overline{y}_{x0}\)    \(\overline{y}_{x1}\)    \(y\)                    \(d_2\)
                (sd)                     (sd)                     (sd)                     (sd)
      Diff      \(d_1|T2=0\)             \(d_1|T2=1\)             \(d_1\)                  \(d_1d_2\)
                (sd)                     (sd)                     (sd)                     (sd)

This is instantly recognizable from the design and returns all the benefits of the factorial design including all main effects, conditional causal effects, interactions and summary outcomes. It is much clearer and more informative than a regression table.
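
The cell means and differences in such a table can be computed directly. The following is a sketch on made-up 2 x 2 factorial data (the effect sizes and object names are assumptions); it leaves out the standard deviations and the formatting.

```r
set.seed(4)

# hypothetical balanced 2x2 factorial: 50 units per cell
d <- data.frame(T1 = rep(0:1, each = 100), T2 = rep(0:1, 100))
d$Y <- d$T1 + 0.5 * d$T2 + 0.25 * d$T1 * d$T2 + rnorm(200)

# cell means: rows index T1, columns index T2
cells <- with(d, tapply(Y, list(T1 = T1, T2 = T2), mean))

d1_given_T2 <- cells["1", ] - cells["0", ]    # effect of T1 at each level of T2
d2_given_T1 <- cells[, "1"] - cells[, "0"]    # effect of T2 at each level of T1

# interaction: how the effect of T1 changes with T2
interaction_est <- d1_given_T2[["1"]] - d1_given_T2[["0"]]

cells
```

The interaction computed from the cell means is identical to the interaction coefficient from a saturated regression of Y on T1, T2, and their product.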

6 Bayesian approaches

Updating on causal quantities

6.1 Outline

  1. Bayesian reasoning
  2. Bayesian calculations by hand
  3. stan
  4. A simple structural model with stan
  5. CausalQueries

6.2 Bayes reasoning

  • Bayesian methods are just sets of procedures to figure out how to update beliefs in light of new information.

  • We begin with a prior belief about the probability that a hypothesis is true.

  • New data then allow us to form a posterior belief about the probability of the hypothesis.

6.2.1 Bayes Rule

Bayesian inference takes into account:

  • the consistency of the evidence with a hypothesis
  • the uniqueness of the evidence to that hypothesis
  • background knowledge about the problem.

6.2.2 Illustration 1

I draw a card from a deck and ask: what are the chances it is the Jack of Spades?

  • Just 1 in 52.

Now I tell you that the card is indeed a spade. What would you guess?

  • 1 in 13

What if I told you it was a heart?

  • No chance it is the Jack of Spades

What if I said it was a face card and a spade?

  • 1 in 3.

6.2.3 Illustration 1

These answers are applications of Bayes’ rule.

In each case the answer is derived by assessing what is possible, given the new information, and then assessing how likely the outcome of interest is among the states that are possible. In all the cases you calculate:

\[\text{Prob Jack of Spades | Info} = \frac{\text{Is Jack of Spades Consistent w/ Info?}}{\text{How many cards are consistent w/ Info?}} \]
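
The counting rule can be applied literally by enumerating the deck. A small base R sketch (the helper name `posterior` is hypothetical, and face cards are taken to be J, Q, K):

```r
deck <- expand.grid(rank = c(2:10, "J", "Q", "K", "A"),
                    suit = c("spades", "hearts", "diamonds", "clubs"),
                    stringsAsFactors = FALSE)

is_target <- deck$rank == "J" & deck$suit == "spades"

# share of the cards consistent with the information that are the Jack of Spades
posterior <- function(consistent) sum(is_target & consistent) / sum(consistent)

p_no_info    <- posterior(rep(TRUE, nrow(deck)))
p_spade      <- posterior(deck$suit == "spades")
p_heart      <- posterior(deck$suit == "hearts")
p_face_spade <- posterior(deck$suit == "spades" & deck$rank %in% c("J", "Q", "K"))

c(p_no_info, p_spade, p_heart, p_face_spade)
```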

6.2.4 Illustration 2 Interpreting Your Test Results

You take a test to see whether you suffer from a disease that affects 1 in 100 people. The test is good in the following sense:

  • if you have the disease, then with a 99% probability it will say you have the disease
  • if you do not have it, then with a 99% probability, it will say that you do not have it

The test result says that you have the disease. What are the chances you have it?

6.2.5 Illustration 2 Interpreting Your Test Results

  • It is not 99%. 99% is the probability of the result given the disease, but we want the probability of the disease given the result.

  • The right answer is 50%, which you can think of as the share of people that have the disease among all those that test positive.

  • For example, if there were 10,000 people, then 100 would have the disease and 99 of these would test positive. But 9,900 would not have the disease and 99 of these would test positive. So the people with the disease that test positive are half of the total number testing positive.
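
The same calculation in base R, using the numbers from the example (the variable names are just labels for those numbers):

```r
prior       <- 0.01   # 1 in 100 people have the disease
sensitivity <- 0.99   # Pr(positive | disease)
specificity <- 0.99   # Pr(negative | no disease)

# Pr(positive): true positives plus false positives
p_positive <- sensitivity * prior + (1 - specificity) * (1 - prior)

# Pr(disease | positive)
posterior <- sensitivity * prior / p_positive
posterior
```

Changing `prior` shows how strongly the answer depends on the base rate: with a rarer disease the same test gives a posterior well below 50%.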

6.2.6 Illustration 2. A picture

What’s the probability of being a circle given you are black?

6.2.7 Illustration 2. More formally.

As an equation this might be written:

\[\text{Prob You have the Disease | Pos} = \frac{\text{How many have the disease and test pos?}}{\text{How many people test pos?}}\]

6.2.8 Two Child Problem

Consider, last, an old puzzle described in Gardner (1961).

  • Mr Smith has two children, \(A\) and \(B\).
  • At least one of them is a boy.
  • What are the chances they are both boys?

To be explicit about the puzzle, we will assume that the information that one child is a boy is given as a truthful answer to the question “is at least one of the children a boy?”

Assume also that there is a 50% probability that a given child is a boy.

6.2.9 Two Child Problem

As an equation:

\[\text{Prob both boys | Not both girls} = \frac{\text{Prob both boys}}{\text{Prob not both girls}} = \frac{\text{1 in 4}}{\text{3 in 4}}\]
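
The answer can be confirmed by listing the four equally likely families, under the assumptions stated for the puzzle:

```r
families <- expand.grid(A = c("boy", "girl"), B = c("boy", "girl"),
                        stringsAsFactors = FALSE)

at_least_one_boy <- families$A == "boy" | families$B == "boy"
both_boys        <- families$A == "boy" & families$B == "boy"

# Pr(both boys | at least one boy)
p_both <- sum(both_boys & at_least_one_boy) / sum(at_least_one_boy)
p_both
```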

6.2.10 Monty Hall

Can anyone describe the Monty Hall puzzle?

6.2.11 Bayes Rule

Formally, all of these equations are applications of Bayes’ rule which is a simple and powerful formula for deriving updated beliefs from new data.

The formula is given as:

\[\Pr(H|\mathcal{D})=\frac{\Pr(\mathcal{D}|H)\Pr(H)}{\Pr(\mathcal{D})}\\ =\frac{\Pr(\mathcal{D}|H)\Pr(H)}{\sum_{H'}\Pr(\mathcal{D}|H')\Pr(H')}\]

6.2.12 Bayes Rule

Formally, all of these equations are applications of Bayes’ rule which is a simple and powerful formula for deriving updated beliefs from new data.

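For continuous distributions and parameter vector \(\theta\):

\[p(\theta|\mathcal{D})=\frac{p(\mathcal{D}|\theta)p(\theta)}{\int_{\theta'}p(\mathcal{D}|\theta')p(\theta')d\theta'}\]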

6.2.13 Useful Distributions: Beta and Dirichlet Distributions

  • Bayes rule requires the ability to express a prior distribution but it does not require that the prior have any particular properties other than being a probability distribution.
  • Sometimes however it can be useful to make use of “off the shelf” distributions.

Consider the share of people in a population that voted. This is a quantity between 0 and 1.

6.2.14 Useful Distributions: Beta and Dirichlet Distributions

  • Two people may both believe that the turnout was around 50% but differ in how certain they are about this claim.
  • One might claim to have no information and to believe any turnout rate between 0 and 100% is equally likely; another might be completely confident that the number is 50%.

Here the parameter of interest is a share. The Beta and Dirichlet distributions are particularly useful for representing beliefs on shares.

6.2.15 Beta

  • The Beta distribution is a distribution over the interval \([0,1]\) that is governed by two parameters, \(\alpha\) and \(\beta\).
  • In the case in which both \(\alpha\) and \(\beta\) are 1, the distribution is uniform – all values are seen as equally likely.
  • As \(\alpha\) rises, larger outcomes are seen as more likely.
  • As \(\beta\) rises, lower outcomes are seen as more likely.
  • If both rise proportionately the expected outcome does not change but the distribution becomes tighter.

An attractive feature is that if one has a prior Beta(\(\alpha\), \(\beta\)) over the probability of some event, and then one observes a positive case, the Bayesian posterior distribution is also a Beta, with parameters \(\alpha+1, \beta\). Thus if people start with uniform priors and build up knowledge on seeing outcomes, their posterior beliefs should be Beta.
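
This updating rule can be verified numerically by applying Bayes rule on a grid and comparing with the Beta(\(\alpha+1, \beta\)) density. A minimal sketch, with prior parameters chosen arbitrarily for illustration:

```r
grid  <- seq(0.001, 0.999, length.out = 999)   # candidate probabilities
alpha <- 2                                     # hypothetical prior parameters
beta  <- 3

prior      <- dbeta(grid, alpha, beta)
likelihood <- grid                             # Pr(one positive case | p) = p

# Bayes rule on the grid
posterior_grid <- prior * likelihood / sum(prior * likelihood)

# claimed posterior, Beta(alpha + 1, beta), normalized on the same grid
posterior_beta <- dbeta(grid, alpha + 1, beta)
posterior_beta <- posterior_beta / sum(posterior_beta)

max(abs(posterior_grid - posterior_beta))      # zero up to rounding error
```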

6.2.16 Beta

Here is a set of such distributions.

Beta distributions

6.2.17 Dirichlet distributions.

The Dirichlet distributions are generalizations of the Beta to the situation in which there are beliefs not just over a proportion, or a probability, but over collections of probabilities.

  • If four outcomes are possible and each is likely to occur with probability \(p_k\), \(k=1,2,3,4\) then beliefs are distributions over a three dimensional unit simplex.

  • The distribution has as many parameters as there are outcomes and these are traditionally recorded in a vector, \(\alpha\).

  • As with the Beta distribution, an uninformative prior (Jeffreys prior) has \(\alpha\) parameters of \((.5,.5,.5, \dots)\) and a uniform (“flat”) distribution has \(\alpha = (1,1,1,\dots)\).

  • The Dirichlet updates in a simple way. If you have a Dirichlet prior with parameter \(\alpha = (\alpha_1, \alpha_2, \dots)\) and you observe outcome \(1\), for example, then the posterior distribution is also Dirichlet with parameter vector \(\alpha' = (\alpha_1+1, \alpha_2,\dots)\).

6.3 Bayes by hand

Bayes on a Grid

6.3.1 Bayes by hand

  • The simplest and most intuitive way to do Bayesian estimation is just to apply the formula over a grid of possible values
  • This becomes too hard once your parameter space grows but it is worth working through the logic by hand to get a feel for Bayes

6.3.2 Bayes by hand

  • Let’s say that we want to figure out the share of women in some population.
  • We start off with a flat prior over all possible numbers
  • We draw a sample from the population: 100 people of which 20 are women
  • What’s our posterior?

6.3.3 Bayes by hand

fabricate(N = 100,
          parameters = seq(.01, .99, length = N),
          likelihood = dbinom(20, 100, parameters),
          posterior = likelihood/sum(likelihood)) |>
  ggplot(aes(parameters, posterior)) + geom_line() + theme_bw()

6.3.4 Bayes by hand

Now with a strongish prior on 50%:

fabricate(N = 100,
          parameters = seq(.01, .99, length = N),
          prior = dbeta(parameters, 20, 20), 
          prior = prior/sum(prior),
          likelihood = dbinom(20, 100, parameters),
          posterior = likelihood*prior/sum(likelihood*prior)) |>
  ggplot(aes(parameters, posterior)) + geom_line() + theme_bw() +
  geom_line(aes(parameters, prior), color = "red")

6.3.5 Causal inference on a grid

Recall this joint distribution with binary X and binary Y from here

Y = 0 Y = 1
X = 0 \(b/2 + c/2\) \(a/2 + d/2\)
X = 1 \(a/2 + c/2\) \(b/2 + d/2\)

reminder: \(a\) is share with negative effects, \(b\) is share with positive effects…

6.3.6 Causal inference on a grid: strategy

Say we now had (finite) data filling out this table. What posteriors should we form over \(a,b,c,d\)?

Y = 0 Y = 1
X = 0 \(n_{00}\) \(n_{01}\)
X = 1 \(n_{10}\) \(n_{11}\)

Let’s start with a flat prior over the shares and then update over possible shares based on the data.

This time we will start with a draw of possible shares and look for posterior weights on each drawn share.

6.3.7 Causal inference on a grid: likelihood

\[ \Pr(n_{00}, n_{01}, n_{10}, n_{11} \mid a,b,c,d) = f_{\text{multinomial}}\left( n_{00}, n_{01}, n_{10}, n_{11} \mid \sum n, w \right) \] where:

\[w = \left(\frac12(b + c), \frac12(a+d), \frac12(a+c), \frac12(b+d)\right)\]

why multinomial?
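
One way to see it: each unit independently lands in one of the four (X, Y) cells, with cell probabilities \(w\) fixed by the shares, so the four counts are multinomial. The sketch below evaluates the likelihood at two hypothetical sets of shares; the counts, the shares, and the helper `event_probs` are assumptions for illustration.

```r
# cell probabilities implied by shares of the four types,
# with X assigned with probability 1/2
event_probs <- function(s_a, s_b, s_c, s_d) {
  c(n00 = (s_b + s_c) / 2,   # X = 0, Y = 0
    n01 = (s_a + s_d) / 2,   # X = 0, Y = 1
    n10 = (s_a + s_c) / 2,   # X = 1, Y = 0
    n11 = (s_b + s_d) / 2)   # X = 1, Y = 1
}

counts <- c(n00 = 400, n01 = 100, n10 = 100, n11 = 400)  # hypothetical data

w_1 <- event_probs(0.10, 0.70, 0.10, 0.10)   # mostly positive effects
w_2 <- event_probs(0.25, 0.25, 0.25, 0.25)   # all types equally common

lik_1 <- dmultinom(counts, prob = w_1)
lik_2 <- dmultinom(counts, prob = w_2)

c(lik_1, lik_2)
```

Shares that imply cell probabilities close to the observed cell frequencies receive far more likelihood, which is what drives the posterior weights.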

6.3.8 Causal inference on a grid: execution

prior draw with 10000 possibilities:

x <- gtools::rdirichlet(10000, alpha = c(1,1,1,1)) |> as.data.frame()
names(x) <- letters[1:4]

x |> head() |> kable(digits = 3)
a b c d
0.384 0.558 0.007 0.051
0.269 0.594 0.005 0.132
0.029 0.759 0.166 0.047
0.065 0.012 0.530 0.393
0.638 0.092 0.121 0.149
0.333 0.154 0.079 0.433

each row sums to 1; each point (row) lies on a simplex

6.3.9 Causal inference on a grid: execution

Imagine we had data (number of units with given values of X and Y):

\(n_{00} = 400, n_{01} = 100, n_{10} = 100, n_{11} = 400\)

Difference in means = .6.


# add likelihood and calculate posterior

x <- x |> 
  rowwise() |>  # Ensures row-wise operations
  mutate(
    likelihood = dmultinom(
      c(400, 100, 100, 400),  # n00, n01, n10, n11
      prob = c(b + c, a + d, a + c, b + d) / 2  # same order as w
    )
  ) |> 
  ungroup() |>
  mutate(posterior = likelihood / sum(likelihood))

6.3.10 Causal inference on a grid: execution

x |> 
  mutate(likelihood = formatC(likelihood, format = "e", digits = 2),
         posterior = formatC(posterior, format = "e", digits = 2)) |> 
  head() |>
  kable(digits = 2)
a b c d likelihood posterior
0.38 0.56 0.01 0.05 5.54e-50 5.87e-47
0.27 0.59 0.01 0.13 2.71e-28 2.88e-25
0.03 0.76 0.17 0.05 3.58e-22 3.79e-19
0.06 0.01 0.53 0.39 2.28e-107 2.41e-104
0.64 0.09 0.12 0.15 0.00e+00 0.00e+00
0.33 0.15 0.08 0.43 7.83e-183 8.30e-180

6.3.11 Causal inference on a grid: inferences

x |> summarize(a = weighted.mean(a, posterior),
               b = weighted.mean(b, posterior),
               ATE = b - a) |>
  kable(digits = 2)
a b ATE
0.1 0.69 0.59

6.3.12 Causal inference on a grid: inferences

x |> ggplot(aes(b, a, size = posterior)) + geom_point(alpha = .5) 

Spot the ridge

6.3.13 Bayes by hand

  • This approach is sound, but if you are dealing with many continuous parameters, the full parameter space can get very large and so the number of calculations you do increases rapidly.

  • Luckily other approaches have been developed.

6.4 Stan

6.4.1 Plan

In this section we will:

  • get going with stan
  • implement a simple linear model and talk through the main model blocks
  • implement a simple hierarchical model
  • describe a behavioral game and set up a model to recover some parameters of interest, given the game

6.4.2 Getting set up

The good news: There is lots of help online. Start with: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started

We will jump straight into things and work through a session.

  1. Install the rstan package and fire it up. It is useful to set options so that multiple cores are used:
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())

6.4.3 One variable model: Simple example

  1. Now let’s consider the simplest one variable linear model.
    • We will need model code
    • And data

6.4.4 A simple model: Code

To implement a stan model you should write the code in a text editor and save it as a text file. You can also write it directly in your script. You can then bring the file into R or call the file directly.

6.4.5 A simple model: Code

I saved a simple model called one_var.stan locally. Here it is:

readLines("assets/one_var.stan", warn = FALSE) |>
  cat(sep = "\n")
data {
  int<lower=0> N;
  vector[N] Y;
  vector[N] X;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  Y ~ normal(a + b * X, sigma);
}

6.4.6 A simple model: Code

The key features here are (read from bottom up!):

  • \(Y\) is assumed to be normally distributed with mean a + bX and standard deviation sigma.
  • There are then three parameters: a, b, sigma.
  • There are no priors placed on these but sigma is constrained to be positive. Without priors, improper flat priors are assumed.
  • Stan expects a data set that contains three things: a scalar N, and the X and Y data

6.4.7 Simple model: Data

We feed data to the model in the form of a list. The idea of a list is that the data can include all sorts of objects, not just a single dataset.

X = rnorm(20)

some_data <- list(
 N = 20,
 X = X,
 Y = X + rnorm(20))

6.4.8 Simple model: Now Let’s Run It

M <- stan(file = "assets/one_var.stan", 
          data = some_data)

When you run the model you get a lot of useful output on the estimation and the posterior distribution. Here though are the key results:

mean sd Rhat
a -0.179 0.214 1
b 0.738 0.183 1
sigma 0.950 0.175 1

These look good.

The Rhat at the end tells you about convergence. You want this very close to 1.

6.4.9 A simple model: Now let’s use it

The model output contains the full posterior distribution.

my_posterior <- M |> extract() |> data.frame() 

my_posterior |> ggplot(aes(a,b)) + geom_point() + theme_bw()

6.4.10 A simple model: Now let’s use it

With the full posterior you can look at marginal posterior distributions over arbitrary transformations of parameters.

summary((my_posterior$a + my_posterior$b)/my_posterior$a) |> round(2)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-2528.86    -3.83    -1.45     0.43    -0.16  6552.41 

6.4.11 Building up

Let’s go back to the code.

There we had three key blocks: data, parameters, and model

More generally the blocks you can specify are:

  • data (define the vars that will be coming in from the data list)
  • transformed data (can be used for preprocessing)
  • parameters (required: defines the parameters to be estimated)
  • transformed parameters (transformations of parameters useful for computational reasons and sometimes for clarity)
  • model (give priors and likelihood)
  • generated quantities (can be used for post processing)

6.4.12 Parameters block

The parameters block declared the set of parameters that we wanted to estimate. In the simple model these were a, b, and sigma. Note in the declaration we also:

  • said what kind of parameters they are (vectors, matrices, simplices etc)
  • gave possible constraints

6.4.13 Parameters block

Instead of defining:

real a;
real b;

We could have defined

vector[2] coefs;

and then referenced coefs[1] and coefs[2] in the model block.

6.4.14 Parameters block

Or we could also have imposed the constraint that the slope coefficient is positive by defining:

real a;
real<lower = 0> b;

6.4.15 Model Block

In the model block we give the likelihood

But we can also give the priors (if we want to). If priors are not provided, flat (possibly improper) priors are assumed

In our case for example we could have provided something like

model {
  b ~ normal(-10, 1);
  Y ~ normal(a + b * X, sigma);
}

This suggests that we start off believing b is centered on -10. That will surely matter for our conclusions. Let’s try it:

6.4.16 Version 2

This time I will write the model right in the editor:

new_model <- '
data {
  int<lower=0> N;
  vector[N] Y;
  vector[N] X;
}
parameters {
  real a;
  real b;
  real<lower=0> sigma;
}
model {
  b ~ normal(-10,1);
  Y ~ normal(a + b * X, sigma);
}
'

6.4.17 Estimation 2

M2 <- stan(model_code = new_model, data = some_data)
mean sd Rhat
a -1.338 2.444 1.003
b -7.875 1.172 1.003
sigma 10.988 2.499 1.001

Note that we get a much lower estimate for b with the same data.

6.4.18 A multilevel model

  • Now imagine a setting in which there are 10 villages, each with 10 respondents. Half in each village are assigned to treatment \(X=1\), and half to control \(X=0\).

  • Say that there is possibly a village-specific average outcome: \(Y_v = a_v + b_vX\) where \(a_v\) and \(b_v\) are each drawn from some distribution with a mean and variance of interest. The individual outcomes are draws from a village level distribution centered on the village specific average outcome.

  • This all implies a multilevel structure.

6.4.19 A multilevel model

Here is a model for this (N baked in)

ml_model <- '
data {
  vector[100] Y;
  int<lower=0,upper=1> X[100];
  int village[100];
}
parameters {
  vector<lower=0>[3] sigma; 
  vector[10] a;
  vector[10] b;
  real mu_a;
  real mu_b;
}
transformed parameters {
  vector[100] Y_vx;
  for (i in 1:100) Y_vx[i] = a[village[i]] + b[village[i]] * X[i];
}
model {
  a ~ normal(mu_a, sigma[1]);
  b ~ normal(mu_b, sigma[2]);
  Y ~ normal(Y_vx, sigma[3]);
}
'

6.4.20 A multilevel model

Here is a slightly more general version: https://github.com/stan-dev/example-models/blob/master/ARM/Ch.17/17.1_radon_vary_inter_slope.stan

6.4.21 Multilevel model: Data

Let’s create some multilevel data. Looking at this, can you tell what is the typical village level effect? How much heterogeneity is there?

village   <- rep(1:10, each = 10)
village_b <- 1 + rnorm(10)
X         <- rep(0:1, 50)
Y         <- village_b[village]*X + rnorm(100)

ml_data <- list(
  village = village,
  X = X, 
  Y = Y)
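
Before fitting anything, a quick look at raw village-level differences in means gives a rough answer to both questions. This is only a sketch: it regenerates data of the same form with an arbitrary seed (so the numbers will differ from those used in the fit), and `village_effects` is a hypothetical name.

```r
set.seed(5)                       # arbitrary seed, for reproducibility only
village   <- rep(1:10, each = 10)
village_b <- 1 + rnorm(10)
X         <- rep(0:1, 50)
Y         <- village_b[village] * X + rnorm(100)

# raw difference in means within each village (5 treated, 5 control)
village_effects <- sapply(
  split(data.frame(X, Y), village),
  function(d) mean(d$Y[d$X == 1]) - mean(d$Y[d$X == 0]))

c(typical_effect = mean(village_effects),
  spread         = sd(village_effects))
```

The spread of the raw differences mixes true heterogeneity with sampling noise from small villages, so it will tend to overstate heterogeneity; separating the two is what the multilevel model is for.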

6.4.22 Multilevel Results

M_ml <- stan(model_code = ml_model, data = ml_data)
mean sd Rhat
mu_a -0.10 0.18 1
mu_b 1.06 0.48 1
sigma[1] 0.31 0.20 1
sigma[2] 1.31 0.42 1
sigma[3] 1.02 0.08 1

6.5 A game and a structural model

parameters drawn from theory

6.5.1 A game and a structural model

Say that a set of people in a population is playing sequential prisoner’s dilemmas.

In such games, selfish behavior might suggest defections by everyone everywhere. But, of course, people often cooperate. Why might this be?

  • One possible reason is that some people are irrational, in the sense that they simply choose to cooperate, ignoring the payoffs.
  • Another possibility is that rational people think that others are irrational, in the sense that they think that others will reciprocate when they observe cooperative action

6.5.2 Model

We will capture some of this intuition with a behavioral type model in which

  • each player has a “rationality” propensity of \(r_i\) – this is the probability with which they choose to do the rational thing, rather than the generous thing
  • \(r_i \sim U[0, \theta]\) for \(\theta > .5\).
  • A player with rationality propensity of \(r_i\) believes \(r_j \sim U[0, r_i]\). So everyone assumes that they are the most rational people in the room…
  • The game is such that:
    • second mover: a second mover with rationality propensity \(r_i\) will cooperate with probability \(1-r_i\) if the first mover cooperated; otherwise they defect
    • first mover: a first mover with \(r_i\) will cooperate nonstrategically with probability \((1-r_i)\); however with probability \(r_i\) they will also cooperate strategically if they expect the second mover to have \(r_j<.25\), which, given their beliefs, happens when \(r_i < .5\).

6.5.3 Expectations from model

  • In all, this means that a player with propensity \(r_i>.5\) will cooperate with probability \(1-r_i\); a player with propensity \(r_i<.5\) will cooperate with probability \(1\).

  • Interestingly the not-very-rational people sometimes cooperate strategically but the really rational people never cooperate strategically because they think it won’t work.

6.5.4 Event Probabilities

What then are the probabilities of each of the possible outcomes?

  • There will be cooperation by both players with probability \((\int_0^{.5} p(r_i) dr_i + \int_{.5}^1 p(r_i)(1-r_i) dr_i)\int_0^1p(r_i)(1-r_i)dr_i\)
  • There will be cooperation by player 1 only with probability \((\int_0^{.5} p(r_i) dr_i + \int_{.5}^1 p(r_i)(1-r_i) dr_i)(\int_0^1p(r_i)(r_i)dr_i)\)
  • There will be cooperation by neither with probability: \(1-\int_0^{.5} p(r_i) dr_i - \int_{.5}^1 p(r_i)(1-r_i) dr_i\)

where \(p\) is the density function on \(r_i\) given \(\theta\)

6.5.5 Event probabilities

Given the assumption on \(p\)

  • There will be cooperation by both players with probability \((1+.125/\theta -.5\theta)(1-.5\theta)\)
  • There will be cooperation by player 1 only with probability \((1+.125/\theta -.5\theta)(.5\theta)\)
  • There will be cooperation by neither player with probability \((.5\theta-.125/\theta)\)
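The integrals from the previous slide can be checked numerically. This is a sketch under the stated assumption that \(r_i\) is uniform on \([0, \theta]\) with \(\theta \geq .5\); the function name `event_probs` and the value \(\theta = .8\) are illustrative choices of mine, and the closed form used for comparison is my own evaluation of the integrals (the first-mover cooperation probability works out to \(1 + .125/\theta - .5\theta\)).

```r
# Event probabilities by numerical integration, with r ~ U[0, theta]
event_probs <- function(theta) {
  p <- function(r) rep(1 / theta, length(r))  # uniform density on [0, theta]
  # First mover cooperates for sure if r < .5, with probability 1 - r otherwise
  first <- integrate(p, 0, 0.5)$value +
    integrate(function(r) p(r) * (1 - r), 0.5, theta)$value
  # Second mover reciprocates cooperation with probability 1 - r
  second <- integrate(function(r) p(r) * (1 - r), 0, theta)$value
  c(both = first * second,
    first_only = first * (1 - second),
    neither = 1 - first)
}

theta <- 0.8  # illustrative value
numeric_probs <- event_probs(theta)

closed_form <- c(
  both       = (1 + .125 / theta - .5 * theta) * (1 - .5 * theta),
  first_only = (1 + .125 / theta - .5 * theta) * (.5 * theta),
  neither    = .5 * theta - .125 / theta)

rbind(numeric_probs, closed_form)
```

A useful sanity check is the boundary case \(\theta = .5\): every first mover then cooperates, so the probability of no cooperation should be zero.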

6.5.6 Data

  • We have data on the actions of the first movers and the second movers and are interested in the distribution of the \(r_i\)s.

  • Let's collapse that data into a simple list of the number of each type of game outcome:

  • And say we start off with a uniform prior on \(\theta\).

  • What should we conclude about \(\theta\)?

6.5.7 Model

Here’s a model:

game_model <- '
data {
  int<lower=0> play[3];
}
parameters {
  real<lower=.5, upper=1> theta;
}
transformed parameters {
  simplex[3] w;
  // event probabilities implied by r ~ U[0, theta]
  w[1] = (1 + .125/theta - .5*theta)*(1 - .5*theta);
  w[2] = (1 + .125/theta - .5*theta)*(.5*theta);
  w[3] = (.5*theta - .125/theta);
}
model {
  play ~ multinomial(w);
}
'

6.5.8 Model

Note we define event weights as transformed parameters on a simplex. We also constrain \(\theta\) to be \(>.5\). Obviously we are relying a lot on our model.

6.5.9 Plot posterior on \(\theta\)

M3 <- stan(model_code = game_model,  
           data = list(play = c(10,10,10)))

6.5.10 Plot posterior on \(\theta\)

M4 <- stan(model_code = game_model,  
           data = list(play = c(20,6,4)))

6.5.11 Posterior on a quantity of interest

What is the probability of observing strategic first round cooperation?

A player with rationality \(r_i\) will cooperate strategically with probability \(r_i\) if \(r_i<.5\) and 0 otherwise. Thus we are interested in \(\int_0^{.5}r_i/\theta dr_i = .125/\theta\)

6.6 CausalQueries

CausalQueries brings these elements together

6.6.1 Big picture

CausalQueries brings these elements together by allowing users to:

  1. Make model: Specify a DAG: CausalQueries figures out all principal strata and places a prior on these
  2. Update model: Provide data to the DAG: CausalQueries writes a stan model and updates on all parameters
  3. Query model: CausalQueries figures out which parameters correspond to a given causal query

6.6.2 Illustration \(X \rightarrow Y\) model

Returning to this problem:

Y = 0 Y = 1
X = 0 \(n_{00}\) \(n_{01}\)
X = 1 \(n_{10}\) \(n_{11}\)

where \(X\) is randomized, both \(X\), \(Y\) binary

6.6.3 Model, update, query

data = fabricate(
  N = 1000, 
  X = rbinom(N, 1, prob = .5),  
  Y = rbinom(N, 1, prob = .2 + .4*X))

model <- make_model("X -> Y") |> update_model(data)

6.6.4 Model, update, query

model |> inspect("posterior_distribution") 

Summary statistics of model parameters posterior distributions:

  Distributions matrix dimensions are 
  4000 rows (draws) by 6 cols (parameters)

     mean   sd
X.0  0.49 0.02
X.1  0.51 0.02
Y.00 0.35 0.06
Y.10 0.10 0.06
Y.01 0.45 0.06
Y.11 0.10 0.06

6.6.5 Model, update, query

model |> grab("posterior_distribution") |> 
  ggplot(aes(Y.01, Y.10)) + geom_point(alpha = .2)

Posterior draws

Note in the grid approach we had the same set of candidate vectors with different weights attached; in the MCMC approach our draws reflect the posterior distribution directly.

6.6.6 Model, update, query

model |> query_model(
  query = c(ATE = "Y[X=1] - Y[X=0]", 
            POS = "Y[X=1] > Y[X=0]", 
            SOME = "Y[X=1] != Y[X=0]" ),
  using = c("priors", "posteriors"))

6.6.7 Generalization: Procedure

The CausalQueries approach generalizes to settings in which nodes are categorical:

  1. Identify all principal strata: that is, the universe of possible response types or “causal types”: \(\theta\)
  2. Define as parameters of interest the probability of each of these response types: \(\lambda\)
  3. Place a prior over \(\lambda\): e.g. Dirichlet
  4. Figure out \(\Pr(\text{Data} | \lambda)\)
  5. Use stan to figure out \(\Pr(\lambda | \text{Data})\)
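For the simple \(X \rightarrow Y\) model, step 4 can be written out by hand. The sketch below is not CausalQueries code: the type shares in `lambda`, the counts, and the function name `log_lik` are hypothetical, and the labels are assumed to give Y's value when X = 0 followed by its value when X = 1.

```r
# Hypothetical shares of the four response types for X -> Y.
# Label "Yab": Y takes value a when X = 0 and value b when X = 1 (assumed naming).
lambda <- c(Y00 = 0.35, Y10 = 0.10, Y01 = 0.45, Y11 = 0.10)

# With X randomized, Pr(Y = 1 | X = x) is a sum of type shares
p_y1_x0 <- unname(lambda["Y10"] + lambda["Y11"])  # types with Y(0) = 1
p_y1_x1 <- unname(lambda["Y01"] + lambda["Y11"])  # types with Y(1) = 1

# Log-likelihood of the 2 x 2 table of counts n_xy given lambda
log_lik <- function(n00, n01, n10, n11) {
  dbinom(n01, n00 + n01, p_y1_x0, log = TRUE) +
    dbinom(n11, n10 + n11, p_y1_x1, log = TRUE)
}

# The ATE is the share of positive-effect types minus negative-effect types
ate <- p_y1_x1 - p_y1_x0

log_lik(n00 = 400, n01 = 100, n10 = 200, n11 = 300)  # hypothetical counts
```

Note that many different `lambda` vectors imply the same two conditional probabilities, which is why the data pin down the ATE but not the individual type shares.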

6.6.8 Generalization: Procedure

…where dotted lines mean that the response types for two nodes are not independent

6.6.9 Illustration: “Lipids” data

Example of an IV model. What are the principal strata (response types)? What relations of conditional independence are implied by the model?


lipids_data |> kable()
event strategy count
Z0X0Y0 ZXY 158
Z1X0Y0 ZXY 52
Z0X1Y0 ZXY 0
Z1X1Y0 ZXY 23
Z0X0Y1 ZXY 14
Z1X0Y1 ZXY 12
Z0X1Y1 ZXY 0
Z1X1Y1 ZXY 78

Note that in compact form we simply record the number of units (“count”) that display each possible pattern of outcomes on the three variables (“event”).[^1]

6.6.10 Model

model <- make_model("Z -> X -> Y; X <-> Y") 
model |> plot()

6.6.11 Updating and querying

Queries can be conditioned on observable or counterfactual quantities

model |>
  update_model(lipids_data, refresh = 0) |>
  query_model(queries = c(
      ATE  = "Y[X=1] - Y[X=0]",
      PoC  = "Y[X=1] - Y[X=0] :|: X==0 & Y==0",
      LATE = "Y[X=1] - Y[X=0] :|: X[Z=1] > X[Z=0]"),
      using = "posteriors") 
Table 5: Replication of Chickering and Pearl (1996).
query given mean sd cred.low.2.5% cred.high.97.5%
Y[X=1] - Y[X=0] - 0.55 0.10 0.37 0.73
Y[X=1] - Y[X=0] X==0 & Y==0 0.64 0.15 0.37 0.89
Y[X=1] - Y[X=0] X[Z=1] > X[Z=0] 0.70 0.05 0.59 0.80

7 Design

Focus on randomization schemes

7.1 Aims and practice

  1. Goals
  2. Cluster randomization
  3. Blocked randomization
  4. Factorial designs
  5. External validity
  6. Assignments with DeclareDesign
  7. Indirect assignments

7.1.1 Experiments

  • Experiments are investigations in which an intervention, in all its essential elements, is under the control of the investigator. (Cox & Reid)

  • Two major types of control:

  1. control over assignment to treatment – this is at the heart of many field experiments
  2. control over the treatment itself – this is at the heart of many lab experiments
  • Main focus today is on 1 and on the question: how does control over assignment to treatment allow you to make reasonable statements about causal effects?

7.1.2 Experiments

7.1.3 Basic randomization

  • Basic randomization is very simple. For example, say you want to assign each of 10 units to treatment with probability 0.5. Here is simple code:
sample(0:1, 10, replace = TRUE)
 [1] 0 1 0 1 1 1 1 1 1 0
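The call `sample(0:1, 10, replace = TRUE)` assigns each unit independently, so the number treated varies from draw to draw (7 of 10 in the output above). If exactly 5 of 10 units should be treated, a minimal base R sketch is to shuffle a fixed vector instead; the object names here are illustrative.

```r
# Simple random assignment: each unit treated with probability 0.5,
# so the number treated is itself random
Z_simple <- sample(0:1, 10, replace = TRUE)

# Complete random assignment: shuffle five 0s and five 1s,
# so exactly 5 of 10 units are treated in every draw
Z_complete <- sample(rep(0:1, each = 5))

sum(Z_complete)  # always 5
```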

7.1.4 …should be replicable

In general you might want to set things up so that your randomization is replicable. You can do this by setting a seed:

sample(0:1, 10, replace = TRUE)
 [1] 1 0 1 1 1 0 1 1 1 1

and again:

sample(0:1, 10, replace = TRUE)
 [1] 1 0 1 1 1 0 1 1 1 1
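The seed-setting call itself is not shown in the output above. A minimal sketch of the pattern, with an arbitrary seed value chosen for illustration:

```r
seed <- 343  # arbitrary value chosen for this sketch

set.seed(seed)
draw_1 <- sample(0:1, 10, replace = TRUE)

set.seed(seed)
draw_2 <- sample(0:1, 10, replace = TRUE)

identical(draw_1, draw_2)  # TRUE: same seed, same assignment
```

Recording the seed alongside the assignment code is what lets a third party regenerate exactly the assignment that was used.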

7.1.5 Basic randomization

Even better is to set it up so that it can reproduce lots of possible draws so that you can check the propensities for each unit.

P <- replicate(1000, sample(0:1, 10, replace = TRUE)) 
apply(P, 1, mean)
 [1] 0.519 0.496 0.510 0.491 0.524 0.514 0.535 0.497 0.470 0.506

Here the \(P\) matrix gives 1000 possible assignments of the 10 units, with each unit treated with probability 0.5. We can then confirm that the average propensity is 0.5.

  • A huge advantage of this approach is that if you make a mess of the random assignment, you can still generate the P matrix and use that for all analyses!

7.1.6 Do it in advance

  • Unless you need them to keep subjects at ease, leave your spinners and your dice and your cards behind.
  • Especially when you have multiple or complex randomizations you are generally much better doing it with a computer in advance

A survey dictionary with results from a complex randomization presented in a simple way for enumerators

7.1.7 Did the randomization “work”?

  • People often wonder: did randomization work? Common practice is to implement a set of \(t\)-tests to see if there is balance. This makes no sense.

  • If you doubt whether it was implemented properly do an \(F\) test. If you worry about variance, specify controls in advance as a function of relation with outcomes (more on this later). If you worry about conditional bias then look at substantive differences between groups, not \(t\)-tests.

  • If you want realizations to have particular properties: build it into the scheme in advance.

7.2 Cluster Randomization

7.2.1 Cluster Randomization

  • Simply place units into groups (clusters) and then randomly assign the groups to treatment and control.
  • All units in a given group get the same treatment

Note: clusters are part of your design, not part of the world.

7.2.2 Cluster Randomization

  • Often used if intervention has to function at the cluster level or if outcome defined at the cluster level.

  • Disadvantage: loss of statistical power

  • However: perfectly possible to assign some treatments at cluster level and then other treatments at the individual level

  • Principle: (unless you are worried about spillovers) generally make clusters as small as possible

  • Principle: Surprisingly, variability in cluster size makes analysis harder. Try to control assignment so that cluster sizes are similar in treatment and in control.

  • Be clear about whether you believe effects are operating at the cluster level or at the individual level. This matters for power calculations.

  • Be clear about whether spillover effects operate only within clusters or also across them. If within only you might be able to interpret treatment as the effect of being in a treated cluster…

7.2.3 Cluster Randomization: Block by cluster size

Surprisingly, if clusters are of different sizes the difference in means estimator is not unbiased, even if all units are assigned to treatment with the same probability.

Here’s the intuition. Say there are two clusters each with homogeneous treatment effects:

Cluster Size Y0 Y1
1 1000000 0 1
2 1 0 0

Then: What is the true average treatment effect? What do you expect to estimate from cluster random assignment?

The solution is to block by cluster size. For more see: http://gking.harvard.edu/files/cluster.pdf
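The two-cluster example can be worked through directly. This sketch assumes that exactly one of the two clusters is assigned to treatment, each with probability 0.5; that assignment rule is my assumption, since the slide does not spell it out.

```r
size <- c(1000000, 1)  # cluster sizes from the table
Y0   <- c(0, 0)        # control potential outcome in each cluster
Y1   <- c(1, 0)        # treated potential outcome in each cluster

# True average treatment effect, averaging over individuals
true_ate <- sum(size * (Y1 - Y0)) / sum(size)

# Two equally likely assignments: treat cluster 1 only, or cluster 2 only.
# With one cluster per arm, each arm's mean is that cluster's outcome.
estimates <- c(treat_cluster_1 = Y1[1] - Y0[2],
               treat_cluster_2 = Y1[2] - Y0[1])

expected_estimate <- mean(estimates)

c(true_ate = true_ate, expected_estimate = expected_estimate)
```

The difference in means is centered on 0.5 even though the effect is essentially 1 for almost everyone, because the denominator of each arm's mean depends on which cluster lands in it.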

7.3 Blocked assignments and other restricted randomizations

7.3.1 Blocking

There are more or less efficient ways to randomize.

  • Randomization helps ensure good balance on all covariates (observed and unobserved) in expectation.
  • But balance may not be so great in realization
  • Blocking can help ensure balance ex post on observables

7.3.2 Blocking

Consider a case with four units and two strata. There are 6 possible assignments of 2 units to treatment:

ID X Y(0) Y(1) R1 R2 R3 R4 R5 R6
1 1 0 1 1 1 1 0 0 0
2 1 0 1 1 0 0 1 1 0
3 2 1 2 0 1 0 1 0 1
4 2 1 2 0 0 1 0 1 1
\(\widehat{\tau}\): 0 1 1 1 1 2

Even with a constant treatment effect and everything uniform within blocks, there is variance in the estimation of \(\widehat{\tau}\). This can be eliminated by excluding R1 and R6.
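The table can be reproduced by enumerating all assignments. A minimal base R sketch; the object names are illustrative.

```r
Y0 <- c(0, 0, 1, 1)
Y1 <- c(1, 1, 2, 2)
X  <- c(1, 1, 2, 2)  # stratum of each unit

# All 6 ways of treating 2 of 4 units (columns correspond to R1 ... R6)
treated <- combn(4, 2)

# Difference in means for each assignment
dim_estimate <- apply(treated, 2, function(ids)
  mean(Y1[ids]) - mean(Y0[-ids]))

# Blocked assignments treat one unit from each stratum
blocked <- apply(treated, 2, function(ids) length(unique(X[ids])) == 2)

dim_estimate           # 0 1 1 1 1 2
dim_estimate[blocked]  # 1 1 1 1
```

Restricting to the blocked assignments removes R1 and R6, and with them all of the variance in the estimate.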

7.3.3 Blocking

Simple blocking in R (5 pairs):

sapply(1:5, function(i) sample(0:1))
1 2 3 4 5
1 1 0 1 1
0 0 1 0 0

7.3.4 Of blocks and clusters

7.3.5 Blocking

  • Blocking is a case of restricted randomization. Although each unit is assigned to treatment with equal probability, the profiles of possible assignments are not equally likely.
  • You have to take account of this when doing analysis
  • There are many other approaches.
    • Matched Pairs are a particularly fine approach to blocking
    • You could also randomize and then replace the randomization if you do not like the balance. This sounds tricky (and it is) but it is OK as long as you understand the true lottery process you are employing and incorporate that into analysis
    • It is even possible to block on covariates for which you don’t have data ex ante, by using methods in which you allocate treatment over time as a function of features of your sample (also tricky)

7.3.6 Other types of restricted randomization

  • Really you can set whatever criterion you want for your set of treated units to have (e.g. no treated unit beside another treated unit; at least 5 from the north, 10 from the south; guaranteed balance by some continuous variable, etc.)
  • You just have to be sure that you understand the random process that was used and that you can use it in the analysis stage
  • But here be dragons
    • The more complex your design, the more complex your analysis.
    • General injunction: “as ye randomize so shall ye analyze” (http://www.ncbi.nlm.nih.gov/pubmed/15580598)
    • In general you should make sure that a given randomization procedure coupled with a given estimation procedure will produce an unbiased estimate. DeclareDesign can help with this.

7.3.7 Challenges with re-randomization

  • We can see that blocked and clustered assignments are actually types of restricted randomizations: they limit the set of acceptable randomizations to those with good properties
  • You could therefore implement the equivalent distribution of assignments by specifying an acceptable rule and then re-randomizing whenever the rule is not met
  • That’s fine but you would then have to take account of clustering and blocking just as you do when you actually cluster or block

7.4 Factorial Designs

7.4.1 Factorial Designs

  • Often when you set up an experiment you want to look at more than one treatment.
  • Should you do this or not? How should you use your power?

7.4.2 Factorial Designs

Load up:

\(T2=0\) \(T2=1\)
T1 = 0 \(50\%\) \(0\%\)
T1 = 1 \(50\%\) \(0\%\)

Spread out:

\(T2=0\) \(T2=1\)
T1 = 0 \(25\%\) \(25\%\)
T1 = 1 \(25\%\) \(25\%\)

7.4.3 Factorial Designs

Three arm it?:

\(T2=0\) \(T2=1\)
T1 = 0 \(33.3\%\) \(33.3\%\)
T1 = 1 \(33.3\%\) \(0\%\)

Bunch it?:

\(T2=0\) \(T2=1\)
T1 = 0 \(40\%\) \(20\%\)
T1 = 1 \(20\%\) \(20\%\)

7.4.4 Factorial Designs

  • Surprisingly, adding multiple treatments does not generally eat into your power (unless you are decomposing a complex treatment – then it can. Why?)
  • Especially when you use a fully crossed design like the middle one above.
  • Fisher: “No aphorism is more frequently repeated in connection with field trials, than that we must ask Nature few questions, or, ideally, one question, at a time. The writer is convinced that this view is wholly mistaken.”
  • However – adding multiple treatments does alter the interpretation of your average treatment effects. If T2 is an unusual treatment for example, then half the T1 effect is measured for unusual situations.

This speaks to “spreading out.” Note: the “bunching” example may not pay off and has the undesirable feature of introducing a correlation between treatment assignments.

7.4.5 Factorial Designs

Two ways to do factorial assignments in DeclareDesign:

# Option 1: block the second assignment on the first
declare_assignment(Z1 = complete_ra(N)) +
  declare_assignment(Z2 = block_ra(blocks = Z1))

# Option 2: assign four arms, then recode into two factors
declare_assignment(Z = complete_ra(N, num_arms = 4)) +
  declare_measurement(Z1 = (Z == "T2" | Z == "T4"),
                      Z2 = (Z == "T3" | Z == "T4"))

7.4.6 Factorial Designs: In practice

  • In practice if you have a lot of treatments it can be hard to do full factorial designs – there may be too many combinations.

  • In such cases people use fractional factorial designs, like the one below (5 treatments but only 8 units!)

Variation T1 T2 T3 T4 T5
1 0 0 0 1 1
2 0 0 1 0 0
3 0 1 0 0 1
4 0 1 1 1 0
5 1 0 0 1 0
6 1 0 1 0 1
7 1 1 0 0 0
8 1 1 1 1 1
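One property worth checking in a table like this is that every treatment is applied in half the rows and that no two treatments are correlated. A minimal base R sketch using the rows above; the name `design_matrix` is illustrative.

```r
# The 8 variations from the table, one row per variation
design_matrix <- matrix(c(
  0, 0, 0, 1, 1,
  0, 0, 1, 0, 0,
  0, 1, 0, 0, 1,
  0, 1, 1, 1, 0,
  1, 0, 0, 1, 0,
  1, 0, 1, 0, 1,
  1, 1, 0, 0, 0,
  1, 1, 1, 1, 1),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, paste0("T", 1:5)))

colMeans(design_matrix)        # each treatment appears in half the rows
round(cor(design_matrix), 10)  # off-diagonal entries are all 0
```

Pairwise balance is what allows main effects to be estimated from 8 rows rather than the 32 a full factorial would need, at the cost of aliasing main effects with some interactions.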

7.4.7 Factorial Designs: In practice

  • Then randomly assign units to rows. Note columns might also be blocking covariates.

  • In R, look at library(survey)

7.4.8 Factorial Designs: In practice

  • But be careful: you have to be comfortable with possibly not having any simple counterfactual unit for any unit (invoke sparsity-of-effects principle).
Unit T1 T2 T3 T4 T5
1 0 0 0 1 1
2 0 0 1 0 0
3 0 1 0 0 1
4 0 1 1 1 0
5 1 0 0 1 0
6 1 0 1 0 1
7 1 1 0 0 0
8 1 1 1 1 1

7.4.9 Controversy?

Muralidharan, Romero, and Wüthrich (2023) write:

Factorial designs are widely used to study multiple treatments in one experiment. While t-tests using a fully-saturated “long” model provide valid inferences, “short” model t-tests (that ignore interactions) yield higher power if interactions are zero, but incorrect inferences otherwise. Of 27 factorial experiments published in top-5 journals (2007–2017), 19 use the short model. After including interactions, over half of their results lose significance. […]

7.5 External Validity: Can randomization strategies help?

7.5.1 Principle: Address external validity at the design stage

Anything to be done on randomization to address external validity concerns?

  • Note 1: There is little or nothing about field experiments that makes the external validity problem greater for these than for other forms of “sample based” research
  • Note 2: Studies that use up the available universe (cross national studies) actually have a distinct external validity problem
  • Two ways to think about external validity issues:
    1. Are things likely to operate in other units like they operate in these units? (even with the same intervention)
    2. Are the processes in operation in this treatment likely to operate in other treatments? (even in this population)

7.5.2 Principle: Address external validity at the design stage

  • Two ways to think about external validity issues:
    1. Are things likely to operate in other units like they operate in these units? (even with the same intervention)
    2. Are the processes in operation in this treatment likely to operate in other treatments? (even in this population)
  • Two approaches for 1.
    • Try to sample cases and estimate population average treatment effects
    • Exploit internal variation: block on features that make the case unusual and assess importance of these (e.g. is unit poor? assess how effects differ in poor and wealthy components)
  • 2 is harder and requires a sharp identification of context free primitives, if there are such things.

7.6 Assignments with DeclareDesign

7.6.1 A design: Multilevel data

A design with hierarchical data and different assignment schemes.

design <- 
  declare_model(
    school = add_level(N = 16, 
                       u_school = rnorm(N, mean = 0)),     
    classroom = add_level(N = 4,    
                  u_classroom = rnorm(N, mean = 0)),
    student =  add_level(N = 20,    
                         u_student = rnorm(N, mean = 0))
    ) +
  declare_model(
    potential_outcomes(Y ~ .1*Z + u_classroom + u_student + u_school)
    ) +
  declare_assignment(Z = simple_ra(N)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ Z))  +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, .method = difference_in_means)    

7.6.2 Sample data

Here are the first couple of rows and columns of the resulting data frame.

my_data <- draw_data(design)
kable(head(my_data), digits = 2)
school u_school classroom u_classroom student u_student Y_Z_0 Y_Z_1 Z Y
01 1.35 01 1.26 0001 -1.28 1.33 1.43 0 1.33
01 1.35 01 1.26 0002 0.79 3.40 3.50 1 3.50
01 1.35 01 1.26 0003 -0.12 2.49 2.59 0 2.49
01 1.35 01 1.26 0004 -0.65 1.96 2.06 1 2.06
01 1.35 01 1.26 0005 0.36 2.97 3.07 1 3.07
01 1.35 01 1.26 0006 -0.96 1.65 1.75 0 1.65

7.6.3 Sample data

Here is the distribution between treatment and control:

kable(t(as.matrix(table(my_data$Z))), 
      col.names = c("control", "treatment"))
control treatment
645 635

7.6.4 Complete Random Assignment using the built in function

assignment_complete <-   declare_assignment(Z = complete_ra(N))  

design_complete <- 
  replace_step(design, "assignment", assignment_complete)

7.6.5 Data from complete assignment

We can draw a new set of data and look at the number of subjects in the treatment and control groups.

data_complete <- draw_data(design_complete)
table(data_complete$Z)

0 1
640 640

7.6.6 Plotted

7.6.7 Block Random Assignment

  • The treatment and control group will in expectation contain the same share of students in different classrooms.
  • But as we saw this does not necessarily hold in realization
  • We make this more obvious by sorting the students by treatment status within schools

7.6.8 Blocked design

assignment_blocked <-   
  declare_assignment(Z = block_ra(blocks = classroom))  

estimator_blocked <- 
  declare_estimator(Y ~ Z, blocks = classroom, 
                    .method = difference_in_means)  

design_blocked <- 
  design |> 
  replace_step("assignment", assignment_blocked) |>
  replace_step("estimator", estimator_blocked)

7.6.9 Illustration of blocked assignment

  • Note that subjects are sorted here after the assignment to make it easier to see that in this case blocking ensures that exactly half of the students (10 of 20) within each classroom are assigned to treatment.

7.6.10 Clustering

But what if all students in a given class have to be assigned the same treatment?

assignment_clustered <- 
  declare_assignment(Z = cluster_ra(clusters = classroom))  
estimator_clustered <- 
  declare_estimator(Y ~ Z, clusters = classroom, 
                    .method = difference_in_means)  

design_clustered <- 
  design |> 
  replace_step("assignment", assignment_clustered) |> 
  replace_step("estimator", estimator_clustered)

7.6.11 Illustration of clustered assignment

7.6.12 Clustered and Blocked

assignment_clustered_blocked <-   
  declare_assignment(Z = block_and_cluster_ra(blocks = school,
                                              clusters = classroom))  
estimator_clustered_blocked <- 
  declare_estimator(Y ~ Z, blocks = school, clusters = classroom, 
                    .method = difference_in_means)  

design_clustered_blocked <- 
  design |> 
  replace_step("assignment", assignment_clustered_blocked) |> 
  replace_step("estimator", estimator_clustered_blocked)

7.6.13 Illustration of clustered and blocked assignment

7.6.14 Illustration of efficiency gains from blocking

designs <- 
  list(
    simple = design, 
    complete = design_complete, 
    blocked = design_blocked, 
    clustered = design_clustered,  
    clustered_blocked = design_clustered_blocked) 
diagnoses <- diagnose_design(designs)

7.6.15 Illustration of efficiency gains from blocking

Design Power Coverage
simple 0.16 0.95
(0.01) (0.01)
complete 0.20 0.96
(0.01) (0.01)
blocked 0.42 0.95
(0.01) (0.01)
clustered 0.06 0.96
(0.01) (0.01)
clustered_blocked 0.08 0.96
(0.01) (0.01)

7.6.16 Sampling distributions

diagnoses$simulations_df |> 
  mutate(design = factor(design, c("blocked", "complete", "simple", "clustered_blocked", "clustered"))) |>
  ggplot(aes(estimate)) +
  geom_histogram() + facet_grid(~design)

7.6.17 Nasty integer issues

  • In many designs you seek to assign an integer number of subjects to treatment from some set.

  • Sometimes however your assignment targets are not integers.


  • I have 12 subjects in four blocks of 3 and I want to assign each subject to treatment with a 50% probability.

Two strategies:

  1. I randomly set a target of either 1 or 2 for each block and then do complete assignment in each block. This can result in the numbers treated varying from 4 to 8
  2. I randomly assign a target of 1 for two blocks and 2 for the other two blocks. Intuition: set a floor for the minimal target and then distribute the residual probability across blocks
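The two strategies can be compared by simulating the block targets. This is a base R sketch of the idea only, not the `probra` implementation; the function name `assign_block` and the number of replications are illustrative choices.

```r
n_blocks <- 4  # four blocks of 3 subjects each

# Strategy 1: an independent target of 1 or 2 for each block
totals_1 <- replicate(5000, sum(sample(1:2, n_blocks, replace = TRUE)))

# Strategy 2: target 2 in two randomly chosen blocks, 1 in the other two
totals_2 <- replicate(5000, sum(sample(rep(1:2, each = 2))))

range(totals_1)  # between 4 and 8
range(totals_2)  # always 6

# Given a target m, complete assignment within a block of 3
assign_block <- function(m) sample(rep(c(1, 0), c(m, 3 - m)))

# Under either strategy a subject's propensity is (1/3 + 2/3) / 2 = 0.5
propensity <- mean(replicate(5000, assign_block(sample(1:2, 1))[1]))
```

Both strategies give every subject a 50% chance of treatment; the second simply removes the variation in the total number treated.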

7.6.18 Nasty integer issues

# remotes::install_github("macartan/probra")
library(probra)

blocks <- rep(1:4, each = 3)

table(blocks, prob_ra(blocks = blocks))
table(blocks, block_ra(blocks = blocks))
blocks 0 1
     1 1 2
     2 2 1
     3 1 2
     4 1 2

7.6.19 Nasty integer issues

Can also be used to set targets

# remotes::install_github("macartan/probra")
library(probra)

fabricate(N = 4,  size = c(47, 53, 87, 25), n_treated = prob_ra(.5*size)) |>
  janitor::adorn_totals("row") |> 
  kable(caption = "Setting targets to get 50% targets with minimal variance")

7.6.20 Nasty integer issues

Can also be used for complete assignment with heterogeneous propensities


df <- fabricate(N = 100,  p = seq(.1, .9, length = 100), Z = prob_ra(p)) 
df |> ggplot(aes(p, Z)) + geom_point() + theme_bw()

7.7 Indirect assignments

Indirect control

7.7.1 Indirect assignments

Indirect assignments are generally generated by applying a direct assignment and then figuring out an implied indirect assignment.


df <-
  fabricate(
    N = 100, 
    latitude = runif(N),
    longitude = runif(N))

adjacency <- 
  sapply(1:nrow(df), function(i) 
    1*((df$latitude[i] - df$latitude)^2 + (df$longitude[i] - df$longitude)^2)^.5 < .1)

diag(adjacency) <- 0

7.7.2 Indirect assignments: Adjacency matrix

adjacency |>  
  reshape2::melt(c("x", "y"), value.name = "close") |> mutate(close = factor(close)) |>
  ggplot(aes(x, y, fill = close)) +
  geom_tile() + xlab("individual") + ylab("individual") + theme_bw() +
  scale_fill_grey(start = 1, end = 0)  # 1 = white, 0 = black

7.7.3 Indirect assignments

n_assigned <- 50

design <-
  declare_model(data = df) + 
  declare_assignment(
    direct = complete_ra(N, m = n_assigned),
    indirect = 1*(as.vector(as.vector(direct) %*% adjacency) >= 1))

draw_data(design) |> with(table(direct, indirect))
direct  0  1
     0 13 37
     1 13 37

7.7.4 Indirect assignments: Properties

indirect_propensities_1 <- replicate(5000, draw_data(design)$indirect) |> 
  apply(1, mean) 

7.7.5 Indirect assignments: Properties

df |> ggplot(aes(latitude, longitude, label = round(indirect_propensities_1, 2))) + geom_text()

7.7.6 Indirect assignments: Redesign

indirect_propensities_2 <- 
  replicate(5000, draw_data(design |> redesign(n_assigned = 25))$indirect) |> 
  apply(1, mean) 

7.7.7 Indirect assignments: Redesign

df |> ggplot(aes(latitude, longitude, label = round(indirect_propensities_2, 2))) + 
  geom_text()

Looks better, but there are trade-offs between the direct and indirect distributions.

Figuring out the optimal procedure requires full diagnosis.

8 Design diagnosis

A focus on power

8.1 Outline

  1. Tests review
  2. \(p\) values and significance
  3. Power
  4. Sources of power
  5. Advanced applications

8.2 Tests

8.2.1 Review

In the classical approach to testing a hypothesis we ask:

How likely are we to see data like this if indeed the hypothesis is true?

  • If the answer is “not very likely” then we treat the hypothesis as suspect.
  • If the answer is not “not very likely” then the hypothesis is maintained (some say “accepted” but this is tricky as you may want to “maintain” multiple incompatible hypotheses)

How unlikely is “not very likely”?

8.2.2 Weighing Evidence

When we test a hypothesis we decide first on what sort of evidence we need to see in order to decide that the hypothesis is not reliable.

  • Othello has a hypothesis that Desdemona is innocent.

  • Iago confronts him with evidence:

    • See how she looks at him: would she look at him like that if she were innocent?
    • … would she defend him like that if she were innocent?
    • … would he have her handkerchief if she were innocent?
    • Othello, the chance of all of these things arising if she were innocent is surely less than 5%

8.2.3 Hypotheses are often rejected, sometimes maintained, but rarely accepted

  • Note that Othello is focused on the probability of the events if she were innocent but not the probability of the events if Iago were trying to trick him.

  • He is not assessing his belief in whether she is faithful, but rather how likely the data would be if she were faithful.


  • He assesses: \(\Pr(\text{Data} | \text{Hypothesis is TRUE})\)
  • While a Bayesian would assess: \(\Pr(\text{Hypothesis is TRUE} | \text{Data})\)

8.2.4 Recap: Calculate a \(p\) value in your head

  • Illustrating \(p\) values via “randomization inference”

  • Say you randomized assignment to treatment and your data looked like this.

Unit 1 2 3 4 5 6 7 8 9 10
Treatment 0 0 0 0 0 0 0 1 0 0
Health score 4 2 3 1 2 3 4 8 7 6


  • Does the treatment improve your health?
  • What’s the \(p\) value for the null that treatment had no effect on anybody?
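The "in your head" calculation can also be written out. This sketch assumes the design assigned exactly one of the ten units to treatment, each with equal probability, and uses a one-sided test on the difference in means; both are my assumptions about the intended exercise.

```r
treatment <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
health    <- c(4, 2, 3, 1, 2, 3, 4, 8, 7, 6)

diff_in_means <- function(Z, Y) mean(Y[Z == 1]) - mean(Y[Z == 0])

observed <- diff_in_means(treatment, health)

# Under the sharp null of no effect for anybody, outcomes are fixed and
# only the identity of the single treated unit varies across assignments
null_distribution <- sapply(1:10, function(i)
  diff_in_means(as.numeric(1:10 == i), health))

# One-sided p value: share of assignments with an estimate at least as large
p_value <- mean(null_distribution >= observed)
p_value
```

Since the treated unit has the highest score of the ten, only 1 of the 10 possible assignments produces an estimate this large.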

8.3 Power

8.4 What power is

Power is just the probability of getting a significant result, that is, of rejecting a hypothesis.

Simple enough but it presupposes:

  • A well defined hypothesis
  • An actual stipulation of the world under which you evaluate the probability
  • A procedure for producing results and determining if they are significant / rejecting a hypothesis

8.4.1 By hand

I want to test the hypothesis that a six never comes up on this dice.

Here’s my test:

  • I will roll the dice once.
  • If a six comes up I will reject the hypothesis.

What is the power of this test?

8.4.2 By hand

I want to test the hypothesis that a six never comes up on this dice.

Here’s my test:

  • I will roll the dice twice.
  • If a six comes up either time I will reject the hypothesis.

What is the power of this test?
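Both answers depend on what you stipulate about the world. As a sketch, assume the dice is in fact fair, so that a six comes up with probability 1/6; that stipulation is mine, and a different one gives a different power.

```r
# Assumed truth for this sketch: the dice is fair, so Pr(six) = 1/6
p_six <- 1 / 6

power_one_roll  <- p_six              # reject if the single roll is a six
power_two_rolls <- 1 - (1 - p_six)^2  # reject if either of two rolls is a six

# Check the two-roll answer by simulation
set.seed(2)  # arbitrary seed
rolls <- matrix(sample(1:6, 2 * 100000, replace = TRUE), ncol = 2)
simulated_power <- mean(rowSums(rolls == 6) > 0)

c(one_roll = power_one_roll, two_rolls = power_two_rolls,
  simulated = simulated_power)
```

Note that this test never rejects a true null: if a six really never comes up, the rejection probability is zero however many times you roll.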

8.4.3 Two probabilities

Power sometimes seems more complicated because hypothesis rejection involves a calculated probability and so you need the probability of a probability.

I want to test the hypothesis that this dice is fair.

Here’s my test:

  • I will roll the dice 1000 times and if I see fewer than x 6s or more than y 6s I will reject the hypothesis.


  • What should x and y be?
  • What is the power of this test?

8.4.4 Step 1: When do you reject?

For this we need to figure a rule for rejection. This is based on identifying events that should be unlikely under the hypothesis.

Here is how many 6’s I would expect if the dice is fair:

fabricate(N = 1001, sixes = 0:1000, p = dbinom(sixes, 1000, 1/6)) |>
  ggplot(aes(sixes, p)) + geom_line()

8.4.5 Step 1: When do you reject?

I can figure out from this that 143 or fewer is really very few and 190 or more is really very many:

c(lower = pbinom(143, 1000, 1/6), upper = 1 - pbinom(189, 1000, 1/6))
     lower      upper 
0.02302647 0.02785689 

8.4.6 Step 2: What is the power?

  • Now we need to stipulate some belief about how the world really works—this is not the null hypothesis that we plan to reject, but something that we actually take to be true.

  • For instance: we think that in fact sixes appear 20% of the time.

Now what’s the probability of seeing at least 190 sixes?

1 - pbinom(189, 1000, .2)
[1] 0.796066

So given I think 6s appear 20% of the time, I think it likely I’ll see at least 190 sixes and reject the hypothesis of a fair dice.
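
The two steps can be wrapped in one function. This is a sketch under the rejection rule used above (143 or fewer, or 190 or more, sixes in 1000 rolls); `reject_prob` is a hypothetical helper name. Unlike the one-sided calculation above, it also counts the (here negligible) chance of rejecting in the lower tail.

```r
# Probability of rejecting "the dice is fair" under a stipulated true
# probability of a six, using the two-sided rule from the text.
reject_prob <- function(p_true, n = 1000, lower = 143, upper = 190) {
  pbinom(lower, n, p_true) + (1 - pbinom(upper - 1, n, p_true))
}

size_fair <- reject_prob(1/6)  # rejection probability if the dice really is fair
power_p20 <- reject_prob(0.2)  # power if sixes in fact appear 20% of the time
```

`size_fair` is the sum of the two tail probabilities reported in Step 1 (about 0.051), and `power_p20` is essentially the 0.796 computed above.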

8.4.7 Rule of thumb

  • 80% or 90% is a common rule of thumb for “sufficient” power
  • but really, how much power you need depends on the purpose

8.4.8 Think about

  • Are there other tests I could have implemented?
  • Are there other ways to improve this test?

8.4.9 Subtleties

  • Is a significant result from an underpowered study less credible? (only if there is a significance filter)
  • What significance level should you choose for power? (Obviously the stricter the level the lower the power, so use what you will use when you actually implement tests)
  • Do you really have to know the effect size to do power analysis? (No, but you should know at least what effect sizes you would want to be sure about picking up if they were present)
  • Power is just one of many possible diagnosands
  • What’s power for Bayesians?

8.4.10 Power analytics

Simplest intuition on power:

What is the probability of getting a significant estimate given the sampling distribution is centered on \(b\) and the standard error is 1?

  • Probability below -1.96: \(F(-1.96 | b)\)
  • Probability above 1.96: \(1-F(1.96 | b)\)

Add these together: probability of getting an estimate above 1.96 or below -1.96.

power <- function(b, alpha = 0.05, critical = qnorm(1-alpha/2))  

  1 - pnorm(critical, mean = abs(b)) + pnorm(-critical, mean = abs(b))

power(0)
[1] 0.05
power(1.96)
[1] 0.5000586
power(-1.96)
[1] 0.5000586
power(3)
[1] 0.8508388

8.4.11 Power analytics: graphed

This is essentially what is done by pwrss::power.z.test – and it produces nice graphs!


pwrss::power.z.test(ncp = 1.96, alpha = 0.05, alternative = "not equal", plot = TRUE)
     power ncp.alt ncp.null alpha  z.crit.1 z.crit.2
 0.5000586    1.96        0  0.05 -1.959964 1.959964

8.4.12 Power analytics: graphed

Substantively: if in expectation an estimate will be just significant, then your power is 50%
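
A quick simulation check of this claim, as a sketch: draw estimates from a sampling distribution centered exactly on the critical value (standard error 1) and count how often they are significant. The seed and number of draws are arbitrary choices.

```r
set.seed(1)
critical <- qnorm(0.975)

# Estimates whose expectation is "just significant"
estimates <- rnorm(100000, mean = critical, sd = 1)

# Share of estimates that clear the two-sided 5% threshold
share_significant <- mean(abs(estimates) > critical)
```

The share should be close to one half: the estimate is as likely to land below the threshold as above it.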

8.4.13 Equivalent

power <- function(b, alpha = 0.05, critical = qnorm(1-alpha/2))  

  1 - pnorm(critical - b) + pnorm(-critical - b)

power(1.96)
[1] 0.5000586


x <- seq(-3, 3, .01)
plot(x, dnorm(x), main = "power associated with effect of 1 se")
abline(v = 1.96 - 1)
abline(v = -1.96 - 1)

8.4.14 Power analytics for a trial: by hand

  • Of course the standard error will depend on the number of units and the variance of outcomes in treatment and control.

  • Say \(N\) subjects are divided equally into two groups and potential outcomes have standard deviation \(\sigma\) in treatment and control. Then the (approximate, conservative) variance of the estimated treatment effect is:

\[Var(\hat{\tau})=\frac{\sigma^2}{N/2} + \frac{\sigma^2}{N/2} = 4\frac{\sigma^2}{N}\]

and so the (conservative / approx) standard error is:

\[\sigma_{\hat{\tau}} = \frac{2\sigma}{\sqrt{N}}\]

Note here we seem to be using the actual standard error but of course the tests we actually run will use an estimate of the standard error…

8.4.15 Power analytics for a trial: by hand

# Standard error with a small-sample correction
se <- function(sd, N) (N/(N-1))^.5*2*sd/(N^.5)

power_2 <- function(b, alpha = .05, sd = 1, N = 100, critical = qnorm(1-alpha/2))  

  1 - pnorm(critical, mean = abs(b)/se(sd, N)) + pnorm(-critical, mean = abs(b)/se(sd, N))

power_2(0)
[1] 0.05
power_2(.5, N = 100)
[1] 0.7010827
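
It is sometimes handier to invert the calculation and ask what effect a design can detect with given power. A minimal sketch, using the standard approximation that ignores the negligible chance of rejecting in the wrong tail and the simple standard error \(2\sigma/\sqrt{N}\); `mde` is a hypothetical helper name:

```r
# Minimum detectable effect: the effect size at which power reaches the target,
# under a normal approximation and se = 2 * sd / sqrt(N).
mde <- function(sd = 1, N = 100, alpha = 0.05, power = 0.8) {
  se <- 2 * sd / sqrt(N)
  (qnorm(1 - alpha / 2) + qnorm(power)) * se
}

mde_100 <- mde(N = 100)  # roughly 2.8 standard errors
mde_400 <- mde(N = 400)  # quadrupling N halves the detectable effect
```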

8.4.16 Power analytics for a trial: flexible

This can be done e.g. with pwrss like this:

pwrss::pwrss.t.2means(mu1 = .2, mu2 = .1, sd1 = 1, sd2 = 1, 
               n2 = 50, alpha = 0.05,
               alternative = "not equal")
 Difference between Two means 
 (Independent Samples t Test) 
 H0: mu1 = mu2 
 HA: mu1 != mu2 
  Statistical power = 0.079 
  n1 = 50 
  n2 = 50 
 Alternative = "not equal" 
 Degrees of freedom = 98 
 Non-centrality parameter = 0.5 
 Type I error rate = 0.05 
 Type II error rate = 0.921 
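
Compare with the by-hand function (note that the effect stipulated here is 0.5, not the 0.1 used above):

power_2(.50, N = 100)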
[1] 0.7010827

8.4.17 Power for more complex trials: Analytics

Mostly involves figuring out the standard error.

Consider a cluster randomized trial with \(K\) clusters of \(n\) units each, with each unit having a cluster level shock \(\epsilon_k\) and an individual shock \(\nu_i\). Say these have variances \(\sigma^2_k\), \(\sigma^2_i\).

The standard error is:

\[\sqrt{\frac{4\sigma^2_k}{K} + \frac{4\sigma^2_i}{nK}}\]

Define \(\rho = \frac{\sigma^2_k}{\sigma^2_k + \sigma^2_i}\) and let \(\sigma^2 = \sigma^2_k + \sigma^2_i\) be the total variance. The standard error can then be written:

\[\sqrt{\rho \frac{4\sigma^2}{K} + (1- \rho)\frac{4\sigma^2}{nK}}\]

\[\sqrt{((n - 1)\rho + 1)\frac{4\sigma^2}{nK}}\]


  • \(((n - 1)\rho + 1)\) is the “design effect”
  • \(\frac{nK}{((n - 1)\rho + 1)}\) is the “effective sample size”

Plug in these standard errors and proceed as before
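
The formulas above can be sketched as small helper functions (the function names are hypothetical, and the example values of 10 clusters of 100 units with unit-variance shocks at each level are illustrative):

```r
# Design effect, effective sample size and standard error for a
# cluster randomized trial with K clusters of n units each.
design_effect <- function(n, rho) 1 + (n - 1) * rho
effective_n   <- function(n, K, rho) n * K / design_effect(n, rho)
cluster_se    <- function(sigma, n, K, rho) {
  sqrt(design_effect(n, rho) * 4 * sigma^2 / (n * K))
}

# Illustration: sigma_k^2 = sigma_i^2 = 1, so sigma^2 = 2 and rho = 0.5
se_example    <- cluster_se(sigma = sqrt(2), n = 100, K = 10, rho = 0.5)
n_eff_example <- effective_n(n = 100, K = 10, rho = 0.5)
```

With these values the 1000 units are worth only about 20 independent observations, and the standard error is about 0.64.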

8.4.18 Power via design diagnosis

Is arbitrarily flexible

N <- 100
b <- .5

design <- 
  declare_model(N = N, 
    U = rnorm(N),
    potential_outcomes(Y ~ b * Z + U)) + 
  declare_assignment(Z = simple_ra(N),
                     Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

8.4.19 Run it many times

sims_1 <- simulate_design(design) 

sims_1 |> select(sim_ID, estimate, p.value)
sim_ID estimate p.value
1 0.81 0.00
2 0.40 0.04
3 0.88 0.00
4 0.72 0.00
5 0.38 0.05
6 0.44 0.02

8.4.20 Power is mass of the sampling distribution of decisions under the model

sims_1 |>
  ggplot(aes(p.value)) + 
  geom_histogram() +
  geom_vline(xintercept = .05, color = "red")

8.4.21 Power is mass of the sampling distribution of decisions under the model

Obviously related to the estimates you might get

sims_1 |>
  mutate(significant = p.value <= .05) |>
  ggplot(aes(estimate, p.value, color = significant)) + 
  geom_point()
8.4.22 Check coverage is correct

sims_1 |>
  mutate(within = (b > conf.low) & (b < conf.high)) |> 
  pull(within) |> mean()
[1] 0.9573333

8.4.23 Check validity of \(p\) value

A valid \(p\)-value satisfies \(\Pr(p \leq x) \leq x\) for every \(x \in[0,1]\) (under the null)

sims_2 <- 
  redesign(design, b = 0) |>
  simulate_design()
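
The same check can be sketched in base R without a declared design. This is a stand-in, not the slide's own simulation: it assumes two groups of 50 with no true difference, analysed with a Welch t-test, and compares the share of p-values at or below a few cutoffs with the cutoffs themselves.

```r
set.seed(2)

# p-values from many replications of a null experiment
p_null <- replicate(5000, t.test(rnorm(50), rnorm(50))$p.value)

# For a valid p-value, Pr(p <= x) should not exceed x (up to simulation error)
x <- c(0.01, 0.05, 0.10, 0.50)
rejection_rates <- sapply(x, function(cutoff) mean(p_null <= cutoff))
```

Plotting the empirical distribution of `p_null` against the 45 degree line gives the same information for every cutoff at once.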

8.4.24 Design diagnosis does it all (over multiple designs)

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
0.50           0.00    0.20         0.20    0.70    0.95
(0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.00)

8.4.25 Design diagnosis does it all

design |>
  redesign(b = c(0, 0.25, 0.5, 1)) |>
  diagnose_design()

b     Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
0     -0.00          -0.00   0.20         0.20    0.05    0.95
      (0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.00)
0.25  0.25           -0.00   0.20         0.20    0.23    0.95
      (0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.00)
0.5   0.50           0.00    0.20         0.20    0.70    0.95
      (0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.00)
1     1.00           0.00    0.20         0.20    1.00    0.95
      (0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.00)

8.4.26 Diagnose over multiple moving parts (and ggplot)

design |>
  ## Redesign
  redesign(b = c(0.1, 0.3, 0.5), N = c(100, 200, 300)) |>
  ## Diagnosis
  diagnose_design() |>
  ## Prep
  tidy() |>
  filter(diagnosand == "power") |>
  ## Plot
  ggplot(aes(N, estimate, color = factor(b))) +
  geom_line()

8.4.27 Diagnose over multiple moving parts (and ggplot)

8.4.28 Diagnose over multiple moving parts and multiple diagnosands (and ggplot)

design |>
  ## Redesign
  redesign(b = c(0.1, 0.3, 0.5), N = c(100, 200, 300)) |>
  ## Diagnosis
  diagnose_design() |>
  ## Prep
  tidy() |>
  ## Plot
  ggplot(aes(N, estimate, color = factor(b))) +
  geom_line() +
  facet_wrap(~diagnosand)

8.4.29 Diagnose over multiple moving parts and multiple diagnosands (and ggplot)

8.5 Beyond basics

8.5.1 Power tips

coming up:

  • power everywhere
  • power with bias
  • power with the wrong standard errors
  • power with uncertainty over effect sizes
  • power and multiple comparisons

8.5.2 Power depends on all parts of MIDA

We often focus on sample sizes


Power also depends on

  • the model – obviously signal to noise
  • the assignments and specifics of sampling strategies
  • estimation procedures

8.5.3 Power from a lag?

Say we have access to a “pre” measure of outcome Y_now; call it Y_base. Y_base is informative about potential outcomes. We are considering using Y_now - Y_base as the outcome instead of Y_now.

N <- 100
rho <- .5

design <- 
  declare_model(N = N,
                 Y_base = rnorm(N),
                 Y_Z_0 = 1 + correlate(rnorm, given = Y_base, rho = rho),
                 Y_Z_1 = correlate(rnorm, given = Y_base, rho = rho),
                 Z = complete_ra(N),
                 Y_now = Z*Y_Z_1 + (1-Z)*Y_Z_0,
                 Y_change = Y_now - Y_base) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y_now ~ Z, label = "level") +
  declare_estimator(Y_change ~ Z, label = "change")+
  declare_estimator(Y_now ~ Z + Y_base, label = "RHS")

8.5.4 Power from a lag?

design |> redesign(N = c(10, 100, 1000, 10000), rho = c(.1, .5, .9)) |>
  diagnose_design()

8.5.5 Power from a lag?


  • If you difference: the lag has to be sufficiently informative to pay its way (the \(\rho = .5\) equivalence between level and change follows from Gerber and Green (2012) equation 4.6)
  • The right hand side is your friend, at least for experiments (Ding and Li (2019))
  • As \(N\) grows the stakes fall

8.5.6 Power when estimates are biased

bad_design <- 
  declare_model(N = 100, 
    U = rnorm(N),
    potential_outcomes(Y ~ 0 * X + U, conditions = list(X = 0:1)),
    X = ifelse(U > 0, 1, 0)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ X)) + 
  declare_inquiry(ate = mean(Y_X_1 - Y_X_0)) + 
  declare_estimator(Y ~ X, inquiry = "ate", .method = lm_robust)

8.5.7 Power when estimates are biased

You can see from the null design that power is great but bias is terrible and coverage is way off.

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
1.59           1.59    0.12         1.60    1.00    0.00
(0.01)         (0.01)  (0.00)       (0.01)  (0.00)  (0.00)

Power without unbiasedness corrupts, absolutely

8.5.8 Power with a more subtly biased experimental design

another_bad_design <- 
  declare_model(
    N = 100, 
    female = rep(0:1, N/2),
    U = rnorm(N),
    potential_outcomes(Y ~ female * Z + U)) + 
  declare_assignment(
    Z = block_ra(blocks = female, block_prob = c(.1, .5)),
    Y = reveal_outcomes(Y ~ Z)) + 
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) + 
  declare_estimator(Y ~ Z + female, inquiry = "ate", 
                    .method = lm_robust)

8.5.9 Power with a more subtly biased experimental design

Here power looks healthy (0.84), but the estimate is biased (by 0.26) and coverage is too low (0.85).

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
0.76           0.26    0.24         0.35    0.84    0.85
(0.01)         (0.01)  (0.01)       (0.01)  (0.01)  (0.02)

8.5.10 Power with the wrong standard errors

clustered_design <-
  declare_model(
    cluster = add_level(N = 10, cluster_shock = rnorm(N)),
    individual = add_level(
        N = 100,
        Y_Z_0 = rnorm(N) + cluster_shock,
        Y_Z_1 = rnorm(N) + cluster_shock)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = cluster_ra(clusters = cluster)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
-0.00          -0.00   0.64         0.64    0.79    0.20
(0.01)         (0.01)  (0.01)       (0.01)  (0.01)  (0.01)

What alerts you to a problem?

8.5.11 Let’s fix that one

clustered_design_2  <-
  clustered_design |> replace_step(5, 
  declare_estimator(Y ~ Z, clusters = cluster))

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
0.00           -0.00   0.66         0.65    0.06    0.94
(0.02)         (0.02)  (0.01)       (0.01)  (0.01)  (0.01)

8.5.12 Power when you are not sure about effect sizes (always!)

  • you can do power analysis for multiple stipulations
  • or you can design with a distribution of effect sizes

design_uncertain <-
  declare_model(N = 1000, b = 1+rnorm(1), 
                Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
  declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(b)) +
  declare_estimator(Y ~ factor(Z), term = TRUE)

# The estimand now differs from draw to draw
draw_estimands(design_uncertain)
  inquiry   estimand
1     ate -0.3967765
draw_estimands(design_uncertain)
  inquiry  estimand
1     ate 0.7887188
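
With a distribution over effect sizes, power becomes an average over that distribution. A rough analytic sketch, assuming a normal approximation, three equal arms of 1000/3 units with unit-variance outcomes (so a standard error of about \(\sqrt{2/333}\) for each contrast), and \(b \sim N(1, 1)\) as in the declaration; `power_z` is a hypothetical helper name:

```r
# Two-sided power for a true effect b given a standard error
power_z <- function(b, se, alpha = 0.05) {
  critical <- qnorm(1 - alpha / 2)
  1 - pnorm(critical - abs(b) / se) + pnorm(-critical - abs(b) / se)
}

set.seed(3)
b_draws <- 1 + rnorm(100000)     # stipulated distribution of effect sizes
se_arm  <- sqrt(2 / (1000 / 3))  # approximate se of a treatment-control contrast

expected_power <- mean(power_z(b_draws, se_arm))  # averaged over beliefs about b
power_at_mean  <- power_z(1, se_arm)              # power at the expected effect
```

Expected power falls short of power at the expected effect because some of the probability mass sits on effects close to zero.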

8.5.13 Multiple comparisons correction (complex code)

Say I run two tests and want to correct for multiple comparisons.

Two approaches. First, by hand:

b = .2

design_mc <-
  declare_model(N = 1000, Y_Z_1 = rnorm(N), Y_Z_2 = rnorm(N) + b, Y_Z_3 = rnorm(N) + b) +
  declare_assignment(Z = complete_ra(N = N, num_arms = 3, conditions = 1:3)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = b) +
  declare_estimator(Y ~ factor(Z), term = TRUE)

8.5.14 Multiple comparisons correction (complex code)

design_mc |>
  simulate_designs(sims = 1000) |>
  filter(term != "(Intercept)") |>
  group_by(sim_ID) |>
  mutate(p_bonferroni = p.adjust(p = p.value, method = "bonferroni"),
         p_holm = p.adjust(p = p.value, method = "holm"),
         p_fdr = p.adjust(p = p.value, method = "fdr")) |>
  ungroup() |>
  summarize(
    "Power using naive p-values" = mean(p.value <= 0.05),
    "Power using Bonferroni correction" = mean(p_bonferroni <= 0.05),
    "Power using Holm correction" = mean(p_holm <= 0.05),
    "Power using FDR correction" = mean(p_fdr <= 0.05)
  )

Power using naive p-values         0.7374
Power using Bonferroni correction  0.6318
Power using Holm correction        0.6886
Power using FDR correction         0.7032
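
To see what the corrections do within a single simulation, here is a small sketch applying `p.adjust` to two made-up p-values (the values 0.01 and 0.04 are illustrative only):

```r
# Two hypothetical p-values from one run of a two-test design
p_raw <- c(0.01, 0.04)

# One column of adjusted p-values per method
adjusted <- sapply(c("bonferroni", "holm", "fdr"),
                   function(m) p.adjust(p_raw, method = m))
```

Bonferroni doubles both values (0.02, 0.08), so only one test survives at the 5% level; Holm and FDR both return (0.02, 0.04), so both survive. This is why the Bonferroni power figure is the lowest of the corrected ones.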

8.5.15 Multiple comparisons correction (approach 2)

The alternative approach (generally better!) is to design with a custom estimator that includes your corrections.

my_estimator <- function(data) 
  lm_robust(Y ~ factor(Z), data = data) |> 
  tidy() |>
  filter(term != "(Intercept)") |>
  mutate(p.naive = p.value,
         p.value = p.adjust(p = p.naive, method = "bonferroni"))

design_mc_2 <- design_mc |>
  replace_step(5, declare_estimator(handler = label_estimator(my_estimator))) 

run_design(design_mc_2) |> 
  select(term, estimate, p.value, p.naive) |> kable()
term estimate p.value p.naive
factor(Z)2 0.1182516 0.2502156 0.1251078
factor(Z)3 0.1057031 0.3337476 0.1668738

8.5.16 Multiple comparisons correction (Null model case)

Let's try the same thing for a null model (using redesign(design_mc_2, b = 0))

design_mc_3 <- 
  design_mc_2 |> 
  redesign(b = 0) 

run_design(design_mc_3) |> select(estimate, p.value, p.naive) |> kable(digits = 3)
estimate p.value p.naive
0.068 0.799 0.399
0.144 0.151 0.076

8.5.17 Multiple comparisons correction (Null model case)

…and power:

Mean Estimate  Bias    SD Estimate  RMSE    Power   Coverage
0.00           0.00    0.08         0.08    0.02    0.95
(0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.01)
-0.00          -0.00   0.08         0.08    0.02    0.96
(0.00)         (0.00)  (0.00)       (0.00)  (0.00)  (0.01)


8.5.18 You might try

  • Power for an interaction (in a factorial design)
  • Power for a binary variable (versus a continuous variable?)
  • Power gains from blocked randomization
  • Power losses from clustering at different levels
  • Controlling the ICC directly? (see book cluster designs section)

8.5.19 Big takeaways

  • Power is affected not just by sample size, variability and effect size but also by your data and analysis strategies.
  • Try to estimate power under multiple scenarios
  • Try to use the same code for calculating power as you will use in your ultimate analysis
  • Basically the same procedure can be used for any design. If you can declare a design and have a test, you can calculate power
  • Your power might be right but misleading. For confidence:
    • Don’t just check power, check bias and coverage also
    • Check power especially under the null
  • Don’t let a focus on power distract you from more substantive diagnosands

9 Topics 1

9.1 Covariate Adjustment

9.1.1 When to condition? What to condition on?

The key idea from Pearl is that you want to find a set of variables such that when you condition on these you get what you would get if you had random assignment—or, used a do operation.

9.1.2 When to condition? What to condition on?


  • You could imagine creating a “mutilated” graph by removing all the arrows leading out of X
  • Then select a set of variables, \(Z\), such that \(X\) and \(Y\) are d-separated by \(Z\) on the mutilated graph
  • When you condition on these you are making sure that any covariation between \(X\) and \(Y\) is covariation that is due to the effects of \(X\)

9.1.3 Illustration

9.1.4 Illustration: Remove paths out

9.1.5 Illustration: Block backdoor path

9.1.6 Illustration: Why not like this?

9.1.7 Backdoor Criterion: (Pearl 1995)

The backdoor criterion is satisfied by \(Z\) (relative to \(X\), \(Y\)) if:

  1. No node in \(Z\) is a descendant of \(X\)
  2. \(Z\) blocks every backdoor path from \(X\) to \(Y\) (i.e. every path that contains an arrow into \(X\))

In that case you can identify the effect of \(X\) on \(Y\) by conditioning on \(Z\):

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\] (This is eqn 3.19 in Pearl (2000))

9.1.8 Backdoor Criterion: (Pearl 1995)

\[P(Y=y | \hat{x}) = \sum_z P(Y=y| X = x, Z=z)P(z)\]

  • No notion of a linear control or anything like that; idea really is like blocking: think lots of discrete data and no missing patterns
  • Note this is a formula for a (possibly counterfactual) level; a counterfactual difference would be given in the obvious way by:

\[P(Y=y | \hat{x}) - P(Y=y | \hat{x}')\]
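
A numerical sketch of the adjustment formula, using a made-up data generating process (all probabilities below are assumptions chosen for illustration): a binary \(Z\) confounds binary \(X\) and \(Y\), and the true effect of \(X\) on \(\Pr(Y=1)\) is 0.3.

```r
set.seed(4)
n <- 200000
Z <- rbinom(n, 1, 0.5)                       # confounder
X <- rbinom(n, 1, 0.2 + 0.6 * Z)             # treatment depends on Z
Y <- rbinom(n, 1, 0.1 + 0.3 * X + 0.4 * Z)   # outcome depends on X and Z

# Backdoor adjustment: sum over z of P(Y = 1 | X = x, Z = z) * P(Z = z)
adjusted_level <- function(x) {
  sum(sapply(0:1, function(z) mean(Y[X == x & Z == z]) * mean(Z == z)))
}

adjusted_effect <- adjusted_level(1) - adjusted_level(0)
naive_effect    <- mean(Y[X == 1]) - mean(Y[X == 0])
```

The adjusted difference recovers the stipulated 0.3, while the unadjusted comparison is inflated (about 0.54 in expectation) because treated units disproportionately have \(Z = 1\).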

9.1.9 Backdoor Proof

Following Pearl (2009), Chapter 11. Let \(T\) denote the set of parents of \(X\): \(T := pa(X)\), with (possibly vector valued) realizations \(t\). These might not all be observed.

If the backdoor criterion is satisfied, we have:

  1. \(Y\) is independent of \(T\), given \(X\) and observed data, \(Z\) (since \(Z\) blocks backdoor paths)
  2. \(X\) is independent of \(Z\) given \(T\). (Since \(Z\) includes only non-descendants)
  • Key idea: The intervention level relates to the observational level as follows: \[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • Think of this as fully accounting for the (possibly unobserved) causes of \(X\), \(T\)

9.1.10 Backdoor Proof

We want to get to:

\[p(y|\hat{x}) = \sum_{t\in T} p(t)p(y|x, t)\]

  • But of course we do not observe \(T\), rather we observe \(Z\). So we now need to write everything in terms of \(Z\) rather than \(T\).

We bring \(Z\) into the picture by writing:

\[p(y|\hat{x}) = \sum_{t\in T} p(t) \sum_z p(y|x, t, z)p(z|x, t)\]

now we want to get rid of \(T\)

9.1.11 Backdoor Proof

now we want to get rid of \(T\)

  • Using the two conditions from the backdoor definition above:

    1. replace \(p(y|x, t, z)\) with \(p(y | x, z)\)
    2. replace \(p(z|x, t)\) with \(p(z|t)\)

This gives: \[p(y|\hat x) = \sum_{t \in T} p(t) \sum_z p(y|x, z)p(z|t)\]

Cleaning up, we can get rid of \(T\):

\[p(y|\hat{x}) = \sum_z p(y|x, z)\sum_{t\in T} p(z|t)p(t) = \sum_z p(y| x, z)p(z)\]

9.1.12 Backdoor proof figure

For intuition:

We would be happy if we could condition on the parent \(T\), but \(T\) is not observed. However we can use \(Z\) instead making use of the fact that:

  1. \(p(y|x, t, z) = p(y | x, z)\) (since \(Z\) blocks)
  2. \(p(z|x, t) = p(z|t)\) (since \(Z\) is upstream and blocked by parents, \(T\))

9.1.13 Guidelines

9.1.14 Example

Consider for example this data.

  • You randomly pair offerers and receivers in a dictator game (in which offerers decide how much of $1 to give to receivers).
  • Your population comes from two groups (80% Baganda and 20% Banyankole) so in randomly assigning partners you are randomly determining whether a partner is a coethnic or not.
  • You find that in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.

Should you believe it?

9.1.15 Covariate Adjustment

  • Population: randomly matched Baganda (80% of pop) and Banyankole (20% of pop)
  • You find: in non-coethnic pairings 35% is offered, in coethnic pairings 48% is offered.
  • But a closer look at the data reveals…

                       To: Baganda   To: Banyankole
Offers by Baganda          64%            16%
Offers by Banyankole       16%             4%

Figure 1: Number of Games

                       To: Baganda   To: Banyankole
Offers by Baganda          50             50
Offers by Banyankole       20             20

Figure 2: Average Offers

So that’s a problem
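
The arithmetic behind the problem, as a sketch using the shares and average offers from the two tables (object names are hypothetical):

```r
groups <- c("Baganda", "Banyankole")

# Share of games and average offers, rows = offerer, columns = receiver
shares <- matrix(c(0.64, 0.16,
                   0.16, 0.04), 2, 2, byrow = TRUE,
                 dimnames = list(from = groups, to = groups))
offers <- matrix(c(50, 50,
                   20, 20), 2, 2, byrow = TRUE,
                 dimnames = list(from = groups, to = groups))

coethnic <- diag(2) == 1  # TRUE on the diagonal (coethnic pairings)

# Pooled averages, as in the headline comparison
pooled_coethnic    <- sum(shares[coethnic] * offers[coethnic]) / sum(shares[coethnic])
pooled_noncoethnic <- sum(shares[!coethnic] * offers[!coethnic]) / sum(shares[!coethnic])

# Coethnic minus non-coethnic offer, separately for each offerer group
within_offerer_effect <- c(offers[1, 1] - offers[1, 2], offers[2, 2] - offers[2, 1])
```

The pooled comparison gives about 48 versus 35, yet within each offerer group the difference is exactly zero: Baganda offer more to everyone and are much more likely to land in a coethnic pairing.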

9.1.16 Covariate Adjustment


  • With such data you might be tempted to ‘control’ for the covariate (here: ethnic group), using regression.
  • But, perhaps surprisingly, it turns out that regression with covariates does not estimate average treatment effects.
  • It does estimate an average of treatment effects, but specifically a minimum variance estimator, not necessarily an estimator of your estimand.

9.1.17 Covariate Adjustment


  • \(\hat{\tau}_{ATE} =\sum_{x} \frac{w_x}{\sum_{j}w_{j}}\hat{\tau}_x\)
  • \(\hat{\tau}_{OLS} =\sum_{x} \frac{w_xp_x(1-p_x)}{\sum_{j}w_j{p_j(1-p_j)}}\hat{\tau}_x\)

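Here \(w_x\) is the share of units in stratum \(x\), \(p_x\) is the share of units in stratum \(x\) assigned to treatment, and \(\hat{\tau}_x\) is the estimated effect within stratum \(x\). Instead of relying on OLS you can use the formula above for \(\hat{\tau}_{ATE}\) to estimate the ATE.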
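
A small worked sketch of the two weighting schemes. The numbers are illustrative assumptions: two equally sized strata, assignment probabilities of 0.2 and 0.5, and stratum effects of 0 and 1.

```r
w   <- c(0.5, 0.5)  # stratum shares
p   <- c(0.2, 0.5)  # assignment probabilities by stratum
tau <- c(0, 1)      # stratum-level effects

ate_weights <- w / sum(w)
ols_weights <- w * p * (1 - p) / sum(w * p * (1 - p))

tau_ate <- sum(ate_weights * tau)  # 0.5
tau_ols <- sum(ols_weights * tau)  # about 0.61
```

OLS puts more weight on the stratum where treatment assignment is closest to 50:50, so it overstates the average effect here.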


9.1.18 Covariate adjustment via saturated regression

  • Alternatively you can use propensity weights.
  • Alternatively you can use a regression that includes both the treatment and the treatment interacted with the covariates.
    • In practice this is best done by demeaning the covariates; doing this lets you read off the average effect from the main term. Key resource: Lin (2012)

You should have noticed that the logic for controlling for a covariate here is equivalent to the logic we saw for heterogeneous assignment propensities. These are really the same thing.

9.1.19 Covariate adjustment via saturated regression

Returning to prior example:

df <- fabricatr::fabricate(
  N = 500, 
  X = rep(0:1, N/2), 
  Z = rbinom(N, 1, .2 + .3*X),
  Y = rnorm(N) + Z*X)

lm_robust(Y ~ Z*X_c, data = df |> mutate(X_c = X - mean(X))) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.05 0.06 -0.89 0.37 -0.17 0.06 496 Y
Z 0.47 0.11 4.32 0.00 0.26 0.68 496 Y
X_c 0.08 0.12 0.65 0.51 -0.15 0.30 496 Y
Z:X_c 0.85 0.22 3.91 0.00 0.42 1.28 496 Y

9.1.20 Covariate adjustment via saturated regression

Returning to prior example:

lm_lin(Y ~ Z, ~ X, data = df) |>
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) -0.05 0.06 -0.89 0.37 -0.17 0.06 496 Y
Z 0.47 0.11 4.32 0.00 0.26 0.68 496 Y
X_c 0.08 0.12 0.65 0.51 -0.15 0.30 496 Y
Z:X_c 0.85 0.22 3.91 0.00 0.42 1.28 496 Y

9.1.21 Demeaning and saturating

Demeaning interactions

  • Say you have a factorial design with treatments X1 and X2 (or observational data with two covariates)
  • You analyse with a model that has main terms and interaction terms
  • Interpreting coefficients can be confusing, but sometimes demeaning can help. What does demeaning do?

9.1.22 Demeaning and saturating


  • Declare a factorial design in which Y is generated according to

f_Y <- function(X1, X2, u) .1 + .2*X1 + .3*X2 + u*X1*X2

where u is distributed \(U[0,1]\).

  • Specify estimands carefully
  • Run analyses in which we do and do not demean the treatments; compare and explain results

9.1.23 Demeaning interactions

f_Y <- function(X1, X2, u) .1 + .2*X1 + .3*X2 + u*X1*X2

design <-
  declare_model(N = 1000, u = runif(N),
                X1 = complete_ra(N), X2 = block_ra(blocks = X1),
                X1_demeaned = X1 - mean(X1),
                X2_demeaned = X2 - mean(X2),
                Y = f_Y(X1, X2, u)) +
  declare_inquiry(
    base = mean(f_Y(0, 0, u)),
    average = mean(f_Y(0, 0, u) + f_Y(0, 1, u)  + f_Y(1, 0, u)  + f_Y(1, 1, u))/4,
    CATE_X1_given_0 = mean(f_Y(1, 0, u) - f_Y(0, 0, u)),
    CATE_X2_given_0 = mean(f_Y(0, 1, u) - f_Y(0, 0, u)),
    ATE_X1 = mean(f_Y(1, X2, u) - f_Y(0, X2, u)),
    ATE_X2 = mean(f_Y(X1, 1, u) - f_Y(X1, 0, u)),
    I_X1_X2 = mean((f_Y(1, 1, u) - f_Y(0, 1, u)) - (f_Y(1, 0, u) - f_Y(0, 0, u)))
  ) +
  declare_estimator(Y ~ X1*X2, 
                    inquiry = c("base", "CATE_X1_given_0", "CATE_X2_given_0", "I_X1_X2"), 
                    term = c("(Intercept)", "X1", "X2", "X1:X2"),
                    label = "natural") +
  declare_estimator(Y ~ X1_demeaned*X2_demeaned, 
                    inquiry = c("average", "ATE_X1", "ATE_X2", "I_X1_X2"), 
                    term = c("(Intercept)", "X1_demeaned", "X2_demeaned", "X1_demeaned:X2_demeaned"),
                    label = "demeaned")

9.1.24 Demeaning interactions: Solution

Estimator Inquiry Term Mean Estimand Mean Estimate
demeaned ATE_X1 X1_demeaned 0.45 0.45
demeaned ATE_X2 X2_demeaned 0.55 0.55
demeaned I_X1_X2 X1_demeaned:X2_demeaned 0.50 0.50
demeaned average (Intercept) 0.48 0.48
natural CATE_X1_given_0 X1 0.20 0.20
natural CATE_X2_given_0 X2 0.30 0.30
natural I_X1_X2 X1:X2 0.50 0.50
natural base (Intercept) 0.10 0.10

It’s all good. But you need to match the estimator to the inquiry: demean for average marginal effects; do not demean for conditional marginal effects.

9.1.25 Recap

If you have different groups with different assignment propensities you can do any or all of these:

  1. Blocked differences in means
  2. Inverse propensity weighting
  3. Saturated regression (Lin)
  4. More… (coming)

You cannot (reliably):

  1. Ignore the groups
  2. Include them in a regression (without interactions)

9.2 To control or not control

Even “good” controls might not pay their way in practice:

When (and how) does controlling for covariates improve things and when does it make things worse?

9.2.1 Considerations

  • Even though randomization ensures no bias, you may sometimes want to “control” for covariates in order to improve efficiency (see the discussion of blocking above).
  • Or you may have to take account of the fact that the assignment to treatment is correlated with a covariate (as above).
  • In observational work you might also figure out you have to control for a covariate to justify inferences (refer to our discussion of the backdoor criterion)

9.2.2 Observational work

  • Observational motivation: Controls can provide grounds for identification
  • But recall – they can also destroy identification

For a great walk through of what you can draw from graphical models for the decision to control see:

A Crash Course in Good and Bad Controls by Cinelli, Forney, and Pearl (2022)

Aside: these implications generally refer to using controls as covariates, e.g. by implementing blocked differences in means or similar. For a Bayesian model of the form used in CausalQueries the information from “bad controls” is used wisely.

9.2.3 Experimental work

  • Conditional Bias and Precision Gains from Controls

  • Experimental motivation: Controls can reduce noise and improve precision. This is an argument for using variables that are correlated with the outcome (not with the treatment).

9.2.4 Precision Gains from Controls

However: Introducing controls can create complications

  • As argued by Freedman (summary from Lin (2012)), we can get: “worsened asymptotic precision, invalid measures of precision, and small-sample bias”\(^*\)

  • These adverse effects are essentially removed with an interacted model

  • See discussions in Imbens and Rubin (2015) (7.6, 7.7) and especially Theorem 7.2 for the asymptotic variance of the estimator

\(^*\) though note that the precision concern does not hold when treatment and control groups are equally sized

9.2.5 To control or not

We will illustrate by comparing:

  • DIM
  • OLS (linear controls)
  • Lin (saturated)


  • Varying fraction assigned to treatment
  • Varying relation between \(Y(0)\) and \(Y(1)\)

9.2.6 Declaration (from RDSS)

# https://book.declaredesign.org/library/experimental-causal.html

prob <- 0.5
control_slope <- -1

declaration_18.3 <-
  declare_model(N = 100, X = runif(N, 0, 1),
                U = rnorm(N, sd = 0.1),
                Y_Z_1 = 1*X + U, Y_Z_0 = control_slope*X + U
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N = N, prob = prob)) + 
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE", label = "DIM") +
  declare_estimator(Y ~ Z + X, .method = lm_robust, inquiry = "ATE", label = "OLS") +
  declare_estimator(Y ~ Z, covariates = ~X, .method = lm_lin, 
                    inquiry = "ATE", label = "Lin")

9.2.7 Implied potential outcomes

The variances and covariance of potential outcomes depend on the slope parameter

9.2.8 To control or not


declaration_18.3 |> 
  redesign(
    control_slope = seq(-1, 1, 0.5), 
    prob = seq(0.1, 0.9, 0.1)) |> 
  diagnose_design()

9.2.9 To control or not

  • Lin always does well
  • OLS does fine with equal probability assignments
  • Otherwise the ranking of DIM and OLS is design dependent

9.2.10 Conditional Bias and Precision Gains from Controls

  • Treatment correlated with covariates can induce “conditional bias.”
  • Including controls that are correlated with treatment can introduce inefficiencies
  • Including controls can change your estimates so be sure not to fish!

9.2.11 Declaration

a <- 0
b <- 0

design <- 
  declare_model(N = 100,
                        X = rnorm(N),
                        Z = complete_ra(N),
                        Y_Z_0 = a*X + rnorm(N),
                        Y_Z_1 = a*X + correlate(given = X, rho = b, rnorm) + 1,
                        Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, covariates = ~X, .method = lm_lin, label = "Lin") +
  declare_estimator(Y ~ Z,  label = "No controls") +
  declare_estimator(Z ~ X,  label = "Condition")

The design implements estimation controlling and not controlling for \(X\) and also keeps track of the results of a test for the relation between \(Z\) and \(X\).

9.2.12 Simulations

We simulate with many simulations over a range of designs

simulations <- 
  list(design |> redesign(a = 0, b = 0),  design |> redesign(a = 1, b = 0),  design |> redesign(a = 0, b = 1)) |>
  simulate_design(sims = 20000) 

9.2.13 Standard errors

We see the standard errors are larger when you control in cases in which the control is not predictive of the outcome and it is correlated with the treatment. Otherwise they can be smaller.

See Mutz et al

9.2.14 Errors

We also see “conditional bias” when we do not control: where the distribution of errors depends on the correlation with the covariate.


9.2.15 Estimands

Puzzle: Does the sample average treatment effect on the treated depend on the covariate balance?

9.3 Doubly robust estimation

9.3.1 Doubly robust estimation

Doubly robust estimation combines:

  1. A model for how the covariates predict the potential outcomes
  2. A model for how the covariates predict assignment propensities

Using both together to estimate potential outcomes using propensity weighting lets you do well even if either model is wrong.

Each part can be done using nonparametric methods resulting in an overall semi-parametric procedure.

  • \(\pi(X) = \Pr(Z=1|X)\): Estimate \(\hat\pi\)
  • \(Y_z = \mathbb{E}[Y|Z=z, X]\): Estimate \(\hat{Y}_z\)
  • Estimate of causal effect: \(\frac{1}{n}\sum_{i=1}^n\left(\frac{Z_i}{\hat{\pi}_i}(Y_i - \hat{Y}_{i1}) - \frac{1-Z_i}{1-\hat{\pi}_i}(Y_i - \hat{Y}_{i0}) + \left(\hat{Y}_{i1} - \hat{Y}_{i0}\right) \right)\)

9.3.2 Doubly robust estimation

  • Estimate of causal effect: \(\frac{1}{n}\sum_{i=1}^n\left(\frac{Z_i}{\hat{\pi}_i}(Y_i - \hat{Y}_{i1}) - \frac{1-Z_i}{1-\hat{\pi}_i}(Y_i - \hat{Y}_{i0}) + \left(\hat{Y}_{i1} - \hat{Y}_{i0}\right) \right)\)

  • Note that if \(\hat{Y}_{iz}\) are correct then the first parts drop out (in expectation) and we get the right answer.

  • So if you can impute the potential outcomes, you are good (though hardly surprising)

9.3.3 Doubly robust estimation

  • More subtly say the \(\hat{\pi}\)s are correct, but your imputations are wrong; then we again have an unbiased estimator.

To see this imagine with probability \(\pi\) we assign unit 1 to treatment and 2 to control (otherwise 1 to control and 2 to treatment).

Then our expected estimate is:

\(\frac12\left[\pi\left(\frac{1}{\pi}(Y_{11} - \hat{Y}_{11}) - \frac{1}{\pi}(Y_{20} - \hat{Y}_{20})\right) + (1-\pi)\left(\frac{1}{1-\pi}(Y_{21} - \hat{Y}_{21}) - \frac{1}{1-\pi}(Y_{10} - \hat{Y}_{10})\right) + \left(\hat{Y}_{11} - \hat{Y}_{20}\right) + \left(\hat{Y}_{21} - \hat{Y}_{10}\right)\right]\)

\(= \frac12\left[Y_{11} - Y_{10} + Y_{21} - Y_{20} + \pi\left(-\frac{1}{\pi}\hat{Y}_{11} + \frac{1}{\pi}\hat{Y}_{20}\right) + (1-\pi)\left(-\frac{1}{1-\pi}\hat{Y}_{21} + \frac{1}{1-\pi}\hat{Y}_{10}\right) + \left(\hat{Y}_{11} - \hat{Y}_{20}\right) + \left(\hat{Y}_{21} - \hat{Y}_{10}\right)\right]\)

\(= \frac12\left(Y_{11} - Y_{10} + Y_{21}- Y_{20}\right)\)

Robins, Rotnitzky, and Zhao (1994)
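
The two-unit argument can be checked numerically. The following is a minimal base R sketch, not part of the original slides: the potential outcomes, the deliberately wrong imputations, and the value of \(\pi\) are all made-up numbers chosen for illustration. It averages the doubly robust estimate over the two possible assignments, using the true propensities.

```r
# Hypothetical two-unit example: unit 1 is treated with probability p,
# otherwise unit 2 is treated (so unit 2's propensity is 1 - p).
p  <- 0.3
Y1 <- c(5, 2)          # potential outcomes under treatment (units 1, 2)
Y0 <- c(1, 4)          # potential outcomes under control
Y1_hat <- c(10, -3)    # deliberately wrong imputations
Y0_hat <- c(0, 7)
propensity <- c(p, 1 - p)  # correct propensities

# Doubly robust estimate for a given assignment vector Z
dr_estimate <- function(Z) {
  Y <- Z * Y1 + (1 - Z) * Y0
  mean(Z / propensity * (Y - Y1_hat) -
         (1 - Z) / (1 - propensity) * (Y - Y0_hat) +
         (Y1_hat - Y0_hat))
}

# Expectation over the two possible assignments
expected_estimate <- p * dr_estimate(c(1, 0)) + (1 - p) * dr_estimate(c(0, 1))
true_ate <- mean(Y1 - Y0)
c(expected_estimate = expected_estimate, true_ate = true_ate)
```

Changing `Y1_hat` and `Y0_hat` to any other values should leave the expected estimate unchanged, as long as `propensity` stays correct.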

9.3.4 Doubly robust estimation illustration

Consider this data (with confounding):

# df with true treatment effect of 1 
# (0.5 if race = 0; 1.5 if race = 1)

df <- fabricatr::fabricate(
  N = 5000,
  class = sample(1:3, N, replace = TRUE),
  race = rbinom(N, 1, .5),
  Z = rbinom(N, 1, .2 + .3*race),
  Y = .5*Z + race*Z + class + rnorm(N),
  qsmk = factor(Z),
  class = factor(class),
  race = factor(race)
)

9.3.5 Simple approaches

Naive regression produces biased estimates, even with controls. Lin regression gets the right result however.

# Naive
lm_robust(Y ~ Z, data = df)$coefficients[["Z"]]
[1] 1.229234
# OLS with controls
lm_robust(Y ~ Z + class + race, data = df)$coefficients[["Z"]]
[1] 1.139115
# Lin
lm_lin(Y ~ Z,  ~ class + race, data = df)$coefficients[["Z"]]
[1] 1.017271

9.3.6 Doubly robust estimation

drtmle is an R package that uses doubly robust estimation to compute “marginal means of an outcome under fixed levels of a treatment.”

drtmle_fit <- drtmle(
  W = df |> select(race, class), 
  A = df$Z, 
  Y = df$Y, 
  SL_Q = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_g = c("SL.glm", "SL.mean", "SL.glm.interaction"),
  SL_Qr = "SL.glm",
  SL_gr = "SL.glm", 
  maxIter = 1
)

9.3.7 Doubly robust estimation

# "Marginal means"
[1] 2.990577 1.972552
# Effects
ci(drtmle_fit, contrast = c(-1,1))
                   est    cil    ciu
E[Y(0)]-E[Y(1)] -1.018 -1.082 -0.954
wald_test(drtmle_fit, contrast = c(-1,1))
                       zstat pval
H0:E[Y(0)]-E[Y(1)]=0 -31.203    0

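Note that the contrast used here returns \(E[Y(0)] - E[Y(1)]\), so the implied estimate of the treatment effect is the negative of the reported value, about 1.018.

Resource: https://muse.jhu.edu/article/883477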

9.3.8 Assessing performance

Challenge: Use DeclareDesign to compare performance of drtmle and lm_lin

9.4 Matching

9.4.1 Core idea

The matching idea can be seen as an application of the backdoor criterion:

Seek features \(X\) that block backdoor paths from \(Z\) to \(Y\) and estimate:

\[\sum_x p(x) \left(\mathbb E[Y | Z = 1, X = x] - \mathbb E[Y | Z = 0, X = x]\right)\]

This, fundamentally, is a blocked difference in means. The intuitions from the discussion of Lin estimation apply here.

9.4.2 ATT Wrinkle

In practice matching differs from simply controlling for covariates (with interactions) insofar as matches are (often) sought specifically for treated units.

Thus for example in data like this:

0 1 1
0 2 2
1 1 3
1 1 4

we would not make use of the second observation. Indeed it is not even part of our estimand.

9.4.3 ATT Wrinkle

More formally if \(M_i\) are matches for each unit \(i\) of \(n_1\) treated units, our estimand is:

\[\tau_{ATT} = \frac{1}{n_1} \sum_{i: Z_i = 1}\left(Y_i(1) - Y_i(0)\right) \]

and we estimate:

\[\hat\tau_{ATT} =\frac{1}{n_1} \sum_{i: Z_i = 1}\left(Y_i - \frac{1}{|M_i|}\sum_{j \in M_i}Y_j\right) \]
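
A minimal base R sketch of this estimator with exact matching on a single covariate. The toy data frame and its column names (`Z`, `X`, `Y`) are hypothetical and chosen only for illustration; the control unit with `X = 2` has no treated counterpart and so plays no role.

```r
# Hypothetical toy data: Z = treatment, X = matching covariate, Y = outcome
toy <- data.frame(Z = c(0, 0, 1, 1),
                  X = c(1, 2, 1, 1),
                  Y = c(1, 2, 3, 4))

treated  <- toy[toy$Z == 1, ]
controls <- toy[toy$Z == 0, ]

# For each treated unit i, average the outcomes of its matches M_i
# (here: all control units with exactly the same X)
matched_control_mean <- sapply(
  treated$X,
  function(x) mean(controls$Y[controls$X == x])
)

# Average the treated-minus-matched differences over the n_1 treated units
att_hat <- mean(treated$Y - matched_control_mean)
att_hat
```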

9.4.4 ATE?

In principle however one could maintain a focus on ATE and find matches both for treated and for control units.

The commitment to the ATT then is not baked in. See Stuart and Green (2008).

Example below.

9.4.5 Variations

  • Exact matching. Great if possible: use as matches units that exactly agree on all covariates
  • Coarsened exact matching. Transparent and easy: use as matches units that exactly agree on all coarsened covariates (i.e. nearly agree on covariates)
  • Distance matching e.g. Mahalanobis, nearest neighbor: use as matches units that are close in the multidimensional covariate space
  • Propensity matching. Clever but more model dependent: use as matches units that, based on covariates, have similar estimated probabilities of being in treatment

9.4.6 Exact matching illustration

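matching_function <- 
  function(data, formula = "Z ~ X", method = "exact", ...) { 
    matched <- as.formula(formula) |> 
      MatchIt::matchit(method = method, data = data, ...) 
    # Return the matched data, including the matching weights used downstream
    MatchIt::match.data(matched) 
  }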

matching_method = "exact"

design <-
  declare_model(
    N = 20, 
    U = rnorm(N), 
    X = sample(c(.2, .4, .6, .8), N, replace = TRUE),
    Z = rbinom(N, 1, prob = X),
    Y_Z_0 = 0.2 * X + U,
    Y_Z_1 = Y_Z_0 + 0.5,
    Y = reveal_outcomes(Y~Z)
  ) + 
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_step(handler = matching_function, method = matching_method) +
  declare_estimator(Y ~ Z, weights = weights, label = "Matched")

9.4.7 Investigate

data <- draw_data(design)
dim(data)
[1] 15  9
data|> group_by(X,Z) |> summarize(Y = mean(Y), n = n()) |> kable()
X Z Y n
0.2 0 -0.1203946 2
0.2 1 -0.7960410 1
0.4 0 -0.4513312 5
0.4 1 1.2997507 1
0.6 0 -0.1495231 4
0.6 1 -0.0433644 2

9.4.8 Investigate

get_estimates(design,  data) |> kable()
estimator term estimate std.error statistic p.value conf.low conf.high df outcome
Matched Z 0.3219382 0.5401371 0.5960305 0.5613898 -0.8449571 1.488833 13 Y

9.4.9 By hand

sets <- 
  data|> group_by(X,Z) |> summarize(Y = mean(Y), n = n()) |>
  group_by(X) |> summarize(diff = Y[2] - Y[1], n = n[2])

sets |> kable()
X diff n
0.2 -0.6756465 1
0.4 1.7510819 1
0.6 0.1061586 2
weighted.mean(sets$diff, sets$n)
[1] 0.3219382

9.4.10 Alternative method

Alternative arguments can be provided to the matching function, which then passes them along to MatchIt:

design <-
  declare_model(
    N = 20, 
    U = rnorm(N), 
    X = sample(c(.2, .4, .6, .8), N, replace = TRUE),
    Z = rbinom(N, 1, prob = X),
    Y_Z_0 = 0.2 * X + U,
    Y_Z_1 = Y_Z_0 + 0.5,
    Y = reveal_outcomes(Y~Z)
  ) + 
  declare_inquiry(ATT = mean(Y_Z_1[Z==1] - Y_Z_0[Z==1])) +
  declare_step(handler = matching_function, 
               method =  "full",
               distance = "glm", link = "probit") +
  declare_estimator(Y ~ Z, weights = weights, label = "Matched")

draw_estimates(design) |> kable()
estimator term estimate std.error statistic p.value conf.low conf.high df outcome
Matched Z 0.4901459 0.9547857 0.5133569 0.6139452 -1.515784 2.496076 18 Y

9.4.11 Matching estimator for DeclareDesign

For flexibility (e.g. if you want to add multiple estimators) we can turn the matching and estimation into a single estimation function:

matching_estimator <- 
  function(data, match_formula = "Z ~ X", analysis_formula = "Y ~ Z", treatment = "Z", method = "exact", ...) { 
    matched <- as.formula(match_formula) |> 
      MatchIt::matchit(method = method, data = data, ...) 
    data <- MatchIt::match.data(matched) 
    # Weighted regression on the matched data; keep only the treatment term
    lm_robust(as.formula(analysis_formula), 
              data = data, weights = weights) |> 
      tidy() |> filter(term == treatment)
  }

design <- design[[1]] + design[[2]] + 
  declare_estimator(handler = label_estimator(matching_estimator), 
                    match_formula = "Z ~ X", analysis_formula = "Y ~ Z",
                    method =  "full", distance = "glm", link = "probit",
                    inquiry = "ATT")

design |> draw_estimates() |> kable()
estimator term estimate std.error statistic p.value conf.low conf.high df outcome inquiry
estimator Z 0.1656742 0.5254799 0.3152816 0.7561737 -0.9383182 1.269667 18 Y ATT

9.4.12 Efficiency considerations

Here there is no actual confounding and there are efficiency losses from matching. Why?

design <-
  declare_model(
    N = 20, 
    U = rnorm(N), 
    X = sample(c(.2, .4, .6, .8), N, replace = TRUE),
    Z = rbinom(N, 1, prob = .5),
    Y_Z_0 = 0.2 * X + U,
    Y_Z_1 = Y_Z_0 + 0.5,
    Y = reveal_outcomes(Y~Z)
  ) + 
  declare_inquiry(
    ATE = mean(Y_Z_1 - Y_Z_0), ATT = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "Unmatched", inquiry = "ATE") +
  declare_estimator(handler = label_estimator(matching_estimator), 
                    label = "Matched", inquiry = "ATT")

diagnose_design(design) |> reshape_diagnosis() |> select(Estimator, RMSE) |> kable()
Estimator RMSE
Unmatched 0.49
Matched 0.53

9.4.13 Puzzle

Say that in fact \(X\) does not affect \(Z\); then is there anything at stake in going from the ATE to the ATT?

9.4.14 For real

The MatchIt package (Ho et al. 2011) provides multiple methods and diagnostic tools to implement different strategies.

The below example of full propensity matching is from their vignettes


  • Choose covariates (close the backdoor)
  • Choose matching method
  • Check matches
  • Maybe revise method
  • Implement

9.4.15 For real

The “Lalonde” data has been used by many researchers to assess the merits of different strategies. It is used to assess the effect of job training on earnings. We have before and after earnings data and various characteristics.

The basic analysis would yield:

lm_robust(re78 ~ treat, data = lalonde) |> tidy() |> kable()
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) 6984.1697 352.1654 19.8320697 0.0000000 6292.570 7675.7691 612 re78
treat -635.0262 677.1954 -0.9377297 0.3487532 -1964.935 694.8824 612 re78

9.4.16 For real

But there are clear concerns regarding confounding:

lm_robust(treat ~ age + educ + race + married + nodegree + re74 +  re75, data = lalonde) |> 
  tidy() |> kable(digits = 2)
term estimate std.error statistic p.value conf.low conf.high df outcome
(Intercept) 0.34 0.13 2.67 0.01 0.09 0.59 605 treat
age 0.00 0.00 1.34 0.18 0.00 0.01 605 treat
educ 0.02 0.01 2.62 0.01 0.01 0.04 605 treat
racehispan -0.44 0.05 -8.10 0.00 -0.55 -0.34 605 treat
racewhite -0.53 0.04 -13.95 0.00 -0.60 -0.45 605 treat
married -0.09 0.04 -2.60 0.01 -0.16 -0.02 605 treat
nodegree 0.10 0.04 2.20 0.03 0.01 0.19 605 treat
re74 0.00 0.00 -2.26 0.02 0.00 0.00 605 treat
re75 0.00 0.00 0.75 0.46 0.00 0.00 605 treat

9.4.17 For real: Match

Here we match using a probit model to predict assignment probabilities:

pm <- 
  matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
          data = lalonde,
          method = "full",
          distance = "glm",
          link = "probit")
pm_data <- MatchIt::match.data(pm) 

pm_data |> arrange(subclass) |> head() |> kable(digits = 2)
treat age educ race married nodegree re74 re75 re78 distance weights subclass
NSW1 1 37 11 black 1 1 0.00 0.00 9930.05 0.64 1.00 1
PSID69 0 30 17 black 0 0 17827.37 5546.42 14421.13 0.63 2.32 1
NSW10 1 33 12 white 1 0 0.00 0.00 12418.07 0.04 1.00 2
PSID13 0 34 8 white 1 1 8038.87 11404.35 5486.80 0.04 0.06 2
PSID16 0 42 0 hispan 1 1 2797.83 10929.92 9922.93 0.05 0.06 2
PSID26 0 27 7 white 1 1 3064.29 8461.07 11149.45 0.04 0.06 2

“matching as pre-processing”

9.4.18 For real: Inspect


Huge gains in balance

9.4.19 For real: Estimate

lm(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75),
          data = pm_data, weights = weights) |>
  marginaleffects::avg_comparisons(
    variables = "treat",
    vcov = ~subclass, 
    newdata = subset(treat == 1))

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     1977        704 2.81  0.00501 7.6   596   3357

Term: treat
Type:  response 
Comparison: 1 - 0

9.4.20 Intuition: What are these weights?

pm_data <- pm_data |> 
  mutate(
       propensity = glm(
         treat ~ age + educ + race + married + nodegree + re74 + re75,
         data = pm_data, family = binomial(link = "probit"))$fitted.values)

pm_data |> 
  ggplot(aes(propensity, weights, color = subclass, shape = factor(treat))) +  geom_point() +
  scale_y_continuous(trans = "log") + theme(legend.position = "none")

Treated units all have weight 1. Where the propensity is low, control units get weights below 1, so treated units are weighted more heavily than controls.

9.4.21 Intuition: Comparison with IPW (by hand)

lm_robust(re78 ~ treat * (age + educ + race + married + nodegree + re74 + re75 + ipw),
   data = pm_data |> 
     mutate(ipw = 1/(treat*propensity + (1-treat)*(1-propensity))) |>
     filter(ipw <= 20), 
   weights = ipw) |>
  marginaleffects::avg_comparisons(
    variables = "treat",
    newdata = subset(treat == 1))

 Estimate Std. Error    z Pr(>|z|)   S 2.5 % 97.5 %
     1854        852 2.18   0.0296 5.1   184   3524

Term: treat
Type:  response 
Comparison: 1 - 0

9.5 Diff in diff

Key idea: the evolution of units in the control group allows you to impute what the evolution of units in the treatment group would have been had they not been treated.

9.5.1 Logic

We have group \(A\) that enters treatment at some point and group \(B\) that never does

The estimate:

\[\hat\tau = (\mathbb{E}[Y^A | post] - \mathbb{E}[Y^A | pre]) -(\mathbb{E}[Y^B | post] - \mathbb{E}[Y^B | pre])\] (how different is the change in \(A\) compared to the change in \(B\)?)

can be written:

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

9.5.2 Logic

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre])\]

Adding and subtracting \(\mathbb{E}[Y_0^A | post]\):

\[\hat\tau = (\mathbb{E}[Y_1^A | post] - \mathbb{E}[Y_0^A | post]) + ((\mathbb{E}[Y_0^A | post] - \mathbb{E}[Y_0^A | pre]) -(\mathbb{E}[Y_0^B | post] - \mathbb{E}[Y_0^B | pre]))\]

Cleaning up:

\[\hat\tau_{ATT} = \tau_{ATT} + \text{Difference in trends}\]
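
A minimal base R sketch of the two-group, two-period calculation. The data are hypothetical and noiseless: group and period effects are additive (so parallel trends holds by construction) and the treatment effect for group \(A\) in the post period is set to 3. The difference in differences of cell means and the interaction coefficient from a regression coincide.

```r
# Hypothetical noiseless data: A = 1 for the group that is treated post
d <- expand.grid(A = c(0, 1), post = c(0, 1), rep = 1:3)
d$Y <- 1 + 2 * d$A + 0.5 * d$post + 3 * d$A * d$post  # built-in effect of 3

# Cell means
cell <- function(a, p) mean(d$Y[d$A == a & d$post == p])

# (change in A) minus (change in B)
did_by_hand <- (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Same quantity as the interaction coefficient in a saturated regression
did_by_lm <- unname(coef(lm(Y ~ A * post, data = d))["A:post"])

c(did_by_hand = did_by_hand, did_by_lm = did_by_lm)
```

If the `0.5 * d$post` term were allowed to differ by group, the same calculation would return the effect plus the difference in trends.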

9.5.3 Simplest case

n_units <- 2
design <- 
  declare_model(
    unit = add_level(N = n_units, I = 1:N),
    time = add_level(N = 6, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(potential_outcomes(Y ~ I + T^.5 + Z*T)) +
  declare_assignment(Z = 1*(I>(n_units/2))*(T>3)) +
  declare_measurement(Y = reveal_outcomes(Y~Z)) + 
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0),
                  ATT = mean(Y_Z_1[Z==1] - Y_Z_0[Z==1])) +
  declare_estimator(Y ~ Z, label = "naive") + 
  declare_estimator(Y ~ Z + I, label = "FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "FE3")  

9.5.4 Diagnosis

Here only the two way fixed effects is unbiased and only for the ATT.

The ATT here is averaging over effects for treated units (later periods only). We know nothing about the size of effects in earlier periods when all units are in control!

design |> diagnose_design() 
Inquiry Estimator Bias
ATE FE1 2.25
ATE FE2 6.50
ATE FE3 1.50
ATE naive 5.40
ATT FE1 0.75
ATT FE2 5.00
ATT FE3 0.00
ATT naive 3.90

9.5.5 The classic graph

design |> 
  draw_data() |>
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

9.5.6 Extends to multiple units easily

design |> redesign(n_units = 10) |> diagnose_design() 
Inquiry Estimator Bias
ATE FE1 2.25
ATE FE2 6.50
ATE FE3 1.50
ATE naive 5.40
ATT FE1 0.75
ATT FE2 5.00
ATT FE3 0.00
ATT naive 3.90

9.5.7 Extends to multiple units easily

design |> 
  redesign(n_units = 10) |>  
  draw_data() |> 
  ggplot(aes(T, Y, color = unit)) + geom_line() +
       geom_point(aes(T, Y_Z_0)) + theme_bw()

9.5.8 In practice

  • Need to defend parallel trends
  • Most typically using an event study: predicting trends in a unit based on its history
  • Sometimes: report balance between treatment and control groups in covariates
  • Placebo leads and lags

9.5.9 Heterogeneity

  • Things get much more complicated when there is

    1. heterogeneous timing in treatment take up and
    2. heterogeneous effects
  • It’s only recently been appreciated how tricky things can get

  • But we already have an intuition from our analysis of trials with heterogeneous assignment and heterogeneous effects:

  • in such cases fixed effects analysis weights stratum level treatment effects by the variance in assignment to treatment

  • something similar here

9.5.10 Staggered assignments

Just two units assigned at different times:

trend = 0

design <- 
  declare_model(
    unit = add_level(N = 2, ui = rnorm(N), I = 1:N),
    time = add_level(N = 6, ut = rnorm(N), T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time))) +
  declare_model(
    potential_outcomes(Y ~ trend*T + (1+Z)*(I == 2))) +
  declare_assignment(Z = 1*((I == 1) * (T>3) + (I == 2) * (T>5))) +
  declare_measurement(Y = reveal_outcomes(Y~Z), 
                      I_c = I - mean(I)) +
  declare_inquiry(mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, label = "1. naive") + 
  declare_estimator(Y ~ Z + I, label = "2. FE1") + 
  declare_estimator(Y ~ Z + as.factor(T), label = "3. FE2") + 
  declare_estimator(Y ~ Z + I + as.factor(T), label = "4. TWFE") + 
  declare_estimator(Y ~ Z*I_c + as.factor(T), label = "5. Sat")  

9.5.11 Staggered assignments diagnosis

draw_data(design) |> mutate(I = factor(I)) |>
  ggplot(aes(T, Y, color = I, label = Z)) + geom_text()

9.5.12 Staggered assignments diagnosis

Estimator Mean Estimand Mean Estimate
1. naive 0.50 -0.12
(0.00) (0.00)
2. FE1 0.50 0.36
(0.00) (0.00)
3. FE2 0.50 -1.00
(0.00) (0.00)
4. TWFE 0.50 0.25
(0.00) (0.00)
5. Sat 0.50 0.50
(0.00) (0.00)

9.5.13 Where do these numbers come from?

  • The estimand is .5 – this comes from weighting the effect for unit 1 (0) and the effect for unit 2 (1) equally

  • The naive estimate is wildly off because it does not take into account that units with different treatment shares have different average levels in outcomes

9.5.14 Where do these numbers come from?

  • The estimate when we control for unit is 0.36: this comes from weighting the unit-stratum level effects according to the variance of assignment to each stratum:
design |> draw_data() |> group_by(unit) |> summarize(var = mean(Z)*(1-mean(Z))) |>
  mutate(weight = var/sum(var)) |> kable(digits = 2)
unit var weight
1 0.25 0.64
2 0.14 0.36
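
The 0.36 figure can be reproduced without DeclareDesign. This is a base R sketch that rebuilds the two-unit, six-period panel by hand under the assumptions of the design above (`trend = 0`, unit 1 treated after period 3, unit 2 after period 5); the object names are illustrative only.

```r
# Rebuild the staggered panel by hand (trend = 0)
panel <- expand.grid(T = 1:6, I = 1:2)
panel$Z <- as.numeric((panel$I == 1 & panel$T > 3) |
                        (panel$I == 2 & panel$T > 5))
panel$Y <- (1 + panel$Z) * (panel$I == 2)

# Within-unit treated-minus-control differences: 0 for unit 1, 1 for unit 2
unit_effects <- sapply(split(panel, panel$I), function(d)
  mean(d$Y[d$Z == 1]) - mean(d$Y[d$Z == 0]))

# Variance-of-assignment weights for each unit
v <- tapply(panel$Z, panel$I, function(z) mean(z) * (1 - mean(z)))
w <- v / sum(v)

by_hand <- sum(w * unit_effects)
by_lm   <- unname(coef(lm(Y ~ Z + factor(I), data = panel))["Z"])
round(c(by_hand = by_hand, by_lm = by_lm), 2)
```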

9.5.15 Where do these numbers come from?

  • The estimate when we control for time is -1: this comes from weighting the time-stratum level effects according to the variance of assignment to each stratum
  • It weights periods 4 and 5 only, and equally, yielding the difference between outcomes for unit 1 in treatment (0) and unit 2 in control (1)
design |> draw_data()  |> group_by(time) |> summarize(var = mean(Z)*(1-mean(Z))) |>
  mutate(weight = var/sum(var)) |> kable(digits = 2)
time var weight
1 0.00 0.0
2 0.00 0.0
3 0.00 0.0
4 0.25 0.5
5 0.25 0.5
6 0.00 0.0

9.5.16 Where do these numbers come from?

  • The estimate when we control for time and unit is 0.25

  • This is actually a lot harder to interpret:

  • We can figure out what it is from the “Goodman-Bacon decomposition” in Goodman-Bacon (2021)

9.5.17 Where do these numbers come from?

9.5.18 Where do these numbers come from?

In this case we can think of our data having the following structure:

y1 y2
pre 0 1
mid 0 1
post 0 2
  • the mid to pre diff in diff is (0-0) - (1-1) = 0 (group 2 serves as control)
  • the post to mid diff in diff is (2-1) - (0-0) = 1 (group 1 serves as control though already in treatment!)

TWFE gives a weighted average of these two, putting a 3/4 weight on the first and a 1/4 weight on the second
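
A base R sketch that checks this claim on the same two-unit panel, rebuilt by hand under the design's assumptions (`trend = 0`). The 3/4 and 1/4 weights are taken from the text rather than derived in the code; the object names are illustrative only.

```r
# Rebuild the staggered panel by hand (trend = 0)
panel <- expand.grid(T = 1:6, I = 1:2)
panel$Z <- as.numeric((panel$I == 1 & panel$T > 3) |
                        (panel$I == 2 & panel$T > 5))
panel$Y <- (1 + panel$Z) * (panel$I == 2)

# Two-way fixed effects estimate
twfe <- unname(coef(lm(Y ~ Z + factor(I) + factor(T), data = panel))["Z"])

# The two 2x2 comparisons
ybar <- function(i, periods) mean(panel$Y[panel$I == i & panel$T %in% periods])
pre <- 1:3; mid <- 4:5; post <- 6

# Unit 1 switches, unit 2 (not yet treated) is the control
dd_early <- (ybar(1, mid) - ybar(1, pre)) - (ybar(2, mid) - ybar(2, pre))
# Unit 2 switches, unit 1 (already treated) is the control
dd_late  <- (ybar(2, post) - ybar(2, mid)) - (ybar(1, post) - ybar(1, mid))

weighted_average <- 3/4 * dd_early + 1/4 * dd_late
c(twfe = twfe, weighted_average = weighted_average)
```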

9.5.19 Two way fixed effects


\[\hat\beta^{DD} = \mu_{12}\hat\beta^{2 \times 2, 1}_{12} + (1-\mu_{12})\hat\beta^{2 \times 2, 2}_{12}\]

where \(\mu_{12} = \frac{1-\overline{Z_1}}{1-(\overline{Z_1}-\overline{Z_2})} = \frac{1-\frac36}{1-\left(\frac36-\frac16\right)} = \frac34\)

\[\frac34\hat\beta^{2 \times 2, 1}_{12} + \frac14 \hat\beta^{2 \times 2, 2}_{12}\] (weights formula from WP version)

9.5.20 Two way fixed effects


  • \(\hat\beta^{2 \times 2, 1}_{12}\) is \((\overline{y}_1^{MID(1,2)} - \overline{y}_1^{PRE(1)}) - (\overline{y}_2^{MID(1,2)} - \overline{y}_2^{PRE(1)})\)

which in the simple example without time trends is \((0 - 0) - (1-1) = 0\)

  • \(\hat\beta^{2 \times 2, 2}_{12}\) is \((\overline{y}_2^{POST(2)} - \overline{y}_2^{MID(1,2)}) - (\overline{y}_1^{POST(2)} - \overline{y}_1^{MID(1,2)})\)

which is \((2 - 1) - (0 - 0) = 1\)

9.5.21 A problem

So quite complex weighting of different comparisons

  • Units effectively get counted as both treatment and control units for different comparisons
  • Treated units counted as control units
  • Effectively negative weights on some quantities
  • Possible to have very poorly performing estimates

9.5.22 Solutions

  • Involve better specification of estimands
  • Use of comparisons directly relevant for the estimands
  • Imputation of control outcomes in treated units using data from appropriate control units only and then targeting estimands directly (Borusyak, Jaravel, and Spiess 2021)
  • Particular solution: focus on the effect of treatment at the time of first treatment / or time of switching: this usually involves a no carryover assumption (De Chaisemartin and d’Haultfoeuille 2020) also Imai and Kim (2021)

9.5.23 Solutions

See excellent review by Roth et al. (2023)


9.5.24 Chaisemartin and d’Haultfoeuille (2020).

design <- 
  declare_model(
    unit = add_level(N = 4, I = 1:N),
    time = add_level(N = 8, T = 1:N, nest = FALSE),
    obs = cross_levels(by = join_using(unit, time),
                       potential_outcomes(Y ~ T + (1 + Z)*I))) +
  declare_assignment(Z = 1*(T > (I + 4))) +
  declare_measurement(
    Y = reveal_outcomes(Y~Z),
    Z_lag = lag_by_group(Z, groups = unit, n = 1, order_by = T)) +
  declare_inquiry(
    ATT_switchers = mean(Y_Z_1 - Y_Z_0), 
    subset = Z == 1 & Z_lag == 0 & !is.na(Z_lag)) +
  declare_estimator(
    Y = "Y",  G = "unit",  T = "T",  D = "Z", mode = "old", 
    handler = label_estimator(rdss::did_multiplegt_tidy),
    inquiry = c("ATT_switchers"),
    label = "chaisemartin"
  )

9.5.25 Chaisemartin and d’Haultfoeuille (2020)

Note the inquiry

run_design(design) |> kable(digits = 2)
inquiry estimand estimator estimate
ATT_switchers 2 chaisemartin 2

9.5.26 Triple differencing

  • A response to concerns that double differencing is not enough is to triple difference

  • When you think that there may be a violation of parallel trends but you have other outcomes that would pick up the same difference in trends

  • See Olden and Møen (2022)

9.5.27 Triple differencing

You are interested in the effects of influx of refugees on right wing voting

  • You have (say more conservative) states with no refugees at any period
  • You have (say more liberal) states with refugees post 2016 only

You want to do differences in differences comparing these states before and after

  • However you worry that things change differentially in the conservative and liberal states: no parallel trends

  • But you can identify areas within states that are more or less likely to be exposed and compare differences in differences in the exposed and unexposed groups.

9.5.28 Triple differencing


  • Two types of states: \(L \in \{0,1\}\), only \(L=1\) types get refugee influx
  • Two time periods: \(Post \in \{0,1\}\), refugee influx occurs in period \(Post = 1\)
  • Two groups: \(B \in \{0,1\}\), \(B=1\) types affected by refugee influx

\[Y = \beta_0 + \beta_1 L + \beta_2 B + \beta_3 Post + \beta_4 LB + \beta_5 L Post + \beta_6 B Post + \beta_7L B Post + \epsilon\]

9.5.29 Triple differencing

\[\frac{\partial ^3Y}{\partial L \partial B \partial Post} = \beta_7\]

9.5.30 Can we not just condition on the \(B=1\) types?

The level among the \(B=1\) types is:

\[Y = \beta_0 + \beta_1 L + \beta_2 + \beta_3 Post + \beta_4 L + \beta_5 L Post + \beta_6 Post + \beta_7L Post + \epsilon\]

If you did simple before / after differences among the \(B=1\) types you would get

\[\Delta Y| L = 1, B = 1 = \beta_3 + \beta_5 + \beta_6 + \beta_7\] \[\Delta Y| L = 0, B = 1 = \beta_3 + \beta_6\]

9.5.31 Can we not just condition on the \(B=1\) types?

And so if you differenced again you would get:

\[\Delta^2 Y| B = 1 = \beta_5 + \beta_7\]

So the problem is that you have \(\beta_5\) in here, which corresponds exactly to how \(L\) states change over time.

9.5.32 Triple

But we can figure out \(\beta_5\) by doing a diff in diff among the \(B=0\) types.

\[Y|B = 0 = \beta_0 + \beta_1 L + \beta_3 Post + \beta_5 L Post\]

\[\Delta^2 Y| B = 0 = \beta_5\]
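
The algebra can be verified on the eight cell means implied by the saturated model. This is a base R sketch with arbitrary, hypothetical coefficient values and no noise; the names `b`, `cells`, and `dd` are illustrative only.

```r
# Hypothetical coefficients beta_0, ..., beta_7
b <- c(b0 = 1, b1 = 0.5, b2 = -1, b3 = 2, b4 = 0.3, b5 = 0.7, b6 = -0.4, b7 = 1.5)

# Expected outcome in each of the 8 cells (L x B x Post)
cells <- expand.grid(L = 0:1, B = 0:1, Post = 0:1)
cells$Y <- with(cells,
  b[["b0"]] + b[["b1"]] * L + b[["b2"]] * B + b[["b3"]] * Post +
    b[["b4"]] * L * B + b[["b5"]] * L * Post + b[["b6"]] * B * Post +
    b[["b7"]] * L * B * Post)

y <- function(l, bb, p) cells$Y[cells$L == l & cells$B == bb & cells$Post == p]

# Difference in differences within a given B group
dd <- function(bb) (y(1, bb, 1) - y(1, bb, 0)) - (y(0, bb, 1) - y(0, bb, 0))

dd_B1 <- dd(1)          # beta_5 + beta_7
dd_B0 <- dd(0)          # beta_5
ddd   <- dd_B1 - dd_B0  # beta_7
c(dd_B1 = dd_B1, dd_B0 = dd_B0, ddd = ddd)
```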

9.5.33 Easier to swallow?

  • The identifying assumption is that absent treatment the differences in trends between \(L=0\) and \(L=1\) would be the same for units with \(B=0\) and \(B=1\).

  • See equation 5.4 in Olden and Møen (2022)

\[\begin{aligned} & \left(E[Y_0|L=1, B=1, {\textit {Post}}=1] - E[Y_0|L=1, B=1, {\textit {Post}}=0]\right) \\ & \quad - \ \left(E[Y_0|L=1, B=0, {\textit {Post}}=1] - E[Y_0|L=1, B=0, {\textit {Post}}=0]\right) \\ & = \\ & \left(E[Y_0|L=0, B=1, {\textit {Post}}=1] - E[Y_0|L=0, B=1, {\textit {Post}}=0]\right) \\ & \quad - \ \left(E[Y_0|L=0, B=0, {\textit {Post}}=1] - E[Y_0|L=0, B=0, {\textit {Post}}=0]\right) \end{aligned}\]

9.5.34 Easier to swallow?

  • In a sense this is one parallel trends assumption, not two

  • But there are four counterfactual quantities in this expression.

Puzzle: Is it possible for an effect to be correctly identified by a difference in differences design but not by a triple difference design?

10 Topics 2

10.1 Synthetic controls

Interested in the effect of an intervention on a single unit.

Available data: time series for unit of interest and possible comparison groups

10.1.1 Core idea

  • Generate a Frankenstein control from weighting a set of control units so that they collectively track the unit of interest and can plausibly serve as a counterfactual.

  • Identify weights \(w\) that minimize \(\left(Y_i^s(0) - \sum_{j\neq i} w_j Y_j^s(0)\right)^2\) in time period(s) \(s\), prior to treatment; use these to predict counterfactual outcomes \(Y^t_i(0)\) in later periods, \(t\)

  • Note, in Abadie and Gardeazabal (2003) weights are chosen to minimize \((X_1 - X_0W)'V(X_1 - X_0W)\) where \(X_1\), \(X_0\) are covariates in the unit of interest and the “donor” units respectively. \(V\) is chosen so that the pre treatment outcomes are as close as possible.
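
A minimal base R sketch of the outcome-only version of this idea, with two donor units so that a single weight \(w\) has to be found. All series are hypothetical, and the treated unit's pre-treatment path is constructed as an exact 0.85 / 0.15 mix of the donors so that the answer is known in advance. This is not the two-stage procedure used by the packages discussed here.

```r
# Hypothetical pre-treatment outcomes for two donor units
donor_a <- c(1.0, 1.4, 1.9, 2.1, 2.8)
donor_b <- c(3.0, 2.5, 2.9, 3.6, 3.1)

# Treated unit built as an exact mix, so the true weight on donor_a is 0.85
treated_pre <- 0.85 * donor_a + 0.15 * donor_b

# Squared pre-treatment prediction error for weights (w, 1 - w)
loss <- function(w) sum((treated_pre - (w * donor_a + (1 - w) * donor_b))^2)

# Weights constrained to be non-negative and to sum to one
w_hat <- optimize(loss, interval = c(0, 1))$minimum

# Use the weights to impute the untreated outcome in later periods
donor_a_post <- c(3.0, 3.3)
donor_b_post <- c(3.4, 3.9)
counterfactual <- w_hat * donor_a_post + (1 - w_hat) * donor_b_post
round(w_hat, 3)
```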

10.2 Optimization problem

In practice weights are found using a two stage minimization procedure:

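  1. Given matrix \(V\) choose \(W(V)\) to minimize \((X_1-X_0W)'V(X_1-X_0W)\)
  2. Choose \(V\) to minimize \((Z_1-Z_0W(V))'(Z_1-Z_0W(V))\), where \(Z_1\) and \(Z_0\) are the pre-treatment outcomes of the unit of interest and of the donor units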

So in effect you end up with weights on cases and also weights on variables

10.2.1 Basque example

  • The seminal study is: Abadie and Gardeazabal (2003) which examines the impact of conflict on the Basque area

  • A vignette showing how to use the SCtools package to implement synthetic controls for this example is available at https://cran.r-project.org/web/packages/SCtools/vignettes/replicating-basque.html

  • We will walk through this and then interrogate this with a common package and a tidy version

10.2.2 Basque example

Packages and data:


10.2.3 Basque example

Some fairly nasty data manipulation:

f_dataprep  <- function(data, time_plot = min(data$year):max(data$year))
  dataprep(
  foo = data,
  predictors = c("school.illit", "school.prim", "school.med",
     "school.high", "school.post.high", "invest"),
  predictors.op = "mean",
  time.predictors.prior = 1964:1969,
  special.predictors = list(
    list("gdpcap", 1960:1969 ,"mean"),
    list("sec.agriculture", seq(1961, 1969, 2), "mean"),
    list("sec.energy", seq(1961, 1969, 2), "mean"),
    list("sec.industry", seq(1961, 1969, 2), "mean"),
    list("sec.construction", seq(1961, 1969, 2), "mean"),
    list("sec.services.venta", seq(1961, 1969, 2), "mean"),
    list("sec.services.nonventa", seq(1961, 1969, 2), "mean"),
    list("popdens",               1969,               "mean")),
  dependent = "gdpcap",
  unit.variable = "regionno",
  unit.names.variable = "regionname",
  time.variable = "year",
  treatment.identifier = 17,
  controls.identifier = c(2:16, 18),
  time.optimize.ssr = 1960:1969,
  time.plot = time_plot)

dataprep.out  <- f_dataprep(basque)

10.2.4 Implementation (with messages)

synth.out <- synth(data.prep.obj = dataprep.out, method = "BFGS")

X1, X0, Z1, Z0 all come directly from dataprep object.

 searching for synthetic control unit  


MSPE (LOSS V): 0.008864606 

solution.v:
 0.02773094 1.194e-07 1.60609e-05 0.0007163836 1.486e-07 0.002423908 0.0587055 0.2651997 0.02851006 0.291276 0.007994382 0.004053188 0.009398579 0.303975 

solution.w:
 2.53e-08 4.63e-08 6.44e-08 2.81e-08 3.37e-08 4.844e-07 4.2e-08 4.69e-08 0.8508145 9.75e-08 3.2e-08 5.54e-08 0.1491843 4.86e-08 9.89e-08 1.162e-07 

10.2.5 Results

This produces the following weights, selected to make the pre-treatment trajectory of the synthetic control as similar as possible to that of the actual treated unit.

regionno regionname w.weight
1 Spain (Espana) NA
2 Andalucia 0.00
3 Aragon 0.00
4 Principado De Asturias 0.00
5 Baleares (Islas) 0.00
6 Canarias 0.00
7 Cantabria 0.00
8 Castilla Y Leon 0.00
9 Castilla-La Mancha 0.00
10 Cataluna 0.85
11 Comunidad Valenciana 0.00
12 Extremadura 0.00
13 Galicia 0.00
14 Madrid (Comunidad De) 0.15
15 Murcia (Region de) 0.00
16 Navarra (Comunidad Foral De) 0.00
17 Basque Country (Pais Vasco) NA
18 Rioja (La) 0.00

10.2.6 Plotting the counterfactual

Let's plot by hand and compare with the package output.

basque_df <- basque |> filter(regionno == 17) |> 
  mutate(counterfactual = .15*basque$gdpcap[basque$regionno == 14] + .85*basque$gdpcap[basque$regionno == 10] )

basque_df |> ggplot(aes(year, gdpcap)) + geom_line(color = "blue") + 
  geom_line(aes(year, counterfactual), color = "red") + theme_bw() +
  ylab("real per-capita GDP (1986 USD, thousand)") + xlab("year")

10.2.7 Plotting 2

path.plot(synth.res = synth.out, dataprep.res = dataprep.out,
          Ylab = "real per-capita GDP (1986 USD, thousand)", Xlab = "year",
          Ylim = c(0, 12), Legend = c("Basque country",
                                      "synthetic Basque country"), 
          Legend.position = "bottomright")

10.2.8 Then what?

What is your estimate?

How do you do inference?

  • Estimate could be at a particular point in time or averaging over a range. Inference often based on placebos: assess whether estimated effects are large relative to what you would get if you looked for effects in random places.

  • This has an RI flavor to it, but we are not exploiting any actual randomization.

  • Alternatives are to do sensitivity analyses, bootstrap (though this is really capturing sampling variability), and perhaps using estimates of the uncertainty from errors in pre treatment predictions.
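
A base R sketch of one common way to operationalize the placebo idea: compare the treated unit's ratio of post-treatment to pre-treatment root mean squared prediction error with the same ratio for placebo units. The gap values (actual minus synthetic outcomes) are made up for illustration, and this is not necessarily the exact calculation performed by the packages discussed here.

```r
# Hypothetical gaps (actual minus synthetic); row 1 is the treated unit,
# rows 2 to 5 are placebo runs on donor units
gaps_pre <- rbind(c( 0.1, -0.1,  0.1),
                  c( 0.2, -0.2,  0.1),
                  c(-0.1,  0.1,  0.2),
                  c( 0.3, -0.2, -0.1),
                  c( 0.1,  0.2, -0.2))
gaps_post <- rbind(c(-1.0, -1.2, -1.5),
                   c( 0.2, -0.1,  0.3),
                   c( 0.1,  0.2, -0.1),
                   c(-0.2,  0.3,  0.1),
                   c( 0.3, -0.1,  0.2))

# Root mean squared prediction error per unit
rmspe <- function(g) sqrt(rowMeans(g^2))

# Post/pre ratio: large values mean the post-period gap is unusual
# relative to how well the unit was fit before treatment
ratio <- rmspe(gaps_post) / rmspe(gaps_pre)

# Share of units (treated included) with a ratio at least as large
p_value <- mean(ratio >= ratio[1])
p_value
```

With only five units the smallest attainable value is 1/5, which illustrates how the size of the donor pool limits this kind of inference.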

10.2.9 tidysynth implementation

tidysynth is a wrapper for Synth, and tidier for sure.

Vignette on: Impact of Proposition 99 on cigarette consumption in California. https://github.com/edunford/tidysynth

smoking |> head() |> kable()
state year cigsale lnincome beer age15to24 retprice
Rhode Island 1970 123.9 NA NA 0.1831579 39.3
Tennessee 1970 99.8 NA NA 0.1780438 39.9
Indiana 1970 134.6 NA NA 0.1765159 30.6
Nevada 1970 189.5 NA NA 0.1615542 38.9
Louisiana 1970 115.9 NA NA 0.1851852 34.3
Oklahoma 1970 108.4 NA NA 0.1754592 38.4

10.2.10 tidysynth implementation

In a pipe:

smoking_out <-
  smoking %>%
  # initialize the synthetic control object
  synthetic_control(
    outcome = cigsale, # outcome
    unit = state, # unit index in the panel data
    time = year, # time index in the panel data
    i_unit = "California", # unit where the intervention occurred
    i_time = 1988, # time period when the intervention occurred
    generate_placebos=T # generate placebo synthetic controls (for inference)
  )

10.2.11 add predictor information

smoking_out <- smoking_out %>%
  # Generate the aggregate predictors used to fit the weights
  # average log income, retail price of cigarettes, and proportion of the
  # population between 15 and 24 years of age from 1980 - 1988
  generate_predictor(time_window = 1980:1988,
                     ln_income = mean(lnincome, na.rm = T),
                     ret_price = mean(retprice, na.rm = T),
                     youth = mean(age15to24, na.rm = T)) %>%
  # average beer consumption in the donor pool from 1984 - 1988
  generate_predictor(time_window = 1984:1988, 
                     beer_sales = mean(beer, na.rm = T)) %>%
  # Lagged cigarette sales 
  generate_predictor(time_window = 1975, cigsale_1975 = cigsale) %>%
  generate_predictor(time_window = 1980, cigsale_1980 = cigsale) %>%
  generate_predictor(time_window = 1988, cigsale_1988 = cigsale) 

10.2.12 implement

smoking_out <- smoking_out %>%
  # Generate the fitted weights for the synthetic control
  generate_weights(optimization_window = 1970:1988, # time for optimization
                   margin_ipop = .02,sigf_ipop = 7,bound_ipop = 6 # options
  ) %>%
  # Generate the synthetic control
  generate_control()

10.2.14 plot weights

smoking_out %>% plot_weights()

10.2.15 plot placebos

smoking_out %>% plot_placebos()

10.2.16 Design declaration

sc_helper <- function(data, treatment_time = 2010) {

  data |>
  # initialize the synthetic control object
  synthetic_control(
    outcome = Y, # outcome
    unit = state, # unit index in the panel data
    time = year, # time index in the panel data
    i_unit = "01", # unit where the intervention occurred
    i_time = treatment_time, # time period when the intervention occurred
    generate_placebos = F  # generate placebo synthetic controls
    )  |>
  generate_predictor(time_window = 2000:2020,
                     lnincome = mean(ln_income, na.rm = T),
                     youth = mean(age15to24, na.rm = T)) %>%
  generate_weights(optimization_window = 2000:2010, # time for optimization
                   margin_ipop = .02,sigf_ipop = 7,bound_ipop = 6 # options
  ) |>
  # Generate the synthetic control
  generate_control()
}

10.2.17 Declare an estimator

sc_estimator <- function(data, treatment_time = 2010) 
  data.frame(estimate = 
               sc_helper(data, treatment_time = treatment_time) |>
               grab_synthetic_control() |> 
               mutate(difference = real_y - synth_y) |>
               filter(time_unit >= treatment_time) |> 
               pull(difference) |> 
               mean())  # average post-treatment gap between real and synthetic unit

10.2.18 Design

design <-
  declare_model(
    state = add_level(50, 
                      youth_mean = runif(N, .2, .7), 
                      income_mean = runif(N, 1, 5), 
                      b = runif(N, 0, 10)),
    time = add_level(21, year = 2000:2020, nest = FALSE),
    unit = cross_levels(join_using(state, time))) +
  declare_model(
    treatment = 1*(state == "01" & (year >= 2010)),
    age15to24 = youth_mean + .5*runif(N, -.1, .1) - .2*(year - 2000),
    ln_income = income_mean + .5*rlnorm(N),
    Y = 5*age15to24 + .1*(year - 2000)^2 + 1*ln_income + b*treatment) +
  declare_inquiry(b[1]) +
  declare_estimator(handler = label_estimator(sc_estimator))

10.2.19 Inspection

draw_data(design) |> sc_helper()  |> plot_trends()

10.2.20 Simulation

simulations <- simulate_design(design, sims = 10)

simulations |> ggplot(aes(estimand, estimate)) + geom_point()

10.3 Noncompliance and the LATE estimand

10.3.1 Local Average Treatment Effects

Sometimes you give a medicine but only a nonrandom sample of people actually try to use it. Can you still estimate the medicine’s effect?

|     | X=0 | X=1 |
|-----|-----|-----|
| Z=0 | \(\overline{y}_{00}\) (\(n_{00}\)) | \(\overline{y}_{01}\) (\(n_{01}\)) |
| Z=1 | \(\overline{y}_{10}\) (\(n_{10}\)) | \(\overline{y}_{11}\) (\(n_{11}\)) |

Say that people are one of 3 types:

  1. \(n_a\) “always takers” have \(X=1\) no matter what and have average outcome \(\overline{y}_a\)
  2. \(n_n\) “never takers” have \(X=0\) no matter what with outcome \(\overline{y}_n\)
  3. \(n_c\) “compliers” have \(X=Z\) and average outcomes \(\overline{y}^1_c\) if treated and \(\overline{y}^0_c\) if not.

10.3.2 Local Average Treatment Effects

We can figure something about types:

|         | \(X=0\) | \(X=1\) |
|---------|---------|---------|
| \(Z=0\) | \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}^0_{c}+\frac{\frac{1}{2}n_n}{\frac{1}{2}n_c + \frac{1}{2}n_n} \overline{y}_{n}\) | \(\overline{y}_{a}\) |
| \(Z=1\) | \(\overline{y}_{n}\) | \(\frac{\frac{1}{2}n_c}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}^1_{c}+\frac{\frac{1}{2}n_a}{\frac{1}{2}n_c + \frac{1}{2}n_a} \overline{y}_{a}\) |

10.3.3 Local Average Treatment Effects

You give a medicine to 50% but only a non random sample of people actually try to use it. Can you still estimate the medicine’s effect?

|         | \(X=0\) | \(X=1\) |
|---------|---------|---------|
| \(Z=0\) | \(\frac{n_c}{n_c + n_n} \overline{y}^0_{c}+\frac{n_n}{n_c + n_n} \overline{y}_n\) | \(\overline{y}_{a}\) |
| (n)     | (\(\frac{1}{2}(n_c + n_n)\)) | (\(\frac{1}{2}n_a\)) |
| \(Z=1\) | \(\overline{y}_{n}\) | \(\frac{n_c}{n_c + n_a} \overline{y}^1_{c}+\frac{n_a}{n_c + n_a} \overline{y}_{a}\) |
| (n)     | (\(\frac{1}{2}n_n\)) | (\(\frac{1}{2}(n_a+n_c)\)) |

Key insight: the contributions of the \(a\)s and \(n\)s are the same in the \(Z=0\) and \(Z=1\) groups so if you difference you are left with the changes in the contributions of the \(c\)s.

10.3.4 Local Average Treatment Effects

Average in \(Z=0\) group: \(\frac{{n_c} \overline{y}^0_{c}+ \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a\right)}{n_a+n_c+n_n}\)

Average in \(Z=1\) group: \(\frac{{n_c} \overline{y}^1_{c} + \left(n_{n}\overline{y}_{n} +{n_a} \overline{y}_a \right)}{n_a+n_c+n_n}\)

So, the difference is the ITT: \(({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\)

Last step:

\[ITT = ({\overline{y}^1_c-\overline{y}^0_c})\frac{n_c}{n}\]


\[LATE = \frac{ITT}{\frac{n_c}{n}}= \frac{\text{Intent to treat effect}}{\text{First stage effect}}\]
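The ratio can be checked numerically. The sketch below is a minimal base R simulation under assumed values (type shares of 20% always takers, 30% never takers, 50% compliers; a complier effect of 2; all object names are illustrative and not from the course code). It shows the ITT divided by the first stage landing near the complier effect, while a comparison by take-up does not.

```r
set.seed(1)
n <- 10000

# Hypothetical type shares: 20% always takers, 30% never takers, 50% compliers
type <- sample(c("a", "n", "c"), n, replace = TRUE, prob = c(0.2, 0.3, 0.5))

# Random assignment of the encouragement Z to half of the units
Z <- sample(rep(0:1, n / 2))

# Take-up: always takers take, never takers do not, compliers follow Z
X <- ifelse(type == "a", 1, ifelse(type == "n", 0, Z))

# Outcomes: effect of 2 for compliers, 1 for everyone else (assumed values);
# baselines differ by type, so take-up is non random
effect   <- ifelse(type == "c", 2, 1)
baseline <- c(a = 3, n = 0, c = 1)[type]
Y <- unname(baseline + effect * X + rnorm(n))

itt         <- mean(Y[Z == 1]) - mean(Y[Z == 0])  # intent to treat effect
first_stage <- mean(X[Z == 1]) - mean(X[Z == 0])  # estimates n_c / n
late_hat    <- itt / first_stage                  # ratio (Wald) estimate
naive       <- mean(Y[X == 1]) - mean(Y[X == 0])  # comparison by take-up

round(c(itt = itt, first_stage = first_stage, late = late_hat, naive = naive), 2)
```

Note that the estimate targets the complier effect (2 here), not the population average effect (1.5 under these assumed values).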

10.3.5 The good and the bad of LATE

  • You get a well-defined estimate even when there is non-random take-up
  • May sometimes be used to assess mediation or knock-on effects
  • But:
    • You need assumptions (monotonicity and the exclusion restriction – where were these used above?)
    • Your estimate is only for a subpopulation
    • The subpopulation is not chosen by you and is unknown
    • Different encouragements may yield different estimates since they may encourage different subgroups

10.3.6 Pearl and Chickering again

With and without an imposition of monotonicity


models <- 
  list(unrestricted =  make_model("Z -> X -> Y; X <-> Y"),
       restricted =  make_model("Z -> X -> Y; X <-> Y") |>
         set_restrictions("X[Z=1] < X[Z=0]")) |> 
  lapply(update_model,  data = lipids_data, refresh = 0) 

models |>
  query_model(query = list(CATE = "Y[X=1] - Y[X=0]", 
                           Nonmonotonic = "X[Z=1] < X[Z=0]"),
              given = list("X[Z=1] > X[Z=0]", TRUE),
              using = "posteriors") 

10.3.7 Pearl and Chickering again

With and without an imposition of monotonicity:

| model        | query        | mean | sd   |
|--------------|--------------|------|------|
| unrestricted | CATE         | 0.70 | 0.05 |
| restricted   | CATE         | 0.71 | 0.05 |
| unrestricted | Nonmonotonic | 0.01 | 0.01 |
| restricted   | Nonmonotonic | 0.00 | 0.00 |

In one case we assume monotonicity, in the other we update on it (easy in this case because of the empirically verifiable nature of one-sided noncompliance).

10.4 Ethics

10.4.1 Constraint: Is it ethical to manipulate subjects for research purposes?

  • There is no foundationless answer to this question.

  • Belmont principles commonly used for guidance:

    1. Respect for persons
    2. Beneficence
    3. Justice
  • Unfortunately, operationalizing these requires further ethical theories. (1) is often operationalized by informed consent (a very liberal idea). (2) and (3) sometimes by more utilitarian principles.

  • The major focus on (1) by IRBs might follow from the view that if subjects consent, then they endorse the ethical calculations made for 2 and 3 — they think that it is good and fair.

  • Trickiness: can a study be good or fair because of implications for non-subjects?

10.4.2 Is it ethical to manipulate subjects for research purposes?

  • Many (many) field experiments have nothing like informed consent.

  • For example, whether the government builds a school in your village, whether an ad appears on your favorite radio show, and so on.

  • Consider three cases:

    1. You work with a nonprofit to post (true?) posters about the crimes of politicians on billboards to see effects on voters
    2. You hire confederates to offer bribes to police officers to see if they are more likely to bend the law for coethnics
    3. The British government asks you to work on figuring out how the use of water cannons helps stop rioters rioting

10.4.3 Is it ethical to manipulate subjects for research purposes?

  • Consider three cases:

    1. You work with a nonprofit to post (true?) posters about the crimes of politicians on billboards to see effects on voters
    2. You hire confederates to offer bribes to police officers to see if they are more likely to bend the law for coethnics
    3. The British government asks you to work on figuring out how the use of water cannons helps stop rioters rioting
  • In all cases, there is no consent given by subjects.

  • In 2 and 3, the treatment is possibly harmful for subjects, and the results might also be harmful. But even in case 1, there could be major unintended harmful consequences.

  • In cases 1 and 3, however, the “intervention” is within the sphere of normal activities for the implementer.

10.4.4 Constraint: Is it ethical to manipulate subjects for research purposes?

  • Sometimes it is possible to use this point of difference to make a “spheres of ethics” argument for “embedded experimentation.”

  • Spheres of Ethics Argument: Experimental research that involves manipulations that are not normally appropriate for researchers may nevertheless be ethical if:

    • Researchers and implementers agree on a division of responsibility where implementers take on responsibility for actions
    • Implementers have legitimacy to make these decisions within the sphere of the intervention
    • Implementers are indeed materially independent of researchers (no swapping hats)

10.4.5 Constraint: Is it ethical to manipulate subjects for research purposes?

  • Difficulty with this argument:
    • Question begging: How to determine the legitimacy of the implementer? (Can we rule out Nazi doctors?)

Otherwise keep the focus on consent, and desist if consent is not possible.

10.4.6 APSA Guidelines

  • Available here
  • Used e.g. by APSR
  • Below is lightly abbreviated; full text however has detailed guidelines

10.4.7 APSA Ethics: General [Abbr]

  1. Political science researchers should respect autonomy, consider the wellbeing of participants and other people affected by their research, and be open about the ethical issues they face.

  2. Political science researchers have an individual responsibility to consider the ethics of their research related activities and cannot outsource ethical reflection to review boards, other institutional bodies, or regulatory agencies.

  3. These principles describe the standards of conduct and reflexive openness that are expected of political science researchers. … [In cases of reasonable deviations], researchers should acknowledge and justify deviations in scholarly publications and presentations of their work.

10.4.8 APSA Ethics: Power

  1. When designing and conducting research, political scientists should be aware of power differentials between researcher and researched, and the ways in which such power differentials can affect the voluntariness of consent and the evaluation of risk and benefit.
    1. especially with low-power or vulnerable participants
    2. covert or deceptive research with more than minimal harm may sometimes be ethically permissible in research with powerful parties

10.4.10 APSA Ethics: Deception

  1. Political science researchers should carefully consider any use of deception and the ways in which deception can conflict with participant autonomy.
    1. ask: is it plausible that engaged individuals would withhold consent if fully informed consent were sought?
    2. disclose, justify, and describe steps taken to respect participant autonomy.

[Note: no general injunction against]

10.4.11 APSA Ethics: Harm and Trauma

  1. Political science researchers should consider the harms associated with their research.
    • Researchers should generally avoid harm when possible, minimize harm when avoidance is not possible, and not conduct research when harm is excessive.

    • Do not limit concern to physical and psychological risks to the participant.

  2. Political science researchers should anticipate and protect individual participants from trauma stemming from participation in research.

10.4.12 APSA Ethics: Confidentiality

  1. Political science researchers should generally keep the identities of research participants confidential; when circumstances require, researchers should adopt the higher standard of ensuring anonymity.
    1. Researchers should clearly communicate assurances of confidentiality / anonymity
    2. If confidentiality is not provided, communicate this and justify it; consider risks at all stages
    3. Researchers who determine that it would be unethical to share materials derived from human subjects should be prepared to justify their decision to journal editors, to reviewers, etc.

10.4.13 APSA Ethics: Impact

  1. Political science researchers conducting studies on political processes should consider the broader social impacts of the research process as well as the impact on the experience of individuals directly engaged by the research. In general, political science researchers should not compromise the integrity of political processes for research purposes without the consent of individuals that are directly engaged by the research process.
    1. There are cases in which research that produces impacts on political processes without the consent of individuals directly engaged by the research might be appropriate. [examples]

    2. Studies of interventions by third parties do not usually invoke this principle on impact. [details]

    3. This principle is not intended to discourage any form of political engagement by political scientists in their non-research activities or private lives.

    4. Researchers should report likely impacts.

10.4.14 APSA Ethics: Laws, Regulations, and Prospective Review

  1. Political science researchers should be aware of relevant laws and regulations governing their research related activities.

10.4.15 APSA Ethics: Shared Responsibility

  1. The responsibility to promote ethical research goes beyond the individual researcher or research team.
  1. Mentors, advisors, dissertation committee members, and instructors

  2. Graduate programs in political science should include ethics instruction in their formal and informal graduate curricula;

  3. Editors and reviewers should encourage researchers to be open about the ethical decisions …

  4. Journals, departments, and associations should incorporate ethical commitments into their mission, bylaws, instruction, practices, and procedures.

10.5 Survey experiments

  • Survey experiments are used to measure things: nothing (except answers) should be changed!
  • If the experiment in the survey is changing things then it is a field experiment in a survey, not a survey experiment

10.5.1 The list experiment: Motivation

  • Multiple survey experimental designs have been generated to make it easier for subjects to answer sensitive questions

  • The key idea is to use inference rather than measurement.

  • Subjects are placed in different conditions and the conditions affect the answers that are given in such a way that you can infer some underlying quantity of interest

10.5.2 The list experiment: Motivation

This is an obvious DAG but the main point is to be clear that the Value is the quantity of interest and the value is not affected by the treatment, Z.

10.5.3 The list experiment: Motivation

The list experiment supposes that:

  1. Subjects do not want to give a direct answer to a question
  2. They nevertheless are willing to truthfully answer an indirect question

In other words: sensitivities notwithstanding, they are happy for the researcher to make correct inferences about them or their group

10.5.4 The list experiment: Strategy

  • Respondents are given a short list and a long list.

  • The long list differs from the short list in having one extra item—the sensitive item

  • We ask how many items in each list a respondent agrees with:

    • \(Y_i(0)\) is the number of elements on a short list that a respondent agrees with
    • \(Y_i(1)\) is the number of elements on a long list that a respondent agrees with
    • \(Y_i(1) - Y_i(0)\) is an indicator for whether an individual agrees with the sensitive item
    • \(\mathbb{E}[Y_i(1) - Y_i(0)]\) is the share of people agreeing with sensitive item
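The logic of the strategy can be sketched in a few lines of base R. This is an illustration under assumed values (three control items, a true prevalence of 0.3, and truthful answers to both lists); the variable names are hypothetical. The difference in mean counts between the long-list and short-list groups estimates the share agreeing with the sensitive item, even though no individual answer to that item is observed.

```r
set.seed(2)
n <- 5000

control_count <- rbinom(n, size = 3, prob = 0.5)  # agreement with control items
Y_star <- rbinom(n, size = 1, prob = 0.3)          # agrees with the sensitive item
Z <- sample(rep(0:1, n / 2))                       # 1 = respondent sees the long list

Y0 <- control_count            # potential answer to the short list
Y1 <- control_count + Y_star   # potential answer to the long list
Y  <- ifelse(Z == 1, Y1, Y0)   # only one answer is observed per respondent

# Difference in means estimates E[Y(1) - Y(0)], the prevalence of the sensitive item
prevalence_hat <- mean(Y[Z == 1]) - mean(Y[Z == 0])
round(c(truth = mean(Y_star), estimate = prevalence_hat), 3)
```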

10.5.5 The list experiment: Simplified example

How many of these do you agree with:

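|        | Short list  | Long list                | “Effect”        |
|--------|-------------|--------------------------|-----------------|
|        | “2 + 2 = 4” | “2 + 2 = 4”              |                 |
|        | “2 * 3 = 6” | “2 * 3 = 6”              |                 |
|        | “3 + 6 = 8” | “Climate change is real” |                 |
|        |             | “3 + 6 = 8”              |                 |
| Answer | Y(0) = 2    | Y(1) = 3                 | Y(1) - Y(0) = 1 |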

[Note: this is obviously not a good list. Why not?]

10.5.6 The list experiment: Design

declaration_17.3 <-
  declare_model(
    N = 500,
    control_count = rbinom(N, size = 3, prob = 0.5),
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count) 
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  declare_estimator(Y_list ~ Z, .method = difference_in_means, 
                    inquiry = "prevalence_rate")

diagnosands <- declare_diagnosands(
  bias = mean(estimate - estimand),
  mean_CI_width = mean(conf.high - conf.low)
)

10.5.7 Diagnosis

diagnose_design(declaration_17.3, diagnosands = diagnosands)

| Design           | Inquiry         | Bias   | Mean CI Width |
|------------------|-----------------|--------|---------------|
| declaration_17.3 | prevalence_rate | 0.00   | 0.32          |
|                  |                 | (0.00) | (0.00)        |

10.5.8 Tradeoffs: is the question really sensitive?

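# Illustrative starting values (assumed here); redesign() varies both later
N <- 500
proportion_hiding <- 0.2

declaration_17.4 <- 
  declare_model(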
    N = N,
    U = rnorm(N),
    control_count = rbinom(N, size = 3, prob = 0.5),
    Y_star = rbinom(N, size = 1, prob = 0.3),
    W = case_when(Y_star == 0 ~ 0L,
                  Y_star == 1 ~ rbinom(N, size = 1, prob = proportion_hiding)),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z),
                      Y_direct = Y_star - W) +
  declare_estimator(Y_list ~ Z, inquiry = "prevalence_rate", label = "list") + 
  declare_estimator(Y_direct ~ 1, inquiry = "prevalence_rate", label = "direct")

10.5.9 Diagnosis

declaration_17.4 |> 
  redesign(proportion_hiding = seq(from = 0, to = 0.3, by = 0.1), 
           N = seq(from = 500, to = 2500, by = 500)) |> 
  diagnose_design()

10.5.10 Negatively correlated items

  • How would estimates be affected if the items selected for the list were negatively correlated?
  • How would subject protection be affected?

10.5.11 Negatively correlated items

rho <- -.8 

correlated_lists <- 
  declare_model(
    N = 500,
    U = rnorm(N),
    control_1 = rbinom(N, size = 1, prob = 0.5),
    control_2 = correlate(given = control_1, rho = rho, draw_binary, prob = 0.5),
    control_count = control_1 + control_2,
    Y_star = rbinom(N, size = 1, prob = 0.3),
    potential_outcomes(Y_list ~ Y_star * Z + control_count)
  ) +
  declare_inquiry(prevalence_rate = mean(Y_star)) +
  declare_assignment(Z = complete_ra(N)) + 
  declare_measurement(Y_list = reveal_outcomes(Y_list ~ Z)) +
  declare_estimator(Y_list ~ Z) 

10.5.12 Negatively correlated items

draw_data(correlated_lists) |> ggplot(aes(control_count)) + 
  geom_histogram() + theme_bw()

10.5.13 Negatively correlated items

correlated_lists |> redesign(rho = c(-.8, 0, .8)) |> diagnose_design()

  • Accuracy and subject protection trade off against each other: the more accuracy you have, the less protection you have

10.5.14 Individual or group effects?

  • This is typically used to estimate average levels

  • However you can use it in the obvious way to get average levels for groups: this is equivalent to calculating group level heterogeneous effects

  • Extending the idea you can even get individual level estimates: for instance you might use causal forests

  • You can also use this to estimate the effect of an experimental treatment on an item that’s measured using a list, without requiring individual level estimates:

\[Y_i = \beta_0 + \beta_1Z_i + \beta_2Long_i + \beta_3Z_iLong_i\]
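As a rough illustration of this regression, the base R sketch below simulates a case in which an experimental treatment \(Z\) raises the prevalence of the sensitive item from 0.3 to 0.5 (assumed values, hypothetical variable names). Under these assumptions \(\beta_2\) corresponds to prevalence in the control condition and \(\beta_3\) to the effect of \(Z\) on prevalence.

```r
set.seed(3)
n <- 20000

Z    <- rbinom(n, 1, 0.5)   # experimental treatment
Long <- rbinom(n, 1, 0.5)   # 1 = respondent sees the long list

control_count <- rbinom(n, size = 3, prob = 0.5)
# Assumed effect: treatment raises prevalence of the sensitive item from 0.3 to 0.5
Y_star <- rbinom(n, 1, 0.3 + 0.2 * Z)

# Reported count: control items, plus the sensitive item only on the long list
Y <- control_count + Long * Y_star

fit <- lm(Y ~ Z * Long)
round(coef(fit), 2)  # "Long" ~ control prevalence; "Z:Long" ~ effect of Z on prevalence
```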

10.5.15 Hiders and liars

  • Note that here we looked at “hiders” – people not answering the direct question truthfully

  • See Li (2019) on bounds when the “no liars” assumption is threatened — this is about whether people respond truthfully to the list experimental question

10.6 Regression discontinuity

Errors and diagnostics

10.6.1 Intuition

  • The core idea in an RDD design is that if a decision rule assigns units that are almost identical to each other to treatment and control conditions then we can infer effects for those cases by looking at those cases.

See excellent introduction: Lee and Lemieux (2010)

10.6.2 Intuition

  • Kids born on 31 August start school a year younger than kids born on 1 September: does starting younger help or hurt?

  • Kids born on 12 September 1983 are more likely to register Republican than kids born on 10 September 1983: can this identify the effects of registration on long term voting?

  • A district in which Republicans got 50.1% of the vote gets a Republican representative while a district in which Republicans got 49.9% of the vote does not: does having a Republican representative make a difference for these districts?

10.6.3 Argument for identification


  • Typically the decision is based on a value on a “running variable”, \(X\). e.g. Treatment if \(X > 0\)
  • The estimand is \(\mathbb{E}[Y(1) - Y(0)|X=0]\)

Two arguments:

  1. Continuity: \(\mathbb{E}[Y(1)|X=x]\) and \(\mathbb{E}[Y(0)|X=x]\) are continuous in \(x\) (at \(x=0\)): so \(\lim_{x \rightarrow 0}\mathbb{E}[Y(0)|X=x] = \mathbb{E}[Y(0)|X=0]\)

  2. Local randomization: tiny things that determine exact values of \(x\) are as if random and so we can think of a local experiment around \(X=0\).
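The continuity argument can be illustrated with a small base R sketch. The conditional expectation functions below are assumed for illustration (smooth in \(X\), with a jump of 0.25 at the threshold), and the bandwidth of 0.2 is arbitrary. Fitting separate lines on each side of the threshold and comparing them at \(X=0\) approximates the estimand; a comparison of all treated and all control units does not.

```r
set.seed(4)
n <- 5000

X <- runif(n, -1, 1)     # running variable, threshold at 0
D <- as.numeric(X > 0)   # sharp assignment rule

# Assumed smooth conditional expectation functions; the jump at X = 0 is 0.25
Y0 <- 0.5 * X + 0.3 * X^2
Y1 <- Y0 + 0.25 + 0.2 * X
Y  <- ifelse(D == 1, Y1, Y0) + rnorm(n, sd = 0.1)

bandwidth <- 0.2
near <- abs(X) < bandwidth

# Separate linear fits on each side; the coefficient on D is the gap at X = 0
fit <- lm(Y ~ D * X, subset = near)
rdd_hat <- unname(coef(fit)["D"])

# Comparing all treated and all control units mixes the jump with the trend in X
naive <- mean(Y[D == 1]) - mean(Y[D == 0])

round(c(rdd = rdd_hat, naive = naive), 2)
```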

10.6.4 Argument for identification


  • continuity argument requires continuous \(x\): granularity
  • also builds off a conditional expectation function defined at \(X=0\)

Exclusion restriction is implicit in continuity: If something else happens at the threshold then the conditional expectation functions jump at the thresholds

Implicit: \(X\) is exogenous in the sense that units cannot adjust \(X\) in order to be on one or the other side of the threshold

10.6.5 Evidence

Typically researchers show:

  1. “First stage” results: assignment to treatment does indeed jump at the threshold
  2. “ITT”: outcomes jump at the threshold
  3. LATE (if fuzzy / imperfect compliance) using IV

10.6.6 Evidence

In addition, researchers typically provide:

  1. Arguments for no other treatments at the threshold
  2. Arguments for no “sorting” at the threshold
  3. Evidence for no “heaping” at the threshold (McCrary density test)


  • argue for why estimates extend beyond the threshold
  • exclude points at the threshold (!)
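A very crude check for sorting, in the spirit of (but much simpler than) the McCrary density test, is to compare how many units fall just below and just above the threshold. The base R sketch below is an illustration only: the window width, the share of movers, and the helper `density_check()` are all assumptions made here, and the balance test presumes a locally flat density.

```r
set.seed(5)
n <- 20000

X_clean <- runif(n, -1, 1)

# Hypothetical sorting: 30% of units just below the threshold move just above it
X_sorted <- X_clean
movers <- X_clean > -0.05 & X_clean < 0 & runif(n) < 0.3
X_sorted[movers] <- -X_clean[movers]

# Illustrative helper: binomial test of balance in a narrow window around 0
density_check <- function(X, window = 0.05) {
  below <- sum(X > -window & X < 0)
  above <- sum(X >= 0 & X < window)
  # With no sorting and a locally flat density, counts on each side are balanced
  binom.test(above, above + below, p = 0.5)$p.value
}

p_clean  <- density_check(X_clean)
p_sorted <- density_check(X_sorted)
c(clean = p_clean, sorted = p_sorted)
```

A small p-value for the sorted data flags an excess of units just above the threshold; in applied work a purpose-built density test is the better tool.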

10.6.7 Design

library(rdss) # for helper functions
library(rdrobust)
cutoff <- 0.5
bandwidth <- 0.5

control <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(.7, -.8, .5, 1))}
treatment <- function(X) {
  as.vector(poly(X, 4, raw = TRUE) %*% c(0, -1.5, .5, .8)) + .25}

rdd_design <-
  declare_model(
    N = 1000,
    U = rnorm(N, 0, 0.1),
    X = runif(N, 0, 1) + U - cutoff,
    D = 1 * (X > 0),
    Y_D_0 = control(X) + U,
    Y_D_1 = treatment(X) + U
  ) +
  declare_inquiry(LATE = treatment(0) - control(0)) +
  declare_measurement(Y = reveal_outcomes(Y ~ D)) + 
  declare_sampling(S = X > -bandwidth & X < bandwidth) +
  declare_estimator(Y ~ D*X, term = "D", label = "lm") + 
  declare_estimator(
    Y, X, 
    term = "Bias-Corrected",
    .method = rdrobust_helper,
    label = "optimal"
  )

10.6.8 RDD Data plotted

Note rdrobust implements:

  • local polynomial Regression Discontinuity (RD) point estimators
  • robust bias-corrected confidence intervals

See Calonico, Cattaneo, and Titiunik (2014) and related papers, and ?rdrobust::rdrobust

10.6.9 RDD Data plotted

rdd_design  |> draw_data() |> ggplot(aes(X, Y, color = factor(D))) + 
  geom_point(alpha = .3) + theme_bw() +
  geom_smooth(aes(X, Y_D_0)) + geom_smooth(aes(X, Y_D_1)) + theme(legend.position = "none")

10.6.10 RDD diagnosis

rdd_design |> diagnose_design()
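
| Estimator | Mean Estimate | Bias   | SD Estimate | Coverage |
|-----------|---------------|--------|-------------|----------|
| lm        | 0.23          | -0.02  | 0.01        | 0.64     |
|           | (0.00)        | (0.00) | (0.00)      | (0.02)   |
| optimal   | 0.25          | 0.00   | 0.03        | 0.89     |
|           | (0.00)        | (0.00) | (0.00)      | (0.01)   |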

10.6.11 Bandwidth tradeoff

rdd_design |> 
  redesign(bandwidth = seq(from = 0.05, to = 0.5, by = 0.05)) |> 
  diagnose_design()

  • As we increase the bandwidth, the lm bias gets worse, but slowly, while the error falls.
  • The best bandwidth is relatively wide.
  • This is more true for the optimal estimator.
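The same tradeoff can be sketched without DeclareDesign. The base R simulation below uses an assumed data generating process (not the one declared above) with opposite curvature on each side of the threshold, so a linear fit is increasingly biased as the bandwidth widens while its sampling variability shrinks. Which bandwidth wins depends on how strong the curvature is; here it was chosen to make the bias visible.

```r
set.seed(6)

# Assumed data generating process: true jump of 0.25 at X = 0, curvature differs by side
draw <- function(n = 1000) {
  X <- runif(n, -0.5, 0.5)
  D <- as.numeric(X > 0)
  Y <- ifelse(D == 1, 0.25 + X - 2 * X^2, X + 2 * X^2) + rnorm(n, sd = 0.2)
  data.frame(X, D, Y)
}

# One simulated study: linear fit on each side within the bandwidth
estimate_once <- function(bandwidth) {
  d <- draw()
  fit <- lm(Y ~ D * X, data = d[abs(d$X) < bandwidth, ])
  unname(coef(fit)["D"])
}

bandwidths <- c(0.05, 0.1, 0.25, 0.5)
results <- sapply(bandwidths, function(h) {
  est <- replicate(200, estimate_once(h))
  c(bandwidth = h,
    bias = mean(est) - 0.25,
    sd   = sd(est),
    rmse = sqrt(mean((est - 0.25)^2)))
})

round(t(results), 3)
```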

10.6.12 Geographic RDs

Geographic RDs are popular in political science, but they:

  • Put a lot of pressure on assumption of no alternative treatment—including “random” country level shocks!
  • Put a lot of pressure on no sorting assumptions (why was the border put where it was; why did units settle here or there?)
  • Put a lot of pressure on SUTVA: people on one side are literally proximate to people on another

See Keele and Titiunik (2015)

11 Topics 3

11.1 Mediation

11.1.1 The problem of unidentified mediators

  • Consider a causal system like the below.
  • The effect of X on M1 and M2 can be measured in the usual way.
  • But unfortunately, if there are multiple mediators, the effect of M1 (or M2) on Y is not identified.
  • The ‘exclusion restriction’ is obviously violated when there are multiple mediators (unless you can account for them all).

11.1.2 The problem of unidentified mediators

Which effects are identified by the random assignment of \(X\)?

11.1.3 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • But both of these average effects may be positive (for example) even if there is no effect of X on Y through M1.

11.1.4 The problem of unidentified mediators

An obvious approach is to first examine the (average) effect of X on M1 and then use another manipulation to examine the (average) effect of M1 on Y.

  • Similarly both of these average effects may be zero even if X affects Y through M1 for every unit!

11.1.5 The problem of unidentified mediators

Both instances of unobserved confounding between \(M\) and \(Y\):

11.1.6 The problem of unidentified mediators

Both instances of unobserved confounding between \(M\) and \(Y\):

11.1.7 The problem of unidentified mediators

  • Another somewhat obvious approach is to see how the effect of \(X\) on \(Y\) in a regression is reduced when you control for \(M\).

  • If the effect of \(X\) on \(Y\) passes through \(M\) then surely there should be no effect of \(X\) on \(Y\) after you control for \(M\).

  • This common strategy associated with Baron and Kenny (1986) is also not guaranteed to produce reliable results. See for instance Green, Ha, and Bullock (2010)

11.1.8 Baron Kenny issues

df <- fabricate(N = 1000, 
                U = rbinom(N, 1, .5),     X = rbinom(N, 1, .5),
                M = ifelse(U==1, X, 1-X), Y = ifelse(U==1, M, 1-M)) 
list(lm(Y ~ X, data = df), 
     lm(Y ~ X + M, data = df)) |> texreg::htmlreg() 
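
Statistical models

|             | Model 1 | Model 2 |
|-------------|---------|---------|
| (Intercept) | 0.00*** | 0.00*** |
|             | (0.00)  | (0.00)  |
| X           | 1.00*** | 1.00*** |
|             | (0.00)  | (0.00)  |
| M           |         | 0.00    |
| R2          | 1.00    | 1.00    |
| Adj. R2     | 1.00    | 1.00    |
| Num. obs.   | 1000    | 1000    |

***p < 0.001; **p < 0.01; *p < 0.05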

11.1.9 The problem of unidentified mediators

  • See Imai on better ways to think about this problem and designs to address it.

11.1.10 The problem of unidentified mediators: Quantities

  • Using potential outcomes we can describe a mediation effect as (see Imai et al): \[\delta_i(t) = Y_i(t, M_i(1)) - Y_i(t, M_i(0)) \text{ for } t = 0,1\]
  • The direct effect is: \[\psi_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t)) \text{ for } t = 0,1\]
  • This is a decomposition, since: \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \frac{1}{2}(\delta_i(1) + \delta_i(0) + \psi_i(1) + \psi_i(0)) \]
  • If there are no interaction effects, i.e. \(\delta_i(1) = \delta_i(0), \psi_i(1) = \psi_i(0)\), then \[Y_i(1, M_i(1)) - Y_i(0, M_i(0)) = \delta_i + \psi_i\]
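
A short derivation may help show why the decomposition holds. It uses only the definitions of \(\delta_i(t)\) and \(\psi_i(t)\), adding and subtracting one cross-world potential outcome at a time:

\[
\begin{aligned}
Y_i(1, M_i(1)) - Y_i(0, M_i(0)) &= \left[Y_i(1, M_i(1)) - Y_i(1, M_i(0))\right] + \left[Y_i(1, M_i(0)) - Y_i(0, M_i(0))\right] = \delta_i(1) + \psi_i(0) \\
Y_i(1, M_i(1)) - Y_i(0, M_i(0)) &= \left[Y_i(1, M_i(1)) - Y_i(0, M_i(1))\right] + \left[Y_i(0, M_i(1)) - Y_i(0, M_i(0))\right] = \psi_i(1) + \delta_i(0)
\end{aligned}
\]

Averaging the two lines gives the symmetric expression with the factor \(\frac{1}{2}\). Note that \(Y_i(1, M_i(0))\) and \(Y_i(0, M_i(1))\) are never observed for any unit, which is one way to see why identification is hard.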

11.1.11 The problem of unidentified mediators: Solutions?

The bad news is that although a single experiment might identify the total effect, it cannot identify these direct and indirect components.


  • Check formal requirement for identification under single experiment design (“sequential ignorability”—that, conditional on actual treatment, it is as if the value of the mediation variable is randomly assigned relative to potential outcomes). But this is strong (and in fact unverifiable) and if it does not hold, bounds on effects always include zero (Imai et al)

  • Consider sensitivity analyses

11.1.12 Implicit mediation

You can use interactions with covariates if you are willing to make assumptions on no heterogeneity of direct treatment effects over covariates.

eg you think that money makes people get to work faster because they can buy better cars; you look at the marginal effect of more money on time to work for people with and without cars and find it higher for the latter.

This might imply mediation through transport but only if there is no direct effect heterogeneity (eg people with cars are less motivated by money).

11.1.13 The problem of unidentified mediators: Solutions?

Weaker assumptions justify parallel design

  • Group A: \(T\) is randomly assigned, \(M\) left free.
  • Group B: divided into four groups \(T\times M\) (requires two more assumptions (1) that the manipulation of the mediator only affects outcomes through the mediator (2) no interaction, for each unit, \(Y(1,m)-Y(0,m) = Y(1,m')-Y(0,m')\).)

Takeaway: Understanding mechanisms is harder than you think. Figure out what assumptions fly.

11.1.14 In CausalQueries

Let's imagine that sequential ignorability does not hold. What are our posteriors on mediation quantities when in fact all effects are mediated, effects are strong, and we have lots of data?

model <- make_model("X -> M ->Y <- X; M <-> Y")


11.1.15 In CausalQueries

We imagine a true model and consider estimands:

truth <- make_model("X -> M ->Y") |> 
  set_parameters(c(.5, .5, .1, 0, .8, .1, .1, 0, .8, .1))

queries  <- 
  list(
      indirect = "Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]]",
      direct = "Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]]"
  )

truth |> query_model(queries) |> kable()
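
| label    | query                                       | given | using      | case_level | mean | sd | cred.low | cred.high |
|----------|---------------------------------------------|-------|------------|------------|------|----|----------|-----------|
| indirect | Y[X = 1, M = M[X=1]] - Y[X = 1, M = M[X=0]] | -     | parameters | FALSE      | 0.64 | NA | 0.64     | 0.64      |
| direct   | Y[X = 1, M = M[X=0]] - Y[X = 0, M = M[X=0]] | -     | parameters | FALSE      | 0.00 | NA | 0.00     | 0.00      |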

11.1.16 In CausalQueries

model |> update_model(data = truth |> make_data(n = 1000)) |>
  query_distribution(queries = queries, using = "posteriors") 

Why such poor behavior? Why isn’t weight going onto indirect effects?

Turns out the data is consistent with direct effects only: specifically that whenever \(M\) is responsive to \(X\), \(Y\) is responsive to \(X\).

11.1.17 In CausalQueries

11.2 Spillovers

11.2.1 SUTVA violations (Spillovers)

Spillovers can result in the estimation of weaker effects when effects are actually stronger.

The key problem is that \(Y(1)\) and \(Y(0)\) are not sufficient to describe potential outcomes

11.2.2 SUTVA violations

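| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
| B | 2 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 | 0 | 0 |
| C | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 |
| D | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 3 |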

Table: Potential outcomes for four units for different treatment profiles. \(D_i\) is an allocation and \(y_j(D_i)\) is the potential outcome for (row) unit \(j\) given (column) \(D_i\).

  • The key is to think through the structure of spillovers.
  • Here immediate neighbors are exposed
  • In this case we can define a direct treatment (being exposed) and an indirect treatment (having a neighbor exposed) and we can work out the propensity for each unit of receiving each type of treatment
  • These may be nonuniform (here central types are more likely to have treated neighbors); but we can still use the randomization to assess effects
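
The exposure propensities and the averages by exposure condition can be worked out directly from the table. The base R sketch below assumes, for illustration, that each of the four single-unit allocations \(D_1, \dots, D_4\) is equally likely; the object names are hypothetical.

```r
units <- c("A", "B", "C", "D")

# Each allocation treats exactly one unit: row k is allocation D_k
allocations <- diag(4)
dimnames(allocations) <- list(paste0("D", 1:4), units)

# Adjacency on the line: immediate neighbors only
neighbors <- 1 * (abs(outer(1:4, 1:4, "-")) == 1)
dimnames(neighbors) <- list(units, units)

direct       <- allocations == 1
indirect     <- (allocations %*% neighbors > 0) & !direct
pure_control <- !direct & !indirect

# Propensities if the four allocations are equally likely (an assumption made here)
propensities <- rbind(direct = colMeans(direct), indirect = colMeans(indirect))
propensities

# Potential outcomes from the table: rows are allocations, columns are units
y <- rbind(D1 = c(3, 3, 0, 0),
           D2 = c(1, 3, 3, 0),
           D3 = c(0, 3, 3, 1),
           D4 = c(0, 0, 3, 3))

# Average outcome by exposure condition under each allocation
by_condition <- cbind(
  treated      = rowSums(y * direct) / rowSums(direct),
  neighbors    = rowSums(y * indirect) / rowSums(indirect),
  pure_control = rowSums(y * pure_control) / rowSums(pure_control))
by_condition
```

The central units B and C have a treated neighbor under two of the four allocations, the end units under only one, so comparisons by exposure condition would need to account for these unequal propensities.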

11.2.3 SUTVA violations

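| Unit | Location | \(D_\emptyset\) | \(y(D_\emptyset)\) | \(D_1\) | \(y(D_1)\) | \(D_2\) | \(y(D_2)\) | \(D_3\) | \(y(D_3)\) | \(D_4\) | \(y(D_4)\) |
|------|----------|---|---|---|---|---|---|---|---|---|---|
| A | 1 | 0 | 0 | 1 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |
| B | 2 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 | 0 | 0 |
| C | 3 | 0 | 0 | 0 | 0 | 0 | 3 | 1 | 3 | 0 | 3 |
| D | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 3 |
| \(\bar{y}_\text{treated}\) | | | - | | 3 | | 3 | | 3 | | 3 |
| \(\bar{y}_\text{untreated}\) | | | 0 | | 1 | | 4/3 | | 4/3 | | 1 |
| \(\bar{y}_\text{neighbors}\) | | | - | | 3 | | 2 | | 2 | | 3 |
| \(\bar{y}_\text{pure control}\) | | | 0 | | 0 | | 0 | | 0 | | 0 |
| ATT-direct | | | - | | 3 | | 3 | | 3 | | 3 |
| ATT-indirect | | | - | | 3 | | 2 | | 2 | | 3 |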

11.2.4 Design

library(DeclareDesign)

# Outcome for unit i: own treatment, a convex effect of the number treated in i's group, and noise
dgp <- function(i, Z, G) Z[i]/3 + sum(Z[G == G[i]])^2/5 + rnorm(1)

spillover_design <- 

  # 80 groups (G) of 3 units (j); zeros and ones are helper vectors used by the inquiries
  declare_model(G = add_level(N = 80), 
                     j = add_level(N = 3, zeros = 0, ones = 1)) +
  declare_inquiry(direct = mean(sapply(1:240,  # just i treated v no one treated 
    function(i) { Z_i <- (1:240) == i
                  dgp(i, Z_i, G) - dgp(i, zeros, G)}))) +
  declare_inquiry(indirect = mean(sapply(1:240, 
    function(i) { Z_i <- (1:240) == i           # all but i treated v no one treated   
                  dgp(i, ones - Z_i, G) - dgp(i, zeros, G)}))) +
  declare_assignment(Z = complete_ra(N)) + 
  # Count each unit's treated group neighbors, then reveal outcomes
  declare_measurement(
    neighbors_treated = sapply(1:N, function(i) sum(Z[-i][G[-i] == G[i]])),
    one_neighbor  = as.numeric(neighbors_treated == 1),
    two_neighbors = as.numeric(neighbors_treated == 2),
    Y = sapply(1:N, function(i) dgp(i, Z, G))
  ) +
  # Naive estimator ignores neighbors' treatment status
  declare_estimator(Y ~ Z, 
                    inquiry = "direct", 
                    model = lm_robust, 
                    label = "naive") +
  # Saturated estimator conditions on the number of treated neighbors
  declare_estimator(Y ~ Z * one_neighbor + Z * two_neighbors,
                    term = c("Z", "two_neighbors"),
                    inquiry = c("direct", "indirect"), 
                    label = "saturated", 
                    model = lm_robust)
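To see what the two inquiries target, it helps to strip the noise from the outcome function and do the arithmetic directly. This is a base R sketch: `dgp_mean` is an illustrative name for `dgp` without the `rnorm(1)` term, and `G`, `zeros` and `ones` are rebuilt by hand rather than taken from the declared design.

```r
# Noise-free version of the outcome function (dgp without the rnorm(1) term);
# dgp_mean is an illustrative helper, not part of the declared design
dgp_mean <- function(i, Z, G) Z[i] / 3 + sum(Z[G == G[i]])^2 / 5

G     <- rep(1:80, each = 3)   # 80 groups of 3 units
N     <- length(G)
zeros <- rep(0, N)
ones  <- rep(1, N)

# Direct: only unit i treated versus no one treated
direct <- mean(sapply(1:N, function(i) {
  Z_i <- as.numeric((1:N) == i)
  dgp_mean(i, Z_i, G) - dgp_mean(i, zeros, G)
}))

# Indirect: everyone but unit i treated versus no one treated
indirect <- mean(sapply(1:N, function(i) {
  Z_i <- as.numeric((1:N) == i)
  dgp_mean(i, ones - Z_i, G) - dgp_mean(i, zeros, G)
}))

c(direct = direct, indirect = indirect)
```

The direct value is \(1/3 + 1/5 = 8/15\) and the indirect value is \(2^2/5 = 4/5\). In the saturated model these correspond to the coefficient on `Z` (the effect of own treatment when no neighbor is treated) and the coefficient on `two_neighbors` (the effect of two treated neighbors on an untreated unit). Because own treatment enters the squared term, a simple comparison of treated and untreated units averages over neighbors' treatment and need not recover the direct value.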

11.2.5 Spillovers: direct and indirect treatments

11.2.6 Spillovers: Simulated estimates

11.2.7 Spillovers: Opportunities and Warnings

You can in principle:

  • debias estimates
  • learn about interesting processes
  • optimize design parameters

But to estimate effects you still need some SUTVA-like assumption.

11.2.8 Spillovers: Opportunities and Warnings

In this example, if you compare outcomes between treated units and all control units that are at least \(n\) positions away from a treated unit, you will get the wrong answer unless \(n \geq 7\).

11.3 Transparency & Experimentation

11.3.1 Transparent workflows

Experimental researchers are deeply engaged in the movement towards more transparent social science research.

  • Analytic replication. This should be a no-brainer. Set everything up so that replication is easy. Use Quarto, R Markdown, or similar. Or produce your replication code as a package.

11.3.2 Contentious Issues

Contentious issues (mostly):

  • Data. How soon should you make your data available? My view: as soon as possible. Along with working papers and before publication. Before it affects policy in any case. Own the ideas not the data.

    • Hard core: no citation without (analytic) replication. Perhaps. Non-replicable results should not be influencing policy.
  • Where should you make your data available? Dataverse is focal for political science. Not personal website (mea culpa)

  • What data should you make available? Disagreement is over how raw your data should be. My view: as raw as you can but at least post cleaning and pre-manipulation.

11.3.3 Open science checklist

  • Should you register? Hard to find reasons against. But the case is strongest in the testing phase rather than the exploratory phase.

  • Registration: When should you register? My view: Before treatment assignment. (Not just before analysis, mea culpa)

  • Registration: Should you deviate from a preanalysis plan if you change your mind about optimal estimation strategies? My view: Yes, but make the case and describe both sets of results.

11.3.4 Two distinct rationales for registration

  • File drawer bias (Publication bias)

  • Analysis bias (Fishing)

11.3.5 File drawer bias

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– Researchers conduct multiple excellent studies. But they only write up the 50% that produce “positive” results.

– Even if each individual study is indisputably correct, the account in the research record – that X affects Y in 100% of cases – will be wrong.
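The arithmetic behind this can be written as a one-line function. This is a sketch: `record_share` and its arguments are illustrative names, and the write-up probabilities are assumptions chosen to match the stylized example.

```r
# Share of written-up studies that report an effect of X on Y, given
# the true share of cases with an effect and the write-up probabilities
# (illustrative function and argument names)
record_share <- function(p_true = 0.5, p_write_positive = 1, p_write_null = 0) {
  positive <- p_true * p_write_positive
  null     <- (1 - p_true) * p_write_null
  positive / (positive + null)
}

record_share()                   # nulls never written up: 1
record_share(p_write_null = 1)   # everything written up: 0.5
```

The research record matches the truth only when null and positive results are equally likely to be written up.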

11.3.7 File drawer bias

Exacerbated by:

– Publication bias – the positive results get published

– Citation bias – the positive results get read and cited

– Chatter bias – the positive results get blogged, tweeted and TEDed.

11.3.8 Analysis bias (Fishing)

– Say in truth \(X\) affects \(Y\) in 50% of cases.

– But say that researchers enjoy discretion to select measures for \(X\) or \(Y\), or enjoy discretion to select statistical models after seeing \(X\) and \(Y\) in each case.

– Then, with enough discretion, 100% of analyses may report positive effects, even if all studies get published.
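A small simulation shows how quickly discretion inflates positive findings. This is a sketch under simplifying assumptions that differ from the stylized example: \(X\) has no effect on any outcome, the researcher can choose among `k` independent outcome measures, and a result counts as "positive" if \(p < 0.05\). The function name `fish_once` and all parameter values are illustrative.

```r
set.seed(42)

# One study: X has NO effect on any of the k candidate outcome measures
fish_once <- function(n = 100, k = 10) {
  X <- rep(0:1, each = n / 2)
  p_values <- replicate(k, {
    Y <- rnorm(n)
    t.test(Y[X == 1], Y[X == 0])$p.value
  })
  c(prespecified = p_values[1] < 0.05,    # measure fixed in advance
    fished       = min(p_values) < 0.05)  # measure chosen after seeing results
}

# Share of studies reporting a "positive" result under each strategy
rates <- rowMeans(replicate(2000, fish_once()))
rates
```

With the measure fixed in advance the false positive rate stays near the nominal 5%. With ten independent measures to choose from it rises towards \(1 - 0.95^{10} \approx 0.40\).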

11.3.10 Analysis bias (Fishing)

– Try the exact fishy test: An Exact Fishy Test (https://macartan.shinyapps.io/fish/)

– What’s the problem with this test?

11.3.11 Evidence-Proofing: Illustration

  • When your conclusions do not really depend on the data

  • E.g., some evidence will always support your proposition, or some interpretation of the evidence will always support your proposition.

  • Knowing the mapping from data to inference in advance gives a handle on the false positive rate.

11.3.12 The scope for fishing

11.3.13 Evidence from political science

Source: Gerber and Malhotra

11.3.14 More evidence from TESS

  • Malhotra tracked 221 TESS studies.
  • 20% of the null studies were published. 65% not even written up (file drawer or anticipation of publication bias)
  • 60% of studies with strong results were published.

Implications are:

  • population of results not representative
  • (subtler) individual published studies are also more likely to be overestimates
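The subtler implication can be illustrated by simulation. This is a sketch under assumed values: a true effect of 0.2, 100 units per study, and a rule that only significant positive estimates are published. None of these numbers come from the TESS data.

```r
set.seed(7)

true_effect <- 0.2   # ASSUMED true effect, identical across studies

# One two-arm study analysed by OLS; returns the estimate and its p-value
one_study <- function(n = 100) {
  Z   <- rep(0:1, each = n / 2)
  Y   <- true_effect * Z + rnorm(n)
  fit <- summary(lm(Y ~ Z))$coefficients
  c(estimate = fit["Z", "Estimate"], p = fit["Z", "Pr(>|t|)"])
}

studies <- as.data.frame(t(replicate(2000, one_study())))

# ASSUMED publication rule: only significant positive results appear
published <- subset(studies, p < 0.05 & estimate > 0)

c(truth       = true_effect,
  all_studies = mean(studies$estimate),
  published   = mean(published$estimate))
```

Across all studies the estimator is unbiased, but the published subset overstates the effect, because with low power an estimate has to be large to clear the significance threshold.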

11.3.15 The problem

  • Summary: we do not know when we can or cannot trust claims made by researchers.

  • [Not a tradition specific claim]

11.3.16 Registration as a possible solution

Simple idea:

  • It’s about communication:
  • just say what you are planning on doing before you do it
  • if you don’t have a plan, say that
  • If you do things differently from what you were planning to do, say that

11.3.17 Worries and Myths

Lots of misunderstandings around registration

11.3.18 Myth: Concerns about fishing presuppose researcher dishonesty

  • Fishing can happen in very subtle ways, and may seem natural and justifiable.

  • Example:

    • I am interested in whether more democratic institutions result in better educational outcomes.
    • I examine the relationship between institutions and literacy and between institutions and school attendance.
    • The attendance measure is significant and the literacy one is not. Puzzled, I look more carefully at the literacy measure and see various outliers and indications of measurement error. As I think more I realize too that literacy is a slow moving variable and may not be the best measure anyhow. I move forward and start to analyze the attendance measure only, perhaps conducting new tests, albeit with the same data.

11.3.19 Structural challenge

Our journal review process is largely organized around advising researchers how to adjust analysis in light of findings in the data.

11.3.20 Myth: Fishing is technique specific

  • Frequentists can do it

  • Bayesians can do it too.

  • Qualitative researchers can also do it.

  • You can even do it with descriptive statistics

11.3.21 Myth: Fishing is estimand specific

  • You can do it when estimating causal effects
  • You can do it when studying mechanisms
  • You can do it when estimating counts

11.3.22 Myth: Registration only makes sense for experimental studies, not for observational studies

  • The key distinction is between prospective and retrospective studies.

  • Not between experimental and observational studies.

  • A reason (from the medical literature) why registration is especially important for experiments: because you owe it to subjects

  • A reason why registration is less important for experiments: because it is more likely that the intended analysis is implied by the design in an experimental study. Researcher degrees of freedom may be greatest for observational qualitative analyses.

11.3.23 Worry: Registration will create administrative burdens for researchers, reviewers, and journals

  • Registration will produce some burden but does not require the creation of content that is not needed anyway

  • It does shift preparation of analyses forward

  • And it can also increase the burden of developing analysis plans even for projects that don’t work. But that is, in part, the point.

  • Upside is that ultimate analyses may be much easier.

11.3.24 Worry: Registration will force people to implement analyses that they know are wrong

  • Most arguments for registration in social science advocate for non-binding registration, where deviations from designs are possible, though they should be described.
  • Even if it does not prevent them, a merit of registration is that it makes deviations visible.

11.3.25 Myth: Replication (or other transparency practices) obviates the need for registration

  • There are lots of good things to do, including replication.
  • Many of these do not substitute for each other. (How to interpret a fished replication of a fished analysis?)
  • And they likely act as complements.
  • Registration can clarify details of design and analysis and ensure early preparation of material. Indeed material needed for replication may be available even before data collection

11.3.26 Worry: Registration will put researchers at risk of scooping

  • But existing registries allow people to protect registered designs for some period
  • Registration may let researchers lay claim to a design

11.3.27 Worry: Registration will kill creativity

  • This is an empirical question. However, under a nonmandatory system researchers could:
  • Register a plan for structured exploratory analysis
  • Decide that exploration is at a sufficiently early stage that no substantive registration is possible and proceed without registration.

11.3.28 Implications:

  • In neither case would the creation of a registration facility prevent exploration.

  • What it might do is make it less credible for someone to claim that they have tested a proposition when in fact the proposition was developed using the data used to test it.

  • Registration communicates whether or not researchers are engaged in exploration. We love exploration and should be proud of it.

11.3.29 Punchline

  • Do it!
  • But if you have reasons to deviate, deviate transparently
  • Don’t implement bad analysis just because you pre-registered
  • Instead: reconcile

11.3.30 Reconciliation

Incentives and strategies

11.3.31 Reconciliation

Table 6: Illustration of an inquiry reconciliation table.
Inquiry In the preanalysis plan In the paper In the appendix
Gender effect X X
Age effect X

11.3.32 Reconciliation

Table 7: Illustration of an answer strategy reconciliation table.

| Inquiry | Following A from the PAP | Following A from the paper | Notes |
|---|---|---|---|
| Gender effect | estimate = 0.6, s.e. = 0.31 | estimate = 0.6, s.e. = 0.25 | Difference due to change in control variables [provide cross references to tables and code] |
