Causal Inference and Experimentation

Introduction to the class

Macartan Humphreys

1 Getting started

  • General aims and structure
  • Expectations
  • Pointers for exercises
  • Quick DeclareDesign intro

1.1 Aims

  • Deep understanding of key ideas in causal inference
  • Transportable tools for understanding how to evaluate and improve designs
  • Applied skills for design and analysis
  • Exposure to open science practices
  • Deeper dive into some specific topics (see survey)

1.2 Syllabus and resources

1.3 The topics

Day 1: Intro

Day 2: Causality

1.4 The topics (continued)

Day 3: Estimation and Inference

Day 4:

Day 5:

1.5 Expectations

  • 5 tasks
  • (Required) Work in four “exercise teams”: one team (and typically two exercises) per session, across four sessions
  • (Optional) Prepare a research design or short paper, perhaps building on existing work. Typically this contains:
    • a problem statement
    • a description of a method to address the problem
    • analytic or simulation-based results describing properties of the solution
    • a discussion of implications for practice.

A passing paper will illustrate subtle features of a method; a good paper will identify previously unknown properties of a method; an excellent paper will develop a new method.

  • Plus general reading and participation.

1.6 Exercise team job

Teams should prepare presentations of 15 to 20 minutes on the set puzzles. Typically the task is to:

  • Take a puzzle

  • Declare and diagnose a design that shows the issue under study (e.g. some estimator produces unbiased estimates under some condition)

  • Modify the design to show behavior when conditions are violated

  • Share a report with the class, ideally as a self-contained document that others can view easily, e.g. an .html file rendered from a .qmd or .Rmd source.

  • Presentations should be about 10 minutes for a given puzzle.
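DeclareDesign (an R package) automates the "declare and diagnose" workflow. The plain-Python simulation below is a sketch of the same logic only. It does not use DeclareDesign's API. It declares a simple two-arm design, simulates it many times, and diagnoses the bias and RMSE of the difference-in-means estimator for the average treatment effect. The data-generating process, the constant effect of 1, and the function names `simulate_once` and `diagnose` are illustrative assumptions. Setting `confounded=True` modifies the design so that units with higher background U are more likely to be treated, which violates random assignment.

```python
import random
import statistics


def simulate_once(n, confounded, rng):
    """Simulate one dataset from an illustrative design; return (estimate, estimand)."""
    # Model (assumed): background covariate U shifts the untreated outcome Y0;
    # the treatment effect is 1 for every unit.
    u = [rng.gauss(0, 1) for _ in range(n)]
    y0 = [ui + rng.gauss(0, 1) for ui in u]
    y1 = [y + 1.0 for y in y0]

    # Inquiry: the sample average treatment effect.
    estimand = statistics.fmean(b - a for a, b in zip(y0, y1))

    # Data strategy: assignment to treatment.
    if confounded:
        # Violation: units with higher U are more likely to be treated.
        z = [1 if ui + rng.gauss(0, 1) > 0 else 0 for ui in u]
    else:
        # Complete random assignment: exactly n // 2 units treated.
        z = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(z)

    # Measurement: reveal the potential outcome that matches each unit's assignment.
    y = [b if zi == 1 else a for a, b, zi in zip(y0, y1, z)]

    # Answer strategy: difference in means between treated and control units.
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    estimate = statistics.fmean(treated) - statistics.fmean(control)
    return estimate, estimand


def diagnose(n=100, sims=2000, confounded=False, seed=1):
    """Repeat the design many times and summarise the estimator's bias and RMSE."""
    rng = random.Random(seed)
    errors = []
    for _ in range(sims):
        estimate, estimand = simulate_once(n, confounded, rng)
        errors.append(estimate - estimand)
    bias = statistics.fmean(errors)
    rmse = statistics.fmean(e * e for e in errors) ** 0.5
    return {"bias": bias, "rmse": rmse}


if __name__ == "__main__":
    print("random assignment:", diagnose(confounded=False))
    print("confounded assignment:", diagnose(confounded=True))
```

Under random assignment the estimated bias should be close to zero. Under confounded assignment it should be clearly positive. In this setup the expected bias is about 2/√π ≈ 1.13, because that is the expected gap in U between treated and control units. An exercise team might present this kind of contrast between a design that meets a condition and one that violates it.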

1.7 Good coding rules

1.8 Good coding rules (continued)

  • Metadata first
  • Load packages at the beginning: use pacman
  • Put options at the top
  • Load all data files once, at the top. Where possible, read them directly from a public archive.
  • Use functions and define them at the top. Comment them, and where useful, illustrate what they do
  • Replicate first, re-analyze second. Use sections.
  • (For replications) Have subsections named after specific tables, figures or analyses

1.9 Aim

  • First best: anyone with your .Rmd/.qmd file can hit render (or compile) and the whole thing reproduces the first time. So: nothing local, everything relative. Please do not include hardcoded paths to your computer.

  • But: often you need ancillary files for data and code. That’s OK, but the aim should still be that someone with a self-contained folder can open a main.Rmd file, hit compile, and get everything. I usually have an input and an output subfolder.
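Here is a minimal sketch of the "nothing local, everything relative" rule. It is written in Python for illustration, but the same idea applies to R code in an .Rmd/.qmd workflow. The helper `project_paths` and the folder names `input` and `output` are hypothetical choices. The helper builds data and output locations from the project folder rather than from a hardcoded path on one computer.

```python
from pathlib import Path

# Avoid hardcoded local paths such as Path("C:/Users/me/Desktop/project/input/data.csv"):
# they only work on the machine where they were written.


def project_paths(root):
    """Return input/output folders located relative to a self-contained project root.

    `root` is the folder holding the main document (e.g. main.Rmd or main.qmd).
    """
    root = Path(root)
    paths = {"input": root / "input", "output": root / "output"}
    # Inputs are only read; outputs are regenerated on every run, so create the folder if needed.
    paths["output"].mkdir(parents=True, exist_ok=True)
    return paths
```

Suppose you call `project_paths(Path.cwd())` from inside the project folder and then read `paths["input"] / "data.csv"` (a hypothetical file name). The whole folder can then be copied to another machine and compiled without editing any paths.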

1.10 Collaborative coding / writing

  • Do not get in the business of passing attachments around
  • Keep documents in the cloud: git, OSF, Dropbox, Drive, Nextcloud
  • General rule: only post non-sensitive, non-proprietary material
  • Share self-contained folders. Each folder contains a small set of live documents plus an archive. Old versions of documents go in the archive; the main folder holds only the most recent version of each document.
  • Data are kept in a self-contained folder (in) and are never edited directly
  • Push to GitHub frequently