Key idea: you (often) can’t fully evaluate a piece of work until you have looked at the data and done the analysis yourself.
is here: https://macartan.github.io/teaching/ds-hu-2023
see also the replication repo: https://macartan.github.io/ds_hu_2003_reps/
Reading | Data |
---|---|
1.1 Daron Acemoglu, Simon Johnson, and James A. Robinson. The Colonial Origins of Comparative Development: An Empirical Investigation. AER (2001) | Data |
1.2 James Fearon and David D. Laitin. Ethnicity, insurgency, and civil war. APSR (2003). | Data |
1.3 Nathan Nunn. The long term effects of Africa’s slave trade QJE (2008) | 1, 2 |
1.4 Daron Acemoglu; Simon Johnson; James A. Robinson; Pierre Yared, Income and Democracy AER (2008) | Data |
Reading | Data |
---|---|
2.1 Alberto Alesina, Paola Giuliano, and Nathan Nunn. On the Origins of Gender Roles: Women and the Plough QJE (2013). | Data |
2.2 Raghabendra Chattopadhyay, Esther Duflo Women as Policy Makers: Evidence from a Randomized Policy Experiment in India Econometrica (2004) | Data |
2.3 Salma Mousa Building Social Cohesion Between Christians and Muslims Science (2020) | Data |
2.4 Saad Gulzar, Nicholas Haas, and Benjamin Pasquale Does Political Affirmative Action Work, and for Whom? Theory and Evidence on India’s Scheduled Areas APSR (2020) | Data |
Reading | Data |
---|---|
3.1 Guy Grossman, Kristin G. Michelitch, and Carlo Prato. The Effect of Sustained Transparency on Electoral Accountability AJPS (2023) | Data |
3.2 Claudio Ferraz and Frederico Finan Electoral Accountability and Corruption: Evidence from the Audits of Local Governments AER (2011) | Data |
3.3 Pia J Raffler Does political oversight of the bureaucracy increase accountability? Field experimental evidence from a dominant party regime APSR (2022) | Data |
3.4 Thomas Fujiwara and Leonard Wantchekon Can Informed Public Deliberation Overcome Clientelism? Experimental Evidence from Benin AEJ (2013) | Data |
Reading | Data |
---|---|
4.1 Nathan Nunn and Nancy Qian U.S. Food Aid and Civil Conflict AER (2014) | Data |
4.2 Robert Blair, Jessica Di Salvatore, and Hannah Smidt. UN Peacekeeping and Democratization in Conflict-Affected Countries APSR (2023). | Data |
4.3 Christopher Blattman and Jeannie Annan. Can Employment Reduce Lawlessness and Rebellion? A Field Experiment with High-Risk Men in a Fragile State (2015) | Data |
4.4 Karthik Muralidharan, Paul Niehaus, and Sandip Sukhtankar. Building State Capacity: Evidence from Biometric Smartcards in India. AER (2016). https://doi.org/10.1257/aer.20141346 | Data |
.html via .qmd or .Rmd
Some examples:
pacman
Nothing local, everything relative: so please do not include hardcoded paths to your computer
First best: if someone has access to your .Rmd/.qmd file they can hit render or compile and the whole thing reproduces first time.
But: often you need ancillary files for data and code. That’s OK, but the aim should still be that with a self-contained folder someone can open a master.Rmd file, hit compile and get everything. I usually have an input and an output subfolder.
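A minimal sketch of what the top of such a master file might look like, using only base R and relative paths. The folder names follow the input/output convention above; the file names and the toy data are hypothetical and only stand in for real replication data.

```r
# Sketch of a first chunk in a master .Rmd/.qmd; all file names are hypothetical.

# Relative paths only: they resolve from the project folder, not from one machine.
input_dir  <- "input"
output_dir <- "output"
dir.create(input_dir,  showWarnings = FALSE)
dir.create(output_dir, showWarnings = FALSE)

# Stand-in for raw data that would normally ship inside input/ (illustration only).
raw_path <- file.path(input_dir, "example_data.csv")
if (!file.exists(raw_path)) {
  write.csv(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)), raw_path, row.names = FALSE)
}

# Read from input/ and leave it untouched; write everything derived to output/.
dat <- read.csv(raw_path)
fit <- lm(y ~ x, data = dat)

results_path <- file.path(output_dir, "example_results.csv")
write.csv(
  data.frame(term = names(coef(fit)), estimate = unname(coef(fit))),
  results_path,
  row.names = FALSE
)
```

Because nothing refers to a location outside the project folder, the same chunk should run unchanged on another computer once the folder is copied over.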
Resources and ideas from the institute for replication https://i4replication.org/reproducibility.html
Longer term goal: replication package to make it easier to access and share replications like these
in ) and is never edited directly
Sample TOC for presentations:
Don’t skip the big picture.
See: How to critique: https://macartan.github.io/teaching/how-to-critique
Biggest message: be probing but be sympathetic
…and then use the DAG to describe new analysis, e.g. questions regarding:
If you fail to replicate:
Two distinct overarching goals:
e.g.
Home ground dominance. Holding the original M constant (i.e., the home ground of the original study), if you can show that a new answer strategy A’ yields better diagnosands than the original A, then A’ can be justified by home ground dominance.
Robustness to alternative models. A second justification for a change in answer strategy is that you can show that a new answer strategy is robust to both the original model M and a new, also plausible, M’.
Let’s review basic ideas:
Can someone walk us through the Solow model?
Production function:
\[Y = F(K,L)\]
Per capita:
\[y = f(k)\]
e.g.
\[y = Ak^\alpha\]
Savings are a constant share of output:
\[\text{savings} = s y_t\]
Depreciation is a constant share of capital:
\[\text{depreciation} = \delta k_t\]
Law of motion of capital:
\[k_{t+1} = k_t + sAk_t^\alpha - \delta k_t\]
Steady state has:
\[k_t = k_{t+1} = k^* \Leftrightarrow sA(k^*)^{\alpha} = \delta k^{*} \Leftrightarrow k^* = \left(\frac{sA}{\delta}\right)^{\frac1{1-\alpha}}\]
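The steady-state formula can be checked numerically. The sketch below assumes the law of motion \(k_{t+1} = k_t + sAk_t^\alpha - \delta k_t\), which is the version consistent with the steady-state condition \(sA(k^*)^\alpha = \delta k^*\). The parameter values and the starting capital stock are illustrative assumptions, not values from the course.

```r
# Illustrative parameter values (assumptions chosen for the example)
s     <- 0.2   # savings rate
A     <- 1     # productivity
alpha <- 0.3   # capital share
delta <- 0.05  # depreciation rate

# Closed-form steady state: k* = (sA / delta)^(1 / (1 - alpha))
k_star <- (s * A / delta)^(1 / (1 - alpha))

# Iterate the law of motion from an arbitrary starting point
periods <- 1000
k <- numeric(periods)
k[1] <- 1
for (t in seq_len(periods - 1)) {
  k[t + 1] <- k[t] + s * A * k[t]^alpha - delta * k[t]
}

# Distance between the simulated path's end point and the closed form
gap <- abs(k[periods] - k_star)
gap
```

With these values the simulated path settles at the closed-form \(k^*\), so `gap` is effectively zero. Starting above \(k^*\) instead should give convergence from above.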
Here is a complete, albeit barebones (and possibly incorrect), argument:
Say I and G are positively correlated. Does this mean that I causes G?
Say I and G are negatively correlated. Does this mean that I does not cause G?
How might you estimate the effect of I on G?
How does C help establish the link between I and G?
Where is the theory? Is it equivalent to the graph or is it something else that generates the graph?
How might you check if the proposed theory is correct?
Which of the counterarguments are strong and why?
Four arguments. For each one you should identify the:
In developing countries that discover natural resources, such as oil, the ruling elite can extract wealth without needing to tax citizens and develop the state apparatus. Because the state does not rely on taxation for government revenue, it does not need to set up accountability structures or extend its reach and citizens do not feel that they have ownership over the state. The state therefore becomes both less democratic and weaker than if it had not discovered the resources.
Rich countries are more likely to be democratic for the simple reason that when people become wealthier they refuse to be dictated to by others and they demand a role in government. The marginal effects of income increases are greater for poorer countries because the impacts on education are greatest at these levels. You can test this proposition by exploiting natural variation in commodity prices which provide shocks to national income, especially for countries dependent on primary commodity exports.
When countries increase trade (imports and exports), the returns to economic factors (such as labor, land and capital) are affected differently. Specifically, the returns to factors that are the most abundant are positive, while the returns to factors that are the most scarce are negative. Therefore, the relative factor endowments of a country will predict what sort of political coalitions will form (e.g., Land versus Labor + Capital) and which groups will favor free trade policies.
In democratic states, leaders are accountable for any losses incurred as a result of the wars that they enter into. Two states with democratic leaders are also more likely to share a common set of norms, and to engage in trade with one another. Therefore, two democracies are far less likely to enter into war with one another than a democracy and a non-democracy, or two non-democracies.
Q: Is my research design good?
A: Well let’s simulate it to see how it performs.
Q: What should I put in the simulation?
A: All elements of a research design.
Q: What are the elements of a research design?
A: M! I! D! A!
M (model): DAGs, game theoretic models
I (inquiry): ATEs, CATEs, COEs, models
D (data strategy): Sampling schemes, assignment schemes, text analysis, interview
A (answer strategy): Experiment, observational, quantitative, qualitative
Declaration: Telling the computer what M, I, D, and A are.
Diagnosis: Estimating “diagnosands” like power, bias, rmse, error rates, ethical harm, amount learned.
Redesign: Fine-tuning features of the data and answer strategies to understand how they change the diagnosands, e.g.:
Different sample sizes
Different randomization procedures
Different estimation strategies
Implementation: effort into compliance versus more effort into sample size
declare_model()
declare_inquiry()
declare_assignment()
declare_measurement()
declare_estimator()
and there are more declare_ functions!
draw_data(design)
draw_estimands(design)
draw_estimates(design)
get_estimates(design, data)
run_design(design), simulate_design(design)
diagnose_design(design)
redesign(design, N = 200)
design |> redesign(N = c(200, 400)) |> diagnose_designs()
compare_designs(), compare_diagnoses()
https://raw.githubusercontent.com/rstudio/cheatsheets/master/declaredesign.pdf
library(DeclareDesign)

N <- 100   # sample size
b <- .5    # true treatment effect

design <-
  declare_model(N = N, U = rnorm(N),
                potential_outcomes(Y ~ b * Z + U)) +
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)
You now have a two-arm design object in memory!
If you just type design it will run the design, which is a good check that the design has been declared properly.
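A hedged sketch of how the design object might then be put to work, assuming the DeclareDesign package is installed in a version that accepts the `.method` argument. The seed and the small number of simulations (`sims = 100`) are assumptions made to keep the example quick; exact numbers will vary with the seed.

```r
library(DeclareDesign)  # also attaches randomizr, fabricatr and estimatr

set.seed(1)  # arbitrary seed so that the draws are repeatable

N <- 100   # sample size
b <- .5    # true treatment effect

design <-
  declare_model(N = N, U = rnorm(N),
                potential_outcomes(Y ~ b * Z + U)) +
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

dat       <- draw_data(design)       # one simulated dataset
estimands <- draw_estimands(design)  # true value of the inquiry
estimates <- draw_estimates(design)  # estimate from a fresh simulated dataset

# Simulate the whole design repeatedly and summarise its performance
diagnosis <- diagnose_design(design, sims = 100)
diagnosis
```

Each helper runs the design up to a different step, which makes it easy to see where a declaration goes wrong before committing to a full diagnosis.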
ID | U | Y_Z_0 | Y_Z_1 | Z | Y |
---|---|---|---|---|---|
001 | 0.8949488 | 0.8949488 | 1.3949488 | 1 | 1.3949488 |
002 | -0.0574970 | -0.0574970 | 0.4425030 | 1 | 0.4425030 |
003 | 0.9280977 | 0.9280977 | 1.4280977 | 0 | 0.9280977 |
004 | 0.3762109 | 0.3762109 | 0.8762109 | 1 | 0.8762109 |
005 | -0.7357462 | -0.7357462 | -0.2357462 | 0 | -0.7357462 |
006 | -0.7711031 | -0.7711031 | -0.2711031 | 1 | -0.2711031 |
inquiry | estimand |
---|---|
ate | 0.5 |
estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome | inquiry |
---|---|---|---|---|---|---|---|---|---|---|
estimator | Z | 0.8 | 0.19 | 4.22 | 0 | 0.43 | 1.18 | 98 | Y | ate |
estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome | inquiry |
---|---|---|---|---|---|---|---|---|---|---|
estimator | Z | 0.67 | 0.21 | 3.23 | 0 | 0.26 | 1.09 | 98 | Y | ate |
design | sim_ID | inquiry | estimand | estimator | term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
design | 1 | ate | 0.5 | estimator | Z | 0.43 | 0.21 | 2.09 | 0.04 | 0.02 | 0.84 | 98 | Y |
design | 2 | ate | 0.5 | estimator | Z | 0.61 | 0.19 | 3.21 | 0.00 | 0.23 | 0.99 | 98 | Y |
design | 3 | ate | 0.5 | estimator | Z | 0.40 | 0.18 | 2.28 | 0.02 | 0.05 | 0.76 | 98 | Y |
Mean Estimate | Bias | SD Estimate | RMSE | Power | Coverage |
---|---|---|---|---|---|
0.50 | 0.00 | 0.21 | 0.21 | 0.68 | 0.94 |
(0.02) | (0.02) | (0.01) | (0.01) | (0.04) | (0.02) |
diagnosand | mean_1 | mean_2 | mean_difference | conf.low | conf.high |
---|---|---|---|---|---|
mean_estimand | 0.50 | 0.50 | 0.00 | 0.00 | 0.00 |
mean_estimate | 0.51 | 0.49 | -0.03 | -0.06 | 0.01 |
bias | 0.01 | -0.01 | -0.03 | -0.06 | 0.01 |
sd_estimate | 0.30 | 0.20 | -0.10 | -0.13 | -0.08 |
rmse | 0.30 | 0.20 | -0.10 | -0.13 | -0.08 |
power | 0.43 | 0.67 | 0.24 | 0.18 | 0.31 |
coverage | 0.94 | 0.95 | 0.01 | -0.01 | 0.04 |
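A comparison table of this kind can be produced by diagnosing two versions of a design side by side. The sketch below is an illustration, not the code behind the table above: the sample sizes (50 and 100), the seed, and the number of simulations are assumptions, and DeclareDesign is assumed to be installed.

```r
library(DeclareDesign)

set.seed(2)  # arbitrary seed for repeatability

N <- 50    # smaller sample size (assumed for illustration)
b <- .5    # true treatment effect

design_small <-
  declare_model(N = N, U = rnorm(N),
                potential_outcomes(Y ~ b * Z + U)) +
  declare_assignment(Z = simple_ra(N), Y = reveal_outcomes(Y ~ Z)) +
  declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) +
  declare_estimator(Y ~ Z, inquiry = "ate", .method = lm_robust)

# Same design with one parameter changed
design_large <- redesign(design_small, N = 100)

# Diagnose both designs and report differences in diagnosands
comparison <- compare_diagnoses(design_small, design_large, sims = 100)
comparison
```

Only the sample size differs between the two designs, so differences in power and RMSE can be attributed to N rather than to the estimator or the assignment scheme.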