Here are some pointers on what to look for when discussing or reviewing a paper.

Discussing

Generally discussants have 10 to 15 minutes to give comments on a paper, sometimes less. In that much time you can make three good comments. You should not use this time to say everything you liked or did not like about a paper, and you should not get lost in the weeds. If you describe errors, you have to get to the "so what": the fact that there is an error is not in itself of interest. Select your comments accordingly.

The really useful critiques often come from taking a fresh perspective on a piece of work. This requires stepping back and not becoming beholden to the author's framing of their findings. It is often useful to ask: what is this a case of? What is the general class of phenomena it speaks to? If you had lots of resources, how would you address the question? If you could set it up as an experiment, how would you do it? If you really had to take a policy action based on this work, which elements would give you pause? But as you take these different perspectives, try to speak the same language as the author and audience; otherwise you can end up talking to yourself and influencing no one.

Remember that as a discussant it is not about you: it is about making the paper better and helping people understand its strengths and limitations. Mostly it is about the speaker. If you think the paper is great you do not have to drum up a critique, but you should still try to help people see why it is great. Having slides helps organize your presentation and helps people follow; a single slide with three bullets on the three big points is enough. If you have a laundry list of smaller points, share it with the speaker afterwards.

Reviewing

Your ostensible role as a reviewer is to advise an editor on whether or not to publish. To be useful, your conclusions need to be reasoned, and that requires going into some depth. In practice, many see reviewing as a time to receive and provide constructive feedback. That is my take too: if you have done the hard work of reading and assessing, providing useful feedback is a relatively low-cost, high-benefit step. Doing so can also make you think more deeply about the work and improve your assessments.

For a formal review or referee report you have space to go into much more depth. A standard approach is to divide these reviews into three parts: a short summary of the paper's question, argument, and contribution; the major issues that bear on the publication decision; and a list of minor comments and suggestions.

Bonus points:

The Checklist

Here is my list of things to look out for as I read a paper:

Theory

  • Is the theory internally consistent?
  • Is it consistent with past literature and findings?
  • Is it novel or surprising?
  • Are elements that are excluded or simplified plausibly unimportant for the outcomes?
  • Is the theory general or specific? Are there more general theories on which this theory could draw or contribute?

From Theory to Hypotheses

  • Is the theory really needed to generate the hypotheses?
  • Does the theory generate more hypotheses than considered?
  • Are the hypotheses really implied by the theory? Or are there ambiguities arising from, say, non-monotonicities or multiple equilibria? (See the sketch after this list.)
  • Does the theory specify mechanisms?
  • Does the theory suggest heterogeneous effects?
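On the question of whether the hypotheses are really implied by the theory, it can help to derive the comparative static explicitly and check its sign. Below is a minimal sketch using sympy; the outcome function and parameter names are made up for illustration, not taken from any particular paper. If the derivative changes sign over the relevant range, "more x raises y" is not unambiguously implied.

```python
# Sketch: check whether a theory's predicted effect is monotonic.
# The outcome function y = a*x - x**2 is a hypothetical example.
import sympy as sp

x, a = sp.symbols("x a", positive=True)
y = a * x - x**2                      # hypothetical outcome implied by the theory
dy_dx = sp.diff(y, x)                 # a - 2*x

print(dy_dx)                          # sign depends on whether x < a/2
print(sp.solve(sp.Eq(dy_dx, 0), x))  # [a/2]: the predicted effect flips sign here
```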

Hypotheses

  • Are the hypotheses complex? (E.g., in fact two or three hypotheses bundled together.)
  • Are the hypotheses falsifiable?

Evidence I: Design

  • External validity: Is the population examined representative of the larger population of interest?
  • External validity: Are the conditions under which they are examined consistent with the conditions of interest?
  • Measure validity: Do the measures capture the objects specified by the theory?
  • Consistency: Is the empirical model used consistent with the theory?
  • Mechanisms: Are mechanisms tested? How are they identified?
  • Replicability: Has the study been done in a way that it can be replicated?
  • Interpretation: Do the results admit rival interpretations?

Evidence II: Analysis and Testing

  • Identification: Are there concerns with reverse causality?
  • Identification: Are there concerns of omitted variable bias?
  • Identification: Does the model control for pre-treatment variables only? Does it control or does it match?
  • Identification: Are poorly identified claims flagged as such?
  • Robustness: Are results robust to changes in the model, to subsetting the data, to changing the period of measurement or of analysis, to the addition or exclusion of plausible controls?
  • Standard errors: Does the calculation of test statistics make use of the design? Do standard errors account for plausible clustering structures and differences in levels? (See the sketch after this list.)
  • Presentation: Are the results presented in an intelligible way, e.g., using fitted values or graphs? How could this be improved?
  • Interpretation: Can "no evidence of an effect" be interpreted as evidence of only weak effects? (See the sketch after this list.)
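The standard-errors and interpretation items are easy to illustrate. When treatment is assigned at the group level, standard errors should be clustered at that level; and a null result speaks to "only weak effects" only if the confidence interval is tight enough to rule out large ones. A minimal sketch on simulated data, using the statsmodels formula API; the variable names and data-generating process are made up for illustration.

```python
# Sketch: cluster-robust standard errors, and reading a null result via its
# confidence interval rather than its p-value alone.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_clusters, n_per = 40, 25
df = pd.DataFrame({"cluster": np.repeat(np.arange(n_clusters), n_per)})
df["treat"] = (df["cluster"] % 2).astype(float)   # treatment assigned by cluster
cluster_shock = rng.normal(0, 1, n_clusters)[df["cluster"]]
df["y"] = 0.1 * df["treat"] + cluster_shock + rng.normal(0, 1, len(df))

fit = smf.ols("y ~ treat", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]}
)
print(fit.conf_int().loc["treat"])  # a wide interval spanning zero does not
                                    # show that the effect is near zero
```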

Evidence III: Other sources of bias

  • Fishing: Were hypotheses generated prior to testing? Was any training data separated from test data? (See the sketch after this list.)
  • Measurement error: Is error from sampling, case selection, or missing data plausibly correlated with outcomes?
  • Spillovers / Contamination: Is it plausible that outcomes in control units were altered because of the treatment received by the treated?
  • Compliance: Did the treated really get treatment? Did the controls really not?
  • Hawthorne effects: Are subjects modifying behavior simply because they know they are under study?
  • Measurement: Is treatment the only systematic difference between treatment and control or are there differences in how items were measured?
  • Implications of Bias: Are any sources of bias likely to work for or against the hypothesis tested?
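The training/test separation in the fishing item can be as simple as holding out a random half of the data: explore freely on one half, then run the single pre-specified test once on the other. A minimal sketch with pandas; the data and the final test are placeholders.

```python
# Sketch: separate hypothesis generation from hypothesis testing.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(size=500), "y": rng.normal(size=500)})

explore = df.sample(frac=0.5, random_state=1)  # look at this half as much as you like
test = df.drop(explore.index)                  # touch this half once, at the end

# ... generate hypotheses on `explore`, then run the one pre-specified
# test on `test`, e.g. a simple correlation:
print(test["x"].corr(test["y"]))
```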

Explanation

  • Does the evidence support the particular causal account given?
  • Are mechanisms examined? Can they be? (See the sketch after this list.)
  • Are there observable implications we might expect to see associated with different possible mechanisms?
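One crude but common way to examine a mechanism is to compare the treatment coefficient with and without the hypothesized mediator. The sketch below simulates data in which the mediator carries the whole effect; the variables are made up, and this Baron-Kenny-style comparison has well-known identification problems, so treat it as illustrative rather than as a design that identifies the mechanism.

```python
# Sketch: a rough mediation check on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 1000
treat = rng.integers(0, 2, n).astype(float)
mediator = 0.8 * treat + rng.normal(0, 1, n)  # treatment moves the mediator
y = 0.5 * mediator + rng.normal(0, 1, n)      # the mediator moves the outcome
df = pd.DataFrame({"treat": treat, "m": mediator, "y": y})

total = smf.ols("y ~ treat", data=df).fit().params["treat"]
direct = smf.ols("y ~ treat + m", data=df).fit().params["treat"]
print(total, direct)  # shrinkage toward zero is consistent with, but
                      # does not prove, mediation through m
```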

Policy Implications

  • Do the policy implications really follow from the results?
  • If implemented, would the policy changes have effects other than those specified by the research?
  • Have the policy claims been tested directly?
  • Is the author overselling or underselling the findings?