Taking the con out of conjoints and other musings
2026-05-15
Dia dhuit! [dee ah gwitch]
Check out our new DeclareDesign Shiny app: https://shiny2.wzb.eu/ipi/declaredesign/
Check out our new CausalQueries Shiny app: https://shiny2.wzb.eu/ipi/process_tracing/
Working on a ReplicateEverything package
Survey experiments are now the most common research design in political science (Torreblanca et al. 2025).
The promise is compelling:
“side-stepping the endogeneity and collinearity concerns that threaten our ability to draw causal inferences using observational data”
— Kertzer, Renshon, and Yarhi-Milo (2021)
There is a lot to like. But there are also systematic risks of confusion about what survey experiments can and cannot do.
Risk 1 — Estimand confusion
Is the goal causal or descriptive?
→ Different goals need different designs
Risk 2 — Controls confusion
What do controls actually do? (Focus on the presence, not the value, of controls.) → They define estimands; they do not reduce bias or error
Risk 3 — Extrapolation confusion
Can we go from survey to world?
→ Only under very strong conditions (either stop making these claims or seek to establish them)
Evergreen advice:
Know your estimand!
— Lundberg, Johnson, and Stewart (2021)
This is a grumpy talk
I am sorry
I know that not everyone, and maybe not even most, makes these errors.
I am confused about this: when these errors are made, are we looking at errors in comprehension, or errors (or just norms?) in communication?
Causal or descriptive estimand — the distinction shapes everything
Descriptive estimand
A property something actually has:

- Knowledge, beliefs, preferences
- Measurable “in principle”
- You may or may not need an experiment
Example: How many voters prefer female candidates?
Causal estimand
A difference between potential outcomes:

- Counterfactual; cannot be measured directly, even in principle
- You must do inference
Example: Does a candidate being female cause vote losses? (we’ll return to this)
Why the confusion arises: very similar designs can serve both purposes, and researchers often describe descriptive estimands using causal language (“the effect of X on Y”).
Key insight
If your estimand is descriptive, maybe you don’t need an experiment. Worth checking — the experiment may add error without adding value.
Figure 1: Preferences and features combine to determine choices. Manipulating features lets us infer preferences.
But still a kind of formal equivalence:
Figure 2: Preferences and features combine to determine choices. Manipulating features lets us infer preferences.
Clarifies that \(\theta\) is a property in the Pearlean world, not an event or a difference in time-stamped potential outcomes as in Rubin.
Figure 3: Preferences and features combine to determine choices. Manipulating features lets us infer preferences.
Three arguments for treating preferences as properties (not causal effects):
You are interested in the average age of people in a population
You divide the subjects into two groups:
You estimate: \(\widehat{\overline{\text{Age}}} = \mathbb{E}_{i \in A}[Y_i(A)] - \mathbb{E}_{i \in B}[Y_i(B)]\)
You have a causal estimand and have done causal inference. This is valid: the estimate is unbiased thanks to the randomization.
You use Neyman standard errors, which are valid under randomization and needed because you have an incomplete schedule of potential outcomes.
Congratulations.
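To make the logic concrete, here is a minimal base-R sketch of this design. One detail is elided above, so I fill it with an assumption: group A reports their age and group B reports zero, so the difference in means targets the average age and Neyman standard errors apply.

```r
# Sketch: estimating average age "causally" via random assignment.
# Assumption (not spelled out above): group A reports age, group B reports 0.
set.seed(1)
N   <- 1000
age <- sample(18:80, N, replace = TRUE)    # true ages
Z   <- sample(rep(c("A", "B"), N / 2))     # random assignment to two groups
Y   <- ifelse(Z == "A", age, 0)            # observed outcomes

estimate <- mean(Y[Z == "A"]) - mean(Y[Z == "B"])

# Neyman (conservative) standard error for a difference in means
se <- sqrt(var(Y[Z == "A"]) / sum(Z == "A") + var(Y[Z == "B"]) / sum(Z == "B"))

c(truth = mean(age), estimate = estimate, se = se)
```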
Scenario A — for descriptive inference
A lawyer shows subjects a picture of a weapon and measures their stress response — to infer whether they already knew a weapon was used.
→ The causal inference is just a tool. The estimand is the subject’s prior knowledge \(K\).
Scenario B — for causal inference
A political scientist asks: does being reminded of corruption increase support for the opposition?
→ The estimand is the effect of the prime itself. Fully causal.
Risks with priming experiments
The classic trap: Confusing the effect of the prime (being reminded of violence) with the effect of the thing being primed (actual exposure to violence).
Claiming you have estimated the effect of past violence when you have only estimated the effect of being reminded of it.
Design: Vary question wording to assess whether form affects substance. The idea of an equivalence frame is that the same question is asked in different ways.
Descriptive use: Purge framing effects to recover “true” underlying preferences
(Goldin and Reck 2015)
Causal use: How does a loss vs. gain frame change how people think about a policy?
(Druckman 2001)
Contrast with conjoints: In a conjoint, different treatments ask different questions in the same way; framing experiments ask the same question in different ways.
“A factorial survey experiment designed to measure multidimensional preferences”
— De la Cuesta, Egami, and Imai (2022)
Conjoints have two genuinely distinct use cases — rarely recognized:
| | Causal Use | Descriptive Use |
|---|---|---|
| Goal | Effect of signal on stated choices | Measure preferences / classification rules / ideal points |
| Think of it as | Mimicking field experiment on choices | Elicitation of a preference function |
| Frequency | Rare? | Typical? |
| Trap | Confuse signal effect with attribute effect | Confuse classification rule with causal effect |
The dual confusion
Researchers often intend descriptive inference but describe results in causal language — or vice versa. The distinction matters for design, analysis, and interpretation.
Descriptive case 1: You ask: how would you react under each of these imagined situations?
Descriptive case 2: You want to learn about an algorithm: what rule does a bank use when deciding whether to give credit? What rule does an AI model use to assess the validity of a statement?
In both cases you might consider alternatives:
Classic candidate experiment: in a world where a given set of facts about attributes is available (and only these), how do choices depend on the elements of that set?
Hainmueller et al.’s external-validity study is a two-edged sword supporting this view: it might work! But the actual target applications are quite unusual.
Go in peace
Controls define estimands — they do not reduce confounding
Conjoint experiments are celebrated for the ability to control many features simultaneously.
Researchers describe this as addressing confounding:
“varying other features lets them distinguish the effect of democracy from potential confounders”
— Tomz and Weeks (2013)
“holding the military power of the target constant, we reduce the possibility of the respondents drawing inferences about the target’s level of military power from the democracy treatment, which is perhaps the most obvious potential confounder”
— Bell and Quek (2018)
What is wrong here?
Random assignment already addresses confounding. Controls in conjoints do something different — they change what you are estimating.
Three distinct purposes controls serve (but often confused):
The critical insight
Choosing which controls to include — not just their values — directly determines the estimand. This means:
A counterintuitive result?:
Adding controls at the intervention stage can increase variance in conjoint experiments.
Why? Controlling \(A_2\) at the intervention stage introduces variation that must then be removed in the analysis to recover the baseline effect. The net result can be an increase in variance.
Say: \(Y = A_1 \cdot A_2\) where \(A_2 \sim \text{Bernoulli}(0.5)\), and this is known to all subjects.
Then: if no information on \(A_2\) is provided, the variance of estimates of the effect of \(A_1\) is lower than when \(A_2\) is also controlled and randomized.
Note
This undermines the “precision argument” for adding controls in conjoints.
The standard rationale (“controls reduce variance”) does not straightforwardly apply in this context.
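The claim is easy to check by simulation. Below is a sketch under the setup above, assuming subjects report the expected quality given whatever information they are shown.

```r
# Y = A1 * A2 with A2 ~ Bernoulli(0.5), known to subjects. Compare the
# sampling variance of the estimated A1 effect when A2 is omitted
# (subjects integrate over it) versus controlled and randomized.
set.seed(2)
N    <- 200
reps <- 2000

estimate_once <- function(show_A2) {
  A1 <- rbinom(N, 1, 0.5)
  if (show_A2) {
    A2 <- rbinom(N, 1, 0.5)
    Y  <- A1 * A2            # A2 shown and randomized: subjects report A1 * A2
  } else {
    Y  <- A1 * 0.5           # no A2 info: subjects report E[A1 * A2 | A1]
  }
  mean(Y[A1 == 1]) - mean(Y[A1 == 0])   # difference-in-means estimate
}

c(var_no_A2_info    = var(replicate(reps, estimate_once(FALSE))),
  var_A2_controlled = var(replicate(reps, estimate_once(TRUE))))
# Both designs target an effect of 0.5; only the controlled design is noisy here.
```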
Focus not on the distribution of control levels but on their presence.
Setup: \(Y = A_2 \times A_3\), so \(A_1\) has no causal role in quality. Yet whether \(A_1\) “matters” for preferences depends entirely on what else is known.
Joint distribution of \((A_1, A_2, A_3)\):
| | \(A_1 = 0\) | \(A_1 = 1\) |
|---|---|---|
| \(A_2=0,\; A_3=0\) | \(4/16\) | \(0\) |
| \(A_2=0,\; A_3=1\) | \(4/16\) | \(0\) |
| \(A_2=1,\; A_3=0\) | \(1/16\) | \(2/16\) |
| \(A_2=1,\; A_3=1\) | \(3/16\) | \(2/16\) |
| Total | \(12/16\) | \(4/16\) |
Key feature: \(A_1=1\) is a strong signal that \(A_2=1\) (and thus informative about \(Y\), via \(A_2\)).
“When IE [information equivalence] is violated, the effect of the manipulation need not correspond to the quantity of interest — the effect of beliefs about the focal attribute”
— Dafoe, Zhang, and Caughey (2018)
Results: since \(Y = A_2 \times A_3\), we have \(\Pr(Y=1) = \Pr(A_2=1 \,\&\, A_3=1)\):
| Controls | \(\Pr(Y{=}1\mid A_1{=}1)\) | \(\Pr(Y{=}1\mid A_1{=}0)\) | Effect |
|---|---|---|---|
| None | \(1/2\) | \(1/4\) | +0.25 |
| \(A_2\) controlled | \(1/2\) | \(3/4\) | −0.25 |
| \(A_2\) and \(A_3\) controlled | \(1\) | \(1\) | 0 |
Warning
All three estimates are correct for their own estimand. None reveals that \(A_1\) is causally irrelevant.
The sign and existence of an apparent preference for \(A_1\) depends entirely on which controls are included — not on \(A_1\)’s role in the world.
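These numbers can be verified directly by encoding the joint distribution above, reading “controlled” as the control being shown at value 1 (my reading of the table):

```r
# Encode the joint distribution of (A1, A2, A3) from the table above.
joint <- expand.grid(A1 = 0:1, A2 = 0:1, A3 = 0:1)
joint$p <- c(4, 0, 1, 2, 4, 0, 3, 2) / 16   # cell probabilities from the table
joint$Y <- joint$A2 * joint$A3              # Y = A2 * A3

# Pr(Y = 1) within the subset of profiles satisfying `keep`
pY <- function(keep) sum(joint$p[keep] * joint$Y[keep]) / sum(joint$p[keep])

c(pY(joint$A1 == 1), pY(joint$A1 == 0))                  # no controls: 1/2, 1/4
c(pY(joint$A1 == 1 & joint$A2 == 1),
  pY(joint$A1 == 0 & joint$A2 == 1))                     # A2 controlled: 1/2, 3/4
c(pY(joint$A1 == 1 & joint$A2 == 1 & joint$A3 == 1),
  pY(joint$A1 == 0 & joint$A2 == 1 & joint$A3 == 1))     # A2, A3 controlled: 1, 1
```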
Privilege, wealth, and ability each contribute directly to quality.
Figure 4: Left: respondent’s mental model. Right: what the experimenter can observe when only signals for privilege and wealth are provided — ability is unobserved.
In this world, the experiment cleanly recovers the mental model: signals map directly to beliefs about the attributes they describe.
Wealth is produced by privilege and ability — so signalling privilege and wealth is informative about ability.
Figure 5: Left: respondent’s mental model — privilege and ability jointly produce wealth; ability drives quality. Right: what signals about privilege and wealth reveal about beliefs.
Adding the wealth control creates an apparent effect of privilege via collider bias: conditioning on wealth renders privilege informative about quality.
In Mental Model 2 (right column): subjects believe only ability drives quality, and that wealth is produced by privilege and ability. Privilege has no direct effect on quality.
But when both privilege and wealth are controlled, the implied “preferences” are identical in both worlds:
| Control set | World 1 | World 2 |
|---|---|---|
| \(P\) and \(W\) | \(Y = \tfrac{1}{3} - \tfrac{1}{3}P + \tfrac{2}{3}W\) | \(Y = \tfrac{1}{3} - \tfrac{1}{3}P + \tfrac{2}{3}W\) |
| \(P\) only | \(Y = \tfrac{2}{3} - \tfrac{1}{3}P\) | \(Y = 0.25\) |
| \(W\) only | \(Y = \tfrac{1}{6} + \tfrac{2}{3}W\) | \(Y = \tfrac{1}{8} + \tfrac{1}{2}W\) |
You cannot recover respondents’ mental models from conjoint data
The same pattern of responses can arise from very different underlying causal beliefs.
Showing that a signal increases an evaluation does not show that the subject believes the attribute causes quality.
Setup: Researchers randomize a candidate’s race and control for “criminality,” claiming this isolates taste-based discrimination:
“controlling for other features lets them assess taste-based discrimination”
— Ono and Burden (2019); Boittin, Fisher, and Mo (2024); Olinger et al. (2024)
Or: Controlling for content of speech, do observers assess actions differently for Muslim and Christian speakers?
Two risks:
Risk 1 — Controlled ≠ Natural direct effect
Fixing downstream beliefs exogenously estimates possible discrimination, not actual discrimination. If an employer never in fact encounters low-skill candidates, the controlled effect can be non-zero while the natural direct effect is zero.
Risk 2 — Over-control and thinning
If discrimination works through beliefs about skills, controlling skills removes the channel. You can make any direct effect disappear:
In a line of dominoes each with unit effects, only the second-to-last domino has a non-zero controlled direct effect.
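A tiny sketch of the domino logic, with a hypothetical chain of five dominoes feeding an outcome:

```r
# Chain X1 -> X2 -> ... -> X5 -> Y with unit effects: Y equals the state of
# the last domino. The controlled direct effect (CDE) of domino j (toggling
# it while holding every other domino fixed at 1) is zero except for the
# domino immediately before the outcome.
k <- 5
outcome <- function(x) x[k]

cde <- function(j, others = rep(1, k)) {
  x1 <- others; x1[j] <- 1
  x0 <- others; x0[j] <- 0
  outcome(x1) - outcome(x0)
}

sapply(seq_len(k), cde)   # 0 0 0 0 1: only the last domino in the chain matters
```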
The prescription
Claims like “Americans do not select doctors based on race” should specify exactly which downstream features were controlled — the conclusion depends on this.
In a conjoint, there is a causal hierarchy from features to outcomes:
\[\underbrace{A_1, A_2}_{\text{Attributes}} \;\rightarrow\; \underbrace{I_1, I_2}_{\text{Information provided}} \;\rightarrow\; \underbrace{B_1, B_2}_{\text{Beliefs formed}} \;\rightarrow\; \underbrace{Y}_{\text{Evaluation}}\]
Figure 6
The experiment randomizes information (\(I\)). Subjects are not told attributes are random. They may infer \(A_2\) from \(I_1\) — just as in the real world.
| | Estimand | Definition | Identified? |
|---|---|---|---|
| 1 | Attribute effect | \(Y(A_1=1) - Y(A_1=0)\) | No |
| 2 | Information effect | \(Y(I_1=1) - Y(I_1=0)\) | **Yes** ✓ |
| 3 | Belief effect | \(Y(B_1=1) - Y(B_1=0)\) | No |
| 4 | Conditional info effect | \(Y(I_1=1, I_2) - Y(I_1=0, I_2)\) | **Yes** ✓ |
| 5 | Conditional belief effect | \(Y(B_1=1, I_2) - Y(B_1=0, I_2)\) | No |
| 6 | Controlled info effect | \(Y(I_1=1, B_2) - Y(I_1=0, B_2)\) | No |
| 7 | Controlled belief effect | \(Y(B_1=1, B_2) - Y(B_1=0, B_2)\) | No |
The conjoint identifies information effects (rows 2, 4) only.
Survey experiments manipulate signals, not features of the world
Survey experiments are routinely used to make claims like these:
“allies who stood firm in the past indeed gain a reputation for resolve and are seen as more likely to stand firm in the current crisis”
— Kertzer, Renshon, and Yarhi-Milo (2021)
“shared democracy pacifies the public primarily by changing perceptions of threat and morality”
— Tomz and Weeks (2013)
“the promise to renovate schools increases the probability of support by only four percentage points over candidates who do not make these promises”
— Mares and Visconti (2020)
“support for elected governance is not contingent on the state’s providing economic benefits”
— Ridge (2024)
“movement towards the other party improves vote shares when party positions are unpopular”
— Broockman and Kalla (2026)
“the causal effect of candidate extremity on citizens’ preferences”
— Amsalem and Zoizner (2024)
These sound like claims about features of the world. They are based on effects of words in a controlled survey.
Two threats:
Before asking how to export results, ask: is the target estimand even well-defined?
Conjoints estimate effects of signals cleanly. The corresponding real-world estimands — effects of gender, regime type, corruption — may not be.
Three threats to estimand existence:
| | Threat | In a nutshell |
|---|---|---|
| 1 | Attributes as causes | Changing the attribute may change the unit itself |
| 2 | SUTVA violations | Many versions of the treatment, each with different effects |
| 3 | Exclusion restriction | No lever to change the attribute without side effects |
Important
The effect of statements about a feature \(\neq\) the effect of the feature itself
Holland’s challenge: “attributes of units are never causes”
If you change the attribute, you may no longer be talking about the same unit.
Hard cases — features constitutive of the unit:
Softer reading: at minimum, the effect of “being female” requires specifying which female version of the candidate — with the same upbringing? after a transition at age 30? born female 50 years ago?
These are different interventions with different potential outcomes.
Note
A conjoint randomizes a gender label in a survey. This is well-defined. The corresponding real-world intervention is not — unless it is specified precisely (e.g. “the effect of describing the candidate as female in campaign materials”).
The Stable Unit Treatment Value Assumption requires no hidden versions of treatment.
For states rather than interventions, this is almost always violated.
Example: “The effect of being a democracy”
Each version has different potential outcomes. Averaging over them produces an estimand that corresponds to no specific real-world scenario.
Other examples: the effect of “being a migrant,” “being employed,” “being wealthy” — all underspecified without a temporal stamp and a specific pathway.
Note
This is not unique to conjoints. Observational causal inference faces the same challenge whenever treatment is defined by a state rather than an intervention. Survey experiments just make it more visible because the treatment is so easily specified in the survey.
If every lever we can imagine to change an attribute inevitably induces other effects on outcomes, the clean causal estimand does not exist.
Candidate gender:
Can we imagine a Trump victory scenario where he was female — without imagining a gender transition at some point in his life? If a transition is required, then the “gender effect” includes the effect of transitioning, which is surely not the intended estimand.
Regime type:
Can we imagine a democracy without imagining the process that produced it (elections, a revolution, foreign pressure)? Those processes have their own effects on conflict and cooperation.
Warning
The logic: if we can only change \(A_1\) by also changing \(Z\) (the lever), and \(Z\) affects outcomes directly, then the effect of \(A_1\) is entangled with the effect of \(Z\).
In audit experiments the same issue arises: sending a “Black-sounding” name CV versus a “White-sounding” name CV changes more than just the perceived race.
For AMCE = ATE (survey estimand = real-world causal effect):
| | Condition | The challenge |
|---|---|---|
| A1 | Causal autonomy of attributes | Attributes often cause each other (power corrupts) |
| A2 | Sovereignty: votes → vote shares | Abstention, misreporting, mobilization |
| A3 | Sincere voting | Strategic behavior, bandwagon effects |
| A4 | Context irrelevance given attributes | Culture, institutions shape preferences |
| A5 | Signals are complete mediators | Attributes may affect behavior beyond beliefs |
| A6 | Aligned distributions | Survey profiles ≠ real-world distribution |
Figure 7: Left: a structure under which AMCE can recover the effect of attributes on votes. Right: three numbered threats to this inference. Arrow 1 = causal relation between attributes (violates A1). Arrow 2 = cross-attribute updating (violates A5). Arrow 3 = direct behavioral effect (violates A5).
① Endogenous attributes (violates A1)
In the real world \(A_1\) may cause \(A_2\), even if we randomize signals independently.
Example: Power may beget corruption. A conjoint randomizes these independently; the real world does not.
Consequence: the AMCE for \(A_1\) is \(0\), but the ATE is \(0.3\) in a worked example with identical marginal distributions.
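A minimal sketch of how such numbers can arise (the functional forms here are my own illustration, not the talk’s worked example): evaluations respond only to \(A_2\), and in the world \(A_1\) produces \(A_2\), so marginal distributions match across settings.

```r
# Evaluations depend only on A2; in the world, A1 causes A2 ("power begets
# corruption"). Illustrative numbers chosen so AMCE = 0 and ATE = 0.3.
set.seed(3)
N <- 10000
Y <- function(A1, A2) 0.3 * A2        # A1 plays no role given A2

# Conjoint: A1 and A2 signals randomized independently
A2_signal <- rbinom(N, 1, 0.5)
amce_A1 <- mean(Y(1, A2_signal)) - mean(Y(0, A2_signal))   # = 0

# World: setting A1 also sets A2 (A2 := A1), so A1's effect runs through A2
ate_A1 <- Y(1, A2 = 1) - Y(0, A2 = 0)                      # = 0.3

c(AMCE = amce_A1, ATE = ate_A1)
```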
② Cross-attribute updating (violates A5)
Subjects update beliefs about unobserved features based on observed signals.
Example: Learning a candidate’s profession, subjects update on their gender.
Consequence: Signal of one attribute affects beliefs about another — even without information on the second attribute.
③ Direct behavioral effects (violates A5)
Attributes may affect behavior through channels beyond preferences and beliefs.
Example: A powerful candidate can intimidate voters. A wealthy candidate can buy votes.
Consequence: The conjoint captures stated preferences; real voting also responds to power and material incentives.
Setup: A researcher randomizes candidate gender and reports:
“Being a woman costs a candidate approximately X% of vote share in competitive elections”
What is wrong here?
1. Maps to nowhere: “the effect of being a woman” on vote shares is underspecified (SUTVA: which version of being female? which transition? over what period?)
2. Conditions not met: The six conditions — especially A1 (gender may be causally related to other features), A5 (voters may respond to stereotypes via channels beyond stated preferences), and A6 (survey profiles ≠ real campaign information) — are unlikely to hold jointly.
3. Wrong level of description: Subjects were treated (given a gender signal), not candidates. Stated hypothetical preferences were measured, not actual votes.
Better framing
“A gender signal, in this collection of controlled information environments, shifts stated candidate preferences by X percentage points on average.”
| Risk | The confusion | The fix |
|---|---|---|
| 1. Estimand | Is the goal descriptive or causal? | Be explicit; ask if an experiment is even necessary |
| 2. Controls | Controls address confounding | Controls define estimands; different sets → incomparable results |
| 3. Extrapolation | AMCE = effect in the world | Requires A1–A6; these are extremely demanding |
The credibility revolution revealed how hard causal inference is in observational data. Survey experiments seemed to offer a solution: randomize what cannot be randomized in the field. But survey experiments do not “solve” these problems. Rather, they point our attention to *different* problems that can be more readily solved.
1. Know your estimand — before you design
If your goal is descriptive, check whether an experiment is necessary. Direct measurement may be simpler and less noisy. Choose the estimand first; choose the design to fit it.
2. Be explicit about what controls do
Report which attributes are included and why. Do not pool studies with different control sets — they answer different questions. Claims about “direct effects” (e.g., taste-based discrimination) must specify what is controlled downstream.
3. Keep claims in line with design
There is genuine power here, when used right:
Conjoint experiments are excellent for:
The honest statement of an AMCE:
The average effect of a signal about attribute \(X\), holding other controlled information fixed, on stated preferences — averaged over the distribution of information conditions in the experiment.
This is well-defined, identifiable, and useful.
It is just not the same as the effect of \(X\) in the world.
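For concreteness, a sketch of estimating exactly this quantity from simulated conjoint data (all attribute names and coefficients are invented). With independently randomized signals, the difference in means, or equivalently the OLS coefficient on the focal signal, estimates its AMCE:

```r
# Simulated conjoint: independently randomized signals, stated binary choice.
set.seed(4)
N  <- 5000
X  <- rbinom(N, 1, 0.5)                        # focal signal
Z1 <- rbinom(N, 1, 0.5)                        # other randomized signals
Z2 <- sample(c("low", "mid", "high"), N, replace = TRUE)

# Stated preference (invented response model)
choice <- rbinom(N, 1, plogis(-0.2 + 0.3 * X + 0.4 * Z1 + 0.2 * (Z2 == "high")))

# AMCE of the X signal: averaged over the experimental distribution of Z1, Z2
coef(lm(choice ~ X))["X"]
```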
Key
Don’t let the method determine the estimand
Three risks in conjoint experiments:
Estimand confusion — Causal vs. descriptive: different goals need different designs, and the experiment may not be necessary for descriptive goals
Controls confusion — Controls define estimands, not purify them: which controls you include changes what question you are answering; different controls → different, incomparable estimands
Extrapolation confusion — Surveys manipulate signals, not world features: translating AMCE to real-world causal effects requires six strong assumptions, each potentially violated
The AMCE: classic, well-defined, identifiable. Use it carefully and describe it accurately.
Know your estimand!
Slán
[Salve, Sláinte]