14 Module 3: Causal Inference
Causal Effects, Observational Methods, Natural Experiments, and Mediation
14.1 Overview
Randomized experiments are the gold standard for causal inference — but most interesting research questions cannot be randomized. This module covers the modern toolkit for drawing causal conclusions from observational data, and for thinking more clearly about what “causation” even means.
We work through the potential outcomes framework (Rubin), the causal graphical framework (Pearl), matching and weighting estimators, regression discontinuity, difference-in-differences, synthetic control, and causal mediation analysis. Each method is motivated by a concrete research question and implemented in R.
14.2 Learning Goals
By the end of this module you should be able to:
- Define average treatment effects (ATE, ATT, ATC) in the potential outcomes framework and explain the fundamental problem of causal inference
- Read and draw a DAG, identify confounders, mediators, colliders, and back-door paths
- Apply propensity score matching and inverse probability weighting, and assess covariate balance
- Interpret a regression discontinuity design, choose bandwidth, and test for manipulation of the running variable
- Explain the parallel trends assumption in difference-in-differences and assess its plausibility
- Implement synthetic control and interpret the placebo tests
- Distinguish total effects from direct and indirect effects, and explain when mediation analysis is and is not identified
14.3 Recommended Reading
| Paper | Why it matters |
|---|---|
| Imbens & Rubin (2015) — Causal Inference for Statistics, Social, and Biomedical Sciences | Comprehensive treatment of the potential outcomes framework with broad applications |
| Spencer, Zanna & Fong (2005) — Establishing a Causal Chain | Argues that causal mediation requires experimental manipulation of the mediator, not just statistical controls — explains the logic and provides a concrete design template |
| Pieters (2017) — Meaningful Mediation Analysis | Distinguishes between plausible causal mediation and mere statistical decomposition; shows how to interpret indirect effects without overclaiming and how to communicate them clearly |
| Rohrer et al. (2022) | Common pitfalls in path model interpretation — when your DAG leads you astray |
| Alfons & Schley (2025) | Robust mediation analysis: methods that remain valid when outliers are present or distributional assumptions are violated |
| Imai et al. (2010) | The formal causal framework for mediation analysis and the assumptions required for identification |
Useful Online Resources
| Resource | What it covers |
|---|---|
| Causal Inference: The Mixtape | Free online textbook covering the core causal inference toolkit with intuitive explanations and R/Stata code |
| The Effect | Accessible introduction to causal inference with minimal math, well-suited for social scientists |
| Causal Inference: What If | A rigorous but readable treatment by Hernán & Robins; math-light and available free online |
| Pearl & Mackenzie (2018) — The Book of Why | Accessible introduction to causal reasoning and DAGs for a broad scientific audience |
What This Tutorial Is About
There is a single thread running through all three modules:
Your observable may not accurately reflect the latent quantity you want to study.
In Module 1 that problem lived in your outcome variable: your scale picked up variance from constructs you never intended to measure. In Module 2 it appeared in your treatment variable and your significance test: a manipulated treatment can inject multiple signals simultaneously, and a p-value can be interpreted as intended only when randomisation is actually achieved.
Both modules used the same running example: AlterEco Coffee’s eco-label experiment, where participants were randomly assigned to see (or not see) an eco-label and reported their willingness to pay (WTP, $1–$10).
This module asks a harder set of questions — what if we cannot run that clean experiment?
| Question | Tool |
|---|---|
| What does “the effect” mean for an individual? | Potential outcomes / treatment effect taxonomy |
| What if consumers freely choose eco-labelled products? | Selection on observables |
| What if a certification rule creates a natural threshold? | Regression discontinuity |
| What if eco-labelling was rolled out by policy over time? | Difference-in-differences |
| Does the eco-label work because of perceived sustainability? | Causal mediation analysis |
Every section uses simulated data with known true effects so you can see exactly when each method recovers the truth — and when it does not.
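To make the idea of "simulated data with known true effects" concrete, here is a minimal sketch in base R. It is not the tutorial's own data-generating process: the variable names (`concern`, `y0`, `y1`, `tau`) and every parameter value are illustrative assumptions, and WTP is not clipped to the $1–$10 range. Because both potential outcomes are simulated, the true ATE, ATT, and ATC can be computed directly and compared with what a naive difference in means recovers under randomisation versus free choice.

```r
# Illustrative simulation; all names and parameter values are assumptions.
set.seed(42)
n <- 10000

# Latent environmental concern shifts baseline WTP and, in scenario B, selection.
concern <- rnorm(n)

# Potential outcomes: WTP without (y0) and with (y1) the eco-label.
# The individual effect tau grows with concern, so ATT and ATC differ from the ATE.
y0  <- 5 + 0.8 * concern + rnorm(n)
tau <- 1 + 0.5 * concern
y1  <- y0 + tau
true_ate <- mean(tau)

# Scenario A: randomised assignment (as in the Module 1 and 2 experiment).
d_rand   <- rbinom(n, 1, 0.5)
y_rand   <- ifelse(d_rand == 1, y1, y0)  # only one potential outcome is ever observed
est_rand <- mean(y_rand[d_rand == 1]) - mean(y_rand[d_rand == 0])

# Scenario B: consumers with higher concern are more likely to choose the label.
d_self   <- rbinom(n, 1, plogis(1.5 * concern))
y_self   <- ifelse(d_self == 1, y1, y0)
est_self <- mean(y_self[d_self == 1]) - mean(y_self[d_self == 0])

# Effects among those who did and did not self-select into treatment.
true_att <- mean(tau[d_self == 1])
true_atc <- mean(tau[d_self == 0])

round(c(ATE = true_ate, ATT = true_att, ATC = true_atc,
        randomised = est_rand, self_selected = est_self), 2)
```

Under these assumptions the randomised comparison should land close to the true ATE, while the self-selected comparison overstates it, because buyers differ from non-buyers in `concern` before any label is shown.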
By the end of this tutorial you should be able to:
- Distinguish ATE, ATT, ATC, LATE, ITT, and CATE, and know which one answers your question
- Apply regression adjustment, IPW, covariate matching, propensity-score matching, entropy balancing, and doubly-robust estimation
- Conduct an RDD analysis and test its assumptions
- Implement DiD, an event study, and synthetic DiD
- Diagnose when standard mediation gives wrong answers (three structural failure modes)
- Use the `robmed` and `mediation` packages; interpret sensitivity analyses
- Compare bias across methods using a known data-generating system
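As a rough illustration of what "comparing bias across methods using a known data-generating system" can look like, the sketch below uses base R only. The data-generating process, the variable names, and the constant true effect of 1 are assumptions made for this example, not the tutorial's actual simulation. It contrasts a naive difference in means with regression adjustment and with inverse probability weighting (normalised weights) when selection depends on a single observed confounder.

```r
# Illustrative comparison; the DGP and all names are assumptions.
set.seed(123)
n <- 20000
true_effect <- 1

# One observed confounder drives both label choice and WTP.
concern <- rnorm(n)
d   <- rbinom(n, 1, plogis(concern))
wtp <- 5 + true_effect * d + 0.8 * concern + rnorm(n)

# 1. Naive difference in means (ignores the confounder).
naive <- mean(wtp[d == 1]) - mean(wtp[d == 0])

# 2. Regression adjustment: condition on the confounder in a linear model.
reg_adj <- unname(coef(lm(wtp ~ d + concern))["d"])

# 3. Inverse probability weighting with an estimated propensity score.
ps  <- fitted(glm(d ~ concern, family = binomial))
w   <- ifelse(d == 1, 1 / ps, 1 / (1 - ps))
ipw <- weighted.mean(wtp[d == 1], w[d == 1]) -
       weighted.mean(wtp[d == 0], w[d == 0])

# Bias of each estimator relative to the known truth.
estimates <- c(naive = naive, regression = reg_adj, ipw = ipw)
round(estimates - true_effect, 3)
```

In this setup both adjusted estimators should sit near the truth because the only confounder is observed and both models are correctly specified. Dropping `concern` from either model, or adding an unobserved confounder to the simulation, is a quick way to see where each method stops recovering the true effect.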