Module 3: Causal Inference

Causal Effects, Observational Methods, Natural Experiments, and Mediation

Overview

Randomized experiments are the gold standard for causal inference, but many interesting research questions involve treatments that cannot be randomized. This module covers the modern toolkit for drawing causal conclusions from observational data, and for thinking more clearly about what “causation” even means.

The module covers the potential outcomes framework (Rubin), the causal graphical framework (Pearl), matching and weighting estimators, regression discontinuity, difference-in-differences, synthetic control, and causal mediation analysis. Each method is motivated by a concrete research question and implemented in R.

Learning Goals

By the end of this module you should be able to:

Define average treatment effects (ATE, ATT, ATC) in the potential outcomes framework and explain the fundamental problem of causal inference
Read and draw a DAG, and identify confounders, mediators, colliders, and back-door paths
Apply propensity score matching and inverse probability weighting, and assess covariate balance
Interpret a regression discontinuity design, choose bandwidth, and test for manipulation of the running variable
Explain the parallel trends assumption in difference-in-differences and assess its plausibility
Implement synthetic control and interpret the placebo tests
Distinguish total effects from direct and indirect effects, and explain when mediation analysis is and is not identified

Recommended Reading

Paper	Why it matters
Imbens & Rubin (2015) — Causal Inference for Statistics, Social, and Biomedical Sciences	Comprehensive treatment of the potential outcomes framework with broad applications
Spencer, Zanna & Fong (2005) — Establishing a Causal Chain	Argues that causal mediation requires experimental manipulation of the mediator, not just statistical controls — explains the logic and provides a concrete design template
Pieters (2017) — Meaningful Mediation Analysis	Distinguishes between plausible causal mediation and mere statistical decomposition; shows how to interpret indirect effects without overclaiming and how to communicate them clearly
Rohrer et al. (2022)	Common pitfalls in path model interpretation — when your DAG leads you astray
Alfons & Schley (2025)	Robust mediation analysis: methods that remain valid in the presence of outliers or when distributional assumptions are violated
Imai et al. (2010)	The formal causal framework for mediation analysis and the assumptions required for identification

Useful Online Resources

Resource	What it covers
Causal Inference: The Mixtape	Free online textbook covering the core causal inference toolkit with intuitive explanations and R/Stata code
The Effect	Accessible introduction to causal inference with minimal math, well-suited for social scientists
Causal Inference: What If	A rigorous but readable treatment by Hernán & Robins; math-light and available free online
Pearl & Mackenzie (2018) — The Book of Why	Accessible introduction to causal reasoning and DAGs for a broad scientific audience

What This Tutorial Is About

There is a single thread running through all three modules:

Your observable may not accurately reflect the latent quantity you want to study.

In Module 1 that problem lived in your outcome variable: your scale picked up variance from constructs you never intended to measure. In Module 2 it appeared in your treatment variable and your significance test: a manipulated treatment can inject multiple signals simultaneously, and a p-value carries its intended meaning only when randomisation is actually achieved.

Both modules used the same running example: AlterEco Coffee’s eco-label experiment, where participants were randomly assigned to see (or not see) an eco-label and reported their willingness to pay (WTP, $1–$10).

This module asks a harder set of questions — what if we cannot run that clean experiment?

Question	Tool
What does “the effect” mean for an individual?	Potential outcomes / treatment effect taxonomy
What if consumers freely choose eco-labelled products?	Selection on observables
What if a certification rule creates a natural threshold?	Regression discontinuity
What if eco-labelling was rolled out by policy over time?	Difference-in-differences
Does the eco-label work because of perceived sustainability?	Causal mediation analysis

Every section uses simulated data with known true effects so you can see exactly when each method recovers the truth — and when it does not.
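To make the "known true effects" idea concrete, here is a minimal sketch in base R. It is not the module's actual simulation: the confounder `eco_concern`, every coefficient, and the constant true effect of $1 are assumptions chosen purely for illustration. It generates both potential outcomes for each consumer, then compares what a simple difference in means recovers under random assignment versus free choice of the eco-labelled product.

```r
set.seed(42)
n <- 10000

# Hypothetical confounder (an assumption, not from the module):
# standardised environmental concern
eco_concern <- rnorm(n)

# Potential outcomes for WTP in dollars. The true effect is fixed at $1
# for everyone, so ATE = ATT = ATC = 1 by construction.
# Values are not clipped to the $1-$10 scale, to keep the effect constant.
true_effect <- 1
y0 <- 5 + 0.8 * eco_concern + rnorm(n, sd = 1)
y1 <- y0 + true_effect

# Scenario A: the label is randomly assigned
d_rand <- rbinom(n, 1, 0.5)
y_rand <- ifelse(d_rand == 1, y1, y0)

# Scenario B: consumers self-select; concerned consumers choose the label more often
d_self <- rbinom(n, 1, plogis(1.5 * eco_concern))
y_self <- ifelse(d_self == 1, y1, y0)

# Difference in means under each scenario
est_rand <- mean(y_rand[d_rand == 1]) - mean(y_rand[d_rand == 0])
est_self <- mean(y_self[d_self == 1]) - mean(y_self[d_self == 0])

# Regression adjustment for the (here, fully observed) confounder
est_adj <- unname(coef(lm(y_self ~ d_self + eco_concern))["d_self"])

round(c(truth = true_effect,
        randomised = est_rand,
        naive_self_selected = est_self,
        adjusted = est_adj), 2)
```

Under these assumptions the randomised estimate should land close to the truth, the naive self-selected comparison should overstate it (concerned consumers both pick the label and pay more regardless), and adjusting for the confounder should pull the estimate back. The adjustment works here only because the sole confounder is observed and enters linearly, which real data rarely guarantee.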

Learning Objectives

Distinguish ATE, ATT, ATC, LATE, ITT, and CATE — and know which one answers your question
Apply regression adjustment, IPW, covariate matching, propensity-score matching, entropy balancing, and doubly-robust estimation
Conduct an RDD analysis and test its assumptions
Implement DiD, an event study, and synthetic DiD
Diagnose when standard mediation gives wrong answers (three structural failure modes)
Use the robmed and mediation packages; interpret sensitivity analyses
Compare bias across methods using a known data-generating system