14  Module 3: Causal Inference

Causal Effects, Observational Methods, Natural Experiments, and Mediation

14.1 Overview

Randomized experiments are the gold standard for causal inference, but many interesting research questions involve treatments that cannot be randomized. This module covers the modern toolkit for drawing causal conclusions from observational data, and for thinking more clearly about what “causation” means.

We work through the potential outcomes framework (Rubin), the causal graphical framework (Pearl), matching and weighting estimators, regression discontinuity, difference-in-differences, synthetic control, and causal mediation analysis. Each method is motivated by a concrete research question and implemented in R.


14.2 Learning Goals

By the end of this module you should be able to:

  • Define average treatment effects (ATE, ATT, ATC) in the potential outcomes framework and explain the fundamental problem of causal inference
  • Read and draw a directed acyclic graph (DAG), and identify confounders, mediators, colliders, and back-door paths
  • Apply propensity score matching and inverse probability weighting, and assess covariate balance
  • Interpret a regression discontinuity design, choose bandwidth, and test for manipulation of the running variable
  • Explain the parallel trends assumption in difference-in-differences and assess its plausibility
  • Implement synthetic control and interpret the placebo tests
  • Distinguish total effects from direct and indirect effects, and explain when mediation analysis is and is not identified
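
As a compact reference for the first goal, the three average effects can be written in potential outcomes notation. The symbols below are an assumed convention, not necessarily the one used later in the module: Y_i(1) and Y_i(0) are unit i's potential outcomes with and without treatment, and D_i is the treatment indicator.

```latex
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0)
  && \text{individual effect (never observed directly)} \\
Y_i &= D_i\,Y_i(1) + (1 - D_i)\,Y_i(0)
  && \text{observed outcome} \\
\mathrm{ATE} &= \mathbb{E}\left[\,Y_i(1) - Y_i(0)\,\right] \\
\mathrm{ATT} &= \mathbb{E}\left[\,Y_i(1) - Y_i(0) \mid D_i = 1\,\right] \\
\mathrm{ATC} &= \mathbb{E}\left[\,Y_i(1) - Y_i(0) \mid D_i = 0\,\right]
\end{align*}
```

The fundamental problem of causal inference is visible in the second line: for each unit only one of Y_i(1) and Y_i(0) is ever observed, so the individual effect cannot be computed from data alone.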

What This Tutorial Is About

There is a single thread running through all three modules:

Your observable may not accurately reflect the latent quantity you want to study.

In Module 1 that problem lived in your outcome variable: your scale picked up variance from constructs you never intended to measure. In Module 2 it appeared in your treatment variable and your significance test: a manipulated treatment can inject multiple signals simultaneously, and a p-value carries its intended interpretation only when randomisation is actually achieved.

Modules 1 and 2 used the same running example: AlterEco Coffee’s eco-label experiment, where participants were randomly assigned to see (or not see) an eco-label and reported their willingness to pay (WTP, $1–$10).

This module asks a harder set of questions — what if we cannot run that clean experiment?

Question | Tool
What does “the effect” mean for an individual? | Potential outcomes / treatment effect taxonomy
What if consumers freely choose eco-labelled products? | Selection on observables
What if a certification rule creates a natural threshold? | Regression discontinuity
What if eco-labelling was rolled out by policy over time? | Difference-in-differences
Does the eco-label work because of perceived sustainability? | Causal mediation analysis

Every section uses simulated data with known true effects so you can see exactly when each method recovers the truth — and when it does not.
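
To make the idea of simulated data with known true effects concrete, here is a minimal sketch in R. It is not the module's actual data-generating process: the confounder `eco_concern`, the coefficients, and the selection rule are all hypothetical choices for illustration. Because both potential outcomes are simulated, the true ATE, ATT, and ATC can be computed directly and compared with a naive difference in means.

```r
# Minimal sketch: potential outcomes with a known true effect.
# All variable names and parameter values are illustrative assumptions.
set.seed(42)
n <- 10000

# Hypothetical confounder: environmental concern (standardised)
eco_concern <- rnorm(n)

# Potential outcomes for WTP (not clipped to the $1-$10 scale, for simplicity)
y0  <- 5 + 0.8 * eco_concern + rnorm(n)  # WTP without the eco-label
tau <- 1 + 0.5 * eco_concern             # heterogeneous individual effect
y1  <- y0 + tau                          # WTP with the eco-label

# Scenario A: randomised assignment
d_rand <- rbinom(n, 1, 0.5)
# Scenario B: self-selection, where concerned consumers pick the label more often
d_self <- rbinom(n, 1, plogis(1.5 * eco_concern))

# Naive estimator: difference in observed means between the two groups.
# Only one potential outcome is observed per person.
naive_diff <- function(d) {
  y <- ifelse(d == 1, y1, y0)
  mean(y[d == 1]) - mean(y[d == 0])
}

# True estimands, available only because both potential outcomes are simulated
true_ate <- mean(tau)
true_att <- mean(tau[d_self == 1])  # among those who self-selected in
true_atc <- mean(tau[d_self == 0])  # among those who did not

naive_rand <- naive_diff(d_rand)
naive_self <- naive_diff(d_self)

round(c(ATE = true_ate, ATT = true_att, ATC = true_atc,
        naive_randomised = naive_rand,
        naive_self_selected = naive_self), 2)
```

Under these assumptions the naive difference should land close to the true ATE when assignment is randomised, and well above it under self-selection, because consumers with high `eco_concern` both choose the label more often and have a higher baseline WTP.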

Note: Learning Objectives
  • Distinguish ATE, ATT, ATC, LATE, ITT, and CATE — and know which one answers your question
  • Apply regression adjustment, IPW, covariate matching, propensity-score matching, entropy balancing, and doubly-robust estimation
  • Conduct an RDD analysis and test its assumptions
  • Implement DiD, an event study, and synthetic DiD
  • Diagnose when standard mediation gives wrong answers (three structural failure modes)
  • Use the robmed and mediation packages; interpret sensitivity analyses
  • Compare bias across methods using a known data-generating system