Part 4: Mediation Analysis

```mermaid
graph LR
    X[Eco-label exposure] --> M[Perceived sustainability]
    M --> Y[Actual spending]
    X --> Y
```
Mediation analysis asks whether the eco-label affects willingness to pay (WTP) directly or through a mechanism — here, perceived sustainability (M). Three estimators are compared across three scenarios, each designed to challenge a different method.
| Scenario | Key feature | Which method performs best in this scenario |
|---|---|---|
| 1: Clean normal data | No violations — a baseline | All methods perform well |
| 2: Large outliers | Correlated high-leverage points inflate OLS b-path | ROBMED recovers the true effect |
| 3: M–Y confounded | Unobserved U causes both M and Y; true ACME = 0 | Imai sensitivity flags the fragility |
Baron–Kenny mediation decomposes X’s total effect on Y into two paths:
\[\underbrace{X \xrightarrow{\;a\;} M \xrightarrow{\;b\;} Y}_{\text{indirect (ACME) = } a \times b} \quad + \quad \underbrace{X \xrightarrow{\;c'\;} Y}_{\text{direct (ADE)}}\]
The ACME (Average Causal Mediation Effect) is the product \(a \times b\). It has two failure modes that the three scenarios below are specifically designed to reveal:
- The b-path can be estimated on the wrong data — if outliers inflate the apparent slope between M and Y, \(\hat{b}\) is too large, and so is \(\hat{a} \times \hat{b}\). ROBMED addresses this.
- The b-path can be estimated on the wrong model — if an unobserved variable \(U^*\) drives both M and Y, the OLS \(\hat{b}\) picks up the \(U^*\) signal as if it were causal M→Y. No regression method alone can fix this; Imai’s sensitivity analysis quantifies how severe the problem would need to be to explain away the result.
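The product-of-coefficients logic can be sketched in a few lines of R (a minimal illustration on simulated data, not the module's actual DGP; all names and parameter values here are for exposition only):

```r
# Minimal sketch: Baron-Kenny product-of-coefficients estimate of the ACME
set.seed(1)
n <- 5000
X <- rbinom(n, 1, 0.5)               # randomised treatment
M <- 0.80 * X + rnorm(n)             # a-path = 0.80
Y <- 0.30 * X + 0.75 * M + rnorm(n)  # b-path = 0.75, direct path = 0.30
a_hat <- coef(lm(M ~ X))["X"]        # a-path regression
b_hat <- coef(lm(Y ~ X + M))["M"]    # b-path regression (treatment adjusted)
unname(a_hat * b_hat)                # ACME estimate, close to 0.80 * 0.75 = 0.60
```

Both failure modes below operate on `b_hat`: outliers inflate it, and an unobserved common cause of M and Y biases it even when the regression fits cleanly.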
A Module 1 failure mode: There is a third, often overlooked failure mode — measurement error in M. If perceived sustainability is measured as a single noisy item rather than a well-validated multi-item scale (Module 1 — reliability, CFA, convergent validity), the \(\hat{b}\) path is attenuated (like any regression coefficient on a mismeasured predictor), and the direct-path estimate \(\hat{c}'\) is biased as well, because the noisy M only partially adjusts for the true mediator in the Y regression. The practical implication: investing in high-quality mediator measurement at the design stage (multi-item constructs, established scales, pilot reliability checks) is not just good practice — it is a precondition for trustworthy mediation estimates. A mediation estimate built on a single-item mediator may be substantially distorted by measurement error alone.
A Module 2 connection: Mediation analysis is typically conducted on experimental data where X is randomised (as in Scenarios 1–3 here). This randomisation satisfies the a-path assumption (no confounding of X→M). But randomisation does not satisfy the b-path assumption — M is not randomised, so the M→Y step faces all the observational identification challenges of Part 2 of this module. The sensitivity parameter \(\rho^*\) below is a direct quantification of how much M–Y confounding would be needed to nullify the ACME, just as power analysis in Module 2 quantified how much sampling error could explain an apparent effect.
Before we draw the mediation DAG, we need to ask whether the proposed mediator and outcome are actually distinct causal variables.
Before Drawing the DAG: Mediation or Measurement?
Causal mediation is a causal claim about a process, not merely a statistical claim about correlated variables. The core assertion is that X changes M, and M changes Y. This requires that M and Y be defined as causal variables — entities with well-defined counterfactual states whose relationship involves influence, not just co-occurrence. Pearl (2001) defines mediation in terms of path-specific effects along well-specified causal structures. Imai, Keele, and Yamamoto (2010a) show that causal mediation estimands require assumptions connecting the observed mediator and outcome to counterfactual quantities — assumptions that are not satisfied simply by observing that M and Y are correlated. Imbens (2019) frames the contrast between potential-outcome and directed-acyclic-graph approaches as a question about defining causal effects rather than associations among measured variables.
The practical implication is direct: mediation requires that the mediator be capable of causing the outcome. This is different from saying that the mediator and outcome are highly related, or that they are two ways of measuring the same underlying reaction. If perceived sustainability is one item inside an overall product evaluation scale, the association between perceived sustainability and overall product evaluation may be partly mechanical — a product of how the scale was constructed, not of a causal process. In that case, the analysis may be decomposing a measurement scale rather than estimating a causal mechanism.
However, overlap does not automatically invalidate mediation. Many real constructs overlap conceptually and empirically. The key question is whether the proposed M → Y path operates through the part of M that is merely shared with Y, or through a part of M that is distinct from Y. If the link between M and Y exists only because of shared construct content, the result is measurement overlap, not mechanism. If the distinct part of M plausibly changes a distinct part of Y, mediation may still be possible even when M and Y are moderately correlated.
A mediator can be related to the outcome. It can even partially overlap with the outcome conceptually. The problem arises when the estimated mediator–outcome relationship is driven primarily by the shared part of the constructs. In that case, the “indirect effect” is not evidence of a causal mechanism — it is evidence that the researcher measured similar content twice and called the correlation a pathway.
A simple mediation DAG treats X, M, and Y as construct-level causal variables. It does not know whether “perceived sustainability” and “overall product evaluation” are distinct constructs or overlapping indicators of the same latent evaluation. If M and Y are mostly shared measurement content, the DAG is misspecified. The problem is not solved by drawing arrows.
Five cases: from strong mediation to measurement overlap
The cases below illustrate the spectrum from clean causal mediation to pure measurement overlap. Each is shown with a DAG representing the causal claim and a Venn diagram representing conceptual and measurement overlap between M and Y.
Case 1: Clean mediation, low conceptual overlap
X = eco-label exposure · M = perceived sustainability · Y = actual spending in an incentive-aligned field experiment
Low overlap. The mediator and outcome are conceptually distinct. This is the clearest mediation case.
This is the strongest mediation design. The mediator is a belief or perception; the outcome is real economic behaviour — how much of an endowment the participant actually spends on the product, or whether they choose it when it has a real cost. The two constructs share little conceptual territory, and the b-path is unlikely to be driven by shared item content because spending is not a survey rating.
Case 2: Plausible but weaker mediation, moderate overlap
X = eco-label exposure · M = perceived sustainability · Y = hypothetical WTP in an online survey
```mermaid
graph LR
    X[Eco-label exposure] --> M[Perceived sustainability]
    M --> Y[Hypothetical WTP]
    X --> Y
```
Moderate overlap. Hypothetical WTP may partly reflect the same evaluative response translated into a dollar amount.
This is the design used throughout this module, and it can still support a mediation claim — but it is weaker than actual spending. In an online survey, hypothetical WTP can function like a continuous version of overall product evaluation: both capture “how much do I like this product.” The mediation interpretation is more convincing when WTP is incentive-compatible, consequential, or tied to a real choice. We use WTP below because it is easy to simulate; in a real design, a behavioural spending or choice outcome would give the mediation claim stronger discriminant validity.
Case 3: Mostly measurement overlap, high conceptual overlap
X = eco-label exposure · M = perceived sustainability · Y = overall product evaluation (good, high quality, responsible, eco-friendly, sustainable — Likert items)
```mermaid
graph LR
    X[Eco-label exposure] --> M[Perceived sustainability]
    M -. questionable .-> Y[Overall product evaluation]
    X --> Y
```
High overlap. The M → Y association may reflect shared item content rather than a causal mechanism.
If the M → Y link is mainly driven by the shared green evaluative content, this is measurement overlap rather than mediation. The eco-label changes a general green evaluation factor, and both perceived sustainability and overall product evaluation are indicators of that factor — not cause and effect. The dashed arrow signals that the b-path is questionable, not established.
Case 4: Not mediation — M is inside Y
X = eco-label exposure · M = perceived sustainability · Y = a sustainability index that includes perceived sustainability as one of its component items
```mermaid
graph LR
    X[Eco-label exposure] --> M[Perceived sustainability]
    M -. not causal .-> Y[Sustainability index]
    X --> Y
```
Sub-measure case. The mediator is literally part of the outcome measure.
This is not a mediation design. The b-path is partly mechanical because Y includes M as an input — M will always predict Y regardless of any causal relationship. The indirect effect cannot be interpreted as evidence that perceived sustainability caused the index outcome. It reflects, at least in part, how the index was constructed.
Case 5: Overlap exists, but mediation may still be possible
X = eco-label exposure · M = perceived sustainability · Y = overall product evaluation — where the researcher theorises that the non-overlapping component of M (a distinct environmental belief) affects a non-overlapping component of Y (perceived durability, expected social approval, or anticipated brand reliability)
```mermaid
graph LR
    X[Eco-label exposure] --> M[Perceived sustainability]
    M --> Y[Distinct product judgment]
    X --> Y
```
Overlap does not automatically rule out mediation. The causal claim is strongest when the M → Y path runs through the distinct part of M, not the shared region.
This is the key conceptual point. If perceived sustainability predicts overall product evaluation only because both contain the same “green evaluation” content, that is measurement overlap. But if the distinct part of perceived sustainability — the belief that this product has genuinely lower environmental impact — changes a distinct downstream judgment such as perceived durability or expected social approval, mediation may still be defensible. The researcher must justify this theoretically and, where possible, demonstrate it through careful measurement design.
The Venn diagrams above represent conceptual and measurement overlap. The DAGs represent causal claims. A valid mediation argument requires both: a plausible causal DAG, and enough discriminant validity that M and Y are not merely duplicate measures of the same construct.
| Case | Conceptual overlap | DAG interpretation | Venn interpretation | Verdict |
|---|---|---|---|---|
| Perceived sustainability → actual spending | Low | Plausible mediation | Mostly separate constructs | Strong mediation case |
| Perceived sustainability → hypothetical WTP | Moderate | Plausible but weaker | Some shared evaluation content | Use caution; incentive-align if possible |
| Perceived sustainability → overall product evaluation | High | Questionable M → Y path | Large shared construct region | Often measurement overlap |
| Perceived sustainability as item inside sustainability index | Complete / sub-measure | Invalid as mediation | M contained inside Y | Not mediation |
| Distinct part of M affects distinct part of Y | Partial | Possible mediation | Arrow from non-overlapping M to non-overlapping Y | Defensible with theory and design |
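A crude first screen is simply to look at how strongly M and Y correlate before fitting any mediation model. The helper below is a sketch — `overlap_screen` and the 0.80 threshold are illustrative conventions, not an established standard:

```r
# Hypothetical helper: flag M-Y correlations high enough to suggest shared
# construct content rather than a causal pathway (threshold is illustrative)
overlap_screen <- function(m, y, warn_r = 0.80) {
  r <- cor(m, y, use = "complete.obs")
  if (abs(r) >= warn_r)
    message(sprintf("r(M, Y) = %.2f: check discriminant validity before mediating", r))
  r
}
```

A high r does not by itself prove overlap (a strong causal b-path also produces a high correlation), but it should trigger the distinctness questions above before any ACME is estimated.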
Before estimating ACME, ask where your mediator–outcome pair falls on this spectrum.
In the eco-label example, perceived sustainability is a strong candidate mediator for actual spending, real product choice, or later repeat purchase — those outcomes are downstream, behaviourally distinct, and share no item content with the mediator. It is a weaker mediator for hypothetical WTP because survey WTP can function as a continuous product evaluation. It is weakest when the outcome is another immediate survey evaluation, and invalid when the mediator is literally included in the outcome scale. The central question is always whether the estimated indirect effect reflects a causal process or shared measurement content.
- Pearl, J. (2001). Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, 411–420.
- Imai, K., Keele, L., & Yamamoto, T. (2010a). Identification, inference and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51–71.
- Imai, K., Keele, L., & Tingley, D. (2010b). A general approach to causal mediation analysis. Psychological Methods, 15(4), 309–334.
- Imbens, G. W. (2019). Potential outcome and directed acyclic graph approaches to causality: Relevance for empirical practice in economics. NBER Working Paper No. 26104.
Causal Structures
Now assume that X, M, and Y have passed the conceptual distinctness check above. Under that assumption, we can represent the mediation claim with a DAG.
Setup: Three DGPs
▶ Three mediation DGPs: clean normal data, large outliers, M–Y confounded
library(robmed)     # test_mediation() — if not already loaded earlier in the module
library(mediation)  # mediate(), medsens()
set.seed(2025)
N_med <- 350
X <- rbinom(N_med, 1, 0.5) # eco-label (1 = exposed to label)
# ── YOUR DATA: X is your binary treatment (0/1); replace rbinom() with your
# actual treatment column. N_med should equal nrow(your_df).
# Each df_sN below maps to your real data frame with columns:
# eco_label (or your treatment name), perc_sust (your mediator), WTP (outcome).
# True parameters (Scenarios 1 & 2)
a_true <- 0.80 # eco_label → perc_sust (strong a-path)
b_true <- 0.75 # perc_sust → WTP (strong b-path)
c_prime <- 0.30 # eco_label → WTP (direct path)
ACME_true <- a_true * b_true # 0.600 (large → needs big ρ to nullify → ROBUST)
# ── YOUR DATA: in real data you do NOT know a_true or b_true; these are set
# here only for ground-truth evaluation. With real data, run the
# run_mediation_triple() function directly on your data frame.
# ── Scenario 1: Clean normal data ────────────────────────────────────────────
# No violations; both OLS and ROBMED should recover ACME ≈ 0.600
M_s1 <- a_true * X + rnorm(N_med, 0, 0.60)
Y_s1 <- c_prime * X + b_true * M_s1 + rnorm(N_med, 0, 0.60)
df_s1 <- data.frame(eco_label = X, perc_sust = M_s1, WTP = Y_s1)
# ── Scenario 2: Large correlated outliers (12% contamination) ───────────────
# Correlated shocks to M and Y residuals in the same direction: outlier obs
# suggest a much steeper M→Y slope → OLS b-path inflated → OLS ACME >> 0.600
# ROBMED MM-estimator downweights these high-leverage points → stays near truth
out_n <- ceiling(N_med * 0.12)
out_idx <- sample(N_med, out_n)
out_sgn <- rep(1, out_n)
out_mag <- runif(out_n, 6, 10) # 10–17× the baseline SD: strong leverage
eps_M2 <- rnorm(N_med, 0, 0.60)
eps_Y2 <- rnorm(N_med, 0, 0.60)
eps_M2[out_idx] <- eps_M2[out_idx] + out_sgn * out_mag # M outlier
eps_Y2[out_idx] <- eps_Y2[out_idx] + out_sgn * out_mag * 0.90 # correlated Y outlier
M_s2 <- a_true * X + eps_M2
Y_s2 <- c_prime * X + b_true * M_s2 + eps_Y2
df_s2 <- data.frame(eco_label = X, perc_sust = M_s2, WTP = Y_s2)
# ── CHECK (Scenario 2): plot perc_sust vs WTP and colour outlier points (out_idx)
# red — if a small cluster of points visually drives the OLS slope, ROBMED
# is important for your data; inspect with plot(df_s2$perc_sust, df_s2$WTP).
# ── Scenario 3: M–Y confounded via unobserved U* (true ACME = 0) ─────────────
# U* drives both perc_sust and WTP; there is NO causal M → Y path
# OLS and ROBMED both find spurious ACME (Type I error) because they cannot
# separate causal M→Y from U-induced M–Y correlation.
# U* coefficients are deliberately modest (0.28, 0.32) so the induced residual
# correlation ρ ≈ 0.22 — Imai's medsens() correctly identifies this low ρ* as
# FRAGILE, in sharp contrast to Scenario 1's ρ* ≈ 0.60.
U <- rnorm(N_med, 0, 1)
M_s3 <- a_true * X + 0.28 * U + rnorm(N_med, 0, 0.55)
Y_s3 <- c_prime * X + 0.32 * U + rnorm(N_med, 0, 0.55) # no M_s3 term!
df_s3 <- data.frame(eco_label = X, perc_sust = M_s3, WTP = Y_s3)
ACME_true_s3 <- 0 # no causal M → Y
# ── YOUR DATA: Scenario 3 represents the most dangerous real-world failure —
# an unmeasured variable (e.g., prior eco-engagement, health consciousness)
# drives both your mediator and outcome. Always run medsens() on your real
# data and report the critical ρ* alongside the ACME estimate.
cat(sprintf("N = %d | Treatment rate = %.0f%%\n", N_med, 100 * mean(X)))
N = 350 | Treatment rate = 50%
cat(sprintf("Scenarios 1 & 2: true ACME = %.3f (a=%.2f, b=%.2f, c'=%.2f)\n",
            ACME_true, a_true, b_true, c_prime))
Scenarios 1 & 2: true ACME = 0.600 (a=0.80, b=0.75, c'=0.30)
cat(sprintf("Scenario 3: true ACME = %.3f (U* confounds both M and Y)\n",
            ACME_true_s3))
Scenario 3: true ACME = 0.000 (U* confounds both M and Y)
Estimating All Three Methods
▶ Run OLS delta method, ROBMED, and Imai on all three scenarios
# ── YOUR DATA: to apply this to your own data, call run_mediation_triple()
# with a data frame that has columns named eco_label (treatment), perc_sust
# (mediator), and WTP (outcome). Rename your columns to match, or edit the
# column references inside the function to match your variable names.
run_mediation_triple <- function(df, R = 500) {
m_m <- lm(perc_sust ~ eco_label, data = df)
m_y <- lm(WTP ~ eco_label + perc_sust, data = df)
# ── KEY ARGS: m_m is the a-path model (treatment → mediator);
# m_y is the b-path / outcome model (treatment + mediator → outcome).
# Both must share the same data frame. Add covariates to both models
# if confounders of the M→Y relationship are available (e.g., age, income).
# 1. OLS: a × b with delta-method CI
a_hat <- coef(m_m)["eco_label"]
b_hat <- coef(m_y)["perc_sust"]
ols_est <- a_hat * b_hat
se_ab <- sqrt(b_hat^2 * vcov(m_m)["eco_label","eco_label"] +
a_hat^2 * vcov(m_y)["perc_sust","perc_sust"])
ols_lo <- ols_est - 1.96 * se_ab
ols_hi <- ols_est + 1.96 * se_ab
# 2. ROBMED: MM-robust regression (downweights high-leverage outliers)
# ── KEY ARGS: WTP ~ m(perc_sust) + eco_label specifies outcome ~ m(mediator) +
# treatment; the m() wrapper identifies the mediator for robmed.
# method="regression" uses regression-based mediation; robust=TRUE activates
# the MM-estimator. R= controls bootstrap replications — 500 for exploration,
# 2000+ for final publication figures.
set.seed(99)
med_rob <- test_mediation(WTP ~ m(perc_sust) + eco_label,
data = df, method = "regression", robust = TRUE, R = R)
rob <- rob_extract(med_rob)
# 3. Imai et al.: quasi-Bayesian (used for medsens() sensitivity analysis)
# ── KEY ARGS: treat= names the treatment column; mediator= names the mediator;
# boot=FALSE uses quasi-Bayesian simulation (faster); sims= controls the
# number of draws — increase to 1000 for publication.
set.seed(99)
med_imai <- mediate(m_m, m_y, treat = "eco_label", mediator = "perc_sust",
boot = FALSE, sims = R)
list(a_hat = a_hat, b_hat = b_hat,
ols_est = ols_est, ols_lo = ols_lo, ols_hi = ols_hi,
rob_est = rob$est, rob_lo = rob$lo, rob_hi = rob$hi,
imai_est = med_imai$d0, imai_lo = med_imai$d0.ci[1],
imai_hi = med_imai$d0.ci[2],
med_imai = med_imai, med_rob = med_rob)
}
cat("Running three mediation scenarios (OLS + ROBMED + Imai, 500 sims each)...\n")
Running three mediation scenarios (OLS + ROBMED + Imai, 500 sims each)...
res_s1 <- run_mediation_triple(df_s1)
res_s2 <- run_mediation_triple(df_s2)
res_s3 <- run_mediation_triple(df_s3)
# ── CHECK: compare ols_est vs. rob_est — if they differ substantially (> 0.05
# in standardised units), outliers are influential; prefer ROBMED.
# Always follow up with medsens() (see sensitivity chunk) to report ρ*.
cat(sprintf("Scenario 1 — OLS: %.3f ROBMED: %.3f (truth = %.3f)\n",
            res_s1$ols_est, res_s1$rob_est, ACME_true))
Scenario 1 — OLS: 0.594  ROBMED: 0.567  (truth = 0.600)
cat(sprintf("Scenario 2 — OLS: %.3f ROBMED: %.3f (truth = %.3f)\n",
            res_s2$ols_est, res_s2$rob_est, ACME_true))
Scenario 2 — OLS: 0.926  ROBMED: 0.596  (truth = 0.600)
cat(sprintf("Scenario 3 — OLS: %.3f ROBMED: %.3f (truth = %.3f [no M→Y path])\n",
            res_s3$ols_est, res_s3$rob_est, ACME_true_s3))
Scenario 3 — OLS: 0.165  ROBMED: 0.183  (truth = 0.000 [no M→Y path])
Scenario 1: Clean Normal Data — All Methods Should Work
With textbook normal errors and no confounding, OLS, ROBMED, and Imai all estimate ACME accurately. This is the baseline case each method was designed for.
| Method | a & b paths | Estimate | 95% CI | Bias | Verdict |
|---|---|---|---|---|---|
| OLS delta method | a=0.811, b=0.732 | 0.594 | [0.467, 0.720] | -0.006 | On target |
| ROBMED (MM-estimator) | – (robust MM) | 0.567 | [0.449, 0.697] | -0.033 | On target |
| Imai (quasi-Bayes) | – (uses OLS) | 0.594 | [0.478, 0.731] | -0.006 | On target |
All three methods recover the true ACME accurately here because the data meet all modelling assumptions. This interpretation depends on treating perceived sustainability and WTP as distinct enough constructs. If WTP is functioning as a disguised product evaluation scale — a continuous “how much do I like this product” rating in dollar form — the mediation interpretation is weaker, because the b-path may be driven partly by shared evaluative content rather than a genuine causal process. A behavioural outcome (actual spending, real choice) would give the mediation claim stronger discriminant validity. We use WTP throughout because it is easy to simulate continuously; in a real design, consider whether a more behaviourally distinct outcome is feasible.
Scenario 2: Large Outliers — ROBMED’s Advantage
Twelve percent of participants show large, correlated deviations in both perceived sustainability (M) and WTP (Y) in the same direction. These high-leverage observations suggest a much steeper M→Y slope than the true 0.75, inflating the OLS b-path and thus ACME. The MM-estimator in ROBMED downweights points that deviate strongly from the main data cloud.
The key is direction: when participants who score unusually high on M also score unusually high on Y (same-sign shocks), they create data points that pull the OLS regression line steeply upward. To OLS, these are just informative extreme observations — it can’t distinguish “genuinely high M causes genuinely high Y” from “an external shock hit both M and Y simultaneously.”
ROBMED’s MM-estimator assigns lower weight to observations whose residuals are large relative to the bulk of the data. The outlier participants — who are far from the M–Y regression cloud — receive close to zero weight, leaving the slope estimate dominated by the non-contaminated 88%.
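The mechanism can be sketched with `MASS::rlm` and its MM option (ROBMED's `test_mediation()` wraps a similar robust fit internally; this standalone sketch with illustrative parameters is not its API):

```r
# Sketch: OLS vs an MM-estimator on a contaminated M-Y relationship
library(MASS)                          # rlm(): robust regression, method = "MM"
set.seed(7)
n <- 300
M <- rnorm(n)
Y <- 0.75 * M + rnorm(n, 0, 0.6)       # true slope 0.75
idx <- sample(n, 30)                   # 10% contamination
M[idx] <- M[idx] + 7                   # same-sign shocks to M and Y create
Y[idx] <- Y[idx] + 7                   # high-leverage points off the true line
coef(lm(Y ~ M))["M"]                   # pulled well above 0.75
coef(rlm(Y ~ M, method = "MM"))["M"]   # close to 0.75: outliers downweighted
```

The MM fit recovers the majority line because the contaminated 10% receive near-zero weight, exactly the behaviour the ROBMED results above rely on.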
Why Imai doesn’t help here: mediate() calls the same lm() models internally. When those models are biased, the quasi-Bayesian sampling averages over biased posteriors. In this outlier scenario, ROBMED is the appropriate tool — Imai’s advantage lies in confounding diagnosis (Scenario 3), not outlier robustness.
| Method | b-path | Estimate | 95% CI | Bias | Verdict |
|---|---|---|---|---|---|
| OLS delta method | 1.598 (true = 0.75) | 0.926 | [0.010, 1.842] | +0.326 | ✖ Inflated — outliers pull b-path up |
| ROBMED (MM-estimator) | 1.028 (true = 0.75) | 0.596 | [0.465, 0.732] | -0.004 | ✓ Robust — outliers down-weighted |
| Imai (quasi-Bayes) | 1.598 (true = 0.75) | 0.949 | [-0.045, 1.858] | +0.349 | ✖ Inflated — uses OLS internally |
Scenario 3: M–Y Confounded — Regression-Based Methods Fail, Imai Diagnoses
U* drives both perceived sustainability and WTP. There is no causal M→Y path. The true ACME = 0. Yet both OLS and ROBMED find a large, significant “mediation effect” because they cannot distinguish causal M→Y from U-induced M–Y correlation. This is a structural identification failure, not an estimation problem — no regression method alone can fix it. But Imai’s sensitivity analysis reveals that the result is fragile.
| Method | b-path (spurious) | Estimate | 95% CI | Bias | Verdict |
|---|---|---|---|---|---|
| OLS delta method | 0.197 (true = 0.00) | 0.165 | [0.083, 0.248] | +0.165 | ✖ Type I error — spurious mediation |
| ROBMED (MM-estimator) | 0.218 (true = 0.00) | 0.183 | [0.093, 0.275] | +0.183 | ✖ Type I error — U* not outlier-based |
| Imai (quasi-Bayes) | 0.197 (true = 0.00) | 0.164 | [0.081, 0.250] | +0.164 | ✖ Type I error — see ρ* below |
Where Imai’s Framework Shines: Sensitivity Analysis
Imai et al.’s medsens() asks: how large would the correlation between M-equation and Y-equation residuals (ρ) need to be to drive ACME to zero? This critical ρ* measures fragility:
- Large ρ*: only substantial unmeasured confounding could explain away the result → robust
- Small ρ*: even modest unmeasured confounding suffices → fragile
▶ Sensitivity analysis: clean DGP (robust) vs. confounded DGP (fragile)
# ── YOUR DATA: replace res_s1$med_imai and res_s3$med_imai with the med_imai
# object returned by run_mediation_triple() on your own data frame.
# Run medsens() on every mediation model you report — it costs little time
# and is now expected by reviewers in top journals.
# ── KEY ARGS: rho.by = 0.05 evaluates sensitivity at ρ steps of 0.05 over
# [-1, 1]; effect.type = "indirect" targets the ACME specifically;
# sims = 500 controls bootstrap draws — use 1000+ for final results.
set.seed(99)
sens_s1 <- medsens(res_s1$med_imai, rho.by = 0.05, effect.type = "indirect", sims = 500)
set.seed(99)
sens_s3 <- medsens(res_s3$med_imai, rho.by = 0.05, effect.type = "indirect", sims = 500)
# ── CHECK: look at crit_rho_s1 and crit_rho_s3 printed below —
# ρ* > 0.4 is generally considered robust (substantial confounding needed);
# ρ* < 0.2 is fragile (modest confounding would nullify the result).
# Report ρ* in your manuscript alongside the ACME point estimate and CI.
safe_crit_rho <- function(s) {
tryCatch({
idx_neg <- which(s$d0 <= 0)
if (length(idx_neg) == 0) NA_real_
else round(min(abs(s$rho[idx_neg]), na.rm = TRUE), 2)
}, error = function(e) NA_real_)
}
crit_rho_s1 <- safe_crit_rho(sens_s1)
crit_rho_s3 <- safe_crit_rho(sens_s3)
fmt_rho <- function(x) if (is.na(x)) "> 1.0" else sprintf("%.2f", x)
cat(sprintf("Scenario 1 (true ACME = %.3f): critical ρ* ≈ %s → ROBUST finding\n",
            ACME_true, fmt_rho(crit_rho_s1)))
Scenario 1 (true ACME = 0.600): critical ρ* ≈ 0.60 → ROBUST finding
cat(sprintf("Scenario 3 (true ACME = 0): critical ρ* ≈ %s → FRAGILE finding\n",
            fmt_rho(crit_rho_s3)))
Scenario 3 (true ACME = 0): critical ρ* ≈ 0.25 → FRAGILE finding
cat("\nSmaller ρ* = less confounding is needed to explain away the result.\n")
Smaller ρ* = less confounding is needed to explain away the result.
# Build tidy sensitivity curves using only the guaranteed $rho and $d0 fields
lbl_s1 <- sprintf("Scenario 1: Clean data (true ACME = %.3f)", ACME_true)
lbl_s3 <- "Scenario 3: M\u2013Y confounded (true ACME = 0)"
sens_df <- bind_rows(
data.frame(rho = sens_s1$rho,
ACME = sens_s1$d0,
Scenario = lbl_s1,
stringsAsFactors = FALSE),
data.frame(rho = sens_s3$rho,
ACME = sens_s3$d0,
Scenario = lbl_s3,
stringsAsFactors = FALSE)
)
# y-annotation positions scale with the ACME range
y_robust <- ACME_true * 0.30 # ~30% of way up from zero
y_fragile <- -ACME_true * 0.30 # ~30% of way down from zero
ggplot(sens_df, aes(x = rho, y = ACME, colour = Scenario)) +
geom_hline(yintercept = 0, colour = "grey50", linewidth = 0.9) +
geom_line(linewidth = 1.4) +
{ if (!is.na(crit_rho_s1))
list(
geom_vline(xintercept = crit_rho_s1, linetype = "dashed",
colour = clr_eco, linewidth = 0.9),
geom_vline(xintercept = -crit_rho_s1, linetype = "dashed",
colour = clr_eco, linewidth = 0.9),
annotate("text", x = crit_rho_s1 + 0.03, y = y_robust,
label = sprintf("\u03c1*\u2248%s\n(ROBUST)", fmt_rho(crit_rho_s1)),
hjust = 0, size = 3.0, colour = clr_eco, fontface = "bold")
) } +
{ if (!is.na(crit_rho_s3))
list(
geom_vline(xintercept = crit_rho_s3, linetype = "dashed",
colour = clr_ctrl, linewidth = 0.9),
geom_vline(xintercept = -crit_rho_s3, linetype = "dashed",
colour = clr_ctrl, linewidth = 0.9),
annotate("text", x = crit_rho_s3 + 0.03, y = y_fragile,
label = sprintf("\u03c1*\u2248%s\n(FRAGILE)", fmt_rho(crit_rho_s3)),
hjust = 0, size = 3.0, colour = clr_ctrl, fontface = "bold")
) } +
scale_colour_manual(
values = setNames(c(clr_eco, clr_ctrl), c(lbl_s1, lbl_s3))) +
labs(x = expression(paste("Sensitivity parameter ", rho,
" (correlation between M- and Y-equation residuals)")),
y = "ACME (indirect effect)",
colour = NULL,
title = "Imai sensitivity: how much confounding would nullify the ACME?",
subtitle = paste0("\u03c1* = critical value at which ACME crosses zero",
" \u2014 smaller \u03c1* = more fragile result")) +
theme_mod3() +
  theme(legend.position = "top")

Both Scenario 1 and Scenario 3 produce a statistically significant ACME from OLS — they look equally convincing at face value. The sensitivity analysis reveals very different levels of trust:
- Scenario 1 (clean): ACME reaches zero only at \(\rho^* \approx\) 0.60. Even substantial unmeasured M–Y confounding cannot explain away the result. The finding is robust.
- Scenario 3 (confounded): ACME crosses zero at \(\rho^* \approx\) 0.25. Only modest confounding is enough to nullify the result. The “significant” ACME is fragile — correctly so, since the true ACME is 0.
The Imai advantage is not a better point estimate (it uses the same linear models as OLS, so it inherits the same bias in Scenario 3). The advantage is the sensitivity analysis itself — a diagnostic that Baron–Kenny does not provide. A finding that reports both the ROBMED estimate and a large ρ* from medsens() is far more credible than one that reports neither.
Comprehensive Bias Comparison Across All Scenarios
| Scenario | OLS | ROBMED | Imai medsens() |
|---|---|---|---|
| Clean normal data | ✔ Unbiased | ✔ Unbiased | ✔ Large ρ* confirms robustness |
| Large outliers | ✖ Biased upward | ✔ Downweights outliers | ✖ Inherits OLS bias |
| M–Y confounded | ✖ Type I error | ✖ Type I error | ✔ Small ρ* flags fragility |
Key lesson: ROBMED addresses estimation failures (non-normality, outliers). Imai’s sensitivity analysis addresses identification failures (unmeasured confounding). They solve different problems and should be used together.
In practice: report ROBMED as your primary ACME estimate (robustness to outliers) and medsens() to quantify fragility to unmeasured confounding. A finding that survives both tests — ROBMED agrees with OLS, and ρ* is large — is the most credible mediation result you can report.
Why Confounding Causes Type I Errors: The 2×2 Design
The three-scenario analysis above treats confounding as a special case. The 2×2 design below makes the mechanism explicit by crossing two independent dimensions:
- Rows: true mediation exists vs. does not exist (b = 0)
- Columns: M–Y confounded vs. unconfounded
The critical cell is the one where no true mediation exists but confounding is active. There, both OLS and ROBMED report a large, significant indirect effect despite b = 0. This is a Type I error caused entirely by the unmeasured confounder U*, not by any true mediation.
When both U→M and U→Y are active, M and Y share a hidden common cause. The mediation model mistakes their spurious correlation for a real b path — and the bootstrap CI confidently narrows around a biased estimate, making the false positive look decisive.
The fix is not a better estimator. Imai’s medsens() reveals how fragile the finding is; ROBMED guards against estimation errors from outliers. But neither can eliminate the Type I error if U* is real and unmeasured. The only solutions are design-based: measure U*, block the back-door path by design, or use an instrument for M.
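A minimal one-dataset sketch of the mechanism (hypothetical parameters; U* is generated but deliberately omitted from the fitted models, so true b = 0):

```r
set.seed(1)
n <- 500
X <- rbinom(n, 1, 0.5)
U <- rnorm(n)                       # unmeasured confounder U*
M <- 0.8 * X + 0.6 * U + rnorm(n)
Y <- 0.3 * X + 0.6 * U + rnorm(n)   # no M term: true b = 0
# OLS b-path absorbs the U* signal: plim of b-hat is 0.36 / 1.36, about 0.26
coef(summary(lm(Y ~ X + M)))["M", ]
```

The fitted b is reliably positive and significant even though M has no causal effect on Y; adding U as a covariate would restore b ≈ 0.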
Type I Error from a Shared Common Cause
Scenarios 1–3 above assumed that perceived sustainability (M) and WTP (Y) are measuring distinct constructs and that any confounding between them would come from an identifiable external variable. This section examines what happens when M and Y share a latent common cause — a factor that simultaneously inflates both, producing a spurious M–Y correlation that the b-path picks up as if it were causal.
The mechanism. Suppose there is a latent factor η (“eco-identity”) that independently causes both high perceived sustainability ratings and high WTP. The researcher does not measure η — they only see M and Y. When η is active, M and Y are correlated even when there is no causal M → Y path. The mediation model mistakes this shared-cause correlation for evidence of a b-path. The true ACME = 0, but OLS reports a large, significant indirect effect.
Scenario 3 showed that an unobserved variable U* causing both M and Y produces a spurious ACME — a Type I error that neither OLS nor ROBMED can eliminate. The simulation in this section is structurally identical. The only difference is the narrative frame: Scenario 3 calls the shared cause a “confounding variable” (prior eco-engagement); here we call it a “shared latent construct” (eco-identity). Both produce the same observed data: the same r(M, Y), the same a-path, the same b-path, the same ACME. With a single measure of M and a single measure of Y, you cannot distinguish these three causal structures from the data alone:
- M genuinely causes Y (b ≠ 0) — what mediation is trying to detect
- M and Y share a latent common cause η (construct overlap)
- An unmeasured confounder U* independently causes both M and Y
All three produce the same pattern of associations. This is not an estimation problem; it is an identification problem.
The parameter prop_shared below is the proportion of M’s variance (and Y’s variance) attributable to η. At prop_shared = 0.40, 40% of what drives M — and 40% of what drives Y — is the same latent construct. This produces r(M, Y) ≈ 0.40: a moderate correlation that most researchers would not question. Yet with no true b-path, it generates massive Type I error.
▶ Monte Carlo: Type I error from shared latent construct (500 sims × 8 levels)
set.seed(2025)
library(purrr)   # for map_dfr()

N_SIM_DV <- 500
N_MED_DV <- 350
a_dv       <- 0.80   # X → M (a-path; unchanged)
c_prime_dv <- 0.30   # X → Y direct path
# prop_shared = proportion of M and Y variance explained by shared latent η
# r(M, Y | X) ≈ prop_shared (construct-induced correlation)
# True b = 0: no causal M → Y path
prop_levels <- seq(0, 0.70, by = 0.10)

discval_sim_one <- function(prop_shared) {
  load <- sqrt(prop_shared)   # factor loading: M = load*η + sqrt(1-load²)*ε_M
  pvec <- replicate(N_SIM_DV, {
    X   <- rbinom(N_MED_DV, 1, 0.5)
    eta <- rnorm(N_MED_DV, 0, 1)   # shared latent construct
    M_obs <- a_dv * X + load * eta + sqrt(max(1 - load^2, 0)) * rnorm(N_MED_DV)
    # No M term in Y: true b = 0; Y depends on η directly (not through M)
    Y_obs <- c_prime_dv * X + load * eta + sqrt(max(1 - load^2, 0)) * rnorm(N_MED_DV)
    df_dv <- data.frame(X = X, M = M_obs, Y = Y_obs)
    m_m <- lm(M ~ X, data = df_dv)
    m_y <- lm(Y ~ X + M, data = df_dv)
    a_hat  <- coef(m_m)["X"]
    b_hat  <- coef(m_y)["M"]
    ab_hat <- a_hat * b_hat
    # Sobel (delta-method) standard error of a*b
    se_ab <- sqrt(b_hat^2 * vcov(m_m)["X", "X"] +
                  a_hat^2 * vcov(m_y)["M", "M"])
    2 * pnorm(-abs(ab_hat / se_ab))
  })
  # Also compute the empirical r(M, Y) to label the x-axis
  X_ex   <- rbinom(5000, 1, 0.5)
  eta_ex <- rnorm(5000)
  M_ex <- a_dv * X_ex + load * eta_ex + sqrt(max(1 - load^2, 0)) * rnorm(5000)
  Y_ex <- c_prime_dv * X_ex + load * eta_ex + sqrt(max(1 - load^2, 0)) * rnorm(5000)
  r_my <- cor(M_ex, Y_ex)
  data.frame(prop_shared = prop_shared, pval = pvec, r_my = r_my)
}

sim_dv <- map_dfr(prop_levels, discval_sim_one)
cat(sprintf("Discriminant validity simulation: %d levels × %d sims = %d total runs.\n",
            length(prop_levels), N_SIM_DV, nrow(sim_dv)))
Discriminant validity simulation: 8 levels × 500 sims = 4000 total runs.
| % variance from η | r(M, Y) | HTMT ≈ r(M, Y) | Type I error rate | Relative to nominal 5% |
|---|---|---|---|---|
| 0% | 0.06 | 0.06 | 5% | 1.0x |
| 10% | 0.16 | 0.16 | 42% | 8.4x |
| 20% | 0.23 | 0.23 | 97% | 19.4x |
| 30% | 0.34 | 0.34 | 100% | 20.0x |
| 40% | 0.43 | 0.43 | 100% | 20.0x |
| 50% | 0.53 | 0.53 | 100% | 20.0x |
| 60% | 0.60 | 0.60 | 100% | 20.0x |
| 70% | 0.68 | 0.68 | 100% | 20.0x |
Even a mild r(M, Y) of about 0.16 already pushes the Type I error rate above 40%, and by a moderate r ≈ 0.40 the false-positive rate is effectively 100%. Most researchers would not flag r = 0.40 as a problem — it is often interpreted as evidence that the mediator is correlated with the outcome, which is exactly what the mediation model requires. Yet when that correlation is driven by a shared latent cause rather than a genuine b-path, it is entirely artefactual.
The b-path in mediation is the OLS slope of Y on M. If M and Y are both influenced by the same unmeasured source — whether you call it a confounding variable (Scenario 3) or a shared latent construct (this section) — that slope is not estimating a causal effect. It is estimating the strength of the shared influence, whatever its origin. The two framings are mathematically equivalent.
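For this simulation's data-generating process the bias has a closed form. With loading \(\lambda = \sqrt{\texttt{prop\_shared}}\) and unit-variance residuals, conditional on X:

\[
\operatorname{Cov}(M, Y \mid X) = \lambda^2, \qquad
\operatorname{Var}(M \mid X) = \lambda^2 + (1 - \lambda^2) = 1
\quad\Rightarrow\quad
\hat{b} \;\xrightarrow{\;p\;}\; \frac{\lambda^2}{1} = \texttt{prop\_shared}.
\]

So at prop_shared = 0.40 the OLS b-path converges to 0.40 even though the true b is exactly 0 — and collecting more data only tightens the confidence interval around the wrong value.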
What HTMT can and cannot do. HTMT requires multiple items per construct. If M is a single-item measure and Y is a single-item measure, HTMT cannot be computed — there are no within-construct correlations to compare against. Even with multi-item scales, HTMT tests whether the constructs are measured distinctly, not whether their relationship is causal. HTMT being low (< 0.70) is evidence that the constructs do not share a common measurement dimension — a useful check that addresses the construct-overlap story. But it does not rule out an unmeasured third variable that causes both constructs independently, which is the Scenario 3 story. Both problems generate the same statistical bias; neither is diagnosable from the b-path alone.
Practical implications:
- With single-item M and Y: you have no discriminant validity information at all, and the observed r(M, Y) is entirely ambiguous — you cannot separate the causal b-path from any shared-cause bias. The Imai sensitivity parameter ρ* is your only quantitative handle on fragility; small ρ* means the finding collapses under even modest confounding.
- With multi-item scales: run CFA and report HTMT(M, Y). Values above 0.70 are a warning; above 0.85, the apparent ACME is more parsimoniously explained by construct overlap than by a real mechanism. Even when HTMT passes, always run medsens() — good discriminant validity does not eliminate the Scenario 3 confounding problem.
- The most reliable remedy is design-based: measure the hypothesised shared cause and include it as a covariate, or use an experimental causal-chain design that manipulates M directly (Part 5). Statistical methods can diagnose fragility but cannot recover identification that the design never provided.
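When multi-item scales are available, the HTMT check can be computed with semTools. A sketch, assuming hypothetical indicator columns m1–m3 and y1–y3 in a data frame `df`:

```r
library(semTools)   # htmt() uses lavaan measurement-model syntax

mod <- '
  M =~ m1 + m2 + m3   # perceived-sustainability items
  Y =~ y1 + y2 + y3   # spending / WTP items
'
htmt(mod, data = df)  # off-diagonal values > 0.85 suggest construct overlap
```

`htmt()` returns the matrix of heterotrait–monotrait ratios; compare the M–Y entry against the 0.70 warning and 0.85 rejection thresholds discussed above.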
Summary of Methods
| Method | Key identifying assumption | Target estimand | When to use |
|---|---|---|---|
| Randomised Experiment | Random assignment; SUTVA | ATE | Can randomise |
| Linear Regression Adjustment | All confounders measured; linear functional form | ATE | Observational; confounders measured; linear DGP |
| Flexible Regression Adjustment | All confounders measured; correct non-linear specification | ATE | Observational; confounders measured; non-linear DGP |
| IPW (stabilised) | All confounders measured; positivity (overlap) | ATE | Observational; want ATE; good PS overlap |
| Entropy Balancing (WeightIt) | All confounders measured; moment balance sufficient | ATE | Observational; want ATE; many covariates to balance |
| Covariate Matching (Mahalanobis) | All confounders measured; sufficient overlap | ATT | Observational; want ATT; sufficient control units |
| Propensity Score Matching | All confounders measured; PS model correctly specified | ATT | Observational; many covariates; report PS model sensitivity |
| Doubly Robust (AIPW) | Either PS model OR outcome model is correctly specified | ATE | Observational; insurance against one misspecified model |
| Synthetic Control | Good pre-treatment fit; no interference | ATT (one treated unit) | Few treated units; many pre-treatment periods |
| Regression Discontinuity | Continuity at cutoff; no manipulation; no other discontinuities | LATE (near threshold) | Sharp threshold in a running variable |
| Difference-in-Differences (TWFE) | Parallel trends (in absence of treatment) | ATT | Panel data; policy change in subset of units |
| Synthetic DiD | Parallel trends after reweighting control units | ATT | Panel data; parallel trends may not hold exactly |
| Mediation (Baron–Kenny OLS) | Sequential ignorability; no M–Y confounders | ACME | Mediation hypothesis; well-controlled experiment |
| Mediation (Imai et al.) | Sequential ignorability; sensitivity analysis quantifies fragility | ACME | Mediation with explicit assumptions + sensitivity analysis |
| ROBMED | Sequential ignorability; robust to outliers and non-normality | ACME | Mediation; non-normal WTP data or suspected outliers |