18  Part 4: Difference-in-Differences

Tip: How to use the code in this section

The code blocks throughout this section contain inline annotations explaining what data structures go in, what the key function arguments control, and what to look for in the output. Click any “▶ Code” triangle to expand a block and read the annotations alongside the code. Treat them as a guided checklist: wherever you see a comment starting with # ──, it marks a decision point you would need to revisit with your own data.

18.1 The Setup

The Dutch government introduces a mandatory eco-labelling law in 2022. Belgian supermarkets are not subject to it. We observe mean WTP in 15 Dutch stores (treated) and 20 Belgian stores (control).

18.2 Simulating Panel Data

Code
set.seed(2024)
n_dutch <- 15; n_belgian <- 20; n_stores_did <- n_dutch+n_belgian
years <- 2018:2024; T_treat <- 2022
# ── YOUR DATA: replace n_dutch / n_belgian with your counts of treated and
#    control units; replace years with your time periods; replace T_treat with
#    the first period when treatment took effect. Your data frame needs at minimum:
#    a unit ID, a time period, a 0/1 treated column, a 0/1 post column, and your
#    outcome variable.

panel_did <- expand.grid(store=1:n_stores_did, year=years) |>
  as_tibble() |>
  mutate(
    country      = ifelse(store<=n_dutch,"Netherlands","Belgium"),
    treated      = (country=="Netherlands"),
    post         = (year>=T_treat),
    did_indicator= treated & post,
    store_fe     = rnorm(n_stores_did, 0, 0.5)[store],
    time_trend   = 0.10*(year-2018),
    country_fe   = ifelse(country=="Netherlands", 0.30, 0),
    treat_effect = did_indicator * 0.70,
    WTP_did      = 5.20 + store_fe + time_trend + country_fe + treat_effect + rnorm(n(), 0, 0.35)
  )
# ── CHECK: plot group-mean outcomes by year (as below) — the two groups should
#    run near-parallel BEFORE T_treat; a visible gap opening only after T_treat
#    is consistent with parallel trends. Any pre-treatment divergence is a red flag.

panel_did |>
  group_by(country, year) |>
  summarise(mean_WTP=mean(WTP_did), .groups="drop") |>
  ggplot(aes(x=year, y=mean_WTP, colour=country, group=country)) +
  geom_vline(xintercept=T_treat-0.5, linetype="dashed", colour="grey50") +
  geom_line(linewidth=1.2) + geom_point(size=2.5) +
  annotate("text", x=T_treat+0.1, y=5.1,
           label="Dutch eco-label\nlaw takes effect", hjust=0, size=3.2) +
  scale_colour_manual(values=c("Netherlands"=clr_eco,"Belgium"=clr_ctrl)) +
  labs(x="Year", y="Mean WTP ($)", colour=NULL,
       title="DiD: parallel pre-trends are the key identifying assumption",
       subtitle="Dutch and Belgian stores track each other in slope before 2022; Dutch stores jump after") +
  theme_mod3()

Note: What “parallel trends” actually means — and why Belgium is the key

DiD hinges on one unanswerable question: what would Dutch stores have done after 2022 if the eco-label law had never passed? We can’t observe this counterfactual. The parallel trends assumption supplies the answer: Belgian stores tell us.

More precisely, the assumption states that absent treatment, the Dutch–Belgian WTP gap would have remained constant — both series drifting up or down by identical amounts each year. Under this assumption:

\[\underbrace{\Delta Y_{\text{Dutch}}}_{\text{observed}} - \underbrace{\Delta Y_{\text{Belgian}}}_{\text{counterfactual proxy}} = \delta_{\text{DiD}}\]

Reading the plot above: The two lines run nearly parallel from 2018–2021 (same slope, constant gap). After 2022 the Dutch line separates from the Belgian. DiD attributes that separation — roughly $0.70 — to the eco-label law, after subtracting whatever movement Belgium showed over the same period (which would have happened anyway).
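This arithmetic can be checked directly on the simulated panel: the 2×2 difference of group means is the DiD estimate in its rawest form. A sketch using `panel_did` from above (no new assumptions beyond the tidyverse functions already loaded):

```r
# Raw 2x2 DiD: (Dutch post - Dutch pre) - (Belgian post - Belgian pre)
means_2x2 <- panel_did |>
  group_by(treated, post) |>
  summarise(m = mean(WTP_did), .groups = "drop")

delta_dutch   <- with(means_2x2, m[treated & post]  - m[treated & !post])
delta_belgian <- with(means_2x2, m[!treated & post] - m[!treated & !post])
delta_dutch - delta_belgian   # should land near the simulated effect of 0.70
```

In a balanced panel with two groups and a common treatment date, this difference of differences should coincide with the TWFE regression coefficient estimated in the next section.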

What makes this assumption fail? Anything that makes Dutch and Belgian stores diverge for reasons unrelated to the law — e.g., Dutch shoppers becoming greener faster, or a Dutch-only economic shock. The parallel trends violation section below shows exactly this scenario.

The parallel trends assumption and Module 2: Parallel trends is to DiD what exchangeability is to experimentation. In Module 2, you saw that even carefully randomised experiments can fail exchangeability — attrition, demand effects, and differential compliance all erode it. In DiD, parallel trends is unverifiable for the post-treatment period (we never observe what Dutch stores would have done without the law). Pre-treatment trend tests provide some diagnostic evidence, but just as a successful manipulation check does not prove the exclusion restriction (Module 2), clean pre-trends do not prove that post-treatment trends would have remained parallel. It is an assumption, not a result.

A Module 1 parallel: Fixed effects eliminate time-invariant confounders — but only if the measurement of the outcome variable is consistent across time and units. If the WTP question wording, scale, or survey protocol changes between periods (a measurement artefact from Module 1), the DiD estimate conflates treatment effects with measurement change. In practice, always verify that outcome measurement is held constant across waves.

18.3 Two-Way Fixed Effects (TWFE)

\[Y_{it} = \alpha_i + \lambda_t + \delta \cdot D_{it} + \varepsilon_{it}\]

Code
# ── YOUR DATA: replace WTP_did with your outcome, did_indicator with the
#    column you created as treated * post, and replace factor(store)/factor(year)
#    with your unit and time identifiers. The formula adds one dummy per unit
#    and one dummy per time period — this is the two-way fixed effects (TWFE) model.
# ── KEY ARGS: clusters=store clusters standard errors at the unit level —
#    change "store" to whatever your unit identifier column is called.
#    If your units are nested (e.g., stores within chains), cluster at the
#    higher level (chains) to account for within-cluster correlation.
did_twfe <- lm_robust(WTP_did ~ did_indicator + factor(store) + factor(year),
                      data=panel_did, clusters=store)
did_coef <- tidy(did_twfe) |> filter(term=="did_indicatorTRUE")
# ── CHECK: the term for did_indicator should be "did_indicatorTRUE" if the
#    column is logical; it will be "did_indicator" if numeric (0/1).
#    The p-value and CI are cluster-robust — rely on these, not the naive SEs.
cat(sprintf(
  "True DiD effect  = 0.70\nEstimated DiD    = %.3f  (SE = %.3f, p = %.4f)\n95%% CI: [%.3f, %.3f]\n",
  did_coef$estimate, did_coef$std.error, did_coef$p.value,
  did_coef$conf.low, did_coef$conf.high
))
True DiD effect  = 0.70
Estimated DiD    = 0.631  (SE = 0.088, p = 0.0000)
95% CI: [0.452, 0.811]
Note: What the two sets of fixed effects absorb

\[Y_{it} = \underbrace{\alpha_i}_{\text{store FE}} + \underbrace{\lambda_t}_{\text{year FE}} + \delta \cdot D_{it} + \varepsilon_{it}\]

Each type of fixed effect removes a specific source of confounding:

| Term | What it controls for | Example here |
|---|---|---|
| \(\alpha_i\) — store fixed effects | Time-invariant differences between stores | Store size, neighbourhood demographics, pre-existing customer base |
| \(\lambda_t\) — year fixed effects | Year-level shocks that hit all stores equally | Euro-area inflation, global supply chain costs, macro consumer sentiment |
| \(\delta\) — the DiD estimator | How much more the treated group changed after treatment vs. the control group | Causal effect of the Dutch eco-label law |

The critical residual \(\varepsilon_{it}\): whatever the fixed effects can’t absorb — differential trends (treated and control groups moving at different rates), time-varying store-specific shocks — ends up here. This is exactly where parallel-trends violations hide. Adding pre-treatment event-study coefficients is a way to interrogate whether \(\varepsilon_{it}\) contains systematic pre-treatment divergence.
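The absorption argument has a mechanical counterpart in the Frisch–Waugh–Lovell theorem: for a balanced panel like this one, applying the two-way within transformation \(x_{it} - \bar{x}_{i} - \bar{x}_{t} + \bar{x}\) to both outcome and treatment, then regressing one on the other, reproduces the TWFE point estimate. A sketch using base R's `ave()`:

```r
# Two-way within transformation (valid for balanced panels like this one)
tw_demean <- function(x, i, t) x - ave(x, i) - ave(x, t) + mean(x)

y_dd <- with(panel_did, tw_demean(WTP_did, store, year))
d_dd <- with(panel_did, tw_demean(as.numeric(did_indicator), store, year))

coef(lm(y_dd ~ 0 + d_dd))   # point estimate matches did_twfe above
```

The standard errors from this shortcut are wrong — it ignores the absorbed degrees of freedom and the clustering — so use it only to see what the fixed effects are doing, not for inference.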

18.4 Testing Parallel Trends: Event Study

Code
# ── YOUR DATA: year_rel = your_time_var - T_treat (e.g., calendar year minus
#    first treatment year). The reference period is set to -1 (the year before
#    treatment); change ref="-1" if your calendar doesn't align with this convention.
panel_did <- panel_did |>
  mutate(year_rel=year-T_treat,
         year_rel_fac=relevel(factor(year_rel), ref="-1"))

# ── KEY ARGS: the formula treated:year_rel_fac interacts the treated indicator
#    with each relative-year factor — this produces one coefficient per period
#    relative to treatment. factor(store) and factor(year) absorb unit and time
#    fixed effects. clusters=store clusters SEs at the unit level.
event_formula <- as.formula("WTP_did ~ treated:year_rel_fac + factor(store) + factor(year)")
did_event     <- lm_robust(event_formula, data=panel_did, clusters=store)
# ── CHECK: in the event study plot, pre-treatment coefficients (yr < 0) should
#    be near zero with CIs crossing zero — this is evidence for parallel trends.
#    Any significant pre-trend (p < .05 before treatment) is a red flag that
#    the treatment and control groups were already diverging before the policy.

event_tbl <- tidy(did_event) |>
  filter(str_detect(term,"year_rel_fac")) |>
  mutate(yr=as.integer(str_extract(term,"-?\\d+$")),
         period=if_else(yr<0,"Pre-treatment","Post-treatment")) |>
  add_row(yr=-1L, estimate=0, conf.low=0, conf.high=0, period="Reference (−1)")

event_tbl |>
  ggplot(aes(x=yr, y=estimate, ymin=conf.low, ymax=conf.high, colour=period)) +
  geom_hline(yintercept=0, linetype="dashed", colour="grey50") +
  geom_vline(xintercept=-0.5, linetype="dotted", colour="grey60") +
  geom_pointrange(size=0.7) +
  scale_colour_manual(values=c("Pre-treatment"=clr_ctrl,"Post-treatment"=clr_eco,
                               "Reference (−1)"="grey40")) +
  labs(x="Years relative to Dutch eco-label law (0 = 2022)", y="DiD coefficient ($)",
       colour=NULL,
       title="Event study: pre-treatment coefficients near zero — parallel trends supported",
       subtitle="Post-treatment estimates rise consistently — evidence of a sustained law effect") +
  theme_mod3()

Note: How to read an event study plot

Each point is the DiD coefficient for that relative year — how much the Dutch–Belgian gap in that year differed from the gap in the reference year (here: year \(-1\), the year just before treatment).

Pre-treatment years (left of the dotted line): If parallel trends holds, these should be near zero with overlapping CIs. A systematic upward or downward trend here is evidence of a pre-existing differential trend — the most direct threat to identification.

Post-treatment years (right of the dotted line): These estimate the treatment effect over time. A sustained positive trajectory (as here) suggests a genuine, growing effect of the law rather than a one-off jump.

What to worry about: (1) Any pre-period estimate far from zero and statistically significant; (2) CIs so wide they can’t rule out large violations — this happens when you have very few pre-periods (see the next section).
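Eyeballing individual CIs invites cherry-picking; a joint Wald test that all pre-treatment coefficients are simultaneously zero gives a single summary. A sketch built on the cluster-robust variance matrix stored in `did_event` (the `grep` pattern assumes coefficient names like `treatedTRUE:year_rel_fac-3`; inspect `names(coef(did_event))` on your own fit first):

```r
# Joint Wald test: H0 = all pre-treatment event-study coefficients are zero
pre_terms <- grep("treatedTRUE:year_rel_fac-", names(coef(did_event)), value = TRUE)
pre_terms <- pre_terms[!is.na(coef(did_event)[pre_terms])]   # drop aliased terms

b <- coef(did_event)[pre_terms]
V <- vcov(did_event)[pre_terms, pre_terms]
W <- as.numeric(t(b) %*% solve(V, b))                        # Wald statistic

pchisq(W, df = length(b), lower.tail = FALSE)   # small p flags a pre-trend
```

A large p-value here carries the same caveat as the plot: with few pre-periods the test has little power, as Section 18.6 demonstrates.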

18.5 Synthetic Difference-in-Differences

Warning: The synthetic counterfactual is unverifiable

Synthetic DiD — like classical synthetic controls — constructs a weighted combination of untreated units designed to reproduce the treated unit’s pre-treatment trajectory. The pre-period fit can look compelling, but there is no way to verify that the synthetic world it generates is a sufficiently representative stand-in for the real counterfactual post-treatment. The method assumes the same weighting that matched pre-treatment trends would have continued to match post-treatment trends in the absence of the law. Unobserved shocks, structural breaks, or compositional changes that affect treated and synthetic units differently after treatment will bias the estimate without any diagnostic flag. Use this approach when you have a credible pool of donor units and a long pre-treatment window — and always report sensitivity analyses varying the donor pool.

Code
# ── YOUR DATA: synthdid_estimate() requires a (units × time) outcome matrix
#    arranged so that control units come FIRST (rows 1:N0) and pre-treatment
#    periods come FIRST (columns 1:T0). pivot_wider() here creates that layout;
#    replicate with your own unit ID, time, and outcome columns.
# ── KEY ARGS: N0 = number of control units (rows before treated units);
#    T0 = number of pre-treatment time periods (columns before treatment onset).
#    These must exactly match the row/column layout of Y_matrix.
sdid_wide <- panel_did |>
  dplyr::select(store, year, WTP_did, treated) |>
  pivot_wider(names_from=year, values_from=WTP_did) |>
  arrange(treated)

N0 <- sum(!sdid_wide$treated); T0 <- sum(years < T_treat)
Y_matrix <- sdid_wide |> dplyr::select(-store,-treated) |> as.matrix()

sdid_est <- synthdid_estimate(Y_matrix, N0, T0)
sc_est   <- sc_estimate(Y_matrix, N0, T0)
did_est  <- did_estimate(Y_matrix, N0, T0)

tibble(
  Method       = c("Standard DiD","Synthetic Control","Synthetic DiD"),
  Estimate     = round(c(as.numeric(did_est),as.numeric(sc_est),as.numeric(sdid_est)), 3),
  `True effect`= 0.70,
  Bias         = round(c(as.numeric(did_est),as.numeric(sc_est),as.numeric(sdid_est))-0.70, 3)
) |> knitr::kable(caption="All three estimators vs. the true effect of 0.70")
All three estimators vs. the true effect of 0.70

| Method | Estimate | True effect | Bias |
|---|---|---|---|
| Standard DiD | 0.631 | 0.7 | -0.069 |
| Synthetic Control | 0.618 | 0.7 | -0.082 |
| Synthetic DiD | 0.618 | 0.7 | -0.082 |
Code
# ── CHECK: compare the three estimates — large divergence between standard DiD
#    and Synthetic DiD suggests the parallel-trends assumption is strained and
#    the synthetic approach is finding a materially different counterfactual.
#    Use plot(sdid_est) to visualise the synthetic control trajectory.
Code
plot(sdid_est, se.method="placebo")
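One concrete form of the donor-pool sensitivity analysis recommended in the warning above is a leave-one-out loop: drop each control unit in turn and re-estimate. A sketch assuming `Y_matrix`, `N0`, and `T0` from the block above (controls occupy rows 1 to N0):

```r
# Leave-one-out donor sensitivity: re-estimate dropping each control unit
loo_sdid <- sapply(seq_len(N0), function(i) {
  as.numeric(synthdid_estimate(Y_matrix[-i, , drop = FALSE], N0 - 1L, T0))
})

range(loo_sdid)   # a wide range means the estimate leans on particular donors
```

If removing a single Belgian store moves the estimate materially, the synthetic counterfactual is being carried by that store, and the headline number deserves less weight.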

18.6 When Parallel Trends Fails — and Pre-Testing Cannot Always Save You

The parallel trends assumption is not directly testable — you never observe what Dutch stores would have done without the eco-label law. What you can test is whether pre-treatment trajectories were parallel. The event study above is exactly this test. But two practical problems cripple it in most real datasets:

  1. Too few pre-treatment periods. Most DiD studies in marketing and management observe units for only 2–4 years before treatment. With so few periods, the event study has very low statistical power to detect violations.
  2. You collect whatever data exists. Pre-treatment archives are often limited — the number of pre-periods is determined by data availability, not statistical power considerations.

The Module 2 power connection: In Module 2, you saw how underpowered studies inflate apparent effect sizes — only the most extreme estimates cross the significance threshold, leading to an inflated published literature. The same logic applies here in reverse: a pre-trend test with only two pre-periods has low power to detect a violation, so a non-significant result tells you very little. A failure to find pre-trend divergence with few observations is not evidence that trends were parallel — it is evidence that your test had insufficient power to catch the divergence that may have been there. This is one reason Callaway–Sant’Anna and other modern DiD estimators require researchers to be explicit about the number of clean pre-periods available.
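For reference, a minimal Callaway–Sant'Anna fit on the simulated panel, assuming the `did` package is installed. With a single treatment cohort, as here, it mainly illustrates the interface; its real value shows up with staggered adoption across cohorts:

```r
library(did)   # Callaway & Sant'Anna (2021) estimator

cs_dat <- panel_did |>
  mutate(store = as.integer(store),
         gname = ifelse(treated, T_treat, 0))   # 0 = never treated

cs_fit <- att_gt(yname = "WTP_did", tname = "year", idname = "store",
                 gname = "gname", data = cs_dat)

aggte(cs_fit, type = "dynamic")   # event-study aggregation of group-time ATTs
```

The `gname` column encodes the first treatment period per unit, which is how the estimator tracks how many clean pre-periods each cohort contributes.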

Important: What the pre-trend test does — and does not — establish

The event study tests whether observed pre-treatment outcomes moved in parallel. It does not test:

  • Whether unmeasured confounders were evolving differently for treated and control units
  • Whether the differential dynamic would have continued into the post-period absent treatment
  • Whether a violation is simply too small to detect given the available pre-periods

Passing the pre-trend test is necessary but far from sufficient for causal identification.

18.6.1 A Realistic Parallel Trends Violation

Suppose Dutch stores were already attracting more environmentally conscious shoppers whose WTP grew $0.15/year faster than Belgian shoppers — not because of any law, but because of who was already shopping there (selection). The true treatment effect is zero. The eco-label law is irrelevant. But this ongoing differential growth will trick DiD into finding one.

▶ Same violation: unmistakeable with 8 pre-periods, invisible with 2
set.seed(2025)
n_dutch_v  <- 15; n_belgian_v <- 20; n_stores_v <- n_dutch_v + n_belgian_v
T_treat_v  <- 2022; n_post_v <- 3
diff_slope_v <- 0.15   # Dutch stores grow $0.15/yr faster throughout (selection, not treatment)
store_fes_v  <- rnorm(n_stores_v, 0, 0.6)

make_violation_panel <- function(yrs, true_eff = 0) {
  expand.grid(store = 1:n_stores_v, year = yrs) |> as_tibble() |>
    mutate(
      treated       = as.numeric(store <= n_dutch_v),
      country       = ifelse(treated == 1, "Netherlands", "Belgium"),
      post          = as.numeric(year >= T_treat_v),
      did_indicator = treated * post,
      store_fe      = store_fes_v[store],
      diff_trend    = treated * diff_slope_v * (year - T_treat_v),
      WTP           = 5.20 + store_fe + 0.08 * (year - T_treat_v) +
                      diff_trend + true_eff * did_indicator + rnorm(n(), 0, 0.40)
    )
}

panel_many_v <- make_violation_panel(2014:2024)   # 8 pre-periods
panel_few_v  <- make_violation_panel(2020:2024)   # 2 pre-periods

plot_pt_panel <- function(panel, subtitle) {
  panel |> group_by(country, year) |>
    summarise(WTP = mean(WTP), .groups = "drop") |>
    ggplot(aes(x = year, y = WTP, colour = country, group = country)) +
    geom_vline(xintercept = T_treat_v - 0.5, linetype = "dashed",
               colour = "grey50", linewidth = 0.9) +
    geom_line(linewidth = 1.2) + geom_point(size = 2.5) +
    scale_colour_manual(values = c("Netherlands" = clr_eco, "Belgium" = clr_ctrl)) +
    scale_x_continuous(breaks = unique(panel$year)) +
    labs(x = NULL, y = "Mean WTP ($)", colour = NULL, subtitle = subtitle) +
    theme_mod3()
}

p_many_v <- plot_pt_panel(panel_many_v,
  "8 pre-periods: the differential trend is unmistakeable — you would never trust DiD here")
p_few_v  <- plot_pt_panel(panel_few_v,
  "2 pre-periods: SAME DGP — the violation is invisible and DiD proceeds unchallenged")

(p_many_v / p_few_v) +
  plot_annotation(
    title    = "Parallel trends violation: Dutch WTP grows $0.15/yr faster than Belgian (true effect = $0)",
    subtitle = "How many pre-periods you happen to collect determines whether you can even see the problem",
    theme    = theme(plot.title    = element_text(size = 13, face = "bold"),
                     plot.subtitle = element_text(size = 11))
  )

With 8 pre-periods, the violation is glaringly obvious — you would never run DiD here. With 2 pre-periods, the same violation is invisible. Yet the DiD estimate is equally spurious in both cases:

▶ DiD on 2-pre-period data: statistically significant, entirely spurious
did_viol <- lm_robust(WTP ~ did_indicator + factor(store) + factor(year),
                      data = panel_few_v, clusters = store)
dv <- tidy(did_viol) |> dplyr::filter(term == "did_indicator")
cat(sprintf(
  "True treatment effect  = $0.000\n\nDiD estimate           = $%.3f  (SE = %.3f,  p = %.4f)\n95%% CI: [$%.3f, $%.3f]\n\nThis looks like a meaningful, significant eco-label effect.\nIt is entirely spurious. The violation was undetectable with 2 pre-periods.\n",
  dv$estimate, dv$std.error, dv$p.value, dv$conf.low, dv$conf.high
))
True treatment effect  = $0.000

DiD estimate           = $0.376  (SE = 0.109,  p = 0.0017)
95% CI: [$0.153, $0.599]

This looks like a meaningful, significant eco-label effect.
It is entirely spurious. The violation was undetectable with 2 pre-periods.

18.6.2 Monte Carlo: Pre-Test Power and Type I Error

We now formalise this with 300 simulations per condition. True effect = $0 throughout. We vary the number of pre-treatment periods from 1 to 6 and track three quantities:

| Quantity | What it means |
|---|---|
| Pre-test power | How often does the pre-trend test detect the violation? (p < .05) |
| Overall Type I error | How often does DiD falsely reject H₀? |
| Conditional Type I error | Given the pre-test passes (p ≥ .05), how often does DiD still falsely reject? |

The conditional rate is what matters most: it is the false-positive rate faced by researchers who did their due diligence with an event study and got clean-looking pre-trends.

▶ Monte Carlo: 300 sims × 6 pre-period counts (true effect = 0)
set.seed(2025)
N_SIM_PT <- 300

pt_sim_one <- function(n_pre) {
  years_s <- c(seq(T_treat_v - n_pre, T_treat_v - 1),
               T_treat_v:(T_treat_v + n_post_v - 1))
  n_yrs <- length(years_s)

  mat <- replicate(N_SIM_PT, {
    n_obs     <- n_stores_v * n_yrs
    store_s   <- rep(1:n_stores_v, each = n_yrs)
    year_s    <- rep(years_s, times = n_stores_v)
    treated_s <- as.numeric(store_s <= n_dutch_v)
    post_s    <- as.numeric(year_s >= T_treat_v)
    did_s     <- treated_s * post_s
    diff_s    <- treated_s * diff_slope_v * (year_s - T_treat_v)
    fe_s      <- rnorm(n_stores_v, 0, 0.6)[store_s]
    WTP_s     <- 5.20 + fe_s + 0.08 * (year_s - T_treat_v) + diff_s + rnorm(n_obs, 0, 0.40)

    df_s <- data.frame(
      WTP     = WTP_s,
      did     = did_s,
      store   = factor(store_s),
      year    = factor(year_s),
      treated = treated_s,
      post    = post_s,
      year_num= year_s
    )

    # DiD p-value with clustered SEs
    fit_d  <- lm(WTP ~ did + store + year, data = df_s)
    pval_d <- coeftest(fit_d, vcovCL(fit_d, cluster = ~store))["did", 4]

    # Pre-trend test: treated x linear year interaction in pre-period only
    if (n_pre >= 2) {
      pre_s      <- df_s[df_s$post == 0, ]
      pre_s$yr_c <- pre_s$year_num - mean(pre_s$year_num)
      fit_p  <- lm(WTP ~ treated * yr_c + store, data = pre_s)
      ct_p   <- coeftest(fit_p, vcovCL(fit_p, cluster = ~store))
      pval_p <- tryCatch(ct_p["treated:yr_c", 4], error = function(e) 1.0)
    } else {
      pval_p <- 1.0   # Cannot test with only 1 pre-period — always "passes"
    }

    c(pval_d, pval_p)
  })  # 2 × N_SIM_PT matrix

  data.frame(n_pre    = n_pre,
             did_pval = mat[1, ],
             pre_pval = mat[2, ])
}

sim_all_pt <- map_dfr(1:6, pt_sim_one)
cat(sprintf("Simulation complete: %d total runs across 6 pre-period conditions.\n",
            nrow(sim_all_pt)))
Simulation complete: 1800 total runs across 6 pre-period conditions.

Monte Carlo summary (300 sims per condition, true effect = $0, differential trend = $0.15/yr). ‘Conditional’ = false positive rate among studies where the pre-trend test passed.

| Pre-periods | Pre-test power | Sims passing | Overall Type I | Conditional Type I |
|---|---|---|---|---|
| 1 | 0% | 300 / 300 | 34% | 34% |
| 2 | 4% | 288 / 300 | 78% | 77% |
| 3 | 23% | 230 / 300 | 97% | 98% |
| 4 | 59% | 122 / 300 | 100% | 100% |
| 5 | 92% | 24 / 300 | 100% | 100% |
| 6 | 98% | 5 / 300 | 100% | 100% |
Warning: What this means for practice

The key pattern:

  • With 1 pre-period: The test is impossible — it always “passes”. The false-positive rate here is 34%, nearly seven times the nominal 5% — a researcher following standard practice would be badly misled.
  • With 2–3 pre-periods: Pre-test power is low. Most violations slip through. Conditional Type I error is still far above the nominal 5%.
  • With 5–6 pre-periods: Power improves substantially. But violations that slip through the test still produce inflated Type I error.

The fundamental problem is not the test — it is the data. Researchers often have no choice but to use available archives, which may span only 1–3 pre-treatment periods. This is structurally similar to mediation: just as researchers rarely measure all possible confounders of the M→Y path, they rarely collect enough pre-treatment history to reliably test parallel trends.

What to do:

  • Report the number of pre-treatment periods and be honest about the test’s power
  • Run placebo DiDs — apply the treatment label to earlier years or non-treated units and check that “effects” are near zero
  • Use Rambachan & Roth’s (2023) HonestDiD sensitivity analysis, which asks: how large a pre-trend violation would be required to explain away your result? This reframes the question from “is there a violation?” (often unknowable) to “how large a violation would matter?” (estimable and reportable)
  • Treat a passing event study as weak evidence, not proof, of parallel trends
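The placebo-DiD recommendation above can be sketched with the panel from Section 18.2: restrict to pre-treatment years and pretend the law arrived in 2020 (a hypothetical placebo date). A near-zero, non-significant “effect” is the reassuring outcome:

```r
# Placebo-in-time DiD: pre-treatment years only, fake treatment date of 2020
placebo_dat <- panel_did |>
  filter(year < T_treat) |>
  mutate(placebo_did = treated & (year >= 2020))

placebo_fit <- lm_robust(WTP_did ~ placebo_did + factor(store) + factor(year),
                         data = placebo_dat, clusters = store)

tidy(placebo_fit) |> filter(str_detect(term, "placebo_did"))
# The placebo estimate should sit near zero with a CI comfortably crossing zero
```

The same logic works in space: assign the placebo label to a subset of Belgian stores and re-run. A “significant” placebo effect in either direction is a warning that the design, not the law, is generating the headline estimate.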