Good measurement is the foundation of good research, yet it is the step most often skipped. This module asks whether your scale actually measures the construct you intend, and introduces the tools to find out.
We cover discriminant validity, measurement error and omitted variable bias, measurement invariance across groups, latent class structure, and outlier detection. Each topic is treated both conceptually and with live R code.
2.2 Learning Goals
By the end of this module you should be able to:
Distinguish construct validity from reliability, and explain why reliability alone is insufficient
Diagnose discriminant validity problems using CFA and AVE/correlation comparisons
Explain how classical measurement error biases regression coefficients and in which direction
Test for measurement invariance across groups using configural, metric, and scalar CFA models
Identify latent class structure using mixture models and interpret class profiles
Detect and reason about outliers using Cook’s D, DFFITS, and leverage diagnostics
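You can check the direction of bias named in the third learning goal with a short simulation. This minimal base-R sketch uses made-up values that are not taken from the tutorial's later parts: a true slope of 2, and measurement-error variance equal to the variance of the true predictor. With a single predictor and classical error (random, mean-zero, independent of everything else), the estimated slope shrinks toward zero by the reliability ratio var(X) / (var(X) + var(error)), which is about 0.5 here.

```r
# Attenuation bias from classical measurement error (hypothetical values).
# True slope = 2, var(x_true) = 1, var(error) = 1, so reliability = 0.5
# and the slope on the noisy predictor should be close to 2 * 0.5 = 1.
set.seed(42)
n <- 10000
x_true <- rnorm(n, mean = 0, sd = 1)           # construct measured without error
y      <- 2 * x_true + rnorm(n, sd = 1)        # outcome depends on the true construct
x_obs  <- x_true + rnorm(n, sd = 1)            # observed score = truth + random error

slope_true <- unname(coef(lm(y ~ x_true))[2])  # close to 2
slope_obs  <- unname(coef(lm(y ~ x_obs))[2])   # attenuated toward zero, close to 1

reliability <- var(x_true) / var(x_obs)        # sample estimate of the reliability ratio
round(c(true = slope_true, observed = slope_obs, reliability = reliability), 2)
```

With additional covariates, error in one predictor can also bias the other coefficients in either direction. The simple shrink-toward-zero result is therefore cleanest in the one-predictor case.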
Challenges the construct validity of the Implicit Association Test — argues that IAT scores conflate racial preference with unrelated cognitive associations, making the construct unclear
Tutorial on longitudinal SEM, measurement invariance testing, and growth curve models
2.4 What This Tutorial Is About
There is a single idea running through everything in this tutorial:
Y is supposed to reflect X — but it actually reflects X plus something else.
This problem appears in five forms that marketing researchers encounter regularly, and each form is covered in one part of this tutorial:
Part 1 — Discriminant Validity: Your Green Purchase Intention (GPI) scale is supposed to measure purchase intention (X), but its items also pick up environmental concern (something else). The scale cannot be told apart from a different construct.
Part 2 — Omitted Variable Bias: In your secondary data, observed sales (Y) is supposed to reflect advertising spend (X), but sales also reflects store quality (something else, which you never measured). When higher-quality stores also spend more on advertising, part of the quality effect is credited to advertising, and your estimated advertising effect is inflated.
Part 3 — Measurement Non-Invariance: Your GPI items are supposed to reflect green purchase intention (X) across both experimental conditions — but two of the items shift upward in the treatment condition for reasons unrelated to the latent construct. The items don’t mean the same thing in both groups.
Part 4 — Latent Subgroups: Your sample contains two very different types of consumers — Green Champions and Price Skeptics — who respond to green marketing completely differently. When you aggregate across both groups, your outcome variable reflects a mixture of two different data-generating processes. The treatment effect you estimate doesn’t apply cleanly to anyone.
Part 5 — Outliers: Your dataset contains a cluster of luxury flagship stores that operate on completely different economics from regular stores. Including them can distort the coefficients in your model. Collective outliers (a group of observations, like these flagship stores, that deviate together) are a version of the latent-subgroup problem; point outliers (individual anomalous observations) pose a different statistical challenge. Both can be handled without deleting data.
The statistical symptoms differ across these five parts, but the underlying logic is identical: your outcome variable is contaminated by variance that belongs to something other than the construct or cause you are trying to study.
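The Part 2 scenario makes this shared logic concrete: an outcome that carries variance from something other than the cause under study. The sketch below uses hypothetical variable names and coefficients rather than the tutorial's dataset. It assumes a true advertising effect of 1.0, a store-quality effect of 2.0, and advertising that is partly driven by quality. Omitting quality should push the advertising estimate to roughly 1.8. That matches the textbook omitted-variable-bias formula: bias = (effect of quality) × cov(adspend, quality) / var(adspend) = 2.0 × 0.5 / 1.25 = 0.8.

```r
# Omitted variable bias sketch (hypothetical numbers, not the tutorial's data).
# True model: sales = 1.0 * adspend + 2.0 * quality + noise,
# and higher-quality stores also spend more on advertising.
set.seed(123)
n <- 5000
quality <- rnorm(n)                                  # never measured in practice
adspend <- 0.5 * quality + rnorm(n)                  # ad spend correlated with quality
sales   <- 1.0 * adspend + 2.0 * quality + rnorm(n)  # outcome = X + something else

b_full    <- unname(coef(lm(sales ~ adspend + quality))["adspend"])  # about 1.0
b_omitted <- unname(coef(lm(sales ~ adspend))["adspend"])            # inflated, about 1.8

# Textbook OVB formula using the known quality effect of 2.0
predicted_bias <- 2.0 * cov(adspend, quality) / var(adspend)
round(c(full = b_full, omitted = b_omitted, predicted = 1.0 + predicted_bias), 2)
```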
Learning Objectives
By the end of this tutorial, you will be able to:
Diagnose discriminant validity failures using the heterotrait-monotrait ratio (HTMT) and the DVI method (Pieters et al., 2025)
Identify omitted variable bias through residual diagnostics and coefficient sensitivity analysis
Test measurement invariance across groups using a configural → metric → scalar sequence in lavaan
Use modification indices to pinpoint which specific items are non-invariant
Use Gaussian mixture models (via mclust) to detect latent subgroups that distort your outcome
Apply Local Outlier Factor (LOF) and DBSCAN to identify collective and point outliers
Apply robust regression (MASS::rlm) as an alternative to deleting outliers
Explain the conceptual link between all five problems to a non-technical audience
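For the HTMT objective, the ratio can be computed by hand from an item correlation matrix. It is the mean heterotrait correlation (items of different constructs) divided by the geometric mean of the two constructs' average monotrait correlations (items of the same construct). This base-R sketch makes several assumptions: three items per construct, equal loadings of 0.8, and hypothetical item names (gpi1 to gpi3 for green purchase intention, env1 to env3 for environmental concern). Under equal loadings, HTMT approximates the latent correlation. A simulated correlation of 0.95 should therefore produce an HTMT well above the commonly cited 0.85 or 0.90 cutoffs, while 0.40 should not. In practice, use semTools::htmt(); its options and defaults may differ from this sketch.

```r
# Minimal HTMT sketch in base R (absolute correlations, arithmetic means).
htmt_pair <- function(R_items, items_a, items_b) {
  upper  <- function(m) m[upper.tri(m)]             # each within-construct pair once
  hetero <- abs(R_items[items_a, items_b])          # heterotrait correlations
  mono_a <- abs(upper(R_items[items_a, items_a]))   # monotrait correlations, construct A
  mono_b <- abs(upper(R_items[items_b, items_b]))   # monotrait correlations, construct B
  mean(hetero) / sqrt(mean(mono_a) * mean(mono_b))
}

# Hypothetical data: two latent factors with correlation phi, three items each.
simulate_htmt <- function(phi, n = 2000, loading = 0.8) {
  f_gpi <- rnorm(n)
  f_env <- phi * f_gpi + sqrt(1 - phi^2) * rnorm(n)  # latent correlation = phi
  make_items <- function(f) {
    sapply(1:3, function(i) loading * f + sqrt(1 - loading^2) * rnorm(n))
  }
  X <- cbind(make_items(f_gpi), make_items(f_env))
  colnames(X) <- c(paste0("gpi", 1:3), paste0("env", 1:3))
  htmt_pair(cor(X), paste0("gpi", 1:3), paste0("env", 1:3))
}

set.seed(2024)
htmt_high <- simulate_htmt(phi = 0.95)  # constructs barely distinguishable
htmt_low  <- simulate_htmt(phi = 0.40)  # constructs clearly distinct
round(c(high = htmt_high, low = htmt_low), 2)
```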
2.5 Packages Needed
Install and load the required packages. If you haven’t installed them before, uncomment and run the install.packages() lines first.
```r
# Uncomment to install if needed:
# install.packages(c("lavaan", "semTools", "MASS", "ggplot2",
#                    "dplyr", "tidyr", "corrplot", "knitr", "lmtest",
#                    "mclust", "dbscan"))

library(lavaan)    # CFA and SEM (Parts 1 and 3)
library(semTools)  # htmt() and auxiliary SEM tools (Parts 1 and 3)
library(MASS)      # mvrnorm() to simulate data (all parts); rlm() robust regression (Part 5)
                   # Loaded before dplyr so that dplyr::select() is not masked by MASS::select()
library(ggplot2)   # Visualizations (all parts)
library(dplyr)     # Data manipulation (all parts)
library(tidyr)     # Data reshaping (Parts 2 and 3)
library(corrplot)  # Correlation heatmap (Part 1)
library(knitr)     # Nicely formatted tables (all parts)
library(lmtest)    # Breusch-Pagan heteroskedasticity test (Part 2)
library(mclust)    # Gaussian mixture models / latent class analysis (Part 4)
library(dbscan)    # Local outlier factor and DBSCAN clustering (Part 5)
```
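If you prefer not to uncomment the install line by hand, a small helper can install only the packages that are missing. This is a sketch, and find_missing_packages() is a hypothetical name, not a function from any package. requireNamespace(..., quietly = TRUE) returns FALSE for a package that cannot be loaded. The install step is guarded by interactive() so that knitting or rendering the document never triggers downloads.

```r
# Hypothetical helper: return the subset of `pkgs` that cannot be loaded.
find_missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE/FALSE without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

needed <- c("lavaan", "semTools", "MASS", "ggplot2", "dplyr", "tidyr",
            "corrplot", "knitr", "lmtest", "mclust", "dbscan")
missing_pkgs <- find_missing_packages(needed)

# Install only in an interactive session, never during a non-interactive render
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
missing_pkgs  # character(0) when everything is already installed
```

Run it once before the library() calls above.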