The Logic of Randomization and What P-Values Actually Tell You
9.1 Overview
Null hypothesis significance testing is everywhere in management research, yet its logic is widely misunderstood. This module builds from the ground up: why does randomization justify causal claims? What does a p-value actually mean? And what are the hidden assumptions — about units, stimuli, and assignment mechanisms — that most researchers ignore?
We cover the permutation logic of p-values, Type I and Type II error, statistical power, the multiple comparisons problem, researcher degrees of freedom, and the design features (between- vs. within-subjects designs, stimulus sampling, and Latin squares) that change what you can conclude.
9.2 Learning Goals
By the end of this module you should be able to:
Derive the sampling distribution of a test statistic from the randomization distribution, not from asymptotic theory
Explain the difference between a two-sided p-value and a posterior probability
Calculate and interpret statistical power, and explain why most social science studies are underpowered
Identify researcher degrees of freedom and explain how they inflate false-positive rates
Explain the rationale for stimulus sampling and design studies that treat stimuli as random effects
Construct a Latin square design and explain when it is preferable to a fully crossed factorial
A featured demonstration shows that a well-known fluency–risk effect disappears entirely when tested on new stimuli — a direct empirical illustration of why treating stimuli as fixed is a validity threat. The module also includes a hands-on tutorial for structural equation modeling (SEM) using lavaan, covering path models and model fit assessment.
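One of the goals above is constructing a Latin square. As a minimal sketch (not the module's own code), the standard cyclic construction guarantees that each condition appears exactly once in every row and every column:

```python
import numpy as np

def latin_square(k: int) -> np.ndarray:
    """Cyclic k x k Latin square: condition (row + col) mod k, so each
    condition appears exactly once per row and once per column."""
    return np.array([[(row + col) % k for col in range(k)] for row in range(k)])

# Example: 4 participants (rows) x 4 presentation positions (columns),
# with cell values giving which of 4 conditions is shown.
sq = latin_square(4)
print(sq)
```

Because every condition occupies every position equally often, position (e.g., order or fatigue) effects are balanced across conditions without running the full k! orderings of a fully crossed design.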
9.4 What This Tutorial Is About
There is a single thread running through this tutorial and its predecessor:
Your observable may not accurately reflect the latent construct you need.
In Module 1, that problem lived in your outcome variable Y — your scale picked up variance from constructs you never intended to measure. Here, the same problem appears in two places at once: in your treatment variable X, and in the act of randomization that is supposed to make X interpretable.
This tutorial covers three ideas:
Part 1 — P-values: What they actually mean, where their inferential power comes from, and what assumptions they require — demonstrated by building null distributions from scratch using permutation.
Part 2 — The Exclusion Restriction: Just as a measured outcome Y can fail discriminant validity by absorbing multiple constructs, a manipulated treatment X can inject multiple independent signals into the system simultaneously. When this happens, your “experiment” is really a quasi-experiment — and the single coefficient on treatment conflates several distinct causal pathways.
Part 3 — Randomization: Why “clicking randomize” (the observable) is not the same as “achieving randomization” (the latent property), and why this gap grows as the construct Y you are studying becomes broader and more complex.
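Part 1's idea of building a null distribution from scratch can be sketched as follows. The data here are made up for illustration; the logic is that, under the sharp null of no treatment effect, every relabeling of units into "treatment" and "control" was equally likely, so the observed difference can be compared against the full set of relabeled differences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcomes for a treatment and a control group.
treat = np.array([5.1, 4.8, 6.0, 5.5, 5.9, 6.2])
control = np.array([4.2, 5.0, 4.6, 4.9, 4.4, 5.1])
observed = treat.mean() - control.mean()

# Null distribution by label permutation: shuffle the pooled outcomes
# and recompute the group difference under each relabeling.
pooled = np.concatenate([treat, control])
n_treat = len(treat)
null_diffs = np.array([
    (lambda p: p[:n_treat].mean() - p[n_treat:].mean())(rng.permutation(pooled))
    for _ in range(10_000)
])

# Two-sided p-value: the share of relabelings at least as extreme
# as what we actually observed.
p_value = np.mean(np.abs(null_diffs) >= abs(observed))
print(round(p_value, 3))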
Note: Learning Objectives
By the end of this tutorial, you will be able to:
Explain what a p-value is — and is not — using simulation rather than formulas
Build a null distribution from scratch using label permutation
Identify the three ingredients that give a p-value inferential power
Explain why a treatment manipulation can violate discriminant validity in the same way a measurement scale can
Use open-ended text to map the constructs injected by an experimental manipulation and those driving an outcome
Articulate the difference between observable randomization and latent randomization using the construct-validity language from Module 1
Calculate the minimum number of observations needed for approximate orthogonality, given the complexity of Y
Run diagnostic checks to assess whether your observed randomization achieved approximate orthogonality
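One way to make the last two objectives concrete — this is an illustrative operationalization, not the tutorial's prescribed diagnostic — is to measure how correlated a random assignment can be with the covariates driving Y. The sketch below simulates repeated randomizations and reports the worst absolute correlation between the 0/1 assignment and any of k independent covariates; the imbalance shrinks as n grows but worsens as Y's complexity k grows:

```python
import numpy as np

rng = np.random.default_rng(1)

def max_imbalance(n: int, k: int, reps: int = 500) -> float:
    """Average (over reps randomizations) of the largest |correlation|
    between a random 0/1 assignment and any of k independent covariates."""
    worst = []
    for _ in range(reps):
        x = rng.integers(0, 2, size=n)        # "clicking randomize"
        covs = rng.standard_normal((n, k))    # k latent drivers of Y
        r = [abs(np.corrcoef(x, covs[:, j])[0, 1]) for j in range(k)]
        worst.append(max(r))
    return float(np.mean(worst))

# Observable randomization is the same in every row below; how close it
# comes to latent orthogonality depends on n relative to k.
for n in (20, 80, 320):
    print(n, round(max_imbalance(n, k=10), 3))
```

Running the same check on your own design — assignment indicator against every measured covariate — is one diagnostic for whether a particular observed randomization achieved approximate orthogonality, rather than trusting that the randomizer button delivered it.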