The Logic of Randomization and What P-Values Actually Tell You
Overview
Null hypothesis significance testing is everywhere in management research, yet its logic is widely misunderstood. This module builds from the ground up: why does randomization justify causal claims? What does a p-value actually mean? And what are the hidden assumptions — about units, stimuli, and assignment mechanisms — that most researchers ignore?
We cover the permutation logic of p-values, Type I and Type II error, power, the multiple comparisons problem, researcher degrees of freedom, and the design features (between- vs. within-subjects designs, stimulus sampling, and Latin squares) that change what you can conclude.
Learning Goals
By the end of this module you should be able to:
Derive the sampling distribution of a test statistic from the randomization distribution, not from asymptotic theory
Explain the difference between a two-sided p-value and a posterior probability
Calculate and interpret statistical power, and explain why most social science studies are underpowered
Identify researcher degrees of freedom and explain how they inflate false-positive rates
Explain the rationale for stimulus sampling and design studies that treat stimuli as random effects
Construct a Latin square design and explain when it is preferable to a fully crossed factorial
The module also includes:
A demonstration that a well-known fluency–risk effect disappears entirely when tested on new stimuli, a direct empirical illustration of why treating stimuli as fixed is a validity threat
A hands-on tutorial for SEM using lavaan, including path models and model fit assessment
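The Latin square mentioned in the goals above can be built with the standard cyclic construction and then randomized. This is a generic sketch in Python, not code from the module; the condition labels 0..n-1 stand in for whatever treatments you counterbalance:

```python
import numpy as np

def latin_square(n, rng=None):
    """Build an n x n Latin square: each of n conditions appears
    exactly once in every row (e.g., presentation order) and every
    column (e.g., position), so both are balanced against condition."""
    if rng is None:
        rng = np.random.default_rng()
    # Cyclic construction: cell (i, j) holds condition (i + j) mod n
    base = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n
    # Shuffle rows and columns; the Latin-square property is preserved
    return base[rng.permutation(n)][:, rng.permutation(n)]

sq = latin_square(4, np.random.default_rng(7))
print(sq)
```

Because every condition occupies every row and column exactly once, a 4-condition study needs only 4 sequences instead of the 24 a fully crossed order design would require.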
What This Tutorial Is About
There is a single thread running through this tutorial and its predecessor:
Your observable may not accurately reflect the latent construct you need.
In Module 1, that problem lived in your outcome variable Y — your scale picked up variance from constructs you never intended to measure. Here, the same problem appears in two places at once: in your treatment variable X, and in the act of randomization that is supposed to make X interpretable.
This tutorial covers four ideas:
Part 1 — P-values: What they actually mean, where their inferential power comes from, and what assumptions they require — demonstrated by building null distributions from scratch using permutation.
Part 2 — Randomization: Why “clicking randomize” (the observable) is not the same as “achieving randomization” (the latent property), and why this gap grows as the construct Y you are studying becomes broader and more complex.
Part 3 — Selection Effects: Why some data-collection environments make exchangeability structurally impossible, regardless of how randomization is conducted. Non-representativeness, self-selection, survivorship, attrition, and exclusion bias are not failures of randomization — they are failures that occur before, during, or after data collection, and they cannot be fixed by collecting more data within the same design.
Part 4 — The Exclusion Restriction: Just as a measured outcome Y can fail discriminant validity by absorbing multiple constructs, a manipulated treatment X can inject multiple independent signals into the system simultaneously. When assignment is random, the treatment coefficient still estimates the causal effect of the bundle participants actually received. What becomes ambiguous is the narrower mechanistic claim: the coefficient cannot tell you whether the effect came from eco-certification, the premium packaging, demand cues, or their interaction — and that mechanism ambiguity is the experimental analogue of discriminant invalidity in measurement.
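The permutation logic of Part 1 can be sketched in a few lines. The data below are invented for illustration; the point is only the mechanics of building a null distribution by relabeling:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical outcome scores for two randomized groups
treatment = np.array([5.1, 6.3, 4.8, 7.0, 5.9, 6.6])
control   = np.array([4.2, 5.0, 4.9, 5.3, 4.4, 5.1])

observed_diff = treatment.mean() - control.mean()

# Under the sharp null, group labels are arbitrary, so every
# relabeling is an equally likely realization of the experiment.
pooled = np.concatenate([treatment, control])
n_treat = len(treatment)
n_perm = 10_000
null_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    null_diffs[i] = shuffled[:n_treat].mean() - shuffled[n_treat:].mean()

# Two-sided p-value: share of relabelings at least as extreme as observed
p_value = np.mean(np.abs(null_diffs) >= abs(observed_diff))
print(f"observed diff = {observed_diff:.2f}, permutation p = {p_value:.4f}")
```

Note that the null distribution comes entirely from the assignment mechanism, with no appeal to normality or asymptotic theory; that is the sense in which randomization itself justifies the inference.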
Learning Objectives
By the end of this tutorial, you will be able to:
Explain what a p-value is — and is not — using simulation rather than formulas
Build a null distribution from scratch using label permutation
Identify the three ingredients that give a p-value inferential power
Explain why a treatment manipulation can violate discriminant validity in the same way a measurement scale can
Use open-ended text to map the constructs injected by an experimental manipulation and those driving an outcome
Articulate the difference between observable randomization and latent randomization using the construct-validity language from Module 1
Calculate the minimum number of observations needed for approximate orthogonality, given the complexity of Y
Run diagnostic checks to assess whether your observed randomization achieved approximate orthogonality
Identify the four classes of selection effect (non-representativeness, self-selection, survivorship/attrition, exclusion bias) and explain why each makes exchangeability structurally unachievable within the affected sample
Distinguish survivorship bias (missing observations never entered the sample) from attrition bias (observations entered the sample but left non-randomly)
Diagnose differential attrition across conditions and explain why conditioning on a post-randomization event breaks exchangeability even in a properly randomized experiment
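The differential-attrition diagnostic in the last objective can be run with the same permutation logic used for p-values. The dropout counts here are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dropout indicators (1 = dropped out before the outcome
# was measured); counts are made up for illustration.
control   = np.array([1] * 20 + [0] * 180)   # 10% dropout
treatment = np.array([1] * 50 + [0] * 150)   # 25% dropout

observed_gap = treatment.mean() - control.mean()

# If attrition were unrelated to condition, shuffling condition labels
# would leave the dropout gap unchanged in distribution.
pooled = np.concatenate([control, treatment])
n_c = len(control)
n_perm = 5_000
gaps = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    gaps[i] = shuffled[n_c:].mean() - shuffled[:n_c].mean()

p_value = np.mean(np.abs(gaps) >= abs(observed_gap))
print(f"dropout gap = {observed_gap:.0%}, permutation p = {p_value:.4f}")
```

A small p-value here signals that dropout depends on condition: the completers in the two arms are no longer exchangeable, even though the original assignment was perfectly random.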