regression towards the mean, AS paper November 2007
On Dec 17, 2007 3:10 PM, hadley wickham <h.wickham at gmail.com> wrote:
This really has nothing to do with the question that Troels asked,
but the exposition quoted from the AA paper is unnecessarily confusing.
The phrase ``Because X0 and X1 have identical marginal distributions ...''
throws the reader off the track. The identical marginal distributions
are irrelevant. All one needs is that the ***means*** of X0 and X1
be the same, and then the null hypothesis tested by a paired t-test
is true and so the p-values are (asymptotically) Uniform[0,1]. With
a sample size of 100, the ``asymptotically'' bit can be safely ignored
for any ``decent'' joint distribution of X0 and X1. If one further
assumes that X0 - X1 is Gaussian (which has nothing to do with X0 and
X1 having identical marginal distributions) then ``asymptotically''
turns into ``exactly''.
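The point about equal means is easy to check by simulation. What follows is only a rough sketch, not anything taken from the paper: the two marginals (exponential and Gaussian, both with mean 1), the seed, and the number of replications are arbitrary choices made for illustration. X0 and X1 have clearly different marginal distributions, yet the paired t-test p-values should still look close to Uniform[0,1] at n = 100.

```r
set.seed(1)          # arbitrary seed, only so the run is reproducible
n_sims <- 2000       # number of simulated data sets (assumed value)
n <- 100             # sample size, as in the discussion above

p_values <- replicate(n_sims, {
  x0 <- rexp(n, rate = 1)            # skewed marginal with mean 1
  x1 <- rnorm(n, mean = 1, sd = 2)   # symmetric marginal with mean 1
  t.test(x0, x1, paired = TRUE)$p.value
})

# If the p-values are roughly uniform, about 5% of them fall below 0.05
mean(p_values < 0.05)

# Compare the empirical distribution of the p-values with Uniform[0,1]
ks.test(p_values, "punif")
```

If the means are made unequal (say mean = 1.5 for x1), the same code should show the p-values piling up near zero, which is the contrast that matters here.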
Another related issue is that uniform distributions don't look very
uniform:

hist(runif(100))
hist(runif(1000))
hist(runif(10000))

Be sure to calibrate your eyes (and your bin width) before rejecting
the hypothesis that the distribution is uniform.

Hadley
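One way to do that calibration is to work out how much the bar heights are expected to wobble. This is a minimal sketch under an assumption not made in the thread: 10 equal-width bins (n_bins is an illustrative choice). Each bin count is then Binomial(n, 1/n_bins), so with n = 100 the expected count is 10 with a standard deviation of 3, and bars anywhere from about 4 to 16 are unremarkable.

```r
set.seed(42)         # arbitrary seed, only so the run is reproducible
n <- 100             # sample size
n_bins <- 10         # assumed number of equal-width bins

# Observed counts per bin for one uniform sample
breaks <- seq(0, 1, length.out = n_bins + 1)
counts <- table(cut(runif(n), breaks = breaks))

# Each count is Binomial(n, 1/n_bins) under uniformity
expected <- n / n_bins
sd_count <- sqrt(n * (1 / n_bins) * (1 - 1 / n_bins))

counts
c(expected = expected, sd = sd_count)
```

The relative wobble, sd_count / expected, shrinks like 1/sqrt(n) for a fixed number of bins, which is why the n = 10000 histogram looks so much flatter than the n = 100 one.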
Thanks for the example, Hadley. To me, this suggests we should stop
teaching histograms in Stat 101 and instead use quantile plots, which
give excellent results for n=100 and even surprisingly good results
for n=10:
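# 2 x 2 grid: one quantile plot per sample size
par(mfrow=c(2,2))
for(i in c(10, 100, 1000, 10000)) {
  # sample quantiles (x) against theoretical Uniform[0,1] quantiles (y)
  qqplot(runif(i), qunif(seq(1/i, 1, length.out=i)), main=i,
         xlim=c(0,1), ylim=c(0,1),
         xlab="runif", ylab="Uniform distribution quantiles")
  # reference line y = x; points near it indicate uniformity
  abline(0, 1, col="lightgray")
}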
Kevin (drifting even further off topic)