permutated p values vs. normal p values

2 messages · Steve Adams, Bert Gunter

#
Hi, I am performing Cox proportional hazards
regression on a microarray dataset with 15000 genes.
The p values from the Cox regression (based on the
normal approximation from large-sample theory) showed
that only 2 genes have a p value less than 0.05. However,
when I permuted the dataset to obtain permutation
p values, about 750 genes had a permutation p value
less than 0.05 (which happens to equal the number of
significant genes you would expect by chance alone).
With such a big difference in the number of significant
genes, which set of p values should I trust, and why does
such a big difference exist? My dataset is small
(17 samples); might this be the reason?
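
The "expected by chance alone" figure can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming every gene is null so that valid p values are uniform on [0, 1]; the names N_GENES and ALPHA and the seed are illustrative, and nothing here touches the real data.

```python
import random

N_GENES = 15000   # number of genes in the question
ALPHA = 0.05      # significance threshold in the question

# Expected number of p values below ALPHA if every gene is null.
expected_by_chance = N_GENES * ALPHA

# Simulate one all-null experiment: under the null hypothesis,
# valid p values are uniformly distributed on [0, 1].
rng = random.Random(0)  # arbitrary seed, used only for reproducibility
null_p_values = [rng.random() for _ in range(N_GENES)]
observed = sum(p < ALPHA for p in null_p_values)

print("expected by chance:", expected_by_chance)
print("observed in one simulated null experiment:", observed)
```

The simulated count varies around the expected value from run to run, which is why a count near 750 is by itself no evidence of real signal.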


Thanks
#
A **guess** ... subject to correction by others.

If you have large systematic error in your experiment, nothing will turn out
"significant" (which is what you saw).

If you permute the data so that the systematic error becomes "random",
you'll get a random number of significant p-values, which is what you saw.
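
A rough illustration of the permutation side, as a sketch only: it uses the absolute Pearson correlation between expression and an outcome as a stand-in statistic (not a Cox model), simulated noise instead of real data, and hypothetical names such as permutation_p_value. With 17 samples and no true association, about 5% of permutation p values should fall below 0.05.

```python
import random

def abs_correlation(x, y):
    """Absolute Pearson correlation; a stand-in statistic, not a Cox score."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def permutation_p_value(expression, outcome, n_perm, rng):
    """Share of shuffled outcomes whose statistic is at least the observed one."""
    observed = abs_correlation(expression, outcome)
    shuffled = list(outcome)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs_correlation(expression, shuffled) >= observed:
            hits += 1
    # Add-one correction so the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

rng = random.Random(1)   # arbitrary seed, used only for reproducibility
N_SAMPLES = 17           # sample size from the question
N_GENES = 200            # far fewer than 15000, to keep the sketch fast
N_PERM = 199             # permutations per gene (assumed value)

# Pure noise: the outcome is unrelated to every simulated gene.
outcome = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
p_values = []
for _ in range(N_GENES):
    expression = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
    p_values.append(permutation_p_value(expression, outcome, N_PERM, rng))

fraction_significant = sum(p < 0.05 for p in p_values) / N_GENES
print("fraction of null genes with p < 0.05:", fraction_significant)
```

If something like this holds for the real data, 750 of 15000 is about what the permutation p values would give with no real signal at all.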

If the samples came from animals (or people), with the experiments possibly
performed over time by different people at different labs (sites), large
systematic error that would overwhelm a small sample is not unusual. Lack of
explicit and careful randomization, cage effects in animal experiments, and
equipment and calibration issues are some possible sources of such error.

OTOH, what I just said might be pure nonsense, so caveat emptor.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box