p-level in packages mgcv and gam

On Wed, 28 Sep 2005, Denis Chabot wrote:

            
Yes, you can. And this procedure gives you incorrect p-values.

They may not be badly wrong: it depends on how much model selection
you do, and on how strongly the feature you are selecting on is related
to the one you are testing.

For example, using step() to choose a polynomial in x, even when x is
unrelated to both y and z, inflates the Type I error rate of the test
for z by giving a downward-biased estimate of the residual mean squared
error:

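once <- function() {
   # y is pure noise, unrelated to both x and z, so the null hypothesis for z holds
   y <- rnorm(50); x <- runif(50); z <- rep(0:1, 25)
   # step() may add polynomial terms in x by AIC; z always stays in the model
   fit <- step(lm(y ~ z),
               scope = list(lower = ~z, upper = ~z + x + I(x^2) + I(x^3) + I(x^4)),
               trace = 0)
   # p-value for z in the selected model
   coef(summary(fit))["z", 4]
}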
[1] 0.072

which is significantly higher than you would expect for an honest level 
0.05 test.
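
The archived message does not show the call that turns once() into a rejection rate. A minimal sketch of one way to do it follows, under some assumptions that are not in the message: the replication count (1000), the seed, and the helper name once_both() are all hypothetical. The helper repeats the same simulation but also returns the p-value for z from the model with no selection, so the two rejection rates can be compared against a rough Monte Carlo standard error.

```r
# Hypothetical sketch: nrep, the seed and the name once_both() are
# assumptions for illustration, not taken from the original message.
set.seed(1)

# Same data-generating setup as once(): y is unrelated to x and z.
# Returns the p-value for z without selection and after step().
once_both <- function() {
  y <- rnorm(50); x <- runif(50); z <- rep(0:1, 25)
  base <- lm(y ~ z)
  sel <- step(base,
              scope = list(lower = ~z, upper = ~z + x + I(x^2) + I(x^3) + I(x^4)),
              trace = 0)
  c(no_selection = coef(summary(base))["z", 4],
    after_step   = coef(summary(sel))["z", 4])
}

nrep <- 1000                         # assumed number of replications
p <- replicate(nrep, once_both())    # 2 x nrep matrix of p-values
rates <- rowMeans(p < 0.05)          # empirical Type I error rates
print(rates)

# Monte Carlo standard error of the rejection rate of an honest
# level 0.05 test over nrep replications
se <- sqrt(0.05 * 0.95 / nrep)
print(se)
```

The no_selection rate should sit near 0.05 up to simulation noise, since the t-test for z is exact there; how far after_step drifts above it will vary with the seed and the number of replications.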

 	-thomas