Question about variable selection

Dear Wensui,

What you are asking about is called in psychology a "suppressor"
variable: a predictor that is unrelated to the criterion but
correlated with the other predictors (X1 in the following example).
Although it has a zero correlation with the DV, it does "really"
help to predict the DV by removing extraneous variance from the other
IVs.  (I am not going to touch the Wittgenstein issue of truth here.)
Should it be included in the predictor set? Yes.  Is there any easy
way to find all possible suppressors? No.


Consider the following:

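#demonstration of "suppressor effects"
library(mvtnorm)
#population correlations: X1-X2 = .5, X2-Y = .5, X1-Y = 0
sigma <- matrix(c(1,.5,0,.5,1,.5,0,.5,1),ncol=3)
#draw 1000 cases from the corresponding multivariate normal
my.data <- data.frame(rmvnorm(1000,sigma=sigma))
names(my.data) <- c("X1", "X2", "Y")
round(cor(my.data),2)   #sample correlations
summary(lm(Y~ X1 + X2,data= my.data))   #regress Y on both predictors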

which produces
       X1   X2     Y
X1  1.00 0.45 -0.04
X2  0.45 1.00  0.51
Y  -0.04 0.51  1.00

Call:
lm(formula = Y ~ X1 + X2, data = my.data)

Residuals:
      Min       1Q   Median       3Q      Max
-2.09350 -0.58069  0.02280  0.53436  3.02017

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept)  0.02807    0.02557   1.098    0.273   
X1          -0.32849    0.02813 -11.680   <2e-16 ***
X2           0.65666    0.02861  22.951   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8081 on 997 degrees of freedom
Multiple R-Squared: 0.3465,	Adjusted R-squared: 0.3452
F-statistic: 264.4 on 2 and 997 DF,  p-value: < 2.2e-16
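
As a rough check on these numbers, the population values can be worked out directly from sigma. This is only a sketch added for illustration: the names Rxx, rxy, beta, R2.both and R2.X2 are my own, and it assumes unit variances, so that the standardized and raw weights coincide.

```r
# Population correlations used in the simulation (order: X1, X2, Y)
vars  <- c("X1", "X2", "Y")
sigma <- matrix(c(1, .5, 0,
                  .5, 1, .5,
                  0, .5, 1), ncol = 3, dimnames = list(vars, vars))

Rxx <- sigma[c("X1", "X2"), c("X1", "X2")]  # predictor intercorrelations
rxy <- sigma[c("X1", "X2"), "Y"]            # predictor-criterion correlations

# Standardized regression weights: solve Rxx %*% beta = rxy
beta <- as.numeric(solve(Rxx, rxy))

# Squared multiple correlation with both predictors, and with X2 alone
R2.both <- sum(beta * rxy)
R2.X2   <- as.numeric(rxy["X2"]^2)

round(beta, 3)
round(c(X2.only = R2.X2, X1.and.X2 = R2.both), 3)
```

The population weights come out as -1/3 for X1 and 2/3 for X2, in line with the sample estimates above, and adding X1 raises the population R-squared from .25 (X2 alone) to about .33 even though X1 is uncorrelated with Y.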

At 3:22 PM -0500 2/18/06, John Fox wrote:
.... (discussion of interaction from Andy Liaw)