Question about variable selection
That depends on whether the IV might interact with other IVs that the bivariate analysis does not consider: an IV with no marginal effect can still matter through an interaction. E.g.,
iv <- expand.grid(-2:2, -2:2)
y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
summary(lm(y ~ iv[,1]))
Call:
lm(formula = y ~ iv[, 1])

Residuals:
     Min       1Q   Median       3Q      Max
-4.06259 -1.06048 -0.02377  1.05901  4.04315

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
iv[, 1]      0.01417    0.29332   0.048    0.962
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.074 on 23 degrees of freedom
Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619

summary(lm(y ~ iv[,1] * iv[,2]))
Call:
lm(formula = y ~ iv[, 1] * iv[, 2])

Residuals:
     Min       1Q   Median       3Q      Max
-0.22390 -0.08894 -0.01279  0.13525  0.17608

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.019083   0.026330 114.665   <2e-16 ***
iv[, 1]          0.014167   0.018618   0.761    0.455
iv[, 2]         -0.005486   0.018618  -0.295    0.771
iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1316 on 21 degrees of freedom
Multiple R-Squared: 0.9963,  Adjusted R-squared: 0.9958
F-statistic:  1896 on 3 and 21 DF,  p-value: < 2.2e-16
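
For anyone wanting to rerun this, here is a minimal sketch of the same idea with the two fits compared directly by a nested-model F test. The seed and the names d, x1, x2, fit_main, fit_int and cmp are assumptions added for illustration, so the numbers will not match the output above exactly.

```r
# Hypothetical reproduction; the seed and all object names are assumptions.
set.seed(1)

# Balanced 5 x 5 grid of two IVs; y depends only on their product.
d <- expand.grid(x1 = -2:2, x2 = -2:2)
d$y <- 3 + d$x1 * d$x2 + rnorm(nrow(d), sd = 0.1)

fit_main <- lm(y ~ x1, data = d)       # bivariate model: x1 alone
fit_int  <- lm(y ~ x1 * x2, data = d)  # x1, x2 and the x1:x2 interaction

# F test of the smaller model against the larger one.
cmp <- anova(fit_main, fit_int)
print(cmp)
```

On a balanced grid like this one, x1 is uncorrelated with the product x1 * x2, which is why the bivariate fit sees essentially nothing while the interaction model explains nearly all of the variance.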
Andy
From: Wensui Liu
Dear Lister,

I have a question about variable selection for regression. If the IV is not significantly related to the DV in the bivariate analysis, does it make sense to include this IV in the full model with multiple IVs?

Thank you so much!
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html