-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Wensui Liu
Sent: Saturday, February 18, 2006 3:03 PM
To: John Fox
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Question about variable selection
Dear John,
I fully understand your point that an IV might not be
significantly correlated with the DV in the bivariate case but
might be significantly correlated with the DV in the presence
of other IVs. But does this significant partial relationship
reflect the true relation between the IV and the DV, and does it
really help to predict the DV?
From here, let's go one step further. If I resample repeatedly
from the original dataset, fit a bivariate LM between the IV and
the DV on each sample, and still can't get a significant result, do
you think I should give this IV a chance by looking at its
partial relationship with the DV?
Thank you so much!
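
The resampling idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated (not from the thread), the names dat, p_x1, B and pvals are hypothetical, and a plain row bootstrap stands in for whatever resampling scheme was intended. For each resample it records the p-value of x1 in the bivariate fit and in the fit that also includes x2.

```r
# Hypothetical data (not from the thread): x1 and x2 are correlated,
# and the marginal slope of y on x1 is zero in expectation.
set.seed(42)                          # arbitrary seed, for reproducibility only
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)                   # correlated with x1
y  <- x1 - x2 + rnorm(n, sd = 0.5)    # main effects only, no interaction
dat <- data.frame(y, x1, x2)

# p-value of the x1 coefficient in a fitted lm
p_x1 <- function(fit) coef(summary(fit))["x1", "Pr(>|t|)"]

B <- 500                              # number of bootstrap resamples
pvals <- t(replicate(B, {
  boot <- dat[sample(n, replace = TRUE), ]   # resample rows with replacement
  c(bivariate = p_x1(lm(y ~ x1, data = boot)),
    partial   = p_x1(lm(y ~ x1 + x2, data = boot)))
}))

# Share of resamples in which x1 is significant at the 5% level
colMeans(pvals < 0.05)
```

With data built this way, the partial test should flag x1 in essentially every resample, while the bivariate test typically does so far less often. Repeating a bivariate test on resamples therefore says little about whether the partial relationship exists.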
On 2/18/06, John Fox <jfox at mcmaster.ca> wrote:
Dear Wensui and Andy,
When the explanatory variables are correlated it's possible
for the marginal relationship between X and Y to be zero and the
partial relationship nonzero (even in the absence of interactions);
this is simply a reflection of the more general point that partial and
marginal relationships can differ.
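
A minimal sketch of the point above, assuming simulated data that are not part of the thread (the names x1, x2, e, marginal and partial are hypothetical). Here x2 = x1 + e with e orthogonal to x1, so cov(x1, y) = var(x1) - cov(x1, x2) = 0 even though y depends on x1 once x2 is held fixed, and the model contains no interaction term.

```r
# Hypothetical data: correlated predictors, main effects only.
set.seed(1)                           # arbitrary seed, for reproducibility only
x1 <- rep(-2:2, each = 5)
e  <- rep(-2:2, times = 5)            # orthogonal to x1 by construction
x2 <- x1 + e                          # hence correlated with x1
y  <- 3 + x1 - x2 + rnorm(length(x1), sd = 0.1)

marginal <- lm(y ~ x1)                # x1 on its own
partial  <- lm(y ~ x1 + x2)           # x1 adjusted for x2

cor(x1, x2)                           # predictors are clearly correlated
coef(summary(marginal))["x1", ]       # slope near 0
coef(summary(partial))["x1", ]        # slope near 1
```

The marginal slope is near zero only because the effect of x1 is offset by the opposite effect of the correlated x2. Adjusting for x2 recovers it.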
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Wensui Liu
Sent: Saturday, February 18, 2006 2:03 PM
To: Liaw, Andy
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] Question about variable selection
Thank you so much for your reply, Andy.
But what if I am only interested in main effects rather than
interactions?
On 2/18/06, Liaw, Andy <andy_liaw at merck.com> wrote:
That depends on whether the IV could have some significant
interactions with other IVs not considered in the bivariate
analysis. For example:
# 5 x 5 grid: the two predictors are uncorrelated by design
iv <- expand.grid(-2:2, -2:2)
# y depends on the predictors only through their product
y <- 3 + iv[,1] * iv[,2] + rnorm(nrow(iv), sd=0.1)
summary(lm(y ~ iv[,1]))
Call:
lm(formula = y ~ iv[, 1])
Residuals:
     Min       1Q   Median       3Q      Max
-4.06259 -1.06048 -0.02377  1.05901  4.04315
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.01908    0.41482   7.278 2.09e-07 ***
iv[, 1]      0.01417    0.29332   0.048    0.962
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.074 on 23 degrees of freedom
Multiple R-Squared: 0.0001014,  Adjusted R-squared: -0.04337
F-statistic: 0.002333 on 1 and 23 DF,  p-value: 0.9619
summary(lm(y ~ iv[,1] * iv[,2]))
Call:
lm(formula = y ~ iv[, 1] * iv[, 2])
Residuals:
     Min       1Q   Median       3Q      Max
-0.22390 -0.08894 -0.01279  0.13525  0.17608
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      3.019083   0.026330 114.665   <2e-16 ***
iv[, 1]          0.014167   0.018618   0.761    0.455
iv[, 2]         -0.005486   0.018618  -0.295    0.771
iv[, 1]:iv[, 2]  0.992865   0.013165  75.418   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1316 on 21 degrees of freedom
Multiple R-Squared: 0.9963,  Adjusted R-squared: 0.9958
F-statistic: 1896 on 3 and 21 DF,  p-value: < 2.2e-16
Andy
From: Wensui Liu
Dear Lister,
I have a question about variable selection for regression.
If the IV is not significantly related to the DV in the bivariate
analysis, does it make sense to include this IV in the model
with multiple IVs?
Thank you so much!