Some clarifications of anova() and summary()
On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:
[Sorry for the repost. I forgot to switch off formatting last time.] I have two assignment problems. I have written this small piece of code for a regression with two regressors:
For replication purposes, it might be good to set a seed for the random number generator:

set.seed(127)
n <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
plot(x1, x2)  # x1 and x2 strongly correlated
cor(x1, x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n, 0, 2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)
You should also run anova() on these models:

intact21 <- lm(y ~ x2 + x1)
intact12 <- lm(y ~ x1 + x2)
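A minimal sketch of what that comparison shows, assuming the simulated data from the post with the seed of 127 (so the numbers will not match the output quoted further down). anova() on an lm fit uses sequential sums of squares: each row is the improvement from adding that term to the terms listed above it, so the credit given to x1 depends on whether it is entered first or last. summary() reports "added last" t-tests, which do not depend on the order. Variable names such as ss_x1_first are illustrative.

```r
# Simulated data as in the post; seed 127 is the suggested one.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

intact12 <- lm(y ~ x1 + x2)
intact21 <- lm(y ~ x2 + x1)

a12 <- anova(intact12)  # x1 first, then x2 given x1
a21 <- anova(intact21)  # x2 first, then x1 given x2
print(a12)
print(a21)

# Sum of squares credited to x1 when it is entered first vs. last.
ss_x1_first <- a12["x1", "Sum Sq"]
ss_x1_last  <- a21["x1", "Sum Sq"]
cat("SS for x1 entered first:", round(ss_x1_first, 2),
    " entered last:", round(ss_x1_last, 2), "\n")

# The residual SS (the fitted model itself) is the same either way.
rss12 <- a12["Residuals", "Sum Sq"]
rss21 <- a21["Residuals", "Sum Sq"]

# summary() coefficient tests are "added last" tests, so order is irrelevant.
coef12 <- coef(summary(intact12))[c("x1", "x2"), ]
coef21 <- coef(summary(intact21))[c("x1", "x2"), ]
print(coef12)
```

The last row of each sequential table is the only one that matches summary(): in a21, the F value for x1 equals the square of x1's t value.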
The questions are:

1. The function summary() is convenient since the result does not depend on the order in which the variables are listed in the linear model definition. It has a serious downside, though, which is obvious in this case. Are there any significant variables left?
2. An anova(intact.lm) table shows how much the second variable contributes to the result in addition to the first. Is any variable significant now? Is the second variable significant?
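For question 1, the "serious downside" is most likely collinearity: each t-test in summary() asks whether a variable helps once the other is already in the model, and with x1 and x2 nearly collinear the standard errors are inflated, so the individual p-values can be large (depending on the draw, one or both may lose significance) even while the overall F-test is overwhelming. The sketch below, assuming the same simulated data with seed 127, compares the overall p-value with the individual ones and computes a variance inflation factor by hand as VIF = 1/(1 - R^2) from regressing x1 on x2 (with only two predictors the VIF is the same for both). The variable names are illustrative.

```r
# Simulated data as in the post; seed 127 is the suggested one.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)
intact.lm <- lm(y ~ x1 + x2)

# Overall F-test: does the model as a whole explain y?
fs <- summary(intact.lm)$fstatistic   # named vector: value, numdf, dendf
p_overall <- pf(fs["value"], fs["numdf"], fs["dendf"], lower.tail = FALSE)

# Individual t-tests: each variable tested given the other.
p_each <- coef(summary(intact.lm))[c("x1", "x2"), "Pr(>|t|)"]

# Variance inflation factor: 1 / (1 - R^2) of one predictor on the other.
r2_12 <- summary(lm(x1 ~ x2))$r.squared
vif   <- 1 / (1 - r2_12)

cat("overall p-value:", signif(p_overall, 3), "\n")
print(signif(p_each, 3))
cat("cor(x1, x2):", round(cor(x1, x2), 3), " VIF:", round(vif, 1), "\n")
```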
Both anova() and summary() agree that the p-value for adding x2 to a model that already includes x1 is 0.0296. One of them uses the t statistic and the other uses the F statistic. I am not sure where your confusion lies.
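The agreement is an identity: for a single term added last, the F statistic is the square of the t statistic (from the quoted output, 2.244^2 is about 5.04, matching the F value of 5.0338 up to rounding), and both equal the F from an explicit comparison of the nested models y ~ x1 and y ~ x1 + x2. A minimal sketch, again assuming the simulated data with seed 127; the names full and reduced are illustrative.

```r
# Simulated data as in the post; seed 127 is the suggested one.
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

full    <- lm(y ~ x1 + x2)
reduced <- lm(y ~ x1)

# t-test for x2 from summary(): x2 given x1.
t_x2 <- coef(summary(full))["x2", "t value"]
p_t  <- coef(summary(full))["x2", "Pr(>|t|)"]

# Sequential F-test from anova(full): x2 is listed last, same hypothesis.
F_seq <- anova(full)["x2", "F value"]
p_seq <- anova(full)["x2", "Pr(>F)"]

# Explicit nested-model comparison gives the same F and p-value.
cmp   <- anova(reduced, full)
F_cmp <- cmp[2, "F"]
p_cmp <- cmp[2, "Pr(>F)"]

cat("t^2 =", round(t_x2^2, 4), " F =", round(F_seq, 4),
    " p =", signif(p_seq, 4), "\n")
```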
David Winsemius

> The results I got:
>
>> summary(intact.lm)
>
> Call:
> lm(formula = y ~ x1 + x2)
>
> Residuals:
>     Min      1Q  Median      3Q     Max
> -5.5824 -1.5314 -0.1568  1.4425  5.3374
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   3.4857     0.9354   3.726 0.000521 ***
> x1            0.2537     0.6117   0.415 0.680191
> x2            1.3517     0.6025   2.244 0.029608 *
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Residual standard error: 2.34 on 47 degrees of freedom
> Multiple R-squared: 0.7483,  Adjusted R-squared: 0.7376
> F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15
>
>> anova(intact.lm)
> Analysis of Variance Table
>
> Response: y
>           Df Sum Sq Mean Sq  F value   Pr(>F)
> x1         1 737.86  737.86 134.7129 2.11e-15 ***
> x2         1  27.57   27.57   5.0338  0.02961 *
> Residuals 47 257.43    5.48
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> My question is that I can't see any "serious downside" in using
> summary(). And on the second question I am totally clueless. I need
> your help.
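As an arithmetic check on the quoted output, the sequential sums of squares from anova() add up to the model sum of squares that summary() uses for its overall F-statistic and R-squared. The numbers below are copied from the quoted tables, so small discrepancies come only from rounding in the printed output; the variable names are illustrative.

```r
# Numbers copied from the quoted anova(intact.lm) output.
ss_x1  <- 737.86
ss_x2  <- 27.57
rss    <- 257.43
df_res <- 47

ms_res <- rss / df_res        # residual mean square, about 5.48
F_x1   <- ss_x1 / ms_res      # about 134.7 (x1 entered first)
F_x2   <- ss_x2 / ms_res      # about 5.03 (x2 given x1)

# The overall F in summary() pools both sequential rows (2 numerator df).
F_all <- ((ss_x1 + ss_x2) / 2) / ms_res      # about 69.87

# Multiple R-squared = model SS / total SS.
r2 <- (ss_x1 + ss_x2) / (ss_x1 + ss_x2 + rss)  # about 0.748

cat("F(x1):", round(F_x1, 2), " F(x2):", round(F_x2, 3),
    " overall F:", round(F_all, 2), " R^2:", round(r2, 4), "\n")
```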