Some clarifications of anova() and summary()

Running anova() on intact12 and intact21 gives two different results!
anova(intact12)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(intact21)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 698.26  698.26 213.8077 <2e-16 ***
x1         1   0.12    0.12   0.0379 0.8466
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
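
The two tables differ because anova() on a single lm fit reports sequential sums of squares: each row is tested after the terms listed before it, and the row order follows the formula. Here is a minimal sketch of that, reusing the simulation posted in this thread. The helper names (a12, a21, same_fit, same_rss, last_matches) are illustrative and not from the original posts.

```r
# Same simulated data as posted in the thread
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)          # x2 is strongly correlated with x1
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

intact12 <- lm(y ~ x1 + x2)
intact21 <- lm(y ~ x2 + x1)

a12 <- anova(intact12)   # rows: x1 alone, then x2 given x1
a21 <- anova(intact21)   # rows: x2 alone, then x1 given x2

# The fitted models are the same: equal coefficients and residual SS
same_fit <- isTRUE(all.equal(coef(intact12)[c("x1", "x2")],
                             coef(intact21)[c("x1", "x2")]))
same_rss <- isTRUE(all.equal(a12["Residuals", "Sum Sq"],
                             a21["Residuals", "Sum Sq"]))

# Only the last term of each table is tested "given the other one",
# so its F value equals the squared t value from summary()
tt <- summary(intact12)$coefficients
last_matches <-
  isTRUE(all.equal(a12["x2", "F value"], tt["x2", "t value"]^2)) &&
  isTRUE(all.equal(a21["x1", "F value"], tt["x1", "t value"]^2))

print(c(same_fit = same_fit, same_rss = same_rss, last_matches = last_matches))
```

So the models agree, and only the question each row answers changes with the order of the terms.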

On Sun, Dec 14, 2008 at 8:56 PM, Tanmoy Talukdar
Why do you think that running lm() twice on those two models is going
to help me? They are identical models and hence we get identical
results. The second question is all right now. I had misunderstood
it.

Please tell me if you can find any "downside" in summary(). I can't find any.

I've edited the code for the replication issue.

set.seed(127)
n <- 50
x1 <- runif(n,1,10)
x2 <- x1 + rnorm(n,0,0.5)
plot(x1,x2) # x1 and x2 strongly correlated
cor(x1,x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)

summary(intact.lm)
Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4578 -1.1326  0.4551  1.2807  4.8241

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.63603    0.61944   5.870 4.23e-07 ***
x1          -0.09555    0.49114  -0.195  0.84658
x2           1.59384    0.48542   3.283  0.00194 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.807 on 47 degrees of freedom
Multiple R-squared: 0.8198,     Adjusted R-squared: 0.8121
F-statistic: 106.9 on 2 and 47 DF,  p-value: < 2.2e-16

anova(intact.lm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

On Sun, Dec 14, 2008 at 8:26 PM, David Winsemius <dwinsemius at comcast.net> wrote:
On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:

[sorry for the repost. I forgot to switch off formatting last time]

I have two assignment problems...

I have written this small piece of code for a regression with two regressors.

For replication purposes, it might be good to set a seed for the random
number generation.

set.seed(127)
n <- 50
x1 <- runif(n,1,10)
x2 <- x1 + rnorm(n,0,0.5)
plot(x1,x2) # x1 and x2 strongly correlated
cor(x1,x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)

You should also run anova on these models:

intact21 <- lm(y~x2+x1)
intact12 <- lm(y~x1+x2)

The questions are:

1. The function summary() is convenient since the result does not
depend on the order in which the variables are listed in the linear
model definition. It has a serious downside, though, which is obvious
in this case. Are there any significant variables left?

2. An anova(intact.lm) table shows how much the second variable
contributes to the result in addition to the first. Is any variable
significant now? Is the second variable significant?

Both anova and summary were in agreement that the P-value for adding x2
to a model that already included x1 is 0.0296. One of them uses the t
statistic and the other uses the F statistic. I am not sure where your
confusion lies.
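
The agreement is expected: for a one-degree-of-freedom term tested after all the others, the F statistic is the square of the t statistic. In the output quoted below, 2.244^2 is about 5.036, which matches F = 5.0338 up to rounding of the printed t value. As a hedged sketch, drop1() with test = "F" should give these "each term given the others" F tests for both predictors at once, whatever the formula order. The object names fit, tt, d1, t_sq and f_marg are illustrative.

```r
# Same simulated data as posted in the thread
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

fit <- lm(y ~ x1 + x2)

tt <- summary(fit)$coefficients   # t tests: each term given the other
d1 <- drop1(fit, test = "F")      # F tests for dropping each term in turn

# Squared t statistics and the drop1() F statistics
t_sq   <- unname(tt[c("x1", "x2"), "t value"]^2)
f_marg <- d1[c("x1", "x2"), "F value"]

# The p-values from both routes
p_t <- unname(tt[c("x1", "x2"), "Pr(>|t|)"])
p_f <- d1[c("x1", "x2"), "Pr(>F)"]

print(cbind(t_sq, f_marg, p_t, p_f))
```

Unlike the sequential table from anova(), this table should not change if the model is refitted as y ~ x2 + x1.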

--
David Winsemius

The results I got:

summary(intact.lm)
Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5824 -1.5314 -0.1568  1.4425  5.3374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.4857     0.9354   3.726 0.000521 ***
x1            0.2537     0.6117   0.415 0.680191
x2            1.3517     0.6025   2.244 0.029608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.34 on 47 degrees of freedom
Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15

anova(intact.lm)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value   Pr(>F)
x1         1 737.86  737.86 134.7129 2.11e-15 ***
x2         1  27.57   27.57   5.0338  0.02961 *
Residuals 47 257.43    5.48
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

My question is that I can't see any "serious downside" in using
summary(). And on the second question I am totally clueless. I need
your help.
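
One possible reading of the "downside", offered as a sketch and not necessarily the answer the assignment intends: every t test in summary() is conditional on the other predictor being in the model. When x1 and x2 are strongly correlated, a predictor that is clearly related to y on its own can look unimportant once the other is included. The names p_alone, p_given and p_joint are illustrative.

```r
# Same simulated data as posted in the thread
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)          # near-collinear with x1
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

# p-value for x1 when it is the only predictor
p_alone <- summary(lm(y ~ x1))$coefficients["x1", "Pr(>|t|)"]

# p-value for x1 once x2 is already in the model
p_given <- summary(lm(y ~ x1 + x2))$coefficients["x1", "Pr(>|t|)"]

# Joint F test of both predictors against the intercept-only model
joint   <- anova(lm(y ~ 1), lm(y ~ x1 + x2))
p_joint <- joint[2, "Pr(>F)"]

print(c(p_alone = p_alone, p_given = p_given, p_joint = p_joint))
```

Reading only the coefficient table can therefore understate a predictor that is jointly, or on its own, strongly associated with the response.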

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
