
Some clarifications of anova() and summary()

11 messages · Duncan Murdoch, David Winsemius, Tanmoy Talukdar +4 more

#
[Sorry for the repost. I forgot to switch off formatting last time.]

I have two assignment problems...

I have written this small piece of code for a regression with two regressors.

n <- 50
x1 <- runif(n,1,10)
x2 <- x1 + rnorm(n,0,0.5)
plot(x1,x2) # x1 and x2 strongly correlated
cor(x1,x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)


The questions are:

1. The function summary() is convenient since the result does not
depend on the order in which the variables
are listed in the linear model definition. It has a serious downside,
though, which is obvious in this case.
Are there any significant variables left?

2. An anova(intact.lm) table shows how much the second variable
contributes to the result in
addition to the first. Is any variable significant now? Is the
second variable significant?
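As a rough illustration of what question 1 is pointing at, the sketch below regenerates the data (assuming `set.seed(127)`, the seed used later in the thread), checks that summary() reports the same coefficient rows for `y ~ x1 + x2` and `y ~ x2 + x1`, and computes a variance inflation factor for x1 by hand as `1 / (1 - R^2)` from regressing x1 on x2. The names `fit12`, `fit21`, `coef12`, `coef21` and `vif_x1` are illustrative only.

```r
# Regenerate the simulated data (seed assumed from later in the thread)
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)   # x2 is x1 plus a little noise
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

fit12 <- lm(y ~ x1 + x2)
fit21 <- lm(y ~ x2 + x1)

# Each t test in summary() is for one coefficient given all the others,
# so the rows for x1 and x2 do not depend on the formula order
coef12 <- coef(summary(fit12))[c("x1", "x2"), ]
coef21 <- coef(summary(fit21))[c("x1", "x2"), ]
print(all.equal(coef12, coef21))

# Variance inflation factor for x1: 1 / (1 - R^2) of x1 regressed on x2.
# A large value means the standard errors in summary() are inflated
# by the correlation between the two regressors
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)
print(vif_x1)
```

With only two regressors, the R^2 of `lm(x1 ~ x2)` is just `cor(x1, x2)^2`, so this VIF is the same for x1 and x2.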

The results I got:
Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max
-5.5824 -1.5314 -0.1568  1.4425  5.3374

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   3.4857     0.9354   3.726 0.000521 ***
x1            0.2537     0.6117   0.415 0.680191
x2            1.3517     0.6025   2.244 0.029608 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.34 on 47 degrees of freedom
Multiple R-squared: 0.7483,     Adjusted R-squared: 0.7376
F-statistic: 69.87 on 2 and 47 DF,  p-value: 8.315e-15
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value   Pr(>F)
x1         1 737.86  737.86 134.7129 2.11e-15 ***
x2         1  27.57   27.57   5.0338  0.02961 *
Residuals 47 257.43    5.48
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



My problem is that I can't see any "serious downside" in using
summary(). And on the second question I am totally clueless. I need
your help.
#
On 14/12/2008 9:40 AM, Tanmoy Talukdar wrote:
This isn't an R question, it's a statistics question, from a statistics 
course.  You should ask your instructor.

Duncan Murdoch
#
On Dec 14, 2008, at 9:40 AM, Tanmoy Talukdar wrote:

For replication purposes, it might be good to set a seed for the random
number generation.

set.seed(127)
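A minimal sketch of why a seed helps with replication: two sequences of draws started from the same `set.seed()` call are identical, so anyone rerunning the script gets the same simulated data. The seed 127 and the draw size 5 are arbitrary choices here.

```r
# Draw five uniforms, reset the seed, and draw again
set.seed(127)
first_draw <- runif(5, 1, 10)
set.seed(127)
second_draw <- runif(5, 1, 10)

# Same seed, same generator, same numbers
print(identical(first_draw, second_draw))   # TRUE
```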
You should also run anova on these models:

intact21 <- lm(y~x2+x1)
intact12 <- lm(y~x1+x2)
Both anova and summary were in agreement that the P-value for adding
x2 to a model that already included x1 is 0.0296. One of them uses the t
statistic and the other uses the F statistic. I am not sure where your confusion lies.
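The agreement described here can be checked directly. A sketch, assuming the same simulated data regenerated with `set.seed(127)`: for a single term entered last, the sequential F value from anova() is the square of the t value from summary(), and the two p-values coincide. The variable names are illustrative.

```r
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)
fit12 <- lm(y ~ x1 + x2)

# t statistic and two-sided p-value for x2 from summary()
t_x2 <- coef(summary(fit12))["x2", "t value"]
p_t  <- coef(summary(fit12))["x2", "Pr(>|t|)"]

# Sequential F test for x2 (added after x1) from anova()
tab  <- anova(fit12)
F_x2 <- tab["x2", "F value"]
p_F  <- tab["x2", "Pr(>F)"]

# For a single term entered last, F equals t^2 and the p-values match
print(all.equal(t_x2^2, F_x2))
print(all.equal(p_t, p_F))
```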
#
Why do you think that running lm() twice on those two models is going
to help me? They are identical models and hence we get identical
results. The second question is now all right. I had some
misunderstanding about it.

Please tell me if you can find any "downside" in summary(). I can't find any.


I've edited the code to address the replication issue.

set.seed(127)
n <- 50
x1 <- runif(n,1,10)
x2 <- x1 + rnorm(n,0,0.5)
plot(x1,x2) # x1 and x2 strongly correlated
cor(x1,x2)
y <- 3 + 0.5*x1 + 1.1*x2 + rnorm(n,0,2)
intact.lm <- lm(y ~ x1 + x2)
summary(intact.lm)
anova(intact.lm)
Call:
lm(formula = y ~ x1 + x2)

Residuals:
   Min      1Q  Median      3Q     Max
-3.4578 -1.1326  0.4551  1.2807  4.8241

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.63603    0.61944   5.870 4.23e-07 ***
x1          -0.09555    0.49114  -0.195  0.84658
x2           1.59384    0.48542   3.283  0.00194 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.807 on 47 degrees of freedom
Multiple R-squared: 0.8198,     Adjusted R-squared: 0.8121
F-statistic: 106.9 on 2 and 47 DF,  p-value: < 2.2e-16
Analysis of Variance Table

Response: y
         Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
On Sun, Dec 14, 2008 at 8:26 PM, David Winsemius <dwinsemius at comcast.net> wrote:
#
Running anova() on intact12 and intact21 gives two different results!
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x1         1 663.18  663.18 203.065 < 2.2e-16 ***
x2         1  35.21   35.21  10.781  0.001940 **
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq  F value Pr(>F)
x2         1 698.26  698.26 213.8077 <2e-16 ***
x1         1   0.12    0.12   0.0379 0.8466
Residuals 47 153.49    3.27
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


On Sun, Dec 14, 2008 at 8:56 PM, Tanmoy Talukdar
<tanmoy.talukdar at gmail.com> wrote:
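One way to see where the two tables come from is to compute each sequential sum of squares as the drop in residual sum of squares when a term is added to the model already fitted. A sketch, assuming the same simulated data; the helper `rss()` is a hypothetical name defined here, not an R function. The order decides which term is credited with the variation that x1 and x2 share, while the total for the two terms and the residual line stay the same.

```r
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

# Hypothetical helper: residual sum of squares of a fitted model
rss <- function(fit) sum(residuals(fit)^2)

fit0  <- lm(y ~ 1)
fit1  <- lm(y ~ x1)
fit2  <- lm(y ~ x2)
fit12 <- lm(y ~ x1 + x2)

# Order x1 then x2: SS(x1), then SS(x2 | x1)
ss_x1_first <- rss(fit0) - rss(fit1)
ss_x2_after <- rss(fit1) - rss(fit12)

# Order x2 then x1: SS(x2), then SS(x1 | x2)
ss_x2_first <- rss(fit0) - rss(fit2)
ss_x1_after <- rss(fit2) - rss(fit12)

# These should reproduce the "Sum Sq" columns of the two anova() tables
seq12 <- unname(anova(fit12)[["Sum Sq"]][1:2])
seq21 <- unname(anova(lm(y ~ x2 + x1))[["Sum Sq"]][1:2])
print(all.equal(seq12, c(ss_x1_first, ss_x2_after)))
print(all.equal(seq21, c(ss_x2_first, ss_x1_after)))

# Both decompositions add up to the same model sum of squares
print(all.equal(ss_x1_first + ss_x2_after, ss_x2_first + ss_x1_after))
```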
#
Can anyone please explain why this happens? I know this happens when x1
and x2 have different sizes, but here x1 and x2 have the same dimension.

On Sun, Dec 14, 2008 at 9:26 PM, Tanmoy Talukdar
<tanmoy.talukdar at gmail.com> wrote:
#
Given that this is a homework problem, I think the onus is on you to
figure this out, not on us to help you.  In almost all classes, you
are expected to work on homework by yourself, and not solicit help
from others - this is often known as "cheating".

Hadley
#
I think I now have some understanding of this.

y ~ x1 + x2 first adds x1 to the model and then adds x2,
but y ~ x2 + x1 adds x2 first, so the values we get are different.

Please correct me if I am wrong.
On Sun, Dec 14, 2008 at 9:48 PM, hadley wickham <h.wickham at gmail.com> wrote:
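The description above can be checked by fitting the nested models explicitly. A sketch under the same assumptions as before (data regenerated with `set.seed(127)`, illustrative variable names): passing the sequence `y ~ 1`, `y ~ x1`, `y ~ x1 + x2` to anova() compares each model with the previous one and, by default, uses the residual mean square of the largest model as the error term, which should reproduce the rows of `anova(lm(y ~ x1 + x2))`.

```r
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

fit0  <- lm(y ~ 1)
fit1  <- lm(y ~ x1)
fit12 <- lm(y ~ x1 + x2)

# Step-by-step comparison of the nested sequence: add x1, then add x2
steps   <- anova(fit0, fit1, fit12)
seq_tab <- anova(fit12)

# Rows 2 and 3 of the comparison match the x1 and x2 rows of the
# sequential table (the first row of 'steps' has no test)
ss_steps <- unname(steps[["Sum of Sq"]][2:3])
F_steps  <- unname(steps[["F"]][2:3])
print(all.equal(ss_steps, unname(seq_tab[["Sum Sq"]][1:2])))
print(all.equal(F_steps, unname(seq_tab[["F value"]][1:2])))
```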
#
Tanmoy Talukdar wrote:
You are not wrong.  However, you're wearing out your welcome
a bit by posting very frequent messages to the list. I'd strongly
recommend that you find some more help locally, or find a
copy of Peter Dalgaard's "Introductory Statistics with R",
and try to work through some of these problems on your own
a bit more.  If you can demonstrate that you've really gone
away and read and thought about these things, and articulate
what still doesn't make sense to you about the way R is doing
things, and that we are not simply answering homework questions,
you will probably get useful answers ...

  good luck,
    Ben Bolker
#
Ben,
You were quite correct to indicate that Tanmoy should not use the listserver to get answers to his class assignments. Nevertheless, I do have some sympathy for him. The help pages for the R functions summary, anova and drop1 do not discuss the critically important issue addressed by Tanmoy's class assignment. I believe this is a serious limitation. If users do not understand the differences between the output of these three basic functions, they can easily be led astray. I am not sure who has access to the help pages, but I hope they see this email and consider modifying them to address the important issue highlighted by Tanmoy's class assignment.
John

John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing)
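For contrast with anova(), the drop1() function mentioned above removes each term from the full model in turn, so every term is tested as if it were entered last. A sketch, assuming the same simulated data regenerated with `set.seed(127)` (variable names illustrative): these F tests should not depend on the formula order, and they should equal the squared t values reported by summary().

```r
set.seed(127)
n  <- 50
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, 0, 0.5)
y  <- 3 + 0.5 * x1 + 1.1 * x2 + rnorm(n, 0, 2)

fit12 <- lm(y ~ x1 + x2)
fit21 <- lm(y ~ x2 + x1)

# Marginal F tests: each term dropped from the full model in turn
d12 <- drop1(fit12, test = "F")
d21 <- drop1(fit21, test = "F")

# Same answer whichever order the formula lists the terms
print(all.equal(d12["x1", "F value"], d21["x1", "F value"]))

# ... and the same as the t tests from summary(): F = t^2
t_vals <- coef(summary(fit12))[c("x1", "x2"), "t value"]
F_drop <- unname(d12[c("x1", "x2"), "F value"])
print(all.equal(unname(t_vals^2), F_drop))
```

In other words, summary() and drop1() give marginal tests, while anova() on a single model gives sequential tests whose meaning depends on the order of the terms.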

#
Although not as good as putting it in the help pages, there is
an R wiki that anyone can add to:

http://wiki.r-project.org

On Sun, Dec 14, 2008 at 5:04 PM, John Sorkin
<jsorkin at grecc.umaryland.edu> wrote: