Hi all,
I am trying to fit a linear model (using lm) with 3 predictors (2
continuous and 1 factor) for a response variable. I have no
interaction terms nor missing values (though I do not have the same
sample size in all factor cells), so I thought the results would be
pretty straightforward, with no differences due to the "type" of Sum-
of-Sqares employed. However, I am getting somewhat conflicting results
(especially for the variable "IMC", see below) when I use anova, Anova
(from car package), and drop1. Actually, drop1(), Anova(type="II") and
Anova(type="III") all give the same results, only anova() yields a
different one.
Due to "consistency" of results I am imagining i should ignore R
standard anova's results, but I'd like to understand why and the
implications this would have to any other test I conduct.
Any comments would be welcome...
Best,
Rafael Maia
> m1<-lm(PC1~IMC+ResPlum+factor(Year),data=dados)
> summary(m1)
Call:
lm(formula = PC1 ~ IMC + ResPlum + factor(Year), data = dados)
Residuals:
Min 1Q Median 3Q Max
-4.0212 -0.8983 0.1623 1.2623 2.7767
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.1260 3.9075 -1.312 0.196080
IMC 8.8518 6.1188 1.447 0.154777
ResPlum 3.0034 0.7625 3.939 0.000276 ***
factor(Year)2 -1.2602 0.5186 -2.430 0.019069 *
factor(Year)3 0.6307 0.9014 0.700 0.487675
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Residual standard error: 1.659 on 46 degrees of freedom
Multiple R-squared: 0.4001, Adjusted R-squared: 0.3479
F-statistic: 7.669 on 4 and 46 DF, p-value: 8.033e-05
> anova(m1)
Analysis of Variance Table
Response: PC1
Df Sum Sq Mean Sq F value Pr(>F)
IMC 1 16.844 16.844 6.1204 0.0171142 *
ResPlum 1 44.860 44.860 16.3003 0.0002027 ***
factor(Year) 2 22.720 11.360 4.1278 0.0224493 *
Residuals 46 126.596 2.752
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> drop1(m1, test="F")
Single term deletions
Model:
PC1 ~ IMC + ResPlum + factor(Year)
Df Sum of Sq RSS AIC F value Pr(F)
<none> 126.596 56.368
IMC 1 5.759 132.355 56.637 2.0928 0.1547766
ResPlum 1 42.700 169.295 69.191 15.5155 0.0002757 ***
factor(Year) 2 22.720 149.316 60.786 4.1278 0.0224493 *
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> Anova(m1)
Anova Table (Type II tests)
Response: PC1
Sum Sq Df F value Pr(>F)
IMC 5.759 1 2.0928 0.1547766
ResPlum 42.700 1 15.5155 0.0002757 ***
factor(Year) 22.720 2 4.1278 0.0224493 *
Residuals 126.596 46
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> Anova(m1, type="III")
Anova Table (Type III tests)
Response: PC1
Sum Sq Df F value Pr(>F)
(Intercept) 4.736 1 1.7210 0.1960800
IMC 5.759 1 2.0928 0.1547766
ResPlum 42.700 1 15.5155 0.0002757 ***
factor(Year) 22.720 2 4.1278 0.0224493 *
Residuals 126.596 46
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Differences between (A)anova 's
2 messages · Rafael Maia, Kingsford Jones
Hi Rafael, some comments below... On Sat, Sep 20, 2008 at 5:35 PM, Rafael Maia
<queirozrafaelmv at yahoo.com.br> wrote:
Hi all, I am trying to fit a linear model (using lm) with 3 predictors (2 continuous and 1 factor) for a response variable. I have no interaction terms nor missing values (though I do not have the same sample size in all factor cells), so I thought the results would be pretty straightforward, with no differences due to the "type" of Sum-of-Sqares employed. However, I am
The issue is whether your predictors are orthogonal. If they are, they are not "competing" for the same information and the order that they enter the model will not affect the testing.
getting somewhat conflicting results (especially for the variable "IMC", see below) when I use anova, Anova (from car package), and drop1. Actually, drop1(), Anova(type="II") and Anova(type="III") all give the same results, only anova() yields a different one.
See ?anova.lm, help(Anova, package=car), and ?drop1.lm. anova.lm is testing the terms sequentially. Anova tests the terms given that all other terms are in the model (Type = 'III'), or given that all terms that at not at a higher level in the hierarchy of terms (e..g. x1^2 and x1:x2 are at a higher level than x1) are in the model (Type = 'II'). drop1.lm tests droping terms 1 at a time, while respecting the 'hierarchy' (this is often referred to as 'marginality constraints').
Due to "consistency" of results I am imagining i should ignore R standard anova's results, but I'd like to understand why and the implications this would have to any other test I conduct.
I think it is generally better not to automate the process of choosing hypotheses of interest, but rather to form tests/comparisons explicitly. One good method to do this is to compare nested models, where the difference in the two models reflects the hypothesis of interest. e.g., to test the ResPlum term given that all other terms are in the model m2 <- update(m1, . ~ . -ResP) anova(m1, m2) Note there are many methods available for model testing/comparison (e.g., F or LR tests, or Cp, AIC, BIC, ...) Similarly, you can get away from the default method of testing factor levels in summary.lm by explicitly choosing your contrasts to test hypotheses of interest. The entire process can be generalized via the General Linear Hypothesis Test -- see e.g., help(glh.test, package = gmodels) HTH, Kingsford Jones