Hi,
If I used drop1 in a weighted lm fit, it seems to ignore the weights
in the AIC calculation of the dropped terms, see the example below.
Can this be right?
Yan
--------------------
library(car)
> unweighted.model <- lm(trSex ~ (river+length +depth)^2-
length:depth, dno2)
> Anova(unweighted.model)
Anova Table (Type II tests)
Response: trSex
Sum Sq Df F value Pr(>F)
river 0.1544 1 3.3151 0.07100 .
length 0.1087 1 2.3334 0.12911
depth 0.1917 1 4.1145 0.04461 *
river:length 0.0049 1 0.1049 0.74652
river:depth 0.3008 1 6.4567 0.01226 *
Residuals 5.9168 127
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> drop1(unweighted.model)
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Sum of Sq RSS AIC
<none> 5.92 -401.97
river:length 1 0.004889 5.92 -403.86
river:depth 1 0.30 6.22 -397.37
### Both drop1 & Anova suggest dropping river:length
### Compare with the following:
> weighted.model <- lm(trSex ~ (river+length +depth)^2-length:depth,
dno2, weights=males+females)
> Anova(weighted.model)
Anova Table (Type II tests)
Response: trSex
Sum Sq Df F value Pr(>F)
river 1.471 1 3.3775 0.06843 .
length 1.002 1 2.2999 0.13187
depth 2.974 1 6.8295 0.01005 *
river:length 0.075 1 0.1733 0.67790
river:depth 2.020 1 4.6397 0.03313 *
Residuals 55.303 127
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> drop1(weighted.model)
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Sum of Sq RSS AIC
<none> 55.30 -104.71
river:length 1 -49.33 5.97 -402.79
river:depth 1 -49.04 6.26 -396.39
Drop1 and weights
5 messages · Yan Wong, Brian Ripley
It looks like at some point weighted.residuals() was changed, and broke this. The models are fitted correctly, but AIC is wrong. However, note that it does work correctly for a glm() fit which gives you a workaround. You example is not reproducible, so I was unable to test the workaround nor the correction which will short;y be in R-patched.
On Wed, 1 Mar 2006, Yan Wong wrote:
Hi, If I used drop1 in a weighted lm fit, it seems to ignore the weights in the AIC calculation of the dropped terms, see the example below. Can this be right? Yan -------------------- library(car)
unweighted.model <- lm(trSex ~ (river+length +depth)^2-
length:depth, dno2)
Anova(unweighted.model)
Anova Table (Type II tests)
Response: trSex
Sum Sq Df F value Pr(>F)
river 0.1544 1 3.3151 0.07100 .
length 0.1087 1 2.3334 0.12911
depth 0.1917 1 4.1145 0.04461 *
river:length 0.0049 1 0.1049 0.74652
river:depth 0.3008 1 6.4567 0.01226 *
Residuals 5.9168 127
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
drop1(unweighted.model)
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Sum of Sq RSS AIC
<none> 5.92 -401.97
river:length 1 0.004889 5.92 -403.86
river:depth 1 0.30 6.22 -397.37
### Both drop1 & Anova suggest dropping river:length
### Compare with the following:
weighted.model <- lm(trSex ~ (river+length +depth)^2-length:depth,
dno2, weights=males+females)
Anova(weighted.model)
Anova Table (Type II tests)
Response: trSex
Sum Sq Df F value Pr(>F)
river 1.471 1 3.3775 0.06843 .
length 1.002 1 2.2999 0.13187
depth 2.974 1 6.8295 0.01005 *
river:length 0.075 1 0.1733 0.67790
river:depth 2.020 1 4.6397 0.03313 *
Residuals 55.303 127
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
drop1(weighted.model)
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Sum of Sq RSS AIC
<none> 55.30 -104.71
river:length 1 -49.33 5.97 -402.79
river:depth 1 -49.04 6.26 -396.39
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On 2 Mar 2006, at 13:25, Prof Brian Ripley wrote:
It looks like at some point weighted.residuals() was changed, and broke this. The models are fitted correctly, but AIC is wrong. However, note that it does work correctly for a glm() fit which gives you a workaround.
Thank you.
You example is not reproducible, so I was unable to test the workaround nor the correction which will short;y be in R-patched.
Indeed, I realised this, but assumed that the problem could be understood even without my dataset. I'll test my data on the patched version when it becomes available, and re-post then. Thanks again Yan
On Thu, 2 Mar 2006, Yan Wong wrote:
On 2 Mar 2006, at 13:25, Prof Brian Ripley wrote:
It looks like at some point weighted.residuals() was changed, and broke this. The models are fitted correctly, but AIC is wrong. However, note that it does work correctly for a glm() fit which gives you a workaround.
Thank you.
You example is not reproducible, so I was unable to test the workaround nor the correction which will shortly be in R-patched.
Indeed, I realised this, but assumed that the problem could be understood even without my dataset. I'll test my data on the patched version when it becomes available, and re-post then.
Now I have tried a similar example, I think that the workaround is correct but potentially confusing due to another error elsewhere. Using drop1 on a gaussian glm fit with weights gives the correct deviance and the correct AICs up to an additive constant. However, AIC() on such a fit gives the wrong AIC, since gaussian()$aic is incorrect (or at least it treats the weights as case weights, which is not the interpretation everything else uses). (I've changed that in R-devel.) So in 2.2.1 patched
library(MASS) hills.lm <- lm(time ~ dist + climb, data=hills, weights=1/dist^2) drop1(hills.lm)
Single term deletions
Model:
time ~ dist + climb
Df Sum of Sq RSS AIC
<none> 437.64 94.41
dist 1 164.05 601.68 103.55
climb 1 8.66 446.29 93.10
hills.glm <- glm(time ~ dist + climb, data=hills, weights=1/dist^2) drop1(hills.glm)
Single term deletions
Model:
time ~ dist + climb
Df Deviance AIC
<none> 437.64 21.30
dist 1 601.68 30.44
climb 1 446.29 19.98
and in R-devel
hills.glm <- glm(time ~ dist + climb, data=hills, weights=1/dist^2) drop1(hills.glm)
Single term deletions
Model:
time ~ dist + climb
Df Deviance AIC
<none> 437.64 323.87
dist 1 601.68 333.01
climb 1 446.29 322.55
(These differ only in which additive constants are included, with that in
drop1.glm in 2.2.1 patched being based on an incorrect calculation,
which for this purpose does not matter.)
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
5 days later
On 2 Mar 2006, at 17:56, Prof Brian Ripley wrote:
On Thu, 2 Mar 2006, Yan Wong wrote:
Indeed, I realised this, but assumed that the problem could be understood even without my dataset. I'll test my data on the patched version when it becomes available, and re-post then.
I've just tried R-patched, and it now works as expected. Thanks for
the quick fix. As you point out, the glm and lm versions differ in
AIC reported, but only by an additive constant, and so the tests are
not affected.
Yan
--------------------
> weighted.lm <- lm(trSex ~ (river+length +depth)^2-length:depth,
dno2, weights=males+females)
> d1.lm <- drop1(weighted.lm, test="F");
> weighted.glm <- glm(trSex ~ (river+length +depth)^2-length:depth,
dno2, weights=males+females, family="gaussian")
> d1.glm <- drop1(weighted.glm, test="F");
>
> #differ by a constant
> d1.lm$AIC - d1.glm$AIC
[1] 628.1381 628.1381 628.1381
>
> Anova(weighted.lm)
Anova Table (Type II tests)
Response: trSex
Sum Sq Df F value Pr(>F)
river 1.471 1 3.3775 0.06843 .
length 1.002 1 2.2999 0.13187
depth 2.974 1 6.8295 0.01005 *
river:length 0.075 1 0.1733 0.67790
river:depth 2.020 1 4.6397 0.03313 *
Residuals 55.303 127
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> d1.lm
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Sum of Sq RSS AIC F value Pr(F)
<none> 55.303 -104.710
river:length 1 0.075 55.379 -106.529 0.1733 0.67790
river:depth 1 2.020 57.324 -101.938 4.6397 0.03313 *
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
> d1.glm
Single term deletions
Model:
trSex ~ (river + length + depth)^2 - length:depth
Df Deviance AIC F value Pr(F)
<none> 55.30 -732.85
river:length 1 55.38 -734.67 0.1733 0.67790
river:depth 1 57.32 -730.08 4.6397 0.03313 *
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1