[We do have a TooMuchAtOnce category, but it is easier to cope with short separate reports with informative subject lines for, e.g. the BUGS list.]
On Mon, 18 Dec 2000 james.lindsey@luc.ac.be wrote:
R1.2.0 with Linux RH5.2 I do not believe that the problems below are new to 1.2 but I only cover this sort of thing once a year in my course and some of that happened to be last Friday so too late to report for 1.2. I see that one or two things that I was going to report have been corrected. I like the fact that interactions now show : instead of . Here is some output with comments inserted.
[...]
wt <- codes(Reg66)!=codes(Reg71)
Why not just wt <- Reg66 != Reg71 ? [...]
First problem: with diagonal values weighted out in a transition matrix (mover-stayer model), glm wrongly estimates the diagonals to be the observed plus 0.1. This was correct in 90.1 a year ago but already wrong in 1.0.1. I don't have any versions in between installed.
I think this is the same as
[Rd] glm gives incorrect results for zero-weight cases (PR#780)
in which case it is already fixed in R-patched. That does look to have
solved this one too. Note that glm has *not* `estimated the diagonals'
at all. These are predictions (the cases are not in the model), and
predict.glm did get them right.
predict(z, data)
1 2 3 4 5 6 7 8
0.61337 3.30334 3.45098 4.51559 3.33786 6.02783 6.17548 7.24009
9 10 11 12 13 14 15 16
3.50109 6.19106 6.33871 7.40332 4.57309 7.26306 7.41071 8.47532
[...]
Second problem: the man page for factors states that the levels are sorted, but gl does not do this. It keeps them in the order of the
There is a help page for factor(), not for factors, and I don't believe that does state so. It says that *factor* sorts them by default, which is true. Where is the `man page for factors'?
labels. This is inconsistent with everything else, confusing for students, and liable to induce errors in data analysis. In the example below, it is not too important because the baseline category is the same in the two cases but that will not generally be the case. As an addition comment, I would strongly recommend that, if labels can be coerced to numeric, they be ordered, not sorted. This is because the order is very confusing if there are more than 9: for example, the sorted 10 comes near the beginning and not at the end.
You can do that, of course, by specifying the order of the levels. However, we teach students that the details of the coding of linear models are details, and that they should regard coefficients as secondary and predictions as primary. aov() has the right idea in suppressing the individual coefficients. If you want to interpret coefficients you need to understand codings. Period. I don't see the value of changing the default for factor: it would not be backwards-compatible, and some people have thought thought what the default would be and accepted it.
reg66 <- gl(4,1,16,labels=c("CC","LY","WM","GL"))
z
Call: glm(formula = Freq ~ Reg66 + Reg71, family = poisson)
Coefficients:
(Intercept) Reg66GL Reg66LY Reg66WM Reg71GL Reg71LY
0.6134 3.9022 2.6900 2.8376 3.9597 2.7245
Reg71WM
2.8877
Degrees of Freedom: 15 Total (i.e. Null); 9 Residual
Null Deviance: 40740
Residual Deviance: 19880 AIC: 20000
print(za <- glm(Freq~reg66+Reg71,family=poisson))
Call: glm(formula = Freq ~ reg66 + Reg71, family = poisson)
Coefficients:
(Intercept) reg66LY reg66WM reg66GL Reg71GL Reg71LY
0.6134 2.6900 2.8376 3.9022 3.9597 2.7245
Reg71WM
2.8877
Degrees of Freedom: 15 Total (i.e. Null); 9 Residual
Null Deviance: 40740
Residual Deviance: 19880 AIC: 20000
Third problem: if mean value contrasts are used, the level names are not attached to the variable name, but simply the number of the category. This rightly drives students mad and is simply uninterpretable if the levels have been sorted. It takes a great deal of extreme care to try to find out to what the numbers refer.
Are we supposed to guess that `mean value contrasts' means contr.sum? In which case the numbers refer to the number of the *contrast*: they do not refer to levels, nor are single levels relevant. As in
contr.sum(levels(Reg66))
[,1] [,2] [,3] CC 1 0 0 GL 0 1 0 LY 0 0 1 WM -1 -1 -1
contr.treatment(levels(Reg66))
GL LY WM CC 0 0 0 GL 1 0 0 LY 0 1 0 WM 0 0 1 Now, what should the column labels be in the first case? And in what sense are these `mean value contrasts'? Do you really want labels like "CCvWM"? They could get very cumbersome. (One could argue that contr.treatment is already wrong, but as they are not even contrasts ....) [...]
Additional comment: I recommend that the aic functions for poisson and binomial have the explicit calculation of the log density replaced by the corresponding d function with the log option. This will avoid the production of NAs in certain cases of zeroes in tables.
Good idea, but could you supply examples so we can put them in the regression tests?
Brian D. Ripley, ripley@stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272860 (secr) Oxford OX1 3TG, UK Fax: +44 1865 272595 -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._