R-beta: R0.62.3 problems - R-help

John Maindonald · 1998-09-01T22:42:51Z

From: Prof Brian Ripley
> From: Jim Lindsey
>
> > 2. Binary factor variables no longer have the category label stuck on
> > the end of the variable name in output from glm(). This is very
> > misleading for at least two reasons: (i) there is no way to tell if a
> > variable is factor or not just by looking at the output, (ii) in
> > various contexts, the level printed out may be the first or the
> > second, and it is essential to know wh

John Maindonald

Tue, Sep 1, 1998 3:42 PM

From: Prof Brian Ripley <ripley at stats.ox.ac.uk>

I consider that the S behaviour is misleading and confusing, and should
not be copied.  I'd been pleased to find that R did use the same form of
labelling for 3+-level factors as for binary factors.

For 3+ level factors in S, one can tell from the output whether the
parameterisation is "helmert" or "contrast".  For binary factors the
labelling is, in my view confusingly, identical.  I am sensitive to
this because I sorted this point out for someone last week.  (In this
special [binary] case the coefficients and SEs for "helmert" are
smaller by a factor of 2.)
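
As a small illustration of the factor-of-2 remark, the sketch below fits the
same two-group model under both codings in a current version of R.  The data
are invented, the variable names are illustrative only, and it assumes that
"contrast" above refers to treatment coding (contr.treatment).

```r
# Two-level factor with a made-up response (illustrative data only).
g <- factor(rep(c("a", "b"), each = 4))
y <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Fit the same model under the two codings.
# Assumption: "contrast" in the text means contr.treatment.
fit_trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))
fit_hel <- lm(y ~ g, contrasts = list(g = "contr.helmert"))

# Coefficient for the factor and its standard error under each coding.
est_trt <- coef(summary(fit_trt))[2, c("Estimate", "Std. Error")]
est_hel <- coef(summary(fit_hel))[2, c("Estimate", "Std. Error")]

# Ratio of treatment to Helmert values, for estimate and standard error.
print(est_trt / est_hel)
```

The fitted values are the same in both fits; only the scale on which the
group difference is reported changes.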

Actually I consider that the output ought to identify what
parameterisation has been used.  I consider, also, that the S decision
to make "helmert" the default is unsatisfactory.  While helmert makes
sense for computation, it is almost never sensible for output.  

Perhaps the issue is that the handling of the computation, which ought
to be hidden from the user, should be separated from the
parameterisation of the output.  In fact, in both S and (I expect) in
R, they are linked.



John Maindonald               email : john.maindonald at anu.edu.au        
Statistical Consulting Unit,  phone : (6249)3998        
c/o CMA, SMS,                 fax   : (6249)5549  
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Brian Ripley

Tue, Sep 1, 1998 11:58 PM

[Should this divert to R-devel: seems to me to be more appropriate there?
Even more appropriate if it had been discussed during development, not two
releases later!]

On Wed, 2 Sep 1998, John Maindonald wrote:

I suspect you meant `for binary factors as for 3+level factors in S!' This
was an undocumented difference between R and S, and as I believe deliberate
differences should be documented, I suspect this was not one of them. 
(Certainly when I raised it, no one suggested that it was deliberate.)

How does one do this? If you mean from the printed output from a print or
summary method, I know how to do this only if those were the only two
possibilities, which they are not. 

Let me point out that in 0.62.3 (but not 0.62.1) you can find out what the
coding used is by looking at the contrasts component of the object, so you
_can_ find the parametrization (as my dictionary spells it) from the output
of lm or glm, whatever the coding (and there is an essentially infinite set
of possibilities).
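
A brief sketch of what looking at the contrasts component can amount to,
assuming a current version of R (the exact form of the component in 0.62.3
may differ).  The data and object names are invented for illustration.

```r
# Three-level factor with a made-up response (illustrative data only).
g <- factor(rep(c("a", "b", "c"), each = 3))
y <- c(1, 2, 3, 3, 4, 5, 7, 8, 9)

fit <- lm(y ~ g, contrasts = list(g = "contr.helmert"))

# The fitted object records the coding used for each factor.
coding <- fit$contrasts
print(coding)

# When the coding was given by name, the coding matrix can be
# rebuilt by calling the named function on the number of levels.
coding_matrix <- match.fun(coding$g)(nlevels(g))
print(coding_matrix)
```

If a contrast matrix rather than a function name had been attached to the
factor, the recorded component would be expected to hold that matrix instead,
so the last step is only a sketch for the named case.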

The coding _is_ now contained in the R output. I think you are thinking in
SAS-like terms and mean that you want that output printed by print or
print.summary methods (which?). And/or print.coef methods or precisely
what? Given that each factor can have a different coding, this could lead
to a very much larger output. (You did appreciate that arbitrary contrasts
could be attached to each factor?)  Would it not be better (and in the
spirit of S) to have a separate function to print out the codings used, or
to print out a coding-free view of the fit? (Hint: there is such a function
in S.) 

You can very easily set the default coding for your own use, so what is
your concern over the global default?  In balanced designs I would say that
a (block-)orthogonal coding is much less likely to mislead, but I at least
do not wish to impose my views on the rest of the users.
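
Setting the default coding for one's own use might look like the following in
a current version of R; this is a sketch with invented data, and the choice of
Helmert coding here is only for illustration.

```r
# Made-up two-group data (illustrative only).
g <- factor(rep(c("a", "b"), each = 4))
y <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Change the session default: first entry is for unordered factors,
# second for ordered factors. options() returns the previous setting.
old <- options(contrasts = c("contr.helmert", "contr.poly"))
b_hel <- coef(lm(y ~ g))   # fitted under the new default coding
print(b_hel)

# Restore whatever default was in force before.
options(old)
restored <- getOption("contrasts")
print(restored)
```

Placing the options() call in a startup file would make the choice persistent
for one user without changing the global default for anyone else.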

I agree that they should be separate, and that is the point of a lot of my
recent work on filling holes in R. There is an important point here. lm can
be used in more than one way; print.lm is designed for regression and
print.aov for analysis of variance, with model.tables and dummy.coef to
examine the output in a coding-independent way.  So I suspect the `output'
you are complaining about may be from inappropriate tools, and there is
certainly room for you to contribute new tools expressing printed output
you find illuminating.
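
As a sketch of the coding-independent view, the following uses dummy.coef in a
current version of R on invented data.  The helper cell_means is hypothetical
and written only for this one-factor example.

```r
# Three-level factor with a made-up response (illustrative data only).
g <- factor(rep(c("a", "b", "c"), each = 3))
y <- c(1, 2, 3, 3, 4, 5, 7, 8, 9)

fit_trt <- lm(y ~ g, contrasts = list(g = "contr.treatment"))
fit_hel <- lm(y ~ g, contrasts = list(g = "contr.helmert"))

# The raw coefficients differ between the two codings.
print(coef(fit_trt))
print(coef(fit_hel))

# Hypothetical helper: expand the coefficients to one value per level
# with dummy.coef, then add the intercept to get the fitted level means.
cell_means <- function(fit) {
  dc <- dummy.coef(fit)
  unname(dc[["(Intercept)"]]) + dc[["g"]]
}

means_trt <- cell_means(fit_trt)
means_hel <- cell_means(fit_hel)
print(means_trt)
print(means_hel)
```

The per-level effects returned by dummy.coef still depend on where the
intercept sits, but the fitted level means built from them agree across the
two codings.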

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
