reporting results from binomial glm with categorical variable

Hi Matt,

This is only obliquely an R question.  Here is an answer nonetheless.

If you have G levels of the categorical factor then there are exactly G 
means to estimate (irrespective of the outcome type).  This means that 
you cannot estimate an overall grand mean *and* the individual level 
means, as there would then be G+1 parameters for G means and the 
estimates would be non-unique...  I suspect that you already knew this 
though.

The way around this is to impose some sort of constraint on the overall 
mean and the level means.  Commonly this is done by fixing one of the 
level `deviations' at zero -- this is called a corner-point constraint 
(R's default treatment contrasts).  Another type is sum-to-zero, where 
the overall term is the unweighted mean of the G level means (on the 
scale of the linear predictor) and the G deviations are constrained to 
sum to zero.  This is the constraint that you mentioned.  There are 
others, of course, but they are less common.  One that I find very 
useful is to drop the overall mean altogether and just estimate the G 
factor level means.  Generally, though, the choice of constraint is not 
all that important, although corner-point constraints can sometimes be 
easier to interpret.
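
To make the three constraints concrete, here is a minimal sketch of how 
each one codes a factor.  The three-level factor myFac is hypothetical 
and exists only to display the design matrices; nothing here is fitted.

```r
# Hypothetical factor with G = 3 levels, one observation per level
myFac <- factor(c("a", "b", "c"))

contr.treatment(3)  # corner-point: the first level's deviation is fixed at zero
contr.sum(3)        # sum-to-zero: each column of the coding sums to zero

# Design matrices under each constraint (G columns in every case)
X_corner <- model.matrix(~ myFac)
X_sum    <- model.matrix(~ myFac, contrasts.arg = list(myFac = "contr.sum"))
X_means  <- model.matrix(~ 0 + myFac)  # no overall mean: one column per level

X_corner
X_sum
X_means
```

All three matrices span the same column space, which is why the fitted 
level means do not depend on the constraint.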

If you do want to use sum-to-zero constraints then all you need to do is 
alter the `contrasts' attribute of your categorical variable.  This can 
be done in R using the C() function (note capitalisation).  Your glm() 
call would use a formula like 
cbind(nsuccess, nfailure) ~ 1 + C(myFac, contr.sum).
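
A sketch of such a fit, on simulated data, might look like the 
following.  The data frame dat, the success probabilities and the 20 
trials per row are all made up for illustration; only the formula 
follows the advice above.

```r
set.seed(1)

# Hypothetical data: G = 3 levels, 4 rows per level, 20 trials per row
dat <- data.frame(myFac = factor(rep(c("a", "b", "c"), each = 4)))
p   <- c(a = 0.2, b = 0.5, c = 0.7)[as.character(dat$myFac)]
dat$nsuccess <- rbinom(nrow(dat), size = 20, prob = p)
dat$nfailure <- 20 - dat$nsuccess

# Sum-to-zero constraint imposed through C()
fit_sum <- glm(cbind(nsuccess, nfailure) ~ 1 + C(myFac, contr.sum),
               family = binomial, data = dat)
coef(fit_sum)  # intercept plus G - 1 deviations, all on the logit scale

# The deviation for the last level is not reported directly;
# under this constraint it is minus the sum of the others.
dev_last <- -sum(coef(fit_sum)[-1])

# Same model without an overall mean: one logit per level
fit_means <- glm(cbind(nsuccess, nfailure) ~ 0 + myFac,
                 family = binomial, data = dat)
```

With this coding the intercept is the unweighted mean of the G level 
logits, so coef(fit_sum)[1] should agree with mean(coef(fit_means)) up 
to convergence tolerance.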

How to report the results?  Good question...  For me, it depends 
strongly on what information I want to convey.  Typically, for this kind 
of analysis, that would be the means of the factor levels (unless there 
is more to this than we are seeing).  This is most easily done using R's 
inbuilt prediction functions (see ?predict.glm for example).  A call to 
this function would have a newdata argument given as a G row data frame 
with one row for each level of the factor.  Note that it will not matter 
which contrasts you used in the fit -- the predicted level means are the 
same under every constraint (they are all equally valid 
parameterisations of the same model).
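
As a rough sketch of that reporting step (again on made-up data, with 
hypothetical names dat, newdat and est), the level means and 
approximate intervals could be obtained as below.  Building the 
intervals on the logit scale and back-transforming is my assumption 
here, not the only reasonable choice.

```r
set.seed(1)

# Hypothetical data: G = 3 levels, 4 rows per level, 20 trials per row
dat <- data.frame(myFac = factor(rep(c("a", "b", "c"), each = 4)))
p   <- c(a = 0.2, b = 0.5, c = 0.7)[as.character(dat$myFac)]
dat$nsuccess <- rbinom(nrow(dat), size = 20, prob = p)
dat$nfailure <- 20 - dat$nsuccess

# Same model under two different constraints
fit_corner <- glm(cbind(nsuccess, nfailure) ~ myFac,
                  family = binomial, data = dat)
fit_sum    <- glm(cbind(nsuccess, nfailure) ~ C(myFac, contr.sum),
                  family = binomial, data = dat)

# newdata: a G row data frame, one row per factor level
newdat <- data.frame(myFac = factor(levels(dat$myFac),
                                    levels = levels(dat$myFac)))

# Predict on the link scale, then back-transform estimate and Wald limits
pr <- predict(fit_corner, newdata = newdat, type = "link", se.fit = TRUE)
z  <- qnorm(0.975)
est <- data.frame(level = newdat$myFac,
                  prob  = plogis(pr$fit),
                  lower = plogis(pr$fit - z * pr$se.fit),
                  upper = plogis(pr$fit + z * pr$se.fit))
est

# The choice of contrasts does not change the predicted level means
p_sum <- predict(fit_sum, newdata = newdat, type = "response")
all.equal(unname(est$prob), unname(p_sum))
```

A table like est (estimated probability per level with an interval) is 
usually easier for readers to digest than the raw coefficients.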

I hope this helped (it is certainly long enough),

Scott

PS  A couple of good references (oldies but goodies) for topics related 
to this are
Lane and Nelder (1982) Analysis of covariance and standardisation as 
instances of prediction.  Biometrics, 38, 613-621
Nelder (1994) The statistics of linear models: back to basics.  
Statistics and Computing, 4, 221-234

On 02/12/11 09:54, Matthew Forister wrote: