Binomial glms with very small numbers

5 messages · Patrick Connolly, Spencer Graves, Brian Ripley

#
V&R describes binomial GLMs with mortality out of 20 budworms.

Is it appropriate to use the same approach with mortality out of
numbers as low as 3?  I am reluctant to do so when the response can
take so few distinct values.  There is one continuous and one
categorical independent variable.

Would it be more appropriate to treat the response as an ordered
factor with four levels?  If so, what family would one use?

TIA
#
The advisability of using "glm" with mortality depends not on the
size of sample groups but on the assumption of independence:  whether
you have 3 individuals per group or 30 or 1, is it plausible to assume
that all individuals represented in your data.frame have independent
chances of survival given the potentially explanatory variables?  If the
answer is "yes", then "glm" is appropriate.  If the answer is "no", then
some other tool may be preferable.  However, "glm" is quick and easy in
R, and I might start with that, even if I felt the assumption of
independence was violated.  If I found nothing there, I would not likely
find anything with techniques that handled the violations of
independence more appropriately.

      Similarly, I can't see how it would matter whether potentially 
explanatory variables were continuous or categorical, as long as a 
categorical variable were appropriately coded as a factor (or 
"character", which is then treated as a factor) if it has more than 2 
levels. 

      Hope this helps. 
      spencer graves
#
On Wed, 14-Jan-2004 at 05:15PM -0800, Spencer Graves wrote:
|>       The advisability of using "glm" with mortality depends not on
|> the size of sample groups but on the assumption of independence:
|> Whether you have 3 individuals per group or 30 or 1, is it

I think we can assume independence.  What concerned me more was the
fact that there will be rather a lot of 0s and 1s, corresponding to
-Inf and Inf on the transformed scale.  Only half the possible values
(namely, 1 & 2) will be usable in the fitting.  On second thoughts,
since the response can be given as a binary, perhaps I was
unnecessarily concerned.
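
The point about giving the response as a binary can be checked directly. The following is a minimal sketch, assuming made-up counts (the data frame `grouped` and all of its values are hypothetical, not from the thread): it fits the same model once with deaths out of 3 per group and once with one 0/1 row per individual, and prints the two sets of coefficients side by side.

```r
# Hypothetical data: 8 groups of 3 insects, with one continuous
# predictor (dose) and one categorical predictor (sex).
grouped <- data.frame(
  dose  = c(1, 2, 3, 4, 1, 2, 3, 4),
  sex   = factor(rep(c("F", "M"), each = 4)),
  dead  = c(0, 1, 2, 3, 0, 0, 2, 3),
  total = 3
)

# Grouped form: a two-column response of (deaths, survivors).
fit_grouped <- glm(cbind(dead, total - dead) ~ dose + sex,
                   family = binomial, data = grouped)

# Binary form: one row per individual, response 1 = dead, 0 = alive.
idx <- rep(seq_len(nrow(grouped)), times = grouped$total)
binary <- grouped[idx, c("dose", "sex")]
binary$y <- unlist(lapply(grouped$dead,
                          function(d) rep(c(1, 0), c(d, 3 - d))))

fit_binary <- glm(y ~ dose + sex, family = binomial, data = binary)

# The two fits maximize the same likelihood (up to a constant),
# so the coefficient estimates should agree to numerical tolerance.
print(cbind(grouped = coef(fit_grouped), binary = coef(fit_binary)))
```

The coefficients should match because the grouped and binary likelihoods differ only by a constant; the residual deviances of the two fits are not expected to match.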


|> plausible to assume that all individuals represented in your
|> data.frame have independent chances of survival given the
|> potentially explanatory variables?  If the answer is "yes", then
|> "glm" is appropriate.  If the answer is "no", then some other tool
|> may be preferable.  However, "glm" is quick and easy in R, and I
|> might start with that, even if I felt the assumption of
|> independence was violated.  If I found nothing there, I would not
|> likely find anything with techniques that handled more
|> appropriately the violations of independence.

Thanks for that suggestion.

|> 
|>       Similarly, I can't see how it would matter whether potentially 
|> explanatory variables were continuous or categorical, as long as a 
|> categorical variable were appropriately coded as a factor (or 
|> "character", which is then treated as a factor) if it has more than 2 
|> levels. 

I didn't think it would make a difference but I included it in case
someone more knowledgeable had reasons why it did.

Thanks.
#
Yes, but "glm" maximizes the binomial likelihood assuming
log(p/(1-p)) is linear in the explanatory variables.  Therefore, you
don't have to transform the 0's and 1's.  There are cases where a
particular combination of potential explanatory variables will clearly
separate mortalities from survivors.  I don't know what the algorithm
does with such cases, but it should send a slope essentially to
infinity.  However, if you don't have this case, "glm" should do what
you want.
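
As a rough illustration of why the 0's and 1's need no transformation, here is a small sketch with invented counts (`dose`, `dead` and `total` are hypothetical). The empirical logits are infinite for groups with 0 or 3 deaths, but the binomial log-likelihood of those groups is finite for any probability strictly between 0 and 1, and that likelihood is what glm works with.

```r
# Hypothetical counts of deaths out of 3 at five doses.
dose  <- c(1, 2, 3, 4, 5)
dead  <- c(0, 1, 1, 2, 3)
total <- rep(3, 5)

# Empirical logits: -Inf where dead == 0, Inf where dead == total.
p_obs <- dead / total
print(log(p_obs / (1 - p_obs)))

# The binomial log-likelihood of 0 of 3 and of 3 of 3 is finite
# for a probability strictly between 0 and 1 (0.2 is arbitrary).
print(dbinom(c(0, 3), size = 3, prob = 0.2, log = TRUE))

# glm() maximizes that likelihood on the logit scale; every group
# contributes to the fit, including the all-0 and all-3 groups.
fit <- glm(cbind(dead, total - dead) ~ dose, family = binomial)
print(coef(fit))
print(nobs(fit))
```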

      hope this helps.  spencer graves
#
On Wed, 14 Jan 2004, Spencer Graves wrote:

            
Even in that case glm will do what you want (return fitted probabilities
rather close to 0 or 1), just somewhat inefficiently.  The standard errors
are often hard to interpret and Wald tests are misleading.  Even that case
is covered in V&R!  We also discuss the problems of interpreting the
residual deviance (in the current edition, and in the complements for
earlier eds) when the expected number of either successes or failures is
small (and what counts as small: with n=3 it will be).
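
The separated case can be sketched as follows. This is an illustration on deliberately artificial data (all values are hypothetical), not the example from V&R. The comparison with a likelihood ratio statistic is added here only to show how uninformative the Wald test becomes.

```r
# Hypothetical, deliberately separated data: every insect survives
# at doses 1-3 and every insect dies at doses 4-6.
dose  <- c(1, 2, 3, 4, 5, 6)
dead  <- c(0, 0, 0, 3, 3, 3)
total <- rep(3, 6)

# glm() is expected to warn here (fitted probabilities numerically
# 0 or 1); the warnings are suppressed to keep the output short.
fit_sep <- suppressWarnings(
  glm(cbind(dead, total - dead) ~ dose, family = binomial)
)

# Fitted probabilities sit very close to 0 or 1.
print(signif(fitted(fit_sep), 3))

# The slope is large, its standard error is far larger, and the
# Wald z statistic is close to zero.
print(summary(fit_sep)$coefficients)

# Likelihood ratio comparison against the intercept-only model.
lr <- fit_sep$null.deviance - deviance(fit_sep)
print(c(LR = lr, p_value = pchisq(lr, df = 1, lower.tail = FALSE)))
```

The exact slope and standard error depend on where the iterations stop, so they should not be read as estimates of anything.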