lmer and a response that is a proportion
On Sun, 3 Dec 2006, John Fox wrote:
Dear Cameron,
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron Gillies
Sent: Sunday, December 03, 2006 1:58 PM
To: r-help at stat.math.ethz.ch
Subject: [R] lmer and a response that is a proportion

Greetings all,

I am using lmer (lme4 package) to analyze data where the response is a proportion (0 to 1). It appears to work, but I am wondering if the analysis is treating the response appropriately - i.e. can lmer do this?
As far as I know, you can specify the response as a proportion, in which case the binomial counts would be given via the weights argument -- at least that's how it's done in glm(). An alternative that should be equivalent is to specify a two-column matrix with counts of "successes" and "failures" as the response. Simply giving the proportion of successes without the counts wouldn't be appropriate.
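To illustrate the two forms of the response described above, here is a minimal sketch using glm() on simulated data. The variable names (n, x, succ, prop) and all numbers are made up for illustration and have nothing to do with Cameron's data; whether lmer accepts both forms in the same way is an assumption that should be checked against the lme4 documentation.

```r
# Hypothetical data: 'succ' successes out of 'n' trials at each value of x.
set.seed(1)
n    <- rep(20, 30)
x    <- seq(-1, 1, length.out = 30)
succ <- rbinom(30, size = n, prob = plogis(0.5 + x))
prop <- succ / n

# Form 1: proportion as the response, binomial totals passed as weights.
fit_w <- glm(prop ~ x, family = binomial, weights = n)

# Form 2: two-column matrix of successes and failures as the response.
fit_m <- glm(cbind(succ, n - succ) ~ x, family = binomial)

# The coefficients and standard errors of the two fits should agree.
all.equal(coef(fit_w), coef(fit_m))
all.equal(sqrt(diag(vcov(fit_w))), sqrt(diag(vcov(fit_m))))
```

Dropping the weights argument from the first call fits a different model (glm() also warns about non-integer successes), which is why the bare proportion alone is not enough.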
I have used both family=binomial and quasibinomial - is one more appropriate when the response is a proportion? The coefficient estimates are identical, but the standard errors are larger with family=binomial.
The difference is that in the binomial family the dispersion is fixed to 1, while in the quasibinomial family it is estimated as a free parameter. If the standard errors are larger with family=binomial, then that suggests that the data are underdispersed (relative to the binomial); if the difference is substantial -- the factor is just the square root of the estimated dispersion -- then the binomial model is probably not appropriate for the data.
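The relation between the two sets of standard errors can be checked directly in the plain GLM case. The sketch below again uses simulated data with made-up names (n, x, succ), and it covers glm() only; it says nothing about how the dispersion is handled in lmer.

```r
# Hypothetical binomial data, as a successes/failures matrix.
set.seed(2)
n    <- rep(20, 40)
x    <- seq(-1, 1, length.out = 40)
succ <- rbinom(40, size = n, prob = plogis(0.5 + x))

fit_b <- glm(cbind(succ, n - succ) ~ x, family = binomial)       # dispersion fixed at 1
fit_q <- glm(cbind(succ, n - succ) ~ x, family = quasibinomial)  # dispersion estimated

se_b <- sqrt(diag(vcov(fit_b)))
se_q <- sqrt(diag(vcov(fit_q)))
phi  <- summary(fit_q)$dispersion   # estimated dispersion

# Same coefficient estimates in both fits.
all.equal(coef(fit_b), coef(fit_q))

# Quasibinomial SEs equal binomial SEs times sqrt(phi).
all.equal(unname(se_q / se_b), rep(sqrt(phi), 2))
```

An estimated phi below 1 gives smaller quasibinomial standard errors, which is the pattern Cameron reports.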
John's last deduction is appropriate to a GLM, but not necessarily to a GLMM. I don't have detailed experience with lmer for binomial responses, but I do with various other fitting routines for GLMMs. Remember there are at least two sources of randomness in a GLMM, and let us keep it simple and have just a subject effect and a measurement error. Then if over-dispersion is happening within subjects, forcing the binomial dispersion (at the measurement level) to 1 tends to increase the estimate of the subject-level variance component to compensate, and in turn to increase some of the standard errors.

(Please note the 'tends' in that para, as the details of the design do matter. For cognoscenti, think about plot and sub-plot treatments in a split-plot design.)
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595