
lmer and a response that is a proportion

9 messages · Cameron Gillies, Brian Ripley, John Fox +3 more

#
Greetings all,

I am using lmer (lme4 package) to analyze data where the response is a
proportion (0 to 1).  It appears to work, but I am wondering if the analysis
is treating the response appropriately - i.e. can lmer do this?

I have used both family=binomial and quasibinomial - is one more appropriate
when the response is a proportion?  The coefficient estimates are identical,
but the standard errors are larger with family=binomial.

Thanks very much for any insight you may have!
Cam


Cam Gillies
PhD Candidate
Biological Sciences
University of Alberta
#
Dear Cameron,
As far as I know, you can specify the response as a proportion, in which
case the binomial counts would be given via the weights argument -- at least
that's how it's done in glm(). An alternative that should be equivalent is
to specify a two-column matrix with counts of "successes" and "failures" as
the response. Simply giving the proportion of successes without the counts
wouldn't be appropriate.
The difference is that in the binomial family the dispersion is fixed to 1,
while in the quasibinomial family it is estimated as a free parameter. If
the standard errors are larger with family=binomial, then that suggests that
the data are underdispersed (relative to the binomial). The quasibinomial
standard errors are the binomial ones multiplied by the square root of the
estimated dispersion, so if the difference is substantial, the binomial
model is probably not appropriate for the data.
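A small pure-Python sketch of John's two points for a plain GLM (not the GLMM case): proportions plus counts carry the same information as a successes/failures pair, and the quasibinomial standard error is the binomial one times the square root of the estimated dispersion. The helper name `binomial_vs_quasi` and the data are hypothetical. It fits only an intercept, for which the estimate and its standard error have closed forms, and it assumes the Pearson-based dispersion estimate (Pearson chi-square over residual degrees of freedom), which is what summary.glm() reports for quasi families.

```python
import math

# Hypothetical helper (not part of glm() or lme4): an intercept-only
# logistic model fitted to proportions, with the binomial totals playing
# the role of glm()'s `weights` argument.
def binomial_vs_quasi(props, counts):
    # Proportion + weights carries the same information as a two-column
    # (successes, failures) response.
    successes = [p * n for p, n in zip(props, counts)]
    failures = [n - s for n, s in zip(counts, successes)]
    total = sum(counts)

    # Maximum-likelihood estimate of the common probability and of the
    # intercept on the logit scale.
    p_hat = sum(successes) / total
    beta = math.log(p_hat / (1 - p_hat))

    # Binomial family: dispersion fixed at 1.
    se_binomial = math.sqrt(1.0 / (total * p_hat * (1 - p_hat)))

    # Quasibinomial family: dispersion estimated as the Pearson
    # statistic divided by the residual degrees of freedom.
    pearson = sum(n * (p - p_hat) ** 2 / (p_hat * (1 - p_hat))
                  for p, n in zip(props, counts))
    phi = pearson / (len(props) - 1)  # one fitted parameter
    se_quasi = se_binomial * math.sqrt(phi)

    return {"successes": successes, "failures": failures, "beta": beta,
            "se_binomial": se_binomial, "phi": phi, "se_quasi": se_quasi}


# Made-up groups whose proportions vary less than binomial sampling predicts.
fit = binomial_vs_quasi([0.50, 0.52, 0.48, 0.50], [50, 50, 50, 50])
print(fit["phi"], fit["se_binomial"], fit["se_quasi"])
```

With these made-up numbers the estimated dispersion is well below 1, so the binomial standard error is the larger of the two, which is the underdispersion pattern described above.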

I hope this helps,
 John
#
On Sun, 3 Dec 2006, John Fox wrote:

            
John's last deduction is appropriate to a GLM, but not necessarily to a 
GLMM. I don't have detailed experience with lmer for binomial, but I do 
for various other fitting routines for GLMM.  Remember there are at least 
two sources of randomness in a GLMM, and let us keep it simple and have 
just a subject effect and a measurement error.  Then if over-dispersion is 
happening within subjects, forcing the binomial dispersion (at the 
measurement level) to 1 tends to increase the estimate of the 
subject-level variance component to compensate, and in turn increase some
of the standard errors.

(Please note the 'tends' in that para, as the details of the design do 
matter.  For cognoscenti, think about plot and sub-plot treatments in a
split-plot design.)
#
Dear Brian and John,

Thanks for your insight.  I'll clarify a couple of things in case it changes
your advice.

My response is a ratio of two measures taken along a bird's path, which
varies from 0 to 1, so I cannot convert it into columns of successes and
failures.  It has to be reported as a proportion.  I could logit-transform
it to make it approximately normal, but I am trying to avoid that so I can
analyze it directly.

The subjects are individual birds and I have a range of sample sizes from
each bird (from 8 to >200, average of about 75 measurements/bird).

Thanks!
Cam
On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:

            
#
Dear Cameron,

Given your description, I thought that this might be the case. 

I'd first examine the distribution of the response variable to see what it
looks like. If the values don't push the boundaries of 0 and 1, and their
distribution is unimodal and reasonably symmetric, I'd consider analyzing
them directly using normally distributed errors. If the values do stack up
near 0, 1, or both, I'd consider a transformation, or perhaps a different
family (depending on the pattern); in particular, if they stack up near both
0 and 1, a logit or similar transformation could help. Finally, if you have
many values of 0, 1, or both, then a transformation isn't promising (and,
indeed, the logit wouldn't be defined for these values). In any event, I'd
check diagnostics after a preliminary fit.
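The first look John describes can be sketched in a few lines of plain Python. This is only a summary, not a decision rule: the helper name `summarize_proportions` and the data are hypothetical, the skewness is a rough moment-based version, and John gives no numeric thresholds, so none are applied here.

```python
import math
import statistics

# Hypothetical helper: a few numbers worth looking at before choosing
# between normal errors, a transformation, or another family.
def summarize_proportions(values):
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return {
        "n": n,
        # Values sitting exactly on the boundaries
        "zeros": sum(1 for v in values if v == 0),
        "ones": sum(1 for v in values if v == 1),
        "mean": mean,
        "median": statistics.median(values),
        # Rough moment-based skewness; typically negative when most
        # values sit near 1 with a tail toward 0
        "skewness": sum((v - mean) ** 3 for v in values) / (n * sd ** 3),
        # The logit is undefined at exactly 0 or 1, so those become None
        "logits": [math.log(v / (1 - v)) if 0 < v < 1 else None
                   for v in values],
    }


# Made-up values, mostly near 1, with one exact 0 and one exact 1
summary = summarize_proportions([0.0, 0.6, 0.85, 0.9, 0.95, 1.0])
print(summary["zeros"], summary["ones"], summary["skewness"])
```

The `None` entries make the last point concrete: any exact 0 or 1 has no logit, so the more of them there are, the less promising the transformation becomes.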

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
Would beta regression solve your problem? (package betareg)
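For context, a beta regression models the mean mu of a proportion through a link function (logit here) together with a precision parameter phi, using shape parameters mu*phi and (1 - mu)*phi, so that Var(y) = mu(1 - mu)/(1 + phi). As far as I know this mean-precision form is the one betareg uses, but the sketch below is plain Python, not betareg's code; the function names are hypothetical. It has no random effects, and it cannot take exact 0s or 1s.

```python
import math

def beta_logpdf(y, mu, phi):
    """Log-density of a beta variable with mean mu and precision phi."""
    if not 0 < y < 1:
        # The beta density is only defined on the open interval (0, 1)
        raise ValueError("the beta density requires 0 < y < 1")
    a = mu * phi          # first shape parameter
    b = (1 - mu) * phi    # second shape parameter
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(y) + (b - 1) * math.log(1 - y))


def beta_regression_loglik(ys, xs, b0, b1, phi):
    """Log-likelihood with logit(mu) = b0 + b1 * x and a common phi."""
    total = 0.0
    for y, x in zip(ys, xs):
        mu = 1 / (1 + math.exp(-(b0 + b1 * x)))
        total += beta_logpdf(y, mu, phi)
    return total


# Made-up data; maximizing this over (b0, b1, phi) would give the fit
print(beta_regression_loglik([0.6, 0.8, 0.9], [0.0, 1.0, 2.0], 0.5, 0.8, 10.0))
```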

Simon.
John Fox wrote:

  
    
#
Hello Simon and John,

I'm afraid I need to include random effects (a random intercept and
possibly random coefficients), and it doesn't look like betareg can do that.

John, the data is spread along the range of 0 to 1 with most values closer
to 1, so it does transform well using the logit transformation.  I was
trying to avoid that, though, because I was not sure what impact the
transformation would have on the random effects or on the interpretation of
the coefficients.
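If the logit route is taken (a normal-errors mixed model fitted to logit(y)), two consequences for interpretation can be shown in plain Python. First, a coefficient b shifts the fitted logit by b, which multiplies the fitted ratio y/(1 - y) by exp(b); random effects likewise act on the logit scale. Second, back-transforming the mean logit does not give the mean proportion (when the logits are roughly symmetric it is closer to the median). The numbers below are made up for illustration.

```python
import math
import statistics

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# A coefficient b on the logit scale multiplies p / (1 - p) by exp(b)
# for each one-unit increase in the predictor.
p_before = 0.80
b = 0.5
p_after = inv_logit(logit(p_before) + b)
ratio_change = (p_after / (1 - p_after)) / (p_before / (1 - p_before))

# Made-up proportions, mostly near 1. Back-transforming the mean logit
# does not recover the mean proportion.
props = [0.55, 0.80, 0.90, 0.95, 0.97]
mean_prop = statistics.fmean(props)
backtransformed = inv_logit(statistics.fmean([logit(p) for p in props]))
print(ratio_change, math.exp(b), mean_prop, backtransformed)
```

With values skewed toward 1, the back-transformed mean logit comes out above the mean proportion, so reporting results on the original scale needs some care.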

Thanks again!
Cam
On 12/3/06 7:46 PM, "Simon Blomberg" <blomsp at ozemail.com.au> wrote:

            
#
Cameron Gillies <cgillies <at> ualberta.ca> writes:
Kevin Wright has posted a wish on the R-wiki for a beta mixed-effects model.
There is no package for this, but there was a nice article describing such a
model. Well, it is a start.

Gregor
#
Hi Cam,

I like John's suggestion too.  The only thing that I would add to it is
that you might find it worthwhile to use lme() instead of lmer(). The
former permits flexible modeling of the variance, whereas to my knowledge
the latter doesn't, yet.  You might find that with judicious modeling of
the variance, the model assumptions could reasonably be met.

Good luck,

Andrew
On Mon, December 4, 2006 3:38 pm, Cameron Gillies wrote:
Andrew Robinson
Senior Lecturer in Statistics                       Tel: +61-3-8344-9763
Department of Mathematics and Statistics            Fax: +61-3-8344 4599
University of Melbourne, VIC 3010 Australia
Email: a.robinson at ms.unimelb.edu.au    Website: http://www.ms.unimelb.edu.au