lmer and a response that is a proportion

John Fox

Sun, Dec 3, 2006 6:24 PM

Dear Cameron,

Given your description, I thought that this might be the case. 

I'd first examine the distribution of the response variable to see what it
looks like. If the values don't push the boundaries of 0 and 1, and their
distribution is unimodal and reasonably symmetric, I'd consider analyzing
them directly using normally distributed errors. If the values do stack up
near 0, 1, or both, I'd consider a transformation, or perhaps a different
family (depending on the pattern); in particular, if they stack up near both
0 and 1, a logit or similar transformation could help. Finally, if you have
many values of 0, 1, or both, then a transformation isn't promising (and,
indeed, the logit wouldn't be defined for these values). In any event, I'd
check diagnostics after a preliminary fit.
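
A minimal sketch of the screening step described above, added for illustration and not part of the original message. The sample values, the `eps` cutoff, and the helper names `logit` and `boundary_summary` are all assumptions; the point is only that exact 0s and 1s have no logit, so they need to be counted before a transformation is considered.

```python
import math

def logit(p):
    """Log-odds of p; defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"logit is undefined for p={p}")
    return math.log(p / (1.0 - p))

def boundary_summary(props, eps=0.05):
    """Count exact 0s/1s and values within eps of either boundary."""
    exact = sum(1 for p in props if p in (0.0, 1.0))
    near = sum(1 for p in props if p < eps or p > 1.0 - eps)
    return {"n": len(props), "exact_0_or_1": exact, "near_boundary": near}

# Hypothetical proportions, invented purely for illustration.
props = [0.0, 0.02, 0.10, 0.45, 0.50, 0.55, 0.90, 0.98, 1.0]

summary = boundary_summary(props)

# Only interior values can be logit-transformed; the exact 0 and 1 are skipped.
transformed = [logit(p) for p in props if 0.0 < p < 1.0]

print(summary)
print([round(v, 3) for v in transformed])
```

If `exact_0_or_1` is a sizeable share of `n`, the sketch agrees with the advice above: a logit transformation is not promising for such data.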

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------

-----Original Message-----
From: Cameron Gillies [mailto:cgillies at ualberta.ca] 
Sent: Sunday, December 03, 2006 6:31 PM
To: Prof Brian Ripley; John Fox
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] lmer and a response that is a proportion

Dear Brian and John,

Thanks for your insight.  I'll clarify a couple of things 
in case it changes your advice.

My response is a ratio of two measures taken during a bird's 
path, which varies from 0 to 1, so I cannot convert it to 
columns of the number of successes.  It has to be reported as 
the proportion.  I could logit transform it to make it 
normal, but I am trying to avoid that so I can analyze it directly.

The subjects are individual birds and I have a range of 
sample sizes from each bird (from 8 to >200, average of about 
75 measurements/bird).

Thanks!
Cam

On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:

On Sun, 3 Dec 2006, John Fox wrote:

Dear Cameron,

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch 
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron Gillies
Sent: Sunday, December 03, 2006 1:58 PM
To: r-help at stat.math.ethz.ch
Subject: [R] lmer and a response that is a proportion

Greetings all,

I am using lmer (lme4 package) to analyze data where the response is
a proportion (0 to 1).  It appears to work, but I am wondering if 
the analysis is treating the response appropriately - i.e. can lmer
do this?

As far as I know, you can specify the response as a proportion, in 
which case the binomial counts would be given via the weights 
argument -- at least that's how it's done in glm(). An alternative 
that should be equivalent is to specify a two-column matrix with 
counts of "successes" and "failures" as the response. Simply giving
the proportion of successes without the counts wouldn't be
appropriate.
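
One way to see why the two specifications should be equivalent is to compare the binomial log-likelihood written both ways. The sketch below is an illustration added here, not code from the thread; the counts are invented and the function names are hypothetical. It drops the binomial coefficient, which does not depend on the probability `pi`.

```python
import math

def loglik_counts(successes, failures, pi):
    """Binomial log-likelihood kernel from success/failure counts."""
    return sum(y * math.log(pi) + f * math.log(1.0 - pi)
               for y, f in zip(successes, failures))

def loglik_weighted(props, weights, pi):
    """Same kernel from proportions, weighted by the number of trials."""
    return sum(w * (p * math.log(pi) + (1.0 - p) * math.log(1.0 - pi))
               for p, w in zip(props, weights))

# Hypothetical counts, invented purely for illustration.
successes = [3, 10, 45]
failures = [5, 10, 30]
trials = [y + f for y, f in zip(successes, failures)]
props = [y / n for y, n in zip(successes, trials)]

# The two forms agree (up to rounding) at every candidate probability.
for pi in (0.2, 0.5, 0.8):
    a = loglik_counts(successes, failures, pi)
    b = loglik_weighted(props, trials, pi)
    print(pi, round(a, 6), round(b, 6))
```

Without the trial counts as weights, the proportions alone cannot reproduce this likelihood, which is the sense in which giving only the proportion of successes is not appropriate.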

I have used both family=binomial and quasibinomial - is one more 
appropriate when the response is a proportion?  The coefficient 
estimates are identical, but the standard errors are larger with 
family=binomial.

The difference is that in the binomial family the dispersion is fixed
to 1, while in the quasibinomial family it is estimated as a free 
parameter. If the standard errors are larger with family=binomial, 
then that suggests that the data are underdispersed (relative to the
binomial); if the difference is substantial -- the factor is just the
square root of the estimated dispersion -- then the binomial model is
probably not appropriate for the data.
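
The square-root relationship can be checked by hand in the simplest case. The following is a rough sketch for an intercept-only model in a plain GLM setting (no random effects), added for illustration; the data are invented, the function name is hypothetical, and the Pearson statistic divided by its residual degrees of freedom is assumed as the dispersion estimate.

```python
import math

def dispersion_check(successes, trials):
    """Intercept-only binomial fit.

    Returns (p_hat, binomial SE, Pearson dispersion, quasibinomial SE).
    """
    total_y = sum(successes)
    total_n = sum(trials)
    p_hat = total_y / total_n
    # Standard error with the dispersion fixed at 1 (binomial family).
    se_binomial = math.sqrt(p_hat * (1.0 - p_hat) / total_n)
    # Pearson chi-square over residual degrees of freedom estimates dispersion.
    pearson = sum((y - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
                  for y, n in zip(successes, trials))
    phi = pearson / (len(trials) - 1)
    # Quasibinomial standard error: scaled by sqrt(phi).
    se_quasi = math.sqrt(phi) * se_binomial
    return p_hat, se_binomial, phi, se_quasi

# Hypothetical data: six groups of 20 trials whose counts vary less than
# binomial sampling would predict (underdispersion).
successes = [10, 10, 11, 9, 10, 10]
trials = [20] * 6

p_hat, se_binomial, phi, se_quasi = dispersion_check(successes, trials)
print(p_hat, se_binomial, phi, se_quasi)
```

With these invented counts the estimated dispersion is below 1, so the binomial standard error is the larger of the two, matching the pattern described above.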

John's last deduction is appropriate to a GLM, but not necessarily to
a GLMM. I don't have detailed experience with lmer for binomial, but I
do for various other fitting routines for GLMM.  Remember there are at
least two sources of randomness in a GLMM, and let us keep it simple
and have just a subject effect and a measurement error.  Then if 
over-dispersion is happening within subjects, forcing the binomial 
dispersion (at the measurement level) to 1 tends to increase the 
estimate of the subject-level variance component to compensate, and in
turn increase some of the standard errors.

(Please note the 'tends' in that para, as the details of the design do
matter.  For cognoscenti, think about plot and sub-plot treatments in
a split-plot design.)
