lmer and a response that is a proportion
Hello Simon and John,
I'm afraid I need to include random effects, both a random intercept and possibly random coefficients, and it doesn't look like betareg can do that.
John, the data are spread along the range of 0 to 1 with most values closer to 1, so they do transform well using the logit transformation. I was trying to avoid that, though, because I was not sure what impact the transformation would have on the random effects or on the interpretation of the coefficients.
Thanks again!
Cam
On 12/3/06 7:46 PM, "Simon Blomberg" <blomsp at ozemail.com.au> wrote:
Would beta regression solve your problem? (package betareg)
Simon.
John Fox wrote:
Dear Cameron,
Given your description, I thought that this might be the case. I'd first examine the distribution of the response variable to see what it looks like. If the values don't push the boundaries of 0 and 1, and their distribution is unimodal and reasonably symmetric, I'd consider analyzing them directly using normally distributed errors.
If the values do stack up near 0, 1, or both, I'd consider a transformation, or perhaps a different family (depending on the pattern); in particular, if they stack up near both 0 and 1, a logit or similar transformation could help.
Finally, if you have many values of 0, 1, or both, then a transformation isn't promising (and, indeed, the logit wouldn't be defined for these values). In any event, I'd check diagnostics after a preliminary fit.
I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
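A minimal sketch of the logit transformation discussed above, written in Python purely for illustration (the thread itself concerns R). The proportions are made up, and the helper names `logit` and `inv_logit` are hypothetical. It shows how the transformation stretches values near 1 and why exact 0s or 1s cannot be transformed.

```python
import math

def logit(p):
    """Log-odds of a proportion; only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError(f"logit is undefined for p = {p}")
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log-odds value back to the (0, 1) scale."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical proportions, most of them close to 1.
props = [0.55, 0.80, 0.90, 0.95, 0.99]
transformed = [logit(p) for p in props]

# Values near the boundary are spread out on the log-odds scale.
for p, z in zip(props, transformed):
    print(f"p = {p:.2f}  logit = {z:+.3f}")

# Exact 0s or 1s cannot be transformed.
for boundary in (0.0, 1.0):
    try:
        logit(boundary)
    except ValueError as err:
        print(err)
```

Coefficients from a model fitted to the transformed response are on the log-odds scale, so fitted values would need `inv_logit` to be read as proportions again.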
-----Original Message-----
From: Cameron Gillies [mailto:cgillies at ualberta.ca]
Sent: Sunday, December 03, 2006 6:31 PM
To: Prof Brian Ripley; John Fox
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] lmer and a response that is a proportion
Dear Brian and John,
Thanks for your insight. I'll clarify a couple of things
in case it changes your advice.
My response is a ratio of two measures taken during a bird's
path, which varies from 0 to 1, so I cannot convert it to
columns of the number of successes and failures. It has to be
reported as the proportion. I could logit transform it to make it
normal, but I am trying to avoid that so I can analyze it directly.
The subjects are individual birds and I have a range of
sample sizes from each bird (from 8 to >200, average of about
75 measurements/bird).
Thanks!
Cam
On 12/3/06 3:47 PM, "Prof Brian Ripley" <ripley at stats.ox.ac.uk> wrote:
On Sun, 3 Dec 2006, John Fox wrote:
Dear Cameron,
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Cameron
Gillies
Sent: Sunday, December 03, 2006 1:58 PM
To: r-help at stat.math.ethz.ch
Subject: [R] lmer and a response that is a proportion
Greetings all,
I am using lmer (lme4 package) to analyze data where the response is
a proportion (0 to 1). It appears to work, but I am wondering if
the analysis is treating the response appropriately, i.e. can lmer
do this?
As far as I know, you can specify the response as a proportion, in
which case the binomial counts would be given via the weights
argument -- at least that's how it's done in glm(). An alternative
that should be equivalent is to specify a two-column matrix with
counts of "successes" and "failures" as the response.
Simply giving the proportion of successes without the counts wouldn't
be appropriate.
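A small numeric sketch of why the two specifications described above should agree, and why a bare proportion is not enough. It is written in Python for illustration only and does not reproduce glm() or lmer; the function names and the data are hypothetical. The binomial log-likelihood computed from counts matches the one computed from a proportion weighted by the number of trials.

```python
import math

def loglik_from_counts(successes, failures, p):
    """Binomial log-likelihood kernel from counts (constant term dropped)."""
    return successes * math.log(p) + failures * math.log(1.0 - p)

def loglik_from_proportion(prop, n_trials, p):
    """Same kernel from a proportion plus the number of trials as a weight."""
    return n_trials * (prop * math.log(p) + (1.0 - prop) * math.log(1.0 - p))

# Hypothetical observation: 18 successes out of 24 trials.
successes, failures = 18, 6
n_trials = successes + failures
prop = successes / n_trials

p = 0.7  # an arbitrary candidate success probability
a = loglik_from_counts(successes, failures, p)
b = loglik_from_proportion(prop, n_trials, p)
print(abs(a - b) < 1e-12)

# Without the counts, 0.75 from 4 trials and 0.75 from 400 trials look
# identical, although the second contributes about 100 times as much
# to the log-likelihood.
small = loglik_from_proportion(0.75, 4, p)
large = loglik_from_proportion(0.75, 400, p)
print(large / small)
```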
I have used both family=binomial and quasibinomial - is one more
appropriate when the response is a proportion? The coefficient
estimates are identical, but the standard errors are larger with
family=binomial.
The difference is that in the binomial family the dispersion is fixed
to 1, while in the quasibinomial family it is estimated as a free
parameter. If the standard errors are larger with family=binomial,
then that suggests that the data are underdispersed (relative to the
binomial); if the difference is substantial -- the factor is just the
square root of the estimated dispersion -- then the binomial model is
probably not appropriate for the data.
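The relation between the two sets of standard errors can be sketched numerically. The example below is an illustration in Python, not the output of any R fit. It assumes an intercept-only binomial model for a handful of made-up groups and a Pearson-based dispersion estimate (Pearson chi-square divided by residual degrees of freedom), which is the usual quasi-likelihood choice. All variable names are hypothetical.

```python
import math

# Hypothetical grouped data: successes out of 20 trials in each group.
successes = [15, 16, 15, 14, 16]
trials = [20, 20, 20, 20, 20]

# Intercept-only binomial fit: the estimated probability is the pooled proportion.
p_hat = sum(successes) / sum(trials)

# Pearson chi-square statistic and dispersion estimate (one parameter fitted).
pearson = sum(
    (s - n * p_hat) ** 2 / (n * p_hat * (1.0 - p_hat))
    for s, n in zip(successes, trials)
)
resid_df = len(successes) - 1
phi = pearson / resid_df

# Standard error of the intercept on the logit scale.
se_binomial = 1.0 / math.sqrt(sum(n * p_hat * (1.0 - p_hat) for n in trials))
se_quasi = math.sqrt(phi) * se_binomial  # scaled by sqrt(dispersion)

print(f"dispersion estimate: {phi:.3f}")
print(f"SE (dispersion fixed at 1): {se_binomial:.4f}")
print(f"SE (dispersion estimated):  {se_quasi:.4f}")
```

In these made-up data the counts vary less than a binomial would predict, so the estimated dispersion is below 1 and the rescaled standard error is the smaller of the two.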
John's last deduction is appropriate to a GLM, but not necessarily to
a GLMM. I don't have detailed experience with lmer for binomial, but I
do for various other fitting routines for GLMM. Remember there are at
least two sources of randomness in a GLMM, and let us keep it simple
and have just a subject effect and a measurement error. Then if
over-dispersion is happening within subjects, forcing the binomial
dispersion (at the measurement level) to 1 tends to increase the
estimate of the subject-level variance component to compensate, and in
turn increase some of the standard errors.
(Please note the 'tends' in that para, as the details of the design do
matter. For the cognoscenti, think about plot and sub-plot treatments in
a split-plot design.)