glm for ratio [0,1] data

5 messages · Bálint Czúcz, Rubén Roa, Peter Solymos +2 more

#
Dear List,

does anyone know a good way to perform GLM on ratio data (i.e. data
between 0 and 1)? Binomial GLM is quite straightforward to use if you
have integer numbers for successes/failures. But how to proceed if you
only have the ratio? This can occur in a multitude of ways, e.g. the
response variable is the estimated cover of a species, percentage of
canopy lost, etc.

One solution I know about is to transform such responses towards
normality with the arcsine-square-root transformation, and use lm on
the transformed response -- e.g. Crawley (2007, The R Book, p. 570)
explicitly suggests this strategy.
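
A minimal sketch of the transform-then-lm strategy described above. The
data are simulated and the names (x, p, z, fit_lm, p_hat) are made up for
illustration; this is not Crawley's own example.

```r
# Simulated proportions in (0, 1); purely illustrative data
set.seed(1)
x <- runif(50)
p <- plogis(-1 + 2 * x + rnorm(50, sd = 0.5))

# Arcsine-square-root transform, then an ordinary linear model
z <- asin(sqrt(p))
fit_lm <- lm(z ~ x)
summary(fit_lm)

# Back-transform the fitted values to the proportion scale
p_hat <- sin(fitted(fit_lm))^2
```

The coefficients are on the transformed scale, so the back-transformation
in the last line is needed before predictions can be read as proportions.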

But I would still be interested if there is a glm approach that could
be used with the untransformed data. After hours spent with searching
for literature on such a glm, I couldn't find any. Do you know of
some?

I would also be interested in what happens if I just fit a binomial
glm with the response in [0,1] and the weights left at 1. I know
glm() will throw a warning -- but it also produces output. Can this
output contain some valid, interpretable results, or is it
completely bullshit because the assumptions are violated?
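
To make the situation concrete, here is a small sketch on simulated data
(all names are hypothetical). It also fits the quasibinomial family, which
is not mentioned elsewhere in this thread: it uses the same mean and
variance functions as binomial but estimates the dispersion from the data
and does not warn about non-integer responses.

```r
# Simulated continuous proportions; purely illustrative data
set.seed(2)
x <- runif(40)
y <- plogis(-0.5 + 1.5 * x + rnorm(40, sd = 0.4))

# Binomial family with weights left at 1: glm() fits the model but warns
# about non-integer #successes (the warning is suppressed here)
fit_bin <- suppressWarnings(glm(y ~ x, family = binomial))

# Quasi-binomial: same point estimates, dispersion estimated from the data
fit_qb <- glm(y ~ x, family = quasibinomial)

all.equal(coef(fit_bin), coef(fit_qb))
summary(fit_qb)$dispersion
```

The two fits share their coefficients; they differ in the standard errors,
because the binomial fit fixes the dispersion at 1 as if each ratio came
from a single trial.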

Thank you!
Bálint


--
Bálint Czúcz
Institute of Ecology and Botany of the Hungarian Academy of Sciences
H-2163 Vácrátót, Alkotmány u. 2-4. HUNGARY
Tel: +36 28 360122/137  +36 70 7034692
Hungarian-language blog: http://atermeszettorvenye.blogspot.com/
#
Bálint Czúcz wrote:
There is a glm-ish approach to model data in the interval (0,1) using 
the Beta distribution and the betareg function of the betareg package.
See also:
Ferrari, S.L.P., Cribari-Neto, F. 2004. Beta regression for modelling 
rates and proportions. Journal of Applied Statistics 31(7):799-815.

HTH

Rubén
#
Hi Bálint,

Here are my two cents.

By using lm with transformed data (the transformation can also be
logit, loglog, cloglog or probit) you lose the binomial error
structure, because you no longer follow the trial/success experiment
scheme. But percent cover is not the kind of [0,1] data where this
sampling is assumed; I think that's why you have asked :)

If your data are an estimate of a hidden response, then there must be
ways to account for this, but I can only recall an example where e.g.
Y is Poisson, but you observe it as ordinal (0, few, many). So you can
establish cutoff values to get an ordinal response from your percent
cover, and use a hierarchical model in BUGS/JAGS (see the WinBUGS
manual for an example).

Cheers,

Peter
On Mon, Aug 31, 2009 at 6:24 AM, Bálint Czúcz <czucz at botanika.hu> wrote:
#
Hi,
Venables and Ripley, commenting on the use of glm with the binomial
family (MASS book, page 190):
"If the response is a numeric vector it is assumed to hold the data in
a ratio form, y[i] = s[i]/a[i], in which case the a[i]s must be given as
a vector of weights using the weights argument".
So, if your ratio comes from e.g. estimating cover as s[i] cells occupied
out of a total of a[i] cells in a sampling grid, you can still use the
binomial glm.

I recall that another possible approach could be beta regression (see
package betareg).
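
A small sketch of the ratio-plus-weights form described above, on
simulated grid counts (the data and the names a, s, y, fit_w, fit_c are
assumptions for illustration). It also fits the equivalent two-column
successes/failures form for comparison.

```r
# Simulated grid data: a[i] cells per grid, s[i] of them occupied
set.seed(3)
a <- rep(25, 30)
x <- runif(30)
s <- rbinom(30, size = a, prob = plogis(-1 + 2 * x))
y <- s / a                                  # observed ratio

# Ratio response, with the grid sizes passed as weights
fit_w <- glm(y ~ x, family = binomial, weights = a)

# Equivalent form: two-column matrix of successes and failures
fit_c <- glm(cbind(s, a - s) ~ x, family = binomial)

all.equal(coef(fit_w), coef(fit_c))
```

This only applies when the totals a[i] are actually known; a visually
estimated percent cover has no such denominator.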

Cheers,

Marcelino

On 31/8/2009, "Peter Solymos" <solymos at ualberta.ca> wrote: