Dear List,

does anyone know a good way to perform GLM on ratio data (i.e. data between 0 and 1)? Binomial GLM is quite straightforward to use if you have integer numbers for successes/failures. But how to proceed if you only have the ratio? This can occur in a multitude of ways, e.g. the response variable is the estimated cover of a species, percentage of canopy lost, etc.

One solution I know about is to try to transform such responses to normality with the arcsine square-root transformation, and use lm on the transformed response; e.g. Crawley (2007, The R Book, p. 570) explicitly suggests this strategy. But I would still be interested to know if there is a glm approach that could be used with the untransformed data. After hours spent searching for literature on such a glm, I couldn't find any. Do you know of any?

I would also be interested to know what happens if I just proceed with a binomial glm with the response in [0,1] and the weights left at 1. I know glm() will throw a warning, but it also produces output. Can this output contain valid, interpretable results, or is it completely bullshit because of the violation of the assumptions?

Thank you!
Bálint

--
Bálint Czúcz
Institute of Ecology and Botany of the Hungarian Academy of Sciences
H-2163 Vácrátót, Alkotmány u. 2-4. HUNGARY
Tel: +36 28 360122/137 +36 70 7034692
Hungarian-language blog: http://atermeszettorvenye.blogspot.com/
glm for ratio [0,1] data
5 messages · Bálint Czúcz, Rubén Roa, Peter Solymos +2 more
Bálint Czúcz wrote:
Dear List, does anyone know a good way to perform GLM on ratio data (i.e. data between 0 and 1)? Binomial GLM is quite straightforward to use if you have integer numbers for successes/failures. But how to proceed if you only have the ratio? This can occur in a multitude of ways, e.g. the response variable is the estimated cover of a species, percentage of canopy lost, etc.
There is a glm-ish approach to modelling data in the interval (0,1) using the Beta distribution: the betareg function of the betareg package. See also: Ferrari, S.L.P., Cribari-Neto, F. 2004. Beta regression for modelling rates and proportions. Journal of Applied Statistics 31(7):799-815. HTH Rubén
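To make the beta regression suggestion concrete, here is a minimal sketch of the likelihood behind it, using the mean/precision parameterization of Ferrari and Cribari-Neto (2004): y ~ Beta(mu * phi, (1 - mu) * phi), with a logit link for the mean. It is written in plain Python rather than R so that it is self-contained; it is not the betareg implementation. The data and all function names are hypothetical. In R, the corresponding call would be along the lines of betareg(y ~ x, data = d), where y, x and d are placeholder names.

```python
import math

def inv_logit(eta):
    """Inverse logit link: maps the linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def beta_logpdf(y, mu, phi):
    """Log density of Beta(mu*phi, (1-mu)*phi) at y, for 0 < y < 1."""
    a = mu * phi
    b = (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def beta_loglik(b0, b1, phi, xs, ys):
    """Log-likelihood of a beta regression with one covariate and a logit link."""
    return sum(beta_logpdf(y, inv_logit(b0 + b1 * x), phi)
               for x, y in zip(xs, ys))

# Hypothetical data: estimated cover (strictly between 0 and 1) along a gradient.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.08, 0.25, 0.40, 0.71, 0.88]

# Evaluate the log-likelihood at two candidate parameter sets; a fitting
# routine would maximise this over (b0, b1, phi).
ll_flat = beta_loglik(0.0, 0.0, 5.0, xs, ys)    # no effect of x
ll_slope = beta_loglik(-2.0, 1.0, 5.0, xs, ys)  # cover increasing with x
print(ll_flat, ll_slope)
```

Note that the density is only defined for 0 < y < 1, so observations of exactly 0 or 1 need separate handling before a model of this kind can be fitted.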
Hi Bálint, Here are my two cents. By using LM with transformed data (the transformation can also be logit, loglog, cloglog or probit) you lose the Binomial error structure, because you won't follow the trial/success experiment scheme. But percent cover is not the kind of [0,1] data where this sampling is assumed; I think that's why you have asked :) If your data is an estimate of a hidden response, then there must be ways to account for this, but I can only recall an example where e.g. Y is Poisson, but you observe it as ordinal (0, few, many). So you can establish cutoff values to get an ordinal response from your percent cover, and use a hierarchical model in BUGS/JAGS (see the WinBUGS manual for an example). Cheers, Peter
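For reference, the transformations mentioned in this thread can be sketched as follows. This is a plain-Python illustration with hypothetical function names, not code from any of the posters. The loglog definition follows one common sign convention, which is an assumption; other sources flip the sign.

```python
import math
from statistics import NormalDist

def arcsine_sqrt(p):
    """Arcsine square-root transform; defined on the closed interval [0, 1]."""
    return math.asin(math.sqrt(p))

def logit(p):
    """Log odds; undefined at exactly 0 or 1."""
    return math.log(p / (1.0 - p))

def probit(p):
    """Inverse of the standard normal CDF; undefined at exactly 0 or 1."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log: log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))

def loglog(p):
    """Log-log, here taken as -log(-log(p)) (assumed sign convention)."""
    return -math.log(-math.log(p))

# Hypothetical cover estimates, strictly inside (0, 1).
cover = [0.05, 0.25, 0.50, 0.75, 0.95]
for p in cover:
    print(p, arcsine_sqrt(p), logit(p), probit(p), cloglog(p), loglog(p))
```

All of these except the arcsine square-root break down at exactly 0 or 1, which is worth checking before transforming cover data that includes bare or fully covered plots.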
On Mon, Aug 31, 2009 at 6:24 AM, Bálint Czúcz <czucz at botanika.hu> wrote: [original message snipped]
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology
Hi, Venables and Ripley, commenting on the use of glm with the binomial family (MASS book, page 190): "If the response is a numeric vector it is assumed to hold the data in a ratio form, y[i] = s[i]/a[i], in which case the a[i]s must be given as a vector of weights using the weights argument". So, if your ratio comes from e.g. estimating cover as s[i] cells occupied out of a total of a[i] cells in a sampling grid, you can still use the binomial glm. I recall that another possible approach could be beta regression (see package betareg). Cheers, Marcelino

On 31/8/2009, "Peter Solymos" <solymos at ualberta.ca> wrote: [quoted messages snipped]
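The ratio-plus-weights form that Venables and Ripley describe can be sketched as below: a binomial GLM with logit link fitted by iteratively reweighted least squares, where the response is y[i] = s[i]/a[i] and the totals a[i] enter as prior weights. This is a plain-Python illustration with one covariate, hypothetical data and a hypothetical function name; it is not R's glm code. In R the corresponding call would be along the lines of glm(y ~ x, family = binomial, weights = a).

```python
import math

# Hypothetical grid survey: s[i] occupied cells out of a[i] cells at covariate x[i].
x = [0.0, 1.0, 2.0, 3.0, 4.0]
s = [1, 3, 5, 8, 9]
a = [10, 10, 10, 10, 10]
y = [si / ai for si, ai in zip(s, a)]  # ratio form y[i] = s[i] / a[i]

def fit_binomial_logit(x, y, a, n_iter=25):
    """IRLS for logit(mu) = b0 + b1 * x with prior weights a."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        sw = swx = swxx = swz = swxz = 0.0
        for xi, yi, ai in zip(x, y, a):
            eta = b0 + b1 * xi
            mu = 1.0 / (1.0 + math.exp(-eta))
            v = mu * (1.0 - mu)          # binomial variance function
            w = ai * v                   # working weight includes the total a[i]
            z = eta + (yi - mu) / v      # working response
            sw += w
            swx += w * xi
            swxx += w * xi * xi
            swz += w * z
            swxz += w * xi * z
        # Solve the 2x2 weighted least-squares normal equations.
        det = sw * swxx - swx * swx
        b1 = (sw * swxz - swx * swz) / det
        b0 = (swz - b1 * swx) / sw
    return b0, b1

b0, b1 = fit_binomial_logit(x, y, a)
mu = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

# At the maximum-likelihood estimate the weighted score equations vanish.
score0 = sum(ai * (yi - mi) for ai, yi, mi in zip(a, y, mu))
score1 = sum(ai * xi * (yi - mi) for ai, xi, yi, mi in zip(a, x, y, mu))
print(b0, b1, score0, score1)
```

The weights matter because a[i] * y[i] recovers the success count s[i], so the weighted fit solves the same likelihood equations as a fit to the counts themselves. This only makes sense when a real number of trials a[i] lies behind each ratio, as in the sampling-grid case.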