Mariano,

There is a huge and important difference between the two approaches suggested for your data. Modeling the log ratio of proportions (i.e. the empirical logit of the Yes proportion) estimates the residual variance from the data. The binomial model assumes the residual variance is determined by the arbitrary (and made-up) sample size of 20 "tries" per response, in combination with the estimated mean proportions.

To see the arbitrariness, if you don't already, re-express your proportions out of 200 instead of 20, because 0/200, 10/200, ... 200/200 also give your observed responses. The coefficient estimates will be approximately the same but their variances will not. (If you didn't have additional random effects in the model, the coefficient estimates would be exactly the same but the variances would be one-tenth of those from N=20.)

If you are going to use the binomial GLM, I believe you must add overdispersion to the model, either as an individual random effect or by using a quasibinomial response distribution. Overdispersion is not necessary for the log-ratio response because the residual error variance conceptually estimates that overdispersion.

Philip
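The point about the made-up denominator can be checked with a small sketch. It assumes the simplest possible case, an intercept-only model with no covariates and no random effects, where the binomial fit has a closed form. The counts are hypothetical illustration data, not Mariano's, and the function names are made up for this example. It is written in standard-library Python only to keep it self-contained.

```python
import math

def binomial_fit(successes, trials):
    """Intercept-only binomial GLM on the logit scale (closed form).

    Returns the estimated logit and its model-based (Wald) variance,
    which depends directly on the assumed number of trials.
    """
    total_n = trials * len(successes)
    p_hat = sum(successes) / total_n
    estimate = math.log(p_hat / (1 - p_hat))
    variance = 1.0 / (total_n * p_hat * (1 - p_hat))
    return estimate, variance

def logit_fit(successes, trials):
    """Mean of the logit-transformed proportions.

    The variance of the mean is estimated from the residual spread of
    the logits, so it does not depend on the assumed number of trials.
    Requires 0 < y < trials for every response.
    """
    logits = [math.log(y / (trials - y)) for y in successes]
    n = len(logits)
    mean = sum(logits) / n
    resid_var = sum((z - mean) ** 2 for z in logits) / (n - 1)
    return mean, resid_var / n

# Hypothetical responses: the same proportions written out of 20 and out of 200.
y20 = [2, 5, 8, 11, 16, 18]
y200 = [10 * y for y in y20]

est20, var20 = binomial_fit(y20, 20)
est200, var200 = binomial_fit(y200, 200)
lmean20, lvar20 = logit_fit(y20, 20)
lmean200, lvar200 = logit_fit(y200, 200)

print("binomial, N=20 :", est20, var20)
print("binomial, N=200:", est200, var200)    # same estimate, variance / 10
print("logit,    N=20 :", lmean20, lvar20)
print("logit,    N=200:", lmean200, lvar200)  # unchanged by the denominator
```

With these made-up counts, the variance estimated from the logits is also much larger than the binomial model-based variance, which is the overdispersion the binomial fit would miss.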
mixed model with proportion data
2 messages · Dixon, Philip M [STAT], Cade, Brian
Mariano,

Just a follow-up on Phil Dixon's comment, which I think is spot on: you are probably better off modeling the response as the logit of the proportions. But to deal more easily with true zeros or ones, and to avoid the back-transformation bias associated with means of nonlinear transformations like the logit, you might want to consider estimating your models with logistic quantile regression (see Bottai et al. 2010. Statistics in Medicine 29: 309-317) rather than some mean regression model. This is easily done for a fixed-effects model with the quantreg package. There are also mixed-effects variants of quantile regression, but I have not tried to use them in the logistic quantile framework.

Another poster suggested beta regression, which also might be reasonable. In my experience, the logistic quantile regression model has greater flexibility than beta regression to handle true zeros and ones and odd dispersion patterns. And of course, you can back-transform the quantile estimates from the logit scale to the proportion scale without bias.

Brian

Brian S. Cade, PhD
U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818
email: cadeb at usgs.gov <brian_cade at usgs.gov>
tel: 970 226-9326

On Wed, Mar 8, 2017 at 6:20 AM, Dixon, Philip M [STAT] <pdixon at iastate.edu> wrote:
[Quoted text hidden]
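Brian's remark that quantiles back-transform without bias, while means do not, can be illustrated with a minimal sketch. This is not a quantile regression fit; it only compares the median and the mean of a single sample before and after a bounded logit transform. The proportions are hypothetical, and the offset `EPS`, which keeps true zeros and ones finite, is an assumed value chosen for illustration.

```python
import math
import statistics

EPS = 0.001  # assumed small offset so that 0 and 1 have finite transforms

def logit_bounded(y, lo=-EPS, hi=1 + EPS):
    """Logit of y relative to bounds placed slightly outside [0, 1]."""
    return math.log((y - lo) / (hi - y))

def inv_logit_bounded(z, lo=-EPS, hi=1 + EPS):
    """Inverse of logit_bounded, mapping back to the proportion scale."""
    return (hi * math.exp(z) + lo) / (1 + math.exp(z))

# Hypothetical proportions, including a true zero and a true one.
props = [0.0, 0.05, 0.10, 0.30, 0.60, 0.95, 1.0]
logits = [logit_bounded(p) for p in props]

# The median commutes with a monotone transform: back-transforming the
# median logit recovers the median proportion.
median_prop = statistics.median(props)
back_median = inv_logit_bounded(statistics.median(logits))

# The mean does not: back-transforming the mean logit does not give the
# mean proportion.
mean_prop = statistics.mean(props)
back_mean = inv_logit_bounded(statistics.mean(logits))

print("median:", median_prop, "back-transformed:", back_median)
print("mean:  ", mean_prop, "back-transformed:", back_mean)
```

The same equivariance holds for any quantile, which is why estimates from a quantile model on the logit scale can be mapped back to proportions directly.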
_______________________________________________ R-sig-ecology mailing list R-sig-ecology at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-ecology