Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadrat)) y referring to percentage cover of biotic matter. Cheers, Christine
Beta-binomial distributions with lmer?
16 messages · Christine Griffiths, ONKELINX, Thierry, Ben Bolker +5 more
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker
Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadrat)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Thanks. I was hoping for a miracle that this had been developed within the last couple of months. I am on the stats learning curve and am not quite sure how flexible to be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine
--On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadrat)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
That's a good question, answers will differ. Since "all models are wrong" anyway, provided that a mean-variance relationship of V = phi*N*p*(1-p) seems plausible, I would say you should go for it. You're near the cutting edge anyway ... (I don't have a copy, but you might see whether Zuur et al's book has anything to say on the subject -- they're very pragmatic ecologists, and I think they use GEE/quasi models quite a lot ...) Ben Bolker
Christine Griffiths wrote:
Thanks. I was hoping for a miracle that this had been developed within the last couple of months. I am on the stats learning curve and am not quite sure how flexible to be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine --On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadrat)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Dear Christine, We had recently a vivid discussion on whether it is appropriate to model percentages by a (quasi)binomial model. We were modelling the precentage of leaves that is missing from trees. The mixed model with the binomial family had random effects with extremly small variances. My colleague argued that this percentage did not come from a bernouilli experiment. And hence the binomial family was not appropriate. He suggested to put the percentage on a 0 to 100 scale and apply a log(x+1) transformation. This resulted in a linear mixed model with random effects that had reasonable variances. This convinced me that the binomial family only makes sense with binary data. HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Ben Bolker Verzonden: woensdag 10 juni 2009 15:59 Aan: Christine Griffiths CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] Beta-binomial distributions with lmer? That's a good question, answers will differ. Since "all models are wrong" anyway, provided that a mean-variance relationship of V = phi*N*p*(1-p) seems plausible, I would say you should go for it. You're near the cutting edge anyway ... (I don't have a copy, but you might see whether Zuur et al's book has anything to say on the subject -- they're very pragmatic ecologists, and I think they use GEE/quasi models quite a lot ...) Ben Bolker
Christine Griffiths wrote:
Thanks. I was hoping for a miracle that this had been developed within
the last couple of months. I am on the stats learning curve and am not quite sure how flexible to
be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine --On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot
the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadr at)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu /
www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Yes, but ... If the data get "scrunched" near 100% (as well as near zero), then I'm not sure that this procedure would lead to stable variances? (If it does, that's great.) Why not logit((proportion+m)/(1+2*m)) [where m is a small value which can be interpreted as coming from a Bayesian prior, if you like] instead? Once we've done all that, we're getting pretty close to a quasi-binomial model anyway ... (It sounds like all the N values are the same in this example anyway, so there's no scaling of variance with N to worry about.)
ONKELINX, Thierry wrote:
Dear Christine, We had recently a vivid discussion on whether it is appropriate to model percentages by a (quasi)binomial model. We were modelling the precentage of leaves that is missing from trees. The mixed model with the binomial family had random effects with extremly small variances. My colleague argued that this percentage did not come from a bernouilli experiment. And hence the binomial family was not appropriate. He suggested to put the percentage on a 0 to 100 scale and apply a log(x+1) transformation. This resulted in a linear mixed model with random effects that had reasonable variances. This convinced me that the binomial family only makes sense with binary data. HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Ben Bolker Verzonden: woensdag 10 juni 2009 15:59 Aan: Christine Griffiths CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] Beta-binomial distributions with lmer? That's a good question, answers will differ. Since "all models are wrong" anyway, provided that a mean-variance relationship of V = phi*N*p*(1-p) seems plausible, I would say you should go for it. You're near the cutting edge anyway ... (I don't have a copy, but you might see whether Zuur et al's book has anything to say on the subject -- they're very pragmatic ecologists, and I think they use GEE/quasi models quite a lot ...) Ben Bolker Christine Griffiths wrote:
Thanks. I was hoping for a miracle that this had been developed within
the last couple of months. I am on the stats learning curve and am not quite sure how flexible to
be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine --On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot
the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadr at)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu /
www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Dear Ben and Thierry, Thank you for the advice. I tried to do both suggested methods, however got stumped on Ben's suggestion of logit. Thierry's suggestion did improve the variances (e.g. 7.7e-04 to 1.94 for the residual variance) when I used quasipoisson family errors. Given that the values aren't discrete I am not sure this is correct. Ben you only suggest this method if it leads to "stable variance". I have tried searching what is meant by this term, but have not found any information. If you could clarify or point me in the right direction I would gratefully appreciate the assistance. Cheers Christine
--On 10 June 2009 10:25 -0400 Ben Bolker <bolker at ufl.edu> wrote:
Yes, but ... If the data get "scrunched" near 100% (as well as near zero), then I'm not sure that this procedure would lead to stable variances? (If it does, that's great.) Why not logit((proportion+m)/(1+2*m)) [where m is a small value which can be interpreted as coming from a Bayesian prior, if you like] instead? Once we've done all that, we're getting pretty close to a quasi-binomial model anyway ... (It sounds like all the N values are the same in this example anyway, so there's no scaling of variance with N to worry about.) ONKELINX, Thierry wrote:
Dear Christine, We had recently a vivid discussion on whether it is appropriate to model percentages by a (quasi)binomial model. We were modelling the precentage of leaves that is missing from trees. The mixed model with the binomial family had random effects with extremly small variances. My colleague argued that this percentage did not come from a bernouilli experiment. And hence the binomial family was not appropriate. He suggested to put the percentage on a 0 to 100 scale and apply a log(x+1) transformation. This resulted in a linear mixed model with random effects that had reasonable variances. This convinced me that the binomial family only makes sense with binary data. HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Ben Bolker Verzonden: woensdag 10 juni 2009 15:59 Aan: Christine Griffiths CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] Beta-binomial distributions with lmer? That's a good question, answers will differ. Since "all models are wrong" anyway, provided that a mean-variance relationship of V = phi*N*p*(1-p) seems plausible, I would say you should go for it. You're near the cutting edge anyway ... (I don't have a copy, but you might see whether Zuur et al's book has anything to say on the subject -- they're very pragmatic ecologists, and I think they use GEE/quasi models quite a lot ...) Ben Bolker Christine Griffiths wrote:
Thanks. I was hoping for a miracle that this had been developed within
the last couple of months. I am on the stats learning curve and am not quite sure how flexible to
be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine --On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot
the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadr at)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu /
www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
---------------------- Christine Griffiths School of Biological Sciences University of Bristol Woodland Road Bristol BS8 1UG Tel: 0117 9287593 Fax 0117 3317985 Christine.Griffiths at bristol.ac.uk http://www.bio.bris.ac.uk/research/mammal/tortoises.html
Hi, You can fit this model with AD Model Builders random effects module which is now freely available at http://admb-project.org. Rather than arguing a priori about the applicability of various models I prefer to fit them and look at the estimates and diagnostics for each one. dave
David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
Christine Griffiths wrote:
Dear Ben and Thierry, Thank you for the advice. I tried to do both suggested methods, however got stumped on Ben's suggestion of logit. Thierry's suggestion did improve the variances (e.g. 7.7e-04 to 1.94 for the residual variance) when I used quasipoisson family errors. Given that the values aren't discrete I am not sure this is correct. Ben you only suggest this method if it leads to "stable variance". I have tried searching what is meant by this term, but have not found any information. If you could clarify or point me in the right direction I would gratefully appreciate the assistance. Cheers Christine
If you transformed the data in some significant way, then the residual variances aren't necessarily going to be comparable, so I'm not sure I would take that as confirmation. I think Thierry meant to suggest a LMM (i.e., assume normal distributions, no transformation after the initial one) rather than a GLMM (link function/exponential-family distribution or quasi-distribution). You may find more on "stabilizing variance" rather than "stable variance" -- what I meant was that the variability in the Pearson residuals (residuals scaled by the expected standard deviation, which is what lmer gives you) should be independent of the fitted value -- so try plot(sqrt(residuals(model)) ~ fitted(model)) and see if the "amplitude" appears reasonably constant (this is approximately the same as the "scale-location" plot that plot.lm gives you for a linear model).
--On 10 June 2009 10:25 -0400 Ben Bolker <bolker at ufl.edu> wrote:
Yes, but ... If the data get "scrunched" near 100% (as well as near zero), then I'm not sure that this procedure would lead to stable variances? (If it does, that's great.) Why not logit((proportion+m)/(1+2*m)) [where m is a small value which can be interpreted as coming from a Bayesian prior, if you like] instead? Once we've done all that, we're getting pretty close to a quasi-binomial model anyway ... (It sounds like all the N values are the same in this example anyway, so there's no scaling of variance with N to worry about.) ONKELINX, Thierry wrote:
Dear Christine, We had recently a vivid discussion on whether it is appropriate to model percentages by a (quasi)binomial model. We were modelling the precentage of leaves that is missing from trees. The mixed model with the binomial family had random effects with extremly small variances. My colleague argued that this percentage did not come from a bernouilli experiment. And hence the binomial family was not appropriate. He suggested to put the percentage on a 0 to 100 scale and apply a log(x+1) transformation. This resulted in a linear mixed model with random effects that had reasonable variances. This convinced me that the binomial family only makes sense with binary data. HTH, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Ben Bolker Verzonden: woensdag 10 juni 2009 15:59 Aan: Christine Griffiths CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] Beta-binomial distributions with lmer? That's a good question, answers will differ. Since "all models are wrong" anyway, provided that a mean-variance relationship of V = phi*N*p*(1-p) seems plausible, I would say you should go for it. You're near the cutting edge anyway ... (I don't have a copy, but you might see whether Zuur et al's book has anything to say on the subject -- they're very pragmatic ecologists, and I think they use GEE/quasi models quite a lot ...) Ben Bolker Christine Griffiths wrote:
Thanks. I was hoping for a miracle that this had been developed within the last couple of months. I am on the stats learning curve and am not quite sure how flexible to be with regards to distributions. Is quasibinomial acceptable, despite having data with a lot of 0s and a lot of 100s? Many thanks in advance, Christine --On 10 June 2009 09:18 -0400 Ben Bolker <bolker at ufl.edu> wrote:
No. You can use a quasi-binomial model, although the support is a little bit spotty (and beware that quasi- models may falsely report inflation of the random effects). Ben Bolker Christine Griffiths wrote:
Hi R users, Just a query as to whether lme4 can handle beta-binomial distributions as I read that this was not available. If not, any suggestions on how to handle such a distribution to plot the following model: y<-cbind(Biotic,Abiotic) m1<-lmer(y~Treatment+Month.rain+(1|Month)+(1|Block/EnclosureID/Quadr at)) y referring to percentage cover of biotic matter. Cheers, Christine
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
-- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
---------------------- Christine Griffiths School of Biological Sciences University of Bristol Woodland Road Bristol BS8 1UG Tel: 0117 9287593 Fax 0117 3317985 Christine.Griffiths at bristol.ac.uk http://www.bio.bris.ac.uk/research/mammal/tortoises.html
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Dear Ben, Indeed. I suggested to use a LMM with the transformed data. Then I would have a look at how the residuals behave. Best regards, Thierry ------------------------------------------------------------------------ ---- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-mixed-models-bounces at r-project.org [mailto:r-sig-mixed-models-bounces at r-project.org] Namens Ben Bolker Verzonden: woensdag 10 juni 2009 20:14 Aan: Christine Griffiths CC: r-sig-mixed-models at r-project.org Onderwerp: Re: [R-sig-ME] Beta-binomial distributions with lmer?
Christine Griffiths wrote:
Dear Ben and Thierry, Thank you for the advice. I tried to do both suggested methods, however got stumped on Ben's suggestion of logit. Thierry's suggestion
did improve the variances (e.g. 7.7e-04 to 1.94 for the residual variance) when I used quasipoisson family errors. Given that the values aren't discrete I am not sure this is correct. Ben you only suggest this method if it leads to "stable variance". I have tried searching what is meant by this term, but have not found any information. If you could clarify or point me in the right direction I
would gratefully appreciate the assistance.
Cheers Christine
If you transformed the data in some significant way, then the residual variances aren't necessarily going to be comparable, so I'm not sure I would take that as confirmation. I think Thierry meant to suggest a LMM (i.e., assume normal distributions, no transformation after the initial one) rather than a GLMM (link function/exponential-family distribution or quasi-distribution). You may find more on "stabilizing variance" rather than "stable variance" -- what I meant was that the variability in the Pearson residuals (residuals scaled by the expected standard deviation, which is what lmer gives you) should be independent of the fitted value -- so try plot(sqrt(residuals(model)) ~ fitted(model)) and see if the "amplitude" appears reasonably constant (this is approximately the same as the "scale-location" plot that plot.lm gives you for a linear model). Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Hello All, I would like to use GLMMs with a binary response variable (logit link) to model the effects of three environmental covariates on whether resource units were used or unused by a wildlife species. I have 15 different study areas, and very different numbers of used and unused units in each. I'm interested in using fixed effects parameters estimates to predict the relative probabilities that resource units will be used across the entire population of study areas. Numbers of used and unused units in each area look something like this: Area Unused Used 01 281 2 02 4415 1 03 343 30 04 256 1 05 2052 4 06 4050 1 07 238 2 08 743 3 09 2476 18 10 2524 1 11 805 1 12 754 4 13 272 1 14 52 1 15 124 1 I've been using study area as a grouping factor for a random intercept and random slope effects: fullmodel<-glmer(Used~1+x1+x2+x3+(1+x1+x2+x3|Area), family=binomial, data=mydata) Using 'glmer', I've been able to fit models to my data without convergence issues, model fit is pretty good, and the results seem to make sense. My questions are: Given that the number of used units in each area are very unbalanced, to what degree can I generalize across the entire population of study areas? Will my estimates for the fixed effects parameters be so reliant on areas 3 and 9 that I'm really just limited to inferences on these two areas? Is there a way to quantify the relative weight of each study area in the estimation of the fixed effects parameters (i.e. the degree to which I can generalize across the entire population of study areas)? I've read of borrow strength, which will certainly play a big role with this dataset, but I haven't found any examples of datasets that are as unbalanced as mine. I realize that my questions relate to mixed models in general and less to their implementation in R, so I hope I'm not out-of-line in posting these questions here. I'd guess there are probably answers to these questions in the literature, so I'd truly appreciate any advice on where I should look for more info. Thanks in advance, -Grant
Hello, as a simple quick check, have you tried fitting your model without area 3/9 or without both of them and compared the estimates? You could then also look at how well your fixed effects estimates predict the values in the left-out areas. HTH Cheers, Luca ----- Original Message ----- From: Grant T. Stokke <gts127 at psu.edu> To: r-sig-mixed-models at r-project.org Sent: Wed, 10 Jun 2009 23:41:52 -0400 (EDT) Subject: [R-sig-ME] GLMMs with unequal group sizes Hello All, I would like to use GLMMs with a binary response variable (logit link) to model the effects of three environmental covariates on whether resource units were used or unused by a wildlife species. I have 15 different study areas, and very different numbers of used and unused units in each. I'm interested in using fixed effects parameters estimates to predict the relative probabilities that resource units will be used across the entire population of study areas. Numbers of used and unused units in each area look something like this: Area Unused Used 01 281 2 02 4415 1 03 343 30 04 256 1 05 2052 4 06 4050 1 07 238 2 08 743 3 09 2476 18 10 2524 1 11 805 1 12 754 4 13 272 1 14 52 1 15 124 1 I've been using study area as a grouping factor for a random intercept and random slope effects: fullmodel<-glmer(Used~1+x1+x2+x3+(1+x1+x2+x3|Area), family=binomial, data=mydata) Using 'glmer', I've been able to fit models to my data without convergence issues, model fit is pretty good, and the results seem to make sense. My questions are: Given that the number of used units in each area are very unbalanced, to what degree can I generalize across the entire population of study areas? Will my estimates for the fixed effects parameters be so reliant on areas 3 and 9 that I'm really just limited to inferences on these two areas? Is there a way to quantify the relative weight of each study area in the estimation of the fixed effects parameters (i.e. the degree to which I can generalize across the entire population of study areas)? I've read of borrow strength, which will certainly play a big role with this dataset, but I haven't found any examples of datasets that are as unbalanced as mine. I realize that my questions relate to mixed models in general and less to their implementation in R, so I hope I'm not out-of-line in posting these questions here. I'd guess there are probably answers to these questions in the literature, so I'd truly appreciate any advice on where I should look for more info. Thanks in advance, -Grant _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Hello Grant, related to the previous remark (re-fitting the model without some of the areas), you might be interested in the influence.ME package, which I developed. Although focused on measures of influential data regarding variables at the level of the area (in your example), you could use the estex() function which will return the fixed estimates iteratively excluding each of the areas. If you need help with using the package, you can contact me off-list. Kind regards, Rense Nieuwenhuis
On 11 jun 2009, at 10:12, Luca Borger wrote:
Hello, as a simple quick check, have you tried fitting your model without area 3/9 or without both of them and compared the estimates? You could then also look at how well your fixed effects estimates predict the values in the left-out areas. HTH Cheers, Luca ----- Original Message ----- From: Grant T. Stokke <gts127 at psu.edu> To: r-sig-mixed-models at r-project.org Sent: Wed, 10 Jun 2009 23:41:52 -0400 (EDT) Subject: [R-sig-ME] GLMMs with unequal group sizes Hello All, I would like to use GLMMs with a binary response variable (logit link) to model the effects of three environmental covariates on whether resource units were used or unused by a wildlife species. I have 15 different study areas, and very different numbers of used and unused units in each. I'm interested in using fixed effects parameters estimates to predict the relative probabilities that resource units will be used across the entire population of study areas. Numbers of used and unused units in each area look something like this: Area Unused Used 01 281 2 02 4415 1 03 343 30 04 256 1 05 2052 4 06 4050 1 07 238 2 08 743 3 09 2476 18 10 2524 1 11 805 1 12 754 4 13 272 1 14 52 1 15 124 1 I've been using study area as a grouping factor for a random intercept and random slope effects: fullmodel<-glmer(Used~1+x1+x2+x3+(1+x1+x2+x3|Area), family=binomial, data=mydata) Using 'glmer', I've been able to fit models to my data without convergence issues, model fit is pretty good, and the results seem to make sense. My questions are: Given that the number of used units in each area are very unbalanced, to what degree can I generalize across the entire population of study areas? Will my estimates for the fixed effects parameters be so reliant on areas 3 and 9 that I'm really just limited to inferences on these two areas? Is there a way to quantify the relative weight of each study area in the estimation of the fixed effects parameters (i.e. the degree to which I can generalize across the entire population of study areas)? I've read of borrow strength, which will certainly play a big role with this dataset, but I haven't found any examples of datasets that are as unbalanced as mine. I realize that my questions relate to mixed models in general and less to their implementation in R, so I hope I'm not out-of-line in posting these questions here. I'd guess there are probably answers to these questions in the literature, so I'd truly appreciate any advice on where I should look for more info. Thanks in advance, -Grant
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On Wed, Jun 10, 2009 at 11:41 PM, Grant T. Stokke<gts127 at psu.edu> wrote:
Hello All, I would like to use GLMMs with a binary response variable (logit link) to model the effects of three environmental covariates on whether resource units were used or unused by a wildlife species. I have 15 different study areas, and very different numbers of used and unused units in each. I'm interested in using fixed effects parameters estimates to predict the relative probabilities that resource units will be used across the entire population of study areas. Numbers of used and unused units in each area look something like this: Area Unused Used 01 281 2 02 4415 1 03 343 30 04 256 1 05 2052 4 06 4050 1 07 238 2 08 743 3 09 2476 18 10 2524 1 11 805 1 12 754 4 13 272 1 14 52 1 15 124 1 I've been using study area as a grouping factor for a random intercept and random slope effects: fullmodel<-glmer(Used~1+x1+x2+x3+(1+x1+x2+x3|Area), family=binomial, data=mydata)
Does this mean that the number of Unused units is not included anywhere in the model?
Well there are papers that argue against this sort of ad hoc approach. See for example http://psychology3.anu.edu.au/people/smithson/details/Pubs/Smithson_Verkuilen06.pdf which is about beta regression for bounded dependent variables.
David A. Fournier P.O. Box 2040, Sidney, B.C. V8l 3S3 Canada Phone/FAX 250-655-3364 http://otter-rsch.com
Sorry, I should have been more clear. The number of unused units is included. The response variable is "Used", so that used units have Used=1 and unused units have Used=0. Thanks for the input. I've been looking at the influence of the areas with many used units on parameter estimates (by removing areas 3 and 9), and they appear to be more influential than I would like. Ideally, I'd like each study area to receive approximately equal weight in parameter estimation. Is that possible? Would it be inappropriate to use the 'weight' option in glmer to give study areas more equal weight? Any suggestions? Thanks, -Grant ----- Original Message ----- From: "Daniel Ezra Johnson" <danielezrajohnson at gmail.com> To: "Grant T. Stokke" <gts127 at psu.edu> Cc: <r-sig-mixed-models at r-project.org> Sent: Thursday, June 11, 2009 8:56 AM Subject: Re: [R-sig-ME] GLMMs with unequal group sizes
On Wed, Jun 10, 2009 at 11:41 PM, Grant T. Stokke<gts127 at psu.edu> wrote:
Hello All, I would like to use GLMMs with a binary response variable (logit link) to model the effects of three environmental covariates on whether resource units were used or unused by a wildlife species. I have 15 different study areas, and very different numbers of used and unused units in each. I'm interested in using fixed effects parameters estimates to predict the relative probabilities that resource units will be used across the entire population of study areas. Numbers of used and unused units in each area look something like this: Area Unused Used 01 281 2 02 4415 1 03 343 30 04 256 1 05 2052 4 06 4050 1 07 238 2 08 743 3 09 2476 18 10 2524 1 11 805 1 12 754 4 13 272 1 14 52 1 15 124 1 I've been using study area as a grouping factor for a random intercept and random slope effects: fullmodel<-glmer(Used~1+x1+x2+x3+(1+x1+x2+x3|Area), family=binomial, data=mydata)
Does this mean that the number of Unused units is not included anywhere in the model?