Dear R-sig-me, How can I validate the fit of a binomial (presence/absence) GLM? Normally in linear modelling there is a nice array of tools (Tukey-Anscombe plots, QQ plots, residuals vs explanatory variables, correlation plots) that can be used to convince yourself that the fit is OK. But when you start dealing with a binomial (presence/absence) GLM, the whole thing kind of breaks down and starts getting ugly. For a Poisson GLM, you can go for Pearson residuals, but what would be the equivalent for a binomial GLM? Does anyone have suggestions on how to approach this problem? Are there any "best practices" that I am unaware of in this regard? Best wishes, Mark
Model validation for Presence / Absence (binomial) GLMs
6 messages · Hugh Sturrock, Mark Payne, Chris Howden +2 more
Chris Howden <chris at ...> writes:
This is something I always battle with given the plethora of great model fitting methods available for other models. I always use a variant of Hugh's suggestion and look at the % of correct predictions between models as a quick model fitting statistic. And for overdispersion I believe one way is to fit individual level random effects and see if this is a substantively better model. There is more on this in the wiki http://glmm.wikidot.com/faq
Yes, but this is unidentifiable for Bernoulli responses (as also explained there). It's not as systematic, but where possible I like to compare parametric fits to a less-parametric fit, either a (marginal) GAM fit or binning the data and computing (marginal) mean proportions (and possibly binomial CIs) within bins (the latter is essentially the basis of the Hosmer-Lemeshow test). The effects of other variables might lead to either a false positive or a false negative when comparing non-parametric marginal to parametric conditional predictions, but it's a start. Ben Bolker
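A rough sketch of the binning comparison described above may make it more concrete. It is written in Python with only the standard library so that it is self-contained; on this list one would normally do the same thing in R. Everything in it is an assumption made for illustration: the data are simulated, there is a single predictor `x`, the helper names (`fit_logistic`, `wilson_ci`, `binned_check`) are hypothetical, and the Wilson interval is just one of several possible binomial CIs. The idea is to sort the observations by the predictor, cut them into bins, and check whether the mean fitted probability in each bin falls inside the CI of the observed proportion.

```python
import math
import random

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logistic(x, y, iters=25):
    """Fit logit(p) = b0 + b1 * x to 0/1 data by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def wilson_ci(k, n, z=1.96):
    """Approximate 95% Wilson interval for k successes out of n."""
    phat = k / n
    denom = 1.0 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def binned_check(x, y, b0, b1, n_bins=5):
    """Per bin of x: observed proportion, its CI, and the mean fitted probability."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    size = len(x) // n_bins
    rows = []
    for b in range(n_bins):
        # the last bin absorbs any leftover observations
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        n = len(idx)
        k = sum(y[i] for i in idx)
        pred = sum(inv_logit(b0 + b1 * x[i]) for i in idx) / n
        lo, hi = wilson_ci(k, n)
        rows.append((k / n, lo, hi, pred))
    return rows

# Simulated presence/absence data; the "true" model here is an assumption
random.seed(1)
x = [random.uniform(-2, 2) for _ in range(500)]
y = [1 if random.random() < inv_logit(-0.5 + 1.2 * xi) else 0 for xi in x]

b0, b1 = fit_logistic(x, y)
rows = binned_check(x, y, b0, b1)
for obs, lo, hi, pred in rows:
    flag = "" if lo <= pred <= hi else "  <-- fitted mean outside CI"
    print(f"observed {obs:.2f} [{lo:.2f}, {hi:.2f}]  fitted {pred:.2f}{flag}")
```

Bins where the fitted mean falls well outside the interval hint at a misspecified functional form for that predictor. As noted above, with several covariates this marginal comparison can mislead in either direction, so it is a starting point rather than a formal test.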
Ben Bolker <bbolker at ...> writes:
--- snip --- Also see the binomTools package on CRAN for some diagnostic tests for binomial models. Ken