Model fit with a Poisson, cross-validation?
I'm not sure I would call this "cross-validation" (unless I misunderstand what you're trying to do). CV usually means re-fitting the model with one or more data points held out each time to see how much the results vary. Andrew Gelman discusses "posterior predictive checks" at length (e.g. in Gelman and Hill 2006), and these may be close to what you have in mind.

Depending on what you want to do, the raw material would normally come from the simulate() method for a fitted GLMM (mer) object, but I think it doesn't work with the currently released version of lme4; a working version is in the "allcoef" branch. An alternative is to download <http://glmm.wikidot.com/local--files/basic-glmm-simulation/glmmfuns.R> and use the my.mer.sim() function to simulate from the fitted model.

For what it's worth, your description of your fitting process sounds sensible.

Good luck,
Ben Bolker
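Assuming a later lme4 release in which simulate() does work for glmer fits, a check in the spirit of a posterior predictive check might be sketched as below. It simulates replicate data sets from the fitted model and asks whether an observed summary (here, the proportion of zero counts) looks typical of the simulated ones. Strictly, this simulates from the point estimates rather than from a full posterior. The data frame `d`, the covariate `wy` and the choice of statistic are hypothetical illustrations; by default simulate() draws new site-level random effects for each replicate (see ?simulate.merMod and its re.form argument).

```r
# Sketch of a simulation-based model check for a Poisson GLMM.
# Assumes a recent lme4 in which simulate() supports glmer fits;
# the data are simulated here purely for illustration (hypothetical).
library(lme4)

set.seed(101)
n_site <- 10
n_year <- 15
d <- expand.grid(site = factor(seq_len(n_site)), year = seq_len(n_year))
d$wy <- rnorm(nrow(d))                   # hypothetical climate covariate
site_eff <- rnorm(n_site, sd = 0.5)      # true site-level random effects
eta <- 1 + 0.3 * d$wy - 0.1 * d$year + site_eff[as.integer(d$site)]
d$trees <- rpois(nrow(d), exp(eta))

fit <- glmer(trees ~ wy + year + (1 | site), family = poisson, data = d)

# Replicate response vectors from the fitted model, one column per draw.
nsim <- 200
sims <- simulate(fit, nsim = nsim, seed = 102)

# Observed summary statistic versus its simulated distribution.
stat <- function(y) mean(y == 0)         # proportion of zero counts
obs_stat <- stat(d$trees)
sim_stat <- vapply(sims, stat, numeric(1))

# Tail-area probability: values near 0 or 1 suggest the model does not
# reproduce this feature of the data.
p_val <- mean(sim_stat >= obs_stat)
cat("observed:", obs_stat, " simulated mean:", mean(sim_stat),
    " tail probability:", p_val, "\n")
```

Any other statistic (the variance, the largest count, totals per site) can be substituted for stat(); looking at several is usually more informative than relying on one.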
Lindsay Reynolds wrote:
Hello List,

I am in the process of learning mixed models in R and have a basic question. I am currently working on a model selection analysis with a suite of mixed models and Poisson-distributed count data. After reading Bolker et al. 2009 (Trends in Ecology & Evolution 24:127-135) and having a basic understanding of standard model selection analysis (Burnham & Anderson), I was convinced that I could use AICc alone to determine the best models. However, it has been suggested to me that I also include some sort of "R^2" value in my analysis to measure the absolute fit of the model to the data. Since this does not exist for mixed models with Poisson-distributed data, it was further suggested that I try cross-validating my models by building a predicted data set that I could compare to my observed data set. Can anyone point me to references where this sort of thing has been done with mixed models in R? I would be much obliged.

More details on my analysis: my data are counts of trees established per year within 'site'. I have built several models that include various combinations of climate variables as fixed explanatory variables, and all models have 'site' as a random effect. In every model I include a continuous predictor variable called 'year' (year = 1, 2, 3, ..., n), which accounts for the fact that we expect there to always be more young trees than old trees due to natural mortality.

I have tested for overdispersion using the penalized, weighted residual sum of squares (pwrss) divided by the number of observations, pwrss/n. The values range between 0.9 and 2. I have interpreted this to mean that my data are not too overdispersed, so I have continued using the Poisson distribution in my models. I have also run all my models with both Poisson and quasi-Poisson families, and the results are very similar.

My models look like this, with variations on the fixed effects:

rosite <- glmer(trees ~ wy + wy1 + year + (1|site), family = poisson)

Many thanks,
Lindsay

~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lindsay Reynolds
Ph.D. Candidate, Graduate Degree Program in Ecology
Office location: Forestry 208
Colorado State University
Campus Delivery 1472
Ft. Collins, CO 80523-1878
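For comparison with the pwrss/n check described in the question, a commonly used alternative is the ratio of the sum of squared Pearson residuals to the residual degrees of freedom, together with an approximate chi-square p-value. The sketch below assumes lme4 is installed; the data, column names and the negative binomial counts (included only to show what clear overdispersion looks like) are hypothetical. The chi-square approximation is rough for mixed models, so the ratio is best read as a descriptive guide rather than a formal test.

```r
# Sketch of a Pearson chi-square overdispersion check for Poisson GLMMs.
# Assumes lme4 is installed; all data below are simulated (hypothetical).
library(lme4)

# Ratio of squared Pearson residuals to residual df, plus an approximate
# upper-tail chi-square p-value. Ratios well above 1 suggest overdispersion.
overdisp_check <- function(model) {
  rdf <- df.residual(model)
  chisq <- sum(residuals(model, type = "pearson")^2)
  c(ratio = chisq / rdf,
    p = pchisq(chisq, df = rdf, lower.tail = FALSE))
}

set.seed(201)
n_site <- 10
n_year <- 15
d <- expand.grid(site = factor(seq_len(n_site)), year = seq_len(n_year))
d$wy <- rnorm(nrow(d))
site_eff <- rnorm(n_site, sd = 0.5)
eta <- 2 + 0.3 * d$wy - 0.1 * d$year + site_eff[as.integer(d$site)]
d$pois <- rpois(nrow(d), exp(eta))                      # Poisson counts
d$negbin <- rnbinom(nrow(d), mu = exp(eta), size = 1)   # overdispersed counts

fit_pois <- glmer(pois ~ wy + year + (1 | site), family = poisson, data = d)
fit_nb <- glmer(negbin ~ wy + year + (1 | site), family = poisson, data = d)

res_pois <- overdisp_check(fit_pois)
res_nb <- overdisp_check(fit_nb)
print(rbind(poisson_data = res_pois, negbin_data = res_nb))
```

With the Poisson counts the ratio should sit near 1, while the negative binomial counts fitted as Poisson should give a ratio well above 1.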
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc