Modelling count data in glmer with an a priori model selection approach
On 17-04-17 08:51 PM, Lorraine Scotson wrote:
Hi All, I am modelling bear distribution in Lao PDR in glmer, with sign count data collected on transects, using a degrees-of-freedom-spending, a priori modelling approach. I have calculated the number of degrees of freedom my model can afford based on my effective sample size (i.e. the number of line transects), with degrees of freedom counted as the number of non-intercept model-generated coefficients to be estimated. I have study site as a random effect (n = 7).
Out of curiosity, how many df *can* you afford (how many line transects)?
My objectives are to model bear occurrence as a function of covariates, to rank those covariates in order of importance, and to predict the distribution of bears throughout the whole country (i.e. extrapolate outside the study sites). This is my first experience with an a priori modelling strategy, and I have a number of questions for which I have not found answers in the published literature. I would be grateful for any advice anyone may have: - how many degrees of freedom will including a 7-level random effect incur?
If you don't allow for variation in covariate effects across sites (i.e. a random intercept only), 1. If you allow for (correlated) variation in n covariate effects across sites, n*(n+1)/2 variance-covariance parameters. (The number of levels of the random effect does not affect this count, although 7 sites is small for a random effect -- you might end up with a singular fit and have to decide what to do about it.)
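You can count these variance-covariance parameters directly from a fitted lme4 model; here is a sketch using the package's built-in sleepstudy data for illustration (with a correlated random intercept and slope, n = 2 effects, so 2*3/2 = 3 parameters):

```r
## Sketch: counting random-effect variance-covariance parameters in lme4
## (uses the built-in sleepstudy data purely for illustration)
library(lme4)

## random intercept only: 1 variance parameter
fm1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
length(getME(fm1, "theta"))  ## 1

## correlated random intercept + slope (n = 2 effects): 2*3/2 = 3 parameters
fm2 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
length(getME(fm2, "theta"))  ## 3
```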
- My understanding is that I must pick my probability distribution (e.g. Poisson, negative binomial) a priori, and so I cannot use the usual post-model checks to determine whether my chosen distribution was appropriate. Is this correct?
You should choose your probability distribution a priori, but you *can* (and should) use post-fitting checks (scale-location, Q-Q, overdispersion analysis, etc.) to see if there are any big problems with your choice.
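A minimal version of the overdispersion check, following the Pearson-residual approach described in the GLMM FAQ (`fit` here is a placeholder for any fitted glmer model):

```r
## Sketch of a Pearson-residual overdispersion check for a fitted glmer
## model, following the approach described in the GLMM FAQ
overdisp_check <- function(fit) {
  rdf <- df.residual(fit)                ## residual degrees of freedom
  rp  <- residuals(fit, type = "pearson")
  X2  <- sum(rp^2)                       ## Pearson chi-square statistic
  c(chisq = X2,
    ratio = X2 / rdf,                    ## values >> 1 suggest overdispersion
    p     = pchisq(X2, df = rdf, lower.tail = FALSE))
}
```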
- My understanding is that I'll be penalized an extra degree of freedom by using a negative binomial distribution. Is this correct?
Yes. But this is a case where "saving" a degree of freedom wouldn't be wise.
- How do I decide between using a Poisson or a negative binomial distribution? Are there post hoc checks I can do, without exploring the relationship between the response and the predictors, to inform my decision?
Yes. Check for overdispersion.
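In practice that can look like fitting both versions and comparing them; a hedged sketch, where `count`, the covariates, and the data frame `d` are placeholders for your own variables:

```r
## Sketch: fit Poisson and negative binomial versions of the same model
## and compare them; all variable names here are placeholders
library(lme4)
fit_pois <- glmer(count ~ cov1 + cov2 + (1 | site),
                  family = poisson, data = d)
fit_nb   <- glmer.nb(count ~ cov1 + cov2 + (1 | site), data = d)
AIC(fit_pois, fit_nb)   ## a much lower NB AIC points to overdispersion
```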
(The literature tells me that count data are rarely Poisson distributed, and that the negative binomial is the most common distribution that accounts for overdispersion. I have ruled out zero-inflation; my response has plenty of zeros, but I feel they will be accounted for by the model covariates.)
- In the context of my study objectives, what are the consequences of using a Poisson distribution when my data are really negative binomial (i.e. does the distribution of the residuals of the response really matter)?
If your data are overdispersed (variance greater than expected from Poisson), you will be in big trouble -- all of your conclusions (p-values, confidence intervals) will be overconfident. I would recommend http://bbolker.github.io/mixedmodels-misc/ , especially "GLMM FAQ" and "supplementary materials for Bolker (2015)", both of which have sections on overdispersion. It would be possible to use a "quasi-likelihood approach" -- correct your estimated confidence intervals and p-values (as well as AICs etc.) for overdispersion, without explicitly using an overdispersed distribution.
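A sketch of that correction, along the lines of the QAIC approach described in the linked supplementary materials (`fit` is a placeholder for a fitted Poisson glmer model):

```r
## Sketch of a quasi-likelihood (QAIC-style) correction for overdispersion;
## `fit` is a placeholder for a fitted Poisson glmer model
phi <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

## inflate Wald standard errors by sqrt(phi), and compute a quasi-AIC by
## dividing the log-likelihood by phi (adding 1 df for estimating phi)
se_corrected <- sqrt(diag(vcov(fit))) * sqrt(phi)
k    <- attr(logLik(fit), "df") + 1
qaic <- -2 * as.numeric(logLik(fit)) / phi + 2 * k
```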
Many thanks in advance for any insights you can offer. Best wishes Lorraine