
Modelling count data in glmer with a priori model selection

2 messages · Lorraine Scotson, Ben Bolker

Hi All,

I am modelling bear distribution in Lao PDR, with sign count data collected
on transects, in glmer, using a degrees-of-freedom-spending, a priori
modelling approach. I have calculated the number of degrees of freedom my
model can afford based on my effective sample size (i.e. the number of line
transects), with degrees of freedom counted as the number of non-intercept
coefficients to be estimated. I have study site as a random effect (n = 7).

My objectives are to model bear occurrence as a function of covariates, to
rank those covariates in order of importance, and to predict the
distribution of bears throughout the whole country (i.e. extrapolate
outside the study sites). This is my first experience with an a priori
modelling strategy, and I have a number of questions for which I have not
found answers in the published literature. I would be grateful for any
advice anyone may have:

- how many degrees of freedom will including a 7-level random effect incur?

- My understanding is that I must pick my probability distribution (i.e.
Poisson, negative binomial) a priori, and so I cannot use the usual
post-model checks to determine whether my chosen distribution was
appropriate. Is this correct?

- My understanding is that I'll be penalized an extra degree of freedom by
using a negative binomial distribution. Is this correct?

- How do I decide between using a Poisson or a negative binomial
distribution? Are there some post hoc checks I can do, without exploring
the relationship between the response and the predictors, to inform my
decision?

(The literature tells me that count data are rarely Poisson distributed,
and that the negative binomial is the most common distribution that
accounts for overdispersion. I have ruled out zero-inflation; my response
has plenty of zeros, but I feel they will be accounted for by the model
covariates.)

- In the context of my study objectives, what are the consequences of using
a Poisson distribution when my data are really negative binomial (i.e. does
the distribution of the residuals of the response really matter)?

Many thanks in advance for any insights you can offer.

Best wishes
Lorraine

On 17-04-17 08:51 PM, Lorraine Scotson wrote:
> - how many degrees of freedom will including a 7-level random effect
> incur?

Out of curiosity, how many df *can* you afford (how many line transects)?
If you don't allow for variation in covariate effects across sites, 1.
If you allow for (correlated) variation in n covariate effects across
sites, n*(n+1)/2. (The number of levels of the random effect does not
affect this conclusion, although 7 sites is small for using a random
effect -- you might end up with a singular model, and have to decide what
to do about it.)

> - My understanding is that I must pick my probability distribution a
> priori, and so I cannot use the usual post-model checks to determine
> whether my chosen distribution was appropriate. Is this correct?

You should choose your probability distribution a priori, but you
*can* (and should) use post-fitting checks (scale-location plots, Q-Q
plots, overdispersion analysis, etc.) to see if there are any big problems
with your choice.

> - My understanding is that I'll be penalized an extra degree of freedom
> by using a negative binomial distribution. Is this correct?

Yes. But this is a case where "saving" a degree of freedom wouldn't
be wise.

> - How do I decide between using a Poisson or a negative binomial
> distribution? Are there some post hoc checks I can do?

Yes. Check for overdispersion.

> - What are the consequences of using a Poisson distribution when my data
> are really negative binomial?

If your data are overdispersed (variance greater than expected under
the Poisson), you will be in big trouble -- all of your conclusions
(p-values, confidence intervals) will be overconfident.
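
The check itself is normally run on the fitted glmer model in R (the
mixedmodels-misc pages below include a ready-made function); as a
language-neutral sketch of the underlying arithmetic, here is the Pearson
dispersion ratio (sum of squared Pearson residuals over residual degrees
of freedom) for an intercept-only Poisson fit to simulated data -- all
numbers made up:

```python
import numpy as np

rng = np.random.default_rng(42)

def dispersion_ratio(y, mu, n_params):
    # sum of squared Pearson residuals / residual degrees of freedom;
    # values much greater than 1 indicate overdispersion
    pearson = (y - mu) / np.sqrt(mu)
    return float(np.sum(pearson ** 2) / (len(y) - n_params))

mean = 4.0
y_pois = rng.poisson(lam=mean, size=2000)
# negative binomial with the same mean (4) but variance 20
y_nb = rng.negative_binomial(n=1.0, p=1.0 / (1.0 + mean), size=2000)

# intercept-only "fit": the Poisson MLE of the mean is the sample mean
r_pois = dispersion_ratio(y_pois, y_pois.mean(), 1)  # close to 1
r_nb = dispersion_ratio(y_nb, y_nb.mean(), 1)        # far above 1
```

A ratio near 1 is consistent with the Poisson assumption; a ratio well
above 1, as for the negative-binomial data here, is the signal that
Poisson-based standard errors would be overconfident.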

  I would recommend http://bbolker.github.io/mixedmodels-misc/,
especially "GLMM FAQ" and "supplementary materials for Bolker (2015)",
both of which have sections on overdispersion.

  It would be possible to use a "quasi-likelihood approach" -- correct
your estimated confidence intervals and p-values (as well as AICs etc.)
for overdispersion, without explicitly using an overdispersed distribution.
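
Concretely, the quasi-likelihood corrections scale inference by an
estimate of the overdispersion, c-hat (e.g. the Pearson dispersion ratio):
standard errors are inflated by sqrt(c-hat), and AIC is replaced by
QAIC = -2*logLik/c-hat + 2k. A sketch with made-up numbers (the function
names are mine, not from any package):

```python
import math

def quasi_se(se, c_hat):
    # quasi-likelihood standard error: inflate by sqrt(c_hat)
    return se * math.sqrt(c_hat)

def qaic(loglik, c_hat, k):
    # QAIC: -2*logLik scaled by c_hat, plus the usual 2k penalty
    # (conventionally k is increased by 1 to count c_hat itself)
    return -2.0 * loglik / c_hat + 2.0 * k

c_hat = 2.5                      # hypothetical dispersion estimate
se_adj = quasi_se(0.10, c_hat)   # 0.10 * sqrt(2.5) ~= 0.158
q = qaic(-150.0, c_hat, 6)       # 300/2.5 + 12 = 132.0
```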