Dealing with Overdispersion in Count Data with Mixed Modeling
On 11/11/2010 02:11 PM, David Stainbrook wrote:
Ben, Putting aside the issues of model convergence, I would like to focus on the issue of how to deal with overdispersion in count data. Typically, a negative binomial or quasi-poisson model deals with overdispersion in situations where it is not appropriate to use Poisson. However, to my knowledge, lme4 does not allow multiple random effects using the negative binomial family
[correction: it doesn't implement negative binomial models at all, although I believe that *in* principle this could be added by using an additional 'k' parameter that controlled the mean-variance relationship, in a way analogous to glm.nb() in the MASS package]. and quasi-poisson has issues with the
likelihood estimation. As you suggested in a previous email, and from the article at <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701 <http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=82701>>, translating the model to a lognormal-Poisson model by using an individual-level random variable and using family=poisson should deal with the overdispersion.
Yes.
I wanted to clarify what you meant by the individual level to make sure that I am doing it correctly. I already have a random intercept for each individual (1|LT), where LT=lowest tag number for each individual, to account for pseudoreplication. For the individual-level random variable to deal with overdispersion, did you mean adding (1|Observation), where Observation is a vector from 1 to the total number of observations or records in the dataset?
yes, exactly.
This is demonstrated here in a very simple model with no covariates: model_1<-lmer(Y~1 + (1|LT) + (1|Observation), data=df3, ! family=poisson) and here with an additional covariate (covx) with both fixed and random effects: model_2<-lmer(Y~1 + (1+covx|LT) + (1|Observation) + covx, data=df3, family=poisson) Is this what you meant and is it coded correctly? I tried this code with an older version of R and lme4, and it yielded an error and no summary data, but after updating both the version of R and the lme4 library, I received a warning "Number of levels of a grouping factor for the random effects is *equal* to n, the number of observations", but it yielded summary information that I took to be legitimate.
That is what I would expect. Ben Bolker
Thanks for the help,
..................................................................
David Stainbrook
M.S. Graduate Research Assistant
Pennsylvania Cooperative Fish & Wildlife Research Unit
The Pennsylvania State University
..................................................................
On Mon, Nov 1, 2010 09:53 PM, *Ben Bolker <bbolker at gmail.com>* wrote:
[cc'ing r-sig-mixed-models: it's best to keep sending replies back to
the list so they can be archived and others can read them, or offer input]
On 10-11-01 01:28 PM, David Stainbrook wrote:
> Ben,
>
> Thanks for your input. I read that article that you suggested and it
> appears they used SAS and Genstat to do their analysis.
Yes (although as I said at the time, I wouldn't actually trust the
methods that they used in Genstat for this problem. I just think their
description of the problem is clear).
> Is it possible
> to use the Poisson-lognormal model in R or translate the model to this
> using R and lmer? Another professor mentioned that I may be able to get
> it to work using a negative binomial model in SAS or ADModel Builder.
> What do you suggest?
Yes, you can use the Poisson-lognormal in recent versions of lme4,
simply by including an individual-level random variable. You may get
warnings.
You could indeed use a negative binomial model in SAS or AD Model
Builder (in ADMB you could also use the lognormal-Poisson model).
> Do you have any idea why Doug allowed the lmer function to fit
> quasipoisson if he doesn't feel that the results will be reliable? I
> would have trusted my results and wouldn't have had any idea that they
> might have been unreliable if he had not said that.
I believe he implemented it a while ago and his opinions have now
changed. (I agree that it might be a good idea to disable this
functionality.)
> Also, do you have any idea how to increase both the default number of
> function evaluations and iterations with the control statement within
> the lmer model statement?
...,control=list(maxIter=2000,maxFN=3000),... should work.
* you seem to be tackling a difficult problem. I appreciate that
you're offering full details on your problem (full scripts and data),
but it's going to take someone else at least half an hour (and probably
quite a bit more) to get up to speed on what you're doing and what's not
working; unfortunately, that's more than most anyone has time for,
unless the problem happens to be something very close to their
interests. Unfortunately, you may well need to find local help for this
(your advisor? a friendly stats professor or graduate student?) - <I already exhausted those options, hence contacting you and Doug>
* it's possible, depending on the complexity of your model, that
you're simply trying to fit too complicated a model. You do have a lot
of data points, but some of your covariates may be strongly correlated.
Have you tried:
- seeing if you can successfully fit a subset of the data points
(this could be faster, allowing you to debug quicker)?
- seeing if you can successfully fit a subset of the covariates, or
which covariates or combinations of covariates are problematic?
- seeing if you can successfully fit a non-mixed (GLM) model,
treating 'individual' as a fixed effect?
- simulating data, possibly in a simplified form, to see if you can
get the right answer when you know what it is?
* lme4 is quite finicky about convergence, on the philosophy that it's
better not to give an answer than to give a wrong one.
R does have its advantages, but if you're up to working with SAS or AD
Model Builder I would recommend you also try those approaches -- see if
you run into the same problems. But I would definitely try some of the
trouble-shooting strategies above, first.
good luck,
Ben Bolker
> Thanks again,
>
> David
>
> On Thu, Oct 28, 2010 04:49 PM, *Ben Bolker <bbolker at gmail.com>*
wrote:
>
> My advice would be to use an individual-level random variable
> (translating to a lognormal-Poisson model, which is qualitatively
> similar to a negative binomial) -- see e.g. Elston et al 2001 for
a
> decent explanation, although you should not necessarily trust the
> numeric methods they use ...
>
> [Elston, D. A., R. Moss, T. Boulinier, C. Arrowsmith, and X. Lambin.
> 2001. Analysis of Aggregation, a Worked Example: Numbers of Ticks on
Red
> Grouse Chicks. Parasitology 122, no. 05: 563-569.
> doi:10.1017/S0031182001007740.
>
>
>
> (Doug, thanks for the vote of confidence!)
>
> cheers
> Ben
>
>
>
>
>