Poisson mixed models: Non-integer response variable in lmer?
[cc'ing back to r-sig-mixed]
On 03/15/2011 01:37 AM, Daniel Barton wrote:
> Thanks for your response Ben. I spent some time playing some
> simulation games in R with vectors drawn from distributions with known
> parameters and then scaling them for giggles, but as you noted,
> upsetting the expected mean-variance relationship was clearly the
> important issue from the get-go (dividing a vector of poisson
> distributed rates makes them underdispersed!). Yet my friend/colleague
> made this very strange argument: if /scaling the data creates
> poisson-distributed rates/ from overdispersed count data, i.e. if we
> divide the original data with say a mean of 3 and variance of 30 by 8,
> then isn't this the right thing to do?

I get the general point, although I guess in this case you would want
to divide the original data by 10 (mean = 3/10 = var = 30/100)?

> This just seemed against all of my training (even though as I noted
> I'm not a statistician) because the point there is that the original
> count data is overdispersed, not that dividing it by some effort
> variable makes it seem poisson distributed (even though it's now rate
> data... odd). Is there some critical reference that I've missed that I
> could just point to that suggests /not/ to engage in such strange
> practices?

I don't think there's a reference: welcome to the cutting edge ... I
agree (and was almost going to mention) that under other circumstances
(quasi-likelihood estimation), we do almost the equivalent of this
scaling in order to remove overdispersion. I don't think this will
necessarily work right (I haven't thought it all the way through) with
sampling periods of different lengths/sizes, though.

  Ben Bolker
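The arithmetic behind "divide by 10" can be checked with a small simulation. This is only a sketch under stated assumptions: the negative binomial distribution, the sample size, and the seed are illustrative choices that do not come from the thread. It covers only the case of a single common divisor and says nothing about sampling periods of different lengths.

```r
# Illustrative simulation; distribution, sample size and seed are assumptions.
set.seed(102)
# Negative binomial with mean 3 and variance mu + mu^2/size = 3 + 9/(1/3) = 30
y <- rnbinom(1e5, mu = 3, size = 1/3)

phi_moment <- var(y) / mean(y)              # about 30/3 = 10
# Intercept-only quasi-Poisson fit: its dispersion estimate is the same ratio
phi_quasi <- summary(glm(y ~ 1, family = quasipoisson))$dispersion

y_scaled <- y / phi_moment                  # divide by about 10, not 8
ratio_scaled <- var(y_scaled) / mean(y_scaled)   # 1 by construction

print(c(moment = phi_moment, quasi = phi_quasi, scaled = ratio_scaled))
```

In this intercept-only setting the quasi-Poisson dispersion estimate and the variance-to-mean ratio coincide, which is the sense in which quasi-likelihood does "almost the equivalent" of the rescaling.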
> Best,
> Dan Barton
On Mon, Mar 14, 2011 at 6:39 PM, Ben Bolker <bbolker at gmail.com> wrote:
On 11-03-14 07:14 PM, Daniel Barton wrote:
> Hello,
> Thanks to everyone who contributes to this list! I often
> find random questions I have answered in the archives of this list.
>
> My specific question of the moment, a simplified example of what I'm
> doing that I hope illustrates my question...
>
> If we have a poisson-distributed response variable in a mixed model
> such as called by:
>
> lmer(amrotot ~ year + (year|route), family=poisson(link=log))
>
> where amrotot is an integer count, year is, well, the year (as
> a linear predictor, not a factor) and route is a sampling unit. If
> 'exposure' varies by route, we can define another model with an
> offset such as:
>
> lmer(amrotot ~ year + (year|route), offset=log(effort), family=poisson(link=log))
>
> this all seems, generally good and fine. A colleague asked
> me why not use (amrotot/effort) as the response variable, but this of
> course results in a non-integer response variable. Yet it turns out,
> lmer (or glm, for that matter) will indeed estimate a model using the
> non-integer response variable (amrotot/effort) but gives warnings. I
> understand that poisson regression assumes a poisson-distributed
> integer response variable, but I was curious about *why* lmer would
> provide results for non-integer response variables such as
> (amrotot/effort) and if these results are valid or somehow comparable
> to results where amrotot is the response and effort is an offset, with
> special reference to the confidence intervals of the random effects.
> Using non-integer response variables in poisson regression looks and
> seems wrong to me, but IANA statistician and maybe lmer is doing
> something I don't quite get to make this work.
It won't work: there's a reason that Poisson models are restricted
to count data. In particular, the assumption in a Poisson model is that
the (expected) variance is equal to the (expected) mean for any data
point: if you can scale the data points, then the variance-to-mean
relationship will change with the units used, something you probably
don't want.
e.g. if the sampling period is 1 hour and the expected count in the
sampling period is 1, the mean and variance will both be 1 (unitless);
if you divide the counts by 60 to get counts per minute, then the mean
will be scaled to 1/60 counts/minute but the variance will be scaled to
1/3600 (counts/minute)^2, so the two are no longer equal ....
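The hour-versus-minute example can be reproduced numerically. The following is a minimal sketch in base R; the sample size and seed are arbitrary assumptions, and the variable names are made up for illustration.

```r
# Illustrative simulation; the sample size and seed are arbitrary choices.
set.seed(101)
counts_per_hour <- rpois(1e5, lambda = 1)   # Poisson counts, mean = variance = 1
counts_per_min  <- counts_per_hour / 60     # same data expressed as a rate

# Variance-to-mean ratio: about 1 for the raw counts ...
ratio_hour <- var(counts_per_hour) / mean(counts_per_hour)
# ... but about 1/60 once the counts are rescaled
ratio_min  <- var(counts_per_min) / mean(counts_per_min)

print(c(hour = ratio_hour, minute = ratio_min))
```

Dividing by a constant c divides the mean by c and the variance by c^2, so the variance-to-mean ratio shrinks by a factor of c and depends on the units chosen.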
You ask why glmer (and glm) lets you do this. It's generally difficult
to decide if one should prohibit, or just warn about, a practice that
seems odd. Sometimes there are indeed plausible scenarios (although to
be honest I can't think of one in this case ...) where someone wants to
use the software in a way not intended by the designers. I won't say
that R is completely consistent in this regard, but overall the
philosophy of "you are assumed to know what you are doing, we will warn
you but not stop you if you seem to be doing something silly" is
reasonable.
Ben Bolker
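
For readers who want to compare the two formulations from the original question, here is a rough sketch on simulated data. It makes several assumptions that are not in the thread: it uses glm rather than lmer so that it runs in base R, it has no random effects, and the variable names (counts, effort, year) and all parameter values are invented for illustration. With a log link the offset enters the linear predictor, hence offset(log(effort)).

```r
# Illustrative simulation; all parameter values here are assumptions.
set.seed(103)
n      <- 500
year   <- rep(0:9, length.out = n)
effort <- runif(n, min = 1, max = 10)            # exposure varies by observation
counts <- rpois(n, lambda = effort * exp(0.5 + 0.1 * year))

# Counts as the response; effort enters as an offset on the log scale
fit_offset <- glm(counts ~ year + offset(log(effort)),
                  family = poisson(link = "log"))

# Rates as the response: glm still fits, but warns about non-integer values
fit_rate <- suppressWarnings(
  glm(I(counts / effort) ~ year, family = poisson(link = "log"))
)

se_offset <- sqrt(diag(vcov(fit_offset)))
se_rate   <- sqrt(diag(vcov(fit_rate)))
print(rbind(offset = se_offset, rate = se_rate))
```

In this simulated setup the two fits report different standard errors, because the rate model has no information about how much effort lies behind each observation. Whether the same pattern carries over to the random-effect estimates in a mixed model is not something this sketch addresses.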