Hi All,
I've been thinking alot about the earlier productive and helpful (for me!)
discussion on overdispersion and glmms etc. I haven't gotten back to my
grasshopper example yet, but I will as soon as the students complete
another component of the data collection.
I have just come across an example in which I was able to account for most
of the overdispersion - enough to make me happy anyway - but I'm worried
about what I did to do it. The data are the number of traps at a site on
one day that caught or did not catch a deer, so binomial data. There are
several sites sampled in several different years, and a couple of weather
covariates that are the same for all traps at a given site on a particular
day; there are no trap level covariates, but many days of trapping within
each site/year combination. We expect there to be variation among
site/year combinations related to the number of deer at a site, variation
among days due to weather (some covariate data) or other factors, and
variation among traps due to the location of the trap relative to the deer
population.
Some simulated data:
library(lme4)
# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) +
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)
mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300
qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
qaic k delta w
ML 404.6663 4 78.66631 8.269964e-18
ML 340.3928 4 14.39277 7.487291e-04
ML 326.0000 13 0.00000 9.992513e-01
chat
ML
1.106134
Now this is pretty close to the structure that I have with the real data.
All models had the (1|siteday) random effect, various combinations of
covariates of interest (x in the example), and then either did or did not
include siteyear as a fixed effect. In the real example, as with my
simulated example, the model with both a fixed effect of siteyear and a
random effect of siteday is the best model. The best model makes
biological sense. The question: is having as many groups as observations
an issue? It seems to work but ... I didn't start worrying about it until
afterwards. Does it work because there are multiple traps per siteday? Any
insights appreciated.
sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] lme4_0.999375-27 Matrix_0.999375-15 lattice_0.17-13
loaded via a namespace (and not attached):
[1] grid_2.7.2 nlme_3.1-89
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
If I understand what you're doing correctly, you're setting
up an "individual-level" covariate, essentially fitting
a binomial-normal model (or binomial-logitnormal). That makes sense,
although I'm a bit surprised it worked -- in some versions of lme4 it
has triggered an error message about too many random effect parameters.
For example see
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q1/000603.html
and following discussion.
Ben Bolker
Andrew J Tyre wrote:
Hi All,
I've been thinking alot about the earlier productive and helpful (for me!)
discussion on overdispersion and glmms etc. I haven't gotten back to my
grasshopper example yet, but I will as soon as the students complete
another component of the data collection.
I have just come across an example in which I was able to account for most
of the overdispersion - enough to make me happy anyway - but I'm worried
about what I did to do it. The data are the number of traps at a site on
one day that caught or did not catch a deer, so binomial data. There are
several sites sampled in several different years, and a couple of weather
covariates that are the same for all traps at a given site on a particular
day; there are no trap level covariates, but many days of trapping within
each site/year combination. We expect there to be variation among
site/year combinations related to the number of deer at a site, variation
among days due to weather (some covariate data) or other factors, and
variation among traps due to the location of the trap relative to the deer
population.
Some simulated data:
library(lme4)
# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) +
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)
mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300
qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
qaic k delta w
ML 404.6663 4 78.66631 8.269964e-18
ML 340.3928 4 14.39277 7.487291e-04
ML 326.0000 13 0.00000 9.992513e-01
chat
ML
1.106134
Now this is pretty close to the structure that I have with the real data.
All models had the (1|siteday) random effect, various combinations of
covariates of interest (x in the example), and then either did or did not
include siteyear as a fixed effect. In the real example, as with my
simulated example, the model with both a fixed effect of siteyear and a
random effect of siteday is the best model. The best model makes
biological sense. The question: is having as many groups as observations
an issue? It seems to work but ... I didn't start worrying about it until
afterwards. Does it work because there are multiple traps per siteday? Any
insights appreciated.
sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] lme4_0.999375-27 Matrix_0.999375-15 lattice_0.17-13
loaded via a namespace (and not attached):
[1] grid_2.7.2 nlme_3.1-89
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Right - I hadn't thought about it that way, but I think a
"binomial-lognormal" is correct:
log(p/(1-p) = X*Beta + epsilon
episilon~N(0,sigma)
The summary says I have 300 groups and 300 observations, so the random
effects vector is the same size as the dataset. If I attempt to include
the "siteyear" level variation as a random effect I get the message
Error in mer_finalize(ans) : q = 310 > n = 300
which is the error described in the previous posts. So its working here
because I only have the individual level random effect.
cheers,
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
Ben Bolker <bolker at ufl.edu>
Sent by: r-sig-mixed-models-bounces at r-project.org
01/06/2009 09:13 AM
To
Andrew J Tyre <atyre2 at unlnotes.unl.edu>, R Mixed Models
<r-sig-mixed-models at r-project.org>
cc
Subject
Re: [R-sig-ME] What is the maximum number of groups?
If I understand what you're doing correctly, you're setting
up an "individual-level" covariate, essentially fitting
a binomial-normal model (or binomial-logitnormal). That makes sense,
although I'm a bit surprised it worked -- in some versions of lme4 it
has triggered an error message about too many random effect parameters.
For example see
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q1/000603.html
and following discussion.
Ben Bolker
Andrew J Tyre wrote:
Hi All,
I've been thinking alot about the earlier productive and helpful (for
me!)
discussion on overdispersion and glmms etc. I haven't gotten back to my
grasshopper example yet, but I will as soon as the students complete
another component of the data collection.
I have just come across an example in which I was able to account for
most
of the overdispersion - enough to make me happy anyway - but I'm worried
about what I did to do it. The data are the number of traps at a site on
one day that caught or did not catch a deer, so binomial data. There are
several sites sampled in several different years, and a couple of
weather
covariates that are the same for all traps at a given site on a
particular
day; there are no trap level covariates, but many days of trapping
within
each site/year combination. We expect there to be variation among
site/year combinations related to the number of deer at a site,
variation
among days due to weather (some covariate data) or other factors, and
variation among traps due to the location of the trap relative to the
deer
population.
Some simulated data:
library(lme4)
# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) +
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 =
mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300
qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
qaic k delta w
ML 404.6663 4 78.66631 8.269964e-18
ML 340.3928 4 14.39277 7.487291e-04
ML 326.0000 13 0.00000 9.992513e-01
chat
ML
1.106134
Now this is pretty close to the structure that I have with the real
data.
All models had the (1|siteday) random effect, various combinations of
covariates of interest (x in the example), and then either did or did
not
include siteyear as a fixed effect. In the real example, as with my
simulated example, the model with both a fixed effect of siteyear and a
random effect of siteday is the best model. The best model makes
biological sense. The question: is having as many groups as observations
an issue? It seems to work but ... I didn't start worrying about it
until
afterwards. Does it work because there are multiple traps per siteday?
Any
insights appreciated.
sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices datasets utils methods base
other attached packages:
[1] lme4_0.999375-27 Matrix_0.999375-15 lattice_0.17-13
loaded via a namespace (and not attached):
[1] grid_2.7.2 nlme_3.1-89
Drew Tyre
School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974
phone: +1 402 472 4054
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bolker at ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models