Skip to content

What is the maximum number of groups?

3 messages · Andrew J Tyre, Ben Bolker

#
Hi All,

I've been thinking alot about the earlier productive and helpful (for me!) 
discussion on overdispersion and glmms etc. I haven't gotten back to my 
grasshopper example yet, but I will as soon as the students complete 
another component of the data collection. 

I have just come across an example in which I was able to account for most 
of the overdispersion - enough to make me happy anyway - but I'm worried 
about what I did to do it. The data are the number of traps at a site on 
one day that caught or did not catch a deer, so binomial data. There are 
several sites sampled in several different years, and a couple of weather 
covariates that are the same for all traps at a given site on a particular 
day; there are no trap level covariates, but many days of trapping within 
each site/year combination. We expect there to be variation among 
site/year combinations related to the number of deer at a site, variation 
among days due to weather (some covariate data) or other factors, and 
variation among traps due to the location of the trap relative to the deer 
population. 

Some simulated data:

library(lme4)

# simulate some data
x = runif(300,1,10)
siteyear = factor(rep(1:10,each=30))
logit.p = x*0.2 - 0.025*x^2 + rep(rnorm(10,0,0.4),each=30) + 
rnorm(300,0,0.2)
p = 1/(1+exp(-logit.p))
size = sample(5:15,300,replace=TRUE)
y = rbinom(300,prob=p,size=size)
siteday=factor(1:300)
mm.1 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteday),family=binomial)
mm.2 = lmer(cbind(y,size-y)~x+I(x^2)+(1|siteyear),family=binomial)
mm.3 = lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)

mm.list = list(mm.1,mm.2,mm.3)
k = c(4,4,13)
ll = sapply(mm.list,logLik)
# use chat estimate to get quasiAIC
chat = deviance(mm.3)/300

qaic = -2*ll/chat+2*k
delta = qaic-min(qaic)
AIC.table=cbind(qaic,k,delta,w = exp(-(delta/2))/sum(exp(-delta/2)))
AIC.table
       qaic  k    delta            w
ML 404.6663  4 78.66631 8.269964e-18
ML 340.3928  4 14.39277 7.487291e-04
ML 326.0000 13  0.00000 9.992513e-01
ML 
1.106134 

Now this is pretty close to the structure that I have with the real data. 
All models had the (1|siteday) random effect, various combinations of 
covariates of interest (x in the example), and then either did or did not 
include siteyear as a fixed effect. In the real example, as with my 
simulated example, the model with both a fixed effect of siteyear and a 
random effect of siteday is the best model. The best model makes 
biological sense. The question: is having as many groups as observations 
an issue? It seems to work but ... I didn't start worrying about it until 
afterwards. Does it work because there are multiple traps per siteday? Any 
insights appreciated.
R version 2.7.2 (2008-08-25) 
i386-pc-mingw32 

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United 
States.1252;LC_MONETARY=English_United 
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base 

other attached packages:
[1] lme4_0.999375-27   Matrix_0.999375-15 lattice_0.17-13 

loaded via a namespace (and not attached):
[1] grid_2.7.2  nlme_3.1-89

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre
#
If I understand what you're doing correctly, you're setting
up an "individual-level" covariate, essentially fitting
a binomial-normal model (or binomial-logitnormal).  That makes sense,
although I'm a bit surprised it worked -- in some versions of lme4 it
has triggered an error message about too many random effect parameters.
For example see

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q1/000603.html

and following discussion.

  Ben Bolker
Andrew J Tyre wrote:

  
    
#
Right - I hadn't thought about it that way, but I think a 
"binomial-lognormal" is correct:

log(p/(1-p) = X*Beta + epsilon

episilon~N(0,sigma)

The summary says I have 300 groups and 300 observations, so the random 
effects vector is the same size as the dataset. If I attempt to include 
the "siteyear" level variation as a random effect I get the message 

Error in mer_finalize(ans) : q = 310 > n = 300

which is the error described in the previous posts. So its working here 
because I only have the individual level random effect. 



cheers,

Drew Tyre

School of Natural Resources
University of Nebraska-Lincoln
416 Hardin Hall, East Campus
3310 Holdrege Street
Lincoln, NE 68583-0974

phone: +1 402 472 4054 
fax: +1 402 472 2946
email: atyre2 at unl.edu
http://snr.unl.edu/tyre



Ben Bolker <bolker at ufl.edu> 
Sent by: r-sig-mixed-models-bounces at r-project.org
01/06/2009 09:13 AM

To
Andrew J Tyre <atyre2 at unlnotes.unl.edu>, R Mixed Models 
<r-sig-mixed-models at r-project.org>
cc

Subject
Re: [R-sig-ME] What is the maximum number of groups?







  If I understand what you're doing correctly, you're setting
up an "individual-level" covariate, essentially fitting
a binomial-normal model (or binomial-logitnormal).  That makes sense,
although I'm a bit surprised it worked -- in some versions of lme4 it
has triggered an error message about too many random effect parameters.
For example see

https://stat.ethz.ch/pipermail/r-sig-mixed-models/2008q1/000603.html

and following discussion.

  Ben Bolker
Andrew J Tyre wrote:
me!)
most
weather
particular
within
variation
deer
lmer(cbind(y,size-y)~x+I(x^2)+siteyear+(1|siteday),family=binomial)
data.
not
until
Any