
multiple nested random factors

4 messages · Ben Bolker, Amanda Adams

#
Hello!
I have been having a heck of a time figuring out how to estimate the 
proportion of variance from several random factors. I have count data 
on the number of bat calls recorded at 3 sites, on 6 detectors, over 12 
nights. Detectors were at 2 heights.
If I understand nested factors correctly, Detectors are nested in Site 
and Night is nested in Site. Site/Detector and Site/Night are random 
factors and Height is a fixed factor.
Also, the data are overdispersed, so I am transforming the number of 
calls as log(Calls+1).

'data.frame':   249 obs. of  11 variables:
  $ Night     : int  1 3 5 11 12 1 3 5 11 12 ...
  $ Night2    : int  1 2 3 4 5 1 2 3 4 5 ...
  $ Site      : int  1 1 1 1 1 1 1 1 1 1 ...
  $ Species   : int  1 1 1 1 1 1 1 1 1 1 ...
  $ Detector  : int  1 1 1 1 1 2 2 2 2 2 ...
  $ Height    : int  1 1 1 1 1 2 2 2 2 2 ...
  $ Calls     : int  6 444 236 12 143 5 815 712 30 142 ...
  $ f.Night   : Factor w/ 12 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5 ...
  $ f.Site    : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 1 1 1 1 ...
  $ f.Detector: Factor w/ 6 levels "1","2","3","4",..: 1 1 1 1 1 2 2 2 2 2 ...
  $ f.Height  : Factor w/ 2 levels "1","2": 1 1 1 1 1 2 2 2 2 2 ...

I then coded for the nested variables:
data$detector <- with(data, factor(f.Site:f.Detector))
data$night <- with(data, factor(f.Site:f.Night))

trans.log <- log(data$Calls+1)

model <- glmer(round(trans.log, digits=0) ~ f.Height + (1|night) +
     (1|detector) + (1|f.Site), data = data, family=poisson)

I am uncertain about a couple of things. Are my nested variables correct? 
Can I correct for overdispersion with a transformation?

I was also wondering if there is a reference explaining why there is no 
residual variance term for the Poisson distribution. I saw the 
explanation on a forum, but was hoping there was something I could cite.

Any help or advice would be appreciated.
Thank you!
Amanda
#
Amanda Adams <aadams26 at ...> writes:
It's still not entirely clear to me from this description how
your data are structured.  You have an average of about 249/12 ~ 21
observations per night, so I'm going to assume you have 6 detectors
*at each site*.  Detector will be nested in site, because it doesn't
make any sense to analyze what happens at "detector number 1" unless
the detectors are somehow arranged so that the set (d1:site1,
d1:site2, d1:site3, ...) has something in common.  You *may* want
a night:site interaction (if you have enough data), but in principle
you also want a site factor (probably fixed, since there are only
three levels) and a night factor.  This would be

  ~ f.Height + f.Site + (1|f.Night/f.Site) + (1|f.Site:f.Detector)

  It is quite likely that you will find some of these variance
components estimated as zero ...
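
As a small illustration of why the detector term is written as
f.Site:f.Detector, here is a sketch in base R. The layout (4 sites, each
with detectors labelled 1 to 6) is an assumption based on the factor levels
and group counts reported in this thread, not the actual data. Note also
that in lme4 a term such as (1|a/b) is shorthand for (1|a) + (1|a:b).

```r
# Hypothetical layout: 4 sites, each with detectors labelled 1..6
d <- expand.grid(f.Detector = factor(1:6), f.Site = factor(1:4))

# On its own, f.Detector has only 6 levels, so "detector 1" would be
# treated as the same unit at every site
n_plain <- nlevels(d$f.Detector)

# The site-by-detector interaction gives one level per physical detector
d$detector <- with(d, factor(f.Site:f.Detector))
n_nested <- nlevels(d$detector)

c(plain = n_plain, nested = n_nested)
```

With this layout the interaction has 24 levels, one per physical detector,
where the bare detector label has only 6.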
Fitting a Poisson model to a rounded, log-transformed response makes
no sense (sorry).  Poisson models must have a response variable that
is a raw count value (integer).  How do you know the data are
overdispersed before you fit a model ???  (Although I do see that you
have widely varying values in your 'Calls' variable, so you may be
right ...)
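
To see what rounding the log-transformed counts does, here is a minimal
sketch using the first five 'Calls' values shown in the str() output at the
top of the thread. It is only meant to show how much information the
transformation discards.

```r
# First five Calls values from the str() output in the original post
calls <- c(6, 444, 236, 12, 143)

# The response that was passed to glmer() in the original model
rounded <- round(log(calls + 1), digits = 0)
rounded
```

Counts of 236 and 143 both end up as 5, so the rounded values are no longer
counts and quite different nights become indistinguishable.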

  For various ways of handling overdispersion in GLMMs see
http://glmm.wikidot.com/faq

  I don't know if it's helpful, but Bolker et al. 2009 _Trends
in Ecology and Evolution_ might be a citeable source for GLMMs.
It doesn't really say anything specific about Poisson variables
and why a Poisson model doesn't include a residual variance; for
that you should probably cite (after reading!) a basic book
on generalized linear models.
By the way, you said you have three sites, but the data have four
levels for f.Site?  Did you drop one site from the data and not
use droplevels() ?
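
A quick sketch of the droplevels() point, using a made-up site vector
rather than the real data:

```r
# Hypothetical site labels; the real data are not reproduced here
site <- factor(c(1, 1, 2, 3, 4))

# Subsetting removes the observations but keeps the unused level "4"
kept <- site[site != 4]
n_before <- nlevels(kept)

# droplevels() discards levels that no longer occur
n_after <- nlevels(droplevels(kept))

c(before = n_before, after = n_after)
```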
#
Thank you for the response Dr. Bolker.
On 22/02/2013 9:05 AM, Ben Bolker wrote:
Yes, I have 6 detectors at each site.
I had tested for overdispersion with qcc.overdispersion.test in the qcc 
package.  I had tried using an individual-level random effect to capture 
overdispersion, but was not sure how to interpret the data once that was 
included.
The Bolker et al. 2009 paper has been very helpful and was the reason I 
was initially using glmer. Thanks! I will do some more reading.
I do have four sites, but only include three for some of my analyses. Sorry.
I applied the individual-level random effect, but how do I interpret the 
proportion of variation from each factor once it is included?

 > model <- glmer(Calls ~ f.Height + f.Site + (1|f.Site/f.Night) +
+ (1|f.Site:f.Detector), data = data, family=poisson)
 >
 > data$ID <- 1:nrow(data)
 > model1 <- glmer(Calls ~ f.Height + f.Site + (1|f.Night/f.Site) +
+ (1|f.Site:f.Detector) + (1|ID), data = data, family = poisson)
Number of levels of a grouping factor for the random effects
is *equal* to n, the number of observations
 >
 > anova(model, model1)
Data: data
Models:
model: Calls ~ f.Height + f.Site + (1 | f.Site/f.Night) + (1 | f.Site:f.Detector)
model1: Calls ~ f.Height + f.Site + (1 | f.Night/f.Site) + (1 | f.Site:f.Detector) +
model1:     (1 | ID)
        Df   AIC   BIC   logLik Chisq Chi Df Pr(>Chisq)
model   8 49163 49191 -24573.4
model1  9  1615  1647   -798.6 47550      1  < 2.2e-16 ***

 > model1
Generalized linear mixed model fit by the Laplace approximation
Formula: Calls ~ f.Height + f.Site + (1 | f.Night/f.Site) + (1 | f.Site:f.Detector) + (1 | ID)
    Data: data
   AIC  BIC logLik deviance
  1615 1647 -798.6     1597
Random effects:
  Groups            Name        Variance Std.Dev.
  ID                (Intercept) 1.07827  1.03840
  f.Site:f.Night    (Intercept) 1.90958  1.38187
  f.Site:f.Detector (Intercept) 2.32948  1.52626
  f.Night           (Intercept) 0.65313  0.80817
Number of obs: 249, groups: ID, 249; f.Site:f.Night, 47; f.Site:f.Detector, 24; f.Night, 12

Fixed effects:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  2.59535    0.86051   3.016 0.002561 **
f.Height2   -0.05362    0.64015  -0.084 0.933245
f.Site2      1.01975    1.07455   0.949 0.342619
f.Site3      0.73546    1.08115   0.680 0.496343
f.Site4      4.15381    1.07196   3.875 0.000107 ***

Does this mean that Site has a significant effect on bat activity, and that
44% of the variation in bat activity levels can be explained by detector 
placement within sites,
36% by an interaction between Site and Night,
12% by temporal effects (night), and
20% by individual variation?
Does the individual variation essentially mean the variation not 
explained by temporal and spatial effects?
#
Amanda Adams <aadams26 at ...> writes:
OK
Testing the _marginal_ distribution of the data for anything
(normality, overdispersion, etc.) is very rarely a sensible thing
to do.  You need to test for overdispersion in the _residuals_
of your fit.  It's likely though that you do need an individual-level
random effect.  Have you read the references in http://glmm.wikidot.com/faq
that discuss individual-level random effects?
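
As a rough sketch of checking overdispersion in the residuals of a fit
rather than in the raw data, the code below compares the sum of squared
Pearson residuals with the residual degrees of freedom. It uses simulated
data and a plain Poisson glm() so that it runs in base R; the data, the
variable names, and the choice of a GLM rather than a GLMM are all
assumptions made for illustration.

```r
set.seed(1)
n <- 200
x <- rnorm(n)

# Simulated overdispersed counts (negative binomial); purely illustrative
y <- rnbinom(n, mu = exp(1 + 0.5 * x), size = 1)

# Poisson fit that ignores the extra variation
fit <- glm(y ~ x, family = poisson)

# Sum of squared Pearson residuals relative to the residual df;
# a ratio well above 1 suggests overdispersion
pearson_chisq <- sum(residuals(fit, type = "pearson")^2)
rdf <- df.residual(fit)
ratio <- pearson_chisq / rdf
p_value <- pchisq(pearson_chisq, df = rdf, lower.tail = FALSE)

c(ratio = ratio, p = p_value)
```

For a mixed model the residual degrees of freedom are only approximate, so
a ratio like this is best read as a rough diagnostic.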
[snip snip snip]
Site 4 is significantly different from site 1 (and probably
different from the other sites as well, although that isn't
explicitly tested here).

  It's somewhat harder to do "variance decomposition" in a
GLMM (or a complex/modern LMM) than in classic models.
The 'variance components' would include the four components
listed above as well as the Poisson variance term.  Depending
on how you were thinking about it you might also include the
differences among sites and the difference in height as
'variance components'.  If you look at 'variance' narrowly
enough, then you _could_ state things the way you have above.
I don't know that much about variance partitioning; in GLMMs
it may be a bit of a research topic ...
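
If 'variance' is taken in that narrow sense, the arithmetic is just each
random-effect variance divided by their sum. The sketch below copies the
four variance estimates from the 'Random effects' table for model1. It
leaves out the Poisson variance term and the fixed effects (site and
height), and works on the scale of the linear predictor, so it is an
illustration of the bookkeeping and not a full variance partition.

```r
# Variance estimates copied from the 'Random effects' table for model1
v <- c("ID"                = 1.07827,
       "f.Site:f.Night"    = 1.90958,
       "f.Site:f.Detector" = 2.32948,
       "f.Night"           = 0.65313)

# Share of the summed random-effect variance attributed to each term
prop <- v / sum(v)
round(100 * prop, 1)
```

Computed this way the shares are roughly 39% for detector within site, 32%
for site by night, 11% for night and 18% for the individual-level term,
and they sum to 100%, whereas the percentages quoted in the question add up
to 112%.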

  It may have come up on this list before, but I can't put
my finger on a thread right now.  Perhaps someone else
can.

Goldstein H, Browne W, Rasbash J (2002) Partitioning variation in
multilevel models.  Understanding Statistics 1: 223--231.

Browne WJ, Subramanian SV, Jones K (2005) Variance partitioning in
multilevel logistic models that exhibit overdispersion.  Journal of the
Royal Statistical Society, Series A 168: 599--613.