On 16-07-09 07:20 PM, Teresa Oliveira wrote:
Dear list members,
I am developing GLMM's in order to assess habitat selection (using GLMMs'
coeficients to construct Resource selection functions). I have
data from 5 study areas, and each area has a different number of
individuals monitored.
To develop GLMM's, the dependend variable is binary (1-used locations;
0-available locations), and I have a initial set of 14 continuous
(8 land cover variables; 2 distance variables, to artificial areas and
water sources; 4 topographic variables): a buffer was placed around each
location and the area of each land cover within that buffer was accounted
for; distances were measured from each point to the nearest feature, and
topographic variables were obtained using DEM rasters. I tested for
correlation using Spearman's Rank, so not all 14 were used in the GLMMs.
All variables were transformed using z-score.
As random effect, I used individual ID. I thought at the beggining to use
study area as a random effect but I only had 5 levels and there was
no variance when that random effect was used.
I constructed a GLMM with 9 variables (not correlated) and a random
then used "dredge()" function and "model.avg(dredge)" to sort models by
values. This was the result (only models of AICc lower than 2
[1]Call:
model.avg(object = dredge.m1.1)
Component model call:
glmer(formula = Used ~ <512 unique rhs>, data = All_SA_Used_RP_Area_z,
family =
binomial(link = "logit"))
Component models:
df logLik AICc delta weight
123578 8 -4309.94 8635.89 0.00 0.14
1235789 9 -4309.22 8636.44 0.55 0.10
123789 8 -4310.52 8637.04 1.14 0.08
1235678 9 -4309.75 8637.50 1.61 0.06
12378 7 -4311.78 8637.57 1.67 0.06
1234578 9 -4309.79 8637.58 1.69 0.06
Variables 1 and 2 represent the distance variables; from 3 to 8 land
variables, and 9 is a topographic variable. Weights seem to be very low,
even if I average all those models as it seems to be common when delta
values are low.
Well as far as we can tell from this, variables 4-9 aren't doing much
(on the other hand, variables 1-3 seem to be in all of the top models
you've shown us -- although presumably there are a bunch more models
that are almost like these, and similar in weight, with other
permutations of [123] + [some combination of 456789] ...)
Even with this weights, I constructed GLMMs for each of the
combinations, and the results were simmilar for all 6 combinations. Here
are the results for the first one (GLMM + overdispersion + r-squared):
Random effects:
Groups Name Variance Std.Dev.
ID.CODE_1 (Intercept) 13.02 3.608
Number of obs: 32670, groups: ID.CODE_1, 55
Fixed effects:
Estimate Std. Error z value Pr(>|z|),
(Intercept) -0.54891 0.51174 -1.073 0.283433
3 -0.22232 0.04059 -5.478 4.31e-08 ***
5 -0.05433 0.02837 -1.915 0.055460 .
7 -0.13108 0.02825 -4.640 3.49e-06 ***
8 -0.15864 0.08670 -1.830 0.067287 .
1 0.28438 0.02853 9.968 < 2e-16 ***
2 0.11531 0.03021 3.817 0.000135 ***
Residual deviance: 0.256
r.squaredGLMM():
R2m R2c
0.01063077 0.80039950
This is what I get from this analysis:
1) Variance and SD of the random effect seems fine (definitely better
the "0" I got when using Study Areas as random effect);
yes -- SD of the random effects is much larger than any of the fixed
effects, which means that the differences among individuals are large
(presumably that means you have very different numbers of presences for
different number of individuals [all individuals sharing a common pool
of pseudo-absences ???)
2) Estimate values make sense from what I know of the species and the
knowledge I have of the study areas;
3) Overdispersion values seem good, and R-squared values don't seem very
good (at least when considering only fixed effects) but, as I read in
several places, AIC and r-squared are not always in agreement.
Overdispersion is meaningless for binary data.
4) Weight values seem very low. Does it mean the models are not good?
It means there are many approximately equivalent models. Nothing in
this output tells you very much about absolute goodness of fit (which is
tricky for binary data).
Then what I did was construct a GLM ("glm()"), so no random effect was
used. I used the same set of variables used in [1], and here are the
results (only models of AICc lower than 2 represented):
[2] Call:
model.avg(object = dredge.glm_m1.1)
Component model call:
glm(formula = Used ~ <512 unique rhs>, family = binomial(link = "logit"),
data =
All_SA_Used_RP_Area_z)
Component models:
df logLik AICc delta weight
12345678 9 -9251.85 18521.70 0.00 0.52
123456789 10 -9251.77 18523.54 1.84 0.21
1345678 8 -9253.84 18523.69 1.99 0.19
In this case, weight values are higher.
Does this mean that it is better not to use a random effect? (I am not
I can compare GLMM with GLM results, correct me if I am doing wrong
assumptions)
No. You could do a likelihood ratio test with anova(), but note that
the AICc values for the glm() fits are 10,000 (!!) units higher than the
glmer fits.
While it will potentially greatly complicate your life, I think you
should at least *consider* interactions between your environment
variables and ID, i.e. allow for the possibility that different
individuals respond differently to habitat variation.
Ben Bolker