Nested error term and unbalanced design
Baldwin, Jim -FS <jbaldwin at ...> writes:
While there is a definite order to family, genus, and species (no pun intended), I think that the "nestedness" (if any) would be related to how you selected your sampling units rather than the fixed effects of family, genus, and species. (I admit bias in rarely if ever considering species as a random effect.)
Jim
I think I respectfully disagree ... see below ...
I am trying to run a model that incorporates both environmental variables and taxonomic relationships, and I am unsure if I am 1) specifying the error term correctly, and 2) accounting for unbalanced data correctly. I would appreciate any guidance you can provide.
As a simplified example, I want to ask if a bird is more likely to be carrying ticks based on the habitat it was caught in, and what kind of bird it is (my actual model has many more environmental variables). We have many related species in multiple genera in multiple families, but all in the same order. Species is nested within genus, and genus is nested within family. I want to estimate a fixed effect for both habitat and species, while accounting for the nestedness of the relationships of the birds, and I also want to account for the fact that we caught more of certain species than others.
My simplified model looks like this:
M1 <- lmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS/SPECIES),
family=binomial(link="logit"))
where y is a column vector of (tick presence, tick absence). So my questions are: is this the correct "grammar" for the nested error? And does the nested error structure by itself take into account the unbalanced data structure?
Generally you don't have to worry about lack of balance in
'modern' mixed models unless it's really extreme.
I'm having a little bit of a hard time conceptually with the
idea of having species as a fixed effect _and_ having
family and genus as random effects. In any case, you certainly
shouldn't have the same categorical predictor (SPECIES) appear as
both a random and a fixed effect.
M1 <- lmer(y ~ HABITAT + SPECIES + (1|FAMILY/GENUS),
family=binomial(link="logit"))
*might* work (I would give it a try and see if the results are sensible).
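For reference, here is a small sketch of what the nested "grammar" expands to. It only inspects formulas, so no data are needed; the object names f_nested and f_expanded are made up for illustration. (In current lme4 releases binomial models are fitted with glmer() rather than lmer(..., family=...), but the random-effects syntax is the same.)

```r
library(lme4)

# (1 | FAMILY/GENUS) is shorthand for a family-level intercept plus a
# genus-within-family intercept.
f_nested   <- y ~ HABITAT + SPECIES + (1 | FAMILY/GENUS)
f_expanded <- y ~ HABITAT + SPECIES + (1 | FAMILY) + (1 | FAMILY:GENUS)

# findbars() pulls out the random-effects terms and expands any "/",
# so both formulas should yield the same two grouping terms.
print(findbars(f_nested))
print(findbars(f_expanded))
```

The expanded form makes it easier to see that genus labels only need to be unique within a family when the nested syntax is used.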
I would also consider
M1 <- lmer(y ~ HABITAT + (HABITAT|FAMILY/GENUS/SPECIES),
family=binomial(link="logit"))
if your data set is big enough to support it. This allows for habitat
to have different effects on different species ... (see a paper
by Schielzeth and Forstmeier on the importance of including interactions
between fixed and random effects:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2657178/ )
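A rough sketch of how one might try the random-slope model on simulated data before committing to it. Everything here is an assumption for illustration: the taxonomy sizes, effect sizes, and the data frame name birds are invented, and glmer() is used because current lme4 fits binomial models through it. With only a handful of families, expect singular-fit or convergence messages and a slow fit; the point is just to see whether the structure can be fitted at all.

```r
library(lme4)
set.seed(42)

# Hypothetical taxonomy: 6 families x 3 genera x 3 species.
# Genus/species labels are reused across families on purpose; the
# nested syntax below keeps them distinct.
taxa <- expand.grid(s = 1:3, g = 1:3, f = 1:6)
taxa$FAMILY  <- factor(paste0("F", taxa$f))
taxa$GENUS   <- factor(paste0("G", taxa$g))
taxa$SPECIES <- factor(paste0("S", taxa$s))

# Unbalanced sampling: some species are caught far more often than others.
n <- 1000
rows  <- sample(nrow(taxa), n, replace = TRUE, prob = rexp(nrow(taxa)))
birds <- taxa[rows, c("FAMILY", "GENUS", "SPECIES")]
birds$HABITAT <- factor(sample(c("field", "forest"), n, replace = TRUE))

# Simulate tick presence (0/1) with species-specific intercepts and
# species-specific habitat effects.
sp_id <- as.integer(interaction(birds$FAMILY, birds$GENUS,
                                birds$SPECIES, drop = TRUE))
b0  <- rnorm(max(sp_id), sd = 0.8)   # intercept deviations per species
b1  <- rnorm(max(sp_id), sd = 0.6)   # habitat-slope deviations per species
eta <- -0.5 + b0[sp_id] + (0.7 + b1[sp_id]) * (birds$HABITAT == "forest")
birds$y <- rbinom(n, 1, plogis(eta))

# Random intercepts and habitat slopes at family, genus and species level.
fit <- glmer(y ~ HABITAT + (HABITAT | FAMILY/GENUS/SPECIES),
             data = birds, family = binomial(link = "logit"))
print(VarCorr(fit))
```

If the family- or genus-level variances come out at (or very near) zero, that is a hint the data cannot support the full structure and a simpler random-effects term may be more sensible.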