advice on grouping structure - many levels but few individuals per level
On 09/04/2008, at 11:47 PM, Douglas Bates wrote:
On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
Dear lmers
I was hoping to get a little advice about specifying a grouping structure with many levels but few (sometimes only one) individuals per level. I have had a look through the posting archives but could not find a similar question. Many apologies in advance if I have missed one.
The context of the question is as follows:
I would like to model fitness of juvenile birds (a simple weight-based metric) with a number of explanatory variables, including: when they were laid (as a Julian day - egglayed), the number of nestlings in the nest (nestlings) and whether they are male or female (sex). Each bird obviously originates from a nest, with some birds originating from the same nest (siblings). As there is the potential for the fitness of siblings to be similar (due to either genetic or environmental effects), I would like to include nest as a random effect to reflect this potential grouping structure. For example:
model <- lmer(fitness ~ egglayed + nestlings + sex +(1|nest))
I have many nests (175) but about half of them contain only 1 individual.
My question is: does it make sense to include nest as a random effect given that many nests contain only one individual? I know this probably reflects a rather deep misunderstanding of mixed-effects models on my part, but I would have thought that it would be impossible to estimate a within-nest variance with only one individual, and that this would make my between-nest variance estimates meaningless.
That's not a problem as long as you recognize that you will get almost no new information from the groups that have only one observation. In other words, you will get almost the same parameter estimates from the complete data set as you would get from the data after eliminating those nests with only one individual. If you wrote out all of the error terms for each observation, you would see that for those nests with only one observation you have two confounded error terms. I have seen this effect when fitting models to the 'star' data set in the mlmRev package. Because these are longitudinal data, groups are indexed by individuals (students, in this case) and the number of observations per group is the number of times the student takes a test. Many students have only one observation. For most models you can remove those students or keep them in without affecting the parameter estimates noticeably.
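The confounding of error terms described above can be written out explicitly. The notation here is an assumption for illustration and does not appear in the original posts: i indexes nests, j indexes birds within a nest, b_i is the nest random effect and \varepsilon_{ij} is the per-bird residual.

```latex
% Random-intercept model for bird j in nest i (notation assumed for illustration)
y_{ij} = x_{ij}^{\top}\beta + b_i + \varepsilon_{ij},
\qquad b_i \sim N(0, \sigma_b^2),
\qquad \varepsilon_{ij} \sim N(0, \sigma^2)

% A nest with a single bird contributes only the sum of its two error terms:
y_{i1} - x_{i1}^{\top}\beta = b_i + \varepsilon_{i1},
\qquad \operatorname{Var}(b_i + \varepsilon_{i1}) = \sigma_b^2 + \sigma^2

% A nest with two or more birds also carries information about the split:
\operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \quad (j \neq k)
```

Under these assumptions a singleton nest still informs \beta and the total variance \sigma_b^2 + \sigma^2, but says nothing about how that total divides between the two components; that division is estimated from the nests with siblings.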
This depends on the data. If the within-cluster correlation is high, then a large cluster has little more information than a small cluster. In that case, take out half the clusters and the standard errors will increase by 30% or more. My suggestion is to leave all the data in and fit a random-effects model, as this will work fine. The original concern was that the within-nest variance couldn't be calculated for clusters with single observations, but this is not a problem.
Ken
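The point about within-cluster correlation can be illustrated with a rough calculation in base R. This is only a sketch under simplifying assumptions that are not in the original posts: an intercept-only model, known variance components, total variance scaled to 1, 88 single-bird nests and 87 nests of a hypothetical size 3 (the thread says only that about half of the 175 nests are singletons). Under those assumptions, a cluster of size n with intraclass correlation rho carries information n / (1 + (n - 1) * rho) about the overall mean.

```r
# Information about the overall mean from one cluster of size n,
# assuming total variance 1 and intraclass correlation rho.
cluster_info <- function(n, rho) n / (1 + (n - 1) * rho)

# Ratio of standard errors of the estimated mean:
# (singleton nests dropped) / (all nests kept).
# n_single, n_multi and m are hypothetical values chosen for illustration.
se_ratio <- function(rho, n_single = 88, n_multi = 87, m = 3) {
  info_all   <- n_single * cluster_info(1, rho) + n_multi * cluster_info(m, rho)
  info_multi <- n_multi * cluster_info(m, rho)
  sqrt(info_all / info_multi)
}

# Percentage increase in the standard error from dropping singletons,
# for a low, a moderate and a high within-nest correlation.
rho <- c(0.1, 0.5, 0.9)
data.frame(rho = rho, se_increase_pct = round(100 * (se_ratio(rho) - 1), 1))
```

In this simplified setting the penalty for dropping single-bird nests grows with rho, because each sibling adds less independent information as the correlation rises, so a singleton nest becomes worth nearly as much as a larger one.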
Many, many thanks for your advice in advance.
Best wishes
Martin
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models