advice on grouping structure - many levels but few individuals per level

On Wed, Apr 9, 2008 at 5:29 AM, Martin Matejus <mmatejus at googlemail.com> wrote:
Dear lmer users,

I was hoping to get a little advice about specifying a grouping
structure with many levels but few (sometimes one) individuals per
level. I have had a look through the posting archives but could not
find a similar question. Many apologies in advance if I have missed any.

The context of the question is as follows:

I would like to model fitness of juvenile birds (a simple weight-based
metric) with a number of explanatory variables, including when they
were laid (as a Julian day, egglayed), the number of nestlings in the
nest (nestlings), and whether they are male or female (sex). Each bird
originates from a nest, and some birds originate from the same nest
(siblings). As the fitness of siblings may be similar (due to either
genetic or environmental effects), I would like to include nest as a
random effect to reflect this potential grouping structure. For
example:

model <- lmer(fitness ~ egglayed + nestlings + sex +(1|nest))

I have many nests (175) but about half of them contain only 1
individual.

My question is: does it make sense to include nest as a random effect
given that many nests contain only one individual? I know this probably
reflects a rather deep misunderstanding of mixed-effects models on my
part, but I would have thought that it would be impossible to estimate
a within-nest variance with only one individual, and that this would
make my between-nest variance estimates meaningless.

That's not a problem as long as you recognize that you will get almost
no new information from the groups that have only one observation. In
other words, you will get almost the same parameter estimates from the
complete data set as you would get from the data after eliminating
those nests with only one individual.  If you wrote out all of the
error terms for each observation, you would see that for those nests
with only one observation you have two confounded error terms (the
nest effect and the residual).
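
To make the confounding concrete, here is a sketch of the error terms written out, assuming the usual random-intercept formulation of the model above. The symbols b_i, epsilon_ij, sigma_b and sigma are notation introduced here for illustration, not taken from the thread.

```latex
\begin{align*}
y_{ij} &= x_{ij}^\top \beta + b_i + \varepsilon_{ij}, \qquad
b_i \sim N(0, \sigma_b^2), \quad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\text{singleton nest } (n_i = 1):\quad
y_{i1} - x_{i1}^\top \beta &= b_i + \varepsilon_{i1} \sim N(0,\ \sigma_b^2 + \sigma^2), \\
\text{nest with } n_i \ge 2,\ j \ne k:\quad
\operatorname{Cov}(y_{ij}, y_{ik}) &= \sigma_b^2 .
\end{align*}
```

A singleton nest only ever shows the sum of the two variances, so it cannot separate them; the split between nest and residual variance is informed by the nests with two or more chicks.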

I have seen this effect when fitting models to the 'star' data set in
the mlmRev package.  Because these are longitudinal data, groups are
indexed by individuals (students, in this case)  and the number of
observations per group is the number of times the student takes a
test.  Many students have only one observation.  For most models you
can remove those students or keep them in without affecting the
parameter estimates noticeably.

This depends on the data. If the within-cluster correlation is high,
then a large cluster carries little more information than a small
cluster. In that case, take out half the clusters and the standard
errors will increase by 30% or more.
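
One rough way to see where a figure of that size comes from is the variance of the overall mean. This is only a sketch under simplifying assumptions that are not in the thread: K balanced clusters of size n, an intercept-only model, and the notation rho for the within-cluster correlation.

```latex
\begin{align*}
\rho &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}, \\
\operatorname{Var}(\bar{y}) &= \frac{1}{K}\left(\sigma_b^2 + \frac{\sigma^2}{n}\right)
 = \frac{\sigma_b^2 + \sigma^2}{nK}\,\bigl[1 + (n-1)\rho\bigr], \\
\rho \to 1:\quad \operatorname{Var}(\bar{y}) &\to \frac{\sigma_b^2}{K}
\quad\Rightarrow\quad
\frac{\mathrm{SE}(K/2 \text{ clusters})}{\mathrm{SE}(K \text{ clusters})} \to \sqrt{2} \approx 1.41 .
\end{align*}
```

In this idealized limit the cluster size n drops out, so a singleton counts about as much as a large cluster, and halving the number of clusters inflates the standard error by roughly 40%.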

My suggestion is to leave all the data in and fit a random-effects
model, as this will work fine. The original concern was that the
within-nest variance couldn't be calculated for clusters with single
observations, but this is not a problem: that variance is estimated
from the nests that do have more than one observation.
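
Both replies can be tried out on simulated data. The following is a minimal sketch, not the poster's data: the data frame `birds`, the nest sizes, and every coefficient and standard deviation are made-up values chosen for illustration. It fits the model from the question once with all nests and once with the single-chick nests dropped, so the estimates and variance components can be compared side by side.

```r
# Sketch on simulated data; all names and parameter values are hypothetical.
library(lme4)

set.seed(1)
n_nests   <- 175
# Roughly half the nests (88) hold one chick, the rest hold 2 to 5.
nest_size <- ifelse(seq_len(n_nests) <= 88, 1L,
                    sample(2:5, n_nests, replace = TRUE))
nest      <- factor(rep(seq_len(n_nests), times = nest_size))
n_obs     <- length(nest)

birds <- data.frame(
  nest      = nest,
  egglayed  = rep(round(runif(n_nests, 100, 160)), times = nest_size),
  nestlings = rep(nest_size, times = nest_size),
  sex       = factor(sample(c("F", "M"), n_obs, replace = TRUE))
)

# Simulated response: fixed effects + nest effect (sd 2) + residual (sd 1).
b <- rnorm(n_nests, sd = 2)
birds$fitness <- 20 + 0.05 * birds$egglayed - 0.5 * birds$nestlings +
  1.0 * (birds$sex == "M") + b[as.integer(birds$nest)] +
  rnorm(n_obs, sd = 1)

# Fit with every nest, singletons included.
fit_all <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest),
                data = birds)

# Fit again after dropping nests that contain a single chick.
keep      <- names(which(table(birds$nest) > 1))
multi     <- droplevels(birds[birds$nest %in% keep, ])
fit_multi <- lmer(fitness ~ egglayed + nestlings + sex + (1 | nest),
                  data = multi)

# Compare fixed effects and variance components from the two fits.
print(cbind(all = fixef(fit_all), multi_only = fixef(fit_multi)))
print(VarCorr(fit_all))
print(VarCorr(fit_multi))
```

How close the two fits are, and how much the standard errors change, will depend on the simulated variances, which is the point made above about the within-cluster correlation.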

Ken
Many, many thanks for your advice in advance.
Best wishes
Martin

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
