(in plain text) Replication in nested structure, how much?

Tue, Sep 3, 2013 8:37 AM

Following classical ANOVA, I thought it was important to have replication at each level. Maybe this is not essential for mixed models?

Here's my model: Y ~ 1 + (1 | SUBJECT/OCCASION)

Each subject was tested on multiple occasions. I want to estimate the within-subject and within-occasion variances.

I have data for 105 subjects. The number of occasions per subject ranges from 1 to 4, and the number of repeated measurements of the response Y per occasion ranges from 1 to 5.

Originally, I planned to restrict the modelling to subjects tested on at least 2 occasions, with at least 2 measurements of Y per occasion. Here are the numbers of levels in the reduced dataset:

[1] 57    # subjects
[1] 138   # occasions
[1] 353   # observations

And here's what I get with the full dataset: 

[1] 105   # subjects
[1] 196   # occasions
[1] 471   # observations

There are some potential issues in the full dataset, affecting 48 of the 105 subjects:
1) No replication at all (i.e. subjects measured once, on 1 occasion).
2) No replication of occasions (i.e. subjects measured multiple times, but on only 1 occasion).
3) No replication of measurements on some occasions (i.e. subjects measured on multiple occasions, but sometimes with only 1 measurement per occasion).
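
The three patterns above can be counted directly from the data. The following is only a sketch on made-up toy data: the data frame `d`, its contents, and the helper names `cells` and `per_subject` are hypothetical, and it assumes one row per measurement with columns SUBJECT and OCCASION.

```r
# Toy data (hypothetical): one row per measurement of Y.
d <- data.frame(
  SUBJECT  = c("a", "b", "b", "c", "c", "c", "d", "d", "d", "d"),
  OCCASION = c(1,   1,   1,   1,   1,   2,   1,   1,   2,   2)
)

# Number of measurements in each SUBJECT x OCCASION cell.
cells <- aggregate(n ~ SUBJECT + OCCASION, data = transform(d, n = 1), FUN = sum)

# Per subject: number of occasions, smallest cell size, total measurements.
n_occ <- tapply(cells$n, cells$SUBJECT, length)
n_min <- tapply(cells$n, cells$SUBJECT, min)
n_tot <- tapply(cells$n, cells$SUBJECT, sum)

per_subject <- data.frame(
  SUBJECT = names(n_occ),
  n_occ   = as.vector(n_occ),
  n_min   = as.vector(n_min),
  n_tot   = as.vector(n_tot)
)

# Assign each subject to the first issue that applies, in the order 1), 2), 3).
per_subject$issue <- with(per_subject,
  ifelse(n_tot == 1, "1) no replication",
  ifelse(n_occ == 1, "2) one occasion only",
  ifelse(n_min == 1, "3) some occasions unreplicated", "none"))))

print(table(per_subject$issue))
```

Each subject is assigned to at most one category here, so the counts for 1), 2) and 3) add up to the number of affected subjects.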

I do not want to ignore potentially informative data, and the precision of the random-effect estimates seems to improve with the full dataset.

I would welcome some guidance on how to proceed. Perhaps some of issues 1), 2) and 3) are allowable and some are not?

Stephen.

Ben Bolker

Tue, Sep 3, 2013 2:11 PM

Stephen T <stwebvanuatu at ...> writes:

{57 subjects, 138 occasions, 353 observations}

{105 subjects, 196 occasions, 471 observations}

As far as I can see, all three of your issues are allowable in the
'modern' (RE)ML mixed-model framework; the within-subject and
within-occasion variances should still be identifiable.  If you had a
very extreme case (e.g. most individuals measured only once, with a few
measured more than once) it might not be _practical_ to try to estimate
both variances, even though they would still be theoretically
identifiable, but it sounds like you're not in that situation.

  As always, someone else more informed may come along and correct
this answer ...  The best way to reassure yourself in this case is to
simulate some data with known variance structure, knock out a number
of observations to make it resemble your example, and see whether you
still recover approximately correct answers.
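
A minimal sketch of such a simulation check, assuming the lme4 package is installed. The "true" standard deviations, the intercept of 10, and the knock-out probabilities are all illustrative assumptions, not values from the real data; the probabilities were picked only so that the thinned design averages roughly 196 occasions and 471 observations.

```r
library(lme4)  # assumes the lme4 package is installed

set.seed(1)

# Assumed "true" standard deviations (illustrative, not taken from the thread).
sd_subject  <- 1.0   # between subjects
sd_occasion <- 0.7   # between occasions within a subject
sd_resid    <- 0.5   # between repeated measurements within an occasion

# Start from a balanced design: 105 subjects x 4 occasions x 5 measurements.
d <- expand.grid(REP = 1:5, OCCASION = 1:4, SUBJECT = 1:105)
d$CELL <- interaction(d$SUBJECT, d$OCCASION, drop = TRUE)

u_subject  <- rnorm(105, sd = sd_subject)
u_occasion <- rnorm(nlevels(d$CELL), sd = sd_occasion)
d$Y <- 10 + u_subject[d$SUBJECT] + u_occasion[as.integer(d$CELL)] +
  rnorm(nrow(d), sd = sd_resid)

# Knock out data: keep 1-4 occasions per subject and 1-5 measurements per
# occasion. The probabilities below are assumptions (see the text above).
keep_occ <- sample(1:4, 105, replace = TRUE, prob = c(0.50, 0.25, 0.15, 0.10))
keep_rep <- sample(1:5, nlevels(d$CELL), replace = TRUE,
                   prob = c(0.3, 0.3, 0.2, 0.1, 0.1))
d <- d[d$OCCASION <= keep_occ[d$SUBJECT] &
       d$REP <= keep_rep[as.integer(d$CELL)], ]

# Fit the same nested model as in the question.
fit <- lmer(Y ~ 1 + (1 | SUBJECT/OCCASION), data = d)

# Estimated standard deviations, named by grouping term.
vc  <- as.data.frame(VarCorr(fit))
est <- setNames(vc$sdcor, vc$grp)
print(est)
```

Compare `est` with the assumed values of `sd_subject`, `sd_occasion` and `sd_resid`. A single run says little, so repeating this over many seeds and looking at the spread of the estimates gives a better picture.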

  Ben Bolker