-----Original Message-----
From: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] On Behalf Of
Kvingedal, Eli
Sent: Wednesday, 10 March 2010 15:08
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] mixed effects models and pseudo replication
Hi,
I am analysing the effects of local population density on fish
performance (e.g. weight). My dataset is based on fish
sampled from different sites (17 stations), and in addition to
measures of individual performance, I have information on age
(0 and 1). At the site level, I have information on fish
densities for both age groups. I am interested in estimating
the effects of fish density on performance, and particularly
in determining possible differences between age
groups in the density response.
Traditionally, this kind of data is analysed based on mean
values (ANCOVAs). With a mixed effects model, however, the
among-individual variance is included in the analysis
and not just averaged out. I started by using lmer (lme4
package), but on realizing that the variance increases with
density, I switched to lme (nlme package) and applied
variance structures.
My starting model is thus:
m1 <- lme(weight ~ age*density0 + age*density1,
          random = ~1|station, weights=....)
with station and age as factors.
Now, my issue is pseudo-replication. The summary table shows
that the factors age and age*density have very high degrees
of freedom (~700) and accordingly low p-values. It seems to
me like age and the interactions between age and density are
analysed as if the samples were independent, and if so, it
means pseudo-replication, doesn't it?
If I set up an alternative random structure allowing for
random variance between age classes within station:
m2 <- lme(weight ~ age*density0 + age*density1,
          random = ~1|station/age, weights=....)
the summary table is more like I think it should be: 14 df
for all fixed effects parameters and interactions, and the
p-values seem more realistic.
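For reference, the df values reported for both models can be reproduced by hand. The sketch below assumes the rule that nlme is documented to use (Pinheiro & Bates 2000): a term estimated at grouping level i gets denDF_i = m_i - (m_{i-1} + p_i), where m_i is the number of groups at that level, m_0 = 1 when there is an intercept, the innermost m is the number of observations, and p_i is the number of fixed-effect df estimated at that level. The total of 720 fish is a hypothetical number chosen only to match the "~700" mentioned above, and the sketch assumes both age classes occur at every station.

```r
# Denominator df for fixed-effect terms estimated at grouping level i,
# following the rule described in the lead-in: m_i - (m_{i-1} + p_i).
den_df <- function(m_i, m_prev, p_i) m_i - (m_prev + p_i)

n_station <- 17
n_cells   <- 2 * n_station  # age-within-station groups (assumes both ages everywhere)
n_obs     <- 720            # hypothetical total number of fish

# m1: random = ~1|station
# density0 and density1 are constant within station (station level);
# age, age:density0 and age:density1 vary within station (observation level).
m1_df <- c(
  station_level  = den_df(n_station, 1, 2),
  within_station = den_df(n_obs, n_station, 3)
)

# m2: random = ~1|station/age
# age and its interactions are now constant within each age-in-station group,
# so they are estimated at that level; only the intercept stays at the
# observation level.
m2_df <- c(
  station_level  = den_df(n_station, 1, 2),
  age_in_station = den_df(n_cells, n_station, 3),
  within_cell    = den_df(n_obs, n_cells, 0)
)

print(m1_df)  # station_level 14, within_station 700
print(m2_df)  # station_level 14, age_in_station 14, within_cell 686
```

Under these assumptions the large df in m1 come from age varying within station, so age and its interactions are counted at the level of individual fish, while in m2 they are counted at the level of the 34 age-by-station groups.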
When comparing m1 and m2 (REML estimation), however, m2 does
not provide a better fit, and based on the literature (e.g. Zuur et
al. 2009), I should then use m1.
Testing the significance of the interaction terms by model
comparisons (which is what I do to find the optimal model),
the significance levels of the likelihood ratio test for
specific interaction terms are equivalent whether I use
station or station/age as random factors. Which is sort of
comforting.
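The two comparisons described above can be tried out on simulated data. This is only a sketch: the data frame `fish`, the sample sizes, and all effect sizes are invented for illustration, and the `weights=` variance structure is left out because it is not spelled out in the models above. The random-structure comparison uses REML fits, while the test of an interaction term refits with ML, since REML likelihoods are not comparable between models with different fixed effects.

```r
library(nlme)

set.seed(1)
n_station  <- 17
n_per_cell <- 20  # hypothetical number of fish per age class per station

# One density value per age class for each station (station-level covariates)
station_df <- data.frame(
  station  = factor(seq_len(n_station)),
  density0 = runif(n_station, 5, 50),
  density1 = runif(n_station, 2, 20)
)

# One row per fish: station x age x replicate, with station covariates attached
fish <- merge(
  expand.grid(station = station_df$station,
              age     = factor(c(0, 1)),
              rep     = seq_len(n_per_cell)),
  station_df
)

# Simulated weights with a station effect and an age-within-station effect
idx_s  <- as.integer(as.character(fish$station))
idx_a  <- as.integer(fish$age)  # 1 = age 0, 2 = age 1
b_stat <- rnorm(n_station, sd = 1)
b_cell <- matrix(rnorm(n_station * 2, sd = 0.7), n_station, 2)
fish$weight <- 10 + 3 * (idx_a - 1) - 0.05 * fish$density0 -
  0.02 * fish$density1 * (idx_a - 1) +
  b_stat[idx_s] + b_cell[cbind(idx_s, idx_a)] + rnorm(nrow(fish), sd = 1)

# Same fixed effects, two random structures, REML fits
m1 <- lme(weight ~ age * density0 + age * density1,
          random = ~ 1 | station, data = fish, method = "REML")
m2 <- lme(weight ~ age * density0 + age * density1,
          random = ~ 1 | station / age, data = fish, method = "REML")

# Denominator df as reported in the summary tables
df_m1 <- summary(m1)$tTable[, "DF"]
df_m2 <- summary(m2)$tTable[, "DF"]
print(df_m1)
print(df_m2)

# Comparison of the random structures (valid under REML: same fixed effects)
print(anova(m1, m2))

# Likelihood ratio test of one interaction term, using ML fits
full_ml <- lme(weight ~ age * density0 + age * density1,
               random = ~ 1 | station / age, data = fish, method = "ML")
red_ml  <- lme(weight ~ age * density0 + density1,
               random = ~ 1 | station / age, data = fish, method = "ML")
lrt <- anova(red_ml, full_ml)
print(lrt)
```

Repeating the last step with `random = ~ 1 | station` gives the likelihood ratio test under the m1 random structure, so the two sets of p-values can be put side by side.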
So, my question is, do I really control for
pseudo-replication in the estimation of all fixed effects and
interactions when using m1? If so, why these high dfs in the
summary table??
I would really appreciate it if someone could enlighten me!
Regards,
Eli