Dear Derek,
Your problem is due to the small sample size. Therefore I would not trust the results you get from either lme() nor aov().
Take a look at the example below.
Best regards,
Thierry
library(nlme)
TE <- c(0, 10)
sdPlot <- 1
sdNoise <- 10
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(3)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
#The p-value of the lme model is always equal to or greater than the p-value of the aov model.
#Let's increase the number of plots to ten. Notice that the difference in p-value is much smaller
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(2)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
#Increasing both the number of plots and the number of plants per plot eliminates the difference in p-value.
dataset <- expand.grid(Treatment = factor(LETTERS[1:2]), Plot =factor(seq_len(10)), Plant = factor(seq_len(20)))
dataset$Plot <- dataset$Treatment:dataset$Plot
tmp <- replicate(1000, {
dataset$Response <- with(dataset, TE[Treatment] + rnorm(length(levels(Plot)), sd = sdPlot)[Plot] + rnorm(nrow(dataset), sd = sdNoise))
c(anova(aov(x ~ Treatment, data = aggregate(dataset$Response, dataset[, c("Treatment", "Plot")], FUN = mean)))[1, 5],
anova(lme(Response ~ Treatment, random = ~ 1|Plot, data = dataset))[2, 4])
})
plot(t(tmp))
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium
Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-mixed-models-bounces at r-project.org
[mailto:r-sig-mixed-models-bounces at r-project.org] Namens
Derek Sonderegger
Verzonden: woensdag 23 februari 2011 0:46
Aan: r-sig-mixed-models at r-project.org
Onderwerp: [R-sig-ME] using mixed model vs just using plot means
I am trying to fully understand the trade-offs between the
two analyses and if one method is superior in the case of
only two observations per random effect.
Full experimental set-up: I have 6 field plots and three of
them receive a treatment (applied to the plot) and three
serve as controls. In each plot, I have measurements from 2
individual plants. Since this is a desert ecosystem, the
variance within a plot is quite large compared the the
variance from plot to plot.
The 'old-school' analysis method would be to calculate plot
average for each plot and do an ANOVA using those plot means
as data and accept that my inference is only at the plot
level. If I go with a fancy mixed model approach, my
inference is on the level of an individual plant, and I get
an estimate of within plot variability.
But the rub is this: the plot means approach gives me a
fairly significant treatment effect (p = .014) while the
mixed model approach gives me a non-significant result
(p=.34). My feeling is that trying to estimate the random
effect (in addition to the treatment effect) is taking a
large amount of our statistical power in a relatively data
poor problem. I did a quick simulation study and it appears
that in the case of high within plot variability and a small
number of observations per plot, the mixed model approach has
substantially reduced power compared to the plot means
From a scientific point of view, we actually don't mind making
inferences at
the plot level as we are primarily concerned with landscape
effects of the treatment.
At this point I think that the plot means approach is the
best choice, but I want to make sure that I'm not missing anything.
Derek Sonderegger
[[alternative HTML version deleted]]