Hi, I'm using lme and lmer in my dissertation and it's the first time
I've used these methods. Taking into account replies from my previous
query I decided to go through with a model simplification, and then try
to validate the models in various ways and come up with the best one to
include in my work, be it a linear mixed effects model or general linear
effects model, with log() data or not etc - interestingly it does not
seems like doing transofrmations and such makes much difference so far,
looking at changes in diagnostic plots and AIC.
Anywho, I simplified to the model using lme (I've pasted it at the
bottom). And looking at the anova output the numDF looks right. However
I'm concerned about the 342 df in the denDF in anova() and in the
summary() output, as it seems to high to me, because at the observation
level is too high and pseudoreplicated; 4 readings per disk, 3 disks,
per plate, 3 plates per lineage, 5 lineages per group, 2 groups so:
4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish
level it's 3*5*2=30 degrees of freedom for error. And either dish or
disk (arguments for both) is the level at which one independant point of
datum is obtained, most probably Dish. So I'm wondering if either I'de
done something wrong, or I'm not understanding how df are presented and
used in mixed models. It's not really explained in my texts, and my
lecturer told me I'm working at the edge of his personal/professional
experience.
I've used lmer and the function in languageR to extract p-values without
it even mentioning df. Now if the lmer method with pvals.fnc() makes it
so as I don't have to worry about these df then in a way it makes my
issue a bit redundant. But it is playing on my mind a bit so felt I
should ask.
My second question is about when I do the equivalent model using lmer:
"lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which
I'm sure does the same because all my plots of residuals against fitted
and such are the same, if I define it with the poisson family, which
uses log, then I get a much lower AIC of about 45, compared to over 1000
without family defined, which I think defaults to gaussian/normal. And
my diagnostic plots still give me all the same patters, but just looking
a bit different because of the family distribution specified. I then did
a model logging the response variable by using log(Diameter), again, I
get the same diagnostic plot patterns, but on a different scale, and I
get an AIC of - 795.6. Now normally I'd go for the model with the lowest
AIC, however, I've never observed this beahviour before, and can't help
but think thhat the shift from a posotive 1000+ AIC to a negative one is
due to the fact the data has been logged, rather than that the model
fitted to log data in this way is genuinley better.
Finally, I saw in a text, an example of using lmer but "Recoding Factor
Levels" like:
lineage<-Group:Lineage
dish<-Group:Lineage:Dish
disk<-Group:Lineage:Dish:Disk
model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk)
However I don't see why this should need to be done, considering, the
study was hieracheal, just like all other examples in that chapter, and
it does not give a reason why, but says it does the same job as a nested
anova, which I though mixed models did anyway.
Hopefully sombody can shed light on my concerns. In terms of my work and
university, I could include what I've done here and be as transparrant
as possible and discuss these issues, because log() of the data or
defining a distribution in the model is leading to the same plots and
conclusions. But I'd like to make sure I come to term with what's
actually happening here.
A million thanks,
Ben W.
lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset,
method="REML")
>anova(lme14):
numDF denDF F-value p-value
(Intercept) 1 342 16538.253 <.0001
Group 1 342 260.793 <.0001
Lineage 4 342 8.226 <.0001
Group:Lineage 4 342 9.473 <.0001
> summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396 0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662 0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403 0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191 0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192 0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455 0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894 0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970 0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423 0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296 0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
DF in lme
8 messages · Ben Ward, iwhite@staffmail.ed.ac.uk, Toby Marthews +1 more
Hi Ben W, 1. About the denominator degrees of freedom in lme, please see these posts: http://r.789695.n4.nabble.com/degrees-of-freedom-in-lme-td828478.html http://markmail.org/message/i445xmb25yzu2g4h 2. Also, is Dish nested in Disk? From your lmer command it seems that it isn't, but in your lme command it is. HTH Toby Marthews
From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Ward [benjamin.ward at bathspa.org]
Sent: 16 March 2011 07:52
To: R-SIG-Mixed-Models at r-project.org
Subject: [R-sig-ME] DF in lme
Sent: 16 March 2011 07:52
To: R-SIG-Mixed-Models at r-project.org
Subject: [R-sig-ME] DF in lme
Hi, I'm using lme and lmer in my dissertation and it's the first time
I've used these methods. Taking into account replies from my previous
query I decided to go through with a model simplification, and then try
to validate the models in various ways and come up with the best one to
include in my work, be it a linear mixed effects model or general linear
effects model, with log() data or not etc - interestingly it does not
seems like doing transofrmations and such makes much difference so far,
looking at changes in diagnostic plots and AIC.
Anywho, I simplified to the model using lme (I've pasted it at the
bottom). And looking at the anova output the numDF looks right. However
I'm concerned about the 342 df in the denDF in anova() and in the
summary() output, as it seems to high to me, because at the observation
level is too high and pseudoreplicated; 4 readings per disk, 3 disks,
per plate, 3 plates per lineage, 5 lineages per group, 2 groups so:
4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish
level it's 3*5*2=30 degrees of freedom for error. And either dish or
disk (arguments for both) is the level at which one independant point of
datum is obtained, most probably Dish. So I'm wondering if either I'de
done something wrong, or I'm not understanding how df are presented and
used in mixed models. It's not really explained in my texts, and my
lecturer told me I'm working at the edge of his personal/professional
experience.
I've used lmer and the function in languageR to extract p-values without
it even mentioning df. Now if the lmer method with pvals.fnc() makes it
so as I don't have to worry about these df then in a way it makes my
issue a bit redundant. But it is playing on my mind a bit so felt I
should ask.
My second question is about when I do the equivalent model using lmer:
"lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which
I'm sure does the same because all my plots of residuals against fitted
and such are the same, if I define it with the poisson family, which
uses log, then I get a much lower AIC of about 45, compared to over 1000
without family defined, which I think defaults to gaussian/normal. And
my diagnostic plots still give me all the same patters, but just looking
a bit different because of the family distribution specified. I then did
a model logging the response variable by using log(Diameter), again, I
get the same diagnostic plot patterns, but on a different scale, and I
get an AIC of - 795.6. Now normally I'd go for the model with the lowest
AIC, however, I've never observed this beahviour before, and can't help
but think thhat the shift from a posotive 1000+ AIC to a negative one is
due to the fact the data has been logged, rather than that the model
fitted to log data in this way is genuinley better.
Finally, I saw in a text, an example of using lmer but "Recoding Factor
Levels" like:
lineage<-Group:Lineage
dish<-Group:Lineage:Dish
disk<-Group:Lineage:Dish:Disk
model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk)
However I don't see why this should need to be done, considering, the
study was hieracheal, just like all other examples in that chapter, and
it does not give a reason why, but says it does the same job as a nested
anova, which I though mixed models did anyway.
Hopefully sombody can shed light on my concerns. In terms of my work and
university, I could include what I've done here and be as transparrant
as possible and discuss these issues, because log() of the data or
defining a distribution in the model is leading to the same plots and
conclusions. But I'd like to make sure I come to term with what's
actually happening here.
A million thanks,
Ben W.
lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset,
method="REML")
>anova(lme14):
numDF denDF F-value p-value
(Intercept) 1 342 16538.253 <.0001
Group 1 342 260.793 <.0001
Lineage 4 342 8.226 <.0001
Group:Lineage 4 342 9.473 <.0001
> summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396 0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662 0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403 0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191 0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192 0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455 0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894 0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970 0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423 0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296 0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben, What happens if you include plate as another random effect? random = ~1|Plate/Dish/Disk
Ben Ward wrote:
Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC. Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience. I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal. And my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway. Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML")
>anova(lme14):
numDF denDF F-value p-value (Intercept) 1 342 16538.253 <.0001 Group 1 342 260.793 <.0001 Lineage 4 342 8.226 <.0001 Group:Lineage 4 342 9.473 <.0001
> summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396
0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403
0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455
0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Hi Ben W, Oh: I meant Disk in Dish not Dish in Disk ... Toby
From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Toby Marthews [toby.marthews at ouce.ox.ac.uk]
Sent: 16 March 2011 09:26
To: Ben Ward
Cc: R-SIG-Mixed-Models at r-project.org
Subject: Re: [R-sig-ME] DF in lme
Sent: 16 March 2011 09:26
To: Ben Ward
Cc: R-SIG-Mixed-Models at r-project.org
Subject: Re: [R-sig-ME] DF in lme
Hi Ben W, 1. About the denominator degrees of freedom in lme, please see these posts: http://r.789695.n4.nabble.com/degrees-of-freedom-in-lme-td828478.html http://markmail.org/message/i445xmb25yzu2g4h 2. Also, is Dish nested in Disk? From your lmer command it seems that it isn't, but in your lme command it is. HTH Toby Marthews ________________________________________ From: r-sig-mixed-models-bounces at r-project.org [r-sig-mixed-models-bounces at r-project.org] On Behalf Of Ben Ward [benjamin.ward at bathspa.org] Sent: 16 March 2011 07:52 To: R-SIG-Mixed-Models at r-project.org Subject: [R-sig-ME] DF in lme Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC. Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience. I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal. And my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway. Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML") >anova(lme14): numDF denDF F-value p-value (Intercept) 1 342 16538.253 <.0001 Group 1 342 260.793 <.0001 Lineage 4 342 8.226 <.0001 Group:Lineage 4 342 9.473 <.0001 > summary(lme14) Linear mixed-effects model fit by REML Data: Dataset AIC BIC logLik 1148.317 1198.470 -561.1587 Random effects: Formula: ~1 | Dish (Intercept) StdDev: 0.1887527 Formula: ~1 | Disk %in% Dish (Intercept) Residual StdDev: 6.303059e-05 1.137701 Fixed effects: Diameter ~ Group * Lineage Value Std.Error DF t-value p-value (Intercept) 15.049722 0.2187016 342 68.81396 0.0000 Group[T.NEDettol] 0.980556 0.2681586 342 3.65662 0.0003 Lineage[T.First] -0.116389 0.2681586 342 -0.43403 0.6645 Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191 0.8872 Lineage[T.Second] -0.177500 0.2681586 342 -0.66192 0.5085 Lineage[T.Third] 0.221111 0.2681586 342 0.82455 0.4102 Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894 0.0000 Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970 0.0122 Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423 0.0296 Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296 0.0579 Correlation: (Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt] Group[T.NEDettol] -0.613 Lineage[T.First] -0.613 0.500 Lineage[T.Fourth] -0.613 0.500 0.500 Lineage[T.Second] -0.613 0.500 0.500 0.500 Lineage[T.Third] -0.613 0.500 0.500 0.500 Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354 Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707 Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354 Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354 L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs] Group[T.NEDettol] Lineage[T.First] Lineage[T.Fourth] Lineage[T.Second] Lineage[T.Third] 0.500 Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354 Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500 Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500 Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500 Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S Group[T.NEDettol] Lineage[T.First] Lineage[T.Fourth] Lineage[T.Second] Lineage[T.Third] Group[T.NEDettol]:Lineage[T.First] Group[T.NEDettol]:Lineage[T.Fourth] Group[T.NEDettol]:Lineage[T.Second] 0.500 Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500 Standardized Within-Group Residuals: Min Q1 Med Q3 Max -2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064 Number of Observations: 360 Number of Groups: Dish Disk %in% Dish 3 9 _______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On 11-03-16 03:52 AM, Ben Ward wrote:
Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC.
Be careful about comparing fits of transformed and non-transformed data via AIC/log-likelihood: e.g. see <http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm>. (This does *not* refer to the link function, e.g. the log link of the Poisson, but to the case where you transform your data prior to analysis.)
Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience.
At what level are Group and Lineage replicated in the model? Do you have different Groups or Lineages represented on the same disk, dish, or plate? If you do have multiple Groups and Lineages present at the lowest level of replication, then you have a randomized block design and the degrees of freedom may be higher than you think. If you really want denominator degrees of freedom and you want them correct, consult an experimental design book and figure out what they should be in the classical framework ...
I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal.
I don't think you should try to pick the family on the basis of AIC -- you should pick it on the basis of the qualitative nature of the data. If you have count data, you should probably use Poisson (but you may want to add an observation-level random effect to allow for overdispersion.) If your response variable is Diameter, it is **not** a count variable, and you shouldn't use Poisson -- you should use an appropriately transformed response variable. And
my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway.
(1|lineage)+(1|dish)+(1|disk) is the same as (1|Lineage/Dish/Disk) (1|Dish) + (1|Disk) is **not** the same as (1|Dish/Disk), if Disk is not labeled uniquely (i.e. if Dishes are A, B, C, .. and Disks are 1, 2, 3, ... then you need Dish/Disk. If you have labeled Disks A1, A2, ... B1, B2, ... then the specifications are equivalent. For a linear mixed model (i.e. not Poisson counts) you should be able to run the same model in lmer and lme and get extremely similar results.
Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14 <- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML")
anova(lme14):
numDF denDF F-value p-value (Intercept) 1 342 16538.253 <.0001 Group 1 342 260.793 <.0001 Lineage 4 342 8.226 <.0001 Group:Lineage 4 342 9.473 <.0001
summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396
0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403
0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455
0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
On 16/03/2011 15:08, Ben Bolker wrote:
On 11-03-16 03:52 AM, Ben Ward wrote:
Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC.
Be careful about comparing fits of transformed and non-transformed data via AIC/log-likelihood: e.g. see <http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm>. (This does *not* refer to the link function, e.g. the log link of the Poisson, but to the case where you transform your data prior to analysis.)
Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience.
At what level are Group and Lineage replicated in the model? Do you have different Groups or Lineages represented on the same disk, dish, or plate? If you do have multiple Groups and Lineages present at the lowest level of replication, then you have a randomized block design and the degrees of freedom may be higher than you think. If you really want denominator degrees of freedom and you want them correct, consult an experimental design book and figure out what they should be in the classical framework ...
I'm now of the opinion that - (Just trying to get my head around it) -
that I don't have a randomized block design:
I've done a bit like a lenski evolution experiment with my microbes,
which involed two groups, in those two groups i have 5 cultures each,
one group is 5 lineages of bacteria I have been evolving against some
antimicrobial, the other group have not been through this - they are
stock run of the mill organisms. So with those 5 cultures of evolved
bacteria, for each, I'd take some, and spread it on three plates - so
theres no intermingling or randomization/mixing of the cultures: each
gets plated onto a who plate itself three times. Then the three disks,
loaded with antimicrobial were loaded onto each plate, and they were
incubated, and then I took 4 measurements from each zone that formed
around those disks. The disks all have the same antimicrobial on them.
So in that way, if what you say by randomized block design is something
like a split plot experiment, where there are several plots, and
numerous plants, and each one got a different treatment, then I don't
believe my experiment is like that. In my case that would be like me
having different cultures on the same dish, or using disks with
different antimicrobials on, at least I think this is what you're
asking. In which case Dish is the level at which I get truly indepentent
pieces of data, and 3plates*5lineages*2Groups=30: If I recode my factor
levels then, like so, which I mentioned before as a possibility:
Diameter<-Dataset$Diameter
Group<-factor(Dataset$Group)
Lineage<-factor(Dataset$Lineage)
Dish<-factor(Dataset$Dish)
Disk<-factor(Dataset$Disk)
lineage<-Group:Lineage
dish<-Group:Lineage:Dish
disk<-Group:Lineage:Dish:Disk
And then fit the model:
model <- lme(Diameter~Group*Lineage,random=~1|dish/disk, method="REML")
I get the following:
> summary(model)
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
1144.193 1194.346 -559.0966
Random effects:
Formula: ~1 | dish
(Intercept)
StdDev: 0.2334716
Formula: ~1 | disk %in% dish
(Intercept) Residual
StdDev: 0.356117 1.079568
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2542337 270 59.19641 0.0000
Group[T.NEDettol] 0.980556 0.3595407 20 2.72724 0.0130
Lineage[T.First] -0.116389 0.3595407 20 -0.32372 0.7495
Lineage[T.Fourth] -0.038056 0.3595407 20 -0.10584 0.9168
Lineage[T.Second] -0.177500 0.3595407 20 -0.49369 0.6269
Lineage[T.Third] 0.221111 0.3595407 20 0.61498 0.5455
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.5084674 20 4.47423 0.0002
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.5084674 20 1.87929 0.0749
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.5084674 20 1.62908 0.1189
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.5084674 20 1.41930 0.1712
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.707
Lineage[T.First] -0.707 0.500
Lineage[T.Fourth] -0.707 0.500 0.500
Lineage[T.Second] -0.707 0.500 0.500 0.500
Lineage[T.Third] -0.707 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.500 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.500 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.500 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.500 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.26060119 -0.70948250 0.03630884 0.69899536 3.42475990
Number of Observations: 360
Number of Groups:
dish disk %in% dish
30 90
> anova(model)
numDF denDF F-value p-value
(Intercept) 1 270 39586.82 <.0001
Group 1 20 145.07 <.0001
Lineage 4 20 4.58 0.0087
Group:Lineage 4 20 5.27 0.0046
This is closer to what I was expecting in terms of DF: 3 plates*5
lineages=15: 15 samples per group, 15-4(the numDF Lineage)=11, 11-1(the
numDF for Group)= 10 x 2 for the two groups/treatments = 20.
Hopefully I've worked that out correctly, and sombody could tell me
whether . Its' awkward because this experiment is unprecedented at my
uni, it was offered up by a teacher as a topic but then got dropped due
to lack of interest. As it's the first time, myself and my supervisor
were in many ways flying blind. If I remove the Lineage main effect
term, and include it as a random effect, leaving only group as a fixed
effect:
> anova(model2)
numDF denDF F-value p-value
(Intercept) 1 270 8041.429 <.0001
Group 1 8 29.469 6e-04
I get 8DF which by the same reasoning in the above model, is 5-1=4, 4*2
= 8, so I take that as reassurance my working is correct. I'd also like
to ask for opinion, on whether it would be advisable to actually remove
lineage as a fixed effect, and include lineage as a random effect on the
slope, rather than intersect which is what I've put all the others as. I
ask this because, whilst I feel whilst lineage might seem a factor with
informative levels( tha's how I first saw them), I had no way of
predicting which ones would show greatest or smallest differences or how
the five factor levels would interact and shape my data, in that way the
factor levels are not really all that informative at all - they're just
numbered as dish and disk are, and their effects may even be different
within my two groups - they don't really allow any prediction in the
same way a factor for different types of fertiliser would in a plant
study would for example, so I'm thinking maybe it should be a random
effect.
Thank you very much to everyone that's replied to me and assisted me
with this, it's a tough learning curve, but I do think I'm beginning to
grasp how to use lme and lmer for my basic ends. Once I'm confident on
the above, I'm next considering, whether to try an introduce some
weighting options to see what happens to a small amount of
heterscedacity I have between the two groups.
Ben W.
I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal.
I don't think you should try to pick the family on the basis of AIC -- you should pick it on the basis of the qualitative nature of the data. If you have count data, you should probably use Poisson (but you may want to add an observation-level random effect to allow for overdispersion.) If your response variable is Diameter, it is **not** a count variable, and you shouldn't use Poisson -- you should use an appropriately transformed response variable.
I've tried transforming my response variable in a few ways, like natural log, sqrt, and (x/1) but they don't really seem to alter the distribution or shape of my data at all. Interestingly, if I look at the spread of the data by splitting the response variable between the two groups, I see much more symmetry - although still not a nice neat normal distribution, but in Biology I've been taught never to expect one.
And
my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway.
(1|lineage)+(1|dish)+(1|disk) is the same as (1|Lineage/Dish/Disk) (1|Dish) + (1|Disk) is **not** the same as (1|Dish/Disk), if Disk is not labeled uniquely (i.e. if Dishes are A, B, C, .. and Disks are 1, 2, 3, ... then you need Dish/Disk. If you have labeled Disks A1, A2, ... B1, B2, ... then the specifications are equivalent. For a linear mixed model (i.e. not Poisson counts) you should be able to run the same model in lmer and lme and get extremely similar results.
Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14<- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML")
anova(lme14):
numDF denDF F-value p-value (Intercept) 1 342 16538.253<.0001 Group 1 342 260.793<.0001 Lineage 4 342 8.226<.0001 Group:Lineage 4 342 9.473<.0001
summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396
0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403
0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455
0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Ben Suppose you calculated an average response for each of the 30 plates in your experiment, and calculated a standard two-way anova as follows: Source of variation DF Groups 1 Lines 4 Groups x lines 4 Residual 20 The F-tests from this anova should agree with the Wald tests from lmer. The residual is based on variation between plates within lines and groups. If I understand the design correctly, the other sources of variation (between disks in plates, between readings within disks) may be of interest but do not feature individually in the testing of groups and lines. When data are balanced, an anova can clarify some of the obscurities of mixed model fitting. Is this a controversial observation on this list?
Ben Ward wrote:
On 16/03/2011 15:08, Ben Bolker wrote:
On 11-03-16 03:52 AM, Ben Ward wrote:
Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC.
Be careful about comparing fits of transformed and non-transformed data via AIC/log-likelihood: e.g. see <http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm>. (This does *not* refer to the link function, e.g. the log link of the Poisson, but to the case where you transform your data prior to analysis.)
Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience.
At what level are Group and Lineage replicated in the model? Do you have different Groups or Lineages represented on the same disk, dish, or plate? If you do have multiple Groups and Lineages present at the lowest level of replication, then you have a randomized block design and the degrees of freedom may be higher than you think. If you really want denominator degrees of freedom and you want them correct, consult an experimental design book and figure out what they should be in the classical framework ...
I'm now of the opinion that - (Just trying to get my head around it) - that I don't have a randomized block design: I've done a bit like a lenski evolution experiment with my microbes, which involed two groups, in those two groups i have 5 cultures each, one group is 5 lineages of bacteria I have been evolving against some antimicrobial, the other group have not been through this - they are stock run of the mill organisms. So with those 5 cultures of evolved bacteria, for each, I'd take some, and spread it on three plates - so theres no intermingling or randomization/mixing of the cultures: each gets plated onto a who plate itself three times. Then the three disks, loaded with antimicrobial were loaded onto each plate, and they were incubated, and then I took 4 measurements from each zone that formed around those disks. The disks all have the same antimicrobial on them. So in that way, if what you say by randomized block design is something like a split plot experiment, where there are several plots, and numerous plants, and each one got a different treatment, then I don't believe my experiment is like that. In my case that would be like me having different cultures on the same dish, or using disks with different antimicrobials on, at least I think this is what you're asking. In which case Dish is the level at which I get truly indepentent pieces of data, and 3plates*5lineages*2Groups=30: If I recode my factor levels then, like so, which I mentioned before as a possibility: Diameter<-Dataset$Diameter Group<-factor(Dataset$Group) Lineage<-factor(Dataset$Lineage) Dish<-factor(Dataset$Dish) Disk<-factor(Dataset$Disk) lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk And then fit the model: model <- lme(Diameter~Group*Lineage,random=~1|dish/disk, method="REML") I get the following:
> summary(model)
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
1144.193 1194.346 -559.0966
Random effects:
Formula: ~1 | dish
(Intercept)
StdDev: 0.2334716
Formula: ~1 | disk %in% dish
(Intercept) Residual
StdDev: 0.356117 1.079568
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2542337 270 59.19641
0.0000
Group[T.NEDettol] 0.980556 0.3595407 20 2.72724
0.0130
Lineage[T.First] -0.116389 0.3595407 20 -0.32372
0.7495
Lineage[T.Fourth] -0.038056 0.3595407 20 -0.10584
0.9168
Lineage[T.Second] -0.177500 0.3595407 20 -0.49369
0.6269
Lineage[T.Third] 0.221111 0.3595407 20 0.61498
0.5455
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.5084674 20 4.47423
0.0002
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.5084674 20 1.87929
0.0749
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.5084674 20 1.62908
0.1189
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.5084674 20 1.41930
0.1712
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs] Lng[T.Frt]
Group[T.NEDettol] -0.707
Lineage[T.First] -0.707 0.500
Lineage[T.Fourth] -0.707 0.500 0.500
Lineage[T.Second] -0.707 0.500 0.500 0.500
Lineage[T.Third] -0.707 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.500 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.500 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.500 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.500 -0.707 -0.354 -0.354
L[T.S] L[T.T] Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt] G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.26060119 -0.70948250 0.03630884 0.69899536 3.42475990
Number of Observations: 360
Number of Groups:
dish disk %in% dish
30 90
> anova(model)
numDF denDF F-value p-value (Intercept) 1 270 39586.82 <.0001 Group 1 20 145.07 <.0001 Lineage 4 20 4.58 0.0087 Group:Lineage 4 20 5.27 0.0046 This is closer to what I was expecting in terms of DF: 3 plates*5 lineages=15: 15 samples per group, 15-4(the numDF Lineage)=11, 11-1(the numDF for Group)= 10 x 2 for the two groups/treatments = 20. Hopefully I've worked that out correctly, and sombody could tell me whether . Its' awkward because this experiment is unprecedented at my uni, it was offered up by a teacher as a topic but then got dropped due to lack of interest. As it's the first time, myself and my supervisor were in many ways flying blind. If I remove the Lineage main effect term, and include it as a random effect, leaving only group as a fixed effect:
> anova(model2)
numDF denDF F-value p-value (Intercept) 1 270 8041.429 <.0001 Group 1 8 29.469 6e-04 I get 8DF which by the same reasoning in the above model, is 5-1=4, 4*2 = 8, so I take that as reassurance my working is correct. I'd also like to ask for opinion, on whether it would be advisable to actually remove lineage as a fixed effect, and include lineage as a random effect on the slope, rather than intersect which is what I've put all the others as. I ask this because, whilst I feel whilst lineage might seem a factor with informative levels( tha's how I first saw them), I had no way of predicting which ones would show greatest or smallest differences or how the five factor levels would interact and shape my data, in that way the factor levels are not really all that informative at all - they're just numbered as dish and disk are, and their effects may even be different within my two groups - they don't really allow any prediction in the same way a factor for different types of fertiliser would in a plant study would for example, so I'm thinking maybe it should be a random effect. Thank you very much to everyone that's replied to me and assisted me with this, it's a tough learning curve, but I do think I'm beginning to grasp how to use lme and lmer for my basic ends. Once I'm confident on the above, I'm next considering, whether to try an introduce some weighting options to see what happens to a small amount of heterscedacity I have between the two groups. Ben W.
I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal.
I don't think you should try to pick the family on the basis of AIC -- you should pick it on the basis of the qualitative nature of the data. If you have count data, you should probably use Poisson (but you may want to add an observation-level random effect to allow for overdispersion.) If your response variable is Diameter, it is **not** a count variable, and you shouldn't use Poisson -- you should use an appropriately transformed response variable.
I've tried transforming my response variable in a few ways, like natural log, sqrt, and (x/1) but they don't really seem to alter the distribution or shape of my data at all. Interestingly, if I look at the spread of the data by splitting the response variable between the two groups, I see much more symmetry - although still not a nice neat normal distribution, but in Biology I've been taught never to expect one.
And
my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway.
(1|lineage)+(1|dish)+(1|disk) is the same as (1|Lineage/Dish/Disk) (1|Dish) + (1|Disk) is **not** the same as (1|Dish/Disk), if Disk is not labeled uniquely (i.e. if Dishes are A, B, C, .. and Disks are 1, 2, 3, ... then you need Dish/Disk. If you have labeled Disks A1, A2, ... B1, B2, ... then the specifications are equivalent. For a linear mixed model (i.e. not Poisson counts) you should be able to run the same model in lmer and lme and get extremely similar results.
Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14<- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML")
anova(lme14):
numDF denDF F-value p-value (Intercept) 1 342 16538.253<.0001 Group 1 342 260.793<.0001 Lineage 4 342 8.226<.0001 Group:Lineage 4 342 9.473<.0001
summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396
0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403
0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455
0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs]
Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T]
Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt]
G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
On 11-03-17 06:16 AM, i white wrote:
Ben Suppose you calculated an average response for each of the 30 plates in your experiment, and calculated a standard two-way anova as follows: Source of variation DF Groups 1 Lines 4 Groups x lines 4 Residual 20 The F-tests from this anova should agree with the Wald tests from lmer. The residual is based on variation between plates within lines and groups. If I understand the design correctly, the other sources of variation (between disks in plates, between readings within disks) may be of interest but do not feature individually in the testing of groups and lines. When data are balanced, an anova can clarify some of the obscurities of mixed model fitting. Is this a controversial observation on this list?
I don't disagree. I'm glad that light seems to be appearing at the end of the tunnel for the original poster. I would also say following Murtaugh 2007 (who I quote often here) that I think that thinking of the subsampling (disks/plates) as being a method for increasing precision of measurements, and averaging the values, has advantages in terms of (1) simplifying the analysis (and thus lowering the chances of mistakes/increasing the chance of detecting them) (2) bringing non-normal sample distributions closer to normality by averaging. (This doesn't work for randomized block designs, or GLMMs, or cases where the variation at lower levels of nesting is of intrinsic interest.) Lineage definitely makes more sense to me as a random effect, although there is almost always wiggle room within these definitions ... Murtaugh, Paul A. 2007. ?Simplicity and Complexity in Ecological Data Analysis.? Ecology 88 (1): 56-62. http://www.esajournals.org/doi/abs/10.1890/0012-9658%282007%2988%5B56%3ASACIED%5D2.0.CO%3B2.
Ben Ward wrote:
On 16/03/2011 15:08, Ben Bolker wrote:
On 11-03-16 03:52 AM, Ben Ward wrote:
Hi, I'm using lme and lmer in my dissertation and it's the first time I've used these methods. Taking into account replies from my previous query I decided to go through with a model simplification, and then try to validate the models in various ways and come up with the best one to include in my work, be it a linear mixed effects model or general linear effects model, with log() data or not etc - interestingly it does not seems like doing transofrmations and such makes much difference so far, looking at changes in diagnostic plots and AIC.
Be careful about comparing fits of transformed and non-transformed data via AIC/log-likelihood: e.g. see <http://www.unc.edu/courses/2006spring/ecol/145/001/docs/lectures/lecture18.htm>. (This does *not* refer to the link function, e.g. the log link of the Poisson, but to the case where you transform your data prior to analysis.)
Anywho, I simplified to the model using lme (I've pasted it at the bottom). And looking at the anova output the numDF looks right. However I'm concerned about the 342 df in the denDF in anova() and in the summary() output, as it seems to high to me, because at the observation level is too high and pseudoreplicated; 4 readings per disk, 3 disks, per plate, 3 plates per lineage, 5 lineages per group, 2 groups so: 4*3*3*5*2=360. If I take this to disk level 3*3*5*2=90, and at dish level it's 3*5*2=30 degrees of freedom for error. And either dish or disk (arguments for both) is the level at which one independant point of datum is obtained, most probably Dish. So I'm wondering if either I'de done something wrong, or I'm not understanding how df are presented and used in mixed models. It's not really explained in my texts, and my lecturer told me I'm working at the edge of his personal/professional experience.
At what level are Group and Lineage replicated in the model? Do you have different Groups or Lineages represented on the same disk, dish, or plate? If you do have multiple Groups and Lineages present at the lowest level of replication, then you have a randomized block design and the degrees of freedom may be higher than you think. If you really want denominator degrees of freedom and you want them correct, consult an experimental design book and figure out what they should be in the classical framework ...
I'm now of the opinion that - (Just trying to get my head around it) - that I don't have a randomized block design: I've done a bit like a lenski evolution experiment with my microbes, which involed two groups, in those two groups i have 5 cultures each, one group is 5 lineages of bacteria I have been evolving against some antimicrobial, the other group have not been through this - they are stock run of the mill organisms. So with those 5 cultures of evolved bacteria, for each, I'd take some, and spread it on three plates - so theres no intermingling or randomization/mixing of the cultures: each gets plated onto a who plate itself three times. Then the three disks, loaded with antimicrobial were loaded onto each plate, and they were incubated, and then I took 4 measurements from each zone that formed around those disks. The disks all have the same antimicrobial on them. So in that way, if what you say by randomized block design is something like a split plot experiment, where there are several plots, and numerous plants, and each one got a different treatment, then I don't believe my experiment is like that. In my case that would be like me having different cultures on the same dish, or using disks with different antimicrobials on, at least I think this is what you're asking. In which case Dish is the level at which I get truly indepentent pieces of data, and 3plates*5lineages*2Groups=30: If I recode my factor levels then, like so, which I mentioned before as a possibility: Diameter<-Dataset$Diameter Group<-factor(Dataset$Group) Lineage<-factor(Dataset$Lineage) Dish<-factor(Dataset$Dish) Disk<-factor(Dataset$Disk) lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk And then fit the model: model <- lme(Diameter~Group*Lineage,random=~1|dish/disk, method="REML") I get the following:
> summary(model)
Linear mixed-effects model fit by REML
Data: NULL
AIC BIC logLik
1144.193 1194.346 -559.0966
Random effects:
Formula: ~1 | dish
(Intercept)
StdDev: 0.2334716
Formula: ~1 | disk %in% dish
(Intercept) Residual
StdDev: 0.356117 1.079568
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2542337 270 59.19641
0.0000
Group[T.NEDettol] 0.980556 0.3595407 20 2.72724
0.0130
Lineage[T.First] -0.116389 0.3595407 20 -0.32372
0.7495
Lineage[T.Fourth] -0.038056 0.3595407 20 -0.10584
0.9168
Lineage[T.Second] -0.177500 0.3595407 20 -0.49369
0.6269
Lineage[T.Third] 0.221111 0.3595407 20 0.61498
0.5455
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.5084674 20 4.47423
0.0002
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.5084674 20 1.87929
0.0749
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.5084674 20 1.62908
0.1189
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.5084674 20 1.41930
0.1712
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs]
Lng[T.Frt]
Group[T.NEDettol] -0.707
Lineage[T.First] -0.707 0.500
Lineage[T.Fourth] -0.707 0.500 0.500
Lineage[T.Second] -0.707 0.500 0.500 0.500
Lineage[T.Third] -0.707 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.500 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.500 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.500 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.500 -0.707 -0.354 -0.354
L[T.S] L[T.T]
Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt]
G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.26060119 -0.70948250 0.03630884 0.69899536 3.42475990
Number of Observations: 360
Number of Groups:
dish disk %in% dish
30 90
> anova(model)
numDF denDF F-value p-value (Intercept) 1 270 39586.82 <.0001 Group 1 20 145.07 <.0001 Lineage 4 20 4.58 0.0087 Group:Lineage 4 20 5.27 0.0046 This is closer to what I was expecting in terms of DF: 3 plates*5 lineages=15: 15 samples per group, 15-4(the numDF Lineage)=11, 11-1(the numDF for Group)= 10 x 2 for the two groups/treatments = 20. Hopefully I've worked that out correctly, and sombody could tell me whether . Its' awkward because this experiment is unprecedented at my uni, it was offered up by a teacher as a topic but then got dropped due to lack of interest. As it's the first time, myself and my supervisor were in many ways flying blind. If I remove the Lineage main effect term, and include it as a random effect, leaving only group as a fixed effect:
> anova(model2)
numDF denDF F-value p-value (Intercept) 1 270 8041.429 <.0001 Group 1 8 29.469 6e-04 I get 8DF which by the same reasoning in the above model, is 5-1=4, 4*2 = 8, so I take that as reassurance my working is correct. I'd also like to ask for opinion, on whether it would be advisable to actually remove lineage as a fixed effect, and include lineage as a random effect on the slope, rather than intersect which is what I've put all the others as. I ask this because, whilst I feel whilst lineage might seem a factor with informative levels( tha's how I first saw them), I had no way of predicting which ones would show greatest or smallest differences or how the five factor levels would interact and shape my data, in that way the factor levels are not really all that informative at all - they're just numbered as dish and disk are, and their effects may even be different within my two groups - they don't really allow any prediction in the same way a factor for different types of fertiliser would in a plant study would for example, so I'm thinking maybe it should be a random effect. Thank you very much to everyone that's replied to me and assisted me with this, it's a tough learning curve, but I do think I'm beginning to grasp how to use lme and lmer for my basic ends. Once I'm confident on the above, I'm next considering, whether to try an introduce some weighting options to see what happens to a small amount of heterscedacity I have between the two groups. Ben W.
I've used lmer and the function in languageR to extract p-values without it even mentioning df. Now if the lmer method with pvals.fnc() makes it so as I don't have to worry about these df then in a way it makes my issue a bit redundant. But it is playing on my mind a bit so felt I should ask. My second question is about when I do the equivalent model using lmer: "lmer(Diameter~Group*Lineage+(1|Dish)+(1|Disk), data=Dataset)" - which I'm sure does the same because all my plots of residuals against fitted and such are the same, if I define it with the poisson family, which uses log, then I get a much lower AIC of about 45, compared to over 1000 without family defined, which I think defaults to gaussian/normal.
I don't think you should try to pick the family on the basis of AIC -- you should pick it on the basis of the qualitative nature of the data. If you have count data, you should probably use Poisson (but you may want to add an observation-level random effect to allow for overdispersion.) If your response variable is Diameter, it is **not** a count variable, and you shouldn't use Poisson -- you should use an appropriately transformed response variable.
I've tried transforming my response variable in a few ways, like natural log, sqrt, and (x/1) but they don't really seem to alter the distribution or shape of my data at all. Interestingly, if I look at the spread of the data by splitting the response variable between the two groups, I see much more symmetry - although still not a nice neat normal distribution, but in Biology I've been taught never to expect one.
And
my diagnostic plots still give me all the same patters, but just looking a bit different because of the family distribution specified. I then did a model logging the response variable by using log(Diameter), again, I get the same diagnostic plot patterns, but on a different scale, and I get an AIC of - 795.6. Now normally I'd go for the model with the lowest AIC, however, I've never observed this beahviour before, and can't help but think thhat the shift from a posotive 1000+ AIC to a negative one is due to the fact the data has been logged, rather than that the model fitted to log data in this way is genuinley better. Finally, I saw in a text, an example of using lmer but "Recoding Factor Levels" like: lineage<-Group:Lineage dish<-Group:Lineage:Dish disk<-Group:Lineage:Dish:Disk model<-lmer(Diameter~Group+(1|lineage)+(1|dish)+(1|disk) However I don't see why this should need to be done, considering, the study was hieracheal, just like all other examples in that chapter, and it does not give a reason why, but says it does the same job as a nested anova, which I though mixed models did anyway.
(1|lineage)+(1|dish)+(1|disk) is the same as (1|Lineage/Dish/Disk) (1|Dish) + (1|Disk) is **not** the same as (1|Dish/Disk), if Disk is not labeled uniquely (i.e. if Dishes are A, B, C, .. and Disks are 1, 2, 3, ... then you need Dish/Disk. If you have labeled Disks A1, A2, ... B1, B2, ... then the specifications are equivalent. For a linear mixed model (i.e. not Poisson counts) you should be able to run the same model in lmer and lme and get extremely similar results.
Hopefully sombody can shed light on my concerns. In terms of my work and university, I could include what I've done here and be as transparrant as possible and discuss these issues, because log() of the data or defining a distribution in the model is leading to the same plots and conclusions. But I'd like to make sure I come to term with what's actually happening here. A million thanks, Ben W. lme14<- lme(Diameter~Group*Lineage,random=~1|Dish/Disk, data=Dataset, method="REML")
anova(lme14):
numDF denDF F-value p-value (Intercept) 1 342 16538.253<.0001 Group 1 342 260.793<.0001 Lineage 4 342 8.226<.0001 Group:Lineage 4 342 9.473<.0001
summary(lme14)
Linear mixed-effects model fit by REML
Data: Dataset
AIC BIC logLik
1148.317 1198.470 -561.1587
Random effects:
Formula: ~1 | Dish
(Intercept)
StdDev: 0.1887527
Formula: ~1 | Disk %in% Dish
(Intercept) Residual
StdDev: 6.303059e-05 1.137701
Fixed effects: Diameter ~ Group * Lineage
Value Std.Error DF t-value
p-value
(Intercept) 15.049722 0.2187016 342 68.81396
0.0000
Group[T.NEDettol] 0.980556 0.2681586 342 3.65662
0.0003
Lineage[T.First] -0.116389 0.2681586 342 -0.43403
0.6645
Lineage[T.Fourth] -0.038056 0.2681586 342 -0.14191
0.8872
Lineage[T.Second] -0.177500 0.2681586 342 -0.66192
0.5085
Lineage[T.Third] 0.221111 0.2681586 342 0.82455
0.4102
Group[T.NEDettol]:Lineage[T.First] 2.275000 0.3792336 342 5.99894
0.0000
Group[T.NEDettol]:Lineage[T.Fourth] 0.955556 0.3792336 342 2.51970
0.0122
Group[T.NEDettol]:Lineage[T.Second] 0.828333 0.3792336 342 2.18423
0.0296
Group[T.NEDettol]:Lineage[T.Third] 0.721667 0.3792336 342 1.90296
0.0579
Correlation:
(Intr) Gr[T.NED] Lng[T.Frs]
Lng[T.Frt]
Group[T.NEDettol] -0.613
Lineage[T.First] -0.613 0.500
Lineage[T.Fourth] -0.613 0.500 0.500
Lineage[T.Second] -0.613 0.500 0.500 0.500
Lineage[T.Third] -0.613 0.500 0.500 0.500
Group[T.NEDettol]:Lineage[T.First] 0.434 -0.707 -0.707 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] 0.434 -0.707 -0.354 -0.707
Group[T.NEDettol]:Lineage[T.Second] 0.434 -0.707 -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Third] 0.434 -0.707 -0.354 -0.354
L[T.S] L[T.T]
Grp[T.NEDttl]:Lng[T.Frs]
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third] 0.500
Group[T.NEDettol]:Lineage[T.First] -0.354 -0.354
Group[T.NEDettol]:Lineage[T.Fourth] -0.354 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Second] -0.707 -0.354 0.500
Group[T.NEDettol]:Lineage[T.Third] -0.354 -0.707 0.500
Grp[T.NEDttl]:Lng[T.Frt]
G[T.NED]:L[T.S
Group[T.NEDettol]
Lineage[T.First]
Lineage[T.Fourth]
Lineage[T.Second]
Lineage[T.Third]
Group[T.NEDettol]:Lineage[T.First]
Group[T.NEDettol]:Lineage[T.Fourth]
Group[T.NEDettol]:Lineage[T.Second] 0.500
Group[T.NEDettol]:Lineage[T.Third] 0.500 0.500
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.47467771 -0.75133489 0.06697157 0.67851126 3.27449064
Number of Observations: 360
Number of Groups:
Dish Disk %in% Dish
3 9
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models