Skip to content

lme/anova vs paired t test query

4 messages · Ben Bolker, Gang Chen, Robert Kushler

#
It was my understanding that testing a two-level fixed effect in a
one-way randomized block ANOVA using REML should be identical to a
paired t test ... but, at least with the attempt below, using
lme(...,method="REML") and t.test(...,paired=TRUE,var.equal=TRUE) seems
to give slightly different results.  They're not different in any
important way, but the difference bothers me.  Can anyone see what I'm
missing?

  Ben Bolker

dat <- structure(list(sex = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("f",
"m"), class = "factor"), prevalence = c(0, 0.375, 0.133333333333333,
0.176470588235294, 0.1875, 0, 0, 1, 1, 0.5, 0.6, 0.333333333333333,
0.5, 0, 0.333333333333333, 0, 0.5, 0, 0.625, 0.333333333333333,
0.5, 0, 0.333333333333333, 0.153846153846154, 0.222222222222222,
0.5, 1, 0.5, 0, 0.277777777777778, 0.125, 0, 0, 0.428571428571429,
0.451612903225806, 0.362068965517241), tripsite = structure(c(1L,
1L, 4L, 4L, 14L, 14L, 5L, 5L, 8L, 8L, 15L, 15L, 6L, 6L, 9L, 9L,
11L, 11L, 16L, 16L, 2L, 2L, 7L, 7L, 10L, 10L, 13L, 13L, 17L,
17L, 3L, 3L, 12L, 12L, 18L, 18L), .Label = c("1.2", "4.2", "5.2",
"1.3", "2.3", "3.3", "4.3", "2.4", "3.4", "4.4", "3.5", "5.5",
"4.6", "1.9", "2.9", "3.9", "4.9", "5.9"), class = "factor")), .Names =
c("sex",
"prevalence", "tripsite"), row.names = c(1L, 2L, 3L, 4L, 9L,
10L, 11L, 12L, 13L, 14L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L,
27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L, 36L, 38L, 39L, 40L,
41L, 42L, 43L, 45L, 46L), class = "data.frame")


t0 <-
with(dat,t.test(prevalence[sex=="f"],prevalence[sex=="m"],paired=TRUE,var.equal=TRUE))

library(nlme)

(a1 <- anova(lme(prevalence~sex,random=~1|tripsite,data=dat,method="REML")))

(fstat0 <- t0$statistic^2)     ## 0.789627
(fstat1 <- a1[["F-value"]][2]) ## 0.8056624

tmpf <- function(dat) {
  t0 <-
with(dat,t.test(prevalence[sex=="f"],prevalence[sex=="m"],paired=TRUE,var.equal=TRUE))
  (a1 <-
anova(lme(prevalence~sex,random=~1|tripsite,data=dat,method="REML")))
  (fstat0 <- t0$statistic^2)     ## 0.789627
  (fstat1 <- a1[["F-value"]][2]) ## 0.8056624
  fstat0-fstat1
}

  If I then redo this with random data, the results are within 1e-5
set.seed(1001)
rr <- replicate(500,tmpf(transform(dat,prevalence=rnorm(nrow(dat)))))
##
##  summary(rr)
##      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
## -1.7770000 -0.0568400 -0.0000015 -0.0722900  0.0000000  0.0000139

mean(abs(rr)<1e-5)
## 0.502

 hist(log10(abs(rr)),breaks=50,col="gray")

## this is sort of interesting, suggesting a bimodal distribution with
one component (the 'good ones' centered at 10^{-8} and another centered
at 0.1 ...
#
I believe I know the answer for this one because I used to struggle
with this for quite a while. Eventually I figured it out.

To match up with the paired t-test, you need

(a2 <- anova(lme(prevalence~sex,random=list(tripsite=pdCompSymm(~sex-1)),data=dat,method="REML")))
(fstat2 <- a2[["F-value"]][2]) # you get 0.789627

The catch here is that the compound-symmetry structure for the
variance-covariance structure of the random components in the lme
model is equivalent to a paired t-test, a special of the traditional
repeated-meaures one-way ANOVA.

Gang
On Mon, May 23, 2011 at 2:55 PM, Ben Bolker <bbolker at gmail.com> wrote:
#
Great, very clear!  Thanks.

  Ben
On 05/23/2011 03:40 PM, Gang Chen wrote:
#
Ben,

The "good ones" in your simulation correspond to cases where the classical
"ANOVA" estimate of the block variance component is greater than zero (i.e.
the F statistic for "tripsite" in the two way fixed effects anova is greater
than 1).

 > tmpf <- function(dat) {
+   t0 <- with(dat,t.test(prevalence[sex=="f"],prevalence[sex=="m"],paired=TRUE,var.equal=TRUE))
+   a1 <- anova(lme(prevalence~sex,random=~1|tripsite,data=dat,method="REML"))
+   a2 <- anova(lm(prevalence~sex+tripsite,data=dat))
+   fstat0 <- t0$statistic^2     ## 0.789627
+   fstat1 <- a1[["F-value"]][2] ## 0.8056624
+   fstat2 <- a2[["F value"]][2]
+   c(fstat0,fstat1,fstat2)
+ }
 >
 > set.seed(1001)
 > rr <- unlist(replicate(500,tmpf(transform(dat,prevalence=rnorm(nrow(dat))))))
 > table((abs(rr[1,]-rr[2,])<0.00002),(rr[3,]>=1))

         FALSE TRUE
   FALSE   246    0
   TRUE      3  251



Regards,   Rob Kushler
On 5/23/2011 4:10 PM, Ben Bolker wrote: