lm equivalent of Welch-corrected t-test?
On 13 Nov 2018, at 16:19 , Paul Johnson <pauljohn32 at gmail.com> wrote: Long ago, when R's t.test had var.equal=TRUE by default, I wrote some class notes showing that the result was equivalent to a one predictor regression model. Because t.test does not default to var.equal=TRUE these days, I'm curious to know if there is a way to specify weights in an lm to obtain the same result as the Welch-adjusted values reported by t.test at the current time. Is there a WLS equivalent adjustment with lm?
The short answer is no. The long answer is to look into heteroscedasticity adjustments or lmer combined with pbkrtest.
Well, you can do the weights in lm() and get the right t statistic, but not the Satterthwaite oddball-df thing (the sleep data are actually paired, but ignore that here)
variances <- tapply(sleep$extra, sleep$group, var)
sleep$wt <- 1/variances[sleep$group]
t.test(extra~group, sleep)
summary(lm(extra~factor(group), weight=wt, sleep))
The pbkrtest approach goes like this:
library(lme4)
library(pbkrtest)
sleep$gdummy <- sleep$group-1 # arrange that this is 1 in the group with larger variance
fit1 <- lmer(extra ~ group + (gdummy+0 | ID), sleep)
fit0 <- lmer(extra ~ 1 + (gdummy+0 | ID), sleep)
KRmodcomp(fit0, fit1)
(This somewhat abuses the existing sleep$ID -- all you really need is something that has a level for each record where gdummy==1)
This ends up with
stat ndf ddf F.scaling p.value
Ftest 3.4626 1.0000 17.7765 1 0.07939 .
to be compared with the Welch test
t = -1.8608, df = 17.776, p-value = 0.07939
-pd
Here's example code to show that lm is same as t.test with var.equal.
The signs come out differently, but otherwise the effect estimate,
standard error, t value are same:
set.seed(234234)
dat <- data.frame(x = gl(2, 50, labels = c("F", "M")))
dat$err <- rnorm(100, 0, 1)
dat$y <- ifelse(dat$x == "F", 40 + dat$err, 44 + dat$err)
m1 <- lm(y ~ x, dat)
summary(m1)
m1.t <- t.test(y ~ x, dat, var.equal = TRUE)
m1.t
## diff matches regression coefficient
(m1.t.effect <- diff(m1.t$estimate))
## standard error matches regression se
m1.t.stderr <- m1.t.effect/m1.t$statistic
If you run that, you see lm output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.9456 0.1180 338.65 <2e-16 ***
xM 3.9080 0.1668 23.43 <2e-16 ***
---
Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
Residual standard error: 0.8341 on 98 degrees of freedom
Multiple R-squared: 0.8485, Adjusted R-squared: 0.8469
F-statistic: 548.8 on 1 and 98 DF, p-value: < 2.2e-16
and t.test:
m1.t <- t.test(y ~ x, dat, var.equal = TRUE) m1.t
Two Sample t-test
data: y by x
t = -23.427, df = 98, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-4.239038 -3.576968
sample estimates:
mean in group F mean in group M
39.94558 43.85358
(m1.t.effect <- diff(m1.t$estimate))
mean in group M
3.908003
m1.t.effect/m1.t$statistic
mean in group M
-0.1668129
--
Paul E. Johnson http://pj.freefaculty.org
Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
To write to me directly, please address me at pauljohn at ku.edu.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com