An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111101/36c4eb70/attachment.pl>
Discrepancy with p.value from t.test
3 messages · Jonathan Thayn, Peter Langfelder, Albyn Jones
On Tue, Nov 1, 2011 at 12:40 PM, Jonathan Thayn <jthayn at ilstu.edu> wrote:
Sometimes the p.value returned by t.test() is the same that I calculate using pt() and sometimes it's not. I don't understand the difference. I'm sure there is a simple explanation but I haven't been able to find it, even after looking at the code for t.test.default. I apologize if this is a basic and obvious question. For example:
data(sleep) t.test(extra~group,data=sleep,var.equal=T)
# the p.value returned is 0.07939
2*pt(-1.8608,18) ? # using the t.statistic and the df returned above
[1] 0.0791887 These p.values are the same. However, they are different when I use a different dataset:
data(beavers) b1 <- beaver1$temp b2 <- beaver2$temp t.test(b1,b2,var.equal=T)
# the p.value returned is 2.2e-16
2*pt(-15.9366,212) ? # using the t.statistic and the df returned above
[1] 4.10686e-38
If you read the output of t.test carefully, you will find something like p-value < 2.2e-16 not p-value = 2.2e-16 so the results are not inconsistent. Not sure why t.test is coded that way, perhaps the p-value calculation is not very reliable below roughly 2e-16. This issue could also come up if the function doesn't use lower/upper tail of the distribution function as needed and then must subtract the calculated results from 1 to obtain the returned value. Here's an example:
x = rnorm(100) y = x<0 t.test(x~y)
Welch Two Sample t-test
data: x by y
t = 12.9463, df = 97.424, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
1.397253 1.903200
sample estimates:
mean in group FALSE mean in group TRUE
0.7596083 -0.8906181
Now do a naive pt:
pt(12.9463, df = 97.424)
[1] 1 my desired p-value is 1-pt(12.9463, df = 97.424) but that's zero. Of course, I can get the p-value in a more intelligent way,
pt(12.9463, df = 97.424, lower.tail = FALSE)
[1] 3.394337e-23 Peter
The print method is the issue:
t.out <- t.test(b1,b2,var.equal=T) t.out$p.value
[1] 4.108001e-38
t.out$statistic
t -15.93656 albyn
On Tue, Nov 01, 2011 at 02:40:15PM -0500, Jonathan Thayn wrote:
Sometimes the p.value returned by t.test() is the same that I calculate using pt() and sometimes it's not. I don't understand the difference. I'm sure there is a simple explanation but I haven't been able to find it, even after looking at the code for t.test.default. I apologize if this is a basic and obvious question. For example:
data(sleep) t.test(extra~group,data=sleep,var.equal=T)
# the p.value returned is 0.07939
2*pt(-1.8608,18) # using the t.statistic and the df returned above
[1] 0.0791887 These p.values are the same. However, they are different when I use a different dataset:
data(beavers) b1 <- beaver1$temp b2 <- beaver2$temp t.test(b1,b2,var.equal=T)
# the p.value returned is 2.2e-16
2*pt(-15.9366,212) # using the t.statistic and the df returned above
[1] 4.10686e-38 Jonathan B. Thayn, Ph.D. Illinois State University Department of Geography and Geology 200A Felmley Hall Normal, Illinois 61790 (309) 438-8112 jthayn at ilstu.edu my.ilstu.edu/~jthayn [[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Albyn Jones Reed College jones at reed.edu