Hi,

I have a question about prop.test in R: I teach students the score confidence interval for proportions (also called Wilson or Wilson score interval). prop.test(,..., correct=FALSE) gives this interval. The default uses a continuity correction. When should we use one over the other? Is it worth going over this in class? Why is correct=TRUE the default?

Thanks for any pedagogical guidance here!

-- Laura
*******************************************
Laura Chihara
Professor of Mathematics 507-222-4065 (office)
Dept of Mathematics 507-222-4312 (fax)
Carleton College
1 North College Street
Northfield MN 55057
prop.test in R
10 messages · Ralph O'Brien, PhD, Laura Chihara, Albyn Jones +3 more
Yes, thank you for this reference. But according to this article, the score is better than continuity correction, so why is continuity correction the default with prop.test? -Laura
On 10/25/2010 4:02 PM, Ralph O'Brien, PhD wrote:
I suggest:
A. Agresti and B. A. Coull. Approximate is better than "exact" for
interval estimation of binomial proportions. The American Statistician,
52(2):119-126, 1998.
_______________________________________________
R-sig-teaching at r-project.org <mailto:R-sig-teaching at r-project.org>
mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
--
Ralph O'Brien, PhD
Professor, Dept of Epidemiology and Biostatistics
Case Western Reserve University
Office: 216.368.1927
Cell: 216.312.3203
Laura,

I would make the argument that continuity correction should not be used in practice, or in the classroom. Continuity-corrected intervals are, on average, too wide. This might be defensible if they guaranteed their coverage level (as 'exact' distribution-based intervals do), but because they are asymptotic, they may have coverage less than the nominal level.

The Agresti reference Ralph sent is an excellent article. I highly recommend it.

I find it helpful to categorize discrete tests on two axes: conservative vs. approximate, and asymptotic vs. distribution based. Conservative tests attempt to keep the type 1 error less than the nominal level, and approximate tests attempt to keep the error near its nominal level.

                Asymptotic              Distribution based
Conservative    Continuity corrected    Standard 'exact' test
Approximate     Standard asymptotic     Mid p-value

I would also be interested to hear why the default is correct=TRUE. Perhaps it is historical.

Ian
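For anyone who wants to show the width and coverage points in class, here is a minimal sketch in base R. The counts (15 successes in 50 trials), the coverage settings (n = 30, p = 0.3), and the helper name coverage() are illustrative assumptions, not values from this thread.

```r
# Illustrative data (assumption): 15 successes out of 50 trials.
x <- 15
n <- 50

# Wilson score interval and its continuity-corrected counterpart.
wilson    <- prop.test(x, n, correct = FALSE)$conf.int
corrected <- prop.test(x, n, correct = TRUE)$conf.int

width_wilson    <- wilson[2] - wilson[1]
width_corrected <- corrected[2] - corrected[1]

# Hypothetical helper: exact coverage probability of the nominal 95%
# interval at a given true p, found by summing binomial probabilities
# over every outcome whose interval contains p.
coverage <- function(p, n, correct) {
  outcomes <- 0:n
  covered <- vapply(outcomes, function(k) {
    ci <- prop.test(k, n, correct = correct)$conf.int
    ci[1] <= p && p <= ci[2]
  }, logical(1))
  sum(dbinom(outcomes, n, p)[covered])
}

cov_wilson    <- coverage(0.3, 30, correct = FALSE)
cov_corrected <- coverage(0.3, 30, correct = TRUE)

print(c(wilson = width_wilson, corrected = width_corrected))
print(c(wilson = cov_wilson, corrected = cov_corrected))
```

Evaluating coverage() over a grid of p values and plotting the result is one way to let students see how far each interval strays from the nominal level.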
I don't know, the help file is uninformative. I'd guess the answer is "the author wrote it that way". Other R functions like t.test include similar unfortunate (to me) default choices, in that case var.equal=FALSE (ie the Welch test) is the default. albyn
Albyn Jones Reed College jones at reed.edu
In the case of t.test, having the default be var.equal=FALSE is the right way to go. There is little to no power lost by using the Welch test, and the assumption of equal variance can be difficult to assess. For this reason, many introductory textbooks have now banished the equal-variance t-test from their chapters (e.g. Moore's The Basic Practice of Statistics).

Ian
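To see the two settings side by side, here is a small sketch using the built-in sleep data. Treating the two groups as independent samples is an assumption made purely for illustration, since those measurements are in fact paired.

```r
# Default call: var.equal = FALSE, i.e. the Welch test.
welch <- t.test(extra ~ group, data = sleep)

# Classical pooled-variance test, requested explicitly.
pooled <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# The method label and degrees of freedom show which test was run.
print(welch$method)
print(pooled$method)
print(c(welch_df = unname(welch$parameter),
        pooled_df = unname(pooled$parameter)))
print(c(welch_p = welch$p.value, pooled_p = pooled$p.value))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, which fits the point about little power being lost.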
Exactly - elementary texts and methods books recommend the Welch test for the reason you mention. Curiously, those same texts recommend using anova and regression without automatically correcting for the possibility of non-constant variance. Why is the case of comparing two means different from 3? Those same books will tell you that anova is pretty robust to non-constant variance. Well, the two-sample t-test is anova.

I don't use the Welch test except as a conscious decision: i.e. I really want to compare the means while suspecting that the variances differ. Generally people are using the t-test to certify that two populations are different. If the variances are wildly different, that may be much more important than a difference in means. In fact, to test for a difference in means when the variances are wildly different is almost always substantively silly. There was a great example a few years ago from a psychiatric journal, comparing two medications, where the investigators did a t-test for the means when one distribution was unimodal and the other was bimodal; there was no statistically significant difference in the means, but there was a really important difference in the distributions. The automatic use of the Welch test makes you feel that you are protected against Bad Things, when you aren't.

albyn
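The remark that the two-sample t-test is anova can be checked directly. The sketch below again uses the sleep data as if the groups were independent (an assumption for illustration only) and compares the squared equal-variance t statistic with the one-way anova F statistic.

```r
# Equal-variance two-sample t-test.
tt <- t.test(extra ~ group, data = sleep, var.equal = TRUE)

# One-way anova on the same two groups.
fit <- anova(lm(extra ~ group, data = sleep))

t_squared <- unname(tt$statistic)^2
f_value   <- fit[["F value"]][1]

# The squared t statistic matches F, and the p-values agree.
print(c(t_squared = t_squared, F = f_value))
print(c(t_p = tt$p.value, anova_p = fit[["Pr(>F)"]][1]))
```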
On Tue, Oct 26, 2010 at 4:42 AM, Adams, Zeno <Zeno.Adams at ebs.edu> wrote:
This is an interesting discussion. Concerning the right choice and computation of t-tests there is still one point that is unclear to me: In the Welch t-test we have a difference in the numerator and a standard deviation of a difference in the denominator. Why then is the standard deviation of the difference not computed correctly, i.e. why is the covariance between X and Y not taken into account? For example using the sleep data:

data(sleep)
means <- tapply(sleep$extra,sleep$group,mean) ; means
vars <- tapply(sleep$extra,sleep$group,var) ; vars
sd.welch <- sqrt(vars[1]/10 + vars[2]/10) ; sd.welch #in sd.welch the covariance is ignored
t.welch <- (means[1]-means[2])/sd.welch ; t.welch
#verify with R-built-in t.test function:
t.test(extra ~ group, data = sleep)

However, the correlation between sleep$extra[sleep$group == 1] and sleep$extra[sleep$group == 2] is relatively high:

cor(sleep$extra[sleep$group == 1],sleep$extra[sleep$group == 2])

Shouldn't the correct standard deviation be

sd.paired <- sqrt(vars[1]/10 + vars[2]/10
                  -2*cov(sleep$extra[sleep$group == 1],sleep$extra[sleep$group == 2])/10) ; sd.paired

as in the paired t-test?
Only if the order of the observations in each sample is fixed. I don't want to sound facetious but the important characteristic of the samples in a paired t-test is that they are paired. The first observation in sample 1 is associated in some way with the first observation in sample 2, say because they are observations on the same subject or at the same location or ... If there is no pairing then one of the samples could be rearranged without changing the other, thereby changing the covariance. Because of the pairing the sample sizes in a paired t-test must be equal. But a t-test for independent samples can be used when the sample sizes are unequal. So, no, the t-test for independent samples is not a special case of the paired t-test.
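A short sketch of the rearrangement argument, using the sleep data from the question. Reversing the second sample with rev() is just one arbitrary reordering, chosen here so the example is deterministic.

```r
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Same values as g2, different order: any pairing with g1 is broken.
g2_reordered <- rev(g2)

# The independent-samples (Welch) statistic ignores the ordering.
welch_original  <- unname(t.test(g1, g2)$statistic)
welch_reordered <- unname(t.test(g1, g2_reordered)$statistic)

# The covariance, and hence the paired statistic, depends on it.
cov_original     <- cov(g1, g2)
cov_reordered    <- cov(g1, g2_reordered)
paired_original  <- unname(t.test(g1, g2, paired = TRUE)$statistic)
paired_reordered <- unname(t.test(g1, g2_reordered, paired = TRUE)$statistic)

print(c(welch_original = welch_original, welch_reordered = welch_reordered))
print(c(cov_original = cov_original, cov_reordered = cov_reordered))
print(c(paired_original = paired_original, paired_reordered = paired_reordered))
```

Subtracting the covariance term is only meaningful when the row order carries information, which is exactly what pairing means.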
In other words, isn't the Welch t-test a special case of the paired t-test with both samples assumed to be uncorrelated? And shouldn't we then teach only the paired t-test as the most general test in class?

Thanks!
Zeno
Exactly - elementary texts and methods books recommend the Welch test for the reason you mention. Curiously, those same texts recommend using anova and regression without automatically correcting for the possibility of non-constant variance. Why is the case of comparing two means different from 3? Those same books will tell you that anova is pretty robust to non-constant variance. Well, the two-sample t-test is anova.
I agree with you that the presentation is unfortunate. Perhaps it has something to do with the fact that heteroskedasticity-consistent covariance matrices (HCCM) for linear regression are a relatively recent development (by White and Huber in the early 80s), and initially they performed poorly for small sample sizes. From a pedagogical standpoint, the derivations of the formulas for HCCM are beyond the scope of an undergraduate course, whereas the equal-variance versions can be easily derived. That reasoning is harder to defend now, given more recent simulation studies showing that the power and level of tests based on HCCM are comparable with equal-variance regression, and given that there is rarely any reason a priori to think that the variances are equal. The anova is robust to violations so long as the group sizes are equal. If they aren't, then it isn't.
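As a rough illustration of what an HCCM looks like in practice, the sketch below computes White's basic estimator (often labelled HC0) by hand in base R for a two-group regression on the sleep data. The choice of HC0 and of this data set are assumptions made for illustration; no particular variant is specified above.

```r
fit <- lm(extra ~ group, data = sleep)

X <- model.matrix(fit)   # n x 2 design matrix (intercept, group dummy)
e <- residuals(fit)      # deviations from each group's mean

# Sandwich form: (X'X)^-1  X' diag(e^2) X  (X'X)^-1
bread    <- solve(crossprod(X))
meat     <- crossprod(X * e)   # row i of X is scaled by e[i]
vcov_hc0 <- bread %*% meat %*% bread

se_classical <- sqrt(diag(vcov(fit)))
se_hc0       <- sqrt(diag(vcov_hc0))

# For comparison: the Welch standard error of the mean difference.
group_vars <- tapply(sleep$extra, sleep$group, var)
se_welch   <- sqrt(sum(group_vars) / 10)

print(rbind(classical = se_classical, hc0 = se_hc0))
print(se_welch)
```

With 10 observations per group, the HC0 standard error for the group difference works out to sqrt(0.9) times the Welch standard error, a small-sample downward bias that later HCCM variants were designed to reduce.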
I don't use the Welch test except as a conscious decision: i.e. I really want to compare the means while suspecting that the variances differ. Generally people are using the t-test to certify that two populations are different. If the variances are wildly different, that may be much more important than a difference in means. In fact, to test for a difference in means when the variances are wildly different is almost always substantively silly. There was a great example a few years ago from a psychiatric journal, comparing two medications, where the investigators did a t-test for the means when one distribution was unimodal and the other was bimodal; there was no statistically significant difference in the means, but there was a really important difference in the distributions. The automatic use of the Welch test makes you feel that you are protected against Bad Things, when you aren't.
You may not suspect that the variances are different, but there is no a priori reason to think that they are equal. Why should you assume something you have no reason to believe is true?

In my experience, people are not using the t-test to say that two populations are in some general way different, but rather specifically that the means differ. This is an important question regardless of whether the variances are equal. In your medication example, the shape of the two distributions was different, but when making the decision of whether to approve a medication, the more important question is whether the central tendency is different: does one medication, on average, improve the outcome more than another? A secondary, though important, question is how variable the outcome is. The investigators made a correct inference (in stating no significant mean difference between the groups), but they missed an important question that they could have asked of their data. This omission has nothing to do with the t-test.

Heteroskedasticity-robust methods DO protect against "Bad Things." What they don't do is reveal the existence of important data trends unrelated to the hypothesis of interest.

Ian