<https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
Dear all,

I would appreciate your advice on the following. In R, wilcox.test() reports "p-value < 2.2e-16" when we compare expression values for sets of 1000 genes (in the genomics field). However, the journal asks us to provide the exact p-value. Would it be legitimate to write "p-value = 0"?

Thanks a lot,
-- bogdan
about a p-value < 2.2e-16
24 messages · Spencer Graves, Peter Langfelder, Bogdan Tanasa +6 more
I would push back on that from two perspectives:

1. I would study exactly what the journal said very carefully. If they mandated "wilcox.test", that function has an argument called "exact". If that's what they are asking, then using that argument gives the exact p-value, e.g.:

> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

        Wilcoxon rank sum exact test

data:  rnorm(100) and rnorm(100, 2)
W = 691, p-value < 2.2e-16

2. If that's NOT what they are asking, then I'm not convinced what they are asking makes sense: there is no such thing as an "exact p value" except to the extent that certain assumptions hold, and all models are wrong (but some are useful), as George Box famously said years ago.[1] Truth only exists in mathematics, and that's because it's a fiction to start with ;-)

Hope this helps.
Spencer Graves

[1] https://en.wikipedia.org/wiki/All_models_are_wrong
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the printed p-values. In the above example, instead of using

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 2.988368e-32

I think this value is what the journal asks for.

HTH,
Peter
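A minimal sketch of Peter's point, assuming simulated data in place of the real expression values (the names `x`, `y`, `shown` and `p` are illustrative only). It compares what the htest print method would display with the value actually stored in the result, and shows why "p-value = 0" would be a misreport: 2.2e-16 is only the display threshold (machine epsilon), not the smallest number R can represent.

```r
# Assumption: simulated data standing in for the real expression values.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

tst <- wilcox.test(x, y, exact = TRUE)

# print(tst) formats the p-value with format.pval(), whose default
# threshold `eps` is .Machine$double.eps; smaller values are shown as "< eps".
shown <- format.pval(tst$p.value, digits = 4)

# The stored value keeps full double precision and is not zero.
p <- tst$p.value

cat("printed as:                ", shown, "\n")
cat("stored as:                 ", format(p, digits = 7), "\n")
cat("machine epsilon:           ", .Machine$double.eps, "\n")
cat("smallest normalized double:", .Machine$double.xmin, "\n")
```

Whether a journal wants such a tiny number reported literally, or as "p < 2.2e-16", is a separate question from what R has stored.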
Dear Spencer,

thank you very much for your prompt email and help. When using:

wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
W = 698, p-value < 2.2e-16

wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
W = 1443, p-value < 2.2e-16

in both cases the p-value is < 2.2e-16. By "exact" p-value, I meant the "precise" p-value. If I may ask, could we write p-value = 0? I have noted a similar conversation on stackexchange, although the answer is not very clear (to me):

https://stats.stackexchange.com/questions/78839/how-should-tiny-p-values-be-reported-and-why-does-r-put-a-minimum-on-2-22e-1

thanks again,
bogdan
Dear Peter,

thanks a lot. Yes, we can see a very precise p-value, and that was the request from the journal. If I may ask another question: what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test? I can see that the "numerically precise" p-values are different. Thanks a lot!

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 8.535524e-25

tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
tst$p.value
[1] 3.448211e-25
Hi Bogdan,

You can also get the information from the wilcox.test help page: "By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."

For more: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html

Hope this helps!

Best,
VD
Vivek Das, PhD
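The default rule quoted from the help page can be sketched as a small predicate. This is only an illustration of the documented wording, not the code R itself runs; the helper name `uses_exact_by_default` is hypothetical, and it assumes the two-sample case with ties checked on the pooled values.

```r
# Hypothetical helper: TRUE when the documented default would pick the
# exact p-value (both samples under 50 finite values, no ties).
uses_exact_by_default <- function(x, y) {
  x <- x[is.finite(x)]
  y <- y[is.finite(y)]
  small   <- length(x) < 50 && length(y) < 50
  no_ties <- anyDuplicated(c(x, y)) == 0   # 0 means no repeated value
  small && no_ties
}

print(uses_exact_by_default(c(1.1, 2.2, 3.3), c(4.4, 5.5)))  # small, no ties
print(uses_exact_by_default(c(1, 2, 2), c(3, 4)))            # ties present
print(uses_exact_by_default(rnorm(100), rnorm(100, 2)))      # samples too large
```

With 100 or 1000 values per group, the default therefore falls back to the normal approximation unless `exact = TRUE` is passed explicitly.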
thanks a lot, Vivek! In other words, assuming that we work with 1000 data points: if we use EXACT=TRUE, does it use the normal approximation, while with EXACT=FALSE (for these large samples) it does not?
On Mar 18, 2021, at 10:26 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:

    i have noted a similar conversation on stackexchange, although the answer is not very clear (to me).

The reason it wasn't and couldn't be "clear" was that the underlying scientific question and the statistical methods were not precisely described. The same lack of background information still persists in this discussion.

-- David
As David Winsemius noted, the documentation is not clear. Consider the following:

> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100, 2)
>
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
>
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
>
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
>

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE, you get a permutation distribution, though I'm not sure. You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different argument name from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.

Hope this helps.
Spencer
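The case-sensitivity note can be checked directly. The sketch below assumes the same simulated data as in the session above; the variable names are illustrative. It fixes the data once, so any difference between p-values comes from the arguments and not from fresh random draws.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

p_default <- wilcox.test(x, y)$p.value
p_upper   <- wilcox.test(x, y, EXACT = TRUE)$p.value   # unknown name, absorbed by `...`
p_approx  <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation
p_exact   <- wilcox.test(x, y, exact = TRUE)$p.value   # exact null distribution

print(c(default = p_default, EXACT = p_upper, approx = p_approx, exact = p_exact))

# Only the lower-case name is a formal argument of the default method.
arg_names <- names(formals(getS3method("wilcox.test", "default")))
print(c(exact = "exact" %in% arg_names, EXACT = "EXACT" %in% arg_names))
```

Because `EXACT` falls into `...` without any warning, a misspelled argument silently gives the default behaviour.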
Hey, I just want to point out that the word "exact" has two meanings. It can mean the numerically accurate p-value as Bogdan asked in his first email, or it could mean the p-value calculated from the exact distribution of the statistic(In this case, U stat). These two are actually not related, even though they all called "exact". Best, Jiefei On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
spencer.graves at effectivedefense.org> wrote:
On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
thanks a lot, Vivek ! in other words, assuming that we work with 1000
data
points, shall we use EXACT = TRUE, it uses the normal approximation, while if EXACT=FALSE (for these large samples), it does not ?
As David Winsemius noted, the documentation is not clear.
Consider the following:
set.seed(1) > x <- rnorm(100) > y <- rnorm(100, 2) > > wilcox.test(x,
y)$p.value
[1] 1.172189e-25 > wilcox.test(x, y)$p.value [1] 1.172189e-25 > >
wilcox.test(x, y, EXACT=TRUE)$p.value [1] 1.172189e-25 > wilcox.test(x,
y, EXACT=TRUE)$p.value [1] 1.172189e-25 > wilcox.test(x, y,
exact=TRUE)$p.value [1] 4.123875e-32 > wilcox.test(x, y,
exact=TRUE)$p.value [1] 4.123875e-32 > > wilcox.test(x, y,
EXACT=FALSE)$p.value [1] 1.172189e-25 > wilcox.test(x, y,
EXACT=FALSE)$p.value [1] 1.172189e-25 > wilcox.test(x, y,
exact=FALSE)$p.value [1] 1.172189e-25 > wilcox.test(x, y,
exact=FALSE)$p.value [1] 1.172189e-25 > We get two values here:
1.172189e-25 and 4.123875e-32. The first one, I think, is the normal
approximation, which is the same as exact=FALSE. I think that with
exact=FALSE, you get a permutation distribution, though I'm not sure.
You might try looking at "wilcox_test in package coin for exact,
asymptotic and Monte Carlo conditional p-values, including in the
presence of ties" to see if it is clearer. NOTE: R is case sensitive, so
"EXACT" is a different variable from "exact". It is interpreted as an
optional argument, which is not recognized and therefore ignored in this
context.
Hope this helps.
Spencer
On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind at gmail.com> wrote:
Hi Bogdan, You can also get the information from the link of the Wilcox.test
function
page. ?By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used.? For more:
Hope this helps! Best, VD On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa at gmail.com>
wrote:
Dear Peter, thanks a lot. yes, we can see a very precise p-value, and
that
was the request from the journal. if I may ask another question please : what is the meaning of
"exact=TRUE"
or "exact=FALSE" in wilcox.test ? i can see that the "numerically precise" p-values are different.
thanks a
lot ! tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE) tst$p.value [1] 8.535524e-25 tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE) tst$p.value [1] 3.448211e-25 On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder < peter.langfelder at gmail.com> wrote:
I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the p-value for display. In the above example, instead of running

> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the printed output, store the result and print the p-value itself:

> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> tst$p.value
[1] 2.988368e-32

I think this value is what the journal asks for.

HTH,
Peter
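Peter's point about the print method can be checked directly. The following is a minimal sketch, not part of the original thread: the seed and variable names are arbitrary choices. It shows that the "< 2.2e-16" in the printed output is only a display threshold (`.Machine$double.eps`, the default `eps` of `format.pval()`), while the value stored in the `htest` object is an ordinary double.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
x <- rnorm(100)
y <- rnorm(100, 2)

tst <- wilcox.test(x, y)           # default settings
print(tst)                         # the printed p-value is capped for display

# The cap is the machine epsilon, used as the default `eps` of format.pval()
.Machine$double.eps
format.pval(tst$p.value)           # capped string, as used by the print method
format.pval(tst$p.value, eps = 0)  # same formatter with the cap switched off

# The htest object itself stores the unrounded double
p <- tst$p.value
p
```

So reporting "p = 0" would be wrong; the choice is between the stored value and an inequality such as "p < 2.2e-16".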
After digging into the R source, it turns out that the argument `exact` has nothing to do with numeric precision. It only affects the statistical model used to compute the p-value. When `exact=TRUE`, the true distribution of the statistic is used. Otherwise, a normal approximation is used.

I think the documentation needs to be improved here: you can compute the exact p-value *only* when you do not have any ties in your data. If you have ties in your data, you will get the p-value from the normal approximation no matter what value you put in `exact`. This behavior should be documented, or a warning should be given when `exact=TRUE` and ties are present.

FYI, if the exact p-value is required, the `pwilcox` function is used to compute it. There are no details on how it computes the p-value, but its C code seems to compute the probability table, so I assume it computes the exact p-value from the true distribution of the statistic, not a permutation or MC p-value.

Best,
Jiefei
On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 at gmail.com> wrote:

Hey,

I just want to point out that the word "exact" has two meanings. It can mean the numerically accurate p-value, as Bogdan asked in his first email, or it can mean the p-value calculated from the exact distribution of the statistic (in this case, the U statistic). The two are unrelated, even though both are called "exact".

Best,
Jiefei

On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:

On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:

thanks a lot, Vivek ! in other words, assuming that we work with 1000 data points, if we use EXACT=TRUE it uses the normal approximation, while if EXACT=FALSE (for these large samples) it does not ?

As David Winsemius noted, the documentation is not clear. Consider the following:

> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100, 2)
>
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
>
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
>
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE you get a permutation distribution, though I'm not sure. You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different variable from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.

Hope this helps.
Spencer

On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind at gmail.com> wrote:

Hi Bogdan,

You can also get the information from the link of the wilcox.test function page:

"By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."

For more:

Hope this helps!

Best,
VD

On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa at gmail.com> wrote:

Dear Peter,

thanks a lot. yes, we can see a very precise p-value, and that was the request from the journal.

if I may ask another question please : what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test ? i can see that the "numerically precise" p-values are different. thanks a lot !

> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> tst$p.value
[1] 8.535524e-25

> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> tst$p.value
[1] 3.448211e-25
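The pattern in Spencer's output, and the answer to Bogdan's question about what `exact` changes, can be reproduced with fixed data so that the only thing varying between calls is the argument. This is a sketch under assumptions: the seed and the sample size of 60 per group are my own choices (60 keeps each sample above the 50-value threshold quoted from the help page while staying cheap), and `p_typo` is a hypothetical name for the call with the misspelled argument.

```r
set.seed(1)                        # arbitrary seed
x <- rnorm(60)                     # 60 per group: above the 50-value threshold
y <- rnorm(60, 2)

p_default <- wilcox.test(x, y)$p.value                 # `exact` left unspecified
p_approx  <- wilcox.test(x, y, exact = FALSE)$p.value  # normal approximation
p_exact   <- wilcox.test(x, y, exact = TRUE)$p.value   # exact null distribution
p_typo    <- wilcox.test(x, y, EXACT = TRUE)$p.value   # wrong case, absorbed by `...`

c(default = p_default, approx = p_approx, exact = p_exact, typo = p_typo)

identical(p_default, p_approx)  # do large samples default to the approximation?
identical(p_typo, p_default)    # did EXACT= have any effect?
p_exact != p_approx             # do the two reference distributions disagree?
```

Bogdan's two calls above differ for an additional reason: each call draws fresh `rnorm()` samples, so the data themselves were different.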
My example shows that it does NOT use Monte Carlo: repeated calls return exactly the same p-value, so it must use some fixed distribution. I believe the term "exact" means that it uses the permutation distribution, though I could be mistaken. If it's NOT a permutation distribution, I don't know what it is.

Spencer
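Spencer's guess and Jiefei's reading of the C code can be reconciled with a small check. With no ties, the null distribution of the rank-sum statistic is the permutation distribution over all ways of assigning ranks to the first sample, and that is the distribution tabulated by `dwilcox()`. The sketch below is an illustration with small, arbitrary sample sizes (`m = 4`, `n = 5`), chosen so that full enumeration is cheap; the variable names are my own.

```r
m <- 4; n <- 5                     # small sizes so enumeration is cheap
N <- m + n

# Under H0 with no ties, every set of m ranks out of 1..N is equally likely
# for the first sample: enumerate all of them.
rank_sets <- combn(N, m)           # one column per possible rank set
W <- colSums(rank_sets) - m * (m + 1) / 2   # statistic on wilcox.test()'s scale

# Relative frequency of each value of W over all choose(N, m) assignments
perm_dist <- as.numeric(table(factor(W, levels = 0:(m * n)))) / choose(N, m)

# Exact null distribution as computed by R
exact_dist <- dwilcox(0:(m * n), m, n)

all.equal(perm_dist, exact_dist)
```

In other words, "exact" here is a complete enumeration of the permutation distribution rather than a Monte Carlo sample from it, which is consistent with repeated calls returning identical values.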
Dear Jiefei,

This behavior is documented. From help(wilcox.test):

"By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."

Best,
Wolfgang
Dear Wolfgang,

Thanks for the documentation, but it only states the default behavior; it does not mention what happens if we ask for the exact p-value but the data have ties. I think this could be misleading: people might believe their result is exact because they specified `exact=TRUE`, when in fact their data contain ties and the result comes from the normal approximation.

Best,
Jiefei
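Whether a given R installation says anything in this situation is easy to check. As far as I know, the R versions current at the time of this thread do emit a warning ("cannot compute exact p-value with ties") and fall back to the normal approximation, but that is an assumption about version-specific behavior. The sketch below therefore only records whatever warnings occur instead of expecting a particular one; the data are made up for illustration.

```r
x <- c(1, 2, 2, 3, 4)        # made-up data containing tied values
y <- c(2, 5, 5, 6, 7)

warns <- character(0)
res <- withCallingHandlers(
  wilcox.test(x, y, exact = TRUE),
  warning = function(w) {
    warns <<- c(warns, conditionMessage(w))  # record the message
    invokeRestart("muffleWarning")           # and let the test continue
  }
)

res$method    # name of the procedure that was actually run
res$p.value
warns         # any warnings raised along the way
```

Inspecting `res$method` is the more robust habit: it names the procedure that produced the p-value, regardless of what was requested.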
Dear Jiefei, and all, many thanks for your time and comments, suggestions, insights. -- bogdan
Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know if it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation of `Wilcoxon` says:

"This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively."

A nice feature of a non-parametric statistic is that it is usually distribution-free, so you can pick any distribution you like and obtain the same null distribution for the statistic. I wonder if this is the case here, but I might be wrong.

Cheers,
Jiefei
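The quoted mean and variance, and the role of `pwilcox()` in the exact test, can both be checked numerically. This is a sketch with arbitrary sample sizes and seed. The rule used to form the two-sided p-value (double the smaller tail, capped at 1) is my reading of how `wilcox.test()` does it and should be treated as an assumption, which the last line tests against the function's own result.

```r
m <- 6; n <- 8
w  <- 0:(m * n)                    # support of the statistic
pr <- dwilcox(w, m, n)             # exact null probabilities

mean_w <- sum(w * pr)
var_w  <- sum((w - mean_w)^2 * pr)

c(mean = mean_w, formula = m * n / 2)
c(var  = var_w,  formula = m * n * (m + n + 1) / 12)

# Rebuild the two-sided exact p-value by hand from pwilcox()
set.seed(7)                        # arbitrary seed
x <- rnorm(m)
y <- rnorm(n, 1)
tst <- wilcox.test(x, y, exact = TRUE)
W <- unname(tst$statistic)

p_tail <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)   # upper tail, P(W' >= W)
} else {
  pwilcox(W, m, n)                           # lower tail, P(W' <= W)
}
p_manual <- min(2 * p_tail, 1)

all.equal(p_manual, tst$p.value)
```

Because the null distribution depends only on m and n, the same table applies whatever continuous distribution the data come from, which is the distribution-free property Jiefei describes.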
On 2021-3-19 9:52 AM, Jiefei Wang wrote:
After digging into the R source, it turns out that the argument `exact`
has
nothing to do with the numeric precision. It only affects the statistic model used to compute the p-value. When `exact=TRUE` the true
distribution
of the statistic will be used. Otherwise, a normal approximation will be used. I think the documentation needs to be improved here, you can compute the exact p-value *only* when you do not have any ties in your data. If you have ties in your data you will get the p-value from the normal approximation no matter what value you put in `exact`. This behavior
should
be documented or a warning should be given when `exact=TRUE` and ties present. FYI, if the exact p-value is required, `pwilcox` function will be used to compute the p-value. There are no details on how it computes the pvalue
but
its C code seems to compute the probability table, so I assume it
computes
the exact p-value from the true distribution of the statistic, not a permutation or MC p-value.
My example shows that it does NOT use Monte Carlo, because
otherwise it uses some distribution. I believe the term "exact" means
that it uses the permutation distribution, though I could be mistaken.
If it's NOT a permutation distribution, I don't know what it is.
Spencer
Best, Jiefei On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 at gmail.com> wrote:
Hey, I just want to point out that the word "exact" has two meanings. It can mean the numerically accurate p-value as Bogdan asked in his first
email,
or it could mean the p-value calculated from the exact distribution of
the
statistic(In this case, U stat). These two are actually not related,
even
though they all called "exact". Best, Jiefei On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves < spencer.graves at effectivedefense.org> wrote:
On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
thanks a lot, Vivek ! in other words, assuming that we work with 1000 data points: if we use EXACT=TRUE, it uses the normal approximation, while if EXACT=FALSE (for these large samples), it does not ?
As David Winsemius noted, the documentation is not clear. Consider the following:
> set.seed(1)
> x <- rnorm(100)
> y <- rnorm(100, 2)
>
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
> wilcox.test(x, y)$p.value
[1] 1.172189e-25
>
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=TRUE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
> wilcox.test(x, y, exact=TRUE)$p.value
[1] 4.123875e-32
>
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, EXACT=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
> wilcox.test(x, y, exact=FALSE)$p.value
[1] 1.172189e-25
We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE, you get a permutation distribution, though I'm not sure. You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer.
NOTE: R is case sensitive, so "EXACT" is a different variable from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.
Hope this helps.
Spencer
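The case-sensitivity note can be checked directly. The following is a minimal sketch using only base R; the group size of 60 and the seed are arbitrary illustration choices, not values from the thread. It relies on `wilcox.test.default` accepting `...`, which is why a misspelled argument such as `EXACT` is absorbed silently instead of raising an error.

```r
set.seed(1)
x <- rnorm(60)
y <- rnorm(60, 2)

# n >= 50 per group, so the default is the normal approximation
p_default  <- wilcox.test(x, y)$p.value
# EXACT is not a formal argument: it falls into '...' and is ignored
p_misspelt <- wilcox.test(x, y, EXACT = TRUE)$p.value
# exact (lower case) is the real argument and requests the exact null distribution
p_exact    <- wilcox.test(x, y, exact = TRUE)$p.value

identical(p_default, p_misspelt)   # same computation, same number
identical(p_default, p_exact)      # different computation
```

Comparing with `identical()` rather than `all.equal()` is deliberate: for values this small, `all.equal()` falls back to an absolute tolerance and would report two very different tiny p-values as equal.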
On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind at gmail.com> wrote:
Hi Bogdan, You can also get the information from the link to the wilcox.test function page: "By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used." For more: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
Hope this helps! Best, VD
On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa at gmail.com> wrote:
Dear Peter, thanks a lot. yes, we can see a very precise p-value, and that was the request from the journal. if I may ask another question please : what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test ? i can see that the "numerically precise" p-values are different. thanks a lot !
tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 8.535524e-25
tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
tst$p.value
[1] 3.448211e-25
On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfelder at gmail.com> wrote:
I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the p-values. In the above example, instead of using
wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
and copying the output, just print the p-value:
tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
tst$p.value
[1] 2.988368e-32
I think this value is what the journal asks for. HTH, Peter
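Peter Langfelder's point about the print method can be illustrated with a short sketch. It assumes, as far as I know, that the htest print method formats the p-value through `format.pval()`, whose default `eps` is `.Machine$double.eps` (about 2.2e-16); the seed and data follow Spencer Graves's example, and the variable names are only illustrative.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)
tst <- wilcox.test(x, y)   # default for n >= 50: normal approximation

p <- tst$p.value           # the stored double, not the printed string

# Default eps = .Machine$double.eps: anything smaller is shown as "<..."
shown <- format.pval(p)
# eps = 0 switches the cut-off off, so the number itself is formatted
full  <- format.pval(p, eps = 0)
# Fixed scientific notation, e.g. for a manuscript table
sci   <- formatC(p, format = "e", digits = 3)

print(c(shown = shown, full = full, sci = sci))
```

The stored value is an ordinary double, so it is only limited by floating-point range (roughly 1e-308 before underflow), not by the 2.2e-16 display threshold.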
I **believe** -- if my old memory still serves -- that the "exact" specification uses a home-grown version of the algorithm, originally developed by Cyrus Mehta, founder of StatXact software, to calculate the exact permutation distribution or close approximations to it. Of course, examining the C source code would determine this, but I don't care to attempt this. If this is not (or no longer) correct, please point this out.
Best, Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
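Whatever algorithm is used internally, the relation between the "exact" distribution and the permutation distribution can be checked by brute force on a small sample. The sketch below is an illustration under stated assumptions, not a description of R's C code: the data are made up, contain no ties, and are small enough that all 462 ways of assigning 5 of the 11 ranks to the first sample can be enumerated.

```r
x <- c(1.1, 2.3, 0.7, 3.4, 1.9)
y <- c(4.2, 5.1, 3.8, 6.0, 2.9, 4.7)
m <- length(x)
n <- length(y)

# Observed statistic as reported by wilcox.test():
# W = (rank sum of x) - m * (m + 1) / 2
tst   <- wilcox.test(x, y, exact = TRUE)
w_obs <- unname(tst$statistic)

# Under the null, every choice of m ranks out of m + n is equally likely
rank_sets <- combn(m + n, m)
w_all     <- colSums(rank_sets) - m * (m + 1) / 2

# Lower-tail probability: enumeration versus pwilcox()
p_lower_enum    <- mean(w_all <= w_obs)
p_lower_pwilcox <- pwilcox(w_obs, m, n)

# Two-sided p-value: double the smaller tail, capped at 1
p_upper_enum <- mean(w_all >= w_obs)
p_two_sided  <- min(1, 2 * min(p_lower_enum, p_upper_enum))

print(c(enumeration = p_two_sided, wilcox.test = tst$p.value))
```

For tie-free data the two numbers agree, which is consistent with both Jiefei's reading (the true distribution of the statistic) and Spencer's (the permutation distribution): here they are the same thing.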
On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 at gmail.com> wrote:
Hi Spencer, Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know whether it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation of `Wilcoxon` says:
"This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively."
A nice feature of non-parametric statistics is that they are usually distribution-free, so you can pick any distribution you like to compute the same statistic. I wonder if this is the case, but I might be wrong. Cheers, Jiefei
On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:
On 2021-3-19 9:52 AM, Jiefei Wang wrote:
After digging into the R source, it turns out that the argument `exact` has nothing to do with numeric precision. It only affects the statistical model used to compute the p-value. When `exact=TRUE`, the true distribution of the statistic is used. Otherwise, a normal approximation is used.
I think the documentation needs to be improved here: you can compute the exact p-value *only* when you do not have any ties in your data. If you have ties in your data, you will get the p-value from the normal approximation no matter what value you put in `exact`. This behavior should be documented, or a warning should be given when `exact=TRUE` and ties are present.
FYI, if the exact p-value is required, the `pwilcox` function is used to compute the p-value. There are no details on how it computes the p-value, but its C code seems to compute the probability table, so I assume it computes the exact p-value from the true distribution of the statistic, not a permutation or MC p-value.
My example shows that it does NOT use Monte Carlo, because repeated calls return identical p-values; it must therefore use some fixed distribution. I believe the term "exact" means that it uses the permutation distribution, though I could be mistaken. If it's NOT a permutation distribution, I don't know what it is.
Spencer
For me, it was always clear based on the documentation that if there are ties, then the normal approximation is used (irrespective of what 'exact' is set to). In fact, if there are ties, the output even tells you that this is happening:
wilcox.test(c(1,3,2,2,4), exact=TRUE)
[...]
Warning message:
In wilcox.test.default(c(1, 3, 2, 2, 4), exact = TRUE) :
  cannot compute exact p-value with ties
Best, Wolfgang
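When many tests are run in a loop, a warning like the one above is easy to miss. A minimal sketch of how it could be captured programmatically follows; the variable names are illustrative, and the fallback behavior is the one reported in this thread for the R version in use at the time, so other versions may behave differently.

```r
x <- c(1, 3, 2, 2, 4)   # the two 2s are tied

caught <- character(0)
res <- withCallingHandlers(
  wilcox.test(x, exact = TRUE),
  warning = function(w) {
    caught <<- c(caught, conditionMessage(w))  # record the warning text
    invokeRestart("muffleWarning")             # then let the test continue
  }
)

print(res$method)   # names the procedure that was actually used
print(caught)       # any warnings raised along the way
```

Inspecting `res$method` alongside the captured warnings shows whether the reported p-value came from the exact distribution or from the normal approximation, regardless of what was requested through `exact`.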
-----Original Message-----
From: Jiefei Wang [mailto:szwjf08 at gmail.com]
Sent: Friday, 19 March, 2021 16:32
To: Viechtbauer, Wolfgang (SP)
Cc: r-help
Subject: Re: [R] about a p-value < 2.2e-16
Dear Wolfgang, Thanks for the documentation, but it only states the default behavior; it does not mention what happens if we tell it to compute the exact p-value but the data has ties. I think this is misleading, as people might think their result is exact because they specified `exact=TRUE`, when in fact their data contains ties and the result comes from the normal approximation. Best, Jiefei
On Fri, Mar 19, 2021 at 11:18 PM Viechtbauer, Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote:
Dear Jiefei, This behavior is documented. From help(wilcox.test): "By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used." Best, Wolfgang
Dear all, thank you all for the comments and help. as far as i can see, if we have samples of 1000 records, only "exact=FALSE" allows the code to run:
wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
[1] 7.304863e-231
if i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :
wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
(the job is terminated by OS)
if you have any other suggestions, please let me know. thanks a lot !
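For samples of this size, the `exact=FALSE` number can be reproduced by hand from the mean and variance given in the `Wilcoxon` documentation quoted earlier in the thread. The sketch below assumes tie-free data and a two-sided test with the default continuity correction; the seed is an arbitrary choice, so the value will not match the one above. It also shows the p-value on the log10 scale, which remains usable even when the p-value itself would underflow to zero.

```r
set.seed(1)
m <- 1000
n <- 1000
x <- rnorm(m)
y <- rnorm(n, 2)

tst <- wilcox.test(x, y, exact = FALSE)   # normal approximation
W   <- unname(tst$statistic)

# Null mean and standard deviation of W when there are no ties
mu_W    <- m * n / 2
sigma_W <- sqrt(m * n * (m + n + 1) / 12)

# Continuity correction of 0.5 towards the null mean, then a two-sided normal tail
z        <- (W - mu_W - sign(W - mu_W) * 0.5) / sigma_W
p_manual <- 2 * pnorm(-abs(z))

# Same quantity on the log10 scale, computed without forming p itself
log10_p <- (log(2) + pnorm(-abs(z), log.p = TRUE)) / log(10)

print(c(builtin = tst$p.value, manual = p_manual, log10_p = log10_p))
```

This only reproduces the arithmetic behind `exact=FALSE`; it says nothing about how accurate the normal approximation is this far out in the tail.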
I have to ask: are you sure that by "exact p-value" the journal doesn't simply mean that they don't want to see a p-value given as < 0.0001, for example, and want the actual number instead? I cannot imagine they really meant exact as in the p-value from some exact distribution.
Kevin E. Thorpe Head of Biostatistics, Applied Health Research Centre (AHRC) Li Ka Shing Knowledge Institute of St. Michael's Assistant Professor, Dalla Lana School of Public Health University of Toronto email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016 > On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tanasa at gmail.com> wrote: > > EXTERNAL EMAIL: > > Dear all, thank you all for comments and help. > > as far as i can see, shall we have samples of 1000 records, only > "exact=FALSE" allows the code to run: > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value > [1] 7.304863e-231 > > shall i use "exact=TRUE", it runs out of memory on my 64GB RAM PC : > > wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value > (the job is terminated by OS) > > shall you have any other suggestions, please let me know. thanks a lot ! > > On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4567 at gmail.com> wrote: > >> I **believe** -- if my old memory still serves-- that the "exact" >> specification uses a home grown version of the algorithm to calculate >> exact, or close approximations to the exact, permutation distribution >> originally developed by Cyrus Mehta, founder of StatXact software. Of >> course, examining the C code source would determine this, but I don't care >> to attempt this. >> >> If this is (no longer?) correct, please point this out. >> >> Best, >> >> Bert Gunter >> >> "The trouble with having an open mind is that people keep coming along and >> sticking things into it." >> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >> >> >> On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 at gmail.com> wrote: >> >>> Hi Spencer, >>> >>> Thanks for your test results, I do not know the answer as I haven't >>> used wilcox.test for many years. 
I do not know if it is possible to >>> compute >>> the exact distribution of the Wilcoxon rank sum statistic, but I think it >>> is very likely, as the document of `Wilcoxon` says: >>> >>> This distribution is obtained as follows. Let x and y be two random, >>> independent samples of size m and n. Then the Wilcoxon rank sum statistic >>> is the number of all pairs (x[i], y[j]) for which y[j] is not greater than >>> x[i]. This statistic takes values between 0 and m * n, and its mean and >>> variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively. >>> >>> As a nice feature of the non-parametric statistic, it is usually >>> distribution-free so you can pick any distribution you like to compute the >>> same statistic. I wonder if this is the case, but I might be wrong. >>> >>> Cheers, >>> Jiefei >>> >>> >>> On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves < >>> spencer.graves at effectivedefense.org> wrote: >>> >>>> >>>> >>>> On 2021-3-19 9:52 AM, Jiefei Wang wrote: >>>>> After digging into the R source, it turns out that the argument >>> `exact` >>>> has >>>>> nothing to do with the numeric precision. It only affects the >>> statistic >>>>> model used to compute the p-value. When `exact=TRUE` the true >>>> distribution >>>>> of the statistic will be used. Otherwise, a normal approximation will >>> be >>>>> used. >>>>> >>>>> I think the documentation needs to be improved here, you can compute >>> the >>>>> exact p-value *only* when you do not have any ties in your data. If >>> you >>>>> have ties in your data you will get the p-value from the normal >>>>> approximation no matter what value you put in `exact`. This behavior >>>> should >>>>> be documented or a warning should be given when `exact=TRUE` and ties >>>>> present. >>>>> >>>>> FYI, if the exact p-value is required, `pwilcox` function will be >>> used to >>>>> compute the p-value. 
There are no details on how it computes the >>> pvalue >>>> but >>>>> its C code seems to compute the probability table, so I assume it >>>> computes >>>>> the exact p-value from the true distribution of the statistic, not a >>>>> permutation or MC p-value. >>>> >>>> >>>> My example shows that it does NOT use Monte Carlo, because >>>> otherwise it uses some distribution. I believe the term "exact" means >>>> that it uses the permutation distribution, though I could be mistaken. >>>> If it's NOT a permutation distribution, I don't know what it is. >>>> >>>> >>>> Spencer >>>>> >>>>> Best, >>>>> Jiefei >>>>> >>>>> >>>>> >>>>> On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 at gmail.com> >>> wrote: >>>>> >>>>>> Hey, >>>>>> >>>>>> I just want to point out that the word "exact" has two meanings. It >>> can >>>>>> mean the numerically accurate p-value as Bogdan asked in his first >>>> email, >>>>>> or it could mean the p-value calculated from the exact distribution >>> of >>>> the >>>>>> statistic(In this case, U stat). These two are actually not related, >>>> even >>>>>> though they all called "exact". >>>>>> >>>>>> Best, >>>>>> Jiefei >>>>>> >>>>>> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves < >>>>>> spencer.graves at effectivedefense.org> wrote: >>>>>> >>>>>>> >>>>>>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote: >>>>>>>> thanks a lot, Vivek ! in other words, assuming that we work with >>> 1000 >>>>>>> data >>>>>>>> points, >>>>>>>> >>>>>>>> shall we use EXACT = TRUE, it uses the normal approximation, >>>>>>>> >>>>>>>> while if EXACT=FALSE (for these large samples), it does not ? >>>>>>> >>>>>>> As David Winsemius noted, the documentation is not clear. 
Consider the following:

    > set.seed(1)
    > x <- rnorm(100)
    > y <- rnorm(100, 2)
    >
    > wilcox.test(x, y)$p.value
    [1] 1.172189e-25
    >
    > wilcox.test(x, y, EXACT=TRUE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, exact=TRUE)$p.value
    [1] 4.123875e-32
    >
    > wilcox.test(x, y, EXACT=FALSE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, exact=FALSE)$p.value
    [1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE, you get a permutation distribution, though I'm not sure. You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different variable from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.

Hope this helps.
Spencer

On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind at gmail.com> wrote:

Hi Bogdan,

You can also get the information from the link of the wilcox.test function page.

"By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."
For more:

https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html

Hope this helps!

Best,

VD

On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa at gmail.com> wrote:

Dear Peter, thanks a lot. yes, we can see a very precise p-value, and that was the request from the journal.

if I may ask another question please : what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test ?

i can see that the "numerically precise" p-values are different. thanks a lot !

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
    tst$p.value
    [1] 8.535524e-25

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
    tst$p.value
    [1] 3.448211e-25

On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfelder at gmail.com> wrote:

I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the p-values. In the above example, instead of using

    wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
    tst$p.value
    [1] 2.988368e-32

I think this value is what the journal asks for.
HTH,

Peter

On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:

I would push back on that from two perspectives:

1. I would study exactly what the journal said very carefully. If they mandated "wilcox.test", that function has an argument called "exact". If that's what they are asking, then using that argument gives the exact p-value, e.g.:

    > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

            Wilcoxon rank sum exact test

    data:  rnorm(100) and rnorm(100, 2)
    W = 691, p-value < 2.2e-16

2. If that's NOT what they are asking, then I'm not convinced what they are asking makes sense: There is no such thing as an "exact p value" except to the extent that certain assumptions hold, and all models are wrong (but some are useful), as George Box famously said years ago.[1] Truth only exists in mathematics, and that's because it's a fiction to start with ;-)

Hope this helps.
Spencer Graves

[1] https://en.wikipedia.org/wiki/All_models_are_wrong

On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:

<https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>

Dear all,

i would appreciate having your advice on the following please :

in R, the wilcox.test() provides "a p-value < 2.2e-16", when we compare sets of 1000 genes expression (in the genomics field).

however, the journal asks us to provide the exact p value ...

would it be legitimate to write : "p-value = 0" ? thanks a lot,

-- bogdan
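The case-sensitivity note in Spencer's example can be checked directly. The following is a minimal sketch that reuses the seed and samples from that example; the variable names (`p_default`, `p_misnamed`, and so on) are only illustrative. It relies on `wilcox.test()` accepting extra arguments through `...`, so a misspelled `EXACT` is silently ignored rather than rejected.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)

# exact is left at its default (NULL): with 100 values per sample the
# normal approximation is used
p_default  <- wilcox.test(x, y)$p.value

# 'EXACT' is not an argument of wilcox.test(); it is absorbed by '...'
# and has no effect
p_misnamed <- wilcox.test(x, y, EXACT = TRUE)$p.value

# the correctly spelled argument switches between the two computations
p_exact    <- wilcox.test(x, y, exact = TRUE)$p.value
p_normal   <- wilcox.test(x, y, exact = FALSE)$p.value

identical(p_default, p_misnamed)  # same code path, same number
identical(p_default, p_normal)    # the default here is the normal approximation
p_exact != p_normal               # exact = TRUE gives a different p-value
```

Only the lower-case `exact` changes the result, which is consistent with the two distinct values in the transcript above.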
Thank you Kevin,

their wording is "Please note that the exact p value should be provided, when possible, etc."

By "exact p-value" I believe they do indeed mean the actual number, and not that we should specify "exact=TRUE".

As we are working with 1000 genes, if I specify "exact=TRUE" on my PC, it runs out of memory:

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value

On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
I have to ask: are you sure that by "exact p-value" the journal does not simply mean that they don't want to see a p-value given as < 0.0001, for example, and want the actual number instead? I cannot imagine they really meant exact as in the p-value from some exact distribution.

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:

Dear all, thank you all for comments and help.

as far as i can see, if we have samples of 1000 records, only "exact=FALSE" allows the code to run:

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
    [1] 7.304863e-231

if i use "exact=TRUE", it runs out of memory on my 64GB RAM PC :

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
    (the job is terminated by OS)

if you have any other suggestions, please let me know. thanks a lot !

On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:
I **believe** -- if my old memory still serves -- that the "exact" specification uses a home grown version of the algorithm to calculate exact, or close approximations to the exact, permutation distribution originally developed by Cyrus Mehta, founder of StatXact software. Of course, examining the C code source would determine this, but I don't care to attempt this. If this is (no longer?) correct, please point this out.

Best,
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 at gmail.com> wrote:
Hi Spencer,

Thanks for your test results, I do not know the answer as I haven't used wilcox.test for many years. I do not know if it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the document of `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

As a nice feature of the non-parametric statistic, it is usually distribution-free so you can pick any distribution you like to compute the same statistic. I wonder if this is the case, but I might be wrong.

Cheers,
Jiefei
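The definition quoted from the `Wilcoxon` help page can be checked numerically. The sketch below is only an illustration under stated assumptions: continuous data without ties, the default two-sided alternative, and the default continuity correction. The names `W_pairs`, `p_hand`, and `p_exact_hand` are made up for this example. It counts the pairs directly, then rebuilds both the normal-approximation p-value (from the stated mean and variance) and the exact p-value (from `pwilcox()`), and compares them with what `wilcox.test()` returns.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, 2)
m <- length(x)
n <- length(y)

# Statistic as defined in ?Wilcoxon: number of pairs with y[j] <= x[i]
W_pairs <- sum(outer(x, y, ">="))
W_test  <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
W_pairs == W_test

# Normal approximation from the stated mean and variance (no tie correction),
# with a continuity correction of 0.5 towards the mean
mu    <- m * n / 2
sigma <- sqrt(m * n * (m + n + 1) / 12)
z     <- (W_pairs - mu - sign(W_pairs - mu) * 0.5) / sigma
p_hand <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# Exact two-sided p-value from the distribution of the statistic
p_tail <- if (W_pairs > mu) {
  pwilcox(W_pairs - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(W_pairs, m, n)
}
p_exact_hand <- min(2 * p_tail, 1)

# Compare on the log scale: all.equal() on numbers this small would fall back
# to absolute differences and could not tell them apart
all.equal(log(p_hand),       log(wilcox.test(x, y, exact = FALSE)$p.value))
all.equal(log(p_exact_hand), log(wilcox.test(x, y, exact = TRUE)$p.value))
```

Both p-values are "exact" in the numerical sense; they differ only in which reference distribution is used for the same statistic.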
Yes, Bogdan, that sounds *exactly* right. ;-) -- it runs out of memory trying to calculate the exact permutation distribution. What you apparently get with exact = FALSE is the exact answer (to within floating point arithmetic's approximation) to a normal approximation.

... and furthermore...

I would imagine any random number below, say, 1e-100 would serve equally well and would be equally correct/incorrect. I also imagine that a sensible display of the paired differences, or even just a count of how many of the thousand are, say, > 0, would make even more sense than an overwrought and unnecessary p-value.

But that is just my personal opinion of senseless standard scientific practice, and if anyone wants to dispute it, please reply OFFLIST, though I would probably not disagree with any such criticism of my cynicism.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
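For samples of 1000, the normal approximation can also be evaluated on the log scale, which avoids any concern about underflow and gives a number such as log10(p) that is easy to report. This is a minimal sketch, assuming continuous data without ties and the default two-sided test with continuity correction; the variable names are illustrative. The final line shows the simpler summary Bert describes, and it additionally assumes the two samples are paired (the same genes measured under both conditions), which the thread does not confirm.

```r
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000, 2)
m <- length(x)
n <- length(y)

# Rank-sum statistic, then the normal approximation with continuity correction
W     <- sum(rank(c(x, y))[seq_len(m)]) - m * (m + 1) / 2
mu    <- m * n / 2
sigma <- sqrt(m * n * (m + n + 1) / 12)   # no tie correction (assumes no ties)
z     <- (W - mu - sign(W - mu) * 0.5) / sigma

# Two-sided p-value on the log scale: log(2) + log of the smaller tail
log_p   <- log(2) + pnorm(-abs(z), log.p = TRUE)
log10_p <- log_p / log(10)

# Agrees with wilcox.test(exact = FALSE) when compared on the log scale
p_ref <- wilcox.test(x, y, exact = FALSE)$p.value
all.equal(log_p, log(p_ref))

# Simpler summary, ASSUMING paired observations: how many differences are > 0
n_positive <- sum(y - x > 0)
n_positive
```

Reporting `log10_p` (or the count of positive differences) conveys the same information as a p-value with hundreds of leading zeros.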
Hi Bogdan,

I think the journal is asking for the exact value of the p-value; it doesn't matter whether it comes from the exact distribution or the normal approximation. However, it does not make much sense to report such a small p-value. If I were you, I would show the reviewers the exact p-value they want and gently explain why you did not put it into your paper. If they insist that the number must be in the paper, then go ahead and include it.

Best,
Jiefei

On Sat, Mar 20, 2021 at 2:39 AM Bogdan Tanasa <tanasa at gmail.com> wrote:
Thank you Kevin,

Their wording is "Please note that the exact p value should be provided, when possible, etc". By "exact p-value" I believe they do indeed mean the actual number, and not that we should specify "exact=TRUE". As we are working with 1000 genes, if I specify "exact=TRUE" on my PC, it runs out of memory:

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value

On Fri, Mar 19, 2021 at 11:10 AM Kevin Thorpe <kevin.thorpe at utoronto.ca> wrote:
I have to ask: are you sure the journal does not simply mean, by exact p-value, that they don't want to see a p-value given as < 0.0001, for example, and simply want the actual number? I cannot imagine they really meant exact as in the p-value from some exact distribution.

--
Kevin E. Thorpe
Head of Biostatistics, Applied Health Research Centre (AHRC)
Li Ka Shing Knowledge Institute of St. Michael's
Assistant Professor, Dalla Lana School of Public Health
University of Toronto
email: kevin.thorpe at utoronto.ca Tel: 416.864.5776 Fax: 416.864.3016
On Mar 19, 2021, at 1:22 PM, Bogdan Tanasa <tanasa at gmail.com> wrote:

Dear all,

Thank you all for your comments and help. As far as I can see, if we have samples of 1000 records, only "exact=FALSE" allows the code to run:

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=FALSE)$p.value
    [1] 7.304863e-231

If I use "exact=TRUE", it runs out of memory on my PC with 64 GB of RAM:

    wilcox.test(rnorm(1000), rnorm(1000, 2), exact=TRUE)$p.value
    (the job is terminated by the OS)

If you have any other suggestions, please let me know. Thanks a lot!
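For samples of this size the exact computation is impractical, as reported above, so the normal approximation is the value that can actually be obtained. The following is a minimal sketch, assuming continuous data without ties and a two-sided test with the default continuity correction, of how that p-value can also be computed on the log scale, which stays finite even when the p-value itself would underflow to 0. The seed and the names `z`, `log_p` and `log10_p` are illustrative and not part of the original posts.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(1000)
y <- rnorm(1000, mean = 2)
m <- length(x)
n <- length(y)

# Normal approximation as computed by wilcox.test() itself.
tst <- wilcox.test(x, y, exact = FALSE)
tst$p.value

# The same approximation rebuilt from the W statistic (no ties assumed,
# so no tie correction is applied to the variance).
W <- unname(tst$statistic)
z <- W - m * n / 2
z <- (z - sign(z) * 0.5) / sqrt(m * n * (m + n + 1) / 12)

# Two-sided p-value on the natural-log scale; log.p = TRUE avoids underflow.
log_p <- log(2) + pnorm(-abs(z), log.p = TRUE)
log10_p <- log_p / log(10)

# A way to report the order of magnitude, e.g. "p = 10^-230.1".
sprintf("p = 10^%.1f", log10_p)
```

Here `exp(log_p)` should agree with `tst$p.value` up to rounding; the log-scale form only matters once the p-value drops below the smallest representable double.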
On Fri, Mar 19, 2021 at 9:05 AM Bert Gunter <bgunter.4567 at gmail.com> wrote:

I **believe** -- if my old memory still serves -- that the "exact" specification uses a home-grown version of the algorithm to calculate exact, or close approximations to the exact, permutation distribution originally developed by Cyrus Mehta, founder of StatXact software. Of course, examining the C source code would determine this, but I don't care to attempt this. If this is (no longer?) correct, please point this out.

Best,
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Mar 19, 2021 at 8:42 AM Jiefei Wang <szwjf08 at gmail.com> wrote:
Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't used wilcox.test for many years. I do not know if it is possible to compute the exact distribution of the Wilcoxon rank sum statistic, but I think it is very likely, as the documentation of `Wilcoxon` says:

    This distribution is obtained as follows. Let x and y be two random, independent samples of size m and n. Then the Wilcoxon rank sum statistic is the number of all pairs (x[i], y[j]) for which y[j] is not greater than x[i]. This statistic takes values between 0 and m * n, and its mean and variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.

A nice feature of non-parametric statistics is that they are usually distribution-free, so you can pick any distribution you like to compute the same statistic. I wonder if this is the case here, but I might be wrong.

Cheers,
Jiefei

On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:
On 2021-3-19 9:52 AM, Jiefei Wang wrote:

After digging into the R source, it turns out that the argument `exact` has nothing to do with numeric precision. It only affects the statistical model used to compute the p-value. When `exact=TRUE`, the true distribution of the statistic is used. Otherwise, a normal approximation is used.

I think the documentation needs to be improved here: you can compute the exact p-value *only* when you do not have any ties in your data. If you have ties in your data, you will get the p-value from the normal approximation no matter what value you put in `exact`. This behavior should be documented, or a warning should be given when `exact=TRUE` and ties are present.

FYI, if the exact p-value is required, the `pwilcox` function is used to compute the p-value. There are no details on how it computes the p-value, but its C code seems to compute the probability table, so I assume it computes the exact p-value from the true distribution of the statistic, not a permutation or MC p-value.

My example shows that it does NOT use Monte Carlo, because otherwise it uses some distribution. I believe the term "exact" means that it uses the permutation distribution, though I could be mistaken. If it's NOT a permutation distribution, I don't know what it is.

Spencer

Best,
Jiefei

On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 at gmail.com> wrote:
Hey,

I just want to point out that the word "exact" has two meanings. It can mean the numerically accurate p-value, as Bogdan asked in his first email, or it can mean the p-value calculated from the exact distribution of the statistic (in this case, the U statistic). These two are actually not related, even though they are both called "exact".

Best,
Jiefei

On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:
On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:

Thanks a lot, Vivek! In other words, assuming that we work with 1000 data points, if we use EXACT = TRUE, it uses the normal approximation, while if EXACT=FALSE (for these large samples), it does not?

As David Winsemius noted, the documentation is not clear. Consider the following:

    > set.seed(1)
    > x <- rnorm(100)
    > y <- rnorm(100, 2)
    >
    > wilcox.test(x, y)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y)$p.value
    [1] 1.172189e-25
    >
    > wilcox.test(x, y, EXACT=TRUE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, EXACT=TRUE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, exact=TRUE)$p.value
    [1] 4.123875e-32
    > wilcox.test(x, y, exact=TRUE)$p.value
    [1] 4.123875e-32
    >
    > wilcox.test(x, y, EXACT=FALSE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, EXACT=FALSE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, exact=FALSE)$p.value
    [1] 1.172189e-25
    > wilcox.test(x, y, exact=FALSE)$p.value
    [1] 1.172189e-25

We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I think, is the normal approximation, which is the same as exact=FALSE. I think that with exact=TRUE, you get a permutation distribution, though I'm not sure. You might try looking at "wilcox_test in package coin for exact, asymptotic and Monte Carlo conditional p-values, including in the presence of ties" to see if it is clearer.

NOTE: R is case sensitive, so "EXACT" is a different name from "exact". It is interpreted as an optional argument, which is not recognized and therefore ignored in this context.

Hope this helps.
Spencer
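The two values in the session above can be rebuilt by hand, which makes the role of `exact` concrete. This is only a sketch, assuming no ties, a two-sided test and the default continuity correction; the names `W`, `z`, `p_normal` and `p_exact` are introduced here for illustration and are not part of the original posts.

```r
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 2)
m <- length(x)
n <- length(y)

# W as reported by wilcox.test(): the number of pairs with x[i] > y[j]
# (valid when there are no ties).
W <- sum(outer(x, y, ">"))

# exact = FALSE: normal approximation with continuity correction, using the
# mean m*n/2 and variance m*n*(m+n+1)/12 of the statistic under the null.
z <- W - m * n / 2
z <- (z - sign(z) * 0.5) / sqrt(m * n * (m + n + 1) / 12)
p_normal <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))

# exact = TRUE: exact null distribution of W, evaluated with pwilcox().
p_exact <- if (W > m * n / 2) {
  pwilcox(W - 1, m, n, lower.tail = FALSE)
} else {
  pwilcox(W, m, n)
}
p_exact <- min(2 * p_exact, 1)

# Ratios against wilcox.test(); values near 1 mean the hand computation
# matches (a ratio is used because all.equal() is uninformative for
# numbers this small).
p_normal / wilcox.test(x, y, exact = FALSE)$p.value
p_exact / wilcox.test(x, y, exact = TRUE)$p.value

# An unrecognized argument such as EXACT is absorbed by `...`, so the
# default method is used unchanged.
identical(wilcox.test(x, y, EXACT = TRUE)$p.value,
          wilcox.test(x, y)$p.value)
```

With 100 observations per group the default is already the normal approximation, which is why the misspelled `EXACT=TRUE` and `exact=FALSE` give the same number in the session above.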
On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind at gmail.com> wrote:

Hi Bogdan,

You can also get the information from the link to the wilcox.test function page:

"By default (if exact is not specified), an exact p-value is computed if the samples contain less than 50 finite values and there are no ties. Otherwise, a normal approximation is used."

For more:

Hope this helps!

Best,
VD

On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa at gmail.com> wrote:
Dear Peter,

Thanks a lot. Yes, we can see a very precise p-value, and that was the request from the journal.

If I may ask another question, please: what is the meaning of "exact=TRUE" or "exact=FALSE" in wilcox.test? I can see that the "numerically precise" p-values are different. Thanks a lot!

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
    tst$p.value
    [1] 8.535524e-25

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
    tst$p.value
    [1] 3.448211e-25

On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <peter.langfelder at gmail.com> wrote:
I think the answer is much simpler. The print method for hypothesis tests (class htest) truncates the p-values. In the above example, instead of using

    wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

and copying the output, just print the p-value:

    tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
    tst$p.value
    [1] 2.988368e-32

I think this value is what the journal asks for.

HTH,
Peter

On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves <spencer.graves at effectivedefense.org> wrote:
I would push back on that from two perspectives:

1. I would study exactly what the journal said very carefully. If they mandated "wilcox.test", that function has an argument called "exact". If that's what they are asking, then using that argument gives the exact p-value, e.g.:

    > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)

            Wilcoxon rank sum exact test

    data:  rnorm(100) and rnorm(100, 2)
    W = 691, p-value < 2.2e-16

2. If that's NOT what they are asking, then I'm not convinced what they are asking makes sense: There is no such thing as an "exact p value" except to the extent that certain assumptions hold, and all models are wrong (but some are useful), as George Box famously said years ago.[1] Truth only exists in mathematics, and that's because it's a fiction to start with ;-)

Hope this helps.
Spencer Graves

[1] https://en.wikipedia.org/wiki/All_models_are_wrong
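The "p-value < 2.2e-16" in the printed output above is a display cutoff rather than the computed value. As a minimal sketch (the seed and the names `x`, `y` and `tst` are illustrative choices, not the journal's data), the following shows where the cutoff comes from and how to read the stored number instead:

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- rnorm(100)
y <- rnorm(100, mean = 2)

tst <- wilcox.test(x, y, exact = TRUE)

# The print method for class "htest" formats the p-value with format.pval(),
# whose default cutoff `eps` is .Machine$double.eps.
.Machine$double.eps
format.pval(tst$p.value)             # shown as "< eps" because it is below the cutoff
format.pval(tst$p.value, eps = 0)    # cutoff disabled: the number itself

# The stored value is an ordinary double and can be reported directly.
tst$p.value
signif(tst$p.value, 3)
```

So the stored p-value is not 0; it is simply smaller than the threshold below which the print method stops showing digits.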
On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:

Dear all,

I would appreciate your advice on the following: in R, wilcox.test() reports "p-value < 2.2e-16" when we compare the expression of sets of 1000 genes (in the genomics field). However, the journal asks us to provide the exact p-value ... Would it be legitimate to write "p-value = 0"?

Thanks a lot,

-- bogdan
Thanks a lot, Jiefei! And thanks to all for your time and comments! Have a good weekend!