Hi guys,

Recently, while reimplementing in R a method I had previously implemented in Matlab, I found that the two-sample KS test gives different p-values in the two languages, even with the same data and parameters (the p-value in R is larger than the one in Matlab). Meanwhile, Matlab and Python give the same p-value for the two-sample KS test. I also compared the p-values of the Mann-Whitney-Wilcoxon test and the Poisson test, and those also differ between R and Matlab, although the discrepancy is largest for the two-sample KS test. Can anyone tell me the reason for this, and which language is more reliable? Thanks in advance!

Best,
Xionghui

Xionghui Zhou Ph.D.
Research Fellow
Division of Human Genetics
Cincinnati Children's Hospital Medical Center
Phone: +1 (513) 636-4200
Email: Xionghui.Zhou at cchmc.org
Office: R1.1026
3333 Burnet Ave
Cincinnati, OH 45229
[RsR] About the inconsistent p-values of the two-sample Kolmogorov–Smirnov test in R and Matlab
4 messages · Zhou, Xionghui, Phillip Alday
6 days later
Dear Xionghui,

Cross-posting to two lists simultaneously generally isn't desirable.

Can you provide a minimal working example (code + data) for this? Otherwise, it's hard to see what's going on.

Best,
Phillip
On 11/11/19 9:50 pm, Zhou, Xionghui wrote:
_______________________________________________ R-SIG-Robust at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-robust
1 day later
Dear Phillip,

Thanks for your reply. Here is a demo. To test in R whether one vector is significantly smaller than another with the two-sample KS test, I use:

    p <- ks.test(1:10, 20:200, alternative = "greater")$p.value

which gives p = 5.873e-09. However, when I run the same case in Matlab:

    [~, p] = kstest2((1:10)', (20:200)', 'tail', 'larger')

the p-value is 8.2222e-10. Similar differences also appear with other methods, such as the Wilcoxon rank-sum test and the Poisson test.

Thanks!

Regards,
Xionghui
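For reference, the one-sided statistic behind `alternative = "greater"` in R and `'tail','larger'` in MATLAB is the largest amount by which the empirical CDF of the first sample exceeds that of the second. A minimal Python sketch (standard library only; `ecdf` and `ks_dplus` are hypothetical helper names, not part of either package) computes it for the example data. It comes out as 1, the largest possible value, because every element of 1:10 lies below every element of 20:200.

```python
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Empirical CDF: fraction of the (sorted) sample that is <= t."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def ks_dplus(x, y):
    """One-sided two-sample statistic D+ = max over t of F_x(t) - F_y(t).

    A large D+ means the ECDF of x lies above that of y, i.e. x tends to be
    smaller than y. The maximum is attained at one of the pooled data points.
    """
    xs, ys = sorted(x), sorted(y)
    return max(ecdf(xs, t) - ecdf(ys, t) for t in xs + ys)


x = list(range(1, 11))    # R: 1:10
y = list(range(20, 201))  # R: 20:200
print(ks_dplus(x, y))     # 1.0
```

If both programs agree on this value, any difference in the reported p-values has to come from the step that converts the statistic into a probability.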
On 11/18/19, 11:26 AM, "Phillip Alday" <phillip.alday at mpi.nl> wrote:
4 days later
Both MATLAB and R compute the same test statistic but differ in their p-values. For a p-value that small, you're getting to the point where the probabilities involved are on the order of a random bit flip in your computer's memory, so I wouldn't worry about that too much. A quick look at the R documentation suggests that R uses an asymptotic approximation here, which can be inaccurate for small sample sizes, and that an exact p-value is not available for a one-sided two-sample test. MATLAB's documentation doesn't give as much detail about how the test statistic is converted to a p-value.

In other words, don't worry about it. You shouldn't be trying to interpret p-values as precise numbers anyway, especially with such a small sample.

Best,
Phillip
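The size of the gap can be reproduced from the statistic alone. This is a minimal sketch under two assumptions. First, R's one-sided two-sample p-value (when no exact p-value is computed) is the asymptotic formula exp(-2 n D^2), with effective sample size n = n_x n_y / (n_x + n_y). Second, MATLAB's `kstest2` uses the same form exp(-2 lambda^2) but rescales the statistic to lambda = (sqrt(n) + 0.12 + 0.11/sqrt(n)) D, a small-sample correction. Under those assumptions, D = 1 with n_x = 10 and n_y = 181 gives about 5.873e-09 and 8.222e-10, matching the two values reported in this thread. The function names are hypothetical.

```python
import math


def effective_n(n_x: int, n_y: int) -> float:
    """Effective sample size n = n_x * n_y / (n_x + n_y) used by both approximations."""
    return n_x * n_y / (n_x + n_y)


def r_style_pvalue(d: float, n_x: int, n_y: int) -> float:
    """Assumed R behaviour: uncorrected asymptotic one-sided p-value exp(-2 n D^2)."""
    n = effective_n(n_x, n_y)
    return math.exp(-2.0 * n * d * d)


def matlab_style_pvalue(d: float, n_x: int, n_y: int) -> float:
    """Assumed MATLAB behaviour: exp(-2 lambda^2) with
    lambda = (sqrt(n) + 0.12 + 0.11 / sqrt(n)) * D instead of sqrt(n) * D."""
    n = effective_n(n_x, n_y)
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    return math.exp(-2.0 * lam * lam)


# Example from the thread: x = 1:10, y = 20:200, so n_x = 10, n_y = 181 and D = 1.
d, n_x, n_y = 1.0, 10, 181
print(f"R-style p-value:      {r_style_pvalue(d, n_x, n_y):.4g}")       # 5.873e-09
print(f"MATLAB-style p-value: {matlab_style_pvalue(d, n_x, n_y):.4g}")  # 8.222e-10
```

Both are large-sample approximations to the same tail probability; the correction term only makes the MATLAB value shrink faster. At this magnitude both lead to the same conclusion at any conventional significance level.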
On 19/11/2019 22:35, Zhou, Xionghui wrote: