[RsR] About the inconsistency p-value of two sample Kolmogorov–Smirnov test in R and Matlab

4 messages · Zhou, Xionghui, Phillip Alday

#
Hi guys,


Recently, while reimplementing in R a method that I had previously written in MATLAB, I found that the p-values of the two-sample KS test differ between the two languages, even with the same data and parameters (the p-value in R is greater than the one in MATLAB). Meanwhile, the two-sample KS test gives the same p-values in MATLAB and Python. I also compared the Mann-Whitney-Wilcoxon test and the Poisson test, and their p-values differ between R and MATLAB as well, although the difference is largest for the two-sample KS test. Can anyone tell me the reason for this, and which language is more reliable? Thanks in advance!


Best,



Xionghui


Xionghui Zhou Ph.D.
Research Fellow
Division of Human Genetics
Cincinnati Children's Hospital Medical Center

Phone: +1 (513) 636-4200
Email: Xionghui.Zhou at cchmc.org
Office: R1.1026
3333 Burnet Ave
Cincinnati, OH 45229
6 days later
#
Dear Xionghui,

Cross-posting to two lists simultaneously generally isn't desirable.

Can you provide a minimum working example (code+data) for this? 
Otherwise, it's hard to see what's going on.

Best,
Phillip
On 11/11/19 9:50 pm, Zhou, Xionghui wrote:
1 day later
#
Dear Phillip,
Thanks for your reply. To test with the two-sample KS test in R whether one vector is significantly smaller than another, I use the following command as a demo: p = ks.test((1:10), (20:200), alternative = "greater")$p.value. The resulting p-value is 5.873e-09. However, the same case in MATLAB, [~,p]=kstest2((1:10)',(20:200)','tail','larger'), gives a p-value of 8.2222e-10. Differences also appear in other methods, such as the Wilcoxon rank sum test and the Poisson test. Thanks!
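
As a cross-check on the demo, the one-sided statistic can be computed by hand. The following is a minimal sketch in plain Python; the helper names ecdf and one_sided_ks_statistic are illustrative and do not come from R or MATLAB. It assumes that both alternative = "greater" in R and 'tail','larger' in MATLAB use the statistic D+ = max over t of (F_x(t) - F_y(t)), where F_x and F_y are the empirical CDFs of the two samples.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of t."""
    ordered = sorted(sample)
    n = len(ordered)
    # bisect_right counts how many observations are <= t
    return lambda t: bisect_right(ordered, t) / n


def one_sided_ks_statistic(x, y):
    """D+ = max_t (F_x(t) - F_y(t)), evaluated at every observed value.

    Assumption: this is the statistic behind R's alternative = "greater"
    and MATLAB's 'tail','larger'.
    """
    f_x, f_y = ecdf(x), ecdf(y)
    points = sorted(set(x) | set(y))
    return max(f_x(t) - f_y(t) for t in points)


x = list(range(1, 11))     # 1:10
y = list(range(20, 201))   # 20:200
d_plus = one_sided_ks_statistic(x, y)
print(d_plus)  # 1.0, because every value in x lies below every value in y
```

Because the two samples do not overlap, the statistic takes its maximum possible value of 1, so any difference between the reported p-values has to come from how the statistic is converted into a p-value.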

Regards,


Xionghui
On 11/18/19, 11:26 AM, "Phillip Alday" <phillip.alday at mpi.nl> wrote:
    > _______________________________________________
    > R-SIG-Robust at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-sig-robust
    >
4 days later
#
Both MATLAB and R generate the same test statistic, but differ in their
p-values. For a p-value that small, you're getting to the point where
the probabilities involved are on the order of a random bit flip in your
computer's memory, so I wouldn't worry about that too much. A quick look
at the R documentation suggests that the approximation used can break
down for small sample sizes and that an exact test is only possible with
more data. MATLAB didn't provide as much detail about how they converted
the test statistic to p-value.
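
One possible explanation can be sketched numerically. The Python snippet below is an illustration under assumptions, not a statement of either package's implementation: it assumes that R's one-sided two-sample test uses the plain asymptotic formula p = exp(-2 * n * D^2) with n = n1*n2/(n1+n2), and that MATLAB multiplies D by a small-sample correction factor (sqrt(n) + 0.12 + 0.11/sqrt(n)) before applying the same exponential. The function names p_asymptotic and p_corrected are hypothetical, and D = 1 is the statistic for the demo data (1:10 versus 20:200).

```python
import math


def p_asymptotic(d, n1, n2):
    """One-sided asymptotic p-value: exp(-2 * n_eff * d^2).

    Assumption: this is the conversion R applies in the one-sided case.
    """
    n_eff = n1 * n2 / (n1 + n2)
    return math.exp(-2.0 * n_eff * d * d)


def p_corrected(d, n1, n2):
    """Same exponential form, with a small-sample correction on sqrt(n_eff).

    Assumption: the constants 0.12 and 0.11 are the correction MATLAB uses.
    """
    root = math.sqrt(n1 * n2 / (n1 + n2))
    lam = max((root + 0.12 + 0.11 / root) * d, 0.0)
    return math.exp(-2.0 * lam * lam)


# Demo data from the thread: 10 values versus 181 values, statistic D = 1
p_plain = p_asymptotic(1.0, 10, 181)
p_corr = p_corrected(1.0, 10, 181)
print(f"{p_plain:.4g}")  # 5.873e-09, the value reported for R
print(f"{p_corr:.4g}")   # 8.222e-10, close to the 8.2222e-10 reported for MATLAB
```

If these assumptions hold, both programs agree on the statistic and differ only in a correction term whose influence shrinks as the sample sizes grow.
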

In other words, don't worry about it. You shouldn't be trying to interpret
p-values as precise numbers anyway, especially on such a small sample.


Best,

Phillip
On 19/11/2019 22:35, Zhou, Xionghui wrote: