Kolmogorov-Smirnov tests: overflow
On Sun, 23 Jun 2002, Arne Mueller wrote:
Hello, thanks everybody for your quick responses. Yes, the two distributions should be discrete. The reason I wanted to use ks.test was to get a very rough idea of whether two distributions are different. Both distributions have a similar shape (but they are definitely not normal distributions). However, I'm not very familiar with stats. Below you write that the two datasets are so large that they'll be significantly different anyway. Is that a general problem with large datasets?
That's not what I said, but what I did say is a standard problem in large datasets.
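To put a number on that standard problem, here is a minimal base-R sketch of the approximate 5% critical value of the two-sample D from the large-sample Kolmogorov limit, D_crit ~ c(alpha) * sqrt((m + n) / (m * n)) with c(alpha) = sqrt(-log(alpha / 2) / 2). The helper name `ks_critical_D` is hypothetical (not part of R). It assumes continuous data with no ties, which these data are not, so treat it only as an order-of-magnitude guide.

```r
# Hypothetical helper (not part of R): approximate critical value of the
# two-sample KS statistic D at level alpha, from the large-sample limit.
# Assumes continuous data (no ties) and large m, n.
ks_critical_D <- function(m, n, alpha = 0.05) {
  m <- as.numeric(m)                       # doubles, so m * n cannot overflow
  n <- as.numeric(n)
  c_alpha <- sqrt(-log(alpha / 2) / 2)     # about 1.358 for alpha = 0.05
  c_alpha * sqrt((m + n) / (m * n))
}

ks_critical_D(100, 100)         # about 0.19
ks_critical_D(390025, 140002)   # about 0.0042
```

With the sample sizes in this thread, a maximum gap of only about 0.004 between the two empirical distribution functions would already be "significant" at 5%, so a tiny p-value says little about whether the difference matters in practice.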
regards, Arne

ripley at stats.ox.ac.uk wrote:
Both this and your previous post suggest that your data are from a discrete distribution (here, as they have ties). The standard distribution of the KS test is inappropriate: see the first para of `Details' in ?ks.test. Even if it were not, your data sets would be so large that you would get statistical significance for practically insignificant differences, but if you really wanted to get some idea of the p value, there is a well-known asymptotic expansion for the significance levels in terms of m and n. My memory is that there is a monograph by Jim Durbin on this.

On Sun, 23 Jun 2002, Arne Mueller wrote:
Dear All, I've got a problem with ks.test. I have two really large vectors that I'd like to test, but I get an overflow and the p-value cannot be calculated:
> length(genomesv)
[1] 390025
> length(scopv)
[1] 140002
> ks.test(genomesv, scopv)

        Two-sample Kolmogorov-Smirnov test

data:  genomesv and scopv
D = 0.2081, p-value = NA
alternative hypothesis: two.sided

Warning messages:
1: NAs produced by integer overflow in: n.x * n.y
2: NAs produced by integer overflow in: n.x * n.y
3: cannot compute correct p-values with ties in: ks.test(genomesv, scopv)
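The first two warnings arise because `length()` returns integers and the product n.x * n.y, about 5.46e10, exceeds R's largest integer (`.Machine$integer.max`, 2^31 - 1). A minimal base-R illustration, using only the sample sizes shown above; it reproduces the arithmetic and does not modify ks.test itself:

```r
# Sample sizes from the session above; length() returns integer values
n.x <- 390025L
n.y <- 140002L

.Machine$integer.max                       # 2147483647, i.e. 2^31 - 1

# Integer multiplication overflows: R returns NA and warns
prod_int <- suppressWarnings(n.x * n.y)
is.na(prod_int)                            # TRUE

# Converting one operand to double first gives the exact product
prod_dbl <- as.numeric(n.x) * n.y          # 54604280050, well below 2^53
```

Coercing to double before multiplying is the usual workaround for this kind of overflow; it removes the NA but not the third warning about ties, which is a statistical rather than a numerical problem.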
Is there anything I can do about this? I'd really like to know what the p-value is ;-)
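For continuous data, a rough answer comes from the first-order large-sample limit of the two-sample statistic, P(D > d) ~ 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lambda^2), with lambda = sqrt(m * n / (m + n)) * d. The sketch below evaluates that series in base R; `ks_asymptotic_p` is a hypothetical helper name. It is not the refined expansion attributed to Durbin in the reply above, and because these data have ties it is not a valid p-value for them; it only shows the scale involved.

```r
# Hypothetical helper: first-order asymptotic two-sided p-value for the
# two-sample KS statistic D (continuous data, no ties, large m and n).
ks_asymptotic_p <- function(D, m, n, terms = 100) {
  m <- as.numeric(m)                        # avoid integer overflow in m * n
  n <- as.numeric(n)
  lambda <- sqrt(m * n / (m + n)) * D
  k <- seq_len(terms)
  p <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * lambda^2))
  min(max(p, 0), 1)                         # keep truncated series in [0, 1]
}

ks_asymptotic_p(0.13581, 200, 200)          # about 0.05 (lambda near 1.358)
ks_asymptotic_p(0.2081, 390025, 140002)     # lambda near 67: underflows to 0
```

For the D = 0.2081 reported above, lambda is close to 67 and every term of the series underflows, so the approximation returns 0: the p-value is far below anything double precision can represent, consistent with the point that samples this large will come out "significant" anyway.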
thanks a lot for your help,
Arne
--
Arne Mueller
Biomolecular Modelling Laboratory
Cancer Research UK, London Research Institute
44 Lincoln's Inn Fields
London WC2A 3PX, U.K.
phone1 : +44-(0)20-72693405 | fax : +44-(0)20-75945789
phone2 : +44-(0)20-75945776 | mobile: +44-(0)7984601749
email : a.mueller at cancer.org.uk | http://www.bmm.icnet.uk
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595