On Tue, 3 Jul 2001, A. G. McDowell wrote:
In message <Pine.GSO.4.31.0107010731110.7616-100000@auk.stats>, Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes
You do realize that the Kolmogorov tests (and the Kolmogorov-Smirnov extension) assume continuous distributions, so the distribution theory is not valid in this case? S-PLUS does stop you doing this:
ks.gof(o, dist="binomial", size=100, prob=0.25)
Problem in not.cont1(ttest = d.test, nx = nx, alt.ex..: For testing
discrete distributions when sample size > 50, use the
Chi-square test
Thank you for your prompt reply to my bug report. While I agree that the distribution theory for the Kolmogorov tests assumes a continuous distribution, I would like to request a modification to the existing routines. The purpose of this would be to provide a result that would represent a conservative test in the case when the underlying distribution is discrete. This would be in accord with p. 432 of the 3rd edition of "Practical Nonparametric Statistics", by Conover, and section 25.38 of "Kendall's Advanced Theory of Statistics, 6th Edition, Vol 2A", by Stuart, Ord, and Arnold, both of which refer to Noether (1963) "Note on the Kolmogorov Statistic in the discrete case", Metrika, 7, 115. Users reared on these and similar textbooks would be less surprised at the behaviour of R if this modification were made, whereas users who do not attempt to apply the Kolmogorov-Smirnov test to discrete distributions would not notice any difference.
(Hopefully readers of those textbooks would understand that the results you reported as a bug *are* the behaviour of the KS test. Nowhere does R say it has implemented a modified KS test. The one data point we have suggests otherwise ....)
It would also be in accord with the behaviour of R in the two-sample case, where the effect of the existing code seems to be to provide a conservative test (since the statistic returned is no larger than might be returned under any possible tie-breaking) coupled with a warning, to which I would have no objection in the one-sample case either.
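To make the two-sample claim concrete, here is a small hand-worked illustration (not the internal `ks.test` code): evaluating the statistic max|F1 - F2| only at the pooled, tied data points gives a value that any tie-breaking perturbation could only match or exceed. The data values are invented for the example.

```r
## Two small samples with ties at the value 1 and 2
x <- c(1, 1, 2, 3)
y <- c(1, 2, 2, 4)

## Evaluate both empirical CDFs on the pooled set of distinct values
grid <- sort(unique(c(x, y)))
D <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))
D  # 0.25 for these data
```

Breaking the ties (e.g. by jittering one sample slightly) can only add further evaluation points, so the tie-aware statistic is never larger than a tie-broken one; hence the test is conservative.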
It seems to me that the following modification would suffice: replace
x <- y(sort(x), ...) - (0 : (n-1)) / n
with
x <- sort(x)
untied <- c(x[1:(n-1)] != x[2:n], TRUE)
x <- y(x, ...) - (0 : (n-1)) / n
x <- x[untied]
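The proposed change can be exercised as a self-contained sketch. The wrapper below, `ks.stat.discrete`, is a hypothetical name (not part of R); it computes the two-sided one-sample statistic as in the existing `ks.test` code, but first drops all but the last member of each run of tied observations, as proposed above.

```r
## Hypothetical sketch of the proposed conservative one-sample statistic
ks.stat.discrete <- function(x, y, ...) {
  n <- length(x)
  x <- sort(x)
  ## TRUE at the last element of each run of tied values
  untied <- c(x[1:(n - 1)] != x[2:n], TRUE)
  dev <- y(x, ...) - (0:(n - 1)) / n
  dev <- dev[untied]
  ## two-sided statistic, as in ks.test's continuous-case code
  max(c(dev, 1/n - dev))
}

## Example: 20 draws from Binomial(100, 0.25), tested against pbinom
set.seed(1)
x <- rbinom(20, size = 100, prob = 0.25)
D <- ks.stat.discrete(x, pbinom, size = 100, prob = 0.25)
```

With no ties present, `untied` is all `TRUE` and the statistic coincides with the unmodified one; with ties it can only be smaller, which is what makes the test conservative.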
In your original examples, this reduces a sample of size 10000 to one of size 101 or 2. Conservative - yes. Useful - very unlikely!
Users dealing with data derived from continuous distributions would not see any difference, because (except with very small probability due to floating point inaccuracy) they would never produce tied data.
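A quick check of that last point (a throwaway example, not a proof): draws from a continuous distribution essentially never coincide in floating point.

```r
## Ties in a continuous sample would require exact floating-point equality
set.seed(1)
u <- runif(1e3)
any(duplicated(u))  # FALSE with overwhelming probability
```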
There are circumstances in which one would want the original KS definition for all data sets, where one wants the test statistic and not the p value. I've added a warning, but I do not think we should be implementing a different definition.
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._