(PR#1007) [Rd] ks.test doesn't compute correct empirical
On Sun, 1 Jul 2001 mcdowella@mcdowella.demon.co.uk wrote:
Full_Name: Andrew Grant McDowell
Version: R 1.1.1 (but source in 1.3.0 looks fishy as well)
OS: Windows 2K Professional (Consumer)
Submission from: (NULL) (194.222.243.209)
Please upgrade: we've found a number of Win2k bugs and worked around them since then, let alone the bug fixes and improvements in R ....
In article <xeQ_6.1949$xd.353840@typhoon.snet.net>, johnt@tman.dnsalias.com writes
Can someone help? In R, I am generating a vector of 1000 samples from Bin (1000, 0.25). I then do a Kolmogorov Smirnov test to test if the vector has been drawn from a population of Bin (1000, 0.25). I would expect a reasonably high p-value.....
You do realize that the Kolmogorov tests (and the Kolmogorov-Smirnov extension) assume continuous distributions, so the distribution theory is not valid in this case? S-PLUS does stop you doing this:
ks.gof(o, dist="binomial", size=100, prob=0.25)
Problem in not.cont1(ttest = d.test, nx = nx, alt.ex..: For testing
discrete distributions when sample size > 50, use the
Chi-square test
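For reference, the chi-square alternative that the S-PLUS message points to can be sketched in R roughly as follows. This is only an illustration, not part of the original exchange: the seed, the choice to pool the tails at 17 and 33, and the names `cuts`, `obs`, `p` and `res` are all assumptions made here.

```r
# Sketch: chi-square goodness-of-fit test for counts assumed to follow
# Bin(100, 0.25). Seed and bin edges are illustrative choices only.
set.seed(1)
o <- rbinom(1000, 100, 0.25)

# Pool the tails so every cell has a reasonable expected count:
# cells are X <= 17, X = 18, ..., X = 33, X >= 34.
cuts <- 17:33
obs <- as.vector(table(cut(o, breaks = c(-Inf, cuts, Inf))))

# Cell probabilities under the null, as differences of the binomial CDF.
p <- diff(c(0, pbinom(cuts, 100, 0.25), 1))

# Goodness-of-fit test of the observed counts against p.
res <- chisq.test(obs, p = p)
res
```

With the cells pooled this way the test has 17 degrees of freedom, and the discreteness of the data is no longer a problem for the reference distribution.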
Either I am doing something wrong in R, or I am misunderstanding how this test should work (both quite possible)... Thanks, JT..
#### 1000 random samples from binomial dist with mean =.25, n=100...
o<-rbinom (1000, 100, .25)
mean (o);
[1] 25.178
var (o);
[1] 19.61193
ks.test (o, "pbinom", 100, .25);
One-sample Kolmogorov-Smirnov test
data: o
D = 0.0967, p-value = 1.487e-08
alternative hypothesis: two.sided
p-value is mighty small, leading me to reject the null hypothesis that
the sample has been drawn from the Bin(100, 0.25) distribution!!!
That's OK. That's not what you tested (see above). An S language point: the `;' are unnecessary.
Some more oddities:
o<-rbinom(10000, 1, 0.25)
ks.test(o, "pbinom", 1, 0.25)
One-sample Kolmogorov-Smirnov test
data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided
length(o[o==0])
[1] 7491
length(o[o==1])
[1] 2509
o<-rep(0,10000)
ks.test(o, "pbinom", 1, 0.25)
One-sample Kolmogorov-Smirnov test
data: o
D = 0.75, p-value = < 2.2e-16
alternative hypothesis: two.sided
length(o[o==0])
[1] 10000
length(o[o==1])
[1] 0
Here zeroing out the data does not change the reported D value
Nor does it change the maximum discrepancy.
ks.test(rep(1,10000), "pbinom", 1, 0.25)
One-sample Kolmogorov-Smirnov test
data: rep(1, 10000)
D = 1, p-value = < 2.2e-16
alternative hypothesis: two.sided
shows 0 is special here.
After playing about with ks.test(c(rep(0, X), rep(1, 1000-X)), "pbinom", 1, p) for a bit, I conjecture that ks.test() takes no account whatsoever of ties, but merely sorts the input values and looks for max(position/N - pbinom(value, 1, p)). Anybody got the source handy?
After 30 minutes of download, the relevant part of ks.test.R would appear to be
Eh? Just type ks.test in your R session for the source ....
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595
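
The conjecture quoted in the thread (sort the values, ignore ties, compare position/N with the CDF) can be written out as a short sketch and checked against the D values reported above. The functions `naive_ks_D` and `tie_aware_D` are hypothetical names introduced here for illustration; this is not claimed to be the code in ks.test.R, only a statistic of the conjectured form, set beside the distance between the two step functions evaluated on the support of the distribution.

```r
# Sketch of the conjectured statistic: treat the sorted sample as if it
# had no ties and compare the CDF at each point with (i-1)/n and i/n.
naive_ks_D <- function(x, cdf, ...) {
  n <- length(x)
  d <- cdf(sort(x), ...) - (0:(n - 1)) / n
  max(c(d, 1 / n - d))
}

# For comparison: largest gap between the empirical CDF and the null CDF,
# evaluated only at the support points of the discrete distribution.
tie_aware_D <- function(x, cdf, support, ...) {
  Fn <- ecdf(x)
  max(abs(Fn(support) - cdf(support, ...)))
}

all_zero <- rep(0, 10000)
all_one  <- rep(1, 10000)
mixed    <- c(rep(0, 7491), rep(1, 2509))  # counts reported in the thread

naive_ks_D(all_zero, pbinom, 1, 0.25)      # 0.75
naive_ks_D(all_one,  pbinom, 1, 0.25)      # 1
naive_ks_D(mixed,    pbinom, 1, 0.25)      # 0.75
tie_aware_D(mixed,   pbinom, 0:1, 1, 0.25) # about 0.0009
```

The naive form reproduces D = 0.75 for both the mixed and the all-zero samples and D = 1 for the all-one sample, which is consistent with the conjecture. The tie-aware distance for the 7491/2509 sample is tiny. Even so, as noted earlier in the thread, the Kolmogorov distribution theory does not apply to discrete data, so a corrected distance alone would not give a valid p-value.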