We have seen the same integer overflow problem in ks.test before, and it is easily solved. The help page says x and y must be numeric, so passing factors is user error. I've added tests for non-numeric arguments to the code. Why do people file bug reports without reading the help/man page?
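A minimal sketch of the kind of argument test described above. The helper name check_numeric_arg and the wording of its message are assumptions made for illustration; this is not the code that was actually added to ctest.

```r
# Hypothetical helper (not the actual ctest code): reject non-numeric
# input before any arithmetic is attempted on it.
check_numeric_arg <- function(x, name) {
  if (!is.numeric(x))
    stop("'", name, "' must be numeric, not of class '", class(x)[1], "'")
  invisible(x)
}

# A factor is not numeric, so the check stops with a clear error
# instead of letting later arithmetic emit warnings.
a <- factor(c(2, 3))
res <- tryCatch(check_numeric_arg(a, "x"),
                error = function(e) conditionMessage(e))
print(res)
```

Calling such a check at the top of a default method would turn the confusing warnings in the report below into a single error that names the offending argument.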
On Tue, 14 Jan 2003 bates@stat.wisc.edu wrote:
This was filed as a bug report on the Debian r-base package. It is more properly a bug report on the ctest package in R.

The default method for wilcox.test manipulates x and y without checking the class or data.class of these objects. Possible solutions are

- create wilcox.test.factor (if appropriate)
- check the class and/or data.class of x and y in wilcox.test.default and produce error messages or warnings for inappropriate objects
- coerce to numeric unconditionally (probably not a good idea)

Martin Michlmayr <tbm@cyrius.com> writes:
Package: r-base
Version: 1.5.0-2 / 1.6.1.cvs.20030103-1
Severity: normal

I have some ordinal data and I wanted to perform a U-test. However, a problem occurred:
x <- read.table("spss-3.txt", header=TRUE)
a = factor(x$a)
b = factor(x$b)
summary(a)
    1     2     3     4     5     6
23900 20362 15238 10007  3399   472
summary(b)
    1     2     3     4     5     6
23809 20649 15069  9952  3415   484
wilcox.test(a, b)
Wilcoxon rank sum test with continuity correction
data: a and b
W = 5384330884, p-value = NA
alternative hypothesis: true mu is not equal to 0
Warning messages:
1: "-" not meaningful for factors in: Ops.factor(x, mu)
2: NAs produced by integer overflow in: n.x * n.y
3: NAs produced by integer overflow in: n.x * n.y
Now there appear to be two issues. First of all, the NAs produced by integer overflow. Since they go away when I use less data, this looks like an R bug with big data sets:

57:tbm@arborlon: ~] wc -l s
40000 s
summary(a)
    1     2     3     4     5     6
13034 11086  8341  5412  1869   257
summary(b)
    1     2     3     4     5     6
13034 11086  8341  5412  1869   257
wilcox.test(a, b)
Wilcoxon rank sum test with continuity correction
data: a and b
W = 1599920001, p-value = < 2.2e-16
alternative hypothesis: true mu is not equal to 0
Warning message:
"-" not meaningful for factors in: Ops.factor(x, mu)
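The two transcripts so far are consistent with plain integer arithmetic on the sample sizes. The sketch below assumes n.x and n.y are the integer lengths of the two samples (73378 each in the full data, 39999 each in the reduced data, taken from the summary() counts); the as.double() conversion is one possible remedy, not necessarily the fix applied in ctest.

```r
# Sample sizes taken from the summary() counts in the report.
n.x <- 73378L
n.y <- 73378L

# Integer multiplication overflows past .Machine$integer.max (2147483647)
# and yields NA with a warning, which is suppressed here.
prod_int <- suppressWarnings(n.x * n.y)      # NA

# Converting to double first avoids the overflow.
prod_dbl <- as.double(n.x) * as.double(n.y)  # 73378^2

# The reduced data set stays below the limit, so no NA appears there.
prod_small <- 39999L * 39999L

print(is.na(prod_int))
print(prod_dbl > .Machine$integer.max)
print(prod_small < .Machine$integer.max)
```

This is why the overflow warnings disappear with the smaller file: the product of the sample sizes drops below the largest representable integer, so the cause is the size of the product rather than anything in the data values.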
However, I still don't know what the other warning is. I don't have a "-" in my data. I reduced the data to 2 lines and the problem still occurs:
summary(a)
2 3
1 1
summary(b)
2 3
1 1
wilcox.test(a, b)
Wilcoxon rank sum test
data: a and b
W = 4, p-value = 0.3333
alternative hypothesis: true mu is not equal to 0
Warning message:
"-" not meaningful for factors in: Ops.factor(x, mu)
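The "-" in this warning is the subtraction operator, not a character in the data: the call shown, Ops.factor(x, mu), is a subtraction of mu from a factor. A small sketch that reproduces it outside wilcox.test follows; the conversion through as.character() assumes the factor labels are the original numeric codes, as they are in this report.

```r
a <- factor(c(2, 3))

# Arithmetic on a factor is not defined: the result is all NA, and R
# warns that "-" is not meaningful for factors (suppressed here).
shifted <- suppressWarnings(a - 0)

# Recover the numeric values from the factor labels before doing
# arithmetic. as.numeric(a) alone would return the internal level
# codes (1, 2), not the labels (2, 3).
a_num <- as.numeric(as.character(a))

print(all(is.na(shifted)))
print(a_num - 0)
```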
The file is:

67:tbm@arborlon: ~] cat s
a b
2 4
3 1
68:tbm@arborlon: ~]

I'm not an R expert, so this might be a pilot error; but I don't see where.

-- System Information:
Debian Release: 3.0
Architecture: i386
Kernel: Linux regression 2.4.19-686 #1 Thu Aug 8 21:30:09 EST 2002 i686
Locale: LANG=en_US, LC_CTYPE=en_US

Versions of packages r-base depends on:
ii  r-base-core   1.5.0-2  GNU R core of statistical computin
ii  r-base-html   1.5.0-2  GNU R html docs for statistical co
ii  r-base-latex  1.5.0-2  GNU R LaTeX docs for statistical c

-- no debconf information

--
Martin Michlmayr
tbm@cyrius.com
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595