unique possible bug
I see. For now: Yes, you need to change and recompile. I will take a look what was actually changed and will run some test cases. Best, Uwe
On 06.10.2011 16:50, Patrick McCann wrote:
The error I am referring to is in unique.c in Base R, it cannot accomodate greater than 2^29 values, even though it appears the overflow protection should be 2^30. The only relevance of the arules package is I was using it while I discovered this issue. Thanks, Patrick 2011/10/6 Uwe Ligges<ligges at statistik.tu-dortmund.de>:
On 05.10.2011 22:15, Patrick McCann wrote:
Hi, I am trying to read in a rather large list of transactions using the arules library.
You mean the arules package?
It seems in the coerce method into the dgCmatrix, it somewhere calls unique. Unique.c throws an error when n> 536870912; however, when 4*n was modified to 2*n in 2004, the overflow protection should have changed from 2^29 to 2^30, right? If so, how would I change it in my copy? Do I have to recompile everything?
Yes. There is also the way to ask the maintainer to improve it, but it won't work without reinstallation of the changed package sources. Uwe Ligges
Thanks, Patrick McCann Here is a simple to reproduce example:
runif(2^29+5)->a sum(unique(a))->b
Error in unique.default(a) : length 536870917 is too large for hashing
traceback()
3: unique.default(a) 2: unique(a) 1: unique(a)
unique.default
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
z<- .Internal(unique(x, incomparables, fromLast))
if (is.factor(x))
factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
ordered = is.ordered(x))
else if (inherits(x, "POSIXct"))
structure(z, class = class(x), tzone = attr(x, "tzone"))
else if (inherits(x, "Date"))
structure(z, class = class(x))
else z
}
<environment: namespace:base>
/*
Choose M to be the smallest power of 2
not less than 2*n and set K = log2(M).
Need K>= 1 and hence M>= 2, and 2^M<= 2^31 -1, hence n<= 2^29.
Dec 2004: modified from 4*n to 2*n, since in the worst case we have
a 50% full table, and that is still rather efficient -- see
R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
int n4 = 2 * n;
if(n< 0 || n> 536870912) /* protect against overflow to -ve */
error(_("length %d is too large for hashing"), n);
d->M = 2;
d->K = 1;
while (d->M< n4) {
d->M *= 2;
d->K += 1;
}
}
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.