Skip to content

unique possible bug

4 messages · Uwe Ligges, Patrick McCann

#
Hi,

I am trying to read in a rather large list of transactions using the
arules library. It seems in the coerce method into the dgCmatrix, it
somewhere calls unique. Unique.c throws an error when  n > 536870912;
however, when 4*n was modified to 2*n in 2004, the overflow protection
should have changed from 2^29 to 2^30, right? If so, how would I
change it in my copy? Do I have to recompile everything?

Thanks,
Patrick McCann


Here is a simple to reproduce example:
Error in unique.default(a) : length 536870917 is too large for hashing
3: unique.default(a)
2: unique(a)
1: unique(a)
function (x, incomparables = FALSE, fromLast = FALSE, ...)
{
    z <- .Internal(unique(x, incomparables, fromLast))
    if (is.factor(x))
        factor(z, levels = seq_len(nlevels(x)), labels = levels(x),
            ordered = is.ordered(x))
    else if (inherits(x, "POSIXct"))
        structure(z, class = class(x), tzone = attr(x, "tzone"))
    else if (inherits(x, "Date"))
        structure(z, class = class(x))
    else z
}
<environment: namespace:base>
/*
 Choose M to be the smallest power of 2
 not less than 2*n and set K = log2(M).
 Need K >= 1 and hence M >= 2, and 2^M <= 2^31 -1, hence n <= 2^29.

 Dec 2004: modified from 4*n to 2*n, since in the worst case we have
 a 50% full table, and that is still rather efficient -- see
 R. Sedgewick (1998) Algorithms in C++ 3rd edition p.606.
*/
static void MKsetup(int n, HashData *d)
{
   int n4 = 2 * n;

   if(n < 0 || n > 536870912) /* protect against overflow to -ve */
       error(_("length %d is too large for hashing"), n);
   d->M = 2;
   d->K = 1;
   while (d->M < n4) {
       d->M *= 2;
       d->K += 1;
   }
}
#
On 05.10.2011 22:15, Patrick McCann wrote:
You mean the arules package?
Yes.

There is also the way to ask the maintainer to improve it, but it won't 
work without reinstallation of the changed package sources.

Uwe Ligges
#
The error I am referring to is in unique.c in Base R, it cannot
accomodate greater than 2^29 values, even though it appears the
overflow protection should be 2^30. The only relevance of the arules
package is I was using it while I discovered this issue.

Thanks,
Patrick



2011/10/6 Uwe Ligges <ligges at statistik.tu-dortmund.de>:
#
I see.  For now: Yes, you need to change and recompile.
I will take a look what was actually changed and will run some test cases.

Best,
Uwe
On 06.10.2011 16:50, Patrick McCann wrote: