Skip to content

bug with mapply() on an S4 object

6 messages · Hervé Pagès, Martin Maechler, Cook, Malcolm

#
Hi,

Starting with ordinary vectors, so we know what to expect:

   > mapply(function(x, y) {x * y}, 101:106, rep(1:3, 2))
   [1] 101 204 309 104 210 318

   > mapply(function(x, y) {x * y}, 101:106, 1:3)
   [1] 101 204 309 104 210 318

Now with an S4 object:

   setClass("A", representation(aa="integer"))
   a <- new("A", aa=101:106)

   > length(a)
   [1] 1

Implementing length():

   setMethod("length", "A", function(x) length(x at aa))

Testing length():

   > length(a)  # sanity check
   [1] 6

No [[ yet for those objects so the following error is expected:

   > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
   Error in dots[[1L]][[1L]] : this S4 class is not subsettable

Implementing [[:

   setMethod("[[", "A", function(x, i, j, ...) x at aa[[i]])

Testing [[:

   > a[[1]]
   [1] 101
   > a[[5]]
   [1] 105

Trying mapply again:

   > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
   [1] 101 202 303 101 202 303

Wrong. It looks like internally a[[1]] is always used instead of a[[i]].

The real problem it seems is that 'a' is treated as if it was of
length 1:

   > mapply(function(x, y) {x * y}, a, 1:3)
   [1] 101 202 303
   > mapply(function(x, y) {x * y}, a, 5)
   [1] 505

In other words, internal dispatch works for [[ but not for length().

Thanks,
H.
12 days later
#
Hi,

Here is a patch for this (against current R-devel). The "caching" of
the .Primitive for 'length' is taken from seq_along() C code (in
R-devel/src/main/seq.c).

hpages at thinkpad:~/svn/R$ svn diff R-devel
Index: R-devel/src/main/mapply.c
===================================================================
--- R-devel/src/main/mapply.c	(revision 61172)
+++ R-devel/src/main/mapply.c	(working copy)
@@ -32,14 +32,39 @@
      int i, j, m, named, zero = 0;
      R_xlen_t *lengths, *counters, longest = 0;
      SEXP vnames, fcall = R_NilValue,  mindex, nindex, tmp1, tmp2, ans;
+    static SEXP length_op = NULL;

+    /* Store the .Primitive for 'length' for DispatchOrEval to use. */
+    if (length_op == NULL) {
+	SEXP R_lengthSymbol = install("length");
+	length_op = eval(R_lengthSymbol, R_BaseEnv);
+	if (TYPEOF(length_op) != BUILTINSXP) {
+	    length_op = NULL;
+	    error("'length' is not a BUILTIN");
+	}
+	R_PreserveObject(length_op);
+    }
+
      m = length(varyingArgs);
      vnames = PROTECT(getAttrib(varyingArgs, R_NamesSymbol));
      named = vnames != R_NilValue;

      lengths = (R_xlen_t *)  R_alloc(m, sizeof(R_xlen_t));
      for(i = 0; i < m; i++){
-	lengths[i] = xlength(VECTOR_ELT(varyingArgs, i));
+	int dispatch_ok = 0;
+	tmp1 = VECTOR_ELT(varyingArgs, i);
+	if (isObject(tmp1)) {
+	    /* Looks like DispatchOrEval() needs a pairlist. We reproduce what
+	       pairlist(tmp1) would do i.e. tmp2 <- as.pairlist(list(tmp1)).
+	       Is there a more direct way to go from tmp1 to tmp2? */
+	    PROTECT(tmp2 = allocVector(VECSXP, 1));
+	    SET_VECTOR_ELT(tmp2, 0, tmp1);
+	    PROTECT(tmp2 = coerceVector(tmp2, LISTSXP));
+	    dispatch_ok = DispatchOrEval(call, length_op, "length",
+					 tmp2, rho, &ans, 0, 1);
+	    UNPROTECT(2);
+	}
+	lengths[i] = dispatch_ok ? asInteger(ans) : xlength(tmp1);
  	if(lengths[i] == 0) zero++;
  	if (lengths[i] > longest) longest = lengths[i];
      }

Hopefully the bug can be fixed. Thanks!
H.
On 11/14/2012 09:42 PM, Herv? Pag?s wrote:

  
    
#
Some formatting issues when copy/pasting the patch in the body of the
email so I've attached the diff file.

Cheers,
H.
On 11/27/2012 04:56 PM, Herv? Pag?s wrote:

  
    
#
> Some formatting issues when copy/pasting the patch in the
    > body of the email so I've attached the diff file.

Thank you, Herv?.

I have committed (a slightly simplified version of) your patch
to R-devel (to become 2.16.0).
Backporting to '2.15.2 patched' would be a bit of work
-- mapply's C interface is not .Internal() there --
so I've kept it in R-devel.
> On 11/27/2012 04:56 PM, Herv? Pag?s wrote:
>> Hi,
    >> 
    >> Here is a patch for this (against current R-devel). The "caching" of
    >> the .Primitive for 'length' is taken from seq_along() C code (in
    >> R-devel/src/main/seq.c).
    >> 
    >> hpages at thinkpad:~/svn/R$ svn diff R-devel
    >> Index: R-devel/src/main/mapply.c
    >> ===================================================================
    >> --- R-devel/src/main/mapply.c    (revision 61172)
    >> +++ R-devel/src/main/mapply.c    (working copy)

[.............]

    >> lengths = (R_xlen_t *)  R_alloc(m, sizeof(R_xlen_t));
    >> for(i = 0; i < m; i++){
    >> -    lengths[i] = xlength(VECTOR_ELT(varyingArgs, i));
    >> +    int dispatch_ok = 0;
    >> +    tmp1 = VECTOR_ELT(varyingArgs, i);
    >> +    if (isObject(tmp1)) {
    >> +        /* Looks like DispatchOrEval() needs a pairlist. We reproduce what
    >> +           pairlist(tmp1) would do i.e. tmp2 <- as.pairlist(list(tmp1)).
    >> +           Is there a more direct way to go from tmp1 to tmp2? */

indeed, there is a more direct way:

        tmp2 = lang1(tmp1)

and that's what I've used in the commit.

    >> +        PROTECT(tmp2 = allocVector(VECSXP, 1));
    >> +        SET_VECTOR_ELT(tmp2, 0, tmp1);
    >> +        PROTECT(tmp2 = coerceVector(tmp2, LISTSXP));
    >> +        dispatch_ok = DispatchOrEval(call, length_op, "length",
    >> +                     tmp2, rho, &ans, 0, 1);
    >> +        UNPROTECT(2);
    >> +    }
    >> +    lengths[i] = dispatch_ok ? asInteger(ans) : xlength(tmp1);
    >> if(lengths[i] == 0) zero++;
    >> if (lengths[i] > longest) longest = lengths[i];
    >> }
    >> 
    >> Hopefully the bug can be fixed. Thanks!

Many thanks to you, Herv?!
Martin
>> On 11/14/2012 09:42 PM, Herv? Pag?s wrote:
>>> Hi,
    >>> 
    >>> Starting with ordinary vectors, so we know what to expect:
    >>> 
    >>> > mapply(function(x, y) {x * y}, 101:106, rep(1:3, 2))
    >>> [1] 101 204 309 104 210 318
    >>> 
    >>> > mapply(function(x, y) {x * y}, 101:106, 1:3)
    >>> [1] 101 204 309 104 210 318
    >>> 
    >>> Now with an S4 object:
    >>> 
    >>> setClass("A", representation(aa="integer"))
    >>> a <- new("A", aa=101:106)
    >>> 
    >>> > length(a)
    >>> [1] 1
    >>> 
    >>> Implementing length():
    >>> 
    >>> setMethod("length", "A", function(x) length(x at aa))
    >>> 
    >>> Testing length():
    >>> 
    >>> > length(a)  # sanity check
    >>> [1] 6
    >>> 
    >>> No [[ yet for those objects so the following error is expected:
    >>> 
    >>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
    >>> Error in dots[[1L]][[1L]] : this S4 class is not subsettable
    >>> 
    >>> Implementing [[:
    >>> 
    >>> setMethod("[[", "A", function(x, i, j, ...) x at aa[[i]])
    >>> 
    >>> Testing [[:
    >>> 
    >>> > a[[1]]
    >>> [1] 101
    >>> > a[[5]]
    >>> [1] 105
    >>> 
    >>> Trying mapply again:
    >>> 
    >>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
    >>> [1] 101 202 303 101 202 303
    >>> 
    >>> Wrong. It looks like internally a[[1]] is always used instead of a[[i]].
    >>> 
    >>> The real problem it seems is that 'a' is treated as if it was of
    >>> length 1:
    >>> 
    >>> > mapply(function(x, y) {x * y}, a, 1:3)
    >>> [1] 101 202 303
    >>> > mapply(function(x, y) {x * y}, a, 5)
    >>> [1] 505
    >>> 
    >>> In other words, internal dispatch works for [[ but not for length().
    >>> 
    >>> Thanks,
    >>> H.
    >>> 
    >> 

    > -- 
    > Herv? Pag?s

    > Program in Computational Biology
    > Division of Public Health Sciences
    > Fred Hutchinson Cancer Research Center
    > 1100 Fairview Ave. N, M1-B514
    > P.O. Box 19024
    > Seattle, WA 98109-1024

    > E-mail: hpages at fhcrc.org
    > Phone:  (206) 667-5791
    > Fax:    (206) 667-1319

    > [DELETED ATTACHMENT external: mapply.diff, text/x-patch]
#
Yes, yes, excellent and great , I am tracking this development with great interest.

Am I correct that the implications for BioConductor is the tearing out of the Xapply from generics and the expecations that List and descendents would now "just work" with {t,mc,mcl,...}apply?  That would be a grand outcome.  

~malcolm_cook at stowers.org
> Some formatting issues when copy/pasting the patch in the
    > body of the email so I've attached the diff file.

Thank you, Herv?.

I have committed (a slightly simplified version of) your patch
to R-devel (to become 2.16.0).
Backporting to '2.15.2 patched' would be a bit of work
-- mapply's C interface is not .Internal() there --
so I've kept it in R-devel.
> On 11/27/2012 04:56 PM, Herv? Pag?s wrote:
>> Hi,
    >>
    >> Here is a patch for this (against current R-devel). The "caching" of
    >> the .Primitive for 'length' is taken from seq_along() C code (in
    >> R-devel/src/main/seq.c).
    >>
    >> hpages at thinkpad:~/svn/R$ svn diff R-devel
    >> Index: R-devel/src/main/mapply.c
    >> ===================================================================
    >> --- R-devel/src/main/mapply.c    (revision 61172)
    >> +++ R-devel/src/main/mapply.c    (working copy)

[.............]

    >> lengths = (R_xlen_t *)  R_alloc(m, sizeof(R_xlen_t));
    >> for(i = 0; i < m; i++){
    >> -    lengths[i] = xlength(VECTOR_ELT(varyingArgs, i));
    >> +    int dispatch_ok = 0;
    >> +    tmp1 = VECTOR_ELT(varyingArgs, i);
    >> +    if (isObject(tmp1)) {
    >> +        /* Looks like DispatchOrEval() needs a pairlist. We reproduce what
    >> +           pairlist(tmp1) would do i.e. tmp2 <- as.pairlist(list(tmp1)).
    >> +           Is there a more direct way to go from tmp1 to tmp2? */

indeed, there is a more direct way:

        tmp2 = lang1(tmp1)

and that's what I've used in the commit.

    >> +        PROTECT(tmp2 = allocVector(VECSXP, 1));
    >> +        SET_VECTOR_ELT(tmp2, 0, tmp1);
    >> +        PROTECT(tmp2 = coerceVector(tmp2, LISTSXP));
    >> +        dispatch_ok = DispatchOrEval(call, length_op, "length",
    >> +                     tmp2, rho, &ans, 0, 1);
    >> +        UNPROTECT(2);
    >> +    }
    >> +    lengths[i] = dispatch_ok ? asInteger(ans) : xlength(tmp1);
    >> if(lengths[i] == 0) zero++;
    >> if (lengths[i] > longest) longest = lengths[i];
    >> }
    >>
    >> Hopefully the bug can be fixed. Thanks!

Many thanks to you, Herv?!
Martin
>> On 11/14/2012 09:42 PM, Herv? Pag?s wrote:
>>> Hi,
    >>>
    >>> Starting with ordinary vectors, so we know what to expect:
    >>>
    >>> > mapply(function(x, y) {x * y}, 101:106, rep(1:3, 2))
    >>> [1] 101 204 309 104 210 318
    >>>
    >>> > mapply(function(x, y) {x * y}, 101:106, 1:3)
    >>> [1] 101 204 309 104 210 318
    >>>
    >>> Now with an S4 object:
    >>>
    >>> setClass("A", representation(aa="integer"))
    >>> a <- new("A", aa=101:106)
    >>>
    >>> > length(a)
    >>> [1] 1
    >>>
    >>> Implementing length():
    >>>
    >>> setMethod("length", "A", function(x) length(x at aa))
    >>>
    >>> Testing length():
    >>>
    >>> > length(a)  # sanity check
    >>> [1] 6
    >>>
    >>> No [[ yet for those objects so the following error is expected:
    >>>
    >>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
    >>> Error in dots[[1L]][[1L]] : this S4 class is not subsettable
    >>>
    >>> Implementing [[:
    >>>
    >>> setMethod("[[", "A", function(x, i, j, ...) x at aa[[i]])
    >>>
    >>> Testing [[:
    >>>
    >>> > a[[1]]
    >>> [1] 101
    >>> > a[[5]]
    >>> [1] 105
    >>>
    >>> Trying mapply again:
    >>>
    >>> > mapply(function(x, y) {x * y}, a, rep(1:3, 2))
    >>> [1] 101 202 303 101 202 303
    >>>
    >>> Wrong. It looks like internally a[[1]] is always used instead of a[[i]].
    >>>
    >>> The real problem it seems is that 'a' is treated as if it was of
    >>> length 1:
    >>>
    >>> > mapply(function(x, y) {x * y}, a, 1:3)
    >>> [1] 101 202 303
    >>> > mapply(function(x, y) {x * y}, a, 5)
    >>> [1] 505
    >>>
    >>> In other words, internal dispatch works for [[ but not for length().
    >>>
    >>> Thanks,
    >>> H.
    >>>
    >>

    > --
    > Herv? Pag?s

    > Program in Computational Biology
    > Division of Public Health Sciences
    > Fred Hutchinson Cancer Research Center
    > 1100 Fairview Ave. N, M1-B514
    > P.O. Box 19024
    > Seattle, WA 98109-1024

    > E-mail: hpages at fhcrc.org
    > Phone:  (206) 667-5791
    > Fax:    (206) 667-1319

    > [DELETED ATTACHMENT external: mapply.diff, text/x-patch]

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
Hi Martin, Malcolm,

Thanks Martin for applying the fix.
On 11/28/2012 05:47 AM, Cook, Malcolm wrote:
Yes it's nice to have more things work out-of-the-box on
list-like or vector-like S4 objects. It makes implementation of
new List or Vector concrete subclasses easier and the overall code
for the 100+ subclasses currently defined in Bioconductor simpler,
cleaner, and easier to maintain (less code duplication).

In the particular case of base::mapply(), it's true that, with this
fix, we don't need to explicitly turn mapply() into a generic anymore
(in BiocGenerics), at least in theory, because now it works on objects
with length and [[ methods. However, we might want to keep the generic
in BiocGenerics anyway, because a BioC developer might still want to
override the default method (i.e. the mapply,ANY method, pointing to
base::mapply) with a method for particular objects that is semantically
equivalent but more efficient on those objects.

Cheers,
H.