Proper way to define cbind, rbind for s4 classes in package

4 messages · Michael Lawrence, Mario Annau, Martin Maechler

#
Hi all,
this question has already been posted on stackoverflow, however without
success, see also
http://stackoverflow.com/questions/27886535/proper-way-to-use-cbind-rbind-with-s4-classes-in-package.

I have written a package using S4 classes and would like to use the
functions rbind, cbind with these defined classes.

Since it does not seem to be possible to define rbind and cbind directly
as S4 methods (see ?cBind) I defined rbind2 and cbind2 instead:

setMethod("rbind2", signature(x="ClassA", y = "ANY"),
    function(x, y) {
      # Do stuff ...
})

setMethod("cbind2", signature(x="ClassA", y = "ANY"),
    function(x, y) {
      # Do stuff ...
})

From ?cbind2 I learned that these functions need to be activated using
methods:::bind_activation to replace rbind and cbind from base.

I included the call in the package file R/zzz.R using the .onLoad function:

.onLoad <- function(...) {
  # Bind activation of cbind(2) and rbind(2) for S4 classes
  methods:::bind_activation(TRUE)
}

This works as expected. However, running R CMD check I am now getting
the following NOTE since I am using an unexported function in methods:

* checking dependencies in R code ... NOTE
Unexported object imported by a ':::' call: 'methods:::bind_activation'
  See the note in ?`:::` about the use of this operator.

How can I get rid of the NOTE and what is the proper way to define the
methods cbind and rbind for S4 classes in a package?

Best,
mario
#
On Sat, Jan 24, 2015 at 12:58 AM, Mario Annau <mario.annau at gmail.com> wrote:
This needs some clarification. It certainly is possible to define
cbind and rbind methods. The BiocGenerics package defines generics for
those and many methods are defined by e.g. S4Vectors, IRanges, etc.
The issue is that dispatch on "..." is singular, i.e., you can only
specify one class that all args in "..." must share (potentially
through inheritance). Thus, trying to combine objects from a different
hierarchy (or non-S4 objects) will not work. This has not been a huge
problem for us in practice. For example, we have a DataFrame object
that mimics data.frame. To cbind a data.frame with a DataFrame, the
user can just call the DataFrame() constructor. rbind() between
different data structures is much less common.
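
A minimal sketch of the "..." dispatch described above, assuming a toy class ClassA that simply wraps a matrix in a slot named data (both names are illustrative, not taken from any package). It only covers the case where every argument is a ClassA; mixed calls such as cbind(a, some_matrix) are not handled by this method.

```r
library(methods)

# Hypothetical toy class wrapping a plain matrix.
setClass("ClassA", representation(data = "matrix"))

# Turn cbind into an S4 generic that dispatches on "...".
setGeneric("cbind", signature = "...")

# A single class is given for "...": all arguments must be ClassA
# (or inherit from it) for this method to be selected.
setMethod("cbind", "ClassA", function(..., deparse.level = 1) {
  parts <- lapply(list(...), function(obj) obj@data)
  new("ClassA", data = do.call(base::cbind, parts))
})

a <- new("ClassA", data = matrix(1:4, nrow = 2))
b <- new("ClassA", data = matrix(5:8, nrow = 2))
ab <- cbind(a, b)   # dispatches to the ClassA method
```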

The cBind and rBind functions in Matrix (and the r/cbind that get
installed by bind_activation, the code is shared) work by recursing,
dropping the first argument until two are left, and then combining
with r/cbind2(). The Biobase package uses a similar strategy to mimic
c() via its non-standard combine() generic. The nice thing about the
combine() approach is the user entry point and the generic are the
same, instead of having methods on rbind2() and the user calling
rBind().
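
The recursion described above can be sketched roughly as follows. The function name cbind_recursive is hypothetical and this is not the actual code from Matrix or methods; it only illustrates the idea of peeling off the first argument until two remain and then handing the pair to cbind2().

```r
library(methods)

# Hypothetical helper illustrating the recursive reduction to cbind2().
cbind_recursive <- function(...) {
  args <- list(...)
  n <- length(args)
  if (n == 0L) return(NULL)
  if (n == 1L) return(cbind2(args[[1L]]))                  # y missing
  if (n == 2L) return(cbind2(args[[1L]], args[[2L]]))      # base case
  # Drop the first argument, combine the rest, then bind the first on.
  cbind2(args[[1L]], do.call(cbind_recursive, args[-1L]))
}

res <- cbind_recursive(1:2, 3:4, 5:6)
```

Because every step goes through cbind2(), any S4 method defined on a pair of classes is picked up at the step where that pair meets.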

I would argue that bind_activation(TRUE) should be discouraged,
because it replaces the native rbind and cbind with recursive variants
that are going to cause problems, performance and otherwise. This is
why it is hidden. Perhaps a reasonable compromise would be for the
native cbind and rbind to check whether any arguments are S4 and if
so, resort to recursion. Recursion does seem to be a clean way to
implement "type promotion", i.e., to answer the question "which type
should the result be when faced with mixed-type args?".
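
As a rough illustration of that compromise, written as a user-level wrapper rather than a change to the native functions: the names bind_cols and ToyMat are hypothetical, and the cbind2 method is deliberately simplistic (it assumes y is a plain matrix or vector).

```r
library(methods)

# Hypothetical toy class; stands in for any S4 matrix-like class.
setClass("ToyMat", representation(values = "matrix"))

setMethod("cbind2", signature(x = "ToyMat", y = "ANY"),
  function(x, y) new("ToyMat", values = base::cbind(x@values, y)))

# Hypothetical wrapper, not part of base R or methods.
bind_cols <- function(...) {
  args <- list(...)
  # Fast path: no S4 arguments, so defer to the native cbind.
  if (!any(vapply(args, isS4, logical(1)))) return(base::cbind(...))
  # Otherwise fold from the right:
  # cbind2(a1, cbind2(a2, ... cbind2(a[n-1], a[n])))
  Reduce(function(x, acc) cbind2(x, acc), args, right = TRUE)
}

m <- matrix(1:4, nrow = 2)
plain <- bind_cols(m, m)                              # native path
mixed <- bind_cols(new("ToyMat", values = m), m, m)   # recursive path
```

The result type of the mixed call is decided by whichever cbind2 method matches at each step, which is the "type promotion" behaviour mentioned above.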

Hopefully others have better ideas.

Michael
#
This is unfortunately an issue in my case since I would like to dispatch
on different classes.

To be more explicit than in the toy example, my actual method definition
is as follows:

setMethod("cbind2", signature(x = "DataSet", y = "matrix"),
  function(x, y) {
    # Do stuff ...
})
setMethod("rbind2", signature(x = "DataSet", y = "matrix"),
  function(x, y) {
    # Do stuff ...
})

The class DataSet actually wraps a pointer to a 2-dimensional HDF5
dataset. To make DataSet extensions more intuitive for the user I
thought that overloading cbind/rbind would be a good idea.
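
For illustration only, here is a self-contained sketch of such a method pair on a stand-in class. ToyDataSet and its values slot are hypothetical: it holds an in-memory matrix rather than an HDF5 pointer, and the real "Do stuff" bodies are not known. The methods are called through cbind2()/rbind2() directly, which works without any bind activation.

```r
library(methods)

# Hypothetical stand-in for DataSet: keeps the data in memory.
setClass("ToyDataSet", representation(values = "matrix"))

setMethod("cbind2", signature(x = "ToyDataSet", y = "matrix"),
  function(x, y) {
    # Append the columns of y to the stored data.
    new("ToyDataSet", values = base::cbind(x@values, y))
  })

setMethod("rbind2", signature(x = "ToyDataSet", y = "matrix"),
  function(x, y) {
    # Append the rows of y to the stored data.
    new("ToyDataSet", values = base::rbind(x@values, y))
  })

ds <- new("ToyDataSet", values = matrix(1:4, nrow = 2))
wider  <- cbind2(ds, matrix(5:6, nrow = 2))
taller <- rbind2(ds, matrix(5:6, ncol = 2))
```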

Best,
mario
1 day later
#
> On Sat, Jan 24, 2015 at 12:58 AM, Mario Annau
> <mario.annau at gmail.com> wrote:
>> Hi all, this question has already been posted on
    >> stackoverflow, however without success, see also
    >> http://stackoverflow.com/questions/27886535/proper-way-to-use-cbind-rbind-with-s4-classes-in-package.
    >> 
    >> I have written a package using S4 classes and would like
    >> to use the functions rbind, cbind with these defined
    >> classes.
    >> 
    >> Since it does not seem to be possible to define rbind and
    >> cbind directly as S4 methods (see ?cBind) I defined
    >> rbind2 and cbind2 instead:
    >> 

    > This needs some clarification. It certainly is possible to
    > define cbind and rbind methods. The BiocGenerics package
    > defines generics for those and many methods are defined by
    > e.g. S4Vectors, IRanges, etc.  The issue is that dispatch
    > on "..." is singular, i.e., you can only specify one class
    > that all args in "..." must share (potentially through
    > inheritance).

    > Thus, trying to combine objects from a
    > different hierarchy (or non-S4 objects) will not
    > work. 

Yes, indeed, that's the drawback.

I've been there almost surely before everyone else, with the
Matrix package...
and I have been the author of
    cbind2(), rbind2(), and of course, of  cBind() and rBind().

At the time when I introduced these, the above possibility of
writing S4 methods for  '...'  was not yet part of R.

    > This has not been a huge problem for us in
    > practice. For example, we have a DataFrame object that
    > mimics data.frame. To cbind a data.frame with a DataFrame,
    > the user can just call the DataFrame()
    > constructor. rbind() between different data structures is
    > much less common.

well... yes and no.  Think of using the Matrix package, maybe
with another package that defines another generalized matrix class...
It would be nice if things worked automatically / perfectly there.

    > The cBind and rBind functions in Matrix (and the r/cbind
    > that get installed by bind_activation, the code is shared)
    > work by recursing, dropping the first argument until two
    > are left, and then combining with r/cbind2(). The Biobase
    > package uses a similar strategy to mimic c() via its
    > non-standard combine() generic. The nice thing about the
    > combine() approach is the user entry point and the generic
    > are the same, instead of having methods on rbind2() and
    > the user calling rBind().

    > I would argue that bind_activation(TRUE) should be
    > discouraged, 

Yes, you are right, Michael; it should be discouraged, at least
when run from within a *package*.
One could think of its use by an explicit user call.

    > because it replaces the native rbind and
    > cbind with recursive variants that are going to cause
    > problems, performance and otherwise. This is why it is
    > hidden. Perhaps a reasonable compromise would be for the
    > native cbind and rbind to check whether any arguments are
    > S4 and if so, resort to recursion. Recursion does seem to
    > be a clean way to implement "type promotion", i.e., to
    > answer the question "which type should the result be when
    > faced with mixed-type args?".

Exactly.  That was my idea at the time.
((Yes, I'm also the author of the  bind_activation()
  "(mis)functionality".))

    > Hopefully others have better ideas.

that would be great.

And even if not, it would be great if we could implement your
idea
    > Perhaps a reasonable compromise would be for the
    > native cbind and rbind to check whether any arguments are
    > S4 and if so, resort to recursion.

without a noticeable performance penalty in the case of no S4
arguments.

Martin

