
any updates w.r.t. lapply, sapply, apply retaining classes

7 messages · Mike Williamson, Joshua Wiley, Richard M. Heiberger +1 more

#
Hi Mike,

This isn't really an answer to your question, but perhaps will serve
to continue discussion.  I think that there are some fundamental
issues when working with special classes.  As a thought example, suppose I
wrote a class, "posreal", which inherits from the numeric class.  It
is only valid for positive, real numbers.  I use it in a package, but
do not develop methods for it.  A user comes along and creates a
vector x that is a posreal, then tries: mean(x * -3).  Since I never
bothered to write a special method for mean for my class, R falls back
to the inherited numeric, but gives a value that is clearly not valid
for posreal.  What should happen?  S3 methods do not really have
validation, so in principle, one could write a function like:

f <- function(x) {
  vclass <- class(x)
  res <- mean(x)
  class(res) <- vclass
  return(res)
}

which "retains" the appropriate class, but in name only.  R core
cannot possibly know or imagine all classes that may be written that
inherit from more basic types but with possible special aspects and
requirements.  I think the inherited class is considered to be more
generic, and that is what is returned.  It is usually up to the user
to ensure that the function (whose methods were written for the
inherited class, not the special class) is valid for that class, and
then to manually convert the result back:

res <- as.posreal(res)
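
A minimal sketch of the thought example above.  It assumes a
hypothetical validating constructor named as.posreal; nothing like it
exists in base R, and the validation rule is only a guess at what
"positive, real" would mean in practice.

```r
# Hypothetical constructor for the "posreal" thought example:
# strips attributes, validates, then attaches the class.
as.posreal <- function(x) {
  x <- as.numeric(x)
  if (anyNA(x) || any(x <= 0)) {
    stop("posreal values must be positive real numbers")
  }
  structure(x, class = c("posreal", "numeric"))
}

x <- as.posreal(c(1, 2, 3))

# No mean() method exists for posreal, so the default method runs
# and returns a bare numeric.
res <- mean(x * -3)
class(res)                 # "numeric"

# Relabelling by hand keeps the class name but not the guarantee.
bad <- res
class(bad) <- class(x)
inherits(bad, "posreal")   # TRUE, although the value is negative

# Converting through the validator reports the problem instead.
ok <- tryCatch(as.posreal(res), error = function(e) conditionMessage(e))
```

Routing the conversion through a validator is what makes the final
as.posreal(res) step meaningful rather than a relabelling.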

What about lapply and sapply?  Neither is generic or has methods for
difftime, and so they do some unexpected/undesirable things.  Again,
without methods defined for a particular class, they cannot know what
is special about it or what the appropriate way to handle it is.  They
use defaults, which sometimes work but may give unexpected or
undesirable results, but what else can be done?  (Okay, they could
just throw an error.)  If a function is naive about a class, it does
not seem right to operate on it using unknown methods and then pretend
to be returning the same type of data.  As it stands, they convert to
a data type they know and return that.
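
As a small illustration of that behaviour, here is a sketch using a
made-up difftime vector (the data and variable names are assumptions,
not taken from the original question):

```r
# Illustrative data; any difftime vector would do.
d <- as.difftime(c(30, 60, 90), units = "mins")
class(d)                          # "difftime"

# sapply() simplifies its result to a bare vector, so the class
# and the units attribute are lost.
out <- sapply(d, function(v) v * 2)
class(out)                        # "numeric"

# The caller has to put the class back, supplying the units explicitly.
restored <- as.difftime(out, units = units(d))
class(restored)                   # "difftime"
```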

Now, you mention that for loops are slow in R, and this is true to a
degree.  However, the *apply functions are basically just internal
loops, so they do not really save you (they are certainly not
vectorized!), though they are more elegant than explicit loops IMO.
One way to use them while retaining class would be like:

sapply(seq_along(test), function(i) class(test[i]))

This is less efficient than sapply(test, class), but the relative
overhead drops considerably once the function does nontrivial calculations.
Finally, I find the (relatively) new compiler package really shines at
making functions that are just wrappers for for loops more efficient.
Take a look at the examples from:

require(compiler)
?cmpfun
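
The two suggestions above can be sketched together.  The difftime
vector d and the helper loop_classes are illustrative assumptions
standing in for the `test` object from the original question.  Note
that since R 3.4.0 functions are byte-compiled automatically by the
JIT, so an explicit cmpfun() call may make little difference on
current versions.

```r
library(compiler)

d <- as.difftime(c(30, 60, 90), units = "mins")   # illustrative data

# Indexing by position keeps the class, because `[` has a difftime method.
by_index <- sapply(seq_along(d), function(i) class(d[i]))

# A plain for-loop wrapper (hypothetical helper) doing the same work.
loop_classes <- function(x) {
  out <- character(length(x))
  for (i in seq_along(x)) out[i] <- class(x[i])
  out
}

# Byte-compile the wrapper explicitly.
loop_classes_cmp <- cmpfun(loop_classes)

identical(by_index, loop_classes_cmp(d))
```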

I am not familiar with NumPy so I do not know how it handles new
classes, but with some tweaks to my workflow, I do not find myself
running into problems with how R handles them.  I definitely
appreciate your position because I have been there...as I have become
more familiar with R, classes, and methods, I find I work in a way
that avoids passing objects to functions that do not know how to
handle them properly.

Cheers,

Josh
On Thu, Nov 3, 2011 at 11:08 AM, Mike Williamson <this.is.mvw at gmail.com> wrote:

  
    
#
Two comments:

* sapply is generally only _slightly_ faster than a for loop

* it's almost always better to use vapply instead of sapply.

But I agree that simplify2array should be a generic so that you can
write custom methods to support new classes.
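
A short sketch of why vapply is the safer choice (the inputs are made
up for illustration): the third argument declares the type and length
of each result, so the shape of the output does not depend on the data.

```r
x <- list(a = 1:3, b = 4:6)

# vapply() checks every result against the template numeric(1).
means <- vapply(x, mean, numeric(1))

# On empty input sapply() falls back to a list,
# while vapply() keeps the declared type.
s_empty <- sapply(list(), length)              # list()
v_empty <- vapply(list(), length, integer(1))  # integer(0)

# A result of the wrong type is an error rather than a silent
# change in the shape of the output.
mismatch <- tryCatch(vapply(x, function(v) "oops", numeric(1)),
                     error = function(e) "error")
```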

Hadley
#
I don't see why that command should be a problem because class()
returns a string.

A better example might be sapply(x, identity) which in general you
would hope to be identical to x:

x <- structure(1:10, class = "blah")
identical(x, sapply(x, identity))
# [1] FALSE

Hadley
#
Hi Mike,

I definitely understand your point.  I don't have any particularly
good ideas, though I think you might like S4, which is the newer
formal class/methods system.
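
For instance, the earlier "posreal" thought example could be sketched
in S4 roughly as follows.  The class name PosReal and its validity
rule are assumptions made for illustration only.

```r
library(methods)

# Hypothetical S4 class extending numeric, with a validity check
# that runs whenever an object is created with new().
setClass("PosReal",
         contains = "numeric",
         validity = function(object) {
           vals <- object@.Data
           if (anyNA(vals) || any(vals <= 0)) {
             "values must be positive real numbers"
           } else {
             TRUE
           }
         })

x <- new("PosReal", c(1, 2, 3))      # passes validation

# Invalid data is rejected at construction time.
res <- tryCatch(new("PosReal", c(1, -2)),
                error = function(e) "rejected")
```

Validity is checked by new() and by an explicit validObject() call,
not after every operation, so results of arithmetic on such objects
would still need to be re-validated by hand.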

As a note, I misspoke (or miswrote) when I said that difftime inherits
from numeric: its mode is numeric, but the class does not inherit from
numeric.

Cheers,

Josh
On Thu, Nov 3, 2011 at 4:49 PM, Mike Williamson <this.is.mvw at gmail.com> wrote: