
Very slow using S4 classes

3 messages · Martin Morgan, André Rossi

Dear Martin Morgan and Martin Maechler...

Here is an example of the computational time when a slot of an S4 class holds
objects of another S4 class, compared with when it is just a plain object. I'm
sending you the data file.

Thank you!

Best regards,

André Rossi

############################################################

setClass("SupervisedExample",
    representation(
        attr.value = "ANY",
        target.value = "ANY"
))

setClass("StreamBuffer",
    representation=representation(
        examples = "list", # a list of SupervisedExample objects
        max.length = "integer"
    ),
    prototype=list(
            max.length = as.integer(10000)
    )
)

b <- new("StreamBuffer")

load("~/Dropbox/dataList2.RData")

b@examples <- data # data is a list of SupervisedExample objects.
user  system elapsed
 16.837   0.108  18.244
user  system elapsed
  0.024   0.000   0.026

############################################################
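For readers without the data file, the situation can be reproduced with a stand-in list (a sketch: the 10,000-element list of empty SupervisedExample objects replaces the contents of dataList2.RData, and exact timings will vary by machine):

```r
library(methods)

setClass("SupervisedExample",
         representation(attr.value = "ANY", target.value = "ANY"))

setClass("StreamBuffer",
         representation = representation(examples = "list",
                                         max.length = "integer"),
         prototype = list(max.length = 10000L))

b <- new("StreamBuffer")

## stand-in for the list loaded from dataList2.RData
data <- replicate(10000, new("SupervisedExample"))

## assigning into an S4 slot: slow, the contained objects get copied
slot.time <- system.time(b@examples <- data)

## the same assignment into a plain list element: fast
plain <- list()
plain.time <- system.time(plain$examples <- data)
```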


2011/9/10 Martin Morgan <mtmorgan@fhcrc.org>
Hi André...
On 09/12/2011 07:20 AM, André Rossi wrote:
For a reproducible example, I guess you have something like

   data <- replicate(10000, new("SupervisedExample"))
Yes, this is slow. [[<-,S4 is not as clever as [[<-,list and performs 
extra duplication, copying those 10,000 S4 objects the list contains.
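The extra copying can be observed directly with tracemem(), which reports every time R duplicates a traced object. A minimal sketch with a hypothetical class Ex (not one of the classes from this thread):

```r
library(methods)

setClass("Ex", representation(x = "list"))
e <- new("Ex", x = vector("list", 5))

tracemem(e)      # report whenever 'e' is duplicated
e@x[[1]] <- 2    # the slot replacement duplicates 'e' and its list
untracemem(e)
```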

As before, an improvement is to think in terms of vectors, maybe a 
'SupervisedExamples' class to act as a collection of examples

setClass("SupervisedExamples",
          representation=representation(
            attr.value = "list",
            target.value = "list"))

setClass("StreamBuffer",
          representation=representation(
            examples="SupervisedExamples"))

SupervisedExamples <-
     function(attr.value=vector("list", n),
              target.value=vector("list", n), n, ...)
{
     new("SupervisedExamples", attr.value=attr.value,
         target.value=target.value, ...)
}

StreamBuffer <-
     function(examples, ...)
{
     new("StreamBuffer", examples=examples, ...)
}

data <- SupervisedExamples(n=100000)

b <- StreamBuffer(data)

I then have

 > system.time({for (i in 1:100) data@attr.value[[1]] = 2 })
    user  system elapsed
   1.081   0.013   1.094
 > system.time({for (i in 1:100) b@examples@attr.value[[1]] <- 2})
    user  system elapsed
   4.283   0.000   4.295

(note the 10x increase in collection size); still slower, but the cost 
is amortized once the updates are vectorized, e.g.,

 > idx = sample(length(b@examples@attr.value), 100)
 > system.time(b@examples@attr.value[idx] <- list(2))
    user  system elapsed
   0.013   0.000   0.014

A further change might be to recognize 'StreamBuffer' as an abstract 
class that SupervisedExamples extends

setClass("StreamBuffer",
          representation=representation(
            "VIRTUAL", max.len="integer"),
          prototype=prototype(max.len=100000L),
          validity=function(object) {
              if (object@max.len < length(object))
                  "too many elements"
              else TRUE
          })

setMethod(length, "StreamBuffer", function(x) {
     stop("'length' undefined on '", class(x), "'")
})

setClass("SupervisedExamples",
          representation=representation(
            attr.value = "list",
            target.value = "list"),
          contains="StreamBuffer")

setMethod(length, "SupervisedExamples", function(x) {
     length(x@attr.value)
})
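With those definitions, the validity function and the length method cooperate: length() dispatches to the SupervisedExamples method, and new() rejects a collection larger than max.len. A sketch (the class definitions are repeated, slightly abridged, so the block runs standalone; the max.len values are arbitrary):

```r
library(methods)

setClass("StreamBuffer",
         representation = representation("VIRTUAL", max.len = "integer"),
         prototype = prototype(max.len = 100000L),
         validity = function(object) {
             if (object@max.len < length(object)) "too many elements"
             else TRUE
         })

setClass("SupervisedExamples",
         representation = representation(attr.value = "list",
                                         target.value = "list"),
         contains = "StreamBuffer")

setMethod(length, "SupervisedExamples", function(x) length(x@attr.value))

## valid: 5 elements, default max.len of 100000
ok <- new("SupervisedExamples",
          attr.value = vector("list", 5),
          target.value = vector("list", 5))
length(ok)   # 5, via the SupervisedExamples method

## invalid: a buffer smaller than its contents fails validation in new()
bad <- tryCatch(new("SupervisedExamples",
                    attr.value = vector("list", 5),
                    target.value = vector("list", 5),
                    max.len = 3L),
                error = conditionMessage)
```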

SupervisedExamples <-
     function(attr.value=vector("list", n),
              target.value=vector("list", n), n, ...)
{
     new("SupervisedExamples", attr.value=attr.value,
         target.value=target.value, ...)
}

data <- SupervisedExamples(n=100000)

 > system.time({for (i in 1:100) data@attr.value[[1]] = 2 })
    user  system elapsed
   1.043   0.014   1.061

Martin Morgan