
efficiency and memory use of S4 data objects

Thanks for your thoughtful and considered response, as always. I think I 
need to make my position a little more clear.

I develop software in R that I am, on the whole, happy with. My code seems 
to be correct, fast, reliable, and useful both to me and to other people. 
But it is mostly either S3 or not OOP at all. I am under 
lots of pressure from people I respect and would like to cooperate with to 
convert my code to S4. I am not entirely happy about this because I believe 
that converting to S4 will substantially reduce the size of data set that 
my code can handle and will substantially increase overall execution times. 
(The example in my previous post was not sufficient in itself to prove 
this, but more about that below.) There are other issues such as how to 
document S4 methods and how to pass R CMD check, but I would like to focus 
on the efficiency issue here.
At 01:05 AM 22/08/2003, John Chambers wrote:
I am sorry for giving the impression that I wanted to focus on the numeric 
computations. Of course it is the efficiency of the S4 classes and methods 
themselves that I am interested in. I deliberately chose an example which 
added a layer of functions and method dispatch, because that is what 
converting code to S4 does.

Here is another example with no user-defined methods:

 > system.time( structure(list(x=rep(1,10^7)),class="MyS3Class"))
[1] 1.05 0.00 1.05   NA   NA
 > system.time( new("MyClass",x=rep(1,10^7)))
[1]  3.15  0.34 11.19    NA    NA

This seems to me to show that simply associating a formal S4 data class 
identity with a new object (no computation involved!) can increase the elapsed 
time required to create the object roughly 11-fold compared with the S3 equivalent. 11 
seconds is a lot of time if you have a call to "new" at the end of every 
function in a large package, and some of these functions are called a very 
large number of times.
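For anyone wanting to reproduce the comparison above, here is a minimal self-contained sketch. The definition of MyClass is not shown in this post, so the single numeric slot x is an assumption. Note that class= has to be passed to structure(), not to list(), for the S3 object to actually carry the class attribute. Timings vary with machine and R version, so none are shown.

```r
library(methods)  # attached by default in interactive R; explicit for Rscript

# Assumed class definition: the post does not show it, so a single
# numeric slot "x" is a guess at what MyClass contained.
setClass("MyClass", representation(x = "numeric"))

n <- 10^7

# S3: structure() just sets the "class" attribute on a plain list.
s3 <- structure(list(x = rep(1, n)), class = "MyS3Class")

# S4: new() calls the initialize() method, which checks each supplied
# slot against its declared class.
s4 <- new("MyClass", x = rep(1, n))

# Time each constructor separately; results are machine-dependent.
print(system.time(structure(list(x = rep(1, n)), class = "MyS3Class")))
print(system.time(new("MyClass", x = rep(1, n))))
```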
You seem to be suggesting that the effect might be an artifact of my 
particular artificial example, but all examples seem to point in the 
same direction, i.e., that introducing S4 methods into code will slow it 
down. I can't construct any examples of S4 usage which are not at least 
slightly slower than the S3 equivalent. Can anyone else? I am not 
suggesting that I have identified the root cause of any bottlenecks.
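One rough way to isolate the cost of method dispatch alone is to call a trivial S3 method and an equivalent S4 method many times on a small object, so that the arithmetic inside the method is negligible. This is only a sketch: the generic names totalS3 and totalS4, the loop count, and the one-slot MyClass are all hypothetical, chosen for illustration.

```r
library(methods)

# Hypothetical one-slot S4 class standing in for MyClass.
setClass("MyClass", representation(x = "numeric"))

# S3 generic and method: dispatch via UseMethod() on the class attribute.
totalS3 <- function(obj) UseMethod("totalS3")
totalS3.MyS3Class <- function(obj) sum(obj$x)

# S4 generic and method: dispatch via standardGeneric() on the formal class.
setGeneric("totalS4", function(obj) standardGeneric("totalS4"))
setMethod("totalS4", "MyClass", function(obj) sum(obj@x))

# Small objects, so each call is dominated by dispatch overhead.
s3 <- structure(list(x = c(1, 2, 3)), class = "MyS3Class")
s4 <- new("MyClass", x = c(1, 2, 3))

n_calls <- 10^5
print(system.time(for (i in seq_len(n_calls)) totalS3(s3)))
print(system.time(for (i in seq_len(n_calls)) totalS4(s4)))
```

Dividing each elapsed time by n_calls gives a per-call overhead estimate. It says nothing about whether that overhead matters relative to the real work a method does.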
One can see plenty of realistic examples of S4 usage by trying the 
Bioconductor packages. But large realistic examples don't lend themselves 
easily to a post to r-devel.
I am already doing all computation-intensive operations in C or Fortran 
through appropriate use of R functions which themselves call C. I can't see 
how use of C can side-step the need to create S4 objects or to use S4 
methods in a package based on S4.

Regards
Gordon