
Size of a refClass instance

5 messages · Martin Morgan, Jeff Newmiller, David Kulp

#
I'm using refClass for a complex multi-directional tree structure with possibly 100,000s of nodes.  The refClass design is very impressive and I'd love to use it, but I've found that refClass instances are very large and slow to create.  For example, below are a RefClass and a normal S4 class.  The RefClass requires about 4KB per instance vs 500B for the S4 class -- based on summing the used Ncells and Vcells memory reported by gc().  And instantiation is more than twice as slow for a RefClass.  (R 2.14.2)

Anyone have thoughts on this and whether there's any hope for improving resources on either front?  

I wonder what others are doing.  I've been thinking about lightweight alternative implementations, but nothing particularly elegant has come to mind, yet!

Thanks!


simple <- 
  setRefClass('simple', 
              fields = list(a = "character", b="numeric")
)
gc()
system.time(simple.list <- lapply(1:100000, function(i) { simple$new(a='foo',b=i) }))
gc()

setClass('simple2', representation(a="character",b="numeric"))
setMethod("initialize", "simple2",
          function(.Object, a, b) {
            .Object@a <- a
            .Object@b <- b
            .Object
          })

gc()
system.time(simple2.list <- lapply(1:100000, function(i) { new('simple2',a='foo',b=i) }))
gc()
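
The gc() bookkeeping described above can be wrapped in a small helper. This is only a rough sketch: used_mb() and per_instance_bytes() are names made up here, n is kept small so it runs quickly, and the "(Mb)" column of gc() is rounded to 0.1 Mb, so the result is an estimate. The S4 class relies on the default initialize method rather than the custom one above.

```r
library(methods)

# Sum of the "used" Mb column of gc() (Ncells + Vcells), after a collection.
used_mb <- function() sum(gc()[, 2])

# Hypothetical helper: build n objects with make(i), keep them alive in a
# list, and divide the growth in used memory by n.
per_instance_bytes <- function(make, n = 10000) {
  before <- used_mb()
  objs <- lapply(seq_len(n), make)
  after <- used_mb()          # objs is still referenced here
  (after - before) * 2^20 / n
}

simple <- setRefClass("simple", fields = list(a = "character", b = "numeric"))
setClass("simple2", representation(a = "character", b = "numeric"))

ref_bytes <- per_instance_bytes(function(i) simple$new(a = "foo", b = i))
s4_bytes  <- per_instance_bytes(function(i) new("simple2", a = "foo", b = i))
```

The absolute numbers will differ between R versions and platforms, so compare ref_bytes against s4_bytes rather than against the 4KB and 500B figures above.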
1 day later
#
On 05/01/2013 11:20 AM, David Kulp wrote:
Hi David -- not necessarily helpful, but creating a few large objects is always 
better than creating many small ones in R, so perhaps re-conceptualize your data 
structure? As a rough analogy, instead of constructing a graph as a large number 
of 'Node' instances each pointing to one another, a graph could be represented 
as a data.frame containing columns of 'from' and 'to' indexes (neighbour-edge 
list, a few large objects) or as an adjacency matrix. One would also implement 
creation and update of the few large objects in an R-friendly (vectorized) way.
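
A minimal sketch of the "few large objects" idea, assuming a made-up 7-node tree: the edges data.frame, the node count n and the derived vectors are all illustrative names, not part of any package.

```r
# Hypothetical 7-node tree stored as an edge list: two parallel columns.
edges <- data.frame(from = c(1L, 1L, 2L, 2L, 3L, 3L),
                    to   = c(2L, 3L, 4L, 5L, 6L, 7L))
n <- 7L

# parent[i] is the parent of node i (NA for the root), filled in one
# vectorized assignment instead of one object per node.
parent <- rep(NA_integer_, n)
parent[edges$to] <- edges$from

# Children of every node at once, as a list indexed by node id.
children <- split(edges$to, factor(edges$from, levels = seq_len(n)))

# Depth of every node: move all nodes up one level per iteration.
depth <- integer(n)
cur <- parent
while (any(!is.na(cur))) {
  depth <- depth + !is.na(cur)
  cur <- parent[cur]
}
```

The loop runs once per tree level rather than once per node, which is the kind of vectorized update meant above.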

Perhaps there are existing packages that already model the data you're 
interested in? If your multi-directional tree can be represented as a graph, 
then perhaps

   http://bioconductor.org/packages/release/bioc/html/graph.html

including facilities in the Boost graph library (RBGL, on the Bioconductor web 
site, too) or the igraph package can be put to use.

Martin

  
    
#
Good tip.  Thanks Morgan.
I agree that a different structure might (necessarily) be in order.  I wanted to create a tree where the nodes were instances of different derived sub-classes -- possibly holding more data and behaving polymorphically.  OO programming seemed ideal for this: lots of small things with specialized behavior -- but this isn't R's strength.
On May 2, 2013, at 4:57 PM, Martin Morgan wrote:

            
#
Interesting conclusion. Alternatively, that representation of your object model may not be computationally effective. This discrepancy may be less exaggerated in C++, but you may still find that large numbers of objects are less efficient in their use of memory or cpu time than vector processing even there. I would read the point of Martin's response as "Don't confuse your mental model of the solution with its implementation".
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
David Kulp <dkulp at fiksu.com> wrote:

            
1 day later
#
Yes, I agree.  How does one conceptually achieve polymorphic behavior without instantiating 100,000s of instances?  Perhaps one way around this is to represent the data in an efficient R way -- i.e. a data.frame -- and create a set of re-usable singleton instances of different node types.  To perform some polymorphic operation on a node, a singleton gets assigned to a node in the tree.  But behavior such as node$parent() or node$child(1) will require a small pool of these singletons.  Doable, I think.
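
A rough sketch of that idea, under simplifying assumptions: the nodes table, the Handler and LeafHandler classes and describe_node() are all hypothetical names, and the singleton is looked up from a type column rather than assigned to a node, so the node$parent() style of navigation is not covered.

```r
library(methods)

# Hypothetical node table: the whole tree is a few vectors, one row per node.
nodes <- data.frame(
  parent = c(NA, 1L, 1L, 2L),
  type   = c("branch", "branch", "leaf", "leaf"),
  value  = c(0, 0, 2.5, 4),
  stringsAsFactors = FALSE
)

# One singleton per node type; it holds behaviour only, no per-node data.
Handler <- setRefClass("Handler",
  methods = list(
    describe  = function(tree, id) paste("branch", id),
    parent_of = function(tree, id) tree$parent[id]
  ))

# The subclass overrides describe() and inherits parent_of().
LeafHandler <- setRefClass("LeafHandler", contains = "Handler",
  methods = list(
    describe = function(tree, id) paste("leaf", id, "value", tree$value[id])
  ))

# Two instances in total, however many rows the table has.
handlers <- list(branch = Handler$new(), leaf = LeafHandler$new())

# Polymorphic call: pick the singleton from the node's type, pass the row id.
describe_node <- function(tree, id) handlers[[tree$type[id]]]$describe(tree, id)
```

For example, describe_node(nodes, 3L) goes through the leaf singleton and describe_node(nodes, 1L) through the branch one, while the data itself stays in the data.frame.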

PS. FWIW, I found another strike against the "massive tree of refClass instances": save().  save() appears to unnecessarily expand/duplicate refClass structures.  Write time becomes prohibitive, and loading the data structure back in results in far greater memory usage.
On May 3, 2013, at 9:47 AM, Jeff Newmiller wrote: