
Memory management issues

7 messages · Duncan Murdoch, Whit Armstrong, Yuri D'Elia +1 more

#
Hi everybody,

I have been interfacing some C++ library code into an R package but
ran into optimization issues specific to memory management that require
some insight into the GC.

One of the C++ libraries returns simple vectors of integers, doubles and
complex numbers, which are allocated and managed by the library itself. I
cannot know the length of the array beforehand, so I cannot
pre-allocate that memory through the GC.

Right now I'm allocating via allocVector and copying all the data into it.
However, this requires twice the amount of space (and time), and we're
running out of memory when doing concurrent analysis.

What I would like to do is:

- "patch" the SEXP returned to R so that DATAPTR() points directly to
  the required address. 

- create a normal LISTSXP in the package, which holds a reference
  to all these objects, so that GC never takes place.

- turn these objects read-only, or, at least, ensure that they are
  never free()d or realloc()ed. Overwriting the contents is not a
  critical issue.

Would that approach work?
Are there any alternative approaches?
Any specific advice about turning these objects read-only?

Thanks in advance.
#
On 05/07/2009 10:54 AM, Yuri D'Elia wrote:
The normal way to do what you want is to use an "external pointer".  R 
assumes that memory management for those is handled completely 
externally.  External pointers can have finalizers, so when you no 
longer have a need for the object, you can ask the external library to 
release it.
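
The ownership idea behind an external pointer with a finalizer can be sketched in plain C++, without touching the R API. This is only an analogy: `lib_acquire`, `lib_release`, `LibHandle` and `wrap` are hypothetical names, not part of R or of any real library. In R's C API the corresponding pieces would be `R_MakeExternalPtr` to create the handle and `R_RegisterCFinalizerEx` to attach the release callback.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-ins for the external library: it allocates the
// buffer and is the only party allowed to release it.
static int g_live_buffers = 0;

double* lib_acquire(std::size_t n) {
    ++g_live_buffers;
    return new double[n]();   // zero-initialised
}

void lib_release(double* p) {
    assert(g_live_buffers > 0);
    --g_live_buffers;
    delete[] p;
}

// The "finalizer": runs once, when the handle is destroyed, and hands
// the address back to the library instead of freeing it directly.
struct LibReleaser {
    void operator()(double* p) const { lib_release(p); }
};

// The handle stores the library's address as-is. No data is copied.
using LibHandle = std::unique_ptr<double[], LibReleaser>;

LibHandle wrap(double* p) { return LibHandle(p); }

int live_buffers() { return g_live_buffers; }
```

The handle never copies the payload, so the memory cost stays at one buffer. The price is that every access goes through the handle rather than through an ordinary vector.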
The list would hold the external pointers, which act like references.
That won't happen.

I wouldn't try to trick the memory manager into thinking that it 
allocated these things; that will likely just lead to problems.

Duncan Murdoch
#
If you are in control of the C++ library (i.e. it is not from a
vendor), then you can also override the new operator of your object so
that it allocates an SEXP.  If you implement PROTECT/UNPROTECT calls
correctly, then GC will not be a problem.

The approach that I've taken with my time series library is that you
specify a storage policy as a template parameter.  If you are using
regular C++, then vectors of double/int are just allocated normally in
C++; however, if you specify the R storage backend, then the
constructor allocates an SEXP of doubles and sets the object's pointer
to the first element in the vector.  The object doesn't really know
that it's using R's backend storage.
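
A minimal sketch of the storage-policy idea follows. It is not the tslib code: `HeapStorage`, `HostStorage` and `Series` are hypothetical names, and `HostStorage` only simulates a host-managed block (a fake header followed by the payload) so that the example compiles without R headers. In a real R backend, that constructor would presumably allocate and protect a numeric vector and point at its first element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Policy 1: ordinary C++ heap storage.
struct HeapStorage {
    explicit HeapStorage(std::size_t n) : buf_(n) {}
    double* data() { return buf_.data(); }
    std::size_t size() const { return buf_.size(); }
private:
    std::vector<double> buf_;
};

// Policy 2: stand-in for a host-managed backend. The payload lives
// behind a fake header, and data() returns the first payload element.
constexpr std::size_t kFakeHeaderSlots = 2;

struct HostStorage {
    explicit HostStorage(std::size_t n)
        : n_(n), block_(kFakeHeaderSlots + n) {}
    double* data() { return block_.data() + kFakeHeaderSlots; }
    std::size_t size() const { return n_; }
private:
    std::size_t n_;
    std::vector<double> block_;
};

// The container only sees a pointer and a length, so it does not know
// which backend provided the memory.
template <typename Storage>
class Series {
public:
    explicit Series(std::size_t n) : storage_(n), p_(storage_.data()) {}
    Series(const Series&) = delete;            // p_ would dangle on copy
    Series& operator=(const Series&) = delete;

    double& operator[](std::size_t i) {
        assert(i < storage_.size());
        return p_[i];
    }
    std::size_t size() const { return storage_.size(); }
    double sum() const {
        double s = 0.0;
        for (std::size_t i = 0; i < storage_.size(); ++i) s += p_[i];
        return s;
    }
private:
    Storage storage_;
    double* p_;
};
```

Switching backends is then a one-token change at the point of use, e.g. `Series<HeapStorage>` versus `Series<HostStorage>`, while the rest of the code stays the same.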

Sources here:
R backend storage policy: http://github.com/armstrtw/r.tslib.backend/tree/master
tslib: http://github.com/armstrtw/tslib/tree/master

-Whit
On Sun, Jul 5, 2009 at 10:54 AM, Yuri D'Elia<wavexx at users.sf.net> wrote:
#
In article <4A5102FF.8040303 at stats.uwo.ca>,
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

            
I don't think external pointers can be read from R sources like normal 
vectors, or am I wrong?

Using external pointers would imply either proxy calls for every action 
you need to perform or deep copies (like I'm doing now).
I'm not afraid of patching R sources somehow, if that's the only 
solution, but it's one that I would like to avoid.
#
In article 
<8ec76080907051259q4744d40bp46b2434b086d5adc at mail.gmail.com>,
Whit Armstrong <armstrong.whit at gmail.com> wrote:

            
The library returns addresses which may not be from the top of the 
allocated object. For some pre-calculated values, I actually use a 
shared memory pool.

I really need to be able to 'wire' this address directly into the SEXP.

Like I wrote in another message, changing the GC code (if necessary) 
could be a viable option.
I really like this approach; it would turn out to be useful for many
other projects :).

Thanks
#
On 05/07/2009 6:05 PM, Yuri D'Elia wrote:
No, you're right.
Yes.

Duncan Murdoch
#
On Jul 5, 2009, at 10:54 AM, Yuri D'Elia wrote:

            
Why don't you just "patch" the library to use allocVector? That's most  
reliable and trivial to do. Messing around with internal SEXP  
representation is asking for trouble as that may change at any point  
without notice (note that all access is through functions to avoid  
that).

Cheers,
Simon