Hi everybody,
I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC.
One of the C++ libraries returns simple vectors of integers, doubles and complex values which are allocated and managed by the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that memory through the GC. Right now I'm allocating via allocVector and copying all the data into it. However, this requires twice the amount of space (and time), and we're running out of memory when doing concurrent analyses.
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
- create a normal LISTSXP in the package which holds a reference to all these objects, so that they are never garbage collected.
- turn these objects read-only, or at least ensure that they are never free()d or realloc()ed. Overwriting the contents is not a critical issue.
Would that approach work? Are there any alternative approaches? Any specific advice about turning these objects read-only?
Thanks in advance.
Memory management issues
7 messages · Duncan Murdoch, Whit Armstrong, Yuri D'Elia +1 more
On 05/07/2009 10:54 AM, Yuri D'Elia wrote:
Hi everybody,
I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC.
One of the C++ libraries returns simple vectors of integers, doubles and complex values which are allocated and managed by the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that memory through the GC. Right now I'm allocating via allocVector and copying all the data into it. However, this requires twice the amount of space (and time), and we're running out of memory when doing concurrent analyses.
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
The normal way to do what you want is to use an "external pointer". R assumes that memory management for those is handled completely externally. External pointers can have finalizers, so when you no longer have a need for the object, you can ask the external library to release it.
- create a normal LISTSXP in the package which holds a reference to all these objects, so that they are never garbage collected.
The list would hold the external pointers, which act like references.
- turn these objects read-only, or at least ensure that they are never free()d or realloc()ed. Overwriting the contents is not a critical issue.
That won't happen. I wouldn't try to trick the memory manager into thinking that it allocated these things; that will likely just lead to problems. Duncan Murdoch
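In an R package, Duncan's suggestion is written in C against R's API: R_MakeExternalPtr() wraps the library's address, R_RegisterCFinalizerEx() attaches a finalizer, and R_ExternalPtrAddr() reads the address back. Because that code needs R's headers, the sketch below instead models the same ownership rules in plain C++, under that stated simplification. The wrapper never frees the address itself; it only runs a finalizer that hands the memory back to the library, and reading the data takes a native call per access. LibBuffer, lib_compute(), lib_release(), ExternalPtr, element_at() and g_releases are hypothetical names, not part of R or of any real library.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a result vector allocated and owned by the C++ library.
struct LibBuffer {
    double* data;
    std::size_t length;
};

// Hypothetical library call: the library, not the caller, decides the length.
LibBuffer* lib_compute(std::size_t n) {
    LibBuffer* buf = new LibBuffer{new double[n], n};
    for (std::size_t i = 0; i < n; ++i) buf->data[i] = 0.5 * static_cast<double>(i);
    return buf;
}

// Counts finalizer runs so the behavior of this sketch can be observed.
int g_releases = 0;

// Hypothetical library call that gives the memory back to the library.
void lib_release(LibBuffer* buf) {
    delete[] buf->data;
    delete buf;
    ++g_releases;
}

// Model of an external pointer: an opaque address plus a finalizer.
// The wrapper never frees the address itself; when it dies (in R: when the
// GC collects it) it only runs the finalizer, which asks the library to release it.
class ExternalPtr {
public:
    using Finalizer = void (*)(LibBuffer*);

    ExternalPtr(LibBuffer* addr, Finalizer fin) : addr_(addr), fin_(fin) {}
    ExternalPtr(const ExternalPtr&) = delete;             // one owner per address
    ExternalPtr& operator=(const ExternalPtr&) = delete;
    ~ExternalPtr() { release(); }

    LibBuffer* addr() const { return addr_; }

    // Run the finalizer now and null the address, so it can never run twice.
    void release() {
        if (addr_ != nullptr && fin_ != nullptr) fin_(addr_);
        addr_ = nullptr;
    }

private:
    LibBuffer* addr_;
    Finalizer fin_;
};

// A "proxy call": each element access goes through native code,
// because the data is not an ordinary R vector.
double element_at(const ExternalPtr& p, std::size_t i) {
    const LibBuffer* b = p.addr();
    assert(b != nullptr && i < b->length);
    return b->data[i];
}
```

Holding such handles in a list (as Duncan suggests) only keeps them reachable; the memory is still released through the finalizer, never by the collector itself.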
Would that approach work? Are there any alternative approaches? Any specific advice about turning these objects read-only? Thanks in advance.
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
If you are in control of the C++ library (i.e. it is not from a vendor), then you can also override the new operator of your object so that it allocates an SEXP. If you implement the PROTECT/UNPROTECT calls correctly, GC will not be a problem. The approach that I've taken with my time series library is to specify a storage policy as a template parameter. If you are using regular C++, vectors of double/int are just allocated normally in C++; however, if you specify the R storage backend, the constructor allocates an SEXP of doubles and sets the object's pointer to the first element of the vector. The object doesn't really know that it's using R's backend storage.
Sources here:
R backend storage policy: http://github.com/armstrtw/r.tslib.backend/tree/master
tslib: http://github.com/armstrtw/tslib/tree/master
-Whit
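A minimal sketch of the storage-policy idea Whit describes, written from his description rather than taken from the linked repositories: the container is parameterized on where its memory comes from, and its own code never changes. HeapStorage, HostStorage, Series and g_host_bytes are hypothetical names. HostStorage merely stands in for an R backend; in a package its constructor would allocate an R vector (for example with allocVector()) and keep a pointer to its first element, which this sketch does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Policy 1: ordinary C++ heap storage.
template <typename T>
class HeapStorage {
public:
    explicit HeapStorage(std::size_t n) : buf_(n) {}
    T* data() { return buf_.data(); }
    const T* data() const { return buf_.data(); }
private:
    std::vector<T> buf_;
};

// Bytes currently handed out by the stand-in "host" backend.
std::size_t g_host_bytes = 0;

// Policy 2 (hypothetical): memory obtained from a host allocator.
// An R backend would allocate an R vector here and point at its first element.
template <typename T>
class HostStorage {
public:
    explicit HostStorage(std::size_t n) : buf_(new T[n]()), n_(n) { g_host_bytes += n * sizeof(T); }
    ~HostStorage() { g_host_bytes -= n_ * sizeof(T); }
    HostStorage(const HostStorage&) = delete;
    HostStorage& operator=(const HostStorage&) = delete;
    T* data() { return buf_.get(); }
    const T* data() const { return buf_.get(); }
private:
    std::unique_ptr<T[]> buf_;
    std::size_t n_;
};

// The container is written once; it never knows who owns its bytes.
template <typename T, template <typename> class Storage>
class Series {
public:
    explicit Series(std::size_t n) : store_(n), n_(n) {}
    std::size_t size() const { return n_; }
    T& operator[](std::size_t i) { assert(i < n_); return store_.data()[i]; }
    const T& operator[](std::size_t i) const { assert(i < n_); return store_.data()[i]; }
    T sum() const {
        T s = T();
        for (std::size_t i = 0; i < n_; ++i) s += store_.data()[i];
        return s;
    }
private:
    Storage<T> store_;
    std::size_t n_;
};
```

Usage: `Series<double, HeapStorage>` and `Series<double, HostStorage>` run exactly the same `sum()` code; only the policy decides where the data lives.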
On Sun, Jul 5, 2009 at 10:54 AM, Yuri D'Elia <wavexx at users.sf.net> wrote:
Hi everybody,
I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC.
One of the C++ libraries returns simple vectors of integers, doubles and complex values which are allocated and managed by the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that memory through the GC. Right now I'm allocating via allocVector and copying all the data into it. However, this requires twice the amount of space (and time), and we're running out of memory when doing concurrent analyses.
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
- create a normal LISTSXP in the package which holds a reference to all these objects, so that they are never garbage collected.
- turn these objects read-only, or at least ensure that they are never free()d or realloc()ed. Overwriting the contents is not a critical issue.
Would that approach work? Are there any alternative approaches? Any specific advice about turning these objects read-only?
Thanks in advance.
In article <4A5102FF.8040303 at stats.uwo.ca>,
Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
The normal way to do what you want is to use an "external pointer". R assumes that memory management for those is handled completely externally. External pointers can have finalizers, so when you no longer have a need for the object, you can ask the external library to release it.
I don't think external pointers can be read from R sources like normal vectors, or am I wrong? Using external pointers would imply either proxy calls for every action you need to perform or deep copies (like I'm doing now).
I wouldn't try to trick the memory manager into thinking that it allocated these things; that will likely just lead to problems.
I'm not afraid of patching R sources somehow, if that's the only solution, but it's one that I would like to avoid.
In article <8ec76080907051259q4744d40bp46b2434b086d5adc at mail.gmail.com>,
Whit Armstrong <armstrong.whit at gmail.com> wrote:
If you are in control of the C++ library (i.e. it is not from a vendor), then you can also override the new operator of your object so that it allocates an SEXP. If you implement the PROTECT/UNPROTECT calls correctly, GC will not be a problem.
The library returns addresses which may not point to the start of the allocated block. For some pre-calculated values, I actually use a shared memory pool. I really need to be able to 'wire' this address directly into the SEXP. As I wrote in another message, changing the GC code (if necessary) could be a viable option.
Sources here: R backend storage policy: http://github.com/armstrtw/r.tslib.backend/tree/master tslib: http://github.com/armstrtw/tslib/tree/master
I really like this approach; it would be useful for many other projects :). Thanks
On 05/07/2009 6:05 PM, Yuri D'Elia wrote:
In article <4A5102FF.8040303 at stats.uwo.ca>, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
The normal way to do what you want is to use an "external pointer". R assumes that memory management for those is handled completely externally. External pointers can have finalizers, so when you no longer have a need for the object, you can ask the external library to release it.
I don't think external pointers can be read from R sources like normal vectors, or am I wrong?
No, you're right.
Using external pointers would imply either proxy calls for every action you need to perform or deep copies (like I'm doing now).
Yes. Duncan Murdoch
I wouldn't try to trick the memory manager into thinking that it allocated these things; that will likely just lead to problems.
I'm not afraid of patching R sources somehow, if that's the only solution, but it's one that I would like to avoid.
On Jul 5, 2009, at 10:54 AM, Yuri D'Elia wrote:
Hi everybody,
I have been interfacing some C++ library code into an R package but ran into optimization issues specific to memory management that require some insight into the GC.
One of the C++ libraries returns simple vectors of integers, doubles and complex values which are allocated and managed by the library itself. I cannot know the length of the array beforehand, so I cannot pre-allocate that memory through the GC. Right now I'm allocating via allocVector and copying all the data into it. However, this requires twice the amount of space (and time), and we're running out of memory when doing concurrent analyses.
What I would like to do is:
- "patch" the SEXP returned to R so that DATAPTR() points directly to the required address.
Why don't you just "patch" the library to use allocVector? That's the most reliable approach and trivial to do. Messing around with the internal SEXP representation is asking for trouble, as it may change at any point without notice (note that all access goes through functions to avoid that).
Cheers,
Simon
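One way to apply Simon's suggestion when the output length is only known inside the library, sketched under the assumption that the library can be modified: the library takes an allocator callback and calls it once it knows how many elements it will produce, so the result is written straight into caller-owned memory and never copied. lib_select_positive(), select_positive() and AllocFn are hypothetical names, and counting before filling is just one way for the library to learn the length.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Caller-supplied allocator: given a length, return writable storage for that
// many doubles. The caller owns (and must keep alive) the returned memory.
using AllocFn = std::function<double*(std::size_t)>;

// Hypothetical library routine, "patched" to take an allocator instead of
// returning its own buffer. It works out the result length first, then asks
// the caller for exactly that much memory and writes the result into it once.
std::size_t lib_select_positive(const double* in, std::size_t n_in, const AllocFn& alloc) {
    std::size_t n_out = 0;
    for (std::size_t i = 0; i < n_in; ++i)
        if (in[i] > 0.0) ++n_out;

    double* out = alloc(n_out);
    assert(out != nullptr || n_out == 0);  // the allocator must provide storage

    std::size_t j = 0;
    for (std::size_t i = 0; i < n_in; ++i)
        if (in[i] > 0.0) out[j++] = in[i];
    return n_out;
}

// Example caller. Here the caller's memory is a std::vector; in an R package the
// callback would instead allocVector() a REALSXP, keep it PROTECTed, and return REAL() of it.
std::vector<double> select_positive(const std::vector<double>& in) {
    std::vector<double> out;
    lib_select_positive(in.data(), in.size(),
                        [&out](std::size_t n) { out.resize(n); return out.data(); });
    return out;
}
```

The design choice is that the library keeps control of when the length becomes known, while the host (R, in this thread) keeps control of who owns the memory.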
- create a normal LISTSXP in the package which holds a reference to all these objects, so that they are never garbage collected.
- turn these objects read-only, or at least ensure that they are never free()d or realloc()ed. Overwriting the contents is not a critical issue.
Would that approach work? Are there any alternative approaches? Any specific advice about turning these objects read-only?
Thanks in advance.