Background:
I have an algorithm which produces a large number of small polygons (of the
spatial kind) which I would like to use within R using objects from sp. I
can't predict the exact number of polygons a priori; the polygons will be
grouped into regions, and each region will be filled sequentially, so an
appropriate C++ 'framework' (for the point of illustration) might be:
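    #include <utility>
    #include <vector>

    typedef std::pair<double, double> Point;
    typedef std::vector<Point> Polygon;
    typedef std::vector<Polygon> Polygons;
    typedef std::vector<Polygons> Regions;

    struct Holder {
        // Start a new, empty region; later polygons are added to it.
        // (Not const: it modifies regions.)
        void notifyNewRegion() {
            regions.push_back(Polygons());
        }

        // Append the polygon described by [b, e) to the current region.
        template<typename Iter>
        void addSubPoly(Iter b, Iter e) {
            regions.back().push_back(Polygon(b, e));
        }

    private:
        Regions regions;
    };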
where the reference type of Iter (std::iterator_traits<Iter>::reference) is
convertible to Point. In practice I use
pointers in a couple of places to avoid resizing in push_back becoming too
expensive.
To construct the corresponding sp::Polygon, sp::Polygons and
sp::SpatialPolygons at the end of the algorithm, I iterate over the result
turning each Polygon into a two column matrix and calling the C functions
corresponding to the 'constructors' for these objects.
This is all working fine, but I could cut my memory consumption in half if
I could construct the sp::Polygon objects in addSubPoly, and the
sp::Polygons objects in notifyNewRegion. My vector typedefs would then all
be:
typedef std::vector<SEXP>
Question:
What I'm not sure about (and finally my question) is: I will have datasets
where I have more than 10,000 SEXPs in the Polygon and Polygons objects for
a single region, and possibly more than 10,000 regions, so how do I PROTECT
all those SEXPs (noting that the protection stack is limited to 10,000 and
bearing in mind that I don't know how many there will be before I start)?
I am also interested in this just out of general curiosity.
Thoughts:
1) I could create an environment and store the objects themselves in there
while keeping pointers in the vectors, but am not sure if this would be
that efficient (guidance would be appreciated), or
2) Just keep them in R vectors and grow these myself (as push_back is doing
for me in the above), but that sounds like a pain and I'm not sure if the
objects or just the pointers would be copied when I reassigned things
(guidance would be appreciated again). Bear in mind that I keep pointers in
the vectors, but omitted that for the sake of clarity.
Is there some other R type that would be suited to this, or a general
approach?
Cheers and thanks in advance,
Simon Knapp
Holding a large number of SEXPs in C++
2 messages · Simon Knapp, Simon Urbanek
On Oct 17, 2014, at 7:31 AM, Simon Knapp <sleepingwell at gmail.com> wrote:
[snip]
Is there some other R type that would be suited to this, or a general
approach?
Lists in R (LISTSXP aka pairlists) are suited to appending (since that is fast and trivial) and sequential processing. The only issue is that pairlists are slow for random access. If you only want to load the polygons and finalize, then you can hold them in a pairlist and at the end copy to a generic vector (if random access is expected).

DB applications typically use a hybrid approach - allocate vector blocks and keep them in pairlists - but that's probably overkill for your use (if you really cared about performance you wouldn't use sp objects for this ;))

Note that you only have to protect the top-level object, so you don't need to protect the individual elements.

Cheers,
Simon
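
The hybrid approach mentioned in the reply (fixed-size vector blocks chained in a list, copied once into a flat vector at the end) can be sketched in plain C++. This is only an illustration of the data structure, not code from the thread: BlockList, append, finalize and the default block size are hypothetical names and values, and std::list / std::vector stand in for what would be a pairlist of generic vectors (VECSXP) inside R, so that the sketch compiles without the R headers.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical illustration: BlockList is not part of R or sp.
// Elements live in fixed-capacity blocks, and the blocks are chained in a
// linked list, so appending never moves elements that are already stored.
template <typename T>
class BlockList {
public:
    explicit BlockList(std::size_t blockSize = 1024)
        : blockSize_(blockSize), size_(0) {}

    // Append one element, opening a new block when the last one is full.
    void append(const T& value) {
        if (blocks_.empty() || blocks_.back().size() >= blockSize_) {
            blocks_.push_back(std::vector<T>());
            blocks_.back().reserve(blockSize_);
        }
        blocks_.back().push_back(value);
        ++size_;
    }

    std::size_t size() const { return size_; }
    std::size_t blockCount() const { return blocks_.size(); }

    // "Finalize": copy everything once into contiguous storage, which is
    // the step that provides fast random access.
    std::vector<T> finalize() const {
        std::vector<T> out;
        out.reserve(size_);
        for (const auto& block : blocks_)
            out.insert(out.end(), block.begin(), block.end());
        return out;
    }

private:
    std::size_t blockSize_;
    std::size_t size_;
    std::list<std::vector<T> > blocks_;
};
```

With a block size of 4, appending ten values fills two blocks and starts a third; finalize() then returns the ten values in insertion order. In an R setting the analogous design would keep only the head of the chain protected, in line with the note above that protecting the top-level object is enough.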