On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwjf08 at gmail.com> wrote:
Sorry for posting so many messages. In the first piece of code I copied my
C++ iteration macro by mistake (you can see the explicit type cast in it).
Here is the macro definition from R_exts/Itermacros.h:
#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
                                  strt, nfull, expr) do { \
    const etype *px = DATAPTR_OR_NULL(sx); \
    if (px != NULL) { \
        R_xlen_t __ibr_n__ = strt + nfull; \
        R_xlen_t nb = __ibr_n__; \
        for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
            expr \
        } \
    } \
    else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
                                    strt, nfull, expr); \
} while (0)
Best,
Jiefei
On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 at gmail.com> wrote:
Hi Gabriel,
I have tried the macro and found a small issue: the macro seems to be
written for C and relies on an implicit pointer conversion (const void * to
const int *), see below. While this is allowed in C, C++ rejects it. Would it
be possible to add an explicit type cast so that the macro is compatible
with both languages?
#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \
                                  strt, nfull, expr) do { \
    const etype *px = (const etype *)DATAPTR_OR_NULL(sx); \
    if (px != NULL) { \
        R_xlen_t __ibr_n__ = strt + nfull; \
        R_xlen_t nb = __ibr_n__; \
        for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \
            expr \
        } \
    } \
    else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
                                    strt, nfull, expr); \
} while (0)
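The pointer conversion at issue can be reproduced without any R headers. The following is a minimal sketch in plain C, not code from R itself: `data_ptr_or_null` is a hypothetical stand-in for DATAPTR_OR_NULL that returns a `const void *`, and `sum_ints` is an illustrative helper. With the explicit cast, the initialization of `px` is valid in both C and C++; without it, only a C compiler accepts the line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for DATAPTR_OR_NULL: hands the data back as an
   untyped pointer, the way a generic accessor would. */
static const void *data_ptr_or_null(const int *data)
{
    return data;
}

/* Sum n ints reached through the untyped pointer. */
static long sum_ints(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    /* The explicit cast is redundant in C but required in C++, where
       const void * does not convert implicitly to const int *. */
    const int *px = (const int *) data_ptr_or_null(data);
    long s = 0;
    if (px != NULL) {
        for (size_t i = 0; i < n; i++)
            s += px[i];
    }
    return s;
}
```

Because the cast is harmless in C, adding it to a shared header costs C users nothing while making the header usable from C++.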
Also, I notice that the element type (etype) and the vector type (vtype)
have to be specified in the macro. Since the SEXP is the first argument of
the macro, it seems redundant to pass etype and vtype, as they have to match
the type of the SEXP. Is this intentional? Will there be a type-free macro
in R in the future? Here is a simple type-free macro I'm using:
#define type_free_iter(sx, ptr, ind, nbatch, expr) \
    switch (TYPEOF(sx)) { \
    case INTSXP: \
        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr); \
        break; \
    case REALSXP: \
        ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr); \
        break; \
    case LGLSXP: \
        ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr); \
        break; \
    default: \
        Rf_error("Unknown data type\n"); \
        break; \
    }
// [[Rcpp::export]]
double sillysum(SEXP x) {
    double s = 0.0;
    type_free_iter(x, ptr, ind, nbatch,
    {
        for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
    });
    return s;
}
Best,
Jiefei
On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 at gmail.com> wrote:
Thank you, Gabriel. The loop macro is very helpful. It is also exciting
to see that there are lots of changes to ALTREP in the R-devel version. I
really appreciate your help!
Best,
Jiefei
On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembecker at gmail.com> wrote:
Jiefei,
I've been meaning to write up something about this so hopefully this
will be an impetus for me to actually do that, but until then, responses
inline.
On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 at gmail.com> wrote:
Hi devel team,
I'm working on C/C++ level ALTREP compatibility for a package. The package
previously used pointers to access the data of a SEXP, so it would not work
for some ALTREP objects which do not have a pointer. I plan to rewrite the
code and use functions like get_elt, get_region, and get_subset to access
the values of a vector, so I have a few questions about ALTREP:
1. Since an ALTREP class does not have to define all of the above functions
(element, region, subset), is there any way to check which functions have
been defined for an ALTREP class? I searched Rinternals.h and altrep.c but
did not find a solution. If not, will it be added in the future?
Element and region are guaranteed to always be defined and work for altrep
and non-altrep INTSXP, REALSXP, LGLSXP, etc. (we currently don't have region
for STRSXP or VECSXP, I believe). If the altrep class does not provide them,
then default methods will be used, which may be inefficient in some cases
but will work. Subset is currently a forward-looking stub, but once
implemented, it will also be guaranteed to work for all valid ALTREP
classes.
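To illustrate why a default region method can always exist, here is a minimal sketch in plain C of region access built purely on element access. It is a model of the idea, not R's implementation: `compact_seq`, `seq_get_elt`, and `seq_get_region` are hypothetical names, and the toy type stands in for an ALTREP class that has no data pointer and supplies only an element method.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compact sequence: element i is start + i, with no backing
   array to point into. */
typedef struct {
    int start;
    size_t length;
} compact_seq;

/* Element access: the only method this toy class provides. */
static int seq_get_elt(const compact_seq *x, size_t i)
{
    assert(i < x->length);
    return x->start + (int) i;
}

/* Default-style region access built on element access: copies up to n
   elements starting at position i into buf and returns the number copied. */
static size_t seq_get_region(const compact_seq *x, size_t i, size_t n,
                             int *buf)
{
    if (i >= x->length)
        return 0;
    size_t remaining = x->length - i;
    size_t ncopy = remaining < n ? remaining : n;
    for (size_t k = 0; k < ncopy; k++)
        buf[k] = seq_get_elt(x, i + k);
    return ncopy;
}
```

A fallback of this shape makes one element call per value, which is the kind of inefficiency the reply alludes to; a class that can fill the buffer directly would override it.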
2. Given the diversity of ALTREP classes, what is the best way to loop over
an ALTREP object? I hope there can be an all-in-one function which can get
the values from a vector as long as at least one of the above functions has
been defined, so that package developers are not burdened with tons of
`if-else` statements when they want their package to work with ALTREP.
Since no such function seems to exist, what would be the best way to write
the loop under the current R version?
The best way to loop over all SEXPs, supporting both altrep and non-altrep
objects, is with the ITERATE_BY_REGION macro (which has been in R for a
number of released versions, at least since 3.5.0 I think) and the much
newer (devel only) ITERATE_BY_REGION_PARTIAL macro, both defined in
R_exts/Itermacros.h.
The meaning of the arguments for ITERATE_BY_REGION_PARTIAL is as follows
(ITERATE_BY_REGION is the same except that it has no strt and nfull):
- sx - C level variable name of the SEXP to iterate over
- px - variable name to use for the pointer populated with data from
a region of sx
- idx - variable name to use for the "outer" batch counter in the
for loop. This will contain the 0-indexed start position of the batch
you're currently processing
- nb - variable name to use for the current batch size. This will
always be either GET_REGION_BUFFSIZE (512) or the number of elements
remaining in the vector, whichever is smaller
- etype - element (C) type, e.g., int, double, of the data
- vtype - vector (access API) type, e.g., INTEGER, REAL
- strt - the 0-indexed position in the vector to start iterating
- nfull - the total number of elements to iterate over from the
vector
- expr - the code to process a single batch (which will do things to
px, typically)
So code to perform a naively implemented (not a good idea in practice) sum
of REALSXP data might look like:
double sillysum(SEXP x) {
    double s = 0.0;
    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
    {
        for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
    });
    return s;
}
For meatier examples of ITERATE_BY_REGION's use in practice you can grep
the R sources. I know it is used in the implementations of the various
C-level summaries (summary.c), print and formatting functions, and anyNA.
Some things to remember:
- If you have an inner loop like the one above, your total position
in the original vector is ind + i
- ITERATE_BY_REGION always processes the whole vector; if you need
to do only part of it, you'll either need custom breaking for both the
inner and outer loops, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
- Don't use the variants ending in 0; all they do is skip over
things that are a good idea in the case of non-altreps (and some very
specific altreps).
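The first two points can be modeled in plain C without R headers. The sketch below is an illustration of the batching pattern under stated assumptions, not the macro's actual expansion: `BATCH_SIZE` is a small stand-in for GET_REGION_BUFFSIZE, `get_region` is a hypothetical region getter for a vector whose element i equals 2 * i, and `which_max_partial` is an illustrative helper that scans only nfull elements starting at strt.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_SIZE 4  /* small stand-in for GET_REGION_BUFFSIZE (512) */

/* Hypothetical region getter for a vector whose element i equals 2 * i:
   copies up to n elements starting at i into buf, returns the count. */
static size_t get_region(size_t length, size_t i, size_t n, double *buf)
{
    if (i >= length)
        return 0;
    size_t ncopy = (length - i < n) ? length - i : n;
    for (size_t k = 0; k < ncopy; k++)
        buf[k] = 2.0 * (double) (i + k);
    return ncopy;
}

/* Return the global position of the largest value among the nfull
   elements starting at strt, reading the data one batch at a time. */
static size_t which_max_partial(size_t length, size_t strt, size_t nfull)
{
    assert(nfull > 0 && strt + nfull <= length);
    double buf[BATCH_SIZE];
    size_t end = strt + nfull;
    size_t best_pos = strt;
    double best = 0.0;
    int have_best = 0;
    for (size_t ind = strt; ind < end; ind += BATCH_SIZE) {
        /* Last batch may be shorter than BATCH_SIZE. */
        size_t want = (end - ind < BATCH_SIZE) ? end - ind : BATCH_SIZE;
        size_t nbatch = get_region(length, ind, want, buf);
        for (size_t i = 0; i < nbatch; i++) {
            if (!have_best || buf[i] > best) {
                best = buf[i];
                best_pos = ind + i;  /* global position: batch start + i */
                have_best = 1;
            }
        }
    }
    return best_pos;
}
```

The inner index i only addresses the current buffer, so any result that refers to a position in the original vector has to be recorded as ind + i.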
Hope that helps.
Best,
~G
Best,
Jiefei