What is the best way to loop over an ALTREP vector?

Jiefei,

I've been meaning to write up something about this so hopefully this will
be an impetus for me to actually do that, but until then, responses inline.
On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwjf08 at gmail.com> wrote:

            
Element and region are guaranteed to always be defined and work (for altrep
and non-altrep INTSXP, REALSXP, LGLSXP, etc.; we currently don't have
region for STRSXP or VECSXP, I believe). If the altrep class does not
provide them then default methods will be used, which may be inefficient in
some cases but will work. Subset is currently a forward-looking stub, but
once implemented, that will also be guaranteed to work for all valid ALTREP
classes.
The best way to loop over all SEXPs, one that supports both altrep and
non-altrep objects, is with the ITERATE_BY_REGION macro (which has been in R
for a number of released versions, at least since 3.5.0 I think) and the much
newer (devel only) ITERATE_BY_REGION_PARTIAL macro, both defined in
R_ext/Itermacros.h.

The meaning of the arguments for ITERATE_BY_REGION_PARTIAL is as follows
(ITERATE_BY_REGION is the same except with no strt and nfull).


   - sx - C level variable name of the SEXP to iterate over
   - px - variable name to use for the pointer populated with data from a
   region of sx
   - idx - variable name to use for the "outer", batch counter in the for
   loop. This will contain the 0-indexed start position of the batch you're
   currently processing
   - nb - variable name to use for the current batch size. This will always
   either be GET_REGION_BUFFSIZE (512), or the number of elements remaining in
   the vector, whichever is smaller
   - etype - element (C) type, e.g., int, double, of the data
   - vtype - vector (access API) type, e.g., INTEGER, REAL
   - strt - the 0-indexed position in the vector to start iterating
   - nfull - the total number of elements to iterate over from the vector
   - expr - the code to process a single batch (which will do things to px,
   typically)
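
To make the roles of those names concrete, here is a standalone sketch
(plain C, no R headers) of what a buffered, batch-wise loop of this shape
does. It is not R's implementation: get_region_sketch(), BUF_SIZE_SKETCH
and partial_sum_sketch() are hypothetical names made up for illustration,
and a plain double array stands in for the SEXP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the 512-element batch buffer size. */
#define BUF_SIZE_SKETCH 512

/* Hypothetical stand-in for a "get region" accessor: copies up to n
 * elements of src (which has length len), starting at position start,
 * into buf and returns how many elements were copied. */
static size_t get_region_sketch(const double *src, size_t len,
                                size_t start, size_t n, double *buf)
{
    if (start >= len) return 0;
    if (n > len - start) n = len - start;
    for (size_t i = 0; i < n; i++) buf[i] = src[start + i];
    return n;
}

/* Sum nfull elements of src starting at the 0-indexed position strt,
 * one batch at a time. */
double partial_sum_sketch(const double *src, size_t len,
                          size_t strt, size_t nfull)
{
    assert(strt <= len && nfull <= len - strt);
    double buf[BUF_SIZE_SKETCH];
    double s = 0.0;
    size_t end = strt + nfull;

    /* idx: 0-indexed start of the current batch in the full vector */
    for (size_t idx = strt; idx < end; idx += BUF_SIZE_SKETCH) {
        /* nb: the buffer size or whatever is left, whichever is smaller */
        size_t nb = end - idx > BUF_SIZE_SKETCH ? BUF_SIZE_SKETCH : end - idx;
        get_region_sketch(src, len, idx, nb, buf);
        const double *px = buf;  /* px: pointer to the current batch */
        for (size_t i = 0; i < nb; i++) s += px[i];  /* the "expr" part */
    }
    return s;
}
```

With a 1000-element vector, partial_sum_sketch(v, 1000, 100, 600) would
process one batch of 512 elements and then one of 88.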


So code to perform a naive (not recommended for real use) sum of REALSXP
data might look like

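#include <Rinternals.h>
#include <R_ext/Itermacros.h>

double sillysum(SEXP x) {

    double s = 0.0;

    ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL,
        {
            /* use an R_xlen_t index so large batches cannot overflow an int */
            for(R_xlen_t i = 0; i < nbatch; i++) { s = s + ptr[i];}
        });

    return s;
}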

For meatier examples of ITERATE_BY_REGION's use in practice you can grep
the R sources. I know it is used in the implementations of the various
C-level summaries (summary.c), print and formatting functions, and anyNA.

Some things to remember

   - If you have an inner loop like the one above, your total position in
   the original vector is ind + i
   - ITERATE_BY_REGION always processes the whole vector. If you need to
   only do part of it, you'll either need custom breaking for both the inner
   and outer loops, or in R-devel you can use ITERATE_BY_REGION_PARTIAL
   - Don't use the variants ending in 0, all they do is skip over things
   that are a good idea in the case of non-altreps (and some very specific
   altreps).
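
The first two points can be sketched in standalone C (no R headers; a plain
array stands in for the vector, and first_negative_sketch() and
BATCH_SKETCH are hypothetical names, not part of R). The position in the
original vector is the batch start plus the inner index, and stopping early
takes a flag that ends the outer loop as well as the inner one.

```c
#include <stddef.h>

#define BATCH_SKETCH 512  /* hypothetical batch size for illustration */

/* Return the 0-indexed position of the first negative element of x,
 * or -1 if there is none, scanning one batch at a time. */
long first_negative_sketch(const double *x, size_t n)
{
    long pos = -1;
    int done = 0;

    for (size_t ind = 0; ind < n && !done; ind += BATCH_SKETCH) {
        size_t nbatch = n - ind > BATCH_SKETCH ? BATCH_SKETCH : n - ind;
        const double *ptr = x + ind;  /* start of the current batch */
        for (size_t i = 0; i < nbatch; i++) {
            if (ptr[i] < 0.0) {
                pos = (long)(ind + i);  /* position in the whole vector */
                done = 1;               /* also stops the outer loop */
                break;                  /* leaves only the inner loop */
            }
        }
    }
    return pos;
}
```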

Hope that helps.

Best,
~G