
What is the best way to loop over an ALTREP vector?

6 messages · Jiefei Wang, Bob Rudis, Gabriel Becker +2 more

#
Hi Gabriel,

I have tried the macro and found a small issue: the macro is written in C
and relies on an implicit type conversion (const void * to const int *),
see below. While this is allowed in C, C++ rejects it. Would it be possible
to add an explicit type cast so that the macro is compatible with both
languages?


#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,       \
                                  strt, nfull, expr) do {              \
        const etype *px = (const etype *)DATAPTR_OR_NULL(sx);          \
        if (px != NULL) {                                              \
            R_xlen_t __ibr_n__ = strt + nfull;                         \
            R_xlen_t nb = __ibr_n__;                                   \
            for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {    \
                expr                                                   \
            }                                                          \
        }                                                              \
        else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
                                        strt, nfull, expr);            \
    } while (0)
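
The C versus C++ difference described above can be reproduced without R. The following is only a minimal sketch: `data_or_null` is a hypothetical stand-in for DATAPTR_OR_NULL (it simply returns its argument as an untyped pointer), and `sum_ints` is an illustrative name, not part of any R header.

```cpp
#include <cstddef>

// Hypothetical stand-in for DATAPTR_OR_NULL: hands back the data as an
// untyped pointer, the way the real accessor returns const void *.
static const void *data_or_null(const int *data) { return data; }

long sum_ints(const int *data, std::size_t n) {
    // C accepts "const int *px = data_or_null(data);" as written.
    // C++ does not convert const void * to const int * implicitly,
    // so the cast has to be spelled out.
    const int *px = (const int *) data_or_null(data);

    long total = 0;
    if (px != nullptr) {
        for (std::size_t i = 0; i < n; ++i) total += px[i];
    }
    return total;
}
```

With the cast in place the same line compiles as both C and C++ (apart from `nullptr`, which is used here only because the sketch is C++).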


Also, I notice that the element type (etype) and vector type (vtype) have
to be specified in the macro. Since the SEXP is the first argument of the
macro, it seems redundant to pass etype and vtype, because they have to
match the type of the SEXP. I'm wondering if this is intentional? Will
there be a type-free macro in R in the future? Here is a simple type-free
macro I'm using.

#define type_free_iter(sx, ptr, ind, nbatch, expr)\
switch(TYPEOF(sx)){\
case INTSXP:\
       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\
       break; \
case REALSXP:\
       ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\
       break; \
case LGLSXP:\
       ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\
       break; \
default:\
       Rf_error("Unknown data type\n"); \
       break; \
}



// [[Rcpp::export]]
double sillysum(SEXP x) {
       double s = 0.0;
       type_free_iter(x, ptr, ind, nbatch,
              {
                     for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; }
              });
       return s;
}




Best,

Jiefei
On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwjf08 at gmail.com> wrote:

            

  
  
#
Sorry for posting so many messages. For the first piece of code, I copied
my own C++ iter macro by mistake (which is why you can see an explicit type
cast). Here is the macro definition from R_ext/Itermacros.h:

#define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype,       \
                                  strt, nfull, expr) do {              \
        const etype *px = DATAPTR_OR_NULL(sx);                         \
        if (px != NULL) {                                              \
            R_xlen_t __ibr_n__ = strt + nfull;                         \
            R_xlen_t nb = __ibr_n__;                                   \
            for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) {    \
                expr                                                   \
            }                                                          \
        }                                                              \
        else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, \
                                        strt, nfull, expr);            \
    } while (0)


Best,

Jiefei
On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwjf08 at gmail.com> wrote:

            

  
  
#
Not sure if you're using just C++ or Rcpp for C++ access but https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on using C++ w/ALTREP.
#
Hi Bob,

Thanks for sending around the link to that. It looks mostly right and looks
like a useful onramp. There are a few things to watch out for though (I've
cc'ed Romain so he's aware of these comments). @romain I hope you take the
following comments as they are intended, as help rather than attacks.

The largest issue I see is that the contract for Get_region is that it
*populates the provided buffer with a copy of the data*. That buffer is
expected to be safe to destructively modify, shuffle, etc., though I don't
know if we are actually doing that anywhere. As such, if I understand his
C++ correctly, that Get_region method is not safe and shouldn't be used.
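
To make the copy contract concrete, here is a small sketch in plain C++. It is not R's ALTREP API: `CompactSeq`, its `get_region` member, and `sum_of_squares_region` are hypothetical names used only for illustration. The point is that the method writes values into the caller's buffer, so the caller may overwrite that buffer freely without affecting the source object.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical compact sequence first, first+1, ..., first+length-1.
// It stores no element data at all.
struct CompactSeq {
    long first;
    std::size_t length;

    // Copies up to n elements starting at offset i into buf and returns
    // how many were written. buf belongs to the caller.
    std::size_t get_region(std::size_t i, std::size_t n, long *buf) const {
        if (i >= length) return 0;
        std::size_t count = std::min(n, length - i);
        for (std::size_t k = 0; k < count; ++k)
            buf[k] = first + static_cast<long>(i + k);
        return count;
    }
};

// A caller that destructively modifies the buffer it was given.
long sum_of_squares_region(const CompactSeq &s, std::size_t i, std::size_t n) {
    std::vector<long> buf(n);
    std::size_t got = s.get_region(i, n, buf.data());
    long total = 0;
    for (std::size_t k = 0; k < got; ++k) {
        buf[k] *= buf[k];   // overwrite in place; fine, buf is our own copy
        total += buf[k];
    }
    return total;
}
```

Calling `sum_of_squares_region` twice on the same object gives the same answer, which would not hold if `get_region` had handed out a pointer into shared storage instead of filling the buffer.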

The other point is that Dataptr_or_null is not actually *guaranteed* not to
allocate. The default method returns NULL, but we have no way of preventing
an allocation in a user-defined method, and probably (?) no easy way of
detecting that it is occurring before it causes a bug. That said, Romain is
correct that when you are writing Dataptr_or_null methods you should
generally write them so that they don't allocate. Basically, your own
Dataptr_or_null methods shouldn't allocate, but you also should not write
code that relies on the hard assumption that no one else's ever will.
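
On the calling side, the usual defensive pattern is to treat the pointer as optional: use it if it is there, and otherwise walk the data in batches. The sketch below shows that shape in plain C++ under stated assumptions. `Source`, `data_or_null`, `get_region`, and `sum_any` are hypothetical illustrations, not R functions, and the batch size of 512 is arbitrary.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical vector-like object. When storage is empty the values are
// only available through get_region (element k is k + 1).
struct Source {
    std::vector<double> storage;
    std::size_t length;

    // Returns a pointer only if the data already exists in memory.
    const double *data_or_null() const {
        return (length > 0 && storage.size() == length) ? storage.data()
                                                        : nullptr;
    }

    // Fills buf with up to n values starting at offset i.
    std::size_t get_region(std::size_t i, std::size_t n, double *buf) const {
        if (i >= length) return 0;
        std::size_t count = std::min(n, length - i);
        for (std::size_t k = 0; k < count; ++k)
            buf[k] = storage.empty() ? static_cast<double>(i + k + 1)
                                     : storage[i + k];
        return count;
    }
};

double sum_any(const Source &x) {
    double total = 0.0;
    if (const double *p = x.data_or_null()) {
        // Fast path: contiguous data is available.
        for (std::size_t i = 0; i < x.length; ++i) total += p[i];
        return total;
    }
    // Fallback: pull the data through a fixed-size buffer.
    const std::size_t batch = 512;
    double buf[512];
    for (std::size_t i = 0; i < x.length; i += batch) {
        std::size_t got = x.get_region(i, batch, buf);
        for (std::size_t k = 0; k < got; ++k) total += buf[k];
    }
    return total;
}
```

This mirrors the structure of the ITERATE_BY_REGION macros quoted earlier in the thread, which also branch on whether the pointer is NULL.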

Also, a small nitpick: R's internal mean function doesn't hit Dataptr, it
hits either INTEGER_ELT (which really should probably be an
ITERATE_BY_REGION) or ITERATE_BY_REGION.

Anyway, I hope that helps.
~G
On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <bob at rud.is> wrote:

            

  
  
#
On 24/09/2019 at 07:48, Gabriel Becker wrote:
Even if it is not the main point of this thread, I was wondering if
mean() could take advantage of sum() (which handles ALTREP in an
efficient way) and be defined as mean(x) = sum(x)/length(x)? Currently,
sum(1:1e14) is almost instantaneous while mean(1:1e14) takes a very long
time.
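
As a rough illustration of why a sum-based mean can be fast for a sequence like 1:n, the sum of an arithmetic range has a closed form, so neither quantity needs a loop. This is only a sketch under that assumption: `IntRange`, `range_sum`, and `range_mean` are hypothetical names, and the code says nothing about how R implements sum() or mean() internally or about their numerical accuracy.

```cpp
#include <cstdint>

// Hypothetical integer range first, first+1, ..., last (first <= last).
struct IntRange {
    std::int64_t first;
    std::int64_t last;
};

// Closed-form sum of an arithmetic range: n/2 * (first + last).
double range_sum(const IntRange &r) {
    double n = static_cast<double>(r.last - r.first + 1);
    return n / 2.0 *
           (static_cast<double>(r.first) + static_cast<double>(r.last));
}

// Mean defined as sum / length, as suggested above.
double range_mean(const IntRange &r) {
    double n = static_cast<double>(r.last - r.first + 1);
    return range_sum(r) / n;
}
```

For the range 1 to 100 this gives a sum of 5050 and a mean of 50.5 in constant time, regardless of the range's length.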

Best,
Serguei.
#
Thanks for these comments. I should alter the blog post or write a follow-up.

This was a weekend blog post that only benefited from a short period of research. I'm glad people find it useful, but I'm sure detailed documentation of the features from the authors would be more useful.

Romain