Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?
Best regards,
Sebastian Krantz
ALTREP ALTINTEGER_SUM/MIN/MAX Return Value and Behavior
10 messages · iuke-tier@ey m@iii@g oii uiow@@edu, Bill Dunlap, Sebastian Martin Krantz +1 more
ALTINTEGER_SUM and friends are _not_ intended for use in package code. Once we get some time to clean up headers they will no longer be visible to packages. Best, luke
On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine). Also .Internal(inspect(x)) still shows the compact representation. -Sebastian
On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min
and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
It depends on the size. For a larger vector adding dim will create a wrapper ALTREP. Currently the wrapper does not try to use the payload's sum method; this could be added. Best, luke
On Tue, 29 Jun 2021, Bill Dunlap wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value. If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Call the R sum() function, either before going to C code or by calling back into R. You may only want to do this if the vector is long enough for e possible savings to be worth while.
On Tue, 29 Jun 2021, Sebastian Martin Krantz wrote:
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine).
When you use a longer vector
Also .Internal(inspect(x)) still shows the compact representation.
A different representation (wrapper around a compact sequence). Best, luke
-Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min
and
max functions and I want them to support the special case of plain integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning
message 'converting NULL pointer to R NULL'. For functions fmin and fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
Hi Sebastian, So the way that it is currently factored, there isn't a good way of getting what you want under the constraints of what Luke said (ALTINTEGER_SUM is not part of the API). I don't know what his reason are for saying that per say and would not want to speak for him, but of the top of my head, I suspect it is because ALTREP sum methods are allowed to return NULL (the C version) to say "I don't have a sum method that is applicable here, please continue with the normal code". So, just as an example, your exact code is likely to segfault, I think, if you hit an ALTREP that chooses not to implement a sum method because you'll be running around with a SEXP that has the value NULL (the C one, not the R one). One thing you could do, is check for altrepness and then construct and evaluate a call to the R sum function in that case, but that probably isn't quite what you want either, as this will hit the code you're trying to bypass/speedup in the case where the ALTREP class doesn't implement a sum methods. I see that Luke just mentioned this as well but I'll leave it in since I had already typed it. I hope that helps clarify some things. Best, ~G On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine). Also .Internal(inspect(x)) still shows the compact representation. -Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min and max functions and I want them to support the special case of plain
integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct value.
If
I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and fmax (similarly defined using ALTINTEGER_MIN/MAX), I get this error right
away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R
NULL'. So what is going on here? What do these functions return? And how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Also, @Luke Tierney <luke-tierney at uiowa.edu> I can prepare a patch that has wrappers delegate to payload's ALTREP class methods for things like sum, min, max, etc once conference season calms down a bit. Best, ~G On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker <gabembecker at gmail.com> wrote:
Hi Sebastian, So the way that it is currently factored, there isn't a good way of getting what you want under the constraints of what Luke said (ALTINTEGER_SUM is not part of the API). I don't know what his reason are for saying that per say and would not want to speak for him, but of the top of my head, I suspect it is because ALTREP sum methods are allowed to return NULL (the C version) to say "I don't have a sum method that is applicable here, please continue with the normal code". So, just as an example, your exact code is likely to segfault, I think, if you hit an ALTREP that chooses not to implement a sum method because you'll be running around with a SEXP that has the value NULL (the C one, not the R one). One thing you could do, is check for altrepness and then construct and evaluate a call to the R sum function in that case, but that probably isn't quite what you want either, as this will hit the code you're trying to bypass/speedup in the case where the ALTREP class doesn't implement a sum methods. I see that Luke just mentioned this as well but I'll leave it in since I had already typed it. I hope that helps clarify some things. Best, ~G On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz < sebastian.krantz at graduateinstitute.ch> wrote:
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine). Also .Internal(inspect(x)) still shows the compact representation. -Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum, min and max functions and I want them to support the special case of plain
integer
sequences using ALTREP. I thereby encountered some behavior I cannot
explain to myself. The head of my fsum C function looks like this (g is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct
value.
If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and
fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right
away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And
how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks Gabriel and Luke, I understand now the functions return NULL if no method is applicable. I wonder though why do ALTINTEGER_MIN and MAX return NULL on a plain integer sequence? I also see that min() and max() are not optimized i.e. min(1:1e8) appears to materialize the vector. In general I expect my functions to mostly be applied to real data so this is not a huge issue for me (I?d rather get rid of it again than calling sum() or risking that the macros are removed from the API), but it could be nice to have this speedup available to packages. If these macros have matured and it can be made explicit that they return NULL if no method is applicable, or, better, they internally dispatch to a normal sum method if this is the case, they could become very manageable and useful. Best, Sebastian
On Tue 29. Jun 2021 at 21:09, Gabriel Becker <gabembecker at gmail.com> wrote:
Also, @Luke Tierney <luke-tierney at uiowa.edu> I can prepare a patch that has wrappers delegate to payload's ALTREP class methods for things like sum, min, max, etc once conference season calms down a bit. Best, ~G On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker <gabembecker at gmail.com> wrote:
Hi Sebastian, So the way that it is currently factored, there isn't a good way of getting what you want under the constraints of what Luke said (ALTINTEGER_SUM is not part of the API). I don't know what his reason are for saying that per say and would not want to speak for him, but of the top of my head, I suspect it is because ALTREP sum methods are allowed to return NULL (the C version) to say "I don't have a sum method that is applicable here, please continue with the normal code". So, just as an example, your exact code is likely to segfault, I think, if you hit an ALTREP that chooses not to implement a sum method because you'll be running around with a SEXP that has the value NULL (the C one, not the R one). One thing you could do, is check for altrepness and then construct and evaluate a call to the R sum function in that case, but that probably isn't quite what you want either, as this will hit the code you're trying to bypass/speedup in the case where the ALTREP class doesn't implement a sum methods. I see that Luke just mentioned this as well but I'll leave it in since I had already typed it. I hope that helps clarify some things. Best, ~G On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz < sebastian.krantz at graduateinstitute.ch> wrote:
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine). Also .Internal(inspect(x)) still shows the compact representation. -Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum,
min
and max functions and I want them to support the special case of plain
integer
sequences using ALTREP. I thereby encountered some behavior I cannot explain to myself. The head of my fsum C function looks like this (g
is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct
value.
If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a warning message 'converting NULL pointer to R NULL'. For functions fmin and
fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right
away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And
how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hi Sebastian, min/max do not materialize the vector, you will see it as compact after same as before. It *does* however do a pass over the data chunked by region, which is much more expensive than it need be for compact sequences, that is true. I think in some version of code that never made it out of the branch, I had default min/max methods which took sortedness into account if it was known. One thing that significantly complicated that cod ewas that you have to find the edge of the NAs(/NaNs for the real case) if narm is TRUE, which involves a binary search using ELT (or a linear one using ITERATE_BY_REGION, I suppose). That said a newer version of the count nas code did get in from a later patch to update, so it is available in r-devel and could be used to revisit that approach. That aside, it is true that compact sequences in particular never have NAs so the min and max altrep methods for those classes would be trivial. I kind of doubt people are creating compact sequences and then asking for the min/max/mean of them very often in practice. Best, ~G On Tue, Jun 29, 2021 at 11:26 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Thanks Gabriel and Luke, I understand now the functions return NULL if no method is applicable. I wonder though why do ALTINTEGER_MIN and MAX return NULL on a plain integer sequence? I also see that min() and max() are not optimized i.e. min(1:1e8) appears to materialize the vector. In general I expect my functions to mostly be applied to real data so this is not a huge issue for me (I?d rather get rid of it again than calling sum() or risking that the macros are removed from the API), but it could be nice to have this speedup available to packages. If these macros have matured and it can be made explicit that they return NULL if no method is applicable, or, better, they internally dispatch to a normal sum method if this is the case, they could become very manageable and useful. Best, Sebastian On Tue 29. Jun 2021 at 21:09, Gabriel Becker <gabembecker at gmail.com> wrote:
Also, @Luke Tierney <luke-tierney at uiowa.edu> I can prepare a patch that has wrappers delegate to payload's ALTREP class methods for things like sum, min, max, etc once conference season calms down a bit. Best, ~G On Tue, Jun 29, 2021 at 11:07 AM Gabriel Becker <gabembecker at gmail.com> wrote:
Hi Sebastian, So the way that it is currently factored, there isn't a good way of getting what you want under the constraints of what Luke said (ALTINTEGER_SUM is not part of the API). I don't know what his reason are for saying that per say and would not want to speak for him, but of the top of my head, I suspect it is because ALTREP sum methods are allowed to return NULL (the C version) to say "I don't have a sum method that is applicable here, please continue with the normal code". So, just as an example, your exact code is likely to segfault, I think, if you hit an ALTREP that chooses not to implement a sum method because you'll be running around with a SEXP that has the value NULL (the C one, not the R one). One thing you could do, is check for altrepness and then construct and evaluate a call to the R sum function in that case, but that probably isn't quite what you want either, as this will hit the code you're trying to bypass/speedup in the case where the ALTREP class doesn't implement a sum methods. I see that Luke just mentioned this as well but I'll leave it in since I had already typed it. I hope that helps clarify some things. Best, ~G On Tue, Jun 29, 2021 at 10:13 AM Sebastian Martin Krantz < sebastian.krantz at graduateinstitute.ch> wrote:
Thanks both. Is there a suggested way I can get this speedup in a package? Or just leave it for now? Thanks also for the clarification Bill. The issue I have with that is that in my C code ALTREP(x) evaluates to true even after adding and removing dimensions (otherwise it would be handled by the normal sum method and I?d be fine). Also .Internal(inspect(x)) still shows the compact representation. -Sebastian On Tue 29. Jun 2021 at 19:43, Bill Dunlap <williamwdunlap at gmail.com> wrote:
Adding the dimensions attribute takes away the altrep-ness. Removing dimensions does not make it altrep. E.g.,
a <- 1:10
am <- a ; dim(am) <- c(2L,5L)
amn <- am ; dim(amn) <- NULL
.Call("is_altrep", a)
[1] TRUE
.Call("is_altrep", am)
[1] FALSE
.Call("is_altrep", amn)
[1] FALSE
where is_altrep() is defined by the following C code:
#include <R.h>
#include <Rinternals.h>
SEXP is_altrep(SEXP x)
{
return Rf_ScalarLogical(ALTREP(x));
}
-Bill
On Tue, Jun 29, 2021 at 8:03 AM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Hello together, I'm working on some custom (grouped, weighted) sum,
min
and max functions and I want them to support the special case of plain
integer
sequences using ALTREP. I thereby encountered some behavior I cannot explain to myself. The head of my fsum C function looks like this (g
is
optional grouping vector, w is optional weights vector):
SEXP fsumC(SEXP x, SEXP Rng, SEXP g, SEXP w, SEXP Rnarm) {
int l = length(x), tx = TYPEOF(x), ng = asInteger(Rng),
narm = asLogical(Rnarm), nprotect = 1, nwl = isNull(w);
if(ALTREP(x) && ng == 0 && nwl) {
switch(tx) {
case INTSXP: return ALTINTEGER_SUM(x, (Rboolean)narm);
case LGLSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
case REALSXP: return ALTLOGICAL_SUM(x, (Rboolean)narm);
default: error("ALTREP object must be integer or real typed");
}
}
// ...
}
when I let x <- 1:1e8, fsum(x) works fine and returns the correct
value.
If I now make this a matrix dim(x) <- c(1e2, 1e6) and subsequently turn
this
into a vector again, dim(x) <- NULL, fsum(x) gives NULL and a
warning
message 'converting NULL pointer to R NULL'. For functions fmin and
fmax
(similarly defined using ALTINTEGER_MIN/MAX), I get this error right
away
e.g. fmin(1:1e8) gives NULL and warning 'converting NULL pointer to R NULL'. So what is going on here? What do these functions return? And
how
do
I make this a robust implementation?
Best regards,
Sebastian Krantz
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[[alternative HTML version deleted]]
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel