I think the discussion of this issue has gotten more complicated than necessary.

First, there really is a bug. You can see this also by the fact that delayed warning messages are wrong. For instance, in R-3.1.2:

> lapply(c(-1,2,-1),sqrt)
[[1]]
[1] NaN

[[2]]
[1] 1.414214

[[3]]
[1] NaN

Warning messages:
1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced

The first warning message should have "1L" rather than "3L". It doesn't because lapply made a destructive change to the R expression that was evaluated for the first element. Throughout the R interpreter, there is a general assumption that expressions that are or have been evaluated are immutable, an assumption lapply is not abiding by.

The only question is whether the bugs from this are sufficiently obscure that it's worth keeping them for the gain in speed. But the speed cost of fixing it is fairly small (though not negligible when the function applied is something simple like sqrt).

The fix in the C code for lapply, vapply, and eapply is easy: rather than create an R expression such as FUN(X[[1L]]) for the first function call and then modify it in place to FUN(X[[2L]]), and so forth, just create a new expression for each iteration. This requires allocating a few new CONS cells per iteration, which has a cost, but not a huge one. It's certainly easier and faster than creating a new environment (and also less likely to cause incompatibilities).

The R code for apply can be changed to use the same approach: rather than using expressions such as FUN(X[i,]), where i is an index variable, it can create expressions like FUN(X[1L,]), then FUN(X[2L,]), etc.
The method for this is simple, like so:

> a <- quote(FUN(X[i,]))     # "i" could be anything
> b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)

This has the added advantage of making error messages refer to the actual index, not to "i", which has no meaning if you haven't looked at the source code for apply (and which doesn't tell you which element led to the error even if you know what "i" does).

I've implemented this in the development version of pqR, on the development branch 31-apply-fix, at

  https://github.com/radfordneal/pqR/tree/31-apply-fix

The changes are in src/main/apply.c, src/main/envir.c, and src/library/base/R/apply.R, plus a new test in tests/apply.R. You can compare to branch 31 to see what's changed. (Note that rapply seems not to have had a problem, and that other apply functions just use these, so they should be fixed as well.) There are also other optimizations in pqR for these functions, but the code is still quite similar to R-3.1.2.

Radford Neal
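As a sketch of the proposed approach, here is an R-level analogue of the C fix (the function name lapply_fresh is illustrative, not the actual pqR code): a fresh call is produced for each element, so no evaluated expression is ever modified in place.

```r
# Hypothetical R-level analogue of the proposed C fix (not the pqR code):
# produce a new FUN(X[[i]]) call for every iteration instead of
# destructively updating one shared call.
lapply_fresh <- function(X, FUN) {
  ans <- vector("list", length(X))
  for (i in seq_along(X)) {
    e <- quote(FUN(X[[i]]))   # template call
    e[[2]][[3]] <- i          # copy-on-modify: patching the index yields a
                              # fresh expression, leaving the template intact
    ans[[i]] <- eval(e)       # each iteration evaluates its own expression
  }
  ans
}

r <- suppressWarnings(lapply_fresh(c(-1, 2, -1), sqrt))
# r[[1]] and r[[3]] are NaN, r[[2]] is sqrt(2); any delayed warnings now
# point at distinct expressions rather than one destructively updated call
```

Because the subassignment duplicates the quoted template before patching it, this is the same copy-on-modify idiom as the `b <- a; b[[c(2,3)]] <- 1L` example above.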
iterated lapply
4 messages · Radford Neal, Luke Tierney
On Sun, 1 Mar 2015, Radford Neal wrote:
I think the discussion of this issue has gotten more complicated than necessary.
The discussion has gotten no more complicated than it needs to be. There are other instances, such as Reduce, where there is a bug report pending that amounts to the same issue. Performing surgery on expressions and calling eval is not good practice at the R level, and probably not a good idea at the C level either. It is worth thinking this through carefully before adopting a solution, which is what we will be doing.

Best,

luke
First, there really is a bug. You can see this also by the fact that delayed warning messages are wrong. For instance, in R-3.1.2:
> lapply(c(-1,2,-1),sqrt)
[[1]]
[1] NaN

[[2]]
[1] 1.414214

[[3]]
[1] NaN

Warning messages:
1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced

The first warning message should have "1L" rather than "3L". It doesn't because lapply made a destructive change to the R expression that was evaluated for the first element. Throughout the R interpreter, there is a general assumption that expressions that are or have been evaluated are immutable, an assumption lapply is not abiding by.

The only question is whether the bugs from this are sufficiently obscure that it's worth keeping them for the gain in speed. But the speed cost of fixing it is fairly small (though not negligible when the function applied is something simple like sqrt).

The fix in the C code for lapply, vapply, and eapply is easy: rather than create an R expression such as FUN(X[[1L]]) for the first function call and then modify it in place to FUN(X[[2L]]), and so forth, just create a new expression for each iteration. This requires allocating a few new CONS cells per iteration, which has a cost, but not a huge one. It's certainly easier and faster than creating a new environment (and also less likely to cause incompatibilities).

The R code for apply can be changed to use the same approach: rather than using expressions such as FUN(X[i,]), where i is an index variable, it can create expressions like FUN(X[1L,]), then FUN(X[2L,]), etc. The method for this is simple, like so:
> a <- quote(FUN(X[i,]))     # "i" could be anything
> b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)
This has the added advantage of making error messages refer to the actual index, not to "i", which has no meaning if you haven't looked at the source code for apply (and which doesn't tell you which element led to the error even if you know what "i" does).

I've implemented this in the development version of pqR, on the development branch 31-apply-fix, at

  https://github.com/radfordneal/pqR/tree/31-apply-fix

The changes are in src/main/apply.c, src/main/envir.c, and src/library/base/R/apply.R, plus a new test in tests/apply.R. You can compare to branch 31 to see what's changed. (Note that rapply seems not to have had a problem, and that other apply functions just use these, so they should be fixed as well.) There are also other optimizations in pqR for these functions, but the code is still quite similar to R-3.1.2.

Radford Neal
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone: 319-335-3386
Department of Statistics and        Fax:   319-335-3017
   Actuarial Science
241 Schaeffer Hall                  email: luke-tierney at uiowa.edu
Iowa City, IA 52242                 WWW:   http://www.stat.uiowa.edu
There are other instances, such as Reduce, where there is a bug report pending that amounts to the same issue. Performing surgery on expressions and calling eval is not good practice at the R level, and probably not a good idea at the C level either. It is worth thinking this through carefully before adopting a solution, which is what we will be doing.
Surgery on expressions is what lapply does at the moment. My change
makes it no longer do that.
There is a general problem that lazy evaluation can have the effect
of making the internal details of how an R function like "apply" is
implemented leak into its semantics. That's what's going on with
the Reduce bug (16093) too.
I think one can avoid this by defining the following function for
calling a function with evaluation of arguments forced (ie, lazy
evaluation disabled):
call_forced <- function (f, ...) { list (...); f (...) }
(Of course, for speed one could make this a primitive function, which
wouldn't actually build a list.)
Then the critical code in Reduce could be changed from
for (i in rev(ind)) init <- f(x[[i]], init)
to
for (i in rev(ind)) init <- call_forced (f, x[[i]], init)
If one had a primitive (ie, fast) call_forced, a similar technique
might be better than the one I presented for fixing "apply" (cleaner,
and perhaps slightly faster). I don't see how it helps for functions
like lapply that are written in C, however (where help isn't needed,
since there's nothing wrong with the mod in my previous message).
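To make the leak concrete, here is a small self-contained demonstration of how an unforced promise created in a loop can end up seeing the loop variable's final value, and how forcing at call time avoids it. It uses the call_forced definition from above; the helper g and the thunk-returning style are illustrative, not code from R itself:

```r
call_forced <- function(f, ...) { list(...); f(...) }

# g stores a thunk over its first argument without ever forcing it
g <- function(x, acc) c(acc, list(function() x))

lazy <- list()
for (i in 1:3) lazy <- g(i, lazy)       # x stays an unforced promise for `i`
sapply(lazy, function(th) th())         # all 3: every promise sees the final i

eager <- list()
for (i in 1:3) eager <- call_forced(g, i, eager)
sapply(eager, function(th) th())        # 1 2 3: each x was forced at call time
```

This is the same mechanism behind the Reduce bug: the promise for x[[i]] refers to the index variable, so if f defers forcing it past the loop, the implementation detail leaks into the result.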
Radford Neal
On Sun, 1 Mar 2015, Radford Neal wrote:
There are other instances, such as Reduce, where there is a bug report pending that amounts to the same issue. Performing surgery on expressions and calling eval is not good practice at the R level, and probably not a good idea at the C level either. It is worth thinking this through carefully before adopting a solution, which is what we will be doing.
Surgery on expressions is what lapply does at the moment. My change
makes it no longer do that.
There is a general problem that lazy evaluation can have the effect
of making the internal details of how an R function like "apply" is
implemented leak into its semantics. That's what's going on with
the Reduce bug (16093) too.
I think one can avoid this by defining the following function for
calling a function with evaluation of arguments forced (ie, lazy
evaluation disabled):
call_forced <- function (f, ...) { list (...); f (...) }
(Of course, for speed one could make this a primitive function, which
wouldn't actually build a list.)
Then the critical code in Reduce could be changed from
for (i in rev(ind)) init <- f(x[[i]], init)
to
for (i in rev(ind)) init <- call_forced (f, x[[i]], init)
This is the option I was suggesting as a possibility in my reply to
Bill Dunlap -- I called it funcall. This may be the right way to
go. There are some subtleties to sort out, such as how missing
arguments should be handled (allowed or error), and whether the force
should stop at ... arguments as in turning
FUN(X[[i]], ...)
into
funcall(FUN, X[[i]], ...)
There is also a change in when the evaluation of X[[i]] happens, which
may or may not matter. Some testing against CRAN/BIOC packages should
reveal how much of an issue these are.
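The evaluation-timing change is easy to demonstrate: with an ordinary lazy call, an argument the function never touches is never evaluated, whereas a call_forced-style funcall evaluates every argument up front. (The names below are the ones from this thread, used illustratively.)

```r
call_forced <- function(f, ...) { list(...); f(...) }

k <- function(x, y) x                 # never touches y
k(1, stop("boom"))                    # returns 1: y's promise is never forced
try(call_forced(k, 1, stop("boom")))  # error "boom": forcing happens up front
```

Any caller relying on an argument going unevaluated would see a behavior change, which is why testing against CRAN/BIOC packages matters here.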
If one had a primitive (ie, fast) call_forced, a similar technique might be better than the one I presented for fixing "apply" (cleaner, and perhaps slightly faster). I don't see how it helps for functions like lapply that are written in C, however (where help isn't needed, since there's nothing wrong with the mod in my previous message).
If we adopt the funcall approach, then it would be best if the C implementation stayed as close as possible to an R reference implementation. I do not think your proposed approach would do that.

mapply has its own issues with the MoreArgs argument that would be nice to sort out at the same time if possible, and there are a few more instances of this in several places. I will try to look into this more in the next week or so.

Best,

luke
Radford Neal