Skip to content

iterated lapply

4 messages · Radford Neal, Luke Tierney

#
I think the discussion of this issue has gotten more complicated than
necessary.

First, there really is a bug.  You can see this also by the fact that
delayed warning messages are wrong.  For instance, in R-3.1.2:

  > lapply(c(-1,2,-1),sqrt)
  [[1]]
  [1] NaN
  
  [[2]]
  [1] 1.414214
  
  [[3]]
  [1] NaN
  
  Warning messages:
  1: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
  2: In FUN(c(-1, 2, -1)[[3L]], ...) : NaNs produced
  
The first warning message should have "1L" rather than "3L".  It
doesn't because lapply made a destructive change to the R expression
that was evaluated for the first element.  Throughout the R
interpreter, there is a general assumption that expressions that are
or were evaluated are immutable, which lapply is not abiding by.  The
only question is whether the bugs from this are sufficiently obscure
that it's worth keeping them for the gain in speed, but the speed cost
of fixing it is fairly small (though it's not negligible when the
function applied is something simple like sqrt).

The fix in the C code for lapply, vapply, and eapply is easy: Rather
than create an R expression such as FUN(X[[1L]]) for the first
function call, and then modify it in place to FUN(X[[2L]]), and so
forth, just create a new expression for each iteration.  This requires
allocating a few new CONS cells each iteration, which does have a
cost, but not a huge one.  It's certainly easier and faster than
creating a new environment (and also less likely to cause
incompatibilities).

The R code for apply can be changed to use the same approach, 
rather than using expressions such as FUN(X[i,]), where i is an
index variable, it can create expressions like FUN(X[1L,]), then
FUN(X[2L,]), etc.  The method for this is simple, like so:

  > a <- quote(FUN(X[i,]))     # "i" could be anything
  > b <- a; b[[c(2,3)]] <- 1L  # change "i" to 1L (creates new expr)

This has the added advantage of making error messages refer to the
actual index, not to "i", which has no meaning if you haven't looked
at the source code for apply (and which doesn't tell you which element
led to the error even if you know what "i" does).

I've implemented this in the development version of pqR, on the
development branch 31-apply-fix, at

  https://github.com/radfordneal/pqR/tree/31-apply-fix

The changes are in src/main/apply.R, src/main/envir.R, and
src/library/base/R/apply.R, plus a new test in tests/apply.R.  You can
compare to branch 31 to see what's changed.  (Note rapply seems to not
have had a problem, and that other apply functions just use these, so
should be fixed as well.)  There are also other optimizations in pqR
for these functions but the code is still quite similar to R-3.1.2.

   Radford Neal
#
On Sun, 1 Mar 2015, Radford Neal wrote:

            
The discussion has gotten no more complicated than it needs to
be. There are other instances, such as Reduce where there is a bug
report pending that amounts to the same issue.  Performing surgery on
expressions and calling eval is not good practice at the R level and
probably not a good idea at the C level either.  It is worth thinking
this through carefully before a adopting a solution, which is what we
will be doing.

Best,

luke

  
    
#
Surgery on expressions is what lapply does at the moment.  My change
makes it no longer do that.  

There is a general problem that lazy evaluation can have the effect
of making the internal details of how an R function like "apply" is
implemented leak into its semantics.  That's what's going on with
the Reduce bug (16093) too.  

I think one can avoid this by defining the following function for
calling a function with evaluation of arguments forced (ie, lazy
evaluation disabled):

  call_forced <- function (f, ...) { list (...); f (...) }

(Of course, for speed one could make this a primitive function, which
wouldn't actually build a list.)

Then the critical code in Reduce could be changed from

  for (i in rev(ind)) init <- f(x[[i]], init)

to

  for (i in rev(ind)) init <- call_forced (f, x[[i]], init)

If one had a primitive (ie, fast) call_forced, a similar technique
might be better than the one I presented for fixing "apply" (cleaner,
and perhaps slightly faster).  I don't see how it helps for functions
like lapply that are written in C, however (where help isn't needed,
since there's nothing wrong with the mod in my previous message).

   Radford Neal
#
On Sun, 1 Mar 2015, Radford Neal wrote:

            
This is the option I was suggesting as a possibility in my reply to
Bill Dunlap -- I called it funcall. This may be the right way to
go. There are some subtleties to sort out, such as how missing
arguments should be handled (allowed or error), and whether the force
should stop at ... arguments as in turning

     FUN(X[[i]], ...)

into

      funcall(FUN, X[[i]], ...)

There is also a change in when the evaluation of X[[i]] happens, which
may or may not matter.  Some testing against CRAN/BIOC packages should
reveal how much of an issue these are.
If we adapt the funcall approach then it would be best if the C
implementation stayed as close as possible to an R reference
implementation. I do not think your proposed approach would do that.

mapply has its own issues with the MoreArgs argument that would be
nice to sort out at the same time if possible, as there are also a few
more instances of this in several places.

I will try to look into this more in the next week or so.

Best,

luke