Skip to content

wish list: generalized apply

4 messages · David Winsemius, John Nolan, Joshua Ulrich

#
Dear All,

I regularly want to "apply" some function to an array in a way that the arguments to the user function depend on the index on which the apply is working.  A simple example is:

A <- array( runif(160), dim=c(5,4,8) )
x <- matrix( runif(32), nrow=4, ncol=8 ) 
b <- runif(8)
f1 <- function( A, x, b ) { sum( A %*% x ) + b } 
result <- rep(0.0,8) 
for (i in 1:8) {
  result[i] <- f1( A[,,i], x[,i] , b[i] )
}

This works, but is slow.  I'd like to be able to do something like:
    generalized.apply( A, MARGIN=3, FUN=f1, list(x=x,MARGIN=2), list(b=b,MARGIN=1) ), where the lists tell generalized.apply to pass x[,i] and b[i] to FUN in addition to A[,,i].  

Does such a generalized.apply already exist somewhere?  While I can write a C function to do a particular case, it would be nice if there was a fast, general way to do this.  

John

............................................................................................

John P. Nolan
Math/Stat Dept., American University
Gray Hall, 4400 Massachusetts Ave, NW
Washington, DC 20016-8050
Phone: 202-885-3140
E-mail:  jpnolan at american.edu
Web:   http://fs2.american.edu/jpnolan/www/
#
I would have thought that this would achieve the same result:

result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

Or: 

result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)
#
-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net] 
Sent: Thursday, December 8, 2016 4:59 PM
To: John P. Nolan <jpnolan at american.edu>
Cc: Charles C. Berry <R-devel at r-project.org>
Subject: Re: [Rd] wish list: generalized apply
I would have thought that this would achieve the same result:

result <- sapply( seq_along(b) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

Or: 

result <- sapply( seq.int( dim(A)[3] ) , function(i) { f1( A[,,i], x[,i] , b[i] )} )

(I doubt it will be any faster, but if 'i' is large, parallelism might help. The inner function appears to be fairly efficient.)
#
On Thu, Dec 8, 2016 at 3:59 PM, David Winsemius <dwinsemius at comcast.net> wrote:
You're right, it's slower.  Despite how often it's repeated that
"loops in R are slow", they're not *that* slow.  They're often faster
than the *apply functions, especially if they have been "compiled" by
compiler::cmpfun().

You really need to know *why* code is slow before trying to make it
faster.  I profiled an example that would have a loop with 1e6
iterations and 80%+ of the time was still spent inside f1().

set.seed(21)
nc <- 1e6
nr <- 10
A <- array( runif(5*nr*nc), dim=c(5,nr,nc) )
x <- matrix( runif(nr*nc), nrow=nr, ncol=nc )
b <- runif(nc)
f1 <- compiler::cmpfun(function( A, x, b ) { sum( A %*% x ) + b })
f2 <- compiler::cmpfun({
  function(A, x, b, FUN) {
    result <- numeric(length(b))
    for (i in seq_along(b)) {
      result[i] <- FUN( A[,,i], x[,i] , b[i] )
    }
    return(result)
  }
})
Rprof(interval=0.01)
result <- f2(A,x,b,f1)
Rprof(NULL)
summaryRprof()

$by.self
      self.time self.pct total.time total.pct
"FUN"      4.29    84.28       4.76     93.52
"%*%"      0.47     9.23       0.47      9.23
"f2"       0.33     6.48       5.09    100.00

$by.total
      total.time total.pct self.time self.pct
"f2"        5.09    100.00      0.33     6.48
"FUN"       4.76     93.52      4.29    84.28
"%*%"       0.47      9.23      0.47     9.23

$sample.interval
[1] 0.01

$sampling.time
[1] 5.09

In this case, almost all the time is spent evaluating f1() ("FUN"),
even after calling compiler::cmpfun on f1() and on a function
containing the loop.  Making the looping construct faster is not going
to improve the performance of this code by a significant amount.
I.e., dropping to compiled code will only help if you avoid the R
function call, but then that's not a general solution...