Dear all,
As far as I understand, the number of arguments in methods:::cbind is
limited by the "self recursive" construction of the function
which generates nested loops.
A workaround could be to use the internal cbind function on blocks of
non S4 objects. The limitation would then be reduced to the number of
consecutive S4 objects.
##### R code #####
dfr <- data.frame(matrix(0, nrow = 1 , ncol = 1000))
dfr2 <- is.na(dfr)
mlist <- rep(list(matrix(0, 2, 1)), 400)
cb1 <- do.call("cbind", c(mlist, mlist))
methods:::bind_activation(TRUE)
dfr2 <- is.na(dfr) # fails
cb2 <- do.call("cbind", mlist) # ok
cb3 <- do.call("cbind", c(mlist, mlist)) # fails
# This could be avoided by first checking that the arguments has no S4
# objects. If this is the case, the function falls back to the
# internal cbind function.
# But this would not be very helpful if the arguments are a mixture of
# S4 and non S4 objects
library(Matrix)
Mlist <- rep(list(Matrix(0, 2, 1)), 400)
cb4 <- do.call("cbind", Mlist) # ok
cb5 <- do.call("cbind", c(Mlist, Mlist)) # fails
cb6 <- do.call("cbind", c(Mlist, mlist)) # fails
# A workaround could be to use the internal cbind function on blocks of
# non S4 objects. The limitation would be reduced to the number of
# consecutive S4 objects
# After modifications
dfr2 <- is.na(dfr) # ok
cb7 <- do.call("cbind", mlist) # ok
cb8 <- do.call("cbind", c(mlist, mlist)) # ok
cb9 <- do.call("cbind", c(Mlist, mlist)) # ok
cb10 <- do.call("cbind", c(Mlist, Mlist)) # fails as expected
##### END #####
The code bellow gives an idea how to do it but was not fully tested!
Hope it helps,
Yohan
Index: methods/R/cbind.R
===================================================================
--- methods/R/cbind.R (revision 47045)
+++ methods/R/cbind.R (working copy)
@@ -39,11 +39,10 @@
## remove trailing 'NULL's:
while(na > 0 && is.null(argl[[na]])) { argl <- argl[-na]; na <- na - 1 }
if(na == 0) return(NULL)
- if(na == 1) {
- if(isS4(..1))
- return(cbind2(..1))
- else return(.Internal(cbind(deparse.level, ...)))
- }
+ if (!any(aS4 <- unlist(lapply(argl, isS4))))
+ return(.Internal(cbind(deparse.level, ...)))
+ if(na == 1)
+ return(cbind2(..1))
## else : na >= 2
@@ -64,6 +63,15 @@
else { ## na >= 3 arguments: -- RECURSION -- with care
## determine nrow(<result>) for e.g., cbind(diag(2), 1, 2)
## only when the last two argument have *no* dim attribute:
+ idx.aS4 <- 0
+ while (!rev(aS4)[idx.aS4+1])
+ idx.aS4 <- idx.aS4 + 1
+ if (idx.aS4 > 1) {
+ argl0 <- argl[(na-idx.aS4+1):na]
+ argl1 <- do.call(cbind, c(argl0, list(deparse.level=deparse.level)))
+ argl2 <- c(argl[1L:(na-idx.aS4)], list(argl1))
+ return(do.call(cbind, c(argl2, list(deparse.level=deparse.level))))
+ }
nrs <- unname(lapply(argl, nrow)) # of length na
iV <- sapply(nrs, is.null)# is 'vector'
fix.na <- identical(nrs[(na-1):na], list(NULL,NULL))
reduce limit number of arguments in methods:::cbind
3 messages · Yohan Chalabi, Jeff Ryan
My 2c: The real issue for me is that this approach to handling S4 objects by altering R functions for the worse is incorrect. (by calling bind_activation) m <- matrix(1:2e6L) # 2 million obs
system.time(cbind(m,m))
user system elapsed 0.027 0.017 0.044
methods:::bind_activation(TRUE)
[1] FALSE # the additional overhead of cbind is now damaging to cbind S3 methods
system.time(cbind(m,m))
user system elapsed 0.043 0.034 0.077 [~175% of the original time] Wouldn't a better near-term approach involve writing S3 methods to dispatch on.
methods:::bind_activation(FALSE) library(Matrix) M <- Matrix(1:10) cbind(M,M)
M M [1,] ? ?
cbind.dgeMatrix <- function(..., deparse.level=1) methods:::cbind(..., deparse.level=deparse.level) cbind(M,M)
10 x 2 Matrix of class "dgeMatrix"
[,1] [,2]
[1,] 1 1
[2,] 2 2
[3,] 3 3
[4,] 4 4
[5,] 5 5
[6,] 6 6
[7,] 7 7
[8,] 8 8
[9,] 9 9
[10,] 10 10
# this approach "does no harm" to regular S3 methods
system.time(cbind(m,m))
user system elapsed 0.028 0.017 0.045 Obviously this negates part of the S4 dispatch value, but that can be had by calling cbind2 directly. Jeff
Jeffrey Ryan jeffrey.ryan at insightalgo.com ia: insight algorithmics www.insightalgo.com
"JR" == "Jeff Ryan" <jeff.a.ryan at gmail.com> on Wed, 3 Dec 2008 15:22:24 -0600
JR> My 2c: JR> JR> The real issue for me is that this approach to handling S4 objects by JR> altering R functions for the worse is incorrect. (by calling JR> bind_activation) JR> JR> m <- matrix(1:2e6L) # 2 million obs JR> > system.time(cbind(m,m)) JR> user system elapsed JR> 0.027 0.017 0.044 JR> > methods:::bind_activation(TRUE) JR> [1] FALSE JR> JR> # the additional overhead of cbind is now damaging to cbind S3 methods JR> > system.time(cbind(m,m)) JR> user system elapsed JR> 0.043 0.034 0.077 [~175% of the original time] JR> JR> Wouldn't a better near-term approach involve writing S3 methods to dispatch on. JR> JR> > methods:::bind_activation(FALSE) JR> > library(Matrix) JR> > M <- Matrix(1:10) JR> > cbind(M,M) JR> M M JR> [1,] ? ? JR> JR> > cbind.dgeMatrix <- function(..., deparse.level=1) methods:::cbind(..., deparse.level=deparse.level) JR> > cbind(M,M) JR> 10 x 2 Matrix of class "dgeMatrix" JR> [,1] [,2] JR> [1,] 1 1 JR> [2,] 2 2 JR> [3,] 3 3 JR> [4,] 4 4 JR> [5,] 5 5 JR> [6,] 6 6 JR> [7,] 7 7 JR> [8,] 8 8 JR> [9,] 9 9 JR> [10,] 10 10 JR> JR> # this approach "does no harm" to regular S3 methods JR> > system.time(cbind(m,m)) JR> user system elapsed JR> 0.028 0.017 0.045 JR> JR> Obviously this negates part of the S4 dispatch value, but that can be JR> had by calling cbind2 directly. JR> JR> JR> Jeff There is no surprise that the default cbind function is faster because it calls a C routine in contrast with methods:::cbind which reduces the problem "self recursively" to two arguments. if you try the patch I sent in my previous message, you will notice that the problem you mentioned is essentially improved. regards, Yohan