Skip to content

reduce limit number of arguments in methods:::cbind

3 messages · Yohan Chalabi, Jeff Ryan

#
Dear all,

As far as I understand, the number of arguments in methods:::cbind is
limited by the "self recursive" construction of the function
which generates nested loops.

A workaround could be to use the internal cbind function on blocks of
non S4 objects. The limitation would then be reduced to the number of
consecutive S4 objects.

##### R code #####

dfr  <- data.frame(matrix(0, nrow = 1 , ncol = 1000))
dfr2 <- is.na(dfr)

mlist <- rep(list(matrix(0, 2, 1)), 400)
cb1 <- do.call("cbind", c(mlist, mlist))

methods:::bind_activation(TRUE)

dfr2 <- is.na(dfr) # fails
cb2 <- do.call("cbind", mlist) # ok
cb3 <- do.call("cbind", c(mlist, mlist)) # fails

# This could be avoided by first checking that the arguments has no S4
# objects. If this is the case, the function falls back to the
# internal cbind function.

# But this would not be very helpful if the arguments are a mixture of
# S4 and non S4 objects

library(Matrix)

Mlist <- rep(list(Matrix(0, 2, 1)), 400)

cb4 <- do.call("cbind", Mlist) # ok
cb5 <- do.call("cbind", c(Mlist, Mlist)) # fails
cb6 <- do.call("cbind", c(Mlist, mlist)) # fails

# A workaround could be to use the internal cbind function on blocks of
# non S4 objects. The limitation would be reduced to the number of
# consecutive S4 objects

# After modifications

dfr2 <- is.na(dfr) # ok
cb7 <- do.call("cbind", mlist) # ok
cb8 <- do.call("cbind", c(mlist, mlist)) # ok

cb9 <- do.call("cbind", c(Mlist, mlist)) # ok

cb10 <- do.call("cbind", c(Mlist, Mlist)) # fails as expected

##### END #####

The code bellow gives an idea how to do it but was not fully tested!

Hope it helps,
Yohan


Index: methods/R/cbind.R
===================================================================
--- methods/R/cbind.R	(revision 47045)
+++ methods/R/cbind.R	(working copy)
@@ -39,11 +39,10 @@
     ## remove trailing 'NULL's:
     while(na > 0 && is.null(argl[[na]])) { argl <- argl[-na]; na <- na - 1 }
     if(na == 0) return(NULL)
-    if(na == 1) {
-	if(isS4(..1))
-	    return(cbind2(..1))
-	else return(.Internal(cbind(deparse.level, ...)))
-    }
+    if (!any(aS4 <- unlist(lapply(argl, isS4))))
+        return(.Internal(cbind(deparse.level, ...)))
+    if(na == 1)
+        return(cbind2(..1))

     ## else :  na >= 2

@@ -64,6 +63,15 @@
     else { ## na >= 3 arguments: -- RECURSION -- with care
 	## determine nrow(<result>)  for e.g.,	cbind(diag(2), 1, 2)
 	## only when the last two argument have *no* dim attribute:
+        idx.aS4 <- 0
+        while (!rev(aS4)[idx.aS4+1])
+            idx.aS4 <- idx.aS4 + 1
+        if (idx.aS4 > 1) {
+            argl0 <- argl[(na-idx.aS4+1):na]
+            argl1 <- do.call(cbind, c(argl0, list(deparse.level=deparse.level)))
+            argl2 <- c(argl[1L:(na-idx.aS4)], list(argl1))
+            return(do.call(cbind, c(argl2, list(deparse.level=deparse.level))))
+        }
 	nrs <- unname(lapply(argl, nrow)) # of length na
 	iV <- sapply(nrs, is.null)# is 'vector'
 	fix.na <- identical(nrs[(na-1):na], list(NULL,NULL))
#
My 2c:

The real issue for me is that this approach to handling S4 objects by
altering R functions for the worse is incorrect. (by calling
bind_activation)

m <- matrix(1:2e6L)  # 2 million obs
user  system elapsed
  0.027   0.017   0.044
[1] FALSE

# the additional overhead of cbind is now damaging to cbind S3 methods
user  system elapsed
  0.043   0.034   0.077 [~175% of the original time]

Wouldn't a better near-term approach involve writing S3 methods to dispatch on.
M M
[1,] ? ?
10 x 2 Matrix of class "dgeMatrix"
      [,1] [,2]
 [1,]    1    1
 [2,]    2    2
 [3,]    3    3
 [4,]    4    4
 [5,]    5    5
 [6,]    6    6
 [7,]    7    7
 [8,]    8    8
 [9,]    9    9
[10,]   10   10

# this approach "does no harm" to regular S3 methods
user  system elapsed
  0.028   0.017   0.045

Obviously this negates part of the S4 dispatch value, but that can be
had by calling cbind2 directly.


Jeff
#
JR> My 2c:
   JR> 
   JR> The real issue for me is that this approach to handling S4 objects by
   JR> altering R functions for the worse is incorrect. (by calling
   JR> bind_activation)
   JR> 
   JR> m <- matrix(1:2e6L)  # 2 million obs
   JR> > system.time(cbind(m,m))
   JR>    user  system elapsed
   JR>   0.027   0.017   0.044
   JR> > methods:::bind_activation(TRUE)
   JR> [1] FALSE
   JR> 
   JR> # the additional overhead of cbind is now damaging to cbind S3 methods
   JR> > system.time(cbind(m,m))
   JR>    user  system elapsed
   JR>   0.043   0.034   0.077 [~175% of the original time]
   JR> 
   JR> Wouldn't a better near-term approach involve writing S3 methods to dispatch on.
   JR> 
   JR> > methods:::bind_activation(FALSE)
   JR> > library(Matrix)
   JR> > M <- Matrix(1:10)
   JR> > cbind(M,M)
   JR>      M M
   JR> [1,] ? ?
   JR> 
   JR> > cbind.dgeMatrix <- function(..., deparse.level=1) methods:::cbind(..., deparse.level=deparse.level)
   JR> > cbind(M,M)
   JR> 10 x 2 Matrix of class "dgeMatrix"
   JR>       [,1] [,2]
   JR>  [1,]    1    1
   JR>  [2,]    2    2
   JR>  [3,]    3    3
   JR>  [4,]    4    4
   JR>  [5,]    5    5
   JR>  [6,]    6    6
   JR>  [7,]    7    7
   JR>  [8,]    8    8
   JR>  [9,]    9    9
   JR> [10,]   10   10
   JR> 
   JR> # this approach "does no harm" to regular S3 methods
   JR> > system.time(cbind(m,m))
   JR>    user  system elapsed
   JR>   0.028   0.017   0.045
   JR> 
   JR> Obviously this negates part of the S4 dispatch value, but that can be
   JR> had by calling cbind2 directly.
   JR> 
   JR> 
   JR> Jeff


There is no surprise that the default cbind function is faster because
it calls a C routine in contrast with methods:::cbind  which reduces
the problem "self recursively" to two arguments.

if you try the patch I sent in my previous message, you will notice
that the problem you mentioned is essentially improved.

regards,
Yohan