using apply with sparse matrix from package Matrix
On Tue, Sep 4, 2012 at 10:58 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
Jennifer Lyon <jennifer.s.lyon at gmail.com>
on Fri, 31 Aug 2012 17:22:57 -0600 writes:
> Hi:
> I was trying to use apply on a sparse matrix from package Matrix,
> and I get the error:
> Error in asMethod(object) :
> Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106
> Is there a way to apply a function to all the rows without bumping
> into this problem?
> Here is a simplified example:
>> dim(sm)
> [1] 72913 43052
>> class(sm)
> [1] "dgCMatrix"
> attr(,"package")
> [1] "Matrix"
>> str(sm)
> Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
> ..@ i : int [1:6590004] 789 801 802 1231 1236 11739 17817 17943 18148 18676 ...
> ..@ p : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
> ..@ Dim : int [1:2] 72913 43052
> ..@ Dimnames:List of 2
> .. ..$ : NULL
> .. ..$ : NULL
> ..@ x : num [1:6590004] 0.601 0.527 0.562 0.641 0.684 ...
> ..@ factors : list()
>> my.sum<-apply(sm, 1, sum)
> Error in asMethod(object) :
> Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106
So, actually it would have worked (though not efficiently) if your sm matrix had been much smaller.

However, we provide rowSums(), rowMeans(), colSums(), colMeans() for all of our matrices, including the sparse ones. So your present problem can be solved using

  my.sum <- rowSums(sm)

Best regards,
Martin Maechler, ETH Zurich
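A note on why size matters here: apply() first coerces its argument with as.matrix(), so the whole sparse matrix has to be turned into a dense one. For the dimensions in the question that dense copy would hold more cells than a 32-bit integer can index, which is most likely what the Cholmod 'problem too large' message is reporting. The sketch below checks that arithmetic and compares apply() with rowSums() on a small stand-in matrix; the sizes, the random data, and the names small, via_apply and via_rowSums are assumptions made for illustration, not part of the original post.

```r
library(Matrix)

# Number of cells a dense copy of the 72913 x 43052 matrix would need.
# Computed in double precision, so the product itself cannot overflow.
n_cells <- 72913 * 43052
n_cells > .Machine$integer.max   # TRUE: too many cells for a 32-bit index

# Small stand-in sparse matrix (hypothetical size and density).
set.seed(1)
small <- rsparsematrix(200, 100, density = 0.05)

# apply() works here, but only because the dense copy is tiny.
via_apply <- apply(small, 1, sum)

# rowSums() has a method for sparse matrices and needs no dense copy.
via_rowSums <- rowSums(small)

all.equal(via_apply, via_rowSums)
```

On a matrix this small both routes agree; only the rowSums() route remains usable once the dense copy no longer fits.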
Thank you for letting me know about rowSums().

Two points. First, sadly, I was unclear in my posting, and using "sum" was just an example. In the real case I am using my own function on each row. I guess the answer for this problem is that iteration is my friend. Good to know.

Second, since I'm embarrassed to say I hadn't remembered rowSums(), for cases when I needed the sum of the rows, I had just been postmultiplying by a vector of 1's. Just FYI, I thought I should try rowSums(), so did a small timing trial, and it appears postmultiplying is faster than rowSums. Run is as follows:
str(sm)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i : int [1:6590004] 721 926 1275 1791 2370 2755 3393 4638 5363 5566 ...
  ..@ p : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
  ..@ Dim : int [1:2] 72913 43052
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x : num [1:6590004] 0.0735 0.3206 0.1861 0.1604 0.197 ...
  ..@ factors : list()
library(rbenchmark)
#Just checking how expensive building a vector of 1's is - not very
#at least for matrix of the size I'm interested in
benchmark(i1<-rep(1, ncol(sm)))
                    test replications elapsed relative user.self sys.self
1 i1 <- rep(1, ncol(sm))          100   0.119        1      0.12        0
  user.child sys.child
1          0         0

#Postmultiplying by 1's timing
benchmark(la<-sm %*% i1)
             test replications elapsed relative user.self sys.self user.child
1 la <- sm %*% i1          100   5.993        1     5.993        0          0
  sys.child
1         0

#rowSums timing
benchmark(la1<-rowSums(sm))
                test replications elapsed relative user.self sys.self
1 la1 <- rowSums(sm)          100  28.117        1    28.114    0.004
  user.child sys.child
1          0         0

#Make sure the results are the same
all(la==la1)
[1] TRUE

The Matrix package is awesome, and I appreciate you taking the time to answer my questions.

Jen
sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rbenchmark_0.3.1 Matrix_1.0-6     lattice_0.20-6

loaded via a namespace (and not attached):
[1] grid_2.15.1
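For the case raised in the thread where the per-row function is user-defined rather than sum, one possible way to iterate is to densify only a block of rows at a time, so that no full dense copy is ever built. The following is a minimal sketch under stated assumptions, not something proposed in the thread: row_apply_chunked, chunk_size and my_fun are hypothetical names, the function passed in is assumed to return a single number per row, and the matrix is assumed to have at least one row.

```r
library(Matrix)

# Hypothetical helper: apply `f` to each row of a sparse matrix,
# converting only `chunk_size` rows to dense storage at a time.
# Assumes nrow(sm) >= 1 and that `f` returns one number per row.
row_apply_chunked <- function(sm, f, chunk_size = 1000L) {
  n <- nrow(sm)
  starts <- seq(1L, n, by = chunk_size)
  pieces <- lapply(starts, function(s) {
    idx <- s:min(s + chunk_size - 1L, n)
    # Dense block of at most chunk_size rows; drop = FALSE keeps it 2-d.
    block <- as.matrix(sm[idx, , drop = FALSE])
    apply(block, 1, f)
  })
  unlist(pieces, use.names = FALSE)
}

# Usage on a small random matrix (hypothetical size and density).
set.seed(42)
sm_small <- rsparsematrix(250, 40, density = 0.1)
my_fun <- function(r) sum(r^2)   # stand-in for a user-defined row function

res <- row_apply_chunked(sm_small, my_fun, chunk_size = 64L)

# Cross-check against the fully dense computation, feasible at this size.
all.equal(res, apply(as.matrix(sm_small), 1, my_fun))
```

The chunk size trades memory for the number of slicing operations: each block needs roughly chunk_size * ncol(sm) * 8 bytes as a dense double matrix, so for a matrix with tens of thousands of columns a smaller value than the default may be appropriate.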