using apply with sparse matrix from package Matrix

Tue, Sep 4, 2012 4:57 PM

On Tue, Sep 4, 2012 at 10:58 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:

Jennifer Lyon <jennifer.s.lyon at gmail.com>
    on Fri, 31 Aug 2012 17:22:57 -0600 writes:

    > Hi:
    > I was trying to use apply on a sparse matrix from package Matrix,
    > and I get the error:

    > Error in asMethod(object) :
    > Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106

    > Is there a way to apply a function to all the rows without bumping
    > into this problem?

    > Here is a simplified example:

    >> dim(sm)

    > [1] 72913 43052

    >> class(sm)

    > [1] "dgCMatrix"
    > attr(,"package")
    > [1] "Matrix"

    >> str(sm)

    > Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
    > ..@ i       : int [1:6590004] 789 801 802 1231 1236 11739 17817 17943 18148 18676 ...
    > ..@ p       : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
    > ..@ Dim     : int [1:2] 72913 43052
    > ..@ Dimnames:List of 2
    > .. ..$ : NULL
    > .. ..$ : NULL
    > ..@ x       : num [1:6590004] 0.601 0.527 0.562 0.641 0.684 ...
    > ..@ factors : list()

    >> my.sum<-apply(sm, 1, sum)

    > Error in asMethod(object) :
    > Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 106

So, actually, it would have worked (though not efficiently) if
your sm matrix had been much smaller.

However,  we provide  rowSums(), rowMeans(), colSums(), colMeans()
for all of our matrices, including the sparse ones.

So your present problem can be solved using

my.sum <- rowSums(sm)

Best regards,
Martin Maechler, ETH Zurich
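
A note on why size matters here: base apply() calls as.matrix() on its
argument, so the whole sparse matrix has to be turned into a dense one
first. The arithmetic below is a sketch using only the dimensions and
non-zero count from the str() output above. That this coercion is what
triggers the Cholmod message is an assumption, though it is consistent
with the error pointing at cholmod_dense.c.

```r
# Dimensions and non-zero count taken from the str(sm) output in the thread.
dims <- c(72913, 43052)
nnz  <- 6590004

# prod() works in double precision, so the product itself does not overflow.
n_cells <- prod(dims)

# A dense copy would need more cells than a 32-bit signed integer can index.
too_many_cells <- n_cells > .Machine$integer.max

# Approximate size of a dense double matrix, in GiB (8 bytes per cell).
dense_gib <- n_cells * 8 / 2^30

# Fraction of cells that are actually stored in the sparse representation.
density <- nnz / n_cells

print(c(cells = n_cells, dense_GiB = dense_gib, density = density))
print(too_many_cells)
```

The dense copy would need more than 2^31 - 1 cells, which R 2.15.1 could
not hold in a single vector, so any approach that densifies the full
matrix is ruled out at this size.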

Thank you for letting me know about rowSums(). Two points.  First,
sadly, I was unclear in my posting, and using "sum" was just an
example. In the real case I am using my own function on each row. I
guess the answer for this problem is that iteration is my friend. Good
to know.
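
One way the iteration could look, as a minimal sketch rather than a
recommendation from the Matrix authors: walk over the rows in blocks,
densify only one block at a time, and call the function on each row of
that block. The names row_apply_chunked, chunk_size, my_fun and sm_small
are made up for this illustration, and the small random matrix stands in
for the real data.

```r
library(Matrix)

# Small hypothetical stand-in for the 72913 x 43052 matrix in the thread.
set.seed(42)
nr <- 200L; nc <- 50L; nnz <- 400L
sm_small <- sparseMatrix(i = sample(nr, nnz, replace = TRUE),
                         j = sample(nc, nnz, replace = TRUE),
                         x = runif(nnz), dims = c(nr, nc))

# Apply f to every row, converting only chunk_size rows to dense at a time.
# Assumes m has at least one row.
row_apply_chunked <- function(m, f, chunk_size = 64L) {
  n <- nrow(m)
  out <- vector("list", n)
  for (start in seq(1L, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1L, n)
    # Dense block of at most chunk_size x ncol(m) cells.
    block <- as.matrix(m[rows, , drop = FALSE])
    for (k in seq_along(rows)) {
      # list() wrapper keeps the slot even if f() returns NULL.
      out[rows[k]] <- list(f(block[k, ]))
    }
  }
  out
}

# Hypothetical per-row function: count entries larger than 0.5.
my_fun <- function(r) sum(r > 0.5)

res <- unlist(row_apply_chunked(sm_small, my_fun))
```

Peak memory is governed by chunk_size times the number of columns, so
the block size can be tuned to the machine. Row-indexing a
column-compressed matrix is not cheap, so larger blocks mean fewer
subsetting calls.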

Second, I'm embarrassed to say I hadn't remembered rowSums(), so for
cases when I needed the sum of the rows, I had just been postmultiplying
by a vector of 1's.  Just FYI, I thought I should try rowSums(), so I ran
a small timing trial, and it appears postmultiplying is faster than
rowSums. The run is as follows:

Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:6590004] 721 926 1275 1791 2370 2755 3393 4638 5363 5566 ...
  ..@ p       : int [1:43053] 0 147 303 450 596 751 908 1053 1188 1347 ...
  ..@ Dim     : int [1:2] 72913 43052
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:6590004] 0.0735 0.3206 0.1861 0.1604 0.197 ...
  ..@ factors : list()

#Just checking how expensive building a vector of 1's is - not very
#at least for matrix of the size I'm interested in

                    test replications elapsed relative user.self sys.self user.child sys.child
1 i1 <- rep(1, ncol(sm))          100   0.119        1      0.12        0          0         0

#Postmultiplying by 1's timing

             test replications elapsed relative user.self sys.self user.child sys.child
1 la <- sm %*% i1          100   5.993        1     5.993        0          0         0

#rowSums timing

                test replications elapsed relative user.self sys.self user.child sys.child
1 la1 <- rowSums(sm)          100  28.117        1    28.114    0.004          0         0

#Make sure the results are the same

[1] TRUE
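
The commands behind the output above are only partly visible (in the
"test" column), so here is a sketch of the same comparison on a small
made-up matrix. It uses base system.time() instead of rbenchmark so that
it runs without extra packages; sm_demo and the replication count are
assumptions for illustration, and the timings it prints will not match
the ones reported above.

```r
library(Matrix)

# Small hypothetical matrix; the one in the thread is 72913 x 43052.
set.seed(1)
nr <- 2000L; nc <- 1000L; nnz <- 20000L
sm_demo <- sparseMatrix(i = sample(nr, nnz, replace = TRUE),
                        j = sample(nc, nnz, replace = TRUE),
                        x = runif(nnz), dims = c(nr, nc))

# Vector of 1's, one entry per column.
i1 <- rep(1, ncol(sm_demo))

# Row sums by postmultiplication; the result is an nr x 1 Matrix object.
t_mult <- system.time(for (k in 1:20) la <- sm_demo %*% i1)

# Row sums via the Matrix method for rowSums(); the result is a numeric vector.
t_rs <- system.time(for (k in 1:20) la1 <- rowSums(sm_demo))

print(rbind(postmultiply = t_mult, rowSums = t_rs)[, "elapsed"])

# Make sure the results are the same, up to floating-point tolerance.
same <- isTRUE(all.equal(as.vector(la), la1))
print(same)
```

Which of the two is faster may depend on the Matrix version and the
shape of the matrix, so it is worth re-running the comparison on the
actual data.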

The Matrix package is awesome, and I appreciate you taking the
time to answer my questions.

Jen

R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rbenchmark_0.3.1 Matrix_1.0-6     lattice_0.20-6

loaded via a namespace (and not attached):
[1] grid_2.15.1
