Sparse matrix performance question
On Mon, Dec 6, 2010 at 1:11 PM, scott white <distributedintel at gmail.com> wrote:
Btw, forgot to mention I am using the standard Matrix package and I am running version 2.10.1 of R.
On Mon, Dec 6, 2010 at 11:04 AM, scott white <distributedintel at gmail.com> wrote:
I have a very sparse square matrix which is < 20K rows & columns and I am trying to row standardize the matrix for the rows that have non-missing value as follows:
  row_sums <- rowSums(M, na.rm=TRUE)
  nonzero_idxs <- which(row_sums > 0)
  nonzero_M <- M[nonzero_idxs,]/row_sums[nonzero_idxs]
  M[nonzero_idxs,] <- nonzero_M
Assigning to submatrices of a sparse matrix can be slow because of the amount of checking that has to be done. It is probably easier to do the calculation directly on the data component of the matrix (the x slot) and generate a new matrix. The tricky bit to remember is that the indices in the sparse matrix representation are 0-based, so you need to add 1 when using them in R. I enclose a transcript.
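To make the 0-based point concrete, here is a minimal sketch on a made-up 3 x 3 matrix (the matrix `m` and its values are purely illustrative, not from the original post). It assumes the usual column-compressed dgCMatrix layout, where the `i` slot holds 0-based row indices and the `p` slot holds column pointers, and recovers 1-based row and column positions from them:

```r
library(Matrix)

# Illustrative matrix: entries at (1,1), (3,1) and (2,3).
m <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3),
                  x = c(10, 20, 30), dims = c(3, 3))

# Row indices are stored 0-based, so add 1 before indexing in R.
rows <- m@i + 1L

# Column pointers: diff(p) gives the number of stored entries per column,
# so repeating each column number that many times yields 1-based columns.
cols <- rep(seq_len(ncol(m)), diff(m@p))

# Each stored value m@x[k] sits at position (rows[k], cols[k]).
vals <- m[cbind(rows, cols)]
```

Forgetting the `+ 1L` does not raise an error in most cases; it silently picks up the wrong row, which is why it is worth checking on a small case like this first.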
Each line completes in well under a second except the last, which takes well over 10 seconds even though it simply assigns the sub-matrix of rows that have non-missing values back into the complete matrix. I am curious to know why it is so slow and how to speed it up. Should I be doing this differently or try a different sparse matrix library? Any feedback is appreciated.
thanks, Scott
-------------- next part --------------
R version 2.12.0 (2010-10-15)
Platform: x86_64-pc-linux-gnu (64-bit)
library(Matrix)
Loading required package: lattice
Attaching package: 'Matrix'
The following object(s) are masked from 'package:base':
det
set.seed(1234)
M <- sparseMatrix(i=sample(5000, 1000, replace=TRUE),
                  j=sample(5000, 1000, replace=TRUE),
                  x=rnorm(1000), dims=c(5000, 5000))
str(M)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:1000] 2014 549 1098 3137 130 1523 2198 3921 4323 931 ...
  ..@ p       : int [1:5001] 0 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 5000 5000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:1000] -0.4236 -0.5322 0.0675 -0.4105 -2.3708 ...
  ..@ factors : list()
range(M@i)
[1] 1 4996
str(rs <- rowSums(M, na.rm=TRUE))
num [1:5000] 0 0.501 0 0.598 -0.957 ...
res <- sparseMatrix(i=M@i, p=M@p, dims=M@Dim,
                    x=M@x/rs[M@i + 1L], index1=FALSE)
str(res)
Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
  ..@ i       : int [1:1000] 2014 549 1098 3137 130 1523 2198 3921 4323 931 ...
  ..@ p       : int [1:5001] 0 0 0 0 0 0 0 0 0 0 ...
  ..@ Dim     : int [1:2] 5000 5000
  ..@ Dimnames:List of 2
  .. ..$ : NULL
  .. ..$ : NULL
  ..@ x       : num [1:1000] 1 1 1 -0.655 1 ...
  ..@ factors : list()
table(rowSums(res))
   0    1
4082  918
proc.time()
   user  system elapsed
  3.010   0.120   3.612
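For reuse, the transcript's approach could be wrapped in a small function along these lines. This is only a sketch: the name `row_standardize` is hypothetical (it is not part of the Matrix package), it assumes the input is a dgCMatrix, and it rescales the `x` slot of a copy rather than calling `sparseMatrix()` again. Like the transcript, it divides by the row sum whatever its sign; a row whose stored entries sum to exactly zero would produce Inf or NaN and would need separate handling.

```r
library(Matrix)

# Hypothetical helper: divide every stored entry by the sum of its row,
# leaving the sparsity pattern (slots i and p) untouched.
row_standardize <- function(M) {
  stopifnot(is(M, "dgCMatrix"))
  rs <- rowSums(M, na.rm = TRUE)
  res <- M
  # M@i is 0-based, so shift by one to index the row-sum vector.
  res@x <- M@x / rs[M@i + 1L]
  res
}

# Same construction as in the transcript above.
set.seed(1234)
M <- sparseMatrix(i = sample(5000, 1000, replace = TRUE),
                  j = sample(5000, 1000, replace = TRUE),
                  x = rnorm(1000), dims = c(5000, 5000))
res <- row_standardize(M)

# Rows with stored entries should now sum to 1; empty rows stay at 0.
new_sums <- rowSums(res)
```

Because only the numeric values change, no sub-matrix assignment is involved, which is the step that was slow in the original code.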