
[Rcpp-devel] Result of Rcpp Wrap() for Sparse Matrix

2 messages · Dmitriy Selivanov, Dirk Eddelbuettel

#
My 2 cents. Over the last couple of years I have used sparse matrices a lot.
The Matrix package is really great. I'm not sure I understand the issue with
wrapping: as Doug said, CSC is the main format in both Armadillo and Matrix.
Given a matrix in CSC format (dgCMatrix/CsparseMatrix), it is trivial to
convert it to COO or CSR with as(x, "TsparseMatrix") / as(x, "RsparseMatrix").
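
To make the "trivial to convert" point concrete, here is a minimal sketch of the CSC to COO direction in plain C++. It does not use the Matrix or Armadillo APIs: the Csc and Coo structs and the csc_to_coo function are hypothetical names chosen for illustration, and the only fact borrowed from Matrix is that a dgCMatrix keeps zero-based column pointers (p), row indices (i) and values (x).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse column storage (illustrative struct, not a library type).
// Mirrors the p / i / x layout of a dgCMatrix; all indices are zero-based.
struct Csc {
    int nrow, ncol;
    std::vector<int> p;     // column pointers, size ncol + 1
    std::vector<int> i;     // row index of each non-zero, size nnz
    std::vector<double> x;  // value of each non-zero, size nnz
};

// Coordinate (triplet) storage: one (row, col, value) entry per non-zero.
struct Coo {
    int nrow, ncol;
    std::vector<int> row, col;
    std::vector<double> val;
};

// Expand the column pointers so that every non-zero carries its own column
// index. Row indices and values are copied unchanged, in the same order.
Coo csc_to_coo(const Csc& a) {
    assert(a.p.size() == static_cast<std::size_t>(a.ncol) + 1);
    assert(a.i.size() == a.x.size());
    Coo out{a.nrow, a.ncol, {}, {}, {}};
    out.row.reserve(a.x.size());
    out.col.reserve(a.x.size());
    out.val.reserve(a.x.size());
    for (int j = 0; j < a.ncol; ++j) {
        for (int k = a.p[j]; k < a.p[j + 1]; ++k) {
            out.row.push_back(a.i[k]);
            out.col.push_back(j);
            out.val.push_back(a.x[k]);
        }
    }
    return out;
}
```

The conversion is a single pass over the non-zeros, which is why doing it on the R side after wrap() costs little.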

The second point is about the slam package and the COO format. I haven't used
slam, but I have used scipy, Armadillo and Eigen, and none of these packages
uses the COO format for operations on matrices... I doubt it could be
efficient.

The third point is that I have a feeling that the CSR format is more
mainstream nowadays. For instance, Eigen implements multithreaded
sparse-dense multiplication and sparse solvers (
https://eigen.tuxfamily.org/dox/TopicMultiThreading.html). It is the same
story with the sparse BLAS in Intel MKL: it works with CSR matrices. I
realize that CSR = transposed CSC, but it is still inconvenient to transpose
in your head each time. (It would be great to add more support for CSR
matrices, but that is out of scope for this discussion.)
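
The "CSR = transposed CSC" remark can be illustrated with a small sketch in plain C++. The function names csr_matvec and csc_matvec are made up for this example and do not come from Eigen, MKL or Armadillo. The idea: a row-oriented (CSR) kernel fed the CSC arrays of A, unchanged, computes the product with the transpose of A, whereas computing A times v from CSC storage needs a column-oriented scatter loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-oriented kernel for CSR arrays:
// y[r] = sum over k in [ptr[r], ptr[r+1]) of val[k] * v[idx[k]].
std::vector<double> csr_matvec(const std::vector<int>& ptr,
                               const std::vector<int>& idx,
                               const std::vector<double>& val,
                               const std::vector<double>& v) {
    assert(!ptr.empty() && idx.size() == val.size());
    std::vector<double> y(ptr.size() - 1, 0.0);
    for (std::size_t r = 0; r + 1 < ptr.size(); ++r) {
        for (int k = ptr[r]; k < ptr[r + 1]; ++k) {
            y[r] += val[k] * v[idx[k]];
        }
    }
    return y;
}

// Column-oriented kernel for CSC arrays: add column j, scaled by v[j],
// into the result vector.
std::vector<double> csc_matvec(int nrow,
                               const std::vector<int>& ptr,
                               const std::vector<int>& idx,
                               const std::vector<double>& val,
                               const std::vector<double>& v) {
    assert(!ptr.empty() && idx.size() == val.size());
    std::vector<double> y(static_cast<std::size_t>(nrow), 0.0);
    for (std::size_t j = 0; j + 1 < ptr.size(); ++j) {
        for (int k = ptr[j]; k < ptr[j + 1]; ++k) {
            y[idx[k]] += val[k] * v[j];
        }
    }
    return y;
}
```

With the CSC arrays of a matrix A in p, i and x, csc_matvec(nrow, p, i, x, v) gives A times v, while csr_matvec(p, i, x, v) on the very same arrays gives the transpose of A times v. That is the mental transposition the paragraph above complains about.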

And my last observation: I agree with Doug that Eigen seems to have much
stronger support for operations on sparse matrices.

On 14 June 2017 at 19:55, <rcpp-devel-request at lists.r-forge.r-project.org>
wrote:

Today's Topics:

   1. Re: [RcppArmadillo] Result of Rcpp Wrap() for Sparse Matrix
      (Douglas Bates)
   2. Re: [RcppArmadillo] Result of Rcpp Wrap() for Sparse Matrix
      (Dirk Eddelbuettel)
   3. Re: [RcppArmadillo] Result of Rcpp Wrap() for Sparse Matrix
      (Serguei Sokol)
   4. Re: [RcppArmadillo] Result of Rcpp Wrap() for Sparse Matrix
      (Douglas Bates)
   5. Re: [RcppArmadillo] Result of Rcpp Wrap() for Sparse Matrix
      (Serguei Sokol)


----------------------------------------------------------------------

Message: 1
Date: Wed, 14 Jun 2017 13:21:54 +0000
From: Douglas Bates <bates at stat.wisc.edu>
To: serguei.sokol at gmail.com, Binxiang Ni <binxiangni at gmail.com>,
        rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] [RcppArmadillo] Result of Rcpp Wrap() for
        Sparse  Matrix
Message-ID:
        <CAO7JsnTo8bA6LTsHz0udyjF-KAaE2kp-rKb13tyudg6gV=JiAQ at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Wed, Jun 14, 2017 at 3:59 AM Serguei Sokol <serguei.sokol at gmail.com>
wrote:
That is the format of the dgTMatrix class from the Matrix package for R but
not, as far as I can tell, in Armadillo.  A brief glance at the Armadillo
documentation indicates that sparse matrices are always in the compressed
sparse column (CSC) format.

I would point out that the sparse matrix facilities in Eigen and RcppEigen
are much more extensive than those in Armadillo.

------------------------------

Message: 2
Date: Wed, 14 Jun 2017 09:01:30 -0500
From: Dirk Eddelbuettel <edd at debian.org>
To: serguei.sokol at gmail.com
Cc: rcpp-devel at lists.r-forge.r-project.org, Binxiang Ni
        <binxiangni at gmail.com>
Subject: Re: [Rcpp-devel] [RcppArmadillo] Result of Rcpp Wrap() for
        Sparse Matrix
Message-ID: <22849.16826.22396.339250 at max.eddelbuettel.com>
Content-Type: text/plain; charset=iso-8859-1

On 14 June 2017 at 11:00, Serguei Sokol wrote:
| On 13/06/2017 at 18:24, Douglas Bates wrote:
| > On Tue, Jun 13, 2017 at 10:56 AM Binxiang Ni <binxiangni at gmail.com>
| > wrote:
| >
| >     Hi,
| >
| >     I am working on fixing sparse matrix conversion for RcppArmadillo.
| >     Now a problem comes up to me: what kind of sparse matrix is expected
| >     to pass from Armadillo to R? That is, what should the result of
| >     wrap() be? dgCMatrix (if logical, lgCMatrix or ngCMatrix) or their
| >     original type?
| >
| >
| > What do you mean by "their original type"?
| >
| > It seems that the correspondence is
| > Armadillo      Matrix package
| > sp_mat     <=> dgCMatrix
| > sp_cx_mat  <=> zgCMatrix
| > sp_imat    <=> igCMatrix
| I would also consider the format used in the slam package.
| It simply stores the indexes and non-zero values in a triplet (i,j,v).

There is more here:  https://en.wikipedia.org/wiki/Sparse_matrix

But it would probably be good to hear from some actual users of sparse
matrices such as Doug (thanks for piping in already!), Soren or anybody else
with exposure to sparse matrices, ideally via CRAN packages we can wire up
for testing.


Dirk

--
http://dirk.eddelbuettel.com | @eddelbuettel | edd at debian.org


------------------------------

Message: 3
Date: Wed, 14 Jun 2017 16:06:58 +0200
From: Serguei Sokol <serguei.sokol at gmail.com>
To: Douglas Bates <bates at stat.wisc.edu>, Binxiang Ni
        <binxiangni at gmail.com>, rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] [RcppArmadillo] Result of Rcpp Wrap() for
        Sparse Matrix
Message-ID: <1ade20d8-f08c-df7b-e147-539e4b1babff at gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

On 14/06/2017 at 15:21, Douglas Bates wrote:
> That is the format of the dgTMatrix class from the Matrix package for R
> but not, as far as I can tell, in Armadillo.  A brief glance at the
> Armadillo documentation indicates that sparse matrices are always in the
> compressed sparse column (CSC) format.
Indeed, but nothing prevents Binxiang from developing a wrap() that converts
the Armadillo format to one or more of the R formats, right?


------------------------------

Message: 4
Date: Wed, 14 Jun 2017 15:33:05 +0000
From: Douglas Bates <bates at stat.wisc.edu>
To: serguei.sokol at gmail.com, Binxiang Ni <binxiangni at gmail.com>,
        rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] [RcppArmadillo] Result of Rcpp Wrap() for
        Sparse  Matrix
Message-ID:
        <CAO7JsnT7cj3pAqF7rEJ-EV_qke+Lr2suZu1AVpkPNT2O4bjVcg at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

On Wed, Jun 14, 2017 at 9:06 AM Serguei Sokol <serguei.sokol at gmail.com>
wrote:
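> Indeed, but nothing prevents Binxiang from developing a wrap() that
> converts the Armadillo format to one or more of the R formats, right?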
Why? Is there a reason for doing type conversion from the dgCMatrix format
to another format in an Rcpp wrap function instead of with the existing
functions from the Matrix package?

Bear in mind that dgCMatrix is an efficient format both in terms of the
amount of memory required  (that's the "compressed" part of the name) and
in terms of performing operations with the matrix.  Most operations on
sparse matrices stored in the triplet format start by creating a CSC or CSR
(compressed sparse row) form of the matrix anyway.
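
As an illustration of that last point, the following is a minimal sketch of how triplets are typically compressed into CSC with a counting sort over the column indices. It is written in plain C++ with hypothetical names (Csc, coo_to_csc) and is not the code used by Matrix or Armadillo. As simplifying assumptions, duplicate (i, j) entries are kept rather than summed, and row indices within a column stay in input order instead of being sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed sparse column storage (illustrative struct, zero-based indices).
struct Csc {
    int nrow, ncol;
    std::vector<int> p;     // column pointers, size ncol + 1
    std::vector<int> i;     // row indices, size nnz
    std::vector<double> x;  // values, size nnz
};

// Build CSC storage from (row, col, val) triplets given in arbitrary order.
Csc coo_to_csc(int nrow, int ncol,
               const std::vector<int>& row,
               const std::vector<int>& col,
               const std::vector<double>& val) {
    assert(row.size() == col.size() && col.size() == val.size());
    const std::size_t nnz = val.size();
    Csc out{nrow, ncol,
            std::vector<int>(static_cast<std::size_t>(ncol) + 1, 0),
            std::vector<int>(nnz), std::vector<double>(nnz)};
    // 1. Count the non-zeros in each column (stored one slot to the right).
    for (std::size_t k = 0; k < nnz; ++k) ++out.p[col[k] + 1];
    // 2. Prefix sum turns the counts into column start offsets.
    for (int j = 0; j < ncol; ++j) out.p[j + 1] += out.p[j];
    // 3. Scatter each triplet into the next free slot of its column.
    std::vector<int> next(out.p.begin(), out.p.end() - 1);
    for (std::size_t k = 0; k < nnz; ++k) {
        const int dest = next[col[k]]++;
        out.i[dest] = row[k];
        out.x[dest] = val[k];
    }
    return out;
}
```

The result also shows where the "compressed" saving comes from: the nnz column indices of the triplet form are replaced by ncol + 1 column pointers.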

------------------------------

Message: 5
Date: Wed, 14 Jun 2017 17:55:49 +0200
From: Serguei Sokol <serguei.sokol at gmail.com>
To: Douglas Bates <bates at stat.wisc.edu>, Binxiang Ni
        <binxiangni at gmail.com>, rcpp-devel at lists.r-forge.r-project.org
Subject: Re: [Rcpp-devel] [RcppArmadillo] Result of Rcpp Wrap() for
        Sparse Matrix
Message-ID: <4cbe9f66-6755-70bc-3a59-2a660015b596 at gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed

On 14/06/2017 at 17:33, Douglas Bates wrote:
> Why? Is there a reason for doing type conversion from the dgCMatrix format
> to another format in an Rcpp wrap function instead of with the existing
> functions from the Matrix package?
Sure, Matrix is very versatile and rich in features, but the price for this
is its heavy weight.
It can take several seconds to load. On my rather mighty PC (Intel
Xeon E5-2609 v2 @ 2.50GHz with 16 GB of memory),
I have:
 > system.time(library(Matrix))
   user  system elapsed
  1.427   0.052   1.619

I don't have my laptop here, but the load time can be longer there.
For slam, by contrast, it takes only a fraction of a second:

 > system.time(library(slam))
   user  system elapsed
  0.012   0.000   0.011
When slam suffices, why not use it?

> Bear in mind that dgCMatrix is an efficient format both in terms of the
> amount of memory required (that's the "compressed" part of the name) and
> in terms of performing operations with the matrix.  Most operations on
> sparse matrices stored in the triplet format start by creating a CSC or
> CSR (compressed sparse row) form of the matrix anyway.
In the Matrix package, I presume?
The few basic operations that I have seen in slam stay with the triplet
format. So if a user has not loaded the Matrix package and wants to use
e.g. the slam format, it would be great if wrap() could return the
expected format.


------------------------------


End of Rcpp-devel Digest, Vol 92, Issue 12
******************************************
#
On 14 June 2017 at 20:44, Dmitriy Selivanov wrote:
| My 2 cents. Over the last couple of years I have used sparse matrices a lot.
| The Matrix package is really great. I'm not sure I understand the issue with
| wrapping: as Doug said, CSC is the main format in both Armadillo and Matrix.
| Given a matrix in CSC format (dgCMatrix/CsparseMatrix), it is trivial to
| convert it to COO or CSR with as(x, "TsparseMatrix") / as(x, "RsparseMatrix").
| 
| The second point is about the slam package and the COO format. I haven't used
| slam, but I have used scipy, Armadillo and Eigen, and none of these packages
| uses the COO format for operations on matrices... I doubt it could be
| efficient.
| 
| The third point is that I have a feeling that the CSR format is more
| mainstream nowadays. For instance, Eigen implements multithreaded
| sparse-dense multiplication and sparse solvers
| (https://eigen.tuxfamily.org/dox/TopicMultiThreading.html). It is the same
| story with the sparse BLAS in Intel MKL: it works with CSR matrices. I
| realize that CSR = transposed CSC, but it is still inconvenient to transpose
| in your head each time. (It would be great to add more support for CSR
| matrices, but that is out of scope for this discussion.)

Really nice summary, and very helpful because ...
 
| And my last observation: I agree with Doug that Eigen seems to have much
| stronger support for operations on sparse matrices.

... this year we have Binxiang trying to get (Rcpp)Armadillo closer to
(Rcpp)Eigen in terms of sparse matrix support.  It is a valid goal because
many of us really like Armadillo yet need more sparse matrix support, both
with Armadillo itself and with e.g. MLPACK or other things built on top of
Armadillo.

I also want to add that the load-time critique with respect to Matrix hits
more on whether S4 is a good or bad idea (and for something as complex and
feature-rich as Matrix it is almost certainly a good one) and has little to
do with the representation of sparse matrix indices.

Dirk

PS Please consider removing quoted text. This message I am replying to
exceeded the size limit so I had to manually approve it (and I also
incremented the limit from the old, small default, but still...)