Skip to content

[Rcpp-devel] Sparse matrices with RcppArmadillo

11 messages · Søren Højsgaard, c s, Douglas Bates +2 more

#
Dear all,

I want to use a matrix (of type "dgCMatrix" from the Matrix package) in RcppArmadillo, so I do:

library(inline)
src <- '
using namespace arma;
using namespace Rcpp;
SpMat<double> X = as<SpMat<double> >(XX_);
'
foo <- cxxfunction(signature(XX_=""), body=src, plugin="RcppArmadillo")

- but this fails. It seems to me (browsing the web) that SpMat are supported, but I might be wrong here. I have no indication that dgCMatrix matrices can be "converted" to SpMat's. I know that I can work with dgCMatrix matrices with RcppEigen, but what I need is to extract submatrices and that is very easy to do with RcppArmadillo.

Any thoughts? Thanks in advance.
Regards
S?ren
#
Hi S?ren,
On 8 December 2012 at 08:45, S?ren H?jsgaard wrote:
| I want to use a matrix (of type "dgCMatrix" from the Matrix package) in RcppArmadillo, so I do:
| 
| library(inline)
| src <- '
| using namespace arma;
| using namespace Rcpp;
| SpMat<double> X = as<SpMat<double> >(XX_);
| '
| foo <- cxxfunction(signature(XX_=""), body=src, plugin="RcppArmadillo")
| 
| - but this fails. It seems to me (browsing the web) that SpMat are supported, but I might be wrong here. I have no indication that dgCMatrix matrices can be "converted" to SpMat's. I know that I can work with dgCMatrix matrices with RcppEigen, but what I need is to extract submatrices and that is very easy to do with RcppArmadillo.
| 
| Any thoughts? Thanks in advance.

I fear that we only have as<>() and wrap() converters for dense Armadillo
matrices.

Dirk
#
Le 08/12/12 09:45, S?ren H?jsgaard a ?crit :
Doug might know better about the internals of Matrix types. This is just 
following the recipee from how these are handled in RcppEigen:

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp ;

// [[Rcpp::export]]
void convert(S4 mat){
     IntegerVector dims = mat.slot( "Dim" ) ;
     IntegerVector i = mat.slot( "i" ) ;
     IntegerVector p = mat.slot( "p" ) ;
     NumericVector x = mat.slot( "x" ) ;

     int nrow = dims[0], ncol = dims[1] ;
     arma::sp_mat res( nrow, ncol) ;
     for(int j = 0; j < ncol; ++j) {
         for (int k = p[j]; k < p[j + 1]; ++k) res( i[k], j ) = x[k];
     }
     std::cout << res << std::endl ;

}


/*** R
     require(Matrix)
     i <- c(1,3:8); j <- c(2,9,6:10); x <- 7 * (1:7)
     ( A <- sparseMatrix(i, j, x = x) )

     convert(A)
***/

I don't think there is a better way to fill multiple values ina SpMat, 
maybe Conrad has insights.

Romain
c s
#
Armadillo sparse matrices are stored in Compressed Sparse Column format:

http://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29

This layout is used by a majority of external solvers.

It would be far more efficient to take this layout into account when
copying matrices, instead of blindly (and slowly) copying element by
element.


On Sunday, December 9, 2012, Romain Francois <romain at r-enthusiasts.com>
wrote:
RcppArmadillo, so I do:
supported, but I might be wrong here. I have no indication that dgCMatrix
matrices can be "converted" to SpMat's. I know that I can work with
dgCMatrix matrices with RcppEigen, but what I need is to extract
submatrices and that is very easy to do with RcppArmadillo.
following the recipee from how these are handled in RcppEigen:
maybe Conrad has insights.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121209/1101076b/attachment.html>
#
That looks very similar, if not identical to what dgCMatrix uses. Can 
you direct me to a constructor where I could feed such information ?

Romain

Le 08/12/12 17:35, c s a ?crit :

  
    
#
Ah. One thing I could do is leverage this:

#ifdef ARMA_EXTRA_SPMAT_PROTO
   #include ARMA_INCFILE_WRAP(ARMA_EXTRA_SPMAT_PROTO)
#endif

Not sure how yet.

Le 08/12/12 19:17, Romain Francois a ?crit :

  
    
#
On Sat, Dec 8, 2012 at 10:35 AM, c s <conradsand.arma at gmail.com> wrote:

            
dgCMatrix objects also use compressed sparse column format with zero-based
indices so it is likely that it would only be necessary to copy the
contents of the arrays of column pointers (the "p" slot), row indices (the
"i" slot) and values of the non-zeros (the "x" slot).  It may even be
possible to just map the pointers for a read-only sparse matrix.

Romain:
 In general it is very slow to insert non-zero elements into this format.
 In the worst case the entire structure must be copied and extended for
each insertion.  We have to keep telling people about this when they tell
us that sparse matrices in R are so slow to work with.

 You have seen what the layout of a dgCMatrix is so it is only necessary to
find the desired components of a arma::sp_mat object.  I'll take a look
once I get some things installed.  If it is desirable the as and wrap
methods for the Eigen::SparseMatrix<T> and Eigen::MappedSparseMatrix<T>
classes should be fairly easy to adapt for arma::sp_mat class.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.r-forge.r-project.org/pipermail/rcpp-devel/attachments/20121208/bb116647/attachment.html>
#
Le 08/12/12 19:54, Douglas Bates a ?crit :
Alright
Yes. Conrad left a back door open so that we can add things **inside** 
the SpMat template, similar to what we have done in dense matrix classes.

Going from there I was looking at the copy constructor of a SpMat, which 
eventually is a call to init:

/**
  * Copy from another matrix.
  */
template<typename eT>
inline
void
SpMat<eT>::init(const SpMat<eT>& x)
   {
   arma_extra_debug_sigprint();

   // Ensure we are not initializing to ourselves.
   if (this != &x)
     {
     init(x.n_rows, x.n_cols);

     // values and row_indices may not be null.
     if (values != NULL)
       {
       memory::release(values);
       memory::release(row_indices);
       }

     access::rw(values)      = memory::acquire_chunked<eT> 
(x.n_nonzero + 1);
     access::rw(row_indices) = 
memory::acquire_chunked<uword>(x.n_nonzero + 1);

     // Now copy over the elements.
     arrayops::copy(access::rwp(values),      x.values,      x.n_nonzero 
+ 1);
     arrayops::copy(access::rwp(row_indices), x.row_indices, x.n_nonzero 
+ 1);
     arrayops::copy(access::rwp(col_ptrs),    x.col_ptrs,    x.n_cols + 1);

     access::rw(n_nonzero) = x.n_nonzero;
     }
   }

So it all looks very doable.

Romain

  
    
c s
#
Hi Romain & Doug,

Two points to note:

(1)
When poking around internal Armadillo pointers and arrays for sparse
matrices, please take into account that the col_ptrs array is slightly
longer than specified by the Compressed Sparse Column (CSC) format.

However, this should not cause any problems when interfacing with
external libraries. If you use the set_size() function before writing
to the values[], row_indices[] and col_ptrs[] arrays, everything
should be okay.

The col_ptrs array is slightly longer in order to allow robust
iterators.  Specifically, instead of the col_ptrs array having a size
of ncols+1 (as specified by CSC), it has a size of ncols+2.  The extra
element is always set to std::numeric_limits<uword>::max().  As such,
be careful not to overwrite this magic number.

For more details within the code, see the function
SpMat<eT>::init(uword in_rows, uword in_cols), on line 3635 in
"SpMat_meat.hpp".

(2)
If you need to allocate memory for the values[] and row_indices[]
arrays, you need to use the memory::acquire_chunked() function.  This
is an alternative memory allocation mechanism designed to reduce the
burden of inserting or removing a non-zero element.  The rest of the
sparse matrix memory management code relies on this.  See the code in
file "memory.hpp" for more details.


On Sun, Dec 9, 2012 at 5:55 AM, Romain Francois
<romain at r-enthusiasts.com> wrote:
#
Thanks for the valuable information.

I will look into it more seriously at some point when I have some more 
time.

Romain

Le 09/12/12 14:40, c s a ?crit :

  
    
14 days later
#
On 9 December 2012 at 14:56, Romain Francois wrote:
| Thanks for the valuable information.
| 
| I will look into it more seriously at some point when I have some more 
| time.

I have a minor improvement now, based on poking around in SpMat_meat.hpp and
the previous discussion, particularly Conrad's hints still cited below.  

It is still pretty raw.  The main problem I have right is that when coming in
as R objects (even with the internal Compressed Sparse Column format), the
row and col vectors are R integers and hence unsigned.  So the 'block copy'
that Conrad uses does not work.  I'm sure Romain already has a trick for that
in other places.

But this demonstrates that we're getting close, and that it probably would
not take much in terms of resources to provide more solid sparse matrix
support. 

Dirk

Code below:

// -*- mode: C++; c-indent-level: 4; c-basic-offset: 4; indent-tabs-mode: nil; -*-

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

using namespace Rcpp ;

// [[Rcpp::export]]
void convertA(S4 mat) {         // Romain's version from Dec 8
    IntegerVector dims = mat.slot("Dim");
    IntegerVector i = mat.slot("i");
    IntegerVector p = mat.slot("p");
    NumericVector x = mat.slot("x");

    int nrow = dims[0], ncol = dims[1];
    arma::sp_mat res(nrow, ncol);

    // this is not the proper way to initialize a sparse arma matrix
    // better code should be forthcoming at some point in the future
    for(int j = 0; j < ncol; ++j) {
        for (int k = p[j]; k < p[j + 1]; ++k) 
            res(i[k], j) = x[k];
    }
    std::cout << res << std::endl;
}


// [[Rcpp::export]]
void convertB(S4 mat) {         // slight improvement with two non-nested loops
    IntegerVector dims = mat.slot("Dim");
    IntegerVector i = mat.slot("i");
    IntegerVector p = mat.slot("p");
    NumericVector x = mat.slot("x");

    int nrow = dims[0], ncol = dims[1];
    arma::sp_mat res(nrow, ncol);

    // create space for values, and copy
    arma::access::rw(res.values) = arma::memory::acquire_chunked<double>(x.size() + 1);
    arma::arrayops::copy(arma::access::rwp(res.values), x.begin(), x.size() + 1);

    // create space for row_indices, and copy -- so far in a lame loop
    arma::access::rw(res.row_indices) = arma::memory::acquire_chunked<arma::uword>(x.size() + 1);
    for (int j=0; j<i.size(); j++) arma::access::rwp(res.row_indices)[j] = i[j];

    // create space for col_ptrs, and copy -- so far in a lame loop
    arma::access::rw(res.col_ptrs) = arma::memory::acquire<arma::uword>(p.size() + 2);
    for (int j=0; j<p.size(); j++) arma::access::rwp(res.col_ptrs)[j] = p[j];

    // important: set the sentinel as well
    arma::access::rwp(res.col_ptrs)[p.size()+1] = std::numeric_limits<arma::uword>::max();

    // set the number of non-zero elements
    arma::access::rw(res.n_nonzero) = x.size();

    std::cout << "SpMat res:\n" << res << std::endl;
}



/*** R
    require(Matrix)
    i <- c(1,3:8) 
    j <- c(2,9,6:10) 
    x <- 7 * (1:7)
    A <- sparseMatrix(i, j, x = x) 
    print(A)

    convertA(A)
    convertB(A)
***/



Output below:


R> sourceCpp("/tmp/armaSparse.cpp")

R>     require(Matrix)

R>     i <- c(1,3:8) 

R>     j <- c(2,9,6:10) 

R>     x <- 7 * (1:7)

R>     A <- sparseMatrix(i, j, x = x) 

R>     print(A)
8 x 10 sparse Matrix of class "dgCMatrix"
                             
[1,] . 7 . . .  .  .  .  .  .
[2,] . . . . .  .  .  .  .  .
[3,] . . . . .  .  .  . 14  .
[4,] . . . . . 21  .  .  .  .
[5,] . . . . .  . 28  .  .  .
[6,] . . . . .  .  . 35  .  .
[7,] . . . . .  .  .  . 42  .
[8,] . . . . .  .  .  .  . 49

R>     convertA(A)
[matrix size: 8x10; n_nonzero: 7; density: 8.75%]

     (0, 1)          7.0000
     (3, 5)         21.0000
     (4, 6)         28.0000
     (5, 7)         35.0000
     (2, 8)         14.0000
     (6, 8)         42.0000
     (7, 9)         49.0000


NULL

R>     convertB(A)
SpMat res:
[matrix size: 8x10; n_nonzero: 7; density: 8.75%]

     (0, 1)          7.0000
     (3, 5)         21.0000
     (4, 6)         28.0000
     (5, 7)         35.0000
     (2, 8)         14.0000
     (6, 8)         42.0000
     (7, 9)         49.0000


NULL
R> 


| 
| Romain
| 
| Le 09/12/12 14:40, c s a ?crit :
| > Hi Romain & Doug,
| >
| > Two points to note:
| >
| > (1)
| > When poking around internal Armadillo pointers and arrays for sparse
| > matrices, please take into account that the col_ptrs array is slightly
| > longer than specified by the Compressed Sparse Column (CSC) format.
| >
| > However, this should not cause any problems when interfacing with
| > external libraries. If you use the set_size() function before writing
| > to the values[], row_indices[] and col_ptrs[] arrays, everything
| > should be okay.
| >
| > The col_ptrs array is slightly longer in order to allow robust
| > iterators.  Specifically, instead of the col_ptrs array having a size
| > of ncols+1 (as specified by CSC), it has a size of ncols+2.  The extra
| > element is always set to std::numeric_limits<uword>::max().  As such,
| > be careful not to overwrite this magic number.
| >
| > For more details within the code, see the function
| > SpMat<eT>::init(uword in_rows, uword in_cols), on line 3635 in
| > "SpMat_meat.hpp".
| >
| > (2)
| > If you need to allocate memory for the values[] and row_indices[]
| > arrays, you need to use the memory::acquire_chunked() function.  This
| > is an alternative memory allocation mechanism designed to reduce the
| > burden of inserting or removing a non-zero element.  The rest of the
| > sparse matrix memory management code relies on this.  See the code in
| > file "memory.hpp" for more details.
| >
| >
| > On Sun, Dec 9, 2012 at 5:55 AM, Romain Francois
| > <romain at r-enthusiasts.com> wrote:
| >> Yes. Conrad left a back door open so that we can add things **inside** the
| >> SpMat template, similar to what we have done in dense matrix classes.
| >>
| >> Going from there I was looking at the copy constructor of a SpMat, which
| >> eventually is a call to init:
| >>
| >> /**
| >>   * Copy from another matrix.
| >>   */
| >> template<typename eT>
| >> inline
| >> void
| >> SpMat<eT>::init(const SpMat<eT>& x)
| >>    {
| >>    arma_extra_debug_sigprint();
| >>
| >>    // Ensure we are not initializing to ourselves.
| >>    if (this != &x)
| >>      {
| >>      init(x.n_rows, x.n_cols);
| >>
| >>      // values and row_indices may not be null.
| >>      if (values != NULL)
| >>        {
| >>        memory::release(values);
| >>        memory::release(row_indices);
| >>        }
| >>
| >>      access::rw(values)      = memory::acquire_chunked<eT> (x.n_nonzero + 1);
| >>      access::rw(row_indices) = memory::acquire_chunked<uword>(x.n_nonzero +
| >> 1);
| >>
| >>      // Now copy over the elements.
| >>      arrayops::copy(access::rwp(values),      x.values,      x.n_nonzero +
| >> 1);
| >>      arrayops::copy(access::rwp(row_indices), x.row_indices, x.n_nonzero +
| >> 1);
| >>      arrayops::copy(access::rwp(col_ptrs),    x.col_ptrs,    x.n_cols + 1);
| >>
| >>      access::rw(n_nonzero) = x.n_nonzero;
| >>      }
| >>    }
| >>
| >> So it all looks very doable.
| >>
| >> Romain
| >
| 
| 
| -- 
| Romain Francois
| Professional R Enthusiast
| +33(0) 6 28 91 30 30
| 
| R Graph Gallery: http://gallery.r-enthusiasts.com
| 
| blog:            http://romainfrancois.blog.free.fr
| |- http://bit.ly/RE6sYH : OOP with Rcpp modules
| `- http://bit.ly/Thw7IK : Rcpp modules more flexible
| 
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel at lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel