[Rcpp-devel] Joining each row of CharacterMatrix to return a CharacterVector?

7 messages · Dirk Eddelbuettel, Steve Lianoglou, Romain Francois +1 more

#
I preface this by stating that I'm very much a Rcpp beginner who is comfortable in R but I've never before used C++. I'm working through the Rcpp documentation but haven't been able to answer my question.

I've written an Rcpp (v0.10.1) function f that takes as input a CharacterMatrix X. X has 20 million rows and 100 columns. For each row of X the function alters certain entries of that row according to rules governed by some other input variables. f returns the updated version of X. This function works as I'd like it to: 
# a toy example with nrow = 2, ncol = 2
> X <- matrix('A', ncol = 2, nrow = 2)
> X
     [,1] [,2]
[1,] "A"  "A" 
[2,] "A"  "A" 
> X <- f(X, other_input_variables)
> X
     [,1] [,2]
[1,] "Z"  "A" 
[2,] "z"  "A" 

However, instead of f returning a CharacterMatrix as it currently does, I'd like to return a CharacterVector Y, where each element of Y is a "collapsed" row of the updated X.

I can achieve the desired result in R by using: 
Y <- apply(X=X, MARGIN = 1, FUN = function(x){paste0(x, collapse = '')})
> Y
[1] "ZA" "zA"

but I wondered whether this "joining" is likely to be more efficiently performed within my function f? If so, how do I join the 100 individual character entries of a row of the CharacterMatrix X into a single string that will then comprise an element of the returned CharacterVector Y?

Many thanks,
Pete
--------------------------------
Peter Hickey,
PhD Student/Research Assistant,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Ph: +613 9345 2324

hickey at wehi.edu.au
http://www.wehi.edu.au


#
Hi Pete,
On 11 December 2012 at 09:43, hickey at wehi.EDU.AU wrote:
| I preface this by stating that I'm very much a Rcpp beginner who is comfortable
| in R but I've never before used C++. I'm working through the Rcpp documentation
| but haven't been able to answer my question.
| 
| I've written an Rcpp (v0.10.1) function f that takes as input a CharacterMatrix
| X. X has 20 million rows and 100 columns. For each row of X the function alters
| certain entries of that row according to rules governed by some other input
| variables. f returns the updated version of X. This function works as I'd like
| it to: 
| # a toy example with nrow = 2, ncol = 2
| > X <- matrix('A', ncol = 2, nrow = 2)
| > X
|      [,1] [,2]
| [1,] "A"  "A" 
| [2,] "A"  "A" 
| > X <- f(X, other_input_variables)
| > X
|      [,1] [,2]
| [1,] "Z"  "A" 
| [2,] "z"  "A" 
| 
| However, instead of f returning a CharacterMatrix as it currently does, I'd
| like to return a CharacterVector Y, where each element of Y is a "collapsed"
| row of the updated X.
| 
| I can achieve the desired result in R by using: 
| Y <- apply(X=X, MARGIN = 1, FUN = function(x){paste0(x, collapse = '')}) 
| > Y
| [1] "ZA" "zA"
| 
| but I wondered whether this "joining" is likely to be more efficiently
| performed within my function f? If so, how do I join the 100 individual
| character entries of a row of the CharacterMatrix X into a single string that
| will then comprise an element of the returned CharacterVector Y?

Ah, the joy of working with character strings/vectors/pointers :)  

You certainly can. And there will be a lot of old, bad, ... tutorials out
there.  I can't right now think of a good tutorial to point you to -- other
than the perennial "C++ Annotations" by Brokken which is at the same time
good, current, up-to-date and free (!!) -- so maybe you should continue with
the little 2 x 2 and 3 x 3 examples:

 i)   loop over a row, first init the target string to be ""
 ii)  assign each element of the matrix to a string
 iii) append, which can be as easy as using the   +   for two strings
 iv)  accumulate the result strings in a vector of strings

That should work, does not require pointers, free, malloc, ...  You can
optimize later.
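
Steps i) to iv) can be sketched in plain C++ with std::string and std::vector, without any Rcpp types. This is only an illustration under assumptions: join_rows is a hypothetical name, and the nested std::vector stands in for the CharacterMatrix. Inside an Rcpp function the same two loops would run over the matrix itself.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): joins the cells of each row
// into a single string and returns one string per row.
std::vector<std::string> join_rows(
        const std::vector<std::vector<std::string>>& rows) {
    std::vector<std::string> out;
    out.reserve(rows.size());
    for (const auto& row : rows) {
        std::string joined = "";              // i)   init the target string
        for (const std::string& cell : row) { // ii)  each element as a string
            joined += cell;                   // iii) append
        }
        out.push_back(joined);                // iv)  accumulate the results
    }
    return out;
}
```

Called on the updated 2 x 2 toy matrix, given as the rows {"Z", "A"} and {"z", "A"}, this returns the two strings "ZA" and "zA".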

Hope this helps,  Dirk
#
Hi,
On Mon, Dec 10, 2012 at 5:43 PM, <hickey at wehi.edu.au> wrote:
You can do it more (speed) efficiently in R, too, if memory is no
object, since you can just R-loop over the far fewer columns:

R> X <- matrix(c("Z", "z", "A", "A"), nrow=2)
R> Y <- do.call(paste0, lapply(1:ncol(X), function(i) X[,i]))
R> Y
[1] "ZA" "zA"

but doing it in C(++) will definitely be more memory efficient, and
likely speed efficient, too, so it will be a good exercise, and for
that Dirk has given you a good head start :-)
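
The same column-at-a-time order can be kept in C++. R stores a matrix column by column, so element (i, j) of an nrow x ncol matrix sits at position i + j * nrow of the underlying vector. The sketch below assumes the cells arrive as a flat std::vector in that layout; paste_columns_colmajor is a hypothetical name, and no claim is made here about how it compares in speed to a row-by-row loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp). `cells` holds an nrow x ncol
// matrix in column-major order: element (i, j) is cells[i + j * nrow].
std::vector<std::string> paste_columns_colmajor(
        const std::vector<std::string>& cells,
        std::size_t nrow, std::size_t ncol) {
    std::vector<std::string> out(nrow);       // one empty string per row
    for (std::size_t j = 0; j < ncol; ++j) {  // outer loop over the columns
        for (std::size_t i = 0; i < nrow; ++i) {
            out[i] += cells[i + j * nrow];    // append column j to row i
        }
    }
    return out;
}
```

With the cells "Z", "z", "A", "A" and nrow = ncol = 2, which is the matrix from the R example above, the result is "ZA" and "zA".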

HTH,
-steve
#
Thanks very much, Dirk and Steve. 

Always slightly fear-inducing when someone starts their reply with "Ah, the joy of working with X" :) I'll have a go at implementing your suggestion on my two examples, Dirk. 

I think learning more about Rcpp will become my Christmas-holiday project. It's already saved me buckets of computational time in this past week and that's without even really knowing what I'm doing :)
Pete
#
Hello,

We don't know your function f, so this is hard to say. Anyway, the code 
below implements something similar to apply(., 1, paste0) in Rcpp (current 
devel version):

#include <Rcpp.h>
using namespace Rcpp ;

// Paste together the entries of each row of m; returns one string per row.
// [[Rcpp::export]]
CharacterVector pasteColumns(CharacterMatrix m){
     String buffer ;                          // reused for every row
     int nc = m.ncol(), nr = m.nrow() ;
     CharacterVector out(nr) ;
     for( int i=0; i<nr; i++){
         CharacterMatrix::Row row = m(i,_) ;  // view of row i
         buffer = "" ;                        // reset before each row
         for( int j=0; j<nc; j++){
             buffer += row[j] ;               // append column j of row i
         }
         out[i] = buffer ;
     }
     return out ;
}

With this, I get these timings:

     nc <- 100; nr <- 2e4
     M <- matrix( sample(letters, nc*nr, replace = TRUE) , ncol = nc )

     require(microbenchmark)
     microbenchmark(
         pasteColumns(M),
         apply(M, 1, paste0)
         )
     Unit: milliseconds
                      expr       min        lq    median        uq      max
     1 apply(M, 1, paste0) 451.39975 484.41435 495.92757 501.58728 714.1418
     2     pasteColumns(M)  67.91322  68.29269  70.34704  77.09383 145.9161



#
Or (from svn rev 4144), you can use the collapse function:

// [[Rcpp::export]]
CharacterVector pasteColumns2(CharacterMatrix m){
     int nr = m.nrow() ;
     CharacterVector out(nr) ;
     for( int i=0; i<nr; i++)
         out[i] = collapse( m(i,_) ) ;
     return out ;
}
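
For readers without the devel version, the idea of a collapse step can be sketched in plain C++. This is not Rcpp's implementation, which is not shown in this thread; collapse_strings is a hypothetical name. The sketch also reserves the output size up front, one possible way to "optimize later".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Rcpp's collapse): concatenates all parts
// into a single string, in order, with no separator.
std::string collapse_strings(const std::vector<std::string>& parts) {
    std::size_t total = 0;
    for (const auto& p : parts) {
        total += p.size();       // first pass: final length
    }
    std::string out;
    out.reserve(total);          // reserve once so appends do not regrow
    for (const auto& p : parts) {
        out += p;                // second pass: append each part
    }
    return out;
}
```

Applied to one row of the toy example, the parts "Z" and "A" give "ZA"; an empty row gives an empty string.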

Romain

#
Romain -

Wonderful! Thank you so much for your examples. I'll add this to my function tomorrow.
Pete