
[Rcpp-devel] Performance/memory management question

I have an initial 10^5 x 20 matrix of observations for a set of individuals (one column of the matrix holds the individual ID).
I want to sample the unique individuals with replacement and get back all observations of each sampled individual (with possible repetitions).
Simplified example: m(5, 2)
Given m:
Ind  Obs
1    3.4
1    3.6
2    5
3    6
4    7

resample(m) may give
1 3.4
1 3.6
2 5
1 3.4
1 3.6
1 3.4
1 3.6
if 1 2 1 1 were sampled from the individuals 1 2 3 4.

I'm trying to do it via Rcpp; here is some code:

// [[Rcpp::export]]
void resample(NumericMatrix mat) {

    // Copy the individual IDs (first column) into an integer vector
    int nrow = mat.nrow();
    IntegerVector d1(nrow);
    for (int i = 0; i < nrow; i++) {
        d1[i] = mat(i, 0);
    }
    std::cout << "Number of elements in mat: " << d1.length() << std::endl;

    // Map each individual ID to its rows
    std::multimap<int, NumericVector> m;
    for (int i = 0; i < nrow; i++) {
        NumericVector d = mat.row(i);
        m.insert(std::pair<int, NumericVector>(d1[i], d));
    }

    // Create vector of deduplicated entries:
    std::set<int> keys_dedup;
    for (int i = 0; i < nrow; ++i) keys_dedup.insert(d1[i]);
    std::cout << "Number of elements in set : " << keys_dedup.size() << std::endl;
    std::vector<int> vec;
    vec.assign(keys_dedup.begin(), keys_dedup.end());
    std::cout << "Number of elements in vec : " << vec.size() << std::endl;

    // Sample among the unique keys
    Engine eng;  // Engine is defined elsewhere (not shown)
    eng.seed((unsigned int) 123);
    std::tr1::uniform_int<int> unif(0, vec.size() - 1);
    std::list<NumericVector> samples;
    for (int i = 0; i < vec.size(); ++i) {
        int u = unif(eng);
        std::cout << u << " : " << vec[u] << std::endl;

        // Append every row belonging to the sampled individual
        std::pair<std::multimap<int, NumericVector>::iterator,
                std::multimap<int, NumericVector>::iterator> ret =
                m.equal_range(vec[u]);
        for (std::multimap<int, NumericVector>::iterator it = ret.first;
                it != ret.second; ++it) {
            samples.push_back(it->second);
        }
    }
    std::cout << "Number of elements in samples : " << samples.size() << std::endl;

    //    NumericMatrix matR(samples.size(), mat.ncol());
    //        for (int i = 0; i < samples.size(); ++i) {
    //            matR.row(i) = Rcpp::as(samples[i]);
    //        }
    //    return matR;
}
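
For comparison, the resampling described above (draw individuals with replacement, then emit every row of each drawn individual) can be sketched in plain C++ by collecting row indices instead of copies of the rows. This is only an illustration under assumptions: it uses C++11 <random> rather than std::tr1, does not use Rcpp at all, and the helper name resample_rows is hypothetical. Random draws will not match those of the code above.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <vector>

// Hypothetical helper (not from the post): ids[i] is the individual ID
// stored in row i; the result lists the row indices of the resampled matrix.
std::vector<std::size_t> resample_rows(const std::vector<int>& ids,
                                       unsigned int seed) {
    // Group row indices by individual; std::map keeps keys unique and sorted.
    std::map<int, std::vector<std::size_t> > rows_by_id;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        rows_by_id[ids[i]].push_back(i);
    }

    std::vector<std::size_t> out;
    if (rows_by_id.empty()) return out;

    // Pointers to each individual's row list, so a draw is an O(1) lookup.
    // Map nodes do not move, so these pointers stay valid.
    std::vector<const std::vector<std::size_t>*> groups;
    for (const auto& kv : rows_by_id) groups.push_back(&kv.second);

    std::mt19937 eng(seed);
    std::uniform_int_distribution<std::size_t> unif(0, groups.size() - 1);

    // Draw as many individuals as there are unique ones, with replacement,
    // and append all row indices of each drawn individual.
    for (std::size_t k = 0; k < groups.size(); ++k) {
        const std::vector<std::size_t>& g = *groups[unif(eng)];
        out.insert(out.end(), g.begin(), g.end());
    }
    return out;
}
```

With the simplified example, ids would be {1, 1, 2, 3, 4} and the returned indices select between 4 and 8 rows of m. The result matrix can then be filled in one pass from the index vector, so no per-row vector objects need to be stored in a container.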

I have a performance-related question:
m is a 10^5 x 20 matrix.
If I submit: system.time(m <- resample(m))

I see:
Number of elements in mat: 100000
Number of elements in set : 939
Number of elements in vec : 939
Number of elements in samples : 99008   !!!! (here in the console it takes less than 1 sec to get there)
   user  system elapsed
 38.531   0.004  38.631


I would like to know, if possible, how to reduce the 38 seconds between the last std::cout (in the C++ code) and the end of execution in R.
Could this be due to memory management/garbage collection, given that I see the last cout in less than 1 sec in the R console?
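
One way to check whether the time goes into cleanup after the last std::cout is to time the container teardown separately from the fill. The sketch below only shows the measurement pattern: it uses std::vector<double> as a stand-in for NumericVector, so it does not reproduce Rcpp's behaviour, and the names ms_since, Timings and time_build_and_teardown are hypothetical.

```cpp
#include <chrono>
#include <cstddef>
#include <list>
#include <vector>

typedef std::chrono::steady_clock Clock;

// Hypothetical helper: milliseconds elapsed since `start`.
double ms_since(Clock::time_point start) {
    return std::chrono::duration<double, std::milli>(Clock::now() - start)
        .count();
}

struct Timings {
    double build_ms;     // time spent filling the container
    double teardown_ms;  // time spent destroying it at scope exit
};

Timings time_build_and_teardown(std::size_t n_rows, std::size_t n_cols) {
    Timings t;
    Clock::time_point t0 = Clock::now();
    Clock::time_point t1;
    {
        // Stand-in for std::list<NumericVector> samples
        std::list<std::vector<double> > samples;
        for (std::size_t i = 0; i < n_rows; ++i) {
            samples.push_back(std::vector<double>(n_cols, 0.0));
        }
        t.build_ms = ms_since(t0);
        t1 = Clock::now();
    }  // samples is destroyed here
    t.teardown_ms = ms_since(t1);
    return t;
}
```

Wrapping the multimap and the list of the real function in an inner scope like this, and printing the two timings, would show whether the 38 seconds are spent when those containers go out of scope.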

Please advise
Toki