
[Rcpp-devel] Rcpp can not return big DataFrame

4 messages · Dirk Eddelbuettel, 该走了, Romain Francois

#
Dear Rcpp developer,
  I am trying to return a big DataFrame from Rcpp to R, but I have run into a problem.

### begin dataframetest.cpp

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
DataFrame dataframetest(NumericVector close){
  int nrow = close.size();
  vector<double> txn_qty = vector<double>(nrow);
  vector<double> txn_prc = vector<double>(nrow);
  vector<double> txn_fee = vector<double>(nrow);
  vector<double> pos_qty = vector<double>(nrow);
  vector<double> close_prc = as<vector<double> >(close);
  vector<double> PL = vector<double>(nrow);
  DataFrame PLrecord = DataFrame::create(Named("txn.qty", txn_qty),
                                         Named("txn.prc", txn_prc),
                                         Named("txn.fee", txn_fee),
                                         Named("pos.qty", pos_qty),
                                         Named("close.prc", close_prc),
                                         Named("PL", PL));
  return PLrecord;
}
#### end  dataframetest.cpp

### R code
n <- 4e5
x.prc <- 1:n
library(Rcpp)
sourceCpp("./dataframetest.cpp")
aa <- dataframetest(x.prc)

##### end R code

When n is big, such as 4e5, the call exhausts memory or crashes; when n
is small, such as 4e3, it returns the correct DataFrame. I was wondering
whether Rcpp::DataFrame can handle such a big DataFrame. In my opinion,
n = 4e5 is not big: I can easily create a data.frame that long from R code
without any problem. Why can't Rcpp? Or am I missing something?

### R code
n <- 4e5
x.prc <- rnorm(n)
a <- data.frame(x = x.prc,
                y = x.prc,
                d = x.prc,
                e = x.prc,
                f = x.prc,
                k = x.prc)
head(a)
            x           y           d           e           f           k
1 -0.45145433 -0.45145433 -0.45145433 -0.45145433 -0.45145433 -0.45145433
2 -0.55851370 -0.55851370 -0.55851370 -0.55851370 -0.55851370 -0.55851370
3  0.18209145  0.18209145  0.18209145  0.18209145  0.18209145  0.18209145
4 -0.56092768 -0.56092768 -0.56092768 -0.56092768 -0.56092768 -0.56092768
5  0.25689622  0.25689622  0.25689622  0.25689622  0.25689622  0.25689622
6 -0.04558792 -0.04558792 -0.04558792 -0.04558792 -0.04558792 -0.04558792

#### sessionInfo
sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Rcpp_0.10.3      data.table_1.8.8

loaded via a namespace (and not attached):
[1] compiler_2.15.3 tools_2.15.3
#
Hi,
On 27 March 2013 at 21:19, 该走了 wrote:
| Dear Rcpp developer,
|   I am trying to return a big DataFrame from Rcpp to R, but I have run into a problem.

If you check the list archives you will see that this has been discussed before.

| ### begin dataframetest.cpp
| 
| #include <Rcpp.h>
| using namespace Rcpp;
| using namespace std;
| 
| // [[Rcpp::export]]
| DataFrame dataframetest(NumericVector close){
|   int nrow = close.size();
|   vector<double> txn_qty = vector<double>(nrow);
|   vector<double> txn_prc = vector<double>(nrow);
|   vector<double> txn_fee = vector<double>(nrow);
|   vector<double> pos_qty = vector<double>(nrow);
|   vector<double> close_prc = as<vector<double> >(close);
|   vector<double> PL = vector<double>(nrow);
|   DataFrame PLrecord = DataFrame::create(Named("txn.qty", txn_qty),
|                                          Named("txn.prc", txn_prc),
|                                          Named("txn.fee", txn_fee),
|                                          Named("pos.qty", pos_qty),
|                                          Named("close.prc", close_prc),
|                                          Named("PL", PL));
|   return PLrecord;
| }
| #### end dataframetest.cpp
| 
| ### R code
| n <- 4e5
| x.prc <- 1:n
| library(Rcpp)
| sourceCpp("./dataframetest.cpp")
| aa <- dataframetest(x.prc)
| 
| ##### end R code
| 
| When n is big, such as 4e5, the call exhausts memory or crashes; when n is
| small, such as 4e3, it returns the correct DataFrame. I was wondering whether

I agree. 

But it probably "just" has to do with temp objects, which are co-managed by
R, so this is hard to sort out.

| Rcpp::DataFrame can handle such a big DataFrame. In my opinion, n = 4e5 is not
| big: I can easily create a data.frame that long from R code without any problem.
| Why can't Rcpp? Or am I missing something?

You are welcome to debug it.  Maybe valgrind will help.

Or if you don't want to or can't, just return a list of vectors and call
as.data.frame() on it when you are back in R.

That's what we used to do anyway before we added the convenience wrapping. 
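
The list-based workaround described above might look roughly like the sketch below. It is only an illustration, not a tested fix: the function name dataframetest_list is hypothetical, the column names are taken from the original example, and it assumes the file is compiled with sourceCpp() as before.

```cpp
#include <Rcpp.h>
#include <vector>

// Hypothetical variant of dataframetest() that returns a plain named list
// instead of a DataFrame; the conversion to a data.frame is left to R.
// [[Rcpp::export]]
Rcpp::List dataframetest_list(Rcpp::NumericVector close) {
  int nrow = close.size();

  // Zero-initialised columns, as in the original example.
  std::vector<double> txn_qty(nrow);
  std::vector<double> txn_prc(nrow);
  std::vector<double> txn_fee(nrow);
  std::vector<double> pos_qty(nrow);
  std::vector<double> close_prc = Rcpp::as<std::vector<double> >(close);
  std::vector<double> PL(nrow);

  // Each std::vector<double> is converted to an R numeric vector
  // when it is stored in the list.
  return Rcpp::List::create(
      Rcpp::Named("txn.qty")   = txn_qty,
      Rcpp::Named("txn.prc")   = txn_prc,
      Rcpp::Named("txn.fee")   = txn_fee,
      Rcpp::Named("pos.qty")   = pos_qty,
      Rcpp::Named("close.prc") = close_prc,
      Rcpp::Named("PL")        = PL);
}
```

Back in R, as.data.frame(dataframetest_list(x.prc)) would then build the data.frame on the R side, which is the step Dirk suggests doing there rather than in C++.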

Dirk

#
Hi Dirk,

Thank you for your prompt reply and suggestion. I tried many times:
sometimes I got segfaults, sometimes I got the error message "Error:
error calling the data.frame function", and sometimes the DataFrame was
returned but its elements were not correct.

2013/3/27 Dirk Eddelbuettel <edd at debian.org>
#
Hmm. This does fix the problem:

   DataFrame PLrecord = DataFrame::create(
         Named("txn.qty"  , wrap( txn_qty  ) ),
         Named("txn.prc"  , wrap( txn_prc  ) ),
         Named("txn.fee"  , wrap( txn_fee  ) ),
         Named("pos.qty"  , wrap( pos_qty  ) ),
         Named("close.prc", wrap( close_prc) ),
         Named("PL"       , wrap( PL       ) )
   );

So we might be doing something wrong when copying objects.

On 27/03/13 14:19, 该走了 wrote: