[Rcpp-devel] Performance question about DataFrame
I agree that this is not a complete implementation; it isn't meant to be, although it might still be worth incorporating into Rcpp with the appropriate fixes in place. For instance, the vector recycling issue is far from the greatest limitation of this code: it handles character vectors wrong. The R routine converts character vectors into factors unless overridden; I wrote the precursor of this particular routine because I wanted to handle strings faithfully, and so ended up writing a stupid R routine to coerce lists of lists of constant length to data frames.

On Thu, Jan 31, 2013 at 3:21 AM, Romain Francois <romain at r-enthusiasts.com> wrote:
On 15/01/13 16:20, John Merrill wrote:

It appears that DataFrame::create is a thin layer on top of the R data.frame call. That guarantees correctness, but it also means that the performance of an Rcpp routine which returns a large data frame is limited by the performance of data.frame -- which is utterly horrible.

In the current version of R, there's a trivial, but borderline evil, workaround: build a list of lists meeting the basic requirements of a data frame (they all need to be of the same length, and each component list needs to be named) and set the class of the object to "data.frame".

I have two questions:
(1) Is it reasonable to anticipate that this hack will continue to work for the near future in R?
(2) If so, would a patch to that effect be of interest to the developers?
The reason we used a callback to data.frame is close to laziness on our part. With the R function, for example, we know that columns of different sizes will be handled properly, with recycling, etc. Just making a named list of vectors is not enough: we have to make sure they all have the same length. Perhaps it would be worth checking this and making better DataFrame::create functions.

Also, you can use a shortcut to assign row names, i.e. mimic this in C++ (the second line contains the magic):
d <- list( x = 1:10, y = 1:10 )
attr( d, "row.names" ) <- c( NA, -10L )
attr( d, "class" ) <- "data.frame"
d
    x  y
1   1  1
2   2  2
3   3  3
4   4  4
5   5  5
6   6  6
7   7  7
8   8  8
9   9  9
10 10 10

Romain

--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
R Graph Gallery: http://gallery.r-enthusiasts.com
blog: http://romainfrancois.blog.free.fr
|- http://bit.ly/RE6sYH : OOP with Rcpp modules
`- http://bit.ly/Thw7IK : Rcpp modules more flexible
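
The approach discussed in this thread (a named list of equal-length vectors, with the "row.names" and "class" attributes set by hand) can be sketched in R roughly as follows. This is only an illustration under stated assumptions, not the routine John describes and not anything in Rcpp: the name list_to_df is hypothetical, no recycling is attempted, and columns of unequal length are simply rejected.

```r
# Hypothetical helper (not part of R or Rcpp): turn a named list of
# equal-length vectors into a data frame without calling data.frame().
list_to_df <- function(cols) {
  stopifnot(is.list(cols), length(cols) > 0L)

  # Every column needs a non-empty name.
  nms <- names(cols)
  if (is.null(nms) || any(is.na(nms)) || any(nms == "")) {
    stop("every column needs a name")
  }

  # data.frame() would recycle shorter columns; this sketch refuses instead.
  n <- unique(lengths(cols))
  if (length(n) != 1L) {
    stop("all columns must have the same length (no recycling is done)")
  }

  # Compact row names: c(NA, -n) stands for 1:n without materialising it.
  attr(cols, "row.names") <- c(NA_integer_, -n)
  attr(cols, "class") <- "data.frame"
  cols
}

d <- list_to_df(list(x = 1:10, y = letters[1:10]))
is.data.frame(d)  # TRUE
class(d$y)        # "character": the strings are left as they are
```

Because the columns are never passed through data.frame(), character vectors stay character vectors; the price is that the caller is responsible for everything data.frame() would otherwise check.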