[Rcpp-devel] Passing large data frame

3 messages · R_help Help, Romain Francois, Dirk Eddelbuettel

#
Hi,

I have a question about passing a large data frame into Rcpp. Consider
the following function:

SEXP foo(SEXP myframe) {

    RcppFrame &fr_ref = (RcppFrame &) myframe;
    return R_NilValue;
}

This somehow seems to work without calling a constructor, so it avoids
copying the large data frame into an RcppFrame object. However, the code
is clearly not safe: there is no guarantee that myframe is a data frame.
So my first question is: is there any way to check the type of the input
SEXP? Or is there a better way to do this?

Secondly, I'm wondering why the POSIXct column in my data frame
appears as a double when I pass the data frame as an argument to a
function, or when I read it from the global environment map. Is there
any way to ensure it appears as an RcppDatetime? Thank you.

Robert
#
Hi,

On 14/06/10 05:38, R_help Help wrote:
This code is very wrong; you are just getting lucky with the internal 
representation of RcppFrame.

Consider:

require( Rcpp )
require( inline )

inc <- '
class Foo{
public:
	Foo( SEXP x) : y(5), xx(x) {
		Rprintf( "hello" ) ;
	}
	Foo( ) : y(6), xx(R_NilValue) {
		Rprintf( "hello from default" );
	}

	inline SEXP gety(){
		return IntegerVector::create( y ) ;
	}

private:
	int y  ;
	SEXP xx ;

} ;
'
code <- '
	Foo& foo = (Foo&) x ;
	return foo.gety() ;
'

df <- data.frame( x = 1:5, y = 1:5 )
fx <- cxxfunction( signature( x = "data.frame" ), code, include = inc, 
plugin = "Rcpp" )

I get :

 > fx( df )
[1] 35966160
 > fx( df )
[1] 35966160


With the C++ cast "static_cast", the compiler would report the error:

file10d63af1.cpp: In function ‘SEXPREC* file10d63af1(SEXPREC*)’:
file10d63af1.cpp:49: error: invalid static_cast from type ‘SEXPREC*’ to type ‘Foo&’
make: *** [file10d63af1.o] Error 1

ERROR(s) during compilation: source code errors or compiler 
configuration errors!
It is more than "not safe", it is just plain wrong.
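
To make the failure concrete, here is a minimal standalone sketch in plain C++ (no R or Rcpp headers). OpaqueRecord, Handle, can_static_cast, leading_bytes_of_handle and make_foo are names made up for this illustration; OpaqueRecord and Handle only stand in for SEXPREC and SEXP. The C-style cast compiles because it silently falls back to a reinterpret_cast, which treats the storage of the pointer variable itself as a Foo. Under that reading, y is most likely just part of the pointer's address, which would be consistent with fx( df ) returning the same large number twice, though that interpretation is an assumption and not something checked against the original session.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins: OpaqueRecord plays the role of SEXPREC and Handle
// the role of SEXP (a plain pointer). Neither is a real R or Rcpp type.
struct OpaqueRecord { int payload; };
using Handle = OpaqueRecord*;

// Same layout idea as the Foo class in the message: an int, then a handle.
struct Foo {
    int y;
    Handle xx;
};

// Compile-time check: is static_cast<To>(From) well-formed?
template <class From, class To, class = void>
struct can_static_cast : std::false_type {};

template <class From, class To>
struct can_static_cast<From, To,
                       decltype(void(static_cast<To>(std::declval<From>())))>
    : std::true_type {};

// static_cast refuses the conversion that the C-style cast lets through.
static_assert(!can_static_cast<Handle&, Foo&>::value,
              "a handle is not a Foo, so static_cast must reject this");
static_assert(can_static_cast<Handle&, void*>::value,
              "sanity check: a legitimate pointer conversion is accepted");

// Well-defined approximation of what ((Foo&) h).y reads: the first few bytes
// of the pointer variable itself, not anything inside the pointed-to object.
std::uint32_t leading_bytes_of_handle(Handle h) {
    static_assert(sizeof(std::uint32_t) <= sizeof(Handle), "handle too small");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &h, sizeof bits);
    return bits;
}

// The correct route: construct a real Foo that wraps the handle.
// Only the pointer is stored, so the pointed-to data is not copied.
Foo make_foo(Handle h) {
    assert(h != nullptr);  // validate the input before wrapping it
    return Foo{5, h};
}
```

Calling make_foo on a valid handle gives an object whose y is 5, as the constructor intends, while leading_bytes_of_handle returns an address-dependent value that has nothing to do with y.
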
RcppFrame is a class of what we call the "classic" api, which indeed is 
largely inefficient because it copies data all the time.

The new api, and in particular the class Rcpp::DataFrame is much more 
efficient. For example the constructor

Rcpp::DataFrame( SEXP )

will not make a copy of the SEXP you pass in.

You can find example code of Rcpp::DataFrame in the unit test:

 > system.file( "unitTests", "runit.DataFrame.R", package = "Rcpp" )
Someone else will pick this up.
#
Robert,
On 14 June 2010 at 12:36, Romain Francois wrote:
| On 14/06/10 05:38, R_help Help wrote:
| [...]
| > Secondly, I'm wondering why the POSIXct column in my data frame
| > appears as double when I pass a data frame as an argument into a
| > function or when I read it out from global environment map? Is there
| > anyway to ensure it appears as RcppDatetime? Thank you.
| >
| > Robert
| 
| Someone else will pick this up.

a) POSIXct really is a double and nothing more, so you could re-create a 
   RcppDatetimeVector from the double vector -- no information is lost

b) RcppFrame is a data structure for _creating data frame in C++ for return_
   rather than for retrieving a data frame from R

c) As Romain said, you are better off with Rcpp::DataFrame anyway

d) But that class (and the new API in general) does not have a datetime class
   yet, so see point a) 

I have been mulling over what to do about a simple datetime class. So
far, I haven't needed one (comparisons between doubles work fine) so I had no
real motivation. Eventually we should have one.  For now you can just use
doubles and/or the old class.
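
As a rough illustration of point a) and of "just use doubles", here is a small sketch in plain C++ with no Rcpp involved. It assumes the POSIXct value is a double holding seconds since 1970-01-01 00:00:00 UTC, with any sub-second part in the fraction. format_posixct_utc and is_before are hypothetical helper names, not part of Rcpp or the classic API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <ctime>
#include <string>

// Format a POSIXct-style double (seconds since the Unix epoch, UTC) as
// "YYYY-MM-DD HH:MM:SS.uuuuuu". Returns an empty string if the value
// cannot be represented by the C library.
std::string format_posixct_utc(double posixct) {
    assert(std::isfinite(posixct));  // NA/NaN/Inf must be handled by the caller

    // Split into whole seconds and the fractional part (floor keeps the
    // fraction non-negative for times before the epoch as well).
    double whole = std::floor(posixct);
    std::time_t secs = static_cast<std::time_t>(whole);
    int micros = static_cast<int>((posixct - whole) * 1e6);

    std::tm* parts = std::gmtime(&secs);  // not thread-safe; copy immediately
    if (parts == nullptr) {
        return std::string();
    }
    std::tm utc = *parts;

    char buf[64];
    std::snprintf(buf, sizeof buf, "%04d-%02d-%02d %02d:%02d:%02d.%06d",
                  utc.tm_year + 1900, utc.tm_mon + 1, utc.tm_mday,
                  utc.tm_hour, utc.tm_min, utc.tm_sec, micros);
    return std::string(buf);
}

// Ordering two timestamps needs no datetime class: compare the doubles.
bool is_before(double a, double b) {
    return a < b;
}
```

For example, format_posixct_utc(86400.5) yields "1970-01-02 00:00:00.500000", and is_before(0.0, 86400.5) is true. Time zones and the POSIXct class attribute are deliberately left out of this sketch.
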