
[Rcpp-devel] getting ncol(DF) in Rcpp

3 messages · Douglas Bates, Silkworth,David J.

#
You guys know I am here just to give you a chuckle. 
 
I wanted to build a function passing just a dataframe to Rcpp.  In order
to use this dataframe, I need to know how many columns it has at
runtime.  My attempts at getting this ncol information were thwarted on
several counts.  The Dimension class appears to only work on STL
containers, which Rcpp::DataFrame is not.  I resorted to the Environment
facility to attempt a feeble-minded RInside (since I can't understand
RInside anyway).

Environment base("package:base");
Function ncol = base["ncol"];
Rcpp::NumericVector test(1);
test[0]=ncol(myDF);

This fails to compile with the following error:
error: cannot convert 'SEXPREC*' to
'Rcpp::traits::storage_type<14>::type'

However, just short of sending another single element vector with this
information as an argument to Rcpp I tried the following, AND IT WORKED!

(My debug technique is to send items back to R for inspection.  This is
just some test code to show that an integer value of myNames.size() will
be useful as a proxy for ncol(DF) in further code development.)

src <- '
Rcpp::DataFrame myDF=(arg1);
Environment base("package:base");
Function names = base["names"];
Rcpp::CharacterVector myNames(names(myDF));
Rcpp::NumericVector ncol(1);
ncol[0]=myNames.size();
return(ncol);
'

 fun <- cxxfunction(signature(arg1 = "numeric"),
 src, plugin = "Rcpp")

vec1<-rep(5,5)
vec2<-c(1:5)
DF<-data.frame(vec1,vec2)
test<-fun(DF)

Okay, how's that for a laugher.

In my real case I am using the same dataframe that I needed to clean up
in my 'redimension' chain.  My solution there works quite fine.  Now I
have yet to decompose this dataframe back into vectors and a matrix to
enable entries to be accessed in Rcpp.  But at least I have the
dimensions for the matrix now.

It takes about 3 seconds for R to extract a matrix based on
DF[,3:ncol(DF)] on a dataframe with 46,000 rows.  I am counting on Rcpp
code to execute this more efficiently.  One could argue that I should
never have left Rcpp in the first place.  But that is another story.
#
A data.frame in R is a curious object that is really a list of the columns.  So

myDF.size()

returns the number of columns.

Try the enclosed R source file.

On Mon, Jun 27, 2011 at 12:30 PM, Silkworth,David J.
<SILKWODJ at airproducts.com> wrote:
-------------- next part --------------
library(inline)
library(Rcpp)
src <- '
   Rcpp::DataFrame  foo(foo_);
   return wrap(foo.size());
'
ff <- cxxfunction(signature(foo_ = "data.frame"), src, "Rcpp")
ncol(datasets::trees)
ff(datasets::trees)
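
The "list of columns" idea can also be sketched in plain C++, without Rcpp or R. This is only an illustrative model: `Column`, `Frame`, `ncol`, `nrow` and `columns_to_matrix` are hypothetical names invented for this sketch and are not part of Rcpp. It shows why the length of the list is the column count, and how trailing columns could be copied into a column-major matrix buffer, roughly what DF[, 3:ncol(DF)] asks for.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a data frame: a list of equally long numeric columns.
struct Column {
    std::string name;
    std::vector<double> values;
};
using Frame = std::vector<Column>;

// The number of columns is simply the length of the list.
std::size_t ncol(const Frame& df) { return df.size(); }

// The number of rows is the length of the first column (0 for an empty frame).
std::size_t nrow(const Frame& df) {
    return df.empty() ? 0 : df.front().values.size();
}

// Copy columns [first, ncol) into one column-major buffer.
// `first` is 0-based here, unlike R's 1-based indexing.
std::vector<double> columns_to_matrix(const Frame& df, std::size_t first) {
    if (first > ncol(df)) {
        throw std::out_of_range("first column index is past the last column");
    }
    std::vector<double> out;
    out.reserve((ncol(df) - first) * nrow(df));
    for (std::size_t j = first; j < ncol(df); ++j) {
        out.insert(out.end(), df[j].values.begin(), df[j].values.end());
    }
    return out;
}
```

For a frame built like the earlier example (vec1 holding five 5s, vec2 holding 1 to 5), ncol gives 2, nrow gives 5, and columns_to_matrix(df, 1) returns just the five values of vec2.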
#
I figured that getting the ncol(DF) information would be something simpler than I resorted to.

As it turned out, I had confused the time it took to convert the dataframe to a matrix with the time spent running it through Excel, using RExcel.  In the R console the conversion was momentary, even for the 43,000-line dataframe.  No matter how I tried to approach this, the conversion was necessary, and R could not be beaten at it.  Then, with access to the matrix math functions in R and some use of sapply, everything I wanted to do ended up best done in R.

Just goes to show, if you can avoid an explicit loop in R, there is a chance that even the interpreted language can do you many favors.

-----Original Message-----
From: dmbates at gmail.com [mailto:dmbates at gmail.com] On Behalf Of Douglas Bates
Sent: Monday, June 27, 2011 1:47 PM
To: Silkworth,David J.
Cc: rcpp-devel at r-forge.wu-wien.ac.at
Subject: Re: [Rcpp-devel] getting ncol(DF) in Rcpp
