Skip to content

[Rcpp-devel] Returning a named (and classed?) list from an RcppExport SEXP function

8 messages · Douglas Bates, Dirk Eddelbuettel, Romain Francois

#
I have looked at your (i.e. Dirk and Roman's) submitted Rjournal
article on Rcpp and decided that I should take the plunge.

To get my feet wet I am starting with a relatively simple example
interfacing to (God help us) 1960's style Fortran written by Mike
Powell and incorporated in the minqa package.

The value to be returned is a named list of heterogeneous values.  I
can see how to construct such a list in the classic Rcpp API using a
RcppResultSet object but I don't know the best way to accomplish this
in the more modern Rcpp namespace.  Can someone point me in the right
direction?  Also, is there a convenient idiom for attaching an S3
class name to a list to be returned?

An unrelated question -- has anyone looked at accessing S4 classed
objects in Rcpp?  I have had considerable experience with S4 classes
and objects and could keep such a project in mind as I begin to
explore Rcpp.

Yet another question, if i may.   Does coercion take place in
constructing an object like

Rcpp::NumericVector aa(a);

where a is an SEXP, say an SEXP argument to a C++ function, where I
accidentally passed an integer vector instead of a double vector?  If
so, what happens if there is no coercion?  Are the contents of the
SEXP copied to newly allocated storage or not?

The reason that I ask is because there are some places in the lme4
code where I want to make sure that I have a pointer to the contents
of the original vector because I am want to change the contents
without copying.
#
Hi Doug,

Welcome! We're honoured to have you here. And glad you picked the list.
On 12 March 2010 at 10:14, Douglas Bates wrote:
| I have looked at your (i.e. Dirk and Roman's) submitted Rjournal
| article on Rcpp and decided that I should take the plunge.
| 
| To get my feet wet I am starting with a relatively simple example
| interfacing to (God help us) 1960's style Fortran written by Mike
| Powell and incorporated in the minqa package.
| 
| The value to be returned is a named list of heterogeneous values.  I
| can see how to construct such a list in the classic Rcpp API using a
| RcppResultSet object but I don't know the best way to accomplish this
| in the more modern Rcpp namespace.  Can someone point me in the right
| direction?  Also, is there a convenient idiom for attaching an S3
| class name to a list to be returned?

Yes. See eg the examples I put on my blog, eg inside of RcppArmadillo we
simply do (including actual fastLm source here modulu two ifdefs dealing with
older versions -- which means you need Armadillo 0.9.0 or later)

#include <RcppArmadillo.h>

extern "C" SEXP fastLm(SEXP ys, SEXP Xs) {

    Rcpp::NumericVector yr(ys);			// creates Rcpp vector from SEXP
    Rcpp::NumericMatrix Xr(Xs);			// creates Rcpp matrix from SEXP
    int n = Xr.nrow(), k = Xr.ncol();

    arma::mat X(Xr.begin(), n, k, false);   	// reuses memory and avoids extra copy
    arma::colvec y(yr.begin(), yr.size(), false);

    arma::colvec coef = solve(X, y);            // fit model y ~ X

    arma::colvec resid = y - X*coef; 		// residuals

    double sig2 = arma::as_scalar( trans(resid)*resid/(n-k) );
    						// std.error of estimate 
    arma::colvec stderrest = sqrt( sig2 * diagvec( arma::inv(arma::trans(X)*X)) );

    Rcpp::Pairlist res(Rcpp::Named( "coefficients", coef),
                       Rcpp::Named( "stderr", stderrest));
    return res;
}

[ By the way, I still think I should try to implement all four of your "how
to do lm" solutions from the older R News / R Journal piece in Armadillo one
day :-)   And see what the timing impacts would be. ]

That works "on the fly" with Pairlist and named.  We also have Rcpp::List
examples, but AFAICR you need to 'predeclare' how long your list is and the
set each element.  The 200+ unit tests will have examples. Better docs will
appear one day...

| An unrelated question -- has anyone looked at accessing S4 classed
| objects in Rcpp?  I have had considerable experience with S4 classes
| and objects and could keep such a project in mind as I begin to
| explore Rcpp.

I'd be happy to go there (time permitting) but very gently as I am still much
more firmly in S3 land.  Romain may be more eager :)
 
| Yet another question, if i may.   Does coercion take place in
| constructing an object like
| 
| Rcpp::NumericVector aa(a);
| 
| where a is an SEXP, say an SEXP argument to a C++ function, where I
| accidentally passed an integer vector instead of a double vector?  If

Very much so.  Again, see my blog and the RcppVectorExample code.  Int in,
Double out. All magic hidden.

And of course you can just try it:

R> library(RcppExamples)
Loading required package: Rcpp
R> RcppVectorExample(c(1,4,9), api="new")
$result
[1] 1 2 3

$original
[1] 1 4 9

R> RcppVectorExample(c(2,5,10), api="new")
$result
[1] 1.414 2.236 3.162

$original
[1]  2  5 10

R> 

At this level, behaviour is the same with 'classic' API but we know of at
least one case where 'classic' doesn't cast a Matrix right and 'new' does.

| so, what happens if there is no coercion?  Are the contents of the
| SEXP copied to newly allocated storage or not?

It depends. In the new API we do fewer copies.
 
| The reason that I ask is because there are some places in the lme4
| code where I want to make sure that I have a pointer to the contents
| of the original vector because I am want to change the contents
| without copying.

I think we do that. If you look at the vignette and the timing example -- but
using pointer accessors (eg double *x = foo.begin()) and pointer ops (eg x++)
we get essentially the same speed as pure and ugly C code.  Because we do
away with the copy, and the operator[], for purest speed.

Hope that answer the first batch of questions.  Keep'em coming!

Dirk
#
On Fri, Mar 12, 2010 at 10:47 AM, Dirk Eddelbuettel <edd at debian.org> wrote:
Nice approach.  I presume that the Pairlist object is converted to an
VECSXP in the process of being packaged for return.  I will play
around a bit with expressions like

Rcpp::List res(Rcpp::PairList(Rcpp::Named(...), ...))

I would like to have the capability of attaching an attribute named
"class" to the VECSXP before returning it.
One of the nice properties of S4 classed objects is that you can
define validity checks for classes and bypass some of the code that
does validity checking of arguments.
Perhaps you are assuming that c(1,4,9) generates an integer vector in
this example.  In fact, c(1,4,9) is a numeric (in the sense of double
precision) vector.  You need to write c(1L, 4L, 9L) to ensure you have
an integer vector or coerce it with as.integer().
Yes, I like that idea of being able to use foo.begin() to access the
pointer.  Part of my question related to whether I could count on
getting the pointer to the original contents from R.  In some way I
would need to check whether the R object had been coerced, and hence
allocated new storage for its contents, or not.  That's not an
immediate problem, though.

Thanks for the response.
#
On 12 March 2010 at 11:23, Douglas Bates wrote:
| Nice approach.  I presume that the Pairlist object is converted to an
| VECSXP in the process of being packaged for return.  I will play
| around a bit with expressions like
| 
| Rcpp::List res(Rcpp::PairList(Rcpp::Named(...), ...))

Not sure if that works with Rcpp::List as a returned object.
 
| I would like to have the capability of attaching an attribute named
| "class" to the VECSXP before returning it.

We do see attr() on RObject types as a quick grep on unitTests/* revelas
 
| > | An unrelated question -- has anyone looked at accessing S4 classed
| > | objects in Rcpp? ?I have had considerable experience with S4 classes
| > | and objects and could keep such a project in mind as I begin to
| > | explore Rcpp.
| >
| > I'd be happy to go there (time permitting) but very gently as I am still much
| > more firmly in S3 land. ?Romain may be more eager :)
| 
| One of the nice properties of S4 classed objects is that you can
| define validity checks for classes and bypass some of the code that
| does validity checking of arguments.

I include the S4 unit test file below. That's the closest we've come to docs :) 
 
| Perhaps you are assuming that c(1,4,9) generates an integer vector in
| this example.  In fact, c(1,4,9) is a numeric (in the sense of double
| precision) vector.  You need to write c(1L, 4L, 9L) to ensure you have
| an integer vector or coerce it with as.integer().

Yes, I am still trying to get used to that. So in that sense my test will
have been less rigid.  In this particular example it does work for both class
and new without fail when I submit c(1L, 4L, 9L)
 
| Yes, I like that idea of being able to use foo.begin() to access the
| pointer.  Part of my question related to whether I could count on
| getting the pointer to the original contents from R.  In some way I
| would need to check whether the R object had been coerced, and hence
| allocated new storage for its contents, or not.  That's not an
| immediate problem, though.

A fairly firm 'yes'. The classic API did more copying, the new API does a lot
more SEXP preservation.  You should be good.

Dirk

inst/unitTests/runit.S4.R below:


#!/usr/bin/r -t
#
# Copyright (C) 2010	Dirk Eddelbuettel and Romain Francois
#
# This file is part of Rcpp.
#
# Rcpp is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 2 of the License, or
# (at your option) any later version.
#
# Rcpp is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with Rcpp.  If not, see <http://www.gnu.org/licenses/>.

.setUp <- function(){
	suppressMessages( require( inline ) )
}

test.S4 <- function(){
	funx <- cfunction(signature(x = "ANY" ), '
	RObject y(x) ;
	List res(5) ;
	res[0] = y.isS4() ;
	res[1] = y.hasSlot("x") ;
	res[2] = y.hasSlot("z") ;
	res[3] = y.slot("x") ;
	res[4] = y.slot("y") ;
	return res ;
	', Rcpp=TRUE, verbose=FALSE, includes = "using namespace Rcpp;" )
	setClass("track",
           representation(x="numeric", y="numeric"))
	tr <- new( "track", x = 2, y = 2 )
	checkEquals( funx(tr),
		list( TRUE, TRUE, FALSE, 2.0, 2.0 )
	, msg = "slot management" )
	
	funx <- cfunction(signature(x = "ANY" ), '
	RObject y(x) ;
	y.slot( "x" ) = 10.0 ;
	y.slot( "y" ) = 20.0 ;
	return R_NilValue ;
	', Rcpp=TRUE, verbose=FALSE, includes = "using namespace Rcpp;" )
	funx( tr )
	checkEquals( tr at x, 10.0 , msg = "slot('x') = 10" )
	checkEquals( tr at y, 20.0 , msg = "slot('y') = 20" )
	
	funx <- cfunction(signature(x = "ANY" ), '
	RObject y(x) ;
	y.slot( "foo" ) = 10.0 ;
	return R_NilValue ;
	', Rcpp=TRUE, verbose=FALSE, includes = "using namespace Rcpp;" )
	checkException( funx( tr ), msg = "slot does not exist" )
	
	funx <- cfunction(signature(x = "ANY" ), '
	RObject y(x) ;
	y.slot( "foo" ) ;
	return R_NilValue ;
	', Rcpp=TRUE, verbose=FALSE, includes = "using namespace Rcpp;" )
	checkException( funx( tr ), msg = "slot does not exist" )
	
	
}
#
Hello Doug,

Le 12/03/10 17:14, Douglas Bates a ?crit :
That is tricky. We have Pairlist has Dirk suggested, but as the name 
implies they manage LISTSXP objects and not VECSXP.

Rcpp::List (which manages VECSXP) has less convenient semantics (but 
this is on my todo list), you can do :

List x ;
x["foo"]  = 1 ;
x["bar"]  = "something" ;

which is kind of convenient, but that is not cheap as the VECSXP is 
reallocated each time.

Otherwise, it is possible to allocate it first and manage the names 
separately:

List x(2);
x[0] = 1 ;
x[1] = "something" ;
CharacterVector y(2) ;
y = "foo", "bar" ;
x.names() = y ;

but this is too verbose.

what we need is something like this:

List x(
   Named( "foo" ) = 1,
   Named( "bar" ) = "something" ) ;
We need to get better at documenting Rcpp, now that the code is getting 
stable. For now I think the best place to look for examples is the unit 
tests.

 > system.file( "unitTests", package = "Rcpp" )

They are vaguely organized in terms of classes.
yes.

x.attr( "class" ) = "whatever" ;
The slot method of the RObject class (which is the parent class of all 
our classes) allows read/write of slots.

x.slot( "foo" ) = 3 ;

this throws exception if x is not an S4 object or if its class does not 
have a "foo" attribute. this then uses the R function "slot<-" (there 
might be a better way with the R api, but I did not find it at the time)
There is coercion. for example a Rcpp::NumericVector is only allowed to 
store a REALSXP, if you give it a INTSXP, coercion happens.

There is no copy. The objects are essentially proxy objects to the SEXP 
they encapsulate. We have the clone template function to explicitely 
copy when this is needed, for example :

Rcpp::NumericVector aa(a) ;
Rcpp::NumericVector b = Rcpp::clone(aa) ;
yes. we want that too. but this is somewhat orthogonal to the "copy on 
change" semantics of R ...
#
Le 12/03/10 18:23, Douglas Bates a ?crit :
No, they are shipped to R as a LISTSXP.
This should work since the List( SEXP ) constructor uses coercion. Let 
us know otherwise.
res.attr( "class" ) = "foo" ;
For now the only things we have in the api to deal with S4'ness is these 
methods : "isS4", "slot", "hasSlot".

Not sure right now how we could formalize S4 objects in the api, but 
this would definitely be a fine addition.
Yes. The objects of the new api have the SEXP as their data member and 
methods applied on objects are relayed back to the SEXP in terms of 
calls t the R API. We use R memory management all the way.

There are some times though where memory is reallocated, because there 
is no other way. for example if you add values to a vector (push_back, 
push_front, etc ...)

  In some way I
There is something that might be helpful (or not). In the file 
RcppCommon.h, we have the RCPP_RETURN_VECTOR macro, this is an initial 
attempt to avoid writing the same switch( TYPEOF(x), case REALSXP: ... 
code over and over again. There is an example use in RcppCommon.cpp

  
    
#
On Fri, Mar 12, 2010 at 2:23 PM, Romain Francois
<romain at r-enthusiasts.com> wrote:
Hmm.  That may cause problems in the long run.  IIRC, having LISTSXPs,
other than those related to language objects, loose in R is frowned
upon.  I'll check before saying anything definite.
I tried it and was terribly frustrated that my example didn't work and
I couldn't decide why.  After much exploration I found that it had
nothing to do with that construction and was in fact due to my using 6
SEXPs in a Pairlist constructor.  Apparently my version is not
compiled with HAS_VARIADIC_TEMPLATES set and that means the
constructors only go up to 5.  I see now that because I am using g++
4.4.1 I should probably set the compiler flag -std=c++0x

I have tried it again initializing a list from the Pairlist object and
that works.  Do you happen to know if that conversion determines the
length of the result before allocating it?  I tried to read the code
for internal::ListInitialization but I am not sufficiently fluent in
C++ to decide what is going on.
#
Le 12/03/10 22:00, Douglas Bates a ?crit :
I agree. Nobody would do something like this in R:

 > pairlist( foo = 1, bar = 2 )

so I would be in favour of giving less exposure to the Pairlist class. 
IIRC the only reason why Dirk uses them is because of the convenience of 
creating the object in one call... so we just need to have similar 
semantics with the List class.
This should enable c++0x :

RCPP_CXX0X="yes" R CMD INSTALL Rcpp

(we don't use the PKG_CXXFLAGS for this because otherwise R CMD check is 
unhappy about non portable flags)


Ah yes. 5 here is arbitrary, and it would not be too hard to make it 10, 
20 or something. Now that I am on OSX, I am stuck with gcc 4.2 :-( so no 
c++0x for me anymore (well only when I use my fedora machine). I took 
the precaution of wrapping the code bloat code into pseudo xml :

/* <code-bloat> */

/* </code-bloat> */

but wrote that manually. Should not be hard to come up with some R code 
to generate the code bloat... maybe tomorrow.
We outsource this to the as.list function of R. This is probably not the 
easiest piece of code, and it surely deserve some more documentation. In 
a nutshell :

The constructor :

List(SEXP) aka Vector<VECSXP>( SEXP )

calls r_cast<VECSXP> which calls r_true_cast if the object is not a 
VECSXP, which looks like this:

template<> SEXP r_true_cast<VECSXP>(SEXP x){
	return convert_using_rfunction(x, "as.list" ) ;
}

This is somewhat lazy and we could to some switch( TYPEOF( )) to keep 
some things internal for the easy cases (LISTSXP, EXPRSXP, LANGSXP) and 
use as.list for the other cases to avoid the price of the round trip to 
the R side.
ListInitialization might not be what you think. It is only there to 
support this notation.

NumericVector x(4) ;
x = 1.0, 2.0, 3.0, 4.0 ;

This is a trick we borrow from blitz++

but C++0x gives this notation that I find much nicer:

NumericVector x = {1.0, 2.0, 3.0; 4.0} ;