Skip to content

[Rcpp-devel] Rcpp internal versions of lapply and sapply

7 messages · Douglas Bates, Romain Francois

#
Ever since I read Phil Spector's book on S I have been a fan of
functional programming in S and R.  When Jose Pinheiro and I were
working on the nlme package there was a joke between us that you could
tell which of us wrote which parts of the code because his parts
always had an object named "aux" and my parts always had
"unlist(lapply(list(...)))".

Even within C++ I like to use the std:: algorithms like
std::transform, but, of course, there are differences between a
strongly typed language like C++ and a dynamically typed language like
S.  Templates can get around these differences to some extent but I am
still a bit of a novice regarding templates.

Currently I want to apply some functions to lists but entirely within
the C++ code (i.e. I don't want to create Rcpp Function objects and
call back to R).  For the sake of argument, consider a function that
extracts the lengths of the components of the lists.


library(Rcpp)
library(inline)
inc <- '
class length {
public:
    R_len_t operator() (RObject const& x) {return Rf_length(SEXP(x));}
};
'
code <- '
List lst(ll);
IntegerVector ans(lst.size());
std::transform(lst.begin(), lst.end(), ans.begin(), length());
return ans;
'
ltst <- cxxfunction(signature(ll = "list"), code, "Rcpp", inc)
ll <- list(a = numeric(0), b = LETTERS[6:9], c = c)
ltst(ll)
sapply(ll, length)

I would like to create a templated sapply function like

template <int RTYPE>
Vector<RTYPE> sapply(List ll, ??) {
    Vector<RTYPE> ans(ll.size());
    CharacterVector nms = ll.names();
    if (nms.size() == ll.size()) ans.names() = nms;
    std::transform(ll.begin(), ll.end(), ans.begin(), ??);
    return ans;
}

but I don't know how to specify the second argument that is a function
that returns the atomic element type of a Vector<RTYPE> (is this as
simple as Vector<RTYPE>::value_type?) and has a single argument which
probably should be an RObject (although might be an SEXP, if that was
more convenient).  Can someone (probably Romain) provide some
guidance?
#
Le 16/06/10 14:58, Douglas Bates a ?crit :
I recently added Vector::import_transform which might be all you need.

NumericVector x = NumericVector::import_transform(
	y.begin(), y.end(), f) ;


This will work as long as f is of some type acceptable by std::transform.


But let's keep going anyway. One way (I'll write later why this is not 
fully satisfactory) is here:

require( Rcpp )
require( inline )

inc <- '

template <typename OBJECT, typename FUN>
SEXP sapply(const OBJECT& object, FUN fun) {

	const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	return ans ;
}

template <int RT, typename FUN>
SEXP sapply( const Rcpp::Vector<RT>& object, FUN fun){

	const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	ans.names() = object.names() ;
	return ans ;
}

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( T& t ){ return t*t ; }
} ;

'

code <- '
	NumericVector xx( x );
	
	return sapply( xx, square<double>() ) ;
	
'

fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )


 > fx( seq(1,10, length = 5 ) )
[1]   1.0000  10.5625  30.2500  60.0625 100.0000

I'll try to explain bit by bit. Please let me know if something is too 
cryptic. The good thing about templates is that as long as they work 
they don't complain. The bad thing is that when they start complaining, 
you need large screens to accomodate all the warnings/erros you get...



Starting from the last thing, the square struct.

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( const T& t ){ return t*t ; }
} ;

we need the function that goes into sapply to help, i.e we need it to be 
aware of the result type. inheriting from std::unary_function makes this 
easy. It essentially just adds some typedef. See 
http://www.cplusplus.com/reference/std/functional/unary_function/

So when we do square<double>(), the function is self aware that it 
returns a double. We need that to decide which kind of R vectors we want 
to create. Which is the job of the Rcpp::traits::r_sexptype_traits 
trait. So:

const int RTYPE = Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;

it is very important that this is const, because we are using this as a 
template argument later and this needs to be compile time constant.

Then, we just delegate to std::transform through Vector::import_transform:

Rcpp::Vector<RTYPE> ans = Rcpp::Vector<RTYPE>::import_transform(
	object.begin(), object.end(), fun
) ;
	


For now the sapply just return a SEXP because that is easy, but we could 
have the result type deduced as well (fasten your seatbelt):

template <typename OBJECT, typename FUN>
Rcpp::Vector< Rcpp::traits::r_sexptype_traits< typename 
FUN::result_type>::rtype >
sapply(const OBJECT& object, FUN fun) {


Anyway, why is there two sapply ? because the first one is more generic, 
you could sapply on a std::vector<double> for example, the only thing we 
do with the object is to call begin() and end(), so as long as they 
exist and they produce something that makes sense for std::transform, we 
are fine (this is why templates are just so great).

The second version is specific to Rcpp vectors, on which we can call 
names().



Now why is not satisfactory ? This is not lazy enough for me. With a 
little bit more work, instead of returning a Rcpp::Vector which 
allocates all of its memory right now (not lazy), we could have sapply 
returning some sort of proxy object. The proxy object would expose a 
similar interface : operator[] and iterator, but the result will only be 
calculated when truly necessary.

Why should we care about that ? Imagine, instead of just calculating the 
square, we want to find out if all the squared values are below 15, ie. 
the equivalent of the R code "all( x^2 < 15 )" ? We only need to 
calculate three values to make our decision, so why bother calculating 
the 2 last ones. (lazy)

This is the kind of thing I want to bring to Rcpp with the "sugar" piece 
of the puzzle.

We would then have this expression:

code <- '
	Rcpp::NumericVector xx( x) ;
	
	return Rcpp::all( sapply( xx, square<double>() ) < 15 ) ;
	
'

It almost work right now (I need to add some code so that <15 makes 
sense, I only added Vector < Vector so far), but currently requires to 
compute all the 5 values before handling the data to all.


Please come back with questions if some gaps need to be filled.

Romain
#
Small update incorporating the use of all (and using namespace Rcpp to 
make things slightly less cluterred).

require( Rcpp )
require( inline )

inc <- '

using namespace Rcpp ;

template <typename OBJECT, typename FUN>
Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
sapply(const OBJECT& object, FUN fun) {

	const int RTYPE = traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	return ans ;
}

template <int RT, typename FUN>
Vector< traits::r_sexptype_traits< typename FUN::result_type>::rtype >
sapply( const Vector<RT>& object, FUN fun){

	const int RTYPE = traits::r_sexptype_traits< typename 
FUN::result_type>::rtype ;
	
	Vector<RTYPE> ans = Vector<RTYPE>::import_transform(
		object.begin(), object.end(), fun
		) ;
	ans.names() = object.names() ;
	return ans ;
}

template <typename T>
struct square : std::unary_function<T,T> {
	inline T operator()( const T& t ){ return t*t ; }
} ;

'

code <- '
	NumericVector xx( x) ;
	return all( sapply( xx, square<double>() ) < 15 ) ;
	
'

fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
fx( seq(1,10, length = 5 ) )


This needs rev 1555 of Rcpp for the "< 15" part. I'll try to come up 
with a lazy version of this for the Rcpp sugar.

I'll probably recycle some of this thread into a blog post (with nicer 
formatting).


Also worth noting the existence of Rcpp::unary_call, Rcpp::fixed_call 
and Rcpp::binary_call that wrap up an R function in a strong type 
illusion (powered by the twins Rcpp::as and Rcpp::wrap).

So for example (this will be less efficient for obvious reasons):

code <- '
	NumericVector xx( x) ;
	return all( sapply( xx, unary_call<double,double>( Function("square") ) 
) < 15 ) ;
	
'

square <- function(x) x^2
fx <- cxxfunction( signature( x = "numeric" ), code, "Rcpp", inc )
fx( seq(1,10, length = 5 ) )

Romain

PS : More examples of these in the runit.Language.R file.

Le 16/06/10 16:18, Romain Francois a ?crit :

  
    
#
Thank you very much, Romain.  This certainly helps my understanding of
unary_function, traits, etc.  These are the sorts of things I can read
about but until I see a non-trivial usage it is hard to keep these
ideas straight.

Two minor questions occur to me at this point.  In my code I assigned
the names to be the names of the input list but I think now that I
should have assigned a clone of the object names instead.  To be safe
I should have cloned before assigning, right?  I also checked the
length before assigning but you didn't.  Am I reflecting too much of
my C programming background here?  In other words, is there some magic
in the Rcpp names assignment that does the checking?

Thanks again.

On Wed, Jun 16, 2010 at 9:55 AM, Romain Francois
<romain.francois at dbmail.com> wrote:
#
Le 16/06/10 17:44, Douglas Bates a ?crit :
I have been lazy (the wrong kind of lazyness) when implementing names<-, 
and just used a callback to the R function names<- . The relevant chunk 
is in Vector.h in the NamesProxy class:

		void set(SEXP x) const {
			SEXP new_vec = PROTECT( internal::try_catch(
			Rf_lcons( Rf_install("names<-"),
					Rf_cons( parent, Rf_cons( x , R_NilValue) )))) ;
			/* names<- makes a new vector, so we have to change
			   the SEXP of the parent of this proxy, it might be
			   worth to work directly with the names attribute instead
			   of using the names<- R function, but then we need to
			   take care of coercion, recycling, etc ... we cannot just
			   brutally assign the names attribute */
			const_cast<Vector&>(parent).setSEXP( new_vec ) ;
			UNPROTECT(1) ; /* new_vec */
     		}

So with this implementation, you don't need to worry about copying 
names. But it might not be the best implementation.
The magic comes from R.

 > x <- 1:10
 > names(x) <- c("a", "b")
 > x
    a    b <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
    1    2    3    4    5    6    7    8    9   10

In your example ll.names() is either NULL or a vector of the appropriate 
size. I think (not sure though) that names<- is restrictive about the rhs:

 > names(x) <- rnorm
Erreur dans as.character(function (n, mean = 0, sd = 1)  :
   cannot coerce type 'closure' to vector of type 'character'

Romain

  
    
#
By the way, you were correct in your original response that all I
really need is input_transform but it is certainly a worthwhile
exercise to see how traits, etc. are used and I appreciate your kind
explanations.

On Wed, Jun 16, 2010 at 10:58 AM, Romain Francois
<romain at r-enthusiasts.com> wrote:
2 days later
#
Hello,

As a follow up on this thread, I've now commited a lazy version of 
sapply. It can be used with a functor that has a nested result_type 
type, or raw functions as illustrated by these examples from the unit 
tests:

test.sugar.sapply <- function( ){

	inc <- '
	template <typename T>
	class square : public std::unary_function<T,T> {
	public:
		T operator()( T t) const { return t*t ; }
	} ;
	'
	
	fx <- cxxfunction( signature( x = "numeric" ), '
	
		NumericVector xx(x) ;
		NumericVector res = sapply( xx, square<double>() );
		
		return res ;
	
	', include = inc, plugin = "Rcpp" )
	
	checkEquals( fx(1:10) , (1:10)^2 )
}

test.sugar.sapply.rawfun <- function( ){

	inc <- '
	double square( double x){ return x*x; }
	'
	
	fx <- cxxfunction( signature( x = "numeric" ), '
	
		NumericVector xx(x) ;
		NumericVector res = sapply( xx, square );
		
		return res ;
	
	', include = inc, plugin = "Rcpp" )
	
	checkEquals( fx(1:10) , (1:10)^2 )
}


There is one more thing since this thread, sapply uses the new 
Rcpp::traits::result_of trait to find out the result type of the function.

namespace Rcpp{
namespace traits{

template <typename T>
struct result_of{
	typedef typename T::result_type type ;
} ;

template <typename RESULT_TYPE, typename INPUT_TYPE>
struct result_of< RESULT_TYPE (*)(INPUT_TYPE) >{
	typedef RESULT_TYPE type ;
} ;

}
}


This version of sapply is lazy, so for example this works :

test.sugar.sapply.square <- function( ){

	inc <- '
	template <typename T>
	class square : public std::unary_function<T,T> {
	public:
		T operator()( T t) const { return t*t ; }
	} ;
	'
	
	fx <- cxxfunction( signature( x = "numeric" ), '
	
		NumericVector xx(x) ;
		return all( sapply( xx * xx , square<double>() ) < 10 );
	
	', include = inc, plugin = "Rcpp" )
	
	checkTrue( fx(1:10)  )
}

and sapply only needs to calculate square( xx[0] * xx[0] ) and square( 
xx[1] * xx[1] ) to make its final decision.

Romain

Le 16/06/10 14:58, Douglas Bates a ?crit :