faster base::sequence
Hi Romain,

FWIW I see at least 2 small differences in the way sequence_c() behaves with respect to good old sequence(): zeros and names.

> sequence(c(a=5, b=0, c=2))
a1 a2 a3 a4 a5 c1 c2
 1  2  3  4  5  1  2

sequence_c() ignores the names and doesn't support zeros in the input.

Cheers,
H.
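The two differences noted above (zero counts and element names) can be illustrated with a small standalone C sketch. It is not the patch discussed in this thread and does not use the R API: `sequence_with_names` and `LABEL_LEN` are hypothetical names introduced here, and the naming rule (name plus index, or the bare name when a count is 1) is an assumption modelled on the unlist() output shown above.

```c
#include <stdio.h>
#include <stddef.h>

#define LABEL_LEN 32  /* hypothetical fixed label size for this sketch */

/* Hypothetical helper: for each counts[i], writes 1..counts[i] into values[]
   and a matching label into labels[]. A zero count contributes nothing.
   Returns the number of elements written, or -1 on a negative count or
   when the output buffers are too small. */
long sequence_with_names(const int *counts, const char *names[], size_t n,
                         int *values, char (*labels)[LABEL_LEN],
                         size_t capacity)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        if (counts[i] < 0) return -1;       /* negative counts are rejected */
        for (int j = 1; j <= counts[i]; j++) {
            if (pos >= capacity) return -1; /* caller's buffers too small */
            values[pos] = j;
            if (counts[i] == 1)             /* assumed: bare name for length 1 */
                snprintf(labels[pos], LABEL_LEN, "%s", names[i]);
            else
                snprintf(labels[pos], LABEL_LEN, "%s%d", names[i], j);
            pos++;
        }
    }
    return (long) pos;
}
```

With counts {5, 0, 2} and names {"a", "b", "c"}, the zero-count entry "b" is simply skipped, which mirrors the sequence() output quoted above.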
On 11/28/2010 01:56 AM, Romain Francois wrote:
On 28/11/10 10:30, Prof Brian Ripley wrote:
Is sequence used enough to warrant this? As the help page says

Note that 'sequence <- function(nvec) unlist(lapply(nvec, seq_len))' and it mainly exists in reverence to the very early history of R.
I don't know. Would it be used more if it were more efficient?
I regard it as unsafe to assume that NA_INTEGER will always be negative, and bear in mind that at some point not so far off R integers (or at least lengths) will need to be more than 32-bit.
Sure. Updated and dressed up as a patch. I've made it a .Call because I'm not really comfortable with .Internal, etc. Do you mean that I should also use something other than "int" and "int*"? Is there some future-proof typedef or macro for the type associated with INTSXP?
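The two concerns raised above (not relying on NA being negative, and totals that may exceed 32 bits) can be sketched in standalone C as follows. This is only an illustration under stated assumptions, not the patch mentioned in this message: `NA_INT_SENTINEL` is a stand-in for R's NA_INTEGER, `sequence_total_length` is a hypothetical name, and `int64_t` is used as a wider accumulator purely for illustration. Unlike the original check, this sketch accepts zeros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for R's NA_INTEGER; an assumption for this standalone sketch. */
#define NA_INT_SENTINEL INT_MIN

/* Hypothetical helper: validates the counts and returns the total output
   length in a 64-bit accumulator, or -1 if any count is NA or negative,
   or if the running total would overflow. */
int64_t sequence_total_length(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (x[i] == NA_INT_SENTINEL) return -1;  /* explicit NA test */
        if (x[i] < 0) return -1;                 /* negative count */
        if (total > INT64_MAX - (int64_t) x[i]) return -1;  /* overflow guard */
        total += x[i];
    }
    return total;
}
```

Testing for NA by equality means the check keeps working whatever sign the sentinel has, and summing in a wider type avoids silently wrapping a 32-bit `int` when the counts are large.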
On Sun, 28 Nov 2010, Romain Francois wrote:
Hello, Based on yesterday's R-help thread (help: program efficiency), and following Bill's suggestions, it appeared that sequence:
sequence
function (nvec)
unlist(lapply(nvec, seq_len))
<environment: namespace:base>
could benefit from being written in C to avoid unnecessary memory
allocations.
I made this version using inline:
require( inline )
sequence_c <- local( {
    fx <- cfunction( signature( x = "integer"), '
        int n = length(x) ;
        int* px = INTEGER(x) ;
        int x_i, s = 0 ;

        /* error checking */
        for( int i=0; i<n; i++){
            x_i = px[i] ;
            /* this includes the check for NA */
            if( x_i <= 0 ) error( "needs non negative integer" ) ;
            s += x_i ;
        }

        SEXP res = PROTECT( allocVector( INTSXP, s ) ) ;
        int * p_res = INTEGER(res) ;
        for( int i=0; i<n; i++){
            x_i = px[i] ;
            for( int j=0; j<x_i; j++, p_res++)
                *p_res = j+1 ;
        }
        UNPROTECT(1) ;
        return res ;
    ' )
    function( nvec ){
        fx( as.integer(nvec) )
    }
})
And here are some timings:
x <- 1:10000
system.time( a <- sequence(x ) )
   user  system elapsed
  0.191   0.108   0.298
system.time( b <- sequence_c(x ) )
   user  system elapsed
  0.060   0.063   0.122
identical( a, b )
[1] TRUE

system.time( for( i in 1:10000) sequence(1:10) )
   user  system elapsed
  0.119   0.000   0.119
system.time( for( i in 1:10000) sequence_c(1:10) )
   user  system elapsed
  0.019   0.000   0.019

I would write a proper patch if someone from R-core is willing to push it.

Romain
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319