In C, a fast way to slice a vector?
r-devel-request at r-project.org wrote:
Impressive stuff. Nice to see people giving some thought to this. I will explore the packages you mentioned.
Thank you,
Saptarshi Guha

On Mon, May 11, 2009 at 12:37 AM, Patrick Aboyoun <paboyoun at fhcrc.org> wrote:
Saptarshi,
I know of two alternatives you can use to do fast extraction of consecutive subsequences of a vector:
1) Fast copy: the method you mentioned of creating a memcpy'd vector.
2) Pointer management: create an externalptr object in R and manage the start and end of your data.
If you are looking for a prototyping environment to try, I recommend the IRanges and Biostrings packages from the Bioconductor project. The IRanges package contains a function called subseq that performs 1) on all basic vector types (raw, logical, integer, etc.), and the Biostrings package contains a subseq method on an externalptr-based class that implements 2).
I was going to lobby R core members quietly about adding something akin to subseq from IRanges into base R, since it is extremely useful for all long vectors and could replace all a:b calls where a <= b in R code, but this publicity can't hurt.
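The pointer-management idea in 2) amounts to a "view": instead of copying bytes, keep a pointer into the parent buffer plus a length, so taking the tail from offset 13 is O(1). The sketch below is a hypothetical plain-C illustration (RawView, raw_view_make and raw_view_subseq are invented names), not how Biostrings implements its externalptr-based class. In a real R extension the parent vector would also have to stay alive for as long as the view exists, for example by passing it as the `prot` argument of R_MakeExternalPtr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical zero-copy view over a byte buffer owned elsewhere.
   The view stores only a pointer and a length; it never frees `data`. */
typedef struct {
    const unsigned char *data;
    size_t length;
} RawView;

/* Wrap an existing, non-NULL buffer without copying it. */
static RawView raw_view_make(const unsigned char *data, size_t length)
{
    RawView v;
    assert(data != NULL);
    v.data = data;
    v.length = length;
    return v;
}

/* Sub-sequence from 0-based offset `start` to the end of the view.
   O(1): only the pointer and length change, no bytes are moved.
   An offset past the end is clamped and yields an empty view. */
static RawView raw_view_subseq(RawView v, size_t start)
{
    if (start > v.length)
        start = v.length;
    v.data += start;
    v.length -= start;
    return v;
}
```

The trade-off: a view is cheap to create but aliases its parent, so later writes to the parent buffer are visible through it, whereas the memcpy'd vector in 1) costs one O(n) pass and is independent afterwards.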
The Python development team has been developing something similar for Python 3.0 (the revised buffer protocol and memoryview), and they are backporting it to the latest 2.x releases. I have just started toying with it, and it looks very nice. There might be good ideas to take from there for a possible built-in capability in R.
L.
Here is an example:
source("http://bioconductor.org/biocLite.R")
biocLite(c("IRanges", "Biostrings"))
<< download output omitted >>
suppressMessages(library(Biostrings))
x <- rep(charToRaw("a"), 1e7)
y <- BString(rawToChar(x))
system.time(x[13:1e7])
   user  system elapsed
  0.304   0.073   0.378
system.time(subseq(x, 13))
   user  system elapsed
  0.011   0.007   0.019
system.time(subseq(y, 13))
   user  system elapsed
  0.003   0.000   0.004
identical(x[13:1e7], subseq(x, 13))
[1] TRUE
identical(x[13:1e7], charToRaw(as.character(subseq(y, 13))))
[1] TRUE
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.13.5 IRanges_1.3.5

loaded via a namespace (and not attached):
[1] Biobase_2.5.2

Quoting Saptarshi Guha <saptarshi.guha at gmail.com>:
Hello,
Suppose that in the following code

    PROTECT(sr = R_tryEval( .... ))

sr is a RAWSXP vector. I wish to return another RAWSXP starting at position 13 onwards (0-based). I could create another RAWSXP of the correct length and then memcpy the required bytes into it. However, is there a more efficient method?
Regards,
Saptarshi Guha
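For reference, the copy approach described in the question (allocate a new vector of the right length, then memcpy the tail into it) looks roughly like the plain-C sketch below. It rests on stated assumptions: raw_copy_subseq is a hypothetical helper, and malloc() stands in for R's allocVector(RAWSXP, ...) so the code runs without R. In an actual extension the destination would come from allocVector, be filled through RAW(), and stay PROTECTed until it is returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy bytes [start, n) of `src` into a newly
   allocated buffer and report its length through `out_len`.
   malloc() stands in for allocVector(RAWSXP, n - start) here.
   Returns NULL on allocation failure; the caller frees the result. */
static unsigned char *raw_copy_subseq(const unsigned char *src, size_t n,
                                      size_t start, size_t *out_len)
{
    size_t len;
    unsigned char *dst;

    assert(src != NULL && out_len != NULL);
    if (start > n)
        start = n;               /* clamp: past-the-end gives an empty copy */
    len = n - start;

    /* malloc(0) may return NULL, so request at least one byte. */
    dst = malloc(len > 0 ? len : 1);
    if (dst == NULL)
        return NULL;

    memcpy(dst, src + start, len);   /* a single O(len) copy */
    *out_len = len;
    return dst;
}
```

Unlike a view, the copy does not change when the source buffer is modified afterwards, at the cost of one pass over the copied bytes.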
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel