[Rcpp-devel] add new components to list without specifying list size initially

Hi Walrus,

While I'm a huge fan of Rcpp, I think you'll be a bit better served
(for the time being) to read up on some of the bioconductor packages
that are suited for these types of things.

In particular, I am thinking about the IRanges and GenomicRanges
packages. They are R wrappers around what is basically an interval tree
that you can annotate and then use to perform fast overlap/intersection
queries.

For instance:

R> library(GenomicRanges)
R> probes <- GRanges('chr1', IRanges(c(81,85), c(85,100)), strand='*')
R> genes <- GRanges(c('chr1', 'chr1', 'chr2'),
                    IRanges(c(11, 111, 11), c(90, 190, 90)),
                    strand='*', name=c('g1', 'g2', 'g3'))

R> genes
GRanges with 3 ranges and 1 elementMetadata value
    seqnames     ranges strand |        name
       <Rle>  <IRanges>  <Rle> | <character>
[1]     chr1 [ 11,  90]      * |          g1
[2]     chr1 [111, 190]      * |          g2
[3]     chr2 [ 11,  90]      * |          g3

## How many probes land in each gene?
R> countOverlaps(genes, probes)
[1] 2 0 0

## Which probes are these?

R> subsetByOverlaps(probes, genes)
GRanges with 2 ranges and 0 elementMetadata values
    seqnames    ranges strand |
       <Rle> <IRanges>  <Rle> |
[1]     chr1 [81,  85]      * |
[2]     chr1 [85, 100]      * |

## and much more stuff
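
If it helps to see what the overlap count above is actually computing, here
is a minimal base-R sketch that does not need Bioconductor. It is not how
GenomicRanges does it internally: it is a naive O(n*m) scan, it ignores
strand, and it assumes closed intervals on the same sequence. The function
name count_overlaps_naive and the data.frame column names are made up for
illustration.

```r
## Naive overlap count: for each query range, count the subject ranges on
## the same sequence that share at least one position (closed intervals).
## Hypothetical helper, not part of any package.
count_overlaps_naive <- function(query, subject) {
  vapply(seq_len(nrow(query)), function(i) {
    sum(subject$seqname == query$seqname[i] &
        subject$start <= query$end[i] &
        subject$end >= query$start[i])
  }, integer(1))
}

## Same toy data as the GRanges example, stored as plain data frames.
genes <- data.frame(name = c("g1", "g2", "g3"),
                    seqname = c("chr1", "chr1", "chr2"),
                    start = c(11, 111, 11),
                    end = c(90, 190, 90),
                    stringsAsFactors = FALSE)
probes <- data.frame(seqname = c("chr1", "chr1"),
                     start = c(81, 85),
                     end = c(85, 100),
                     stringsAsFactors = FALSE)

count_overlaps_naive(genes, probes)
```

On this toy data it should agree with the countOverlaps(genes, probes)
result shown above. For real data sets you want the packaged versions,
since the pairwise scan gets slow quickly.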

There's a mess load of functionality in the IRanges, GenomicRanges, and
Biostrings packages that you'll likely find very useful and efficient
(much of the core of these packages is written in C) if you're doing
a lot of bioinformatics/genomics work. So, taking some time to get
familiar with those will pay off. You'll find that you'll also
need to drop into Rcpp for other stuff (as I do, too), so Rcpp will
still be useful for you in the future.

That's just my 2 cents.

-steve


On Thu, Aug 11, 2011 at 9:44 PM, Walrus Foolhill
<walrus.foolhill at gmail.com> wrote: