
[Bioc-devel] Subject: GRanges performance issue, how to avoid looping?

7 messages · Kasper Daniel Hansen, Michael Lawrence, Hervé Pagès +1 more

#
?findOverlaps
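
For instance, a minimal sketch of what a findOverlaps() call could look like, assuming the reads are held in a GRanges object. The object names and coordinates below are made up for illustration and are not taken from the original code:

```r
library(GenomicRanges)

# Hypothetical reads on chr1 (made-up coordinates, for illustration only).
reads <- GRanges(seqnames = "chr1",
                 ranges = IRanges(start = c(90L, 100L, 150L),
                                  end   = c(110L, 120L, 170L)))

# The single position of interest, expressed as a width-1 range.
pos <- GRanges(seqnames = "chr1",
               ranges = IRanges(start = 105L, width = 1L))

# One call finds every read overlapping the position; no explicit loop.
hits <- findOverlaps(pos, reads)

# Keep only the overlapping reads.
pre_seq <- reads[subjectHits(hits)]
```

Here subjectHits() gives the indices into 'reads' of the ranges that cover the position.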

Kasper
On Wed, Feb 22, 2012 at 7:15 AM, Jesper Gådin <jesper.gadin at gmail.com> wrote:
#
Hi Jesper,
On 02/22/2012 04:15 AM, Jesper Gådin wrote:
We can't run this function because we don't know what 'reads' is.
It looks like a list-like object (because you are using [[ on it),
where each element could be a GRanges or GappedAlignments object
(because you are using seqnames() and ranges() on those elements).
Given the subject of your email, I'll assume it's a GRangesList
object or a list of GRanges objects.
You are using 2 nested loops.
The outer loop is over the samples. I would expect the number of samples
to be relatively small, so this loop is probably not an issue.
The inner loop is over the genomic ranges that actually overlap
the specified position. It is probably expensive, and you don't need
it at all: arithmetic in R is vectorized, so start(pre_seq) can be
subtracted from someposition in a single operation. So instead of:

         if(!length(pre_seq)==0) {
             for (k in (1:length(pre_seq))) {
                 position <- (someposition-start(pre_seq[k])) +1
                 print("found the given someposition on this position within the read")
                 print(position)
             }
         }

do something like:

         if (length(pre_seq) != 0L) {
             position <- someposition - start(pre_seq) + 1L
             cat("found the given someposition on those positions within the read:\n")
             print(position)
         }

Now it's not clear to me why you want to print this but that's another
story.
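
If the goal is to keep the positions rather than print them, something along these lines might do. This is only a sketch, assuming 'reads' is a GRangesList with one element per sample. The sample names, coordinates and the 'target' object are hypothetical, and subsetByOverlaps() stands in for whatever overlap step the original function uses:

```r
library(GenomicRanges)

# Hypothetical input: one GRanges of reads per sample (made-up coordinates).
reads <- GRangesList(
    sampleA = GRanges("chr1", IRanges(start = c(90L, 100L, 150L), width = 21L)),
    sampleB = GRanges("chr1", IRanges(start = c(101L, 300L), width = 21L))
)

someposition <- 105L
target <- GRanges("chr1", IRanges(start = someposition, width = 1L))

# The only loop left is over the samples; the work inside is vectorized.
positions <- lapply(reads, function(gr) {
    # Reads in this sample that cover the position.
    pre_seq <- subsetByOverlaps(gr, target)
    # Offset of the position within each overlapping read, all at once.
    someposition - start(pre_seq) + 1L
})
```

'positions' is then a list with one integer vector per sample, which can be inspected or summarized later instead of being printed inside the loop.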

Cheers,
H.

4 days later