Message-ID: <CAC2h7uu1mJAZAp+M0ZgT4UUDHHRUGZZyuFUnvVQ2ercyz+ujKA@mail.gmail.com>
Date: 2012-02-22T12:59:49Z
From: Kasper Daniel Hansen
Subject: [Bioc-devel] Subject: GRanges performance issue, how to avoid looping?
In-Reply-To: <CAPmAPXPHv7bOrgwETmppAQQgDTscqtjUqqYgjgZYOw_g62c+QA@mail.gmail.com>
?findOverlaps
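
To make that pointer concrete, here is a minimal sketch of how findOverlaps could replace both the %in% test and the inner loop. It is not tested against the linked reads object: the toy "reads" list below is a hypothetical stand-in (a plain list with one GRanges per sample), the name find_reads_from_pos is made up for illustration, and subjectHits() assumes a reasonably recent Bioconductor release rather than the GenomicRanges 1.4.8 shown in the quoted sessionInfo.

```r
library(GenomicRanges)

# Hypothetical stand-in for the "reads" object: a list with one GRanges per sample.
reads <- list(
  sampleA = GRanges("chr12",
                    IRanges(start = c(57537200, 57537215, 57537300), width = 36)),
  sampleB = GRanges(c("chr12", "chr1"),
                    IRanges(start = c(57537220, 57537220), width = 36))
)

# For each sample, return the 1-based offset of `someposition` within every
# read that covers it. The per-read loop is replaced by one findOverlaps call.
find_reads_from_pos <- function(reads, chr, someposition) {
  # A width-1 query range; putting the chromosome in the query means
  # findOverlaps does the seqnames matching as well.
  query <- GRanges(chr, IRanges(start = someposition, width = 1))
  lapply(reads, function(onePerson) {
    hits  <- findOverlaps(query, onePerson)
    found <- onePerson[subjectHits(hits)]
    # start() is vectorized, so all offsets are computed at once.
    someposition - start(found) + 1
  })
}

offsets <- find_reads_from_pos(reads, "chr12", 57537220)
# In the toy data, sampleA has two covering reads (offsets 21 and 6)
# and sampleB has one (offset 1); the chr1 read is not matched.
```

The loop over samples is still there (as lapply), but the number of samples is usually small; the expensive part, the loop over individual reads, is gone.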
Kasper
On Wed, Feb 22, 2012 at 7:15 AM, Jesper Gådin <jesper.gadin at gmail.com> wrote:
> Hi everyone!
> I want to find all reads that map to a specific position in a GRanges
> object.
> In my case this object is named "reads" and I have tried using the function
> below.
>
>
> # From a given position and chromosome, find the reads to which this position maps
> fun_find_reads_from_pos <- function(reads, chr, someposition, verbose=TRUE) {
>
>     # get the length of the sample list
>     nr_samples <- length(reads)
>
>     for (j in (1:nr_samples)) {
>         onePerson <- reads[[j]]
>         chr_onePerson <- onePerson[seqnames(onePerson)==chr]
>         pre_seq <- chr_onePerson[(ranges(chr_onePerson)) %in% (IRanges(start=someposition, width=1)), 6]
>
>         if(!length(pre_seq)==0) {
>             for (k in (1:length(pre_seq))) {
>                 position <- (someposition-start(pre_seq[k])) + 1  # 57537220
>                 print("found the given someposition on this position within the read")
>                 print(position)
>             }
>         }
>     }
>
> } # End of function
>
>
> To use the function try:
> fun_find_reads_from_pos(reads,"chr12",57537220)
>
> Now that should work.
> And everything would be fine if it wasn't for the time issue.
> The time thief, I guess, is that I have to loop over every read
> in every sample to get to the information I want. Is it
> possible to use the data structure in a better way to
> avoid unnecessary looping? Does anyone have an idea?
>
> This function would be the core function of my future package,
> so it's important that it is efficient.
>
> Sincerely,
> Jesper
>
> Link to the reads object needed to run the function:
> http://uppsalanf.se/sites/default/files/reads_object.RData
> And the function itself (same as above):
> http://uppsalanf.se/sites/default/files/example-bioc-dev.R
>
>
>> ?GRanges
>> sessionInfo()
> R version 2.13.2 (2011-09-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] Biostrings_2.20.4   GenomicRanges_1.4.8 IRanges_1.10.6
>
> loaded via a namespace (and not attached):
> [1] tools_2.13.2
>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel