[Bioc-devel] Subject: GRanges performance issue, how to avoid looping?
?findOverlaps

Kasper
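
For readers following the pointer above, here is a minimal sketch of how findOverlaps() might replace the inner loop. It is only an illustration under assumptions: the toy "reads" list and the helper name find_read_offsets are hypothetical stand-ins (the real reads object is not reproduced here), and subjectHits() assumes a reasonably current Bioconductor release rather than the GenomicRanges 1.4.8 shown in the session info below.

```r
library(GenomicRanges)

# Hypothetical stand-in for the "reads" object: a list with one GRanges per sample.
reads <- list(
    sampleA = GRanges(rep("chr12", 3),
                      IRanges(start = c(57537200, 57537215, 57600000), width = 36)),
    sampleB = GRanges(c("chr12", "chr1"),
                      IRanges(start = c(57537220, 57537220), width = 36))
)

# Hypothetical helper: for each sample, return the 1-based offset of
# 'someposition' within every read that covers it.
find_read_offsets <- function(reads, chr, someposition) {
    # A width-1 query range; matching on chromosome is handled by findOverlaps().
    query <- GRanges(chr, IRanges(start = someposition, width = 1))
    lapply(reads, function(onePerson) {
        hits <- findOverlaps(query, onePerson)
        # Vectorised over all hits, so no loop over individual reads is needed.
        someposition - start(onePerson)[subjectHits(hits)] + 1
    })
}

offsets <- find_read_offsets(reads, "chr12", 57537220)
```

With this toy data, offsets$sampleB is 1 (the chr12 read starts exactly at the position, and the chr1 read is ignored), while offsets$sampleA holds the offsets 21 and 6 for the two covering reads. The loop over samples remains, but the per-read work is done in one vectorised call.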
On Wed, Feb 22, 2012 at 7:15 AM, Jesper Gådin <jesper.gadin at gmail.com> wrote:
Hi everyone!
I want to find all reads that map to a specific position in a GRanges
object.
In my case this object is named "reads" and I have tried using the function
below.
# From a given position and chromosome, find the reads to which this position maps
fun_find_reads_from_pos <- function(reads, chr, someposition, verbose=TRUE) {
    # get length over the sample list
    nr_samples <- length(reads)
    for (j in (1:nr_samples)) {
        onePerson <- reads[[j]]
        chr_onePerson <- onePerson[seqnames(onePerson)==chr]
        pre_seq <- chr_onePerson[(ranges(chr_onePerson)) %in% (IRanges(start=someposition, width=1)), 6]
        if(!length(pre_seq)==0) {
            for (k in (1:length(pre_seq))) {
                position <- (someposition-start(pre_seq[k])) + 1  # 57537220
                print("found the given someposition on this position within the read")
                print(position)
            }
        }
    }
} # End of function
To use the function try:
fun_find_reads_from_pos(reads,"chr12",57537220)
Now that should work.
And everything would be fine if it weren't for the time issue.
The time thief, I guess, is that I have to loop over every read
in every sample to get to the information I want. Is it
possible to use the data structure in another, better way to
avoid unnecessary looping? Does anyone have an idea?
This function would be the core function of my future package,
so it's important that it is efficient.
Sincerely,
Jesper
Link to the reads object needed to run the function:
http://uppsalanf.se/sites/default/files/reads_object.RData
And the function itself (same as above):
http://uppsalanf.se/sites/default/files/example-bioc-dev.R
?GRanges

sessionInfo()
R version 2.13.2 (2011-09-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.20.4   GenomicRanges_1.4.8 IRanges_1.10.6

loaded via a namespace (and not attached):
[1] tools_2.13.2

        [[alternative HTML version deleted]]
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel