vectorization with subset?

4 messages · dlv04c, David Winsemius

#
Hello,

I have a data frame (68,000 rows) of scores (V4) for a series of [genomic]
coordinate ranges (V2 to V3).



I also have a data frame (1.2 million rows) of single [genomic] coordinates.  



For each genomic coordinate (in coord), I would like to determine the
average of all scores whose genomic ranges (in scores) encompass the
coordinate (in coord). To accomplish this, I tried:



The function works, but is extremely slow.

It would take about 4 days for this to finish for a single data set, and I
have 64 data sets.

Why does the rate at which coordinate averages are calculated increase when
coord is smaller, but not when scores is smaller?

How can I accomplish the same thing more efficiently?

Thanks,

Dan

--
View this message in context: http://r.789695.n4.nabble.com/vectorization-with-subset-tp4635156.html
Sent from the R help mailing list archive at Nabble.com.
#
On Jul 2, 2012, at 12:15 PM, dlv04c wrote:

            
You probably need to start by reading the vignettes for the IRanges
package. It's difficult to be sure, since you did not show the code for
what you are currently doing.
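
For illustration only, here is a minimal sketch of the interval-lookup idea behind that suggestion. It is written in plain Python rather than with IRanges, and it is not the original poster's code, which does not appear in this thread. It assumes that all ranges and coordinates lie on a single sequence, that range ends are inclusive, and that a coordinate covered by no range should yield None. The function name mean_covering_scores and the tuple layout (start, end, score) are hypothetical. The ranges are sorted once and each coordinate is then answered with two binary searches, instead of subsetting the whole scores table for every coordinate.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def mean_covering_scores(ranges, coords):
    """Mean score of all ranges that contain each coordinate.

    ranges: iterable of (start, end, score) with start <= end, ends inclusive
            (hypothetical layout, standing in for V2, V3, V4).
    coords: iterable of single positions.
    Returns a list with one mean per coordinate, or None if nothing covers it.
    """
    by_start = sorted(ranges, key=lambda r: r[0])
    by_end = sorted(ranges, key=lambda r: r[1])
    starts = [r[0] for r in by_start]
    ends = [r[1] for r in by_end]
    # Prefix sums of scores in start order and in end order.
    start_sums = [0.0] + list(accumulate(r[2] for r in by_start))
    end_sums = [0.0] + list(accumulate(r[2] for r in by_end))

    means = []
    for x in coords:
        opened = bisect_right(starts, x)  # ranges with start <= x
        closed = bisect_left(ends, x)     # ranges with end < x
        n = opened - closed               # ranges with start <= x <= end
        if n == 0:
            means.append(None)
        else:
            means.append((start_sums[opened] - end_sums[closed]) / n)
    return means


example_ranges = [(1, 10, 2.0), (5, 15, 4.0), (20, 30, 6.0)]
example_coords = [3, 7, 15, 17, 20]
print(mean_covering_scores(example_ranges, example_coords))
```

Sorting costs O(n log n) once and each lookup costs O(log n), so the total work no longer grows with the product of the two table sizes. Because the sums are obtained by subtracting prefix sums, floating-point scores can pick up small rounding differences compared with averaging each subset directly.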
#
On Jul 2, 2012, at 5:16 PM, dlv04c wrote:

            
No code here or in the original posting to rhelp. You are under the
delusion that Nabble is R-help. It is not.
This is the rhelp mailing list, not a website.