Efficiency challenge: MANY subsets

Try this one; it handles a list of 7000 sequences in under 2 seconds:
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)
indexes <- rep(indexes,10)
sequences <- rep(sequences,7000)

system.time({
  fragments <- lapply(indexes, function(.seq){
    lapply(.seq, function(.range){
      .range <- seq(.range[1], .range[2])  # save since we use several times
      lapply(sequences, '[', .range)
    })
  })
})
   user  system elapsed
   1.24    0.00    1.26
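
Note that the benchmark above applies every range set to every sequence. If each entry of "sequences" should only be cut by its own entry in "indexes", as the original question describes, a paired traversal could look roughly like the sketch below. The toy data and the helper name cut_fragments are assumptions for illustration, not something posted in this thread.

```r
# Toy data: two sequences, each with its own list of c(start, end) ranges.
sequences <- list(
  c("M", "G", "L", "W", "I", "S"),
  c("N", "H", "K", "L")
)
indexes <- list(
  list(c(1, 3), c(2, 6)),
  list(c(1, 2), c(3, 4), c(1, 4))
)

# cut_fragments() is a hypothetical helper: it cuts one sequence
# by every range in that sequence's own range list.
cut_fragments <- function(s, ranges) {
  lapply(ranges, function(r) s[seq.int(r[1], r[2])])
}

# Map() walks both lists in parallel, so sequences[[i]] is only
# ever subset with the ranges in indexes[[i]].
fragments <- Map(cut_fragments, sequences, indexes)
```

Here fragments[[i]] holds one character vector per range in indexes[[i]], which mirrors the structure the original for loop builds.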

On Fri, Jan 16, 2009 at 3:16 PM, Johannes Graumann wrote:
Thanks. Very elegant, but doesn't solve the problem of the outer "for" loop,
since I now would rewrite the code like so:

fragments <- list()
for(iN in seq(length(sequences))){
 cat(paste(iN,"\n"))
 fragments[[iN]] <-
   lapply(indexes[[iN]], function(g)sequences[[iN]][do.call(seq, as.list(g))])
}

still very slow for length(sequences) ~ 7000.
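
The loop above prints a progress line on every iteration and grows "fragments" from an empty list. Whether either of these is the actual bottleneck is an assumption that the thread does not confirm, but a variant without them might be sketched as follows. The toy data are made up for illustration.

```r
# Toy data (assumed): 1000 short sequences, each with two ranges.
sequences <- rep(list(c("M", "G", "L", "W", "I", "S")), 1000)
indexes   <- rep(list(list(c(1, 3), c(2, 6))), 1000)

# Preallocate the result instead of growing an empty list.
fragments <- vector("list", length(sequences))

for (iN in seq_along(sequences)) {
  s <- sequences[[iN]]  # look up the sequence once per iteration
  # No cat() here: console output on every iteration adds overhead.
  fragments[[iN]] <- lapply(indexes[[iN]], function(g) s[seq.int(g[1], g[2])])
}
```

Timing both versions with system.time() on the real data would show whether this makes a difference.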

Joh

On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote:
Try this:

lapply(indexes[[1]], function(g)sequences[[1]][do.call(seq, as.list(g))])
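
For readers unfamiliar with the do.call() idiom in the line above, here is a small sketch of what it expands to for a single range. The variable names via_do_call and via_seq_int are illustrative only.

```r
g <- c(22, 46)

# do.call() unpacks the list into positional arguments, i.e. seq(22, 46).
via_do_call <- do.call(seq, as.list(g))

# The same range written out explicitly.
via_seq_int <- seq.int(g[1], g[2])

# Compare the two index vectors.
same <- isTRUE(all.equal(via_do_call, via_seq_int))
```

Both forms yield the positions 22 through 46, so either can be used to index a sequence.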

On Fri, Jan 16, 2009 at 11:06 AM, Johannes Graumann <johannes_graumann at web.de> wrote:
Hello,

I have a list of character vectors like this:

sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)

and another list of subset ranges like this:

indexes <- list(
 list(
   c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
 )
)

What I now want to do is to subset each entry in "sequences"
(e.g. sequences[[1]]) with all ranges in the corresponding lower-level list
in "indexes" (e.g. indexes[[1]]). Here is what I came up with:

fragments <- list()
for(iN in seq(length(sequences))){
 cat(paste(iN,"\n"))
 tmpFragments <- sapply(
   indexes[[iN]],
   function(x){
     sequences[[iN]][seq.int(x[1],x[2])]
   }
 )
 fragments[[iN]] <- tmpFragments
}

This works fine, but "sequences" contains thousands of entries and the
corresponding "indexes" are sometimes hundreds of ranges long, so this
whole process is EXTREMELY inefficient.

Would somebody out there take up the challenge and show me a way to
speed this up?

Thanks for any hints,

Joh

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
