Hello,
I have a list of character vectors like this:
sequences <- list(
c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
"N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
"N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
and another list of subset ranges like this:
indexes <- list(
list(
c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
)
)
What I now want to do is to subset each entry in "sequences"
(sequences[[1]]) with all ranges in the corresponding low level list in
"indexes" (indexes[[1]]). Here is what I came up with.
fragments <- list()
for(iN in seq(length(sequences))){
cat(paste(iN,"\n"))
tmpFragments <- sapply(
indexes[[iN]],
function(x){
sequences[[iN]][seq.int(x[1],x[2])]
}
)
fragments[[iN]] <- tmpFragments
}
This works fine, but "sequences" contains thousands of entries and the
corresponding "indexes" are sometimes hundreds of ranges long, so this whole
process is EXTREMELY inefficient.
Would somebody out there take up the challenge and show me a way to speed
this up?
Thanks for any hints,
Joh
Efficiency challenge: MANY subsets
6 messages · Johannes Graumann, Jorge Ivan Velez, Henrique Dallazuanna +1 more
Thanks. Very elegant, but doesn't solve the problem of the outer "for" loop,
since I now would rewrite the code like so:
fragments <- list()
for(iN in seq(length(sequences))){
cat(paste(iN,"\n"))
fragments[[iN]] <-
lapply(indexes[[iN]], function(g)sequences[[iN]][do.call(seq, as.list(g))])
}
This is still very slow for length(sequences) ~ 7000.
Joh
On Friday 16 January 2009 14:23:47 Henrique Dallazuanna wrote:
Try this:
lapply(indexes[[1]], function(g) sequences[[1]][do.call(seq, as.list(g))])
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
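The pairing that the outer "for" loop performs (entry i of "sequences" with entry i of "indexes") can also be written with Map(). The following is only a minimal sketch on toy data: cut_one is a hypothetical helper name, not something from the thread, and no timing is claimed for it.

```r
# Toy data with the same shape as in the thread:
# one list of ranges per sequence.
sequences <- list(
  c("M", "G", "L", "W", "I", "S"),
  c("F", "G", "T", "P")
)
indexes <- list(
  list(c(1, 3), c(2, 6)),
  list(c(1, 4))
)

# Hypothetical helper: cut one sequence by every range in its own range list.
cut_one <- function(s, ranges) {
  lapply(ranges, function(r) s[seq.int(r[1], r[2])])
}

# Map() walks both lists in parallel, so element i of 'sequences'
# is only ever combined with element i of 'indexes'.
fragments <- Map(cut_one, sequences, indexes)

str(fragments)
```

Map() always returns a list, so the result does not depend on how sapply() would simplify equal-length fragments, and no progress line is printed per iteration.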
Try this one; it is doing a list of 7000 in under 2 seconds:
sequences <- list(
  c("M","G","L","W","I","S","F","G","T","P","P","S","Y","T","Y","L","L","I","M",
    "N","H","K","L","L","L","I","N","N","N","N","L","T","E","V","H","T","Y","F",
    "N","I","N","I","N","I","D","K","M","Y","I","H","*")
)
indexes <- list(
  list(
    c(1,22),c(22,46),c(46, 51),c(1,46),c(22,51),c(1,51)
  )
)
indexes <- rep(indexes,10)
sequences <- rep(sequences,7000)
system.time({
  fragments <- lapply(indexes, function(.seq){
    lapply(.seq, function(.range){
      .range <- seq(.range[1], .range[2]) # save since we use several times
      lapply(sequences, '[', .range)
    })
  })
})
   user  system elapsed
   1.24    0.00    1.26
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem that you are trying to solve?
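The call lapply(sequences, '[', .range) in the reply above passes the subsetting operator itself as the function to apply. A small sketch on toy data (the names idx, by_operator and by_closure are illustrative only) shows that this is the same as writing out an anonymous function:

```r
sequences <- list(
  c("M", "G", "L", "W"),
  c("F", "G", "T", "P")
)

idx <- seq(2, 3)  # positions 2 and 3

# "[" is an ordinary function in R, so it can be handed to lapply();
# the extra argument 'idx' is passed on as the index.
by_operator <- lapply(sequences, "[", idx)

# The same thing spelled out with an anonymous function.
by_closure <- lapply(sequences, function(s) s[idx])

identical(by_operator, by_closure)
```

Note that in this form each range is applied to every element of "sequences", not only to the one at the matching position.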
3 days later
Many thanks for this example, which doesn't entirely cover my case, since I
have as many "indexes" entries as "sequences" entries. It was very
educational nonetheless, and I used it to come up with something a bit
faster than what I had before. The main trick I used, though, was naming all
entries in "sequences" and "indexes" like so
names(indexes) <- seq(length(indexes))
names(sequences) <- seq(length(sequences))
and then doing an lapply on "names(indexes)", which allows me to access both
lists easily. What I end up with is this:
fragments <- lapply(
names(indexes),
function(x){
lapply(
indexes[[x]],
function(.range){
.range <- seq.int(
.range[1], .range[2]
)
unlist(lapply(sequences[x], '[', .range),use.names=FALSE)
}
)
}
)
Although this is still quite slow, it's much faster than what I had before.
Any further comments are highly welcome. I can send the real "sequences" and
"indexes" as exported R objects ...
Thanks, Joh
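One further direction, offered only as a sketch and not timed here: since every element of a sequence is a single character, each sequence can be collapsed into one string and all of its ranges cut in a single vectorised substring() call. The names bounds, whole and fragment_strings are illustrative, and the result holds strings rather than character vectors.

```r
sequences <- list(
  c("M", "G", "L", "W", "I", "S"),
  c("F", "G", "T", "P")
)
indexes <- list(
  list(c(1, 3), c(2, 6)),
  list(c(1, 4))
)

fragment_strings <- Map(function(s, ranges) {
  bounds <- do.call(rbind, ranges)    # one row per range: start, end
  whole  <- paste(s, collapse = "")   # e.g. "MGLWIS"
  # substring() is vectorised over its start and end arguments,
  # so all ranges of this sequence are cut in one call.
  substring(whole, bounds[, 1], bounds[, 2])
}, sequences, indexes)

# If character vectors are needed again, split a fragment back up:
first_fragment <- strsplit(fragment_strings[[1]][1], "", fixed = TRUE)[[1]]
```

This assumes single-character elements throughout; with multi-character entries the positions in the collapsed string would no longer match the positions in the vector.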