On Jan 20, 2013, at 8:26 AM, Sam Steingold wrote:
* Bert Gunter <thagre.oregba at trar.pbz> [2013-01-19 22:26:46 -0800]:
But David W. and Bill Dunlap gave you solutions that also work and
are much faster, no?!
Yes, indeed, and I am now using David's solution as it is fast
(enough), simple and concise.
I am a bit surprised by that. I do agree that it was simple and
concise, two programming virtues that I occasionally achieve.
However, when I tested it against either of Bill Dunlap's
suggestions, mine was 15-40 times slower. (So I saved Bill's code and
made a mental note to study its superiority.) I could see why the
f2 version was superior, since it progressively shrank the index
candidates for further comparison, but his first function used no
such logic and was still 15 times faster.
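For concreteness, here is a minimal sketch of that shrinking-candidates idea (my own illustration only, not Bill's posted code; the function name allEqualRows is made up): keep a vector of row indices still in the running, and compare only those rows against each successive column.

allEqualRows <- function(x) {
    cand <- seq_len(nrow(x))                 # rows still in the running
    for (j in seq_len(ncol(x))[-1]) {
        same <- x[[1]][cand] == x[[j]][cand]
        same[is.na(same)] <- FALSE           # NA comparisons count as non-matches
        cand <- cand[same]                   # shrink the candidate set
        if (length(cand) == 0L) break
    }
    cand
}
# e.g. x[allEqualRows(x), ] would give the rows identical across all columns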
My test included the creation of the smaller data.frame, which his
did not, but when I modified mine to return only the index vector,
that step still consumed essentially all the time. I wondered whether
it was `which` that consumed the time, but it appears the inner step,
x == x[[1]], was the culprit.
x <- data.frame(lapply(structure(1:10, names = letters[1:10]),
                       function(i) sample(c(NA,1,1,1,2,2,2,3), replace = TRUE, size = 1e6)))

system.time({ keep <- x[[1]] == x[[2]]
              for (i in seq_len(ncol(x))[-(1:2)]) {
                  keep <- keep & x[[i - 1]] == x[[i]]
              }
              z2 <- !is.na(keep) & keep })
   user  system elapsed
  0.179   0.056   0.240

system.time({ z <- rowSums(x == x[[1]]) })
   user  system elapsed
  3.535   0.535   4.067

system.time({ z <- x == x[[1]] })
   user  system elapsed
  3.540   0.524   4.061
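For comparison, a sketch I did not time above: coercing to a matrix first, so the comparison avoids the data.frame method for `==` (presumably the source of the overhead in the last two timings). The names m, z3 and same are just placeholders of mine.

m  <- as.matrix(x)                  # one coercion up front; all columns are numeric
z3 <- rowSums(m == m[, 1])          # m[, 1] is recycled down each column
same <- !is.na(z3) & z3 == ncol(x)  # TRUE for rows identical (and non-NA) across columns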