On 03/12/2022 07:21, Bert Gunter wrote:
Perhaps it is worth pointing out that looping constructs like lapply()
can be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:
## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))
However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:
diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized
This seems much preferable to me.
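As a quick check, here is a minimal sketch that applies both variants to the example string from the original question. The names chars, via_grep and via_which are made up for illustration and do not come from the thread.

```r
## s is the example string from the original question.
s <- "abaaabbaaaaabaaab"

## Split into single characters once and reuse the result.
chars <- strsplit(s, "", fixed = TRUE)[[1]]

## Variant 1: locate the 'b's with grep() (fixed = TRUE skips the regex engine).
via_grep <- diff(c(0, grep("b", chars, fixed = TRUE)))

## Variant 2: locate the 'b's with a plain vectorized comparison.
via_which <- diff(c(0, which(chars == "b")))

print(via_grep)   # 2 4 1 6 4
print(via_which)  # 2 4 1 6 4
```

Both variants compute the positions of 'b' first and then take successive differences, with a leading 0 so that the first interval is counted from the start of the string.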
Of all the proposed solutions, Andrew Hart's solution seems the most
efficient:
big_string <- strrep("abaaabbaaaaabaaabaaaaaaaaaaaaaaaaaaab", 500000)
system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
#    user  system elapsed
#   0.736   0.028   0.764
system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]] == "b"))))
#    user  system elapsed
#   2.100   0.356   2.455
The bigger the string, the bigger the gap in performance.
Also, the bigger the average gap between 2 successive b's, the bigger
the gap in performance.
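For reference, a small sketch of the two approaches timed above, applied to the example string from the original question rather than to big_string. The names s, s_tail, by_split and by_pos are illustrative only. One caveat that appears to hold: when the string does not end in 'b', the split-based approach also reports the trailing run of non-'b' characters as an extra element, which may or may not be wanted.

```r
s <- "abaaabbaaaaabaaab"

## Split on 'b' and measure the pieces (the approach timed first above).
by_split <- nchar(strsplit(s, split = "b", fixed = TRUE)[[1]]) + 1

## Split into characters and difference the positions of 'b'.
by_pos <- diff(c(0, which(strsplit(s, "", fixed = TRUE)[[1]] == "b")))

print(by_split)  # 2 4 1 6 4
print(by_pos)    # 2 4 1 6 4

## A string that does NOT end in 'b': the two approaches differ.
s_tail <- "abaa"
by_split_tail <- nchar(strsplit(s_tail, split = "b", fixed = TRUE)[[1]]) + 1
by_pos_tail <- diff(c(0, which(strsplit(s_tail, "", fixed = TRUE)[[1]] == "b")))

print(by_split_tail)  # 2 3  (the trailing "aa" is counted as well)
print(by_pos_tail)    # 2
```

If only the intervals between actual occurrences are needed, the last element of the split-based result could be dropped whenever the string does not end in 'b'.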
Finally: always use fixed=TRUE in strsplit() if you don't need to use
the regex engine.
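A tiny illustration of the fixed=TRUE advice, using a delimiter that is special in a regular expression. The string "a.b.c" and the variable names are assumptions chosen for the example.

```r
x <- "a.b.c"

## fixed = TRUE treats "." as a literal dot.
fixed_split <- strsplit(x, ".", fixed = TRUE)[[1]]
print(fixed_split)  # "a" "b" "c"

## Without fixed = TRUE, "." is a regex that matches any character,
## so the result is not the three letters.
regex_split <- strsplit(x, ".")[[1]]
print(identical(regex_split, fixed_split))  # FALSE
```

So besides being faster, fixed=TRUE also guards against a split string being interpreted as a pattern by accident.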
Cheers,
H.
-- Bert
On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
At 17:18 on 02/12/2022, Evan Cooch wrote:
Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string
abaaabbaaaaabaaab
It's easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of
the string). So, for the preceding example, 'b' shows up in positions
2, 6, 7, 13, 17
So, the interval data would be: 2, 4, 1, 6, 4
My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?
Thanks in advance...
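
The "positions, then do the math" approach described in the question can be sketched as follows. This is only one possible reading of it; the guard against the no-match case is an addition, based on gregexpr() returning -1 when nothing is found.

```r
target_string <- "abaaabbaaaaabaaab"

## Positions of every 'b'; fixed = TRUE avoids the regex engine.
positions <- unlist(gregexpr("b", target_string, fixed = TRUE))

## gregexpr() reports -1 when there is no match, so drop that sentinel.
positions <- positions[positions > 0]

## Prepend 0 so the first interval is measured from the start of the string.
intervals <- diff(c(0, positions))

print(positions)  # 2 6 7 13 17
print(intervals)  # 2 4 1 6 4
```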