interval between specific characters in a string...

Thanks. Very informative.
I certainly missed this.

-- Bert

On Sat, Dec 3, 2022 at 3:49 PM Hervé Pagès <hpages.on.github at gmail.com>
wrote:
On 03/12/2022 07:21, Bert Gunter wrote:
Perhaps it is worth pointing out that looping constructs like lapply() can
be avoided and the procedure vectorized by mimicking Martin Morgan's
solution:

## s is the string to be searched.
diff(c(0,grep('b',strsplit(s,'')[[1]])))

However, Martin's solution is simpler and likely even faster as the regex
engine is unneeded:

diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely vectorized

This seems much preferable to me.
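
For concreteness, here is a small sketch that applies both one-liners to the example string from the original question and checks that they agree. The names chars, pos_grep, pos_which and intervals are only illustrative and do not appear in the posts above.

```r
s <- "abaaabbaaaaabaaab"
chars <- strsplit(s, "")[[1]]        # one element per character

pos_grep  <- grep("b", chars)        # positions found via the regex engine
pos_which <- which(chars == "b")     # positions found via plain comparison

identical(pos_grep, pos_which)       # both are 2 6 7 13 17

# Prepend 0 so the first interval is counted from the start of the string.
intervals <- diff(c(0, pos_which))
intervals                            # 2 4 1 6 4
```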
Of all the proposed solutions, Andrew Hart's solution seems the most
efficient:

   big_string <- strrep("abaaabbaaaaabaaabaaaaaaaaaaaaaaaaaaab", 500000)

   system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
   #    user  system elapsed
   #   0.736   0.028   0.764

   system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]] == "b"))))
   #    user  system elapsed
   #  2.100   0.356   2.455

The bigger the string, the bigger the gap in performance.

Also, the bigger the average gap between 2 successive b's, the bigger
the gap in performance.
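
One caveat that may be worth checking before relying on the strsplit()/nchar() approach: it reports one value per piece, so when the string does not end in the target character the final piece shows up as a spurious extra interval. Below is a minimal sketch of a guard, assuming a single-character target; the helper name intervals_split is hypothetical and not from the thread.

```r
intervals_split <- function(s, ch) {
  pieces <- strsplit(s, split = ch, fixed = TRUE)[[1]]
  out <- nchar(pieces) + 1
  # strsplit() drops a trailing empty piece, so a string that ends in `ch`
  # yields exactly one piece per occurrence. Otherwise the last piece is the
  # tail after the final occurrence and has to be discarded.
  if (!endsWith(s, ch)) out <- out[-length(out)]
  out
}

intervals_split("abaaabbaaaaabaaab", "b")   # 2 4 1 6 4
intervals_split("abaa", "b")                # 2 (the trailing "aa" is ignored)
```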

Finally: always use fixed=TRUE in strsplit() if you don't need to use
the regex engine.
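
Besides speed, fixed=TRUE also guards against separators that happen to be regex metacharacters. A short illustration along the lines of the example in ?strsplit (the variable names are only illustrative):

```r
x <- "a.b.c"

# Without fixed = TRUE, "." is a regex wildcard, so every character is
# treated as a separator and only empty strings are left.
regex_split <- strsplit(x, ".")[[1]]

# With fixed = TRUE, "." is a literal dot.
fixed_split <- strsplit(x, ".", fixed = TRUE)[[1]]
fixed_split                          # "a" "b" "c"
```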

Cheers,

H.

-- Bert

On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas <ruipbarradas at sapo.pt>
wrote:

At 17:18 on 02/12/2022, Evan Cooch wrote:
Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up
in the string, but...what I'm looking for is outputting the 'intervals'
between occurrences of 'b' (starting the counter at the beginning of
the string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something
like unlist(gregexpr('b', target_string))), and 'do the math' between
successive positions. Can anyone suggest a more elegant approach?

Thanks in advance...

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hello,

I don't find your solution inelegant; it's even easy to write it as a
one-line function.

char_interval <- function(x, s) {
    lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
}

target_string <- "abaaabbaaaaabaaab"
char_interval('b', target_string)
#> [[1]]
#> [1] 2 4 1 6 4
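
Note that gregexpr() returns -1 when the pattern does not occur, so the function above would report -1 as an interval in that case. A possible variant that returns an empty vector instead is sketched below; the name char_interval_safe and the use of fixed = TRUE are assumptions added here, not part of the original function, and NA inputs are not handled.

```r
char_interval_safe <- function(x, s) {
  lapply(gregexpr(x, s, fixed = TRUE), function(y) {
    pos <- as.integer(y)                    # drop the match.length attributes
    if (pos[1] == -1L) return(integer(0))   # gregexpr() signals "no match" with -1
    diff(c(0L, pos))                        # first interval counts from the start
  })
}

char_interval_safe("b", "abaaabbaaaaabaaab")[[1]]   # 2 4 1 6 4
char_interval_safe("z", "abaaabbaaaaabaaab")[[1]]   # integer(0)
```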

Hope this helps,

Rui Barradas

--
Hervé Pagès

Bioconductor Core Team
hpages.on.github at gmail.com
