Best way to test for numeric digits?
Dear Rui,
On 10/18/2023 8:45 PM, Rui Barradas wrote:
split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(mol, regex, "#") |>
      strsplit("#|[[:digit:]]") |>
      lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}
split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[[:digit:]]+", x, invert = TRUE)])
  }
  s
}
There is a glitch in the code of the first function: mol is hardcoded in
the body where the argument x should be used. After correcting for that
glitch, the timings of the two functions are similar.
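
To illustrate why the hardcoded name matters, here is a minimal sketch.
The two functions count_chars_bad and count_chars_ok are made up for this
example and are not part of the code in this thread. A body that refers
to a global object instead of its argument silently ignores whatever is
passed in, so a timing on a large input only measures the small global:

```r
# Hypothetical illustration; these function names are not from the thread.
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol10000 <- rep(mol, 10000)

# Glitch: the body uses the global 'mol', not the argument 'x'.
count_chars_bad <- function(x) nchar(mol)

# Fix: the body uses the argument that was passed in.
count_chars_ok <- function(x) nchar(x)

length(count_chars_bad(mol10000))  # length of the global 'mol'
length(count_chars_ok(mol10000))   # length of the input actually passed
```

The first call processes only the 3 strings in mol, the second all the
strings in mol10000, so their timings are not comparable.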
Note:
- grep("[[:digit:]]", ...) is almost twice as slow as grep("[0-9]", ...);
- the corrected code and timings are below.
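
A rough way to check the difference between the two character classes is
sketched here. The vector tokens is made-up test data, not taken from the
benchmark in this message, and the timings will vary with machine, R
version and locale, so no expected numbers are given:

```r
# Sketch: time the POSIX class and the explicit range on the same input.
# 'tokens' is made-up test data for this comparison only.
tokens <- rep(c("C", "Cl", "3", "F", "Li", "4", "Al", "H", "16"), 20000)

t_posix <- system.time(i_posix <- grep("[[:digit:]]", tokens, invert = TRUE))
t_range <- system.time(i_range <- grep("[0-9]", tokens, invert = TRUE))

# For ASCII input both patterns should select the same elements.
identical(i_posix, i_range)

print(t_posix)
print(t_range)
```

A single system.time() call is noisy; repeating the calls a few times
gives a more reliable comparison.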
Sincerely,
Leonard
#######
split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(x, regex, "#") |>
      strsplit("#|[[:digit:]]") |>
      lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}
split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
  }
  s
}
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol10000 <- rep(mol, 10000)
system.time(
  split_chem_elements(mol10000)
)
#   user  system elapsed
#   0.58    0.00    0.58
system.time(
  split.symbol.character(mol10000)
)
#   user  system elapsed
#   0.67    0.00    0.67