Best way to test for numeric digits?

Dear Rui,
On 10/18/2023 8:45 PM, Rui Barradas wrote:
You have a glitch (mol is hardcoded) in the code of the first function. 
The times are similar, after correcting for that glitch.

Note:
- grep("[[:digit:]]", ...) is almost twice as slow as grep("[0-9]", ...)!
- corrected results are below.
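
A minimal sketch of how the two digit classes could be timed side by side. The vector x is made up for illustration and is not taken from the timings in this thread; the ratio will likely vary with R version, platform, and locale:

```r
# Hypothetical test vector: short tokens, some of them purely numeric.
x <- rep(c("C", "Cl", "3", "F", "Li", "4", "Al", "16"), 50000)

# Same search, two spellings of the digit class.
t_posix <- system.time(i_posix <- grep("[[:digit:]]", x, invert = TRUE))
t_range <- system.time(i_range <- grep("[0-9]", x, invert = TRUE))

# For this ASCII-only input both patterns must select the same elements.
stopifnot(identical(i_posix, i_range))

# Elapsed seconds for each variant.
c(posix = t_posix[["elapsed"]], range = t_range[["elapsed"]])
```

Checking that both index vectors are identical before comparing times guards against timing two searches that do not actually do the same work.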

Sincerely,

Leonard
#######

split_chem_elements <- function(x, rm.digits = TRUE) {
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  if(rm.digits) {
    stringr::str_replace_all(x, regex, "#") |>
      strsplit("#|[[:digit:]]") |>
      lapply(\(x) x[nchar(x) > 0L])
  } else {
    strsplit(x, regex, perl = TRUE)
  }
}

split.symbol.character = function(x, rm.digits = TRUE) {
  # Perl is partly broken in R 4.3, but this works:
  regex <- "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])"
  s <- strsplit(x, regex, perl = TRUE)
  if(rm.digits) {
    s <- lapply(s, \(x) x[grep("[0-9]", x, invert = TRUE)])
  }
  s
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
mol10000 <- rep(mol, 10000)

system.time(
  split_chem_elements(mol10000)
)
#   user  system elapsed
#   0.58    0.00    0.58

system.time(
  split.symbol.character(mol10000)
)
#   user  system elapsed
#   0.67    0.00    0.67