Message-ID: <c5fabf84-2032-4fce-b913-502654494f9b@sapo.pt>
Date: 2023-10-18T15:53:42Z
From: Rui Barradas
Subject: Best way to test for numeric digits?
In-Reply-To: <6c4ac344-ddbe-4c3d-8ecd-c5a29657893d@syonic.eu>
At 15:59 on 18/10/2023, Leonard Mada via R-help wrote:
> Dear List members,
>
> What is the best way to test for numeric digits?
>
> suppressWarnings(as.double(c("Li", "Na", "K", "2", "Rb", "Ca", "3")))
> # [1] NA NA NA  2 NA NA  3
> The above requires the use of the suppressWarnings function. Are there
> any better ways?
>
> I was working to extract chemical elements from a formula, something
> like this:
> split.symbol.character = function(x, rm.digits = TRUE) {
>     # Perl is partly broken in R 4.3, but this works:
>     regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
>     # stringi::stri_split(x, regex = regex);
>     s = strsplit(x, regex, perl = TRUE);
>     if(rm.digits) {
>         s = lapply(s, function(s) {
>             isNotD = is.na(suppressWarnings(as.numeric(s)));
>             s = s[isNotD];
>         });
>     }
>     return(s);
> }
>
> split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))
>
>
> Sincerely,
>
>
> Leonard
>
>
> Note:
> # works:
> regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
>
>
> # broken in R 4.3.1
> # only slightly "erroneous" with stringi::stri_split
> regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
>
Hello,
If you want to extract chemical element symbols, the following might work.
It uses the periodic table from the GitHub package chemr and a function
from the stringr package.
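# chemr is installed from GitHub
devtools::install_github("paleolimbot/chemr")

split_chem_elements <- function(x) {
  # load the periodic table 'pt' into the function's environment
  data(pt, package = "chemr", envir = environment())
  # try longer symbols first so that "Cl" is not matched as "C"
  el <- pt$symbol[order(nchar(pt$symbol), decreasing = TRUE)]
  pat <- paste(el, collapse = "|")
  stringr::str_extract_all(x, pat)
}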
mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements(mol)
#> [[1]]
#> [1] "C" "Cl" "F"
#>
#> [[2]]
#> [1] "Li" "Al" "H"
#>
#> [[3]]
#> [1] "C" "Cl" "C" "O" "Al" "P" "O" "Si" "O" "Cl"
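As for the original question about testing for digits without
suppressWarnings, one option is a regular expression. The following is only
a sketch: the name is_digits is made up for illustration, and the pattern
accepts only strings made entirely of the digits 0-9, so unlike as.numeric
it rejects values such as "1.5" or "1e3".

```r
x <- c("Li", "Na", "K", "2", "Rb", "Ca", "3")

# TRUE where the whole string consists of one or more digits
# (is_digits is an illustrative name, not from the original post)
is_digits <- grepl("^[0-9]+$", x)
is_digits
#> [1] FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE

# keep only the non-digit strings
x[!is_digits]
#> [1] "Li" "Na" "K"  "Rb" "Ca"
```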
It is also possible to rewrite the function without calls to non-base
packages, but that will take some more work.
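As a rough illustration of a base-only version, the sketch below assumes
that every element symbol is one uppercase letter optionally followed by
one lowercase letter. That is a simplification: it does not check the
matches against a periodic table, so it would also accept strings that are
not real elements. The function name split_chem_elements_base is
hypothetical.

```r
# Base R sketch (assumption: a symbol is [A-Z] followed by at most one [a-z];
# nothing is validated against the actual periodic table)
split_chem_elements_base <- function(x) {
  # gregexpr finds all matches, regmatches extracts them per input string
  regmatches(x, gregexpr("[A-Z][a-z]?", x))
}

mol <- c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl")
split_chem_elements_base(mol)
```

For the three formulas above this gives the same symbols as the chemr
version, but the two approaches can differ on input that contains
capitalised letter pairs which are not element symbols.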
Hope this helps,
Rui Barradas