Best way to test for numeric digits?

This seems unnecessarily complex.  Or rather,
it pushes the complexity into an arcane notation.
What we really want is something that says "here is a string,
here is a pattern, give me all the substrings that match."
What we're given is a function that tells us where those
substrings are.
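
To make the complaint concrete, here is a small sketch of what gregexpr hands back. The sample string and the digit pattern are made up for illustration; they are not from the earlier messages.

```r
# Illustrative input only: find the runs of digits in a short string.
text <- "a1bc23"
m <- gregexpr("[0-9]+", text)[[1]]

starts <- as.integer(m)            # where each match begins
lens   <- attr(m, "match.length")  # how long each match is

starts   # 2 5
lens     # 1 2
```

So we get positions and lengths, and turning them into substrings is left to the caller.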

# greg.matches(pattern, text)
# accepts a POSIX regular expression, pattern,
# and a text to search in.  Both arguments must be single character
# strings (length(...) == 1), not longer vectors of strings.
# It returns a character vector of all the (non-overlapping)
# substrings of text that match, as determined by gregexpr,
# or character(0) if there is no match.

greg.matches <- function (pattern, text) {
    if (length(pattern) > 1) stop("pattern has too many elements")
    if (length(text)    > 1) stop(   "text has too many elements")
    match.info <- gregexpr(pattern, text)
    starts <- match.info[[1]]
    # gregexpr reports "no match" as a single start position of -1.
    if (isTRUE(starts[1] == -1)) return(character(0))
    stops <- attr(starts, "match.length") - 1 + starts
    sapply(seq(along=starts), function (i) {
       substr(text, starts[i], stops[i])
    })
}
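
For comparison, base R's regmatches() can be combined with gregexpr() to do the same extraction in one line. A minimal sketch, again with a made-up sample string:

```r
# regmatches() turns the gregexpr() result into the matched substrings.
text <- "a1bc23"
digits <- regmatches(text, gregexpr("[0-9]+", text))[[1]]
digits          # "1" "23"

# With no match at all, the result is an empty character vector.
none <- regmatches(text, gregexpr("[xyz]", text))[[1]]
length(none)    # 0
```

Whether to wrap this in a named function like greg.matches or call it inline is a matter of taste; the named function at least says what is wanted.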

Given greg.matches, we can do the rest with very simple
and easily comprehended regular expressions.

# parse.chemical(formula)
# takes a simple chemical formula "<element><count>..." and
# returns a list with components
# $elements -- character -- the atom symbols
# $counts   -- number    -- the counts (missing counts taken as 1).
# BEWARE.  This does not handle formulas like "CH(OH)3".

parse.chemical <- function (formula) {
    parts <- greg.matches("[A-Z][a-z]*[0-9]*", formula)
    elements <- gsub("[0-9]+", "", parts)
    counts <- as.numeric(gsub("[^0-9]+", "", parts))
    counts <- ifelse(is.na(counts), 1, counts)
    list(elements=elements, counts=counts)
}
> parse.chemical("CCl3F")
$elements
[1] "C"  "Cl" "F"

$counts
[1] 1 3 1

> parse.chemical("Li4Al4H16")
$elements
[1] "Li" "Al" "H"

$counts
[1]  4  4 16

> parse.chemical("CCl2CO2AlPO4SiO4Cl")
$elements
 [1] "C"  "Cl" "C"  "O"  "Al" "P"  "O"  "Si" "O"  "Cl"

$counts
 [1] 1 2 1 2 1 1 4 1 4 1
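
One way to guard against the formulas that parse.chemical cannot handle is to check that the matched pieces, pasted back together, reproduce the whole input. This is only a sketch: is.simple.formula is a hypothetical helper name, not something from the earlier messages, and it uses base R's regmatches() so that it runs on its own.

```r
# Hypothetical helper: TRUE when the "<element><count>" tokens, pasted
# back together, give exactly the original formula (nothing was skipped).
is.simple.formula <- function (formula) {
    parts <- regmatches(formula,
                        gregexpr("[A-Z][a-z]*[0-9]*", formula))[[1]]
    identical(paste(parts, collapse=""), formula)
}

is.simple.formula("CCl3F")     # TRUE
is.simple.formula("CH(OH)3")   # FALSE: "(", ")" and the trailing "3" are skipped
```

A caller could stop() with a clear message when this returns FALSE rather than silently returning wrong counts.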


On Thu, 19 Oct 2023 at 03:59, Leonard Mada via R-help <r-help at r-project.org>
wrote: