
Parsing a Simple Chemical Formula

Try this:
f.extract <- function(formula)
{
    # pattern to match the initial chemical:
    # assumes the chemical starts with an upper case letter and an optional
    # lower case letter, followed by zero or more digits.
    first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
    # inverse of the above, to remove the initial chemical
    last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
    result <- list()
    extract <- formula
    # repeat as long as there is data
    while ((start <- nchar(extract)) > 0){
        chem <- sub(first, '\\1 \\2', extract)
        extract <- sub(last, '\\1', extract)
        # if the number of characters is unchanged, nothing was matched,
        # so the formula is invalid
        if (nchar(extract) == start){
            warning("Invalid formula:", formula)
            return(NULL)
        }
        # append the element symbol (and its count, if any) to the list
        result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
    }
    result
}

> f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

> f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

> f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

> f.extract("Crr")
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr
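If what you eventually need is the number of atoms of each element rather than the list of pieces, here is a rough sketch of one way to get there. It is an addition, not part of the reply above: `count_elements` is a hypothetical name, it uses `gregexpr`/`regmatches` instead of the `sub` loop, and it assumes that a missing count means 1 and that repeated symbols (as in "CCC") should be summed. The element pattern is the same one `f.extract` uses.

```r
# Hypothetical helper (not from the original reply): same element pattern as
# f.extract, but returns a named integer vector of counts per element.
# Assumptions: a missing count means 1; repeated symbols are summed.
count_elements <- function(formula) {
  pattern <- "[[:upper:]][[:lower:]]?[0-9]*"
  # all non-overlapping matches of the element pattern, left to right
  tokens <- regmatches(formula, gregexpr(pattern, formula))[[1]]
  # the matches must cover the whole string; otherwise something was skipped
  if (paste(tokens, collapse = "") != formula) {
    warning("Invalid formula: ", formula)
    return(NULL)
  }
  symbols <- sub("[0-9]*$", "", tokens)       # strip trailing digits
  digits  <- sub("^[[:alpha:]]+", "", tokens) # strip leading letters
  digits[digits == ""] <- "1"                 # no digits means one atom
  counts  <- as.integer(digits)
  # sum counts per symbol, keeping the order of first appearance
  totals <- tapply(counts, factor(symbols, levels = unique(symbols)), sum)
  setNames(as.integer(totals), names(totals))
}
```

With this, `count_elements("C5H11BrO")` returns `c(C = 5, H = 11, Br = 1, O = 1)`, `count_elements("CCC")` returns `c(C = 3)`, and `count_elements("Crr")` warns and returns NULL. The validity test compares the concatenated matches with the input, which plays the same role as the "no characters consumed" check in `f.extract`.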
On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <hanson at depauw.edu> wrote: