Parsing a Simple Chemical Formula
Try this:
f.extract <- function(formula)
{
    # pattern to match the initial chemical
    # assumes chemical starts with an upper case and optional lower case
    # followed by zero or more digits.
    first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
    # inverse of above to remove the initial chemical
    last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
    result <- list()
    extract <- formula
    # repeat as long as there is data
    while ((start <- nchar(extract)) > 0){
        chem <- sub(first, '\\1 \\2', extract)
        extract <- sub(last, '\\1', extract)
        # if the number of characters is the same, then there was an error
        if (nchar(extract) == start){
            warning("Invalid formula:", formula)
            return(NULL)
        }
        # append to the list
        result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
    }
    result
}
f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

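To see what the two patterns do on a single pass of the loop, here is a small sketch that applies them by hand to one formula. The names step.chem and step.rest are only for illustration and are not part of f.extract.

```r
# same two patterns as in f.extract
first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
last  <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"

# one pass over the full formula
step.chem <- sub(first, "\\1 \\2", "C5H11BrO")  # leading element and its count
step.rest <- sub(last, "\\1", "C5H11BrO")       # remainder for the next pass

# an element with no digits leaves the second group empty
step.nodigit <- sub(first, "\\1 \\2", "BrO")

print(step.chem)
print(step.rest)
print(step.nodigit)
```

The trailing space in the no-digit case is why strsplit() returns only the symbol for Br and O: there is nothing after the separator.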
f.extract("Crr") # bad
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr
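Note that f.extract leaves the implied count of 1 out (Br and O come back without a number). If you want counts you can multiply directly, one possible sketch is below. It uses gregexpr() and regmatches() rather than the loop above, and count.elements and the atomic.mass table are hypothetical names made up for this example; the masses are approximate standard atomic weights for just these four elements.

```r
# Sketch: element counts with an implied 1, then a molecular weight.
# count.elements() is a hypothetical helper, not part of f.extract.
count.elements <- function(formula) {
    # each token is an element symbol followed by an optional count
    token <- "[[:upper:]][[:lower:]]?[0-9]*"
    tokens <- regmatches(formula, gregexpr(token, formula))[[1]]
    # the tokens must cover the whole string, otherwise it is invalid
    if (length(tokens) == 0L || sum(nchar(tokens)) != nchar(formula)) {
        warning("Invalid formula: ", formula)
        return(NULL)
    }
    symbols <- sub("[0-9]*$", "", tokens)
    digits <- sub("^[[:alpha:]]+", "", tokens)
    digits[digits == ""] <- "1"   # no number means a count of 1
    counts <- as.numeric(digits)
    # sum repeated symbols, keeping the order of first appearance
    groups <- factor(symbols, levels = unique(symbols))
    vapply(split(counts, groups), sum, numeric(1))
}

# approximate atomic weights (g/mol); extend as needed (assumed values)
atomic.mass <- c(C = 12.011, H = 1.008, Br = 79.904, O = 15.999)

counts <- count.elements("C5H11BrO")
mol.weight <- sum(counts * atomic.mass[names(counts)])
print(counts)
print(mol.weight)
```

The result of count.elements is a named numeric vector, so indexing atomic.mass by names(counts) lines the masses up with the counts before multiplying.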
On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <hanson at depauw.edu> wrote:
Hello R Folks...

I've been looking around the 'net and I see many complex solutions in various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in "Hill order" which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number meaning a 1 is implied).

Let's say
form <- "C5H11BrO"
I'd like to get the count of each element, so in this case I need to extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said, I don't need a general solution to the problem of calculating molecular weight from an arbitrary formula, that seems quite challenging, just a way to convert "form" into a list or data frame which I can then do the math on.

Here's hoping this is a simple issue for more experienced R users!

TIA, Bryan
***********
Bryan Hanson
Professor of Chemistry & Biochemistry
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?