Parsing a Simple Chemical Formula

I think the OP had a very limited need, but there is something
more sophisticated that may be of larger interest called "SMILES",
which attempts to capture some structural information about a molecule
in a text string. Reducing pictures to tractable text is an important step
in many analysis efforts, and I was curious what others may be able to say about
R support for things like this.

A quick Google search turned up this,

http://cran.r-project.org/web/packages/rpubchem/rpubchem.pdf

but I wasn't sure if there are more packages for manipulating
different ball-and-stick collections (the atom and bond descriptions
could just as easily represent any other collection of nodes
and connections).
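
To make the "nodes and connections" point concrete, here is a minimal
sketch in Python (chosen only for illustration; it does not use rpubchem
or any chemistry package). The ethanol heavy-atom data and the helper name
adjacency() are hypothetical, made up for this example. Nothing in it is
specific to chemistry: the same two lists could describe any graph.

```python
from collections import defaultdict

# Hypothetical example data: the heavy atoms of ethanol (C-C-O).
atoms = ["C", "C", "O"]      # node labels
bonds = [(0, 1), (1, 2)]     # connections, as pairs of indices into atoms


def adjacency(n_nodes, edges):
    """Return {node index: sorted list of neighbour indices}."""
    neighbours = defaultdict(set)
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return {i: sorted(neighbours[i]) for i in range(n_nodes)}


adj = adjacency(len(atoms), bonds)
for i, symbol in enumerate(atoms):
    print(i, symbol, adj[i])
```

Swapping the element symbols for any other labels leaves the code unchanged,
which is the sense in which a ball-and-stick description is just a labelled
graph.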

You can get some idea of what this does by typing your favorite chemical
name here,

http://pubchem.ncbi.nlm.nih.gov/

and the entries give something called "Canonical SMILES structures".
For example,

http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=8030&loc=ec_rcs


IUPAC Name: thiophene
Canonical SMILES: C1=CSC=C1
InChI: InChI=1S/C4H4S/c1-2-4-5-3-1/h1-4H
InChIKey: YTPLMLYBLZKORZ-UHFFFAOYSA-N
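
As a rough illustration of how much structure the SMILES string carries,
the sketch below (Python, standard library only) turns the thiophene
SMILES into atom and bond lists and checks the resulting formula against
the formula layer of the InChI. It is a toy under loud assumptions, not a
real SMILES or InChI parser: it handles only the one-letter atoms C, N, O
and S, the bond symbols - = #, and single-digit ring closures (no
branches, brackets, charges or aromatic lowercase atoms), and it fills in
hydrogens from an assumed table of default valences. The InChI side
assumes a single-component formula. All function names are made up for
this example.

```python
import re
from collections import Counter

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Assumed default valences, used only to fill in implicit hydrogens.
VALENCE = {"C": 4, "N": 3, "O": 2, "S": 2}


def parse_simple_smiles(smiles):
    """Parse a tiny SMILES subset into (atoms, bonds).

    atoms is a list of element symbols; bonds is a list of
    (atom index, atom index, bond order) tuples.
    """
    atoms, bonds = [], []
    prev = None        # index of the previous atom in the chain
    pending = 1        # order of the next bond to be made
    ring_open = {}     # ring digit -> (atom index, bond order)
    for ch in smiles:
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
        elif ch.isdigit():
            if prev is None:
                raise ValueError("ring digit before any atom")
            if ch in ring_open:          # second occurrence closes the ring
                start, order = ring_open.pop(ch)
                bonds.append((start, prev, max(order, pending)))
            else:                        # first occurrence opens it
                ring_open[ch] = (prev, pending)
            pending = 1
        elif ch in VALENCE:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    if ring_open:
        raise ValueError("unclosed ring")
    return atoms, bonds


def formula_from_graph(atoms, bonds):
    """Count elements, adding implicit hydrogens from VALENCE."""
    counts = Counter(atoms)
    used = Counter()                     # bond orders used per atom
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    hydrogens = sum(max(VALENCE[a] - used[i], 0) for i, a in enumerate(atoms))
    if hydrogens:
        counts["H"] += hydrogens
    return dict(counts)


def formula_from_inchi(inchi):
    """Count elements in the formula layer (the part after the first '/')."""
    layer = inchi.split("/")[1]
    counts = Counter()
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", layer):
        counts[symbol] += int(n) if n else 1
    return dict(counts)


atoms, bonds = parse_simple_smiles("C1=CSC=C1")
print(atoms)
print(bonds)
print(formula_from_graph(atoms, bonds))
print(formula_from_inchi("InChI=1S/C4H4S/c1-2-4-5-3-1/h1-4H"))
```

For thiophene both routes give four carbons, four hydrogens and one sulfur.
The SMILES route also yields the ring of five atoms with its two double
bonds, which is the structural information a plain formula such as C4H4S
cannot express.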