Analyzing Publications from Pubmed via XML
On Dec 13, 2007, at 9:03 PM, Farrel Buchinsky wrote:
I would like to track in which journals articles about a particular disease
are being published. Creating a PubMed search is trivial. The search provides
data, but obviously not as an R data frame. I can get the search to export the
data as an XML feed, and the XML package seems to be able to read it.
xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-", isURL=TRUE)
But getting from there to a data frame in which one column would be the name
of the journal and another column would be the year (to keep things simple)
seems to be beyond my capabilities.
If you're comfortable with Python (or Perl, Ruby, etc.), it'd be easier to
just extract the required fields from the raw feed. Using ElementTree in
Python makes this a trivial task. Once you have the raw data, you can read it
into R.

-------------------------------------------------------------------
Rajarshi Guha <rguha at indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
A committee is a group that keeps the minutes and loses hours.
-- Milton Berle
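
The ElementTree route suggested in the reply could look roughly like the sketch below. It is only an illustration under stated assumptions: the embedded sample feed is made up, and the element names used for the journal (<category>) and the date (<pubDate>) are guesses about the feed layout, not something confirmed in this thread. Inspect the real feed first and adjust the tag names; the helper names extract_rows and rows_to_csv are likewise hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Made-up stand-in for the PubMed RSS feed. The tag names <category> (journal)
# and <pubDate> (date) are ASSUMPTIONS; check the real feed and adjust them.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>PubMed search (sample)</title>
    <item>
      <title>First article</title>
      <category>J Example Med</category>
      <pubDate>Thu, 13 Dec 2007 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second article</title>
      <category>Another J</category>
      <pubDate>Mon, 06 Nov 2006 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def extract_rows(xml_text):
    """Return a list of (journal, year) tuples, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        journal = (item.findtext("category") or "").strip()
        # Take the first four-digit year found in the date string, if any.
        match = re.search(r"\b(19|20)\d{2}\b", item.findtext("pubDate") or "")
        year = match.group(0) if match else ""
        rows.append((journal, year))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line, ready for R."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["journal", "year"])
    writer.writerows(rows)
    return buf.getvalue()


rows = extract_rows(SAMPLE_FEED)
csv_text = rows_to_csv(rows)
print(csv_text)
```

If the CSV text is saved to a file (say journals.csv, a hypothetical name), it should load in R with read.csv("journals.csv"), after which table() on the journal and year columns gives the counts per journal and year.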