Analyzing Publications from Pubmed via XML

On Dec 13, 2007, at 9:16 PM, Farrel Buchinsky wrote:

            
Certainly. I'm probably a better Python programmer than an R
programmer, so it's faster and neater for me to do it in Python:

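from urllib.request import urlopen
from xml.etree.ElementTree import XML

# The RSS feed from the original question
url = ('http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?'
       'rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-')
con = urlopen(url)
dat = con.read()
con.close()
root = XML(dat)  # root is the <rss> element

# Print the category of every item in the feed
items = root.findall("channel/item")
for item in items:
    category = item.find("category")
    if category is not None:
        print(category.text)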

The problem is that the RSS feed you linked to does not contain the
year of the article in an easily accessible XML element. Instead, you
have to process the HTML content of the description element, which
R could do, but you'd be using the wrong tool for the job.
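
To make the description-parsing step concrete, here is a rough sketch. The sample feed fragment is made up for illustration (I have not reproduced the real description markup, which may well differ), and the helper names year_from_description and years_in_feed are hypothetical. It simply strips the HTML tags and takes the first thing that looks like a four-digit year:

```python
import html
import re
from xml.etree.ElementTree import fromstring

# Hypothetical feed fragment; the real description markup may differ.
SAMPLE_RSS = """<rss><channel>
<item>
  <title>Example article</title>
  <description>&lt;p&gt;J Example Med. 2007 Dec;12(3):45-50.&lt;/p&gt;</description>
</item>
</channel></rss>"""

TAG = re.compile(r"<[^>]+>")              # crude HTML tag matcher
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")  # four-digit year, 1900-2099


def year_from_description(description):
    """Return the first plausible year in the HTML text, or None."""
    text = html.unescape(TAG.sub(" ", description or ""))
    match = YEAR.search(text)
    return int(match.group(0)) if match else None


def years_in_feed(rss_text):
    """Return one year (or None) per <item> in the feed."""
    root = fromstring(rss_text)
    return [year_from_description(item.findtext("description"))
            for item in root.findall("channel/item")]


print(years_in_feed(SAMPLE_RSS))
```

This is a heuristic: any other four-digit number in the 1900-2099 range that appears earlier in the description (a page number, say) would be picked up instead, which is one more reason to prefer a source with a proper year element.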

In general, if you're planning to analyze article data from PubMed,
I'd suggest going through the Entrez CGIs (ESearch and EFetch),
which give you all the details of the articles in an XML format
that can be easily parsed in your language of choice.
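
As a minimal sketch of the ESearch/EFetch route, assuming the standard E-utilities endpoints (esearch.fcgi and efetch.fcgi) and the usual PubMed XML layout: the function names below are my own, and fetch_years is the only one that touches the network. ESearch turns a query into a list of PubMed IDs, and EFetch returns the full records for those IDs.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
from xml.etree.ElementTree import fromstring

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, retmax=20):
    """Build an ESearch URL that returns PubMed IDs matching `term`."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(pmids):
    """Build an EFetch URL that returns full records as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return BASE + "efetch.fcgi?" + urlencode(params)


def parse_ids(esearch_xml):
    """Extract the PubMed IDs from an ESearch result."""
    return [e.text for e in fromstring(esearch_xml).findall("IdList/Id")]


def parse_years(efetch_xml):
    """Map each PMID to its publication year (None if there is no <Year>)."""
    years = {}
    for article in fromstring(efetch_xml).findall("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        # Some records carry <MedlineDate> instead of <Year>.
        years[pmid] = article.findtext(
            "MedlineCitation/Article/Journal/JournalIssue/PubDate/Year")
    return years


def fetch_years(term, retmax=20):
    """Run the search and fetch over the network (not exercised here)."""
    with urlopen(esearch_url(term, retmax)) as con:
        ids = parse_ids(con.read())
    if not ids:
        return {}
    with urlopen(efetch_url(ids)) as con:
        return parse_years(con.read())
```

Calling fetch_years("otitis media", retmax=5), for example, should give a dictionary of PMIDs and years that can then be tabulated however you like. Keeping the URL building and the parsing separate from the network call makes it easy to test the parsing on saved XML files.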

This can also be done in R (the rpubchem package contains functions
to process XML files from PubChem, which might provide some
pointers).

-------------------------------------------------------------------
Rajarshi Guha  <rguha at indiana.edu>
GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04  06F7 1BB9 E634 9B87 56EE
-------------------------------------------------------------------
Writing software is more fun than working.
