Analyzing Publications from PubMed via XML
On Dec 14, 2007 3:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
The problem is that the RSS feed you linked to does not contain the article's year in an easily accessible XML element. Rather, you have to process the HTML content of the description element, which is something R could do, but you'd be using the wrong tool for the job.
Yes. I have noticed that there are two sorts of XML that PubMed will
provide. The kind I had hooked into was an RSS feed, which provides a
lot of the information simply as a formatted table for viewing in an
RSS reader. There is another way to get the XML to come out with more
tags. However, I found the best way to do this is probably through the
Bioconductor annotate package:
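library(annotate)  # Bioconductor: provides pubmed() and buildPubMedAbst()
library(XML)       # provides xmlRoot() and xmlChildren()
x <- pubmed("18046565", "17978930", "17975511")  # fetch the records as XML
a <- xmlRoot(x)
numAbst <- length(xmlChildren(a))
absts <- list()
for (i in seq_len(numAbst)) {
  absts[[i]] <- buildPubMedAbst(a[[i]])  # one abstract object per record
}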
I am now trying to work through that approach to see what I can come up with.
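With the fully tagged XML, the year that the RSS feed buried in HTML sits in its own element (PubDate/Year, or PubDate/MedlineDate for some journals). The sketch below pulls it out with the XML package's XPath helpers. It assumes an abbreviated, hand-written record set (the PMIDs and dates are made up, and real records carry many more elements), and pubYear() is a hypothetical helper, not part of annotate:

```r
library(XML)

# Hypothetical, abbreviated records shaped like PubMed's tagged XML
# (made-up PMIDs and dates, not a live query).
txt <- paste0(
  "<PubmedArticleSet>",
  "<PubmedArticle><MedlineCitation><PMID>111</PMID><Article><Journal>",
  "<JournalIssue><PubDate><Year>2006</Year></PubDate></JournalIssue>",
  "</Journal></Article></MedlineCitation></PubmedArticle>",
  "<PubmedArticle><MedlineCitation><PMID>222</PMID><Article><Journal>",
  "<JournalIssue><PubDate><MedlineDate>2007 Spring</MedlineDate></PubDate>",
  "</JournalIssue></Journal></Article></MedlineCitation></PubmedArticle>",
  "</PubmedArticleSet>")
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: read PubDate/Year; if absent, fall back to the
# first four characters of MedlineDate; otherwise return NA.
pubYear <- function(art) {
  y <- xpathSApply(art, ".//PubDate/Year", xmlValue)
  if (length(y)) return(y[[1]])
  md <- xpathSApply(art, ".//PubDate/MedlineDate", xmlValue)
  if (length(md)) return(substr(md[[1]], 1, 4))
  NA_character_
}

arts <- getNodeSet(doc, "//PubmedArticle")
years <- vapply(arts, pubYear, character(1))
# Label each year with the record's PMID
names(years) <- vapply(arts, function(art)
  xpathSApply(art, "./MedlineCitation/PMID", xmlValue)[[1]], character(1))
print(years)
```

The same pubYear() function could be applied to each child of xmlRoot(x) from a real pubmed() query, though that has not been checked against live records here.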
Note that the lines after a <- xmlRoot(x) could be reduced to:

absts <- xmlSApply(a, buildPubMedAbst)
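To check that the explicit loop and the xmlSApply() one-liner do the same job without a network call, here is a sketch on a tiny made-up document. getId() is a hypothetical stand-in for buildPubMedAbst():

```r
library(XML)

# Hypothetical two-record document, written without whitespace between
# tags so the root has exactly two element children.
doc <- xmlParse("<set><rec><id>a</id></rec><rec><id>b</id></rec></set>",
                asText = TRUE)
root <- xmlRoot(doc)

# Hypothetical stand-in for buildPubMedAbst(): pull one field per record
getId <- function(node) xmlValue(node[["id"]])

# Explicit loop, in the style of the original message
ids_loop <- character(0)
for (i in seq_len(xmlSize(root))) {
  ids_loop[i] <- getId(root[[i]])
}

# One-call equivalent: apply getId to each child of root
ids_apply <- xmlSApply(root, getId)
print(ids_apply)
```

Like sapply(), xmlSApply() only simplifies atomic results into a vector. With the objects returned by buildPubMedAbst() you should expect a list, much like the absts list built by the loop.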