Analyzing Publications from Pubmed via XML
Farrel Buchinsky wrote:
The problem is that the RSS feed you linked to does not contain the year of the article in an easily accessible XML element. Instead, you have to process the HTML content of the description element, which is something R could do, but you'd be using the wrong tool for the job.
Yes. I have noticed that there are two sorts of XML that PubMed will
provide. The kind I had hooked into was an RSS feed, which provides a
lot of the information simply as a formatted table for viewing in an
RSS reader. There is another way to get the XML to come out with more
tags. However, I found that the best way to do this is probably through
the Bioconductor annotate package:
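library(annotate)  # pubmed(), buildPubMedAbst()
library(XML)       # xmlRoot(), xmlChildren()
# Fetch the records for three PubMed IDs as a parsed XML document
x <- pubmed("18046565", "17978930", "17975511")
a <- xmlRoot(x)
# Build one abstract object per child node of the root
numAbst <- length(xmlChildren(a))
absts <- list()
for (i in 1:numAbst) {
  absts[[i]] <- buildPubMedAbst(a[[i]])
}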
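You can replace the final 5 lines (the loop and its setup) with absts = xmlApply(a, buildPubMedAbst), which is shorter, fractionally faster, and handles the case where there are no abstracts (when numAbst is 0, 1:numAbst runs over c(1, 0) instead of not running at all).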
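To make the empty-result point concrete, here is a minimal sketch that runs xmlApply over a small inline document and over an empty one. It needs only the XML package and no network access. The XML strings and the get_year helper are hypothetical stand-ins for the output of pubmed() and for buildPubMedAbst; they are not the real PubMed record layout.

```r
library(XML)

# Hypothetical stand-in for what pubmed() returns: two article nodes.
with_articles <- xmlRoot(xmlTreeParse(
  paste0("<PubmedArticleSet>",
         "<PubmedArticle><Year>2007</Year></PubmedArticle>",
         "<PubmedArticle><Year>2006</Year></PubmedArticle>",
         "</PubmedArticleSet>"),
  asText = TRUE))

# Hypothetical stand-in for a query that matched nothing.
empty_set <- xmlRoot(xmlTreeParse("<PubmedArticleSet/>", asText = TRUE))

# Hypothetical helper playing the role of buildPubMedAbst:
# pull the text of the <Year> child out of one article node.
get_year <- function(node) xmlValue(node[["Year"]])

# xmlApply calls the function once per child node and returns a list.
years <- xmlApply(with_articles, get_year)
none  <- xmlApply(empty_set, get_year)

print(unlist(years, use.names = FALSE))
print(length(none))

# By contrast, a loop written as for (i in 1:n) with n == 0 still
# iterates, because 1:0 is the vector c(1, 0).
print(1:0)
```

With real data, the same call shape applies with buildPubMedAbst in place of get_year, as suggested above.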
I am now trying to work through that approach to see what I can come up with.