Message-ID: <971536df0712141236u450d977fj82cbdb6040de66c4@mail.gmail.com>
Date: 2007-12-14T20:36:17Z
From: Gabor Grothendieck
Subject: Analyzing Publications from Pubmed via XML
In-Reply-To: <bd93cdad0712141204j5cf2e10axf8a47f337ae7002c@mail.gmail.com>

On Dec 14, 2007 3:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
> > The problem is that the RSS feed you linked to does not contain the
> > year of the article in an easily accessible XML element. Rather, you
> > have to process the HTML content of the description element, which
> > is something R could do, but you'd be using the wrong tool for the job.
> >
>
> Yes. I have noticed that there are two sorts of XML that PubMed will
> provide. The kind I had hooked into was an RSS feed, which provides a
> lot of the information simply as a formatted table for viewing in an
> RSS reader. There is another way to get the XML to come out with more
> tags. However, I found the best way to do this is probably through the
> Bioconductor annotate package:
>
> x <- pubmed("18046565", "17978930", "17975511")
> a <- xmlRoot(x)
> numAbst <- length(xmlChildren(a))
> absts <- list()
> for (i in 1:numAbst) {
>   absts[[i]] <- buildPubMedAbst(a[[i]])
> }
>
> I am now trying to work through that approach to see what I can come up with.

Note that the lines after a <- xmlRoot(x) could be reduced to:

absts <- xmlSApply(a, buildPubMedAbst)
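
To see the equivalence without querying PubMed, here is a minimal sketch that uses only the XML package. The inline document and the getYear helper are made up for illustration; getYear merely stands in for buildPubMedAbst, and the element names are not PubMed's actual schema.

```r
library(XML)

# Hypothetical stand-in for the PubMed result: a tiny inline document.
doc <- xmlTreeParse(
  "<ArticleSet><Article><Year>2007</Year></Article><Article><Year>2006</Year></Article></ArticleSet>",
  asText = TRUE
)
a <- xmlRoot(doc)

# Hypothetical extractor playing the role of buildPubMedAbst:
# returns the text of the <Year> child of one node.
getYear <- function(node) xmlValue(node[["Year"]])

# Explicit loop over the children of the root, as in the quoted code.
years_loop <- list()
for (i in seq_len(xmlSize(a))) {
  years_loop[[i]] <- getYear(a[[i]])
}

# The same traversal in one call: apply getYear to each child of a.
years_apply <- xmlSApply(a, getYear)
```

Like sapply, xmlSApply simplifies its result when it can, so here it gives a character vector named by the child elements, whereas the loop builds a list. With buildPubMedAbst, which returns an object per article, I would expect a list in both cases.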