Analyzing Publications from Pubmed via XML

Hi Armin -- 

See the help page for esearch

http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html

especially the 'retmax' key.
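
As a rough sketch of where 'retmax' fits: it is just one more key=value pair in the esearch query string. The base URL, the search term, and the helper name esearch_url below are assumptions for illustration, not something taken from this thread; the help page is the authority on defaults and limits.

```r
# Hypothetical helper: build an esearch URL with an explicit 'retmax'.
# The base URL and the example search term are assumptions for illustration.
esearch_url <- function(term, retmax = 100, db = "pubmed") {
    base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    query <- c(db = db,
               term = URLencode(term, reserved = TRUE),
               # format() avoids scientific notation for large values
               retmax = format(retmax, scientific = FALSE))
    paste0(base, "?", paste(names(query), query, sep = "=", collapse = "&"))
}

url <- esearch_url("cholesteatoma", retmax = 500)
```

The resulting string can then be handed to whatever is already being used to fetch and parse the esearch result.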

A couple of other thoughts on this thread...

1) using the full path, e.g.,

ids <- xpathApply(doc, "/eSearchResult/IdList/Id", xmlValue)

is likely to lead to less grief in the long run, as you'll only select
elements of the node you're interested in, rather than any element,
anywhere in the document, labeled 'Id'
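
To make the difference concrete, here is a small sketch on a made-up document. The <Other> element holding a stray <Id> is hypothetical and only there to show what '//Id' would pick up; real esearch output may or may not contain such elements.

```r
library(XML)

# Made-up esearch-like result; the <Other><Id> element is hypothetical.
txt <- paste0(
    "<eSearchResult><Count>2</Count>",
    "<IdList><Id>111</Id><Id>222</Id></IdList>",
    "<Other><Id>999</Id></Other>",
    "</eSearchResult>")
toy_doc <- xmlParse(txt, asText = TRUE)

# '//Id' matches every Id element, wherever it sits in the document
all_ids <- unlist(xpathApply(toy_doc, "//Id", xmlValue))

# the full path matches only the Id children of IdList
list_ids <- unlist(xpathApply(toy_doc, "/eSearchResult/IdList/Id", xmlValue))
```

Here all_ids has three entries, including the stray "999", while list_ids has only the two ids from IdList.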

2) From a different post in the thread, things like
On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
[snip]
will lead to more trouble, because they assume that AbstractText, etc.,
occur exactly once in each record. It would seem better to extract the
relevant node, and query that, probably defining appropriate
defaults. I started with

xpath_or_na <- function(doc, q) {
    res <- xpathApply(doc, q, xmlValue)
    if (length(res)==1) res[[1]]
    else NA_character_
}

citn <- function(citation){
    Abstract <- xpath_or_na(citation,
                            "/MedlineCitation/Article/Abstract/AbstractText")
    Journal <- xpath_or_na(citation,
                           "/MedlineCitation/Article/Journal/Title")
    Pmid <- xpath_or_na(citation,
                        "/MedlineCitation/PMID")
    c(Abstract=Abstract, Journal=Journal, Pmid=Pmid)
}

medline_q <- "/PubmedArticleSet/PubmedArticle/MedlineCitation"
res <- xpathApply(doc, medline_q, citn)
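
A self-contained sketch of the same idea on a made-up two-record document, where the second record has no AbstractText. Note that it uses paths relative to each citation node ('./Article/...'); that is my assumption about how current versions of the XML package evaluate XPath against a node, and the names toy, value_or_na and toy_res are hypothetical.

```r
library(XML)

# Toy input (made up): the second record has no AbstractText.
txt <- paste0(
    "<PubmedArticleSet>",
    "<PubmedArticle><MedlineCitation><PMID>1</PMID><Article>",
    "<Journal><Title>Journal A</Title></Journal>",
    "<Abstract><AbstractText>Some text.</AbstractText></Abstract>",
    "</Article></MedlineCitation></PubmedArticle>",
    "<PubmedArticle><MedlineCitation><PMID>2</PMID><Article>",
    "<Journal><Title>Journal B</Title></Journal>",
    "</Article></MedlineCitation></PubmedArticle>",
    "</PubmedArticleSet>")
toy <- xmlParse(txt, asText = TRUE)

# Return the single matching value, or NA if there are zero or several.
value_or_na <- function(node, q) {
    res <- xpathApply(node, q, xmlValue)
    if (length(res) == 1) res[[1]] else NA_character_
}

# Select each citation node, then query relative to that node.
nodes <- getNodeSet(toy, "/PubmedArticleSet/PubmedArticle/MedlineCitation")
toy_res <- lapply(nodes, function(n) {
    c(Abstract = value_or_na(n, "./Article/Abstract/AbstractText"),
      Journal  = value_or_na(n, "./Article/Journal/Title"),
      Pmid     = value_or_na(n, "./PMID"))
})
```

The record without an abstract comes back with Abstract set to NA rather than shifting the other fields out of line.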

One would still have to coerce res into a data.frame. It is also worth
thinking about each of the lines in citn -- e.g., the Journal query
clearly only applies to Journals.  Eventually one wants to consult the DTD (basically, the
contract spelling out the content) of the document, confirm that the
xpath queries will perform correctly, and verify that the document
actually conforms to its DTD.
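
For the coercion step, one possible approach (a sketch, not the only way) is to stack the per-record vectors as rows of a matrix and convert that. The list res_example below is a made-up stand-in for a list of equal-length named character vectors.

```r
# Stand-in for the result list: equal-length named character vectors.
res_example <- list(
    c(Abstract = "Some text.", Journal = "Journal A", Pmid = "1"),
    c(Abstract = NA_character_, Journal = "Journal B", Pmid = "2"))

# Stack the vectors as rows of a character matrix, then convert.
df <- as.data.frame(do.call(rbind, res_example), stringsAsFactors = FALSE)
```

This relies on every record returning the same fields in the same order, which is one more reason to give each query a default such as NA.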

Following my own advice, I quickly found that doing things 'more
right' becomes quite complicated, and suddenly became satisfied with
the information I can get out of the 'annotate' package.

Martin