Analyzing Publications from Pubmed via XML
26 messages · Farrel Buchinsky, Gabor Grothendieck, Rajarshi Guha +5 more
On Dec 13, 2007, at 9:03 PM, Farrel Buchinsky wrote:
I would like to track in which journals articles about a particular disease are being published. Creating a pubmed search is trivial. The search provides data but obviously not as an R dataframe. I can get the search to export the data as an xml feed and the xml package seems to be able to read it.

xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-", isURL=TRUE)

But getting from there to a dataframe in which one column would be the name of the journal and another column would be the year (to keep things simple) seems to be beyond my capabilities.
If you're comfortable with Python (or Perl, Ruby, etc.), it'd be easier to just extract the required stuff from the raw feed; using ElementTree in Python makes this a trivial task. Once you have the raw data you can read it into R.

Rajarshi Guha <rguha at indiana.edu>
On Dec 13, 2007 9:03 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
I would like to track in which journals articles about a particular disease
are being published. Creating a pubmed search is trivial. The search
provides data but obviously not as an R dataframe. I can get the search to
export the data as an xml feed and the xml package seems to be able to read
it.
xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-", isURL=TRUE)
But getting from there to a dataframe in which one column would be the name
of the journal and another column would be the year (to keep things simple)
seems to be beyond my capabilities.
Has anyone ever done this and could you share your script? Are there any
published examples where the end result is a dataframe?
I guess what I am looking for is an easy and simple way to parse the feed
and extract the data. Alternatively how does one turn an RSS feed into a CSV
file?
Try this:
library(XML)
doc <-
xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-",
isURL = TRUE, useInternalNodes = TRUE)
sapply(c("//author", "//category"), xpathApply, doc = doc, fun = xmlValue)
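[As a cross-check on the XPath approach, the same "//author" and "//category" extraction can be sketched in Python. The two-item feed below is a made-up, self-contained stand-in for the PubMed RSS structure, not a live NCBI response:]

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed mimicking the PubMed RSS layout.
feed = """<rss><channel>
  <item><author>Smith J, Jones K</author><category>J Laryngol</category></item>
  <item><author>Brown A</author><category>Pediatr Nephrol</category></item>
</channel></rss>"""

root = ET.fromstring(feed)
# ".//author" is ElementTree's rough analogue of the XPath "//author".
authors = [e.text for e in root.findall(".//author")]
journals = [e.text for e in root.findall(".//category")]
rows = list(zip(authors, journals))
print(rows)
```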
On Dec 13, 2007, at 9:16 PM, Farrel Buchinsky wrote:
I am afraid not! The only thing I know about Python (or Perl, Ruby etc) is that they exist and that I have been able to download some amazing freeware or open source software thanks to their existence. The XML package and specifically the xmlTreeParse function looks as if it is begging to do the task for me. Is that not true?
Certainly - probably as a better Python programmer than an R programmer, it's faster and neater for me to do it in Python:

from elementtree.ElementTree import XML
import urllib

url = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-'
con = urllib.urlopen(url)
dat = con.read()
root = XML(dat)
items = root.findall("channel/item")
for item in items:
    category = item.find("category")
    print category.text

The problem is that the RSS feed you linked to does not contain the year of the article in an easily accessible XML element. Rather, you have to process the HTML content of the description element - which is something R could do, but you'd be using the wrong tool for the job.

In general, if you're planning to analyze article data from Pubmed I'd suggest going through the Entrez CGIs (ESearch and EFetch), which will give you all the details of the articles in an XML format which can then be easily parsed in your language of choice. This is something that can be done in R (the rpubchem package contains functions to process XML files from Pubchem, which might provide some pointers).

Rajarshi Guha <rguha at indiana.edu>
or just try looking in the annotate package from Bioconductor
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org
The problem is that the RSS feed you linked to, does not contain the year of the article in an easily accessible XML element. Rather you have to process the HTML content of the description element - which, is something R could do, but you'd be using the wrong tool for the job.
Yes. I have noticed that there are two sorts of xml that pubmed will provide. The kind I had hooked into was an rss feed which provides a lot of the information simply as a formatted table for viewing in an rss reader. There is another way to get the xml to come out with more tags. However, I found the best way to do this is probably through the bioconductor annotate package:
x <- pubmed("18046565", "17978930", "17975511")
a <- xmlRoot(x)
numAbst <- length(xmlChildren(a))
absts <- list()
for (i in 1:numAbst) {
absts[[i]] <- buildPubMedAbst(a[[i]])
}
I am now trying to work through that approach to see what I can come up with.
Farrel Buchinsky
On Dec 13, 2007 11:35 PM, Robert Gentleman <rgentlem at fhcrc.org> wrote:
or just try looking in the annotate package from Bioconductor
Yip. annotate seems to be the most streamlined way to do this.

1) How does one turn the list that is created into a dataframe whose column names are along the lines of date, title, journal, authors, etc.?

2) I have already created a standing search in pubmed using MyNCBI. There are many ways I can feed those results to the pubmed() function. The most brute-force way of doing it is by running the search and outputting the data as a UI List and getting that into the pubmed brackets. A way that involved more finesse would allow me to create an rss feed based on my search and then give the rss feed url to the pubmed function. Or perhaps one could just plop the query inside the pubmed function:
pubmed(somefunction("Laryngeal Neoplasms"[MeSH] AND "Papilloma"[MeSH])
OR ((("recurrence"[TIAB] NOT Medline[SB]) OR "recurrence"[MeSH Terms]
OR recurrent[Text Word]) AND respiratory[All Fields] AND
(("papilloma"[TIAB] NOT Medline[SB]) OR "papilloma"[MeSH Terms] OR
papillomatosis[Text Word])))
Does "somefunction" exist?
If there are any further questions do you think I should migrate this
conversation to the bioconductor mailing list?
Farrel Buchinsky
On Dec 14, 2007 3:04 PM, Farrel Buchinsky <fjbuch at gmail.com> wrote:
Note that the lines after a <- xmlRoot(x) could be reduced to:

xmlSApply(a, buildPubMedAbst)
Farrel Buchinsky wrote:
You can simplify the final 5 lines to

absts = xmlApply(a, buildPubMedAbst)

which is shorter, fractionally faster, and handles cases where there are no abstracts.
I am now trying to work through that approach to see what I can come up with.
"Farrel Buchinsky" <fjbuch at gmail.com> wrote in news:bd93cdad0712141216s23071d27n17d87a487ad06950 at mail.gmail.com:
On Dec 13, 2007 11:35 PM, Robert Gentleman <rgentlem at fhcrc.org> wrote:
or just try looking in the annotate package from Bioconductor
Yip. annotate seems to be the most streamlined way to do this. 1) How does one turn the list that is created into a dataframe whose column names are along the lines of date, title, journal, authors etc
Gabor's example already did that task.
2) I have already created a standing search in pubmed using MyNCBI.
There are many ways I can feed those results to the pubmed() function.
The most brute force way of doing it is by running the search and
outputing the data as a UI List and getting that into the pubmed
brackets. A way that involved more finesse would allow me to create a
rss feed based on my search and then give the rss feed url to the
pubmed function. Or perhaps once could just plop the query inside the
pubmed functions
pubmed(somefunction("Laryngeal Neoplasms"[MeSH] AND "Papilloma"[MeSH])
OR ((("recurrence"[TIAB] NOT Medline[SB]) OR "recurrence"[MeSH Terms]
OR recurrent[Text Word]) AND respiratory[All Fields] AND
(("papilloma"[TIAB] NOT Medline[SB]) OR "papilloma"[MeSH Terms] OR
papillomatosis[Text Word])))
Does "somefunction" exist?
I could not find it. The pubmed function appears to assume that you will already have a list of PMIDs. When I set up a function to take an arbitrary PubMed search string (quoted by the user) and return the PMIDs, I had success by following Gabor's example:
pm.srch <- function () {
  srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
  query <- as.character(scan(file = "", what = "character"))
  doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE, useInternalNodes = TRUE)
  sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
}
pm.srch()
1: "laryngeal neoplasms[mh]"
2:
Read 1 item
//Id
[1,] "18042931"
[2,] "18038886"
[3,] "17978930"
[4,] "17974987"
[5,] "17972507"
[6,] "17970149"
[7,] "17967299"
[8,] "17962724"
[9,] "17954109"
[10,] "17942038"
[11,] "17940076"
[12,] "17848290"
[13,] "17848288"
[14,] "17848287"
[15,] "17848278"
[16,] "17938330"
[17,] "17938329"
[18,] "17918311"
[19,] "17910347"
[20,] "17908862"
Emboldened by that minor success, I pushed on. Pubmed said your example
was malformed and I took their suggested modification:
("Laryngeal Neoplasms"[MeSH] AND "Papilloma"[MeSH]) OR (("recurrence"[TIAB] NOT Medline[SB]) OR "recurrence"[MeSH Terms] OR recurrent[Text Word]) AND respiratory[All Fields] AND (("papilloma"[TIAB] NOT Medline[SB]) OR "papilloma"[MeSH Terms] OR papillomatosis[Text Word])
That returned 400+ citations, and I put it into a text file.
After quite a bit of hacking (in the sense of ineffective chopping with
a dull ax), I finally came up with:
pm.srch<- function (){
srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
query<-readLines(con=file.choose())
query<-gsub("\\\"","",x=query)
doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
useInternalNodes = TRUE)
return(sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue) )
}
pm.srch() #choosing the search-file
//Id
[1,] "18046565"
[2,] "17978930"
[3,] "17975511"
[4,] "17935912"
[5,] "17851940"
[6,] "17765779"
[7,] "17688640"
[8,] "17638782"
[9,] "17627059"
[10,] "17599582"
[11,] "17589729"
[12,] "17585283"
[13,] "17568846"
[14,] "17560665"
[15,] "17547971"
[16,] "17428551"
[17,] "17419899"
[18,] "17419519"
[19,] "17385606"
[20,] "17366752"
David Winsemius
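[The gsub step that strips quotes from the saved search hints at a more general point: the query has to be URL-encoded before being appended to the esearch stem. A hedged Python sketch of the same URL construction (the URL is only built here, no request is sent):]

```python
from urllib.parse import quote_plus

srch_stem = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
query = '"laryngeal neoplasms"[mh]'

# quote_plus percent-encodes the quotes and brackets and turns spaces into '+',
# so the full MeSH syntax survives in the query string.
url = srch_stem + quote_plus(query)
print(url)
```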
David Winsemius <dwinsemius at comcast.net> wrote in news:Xns9A077F740B4A0dNOTwinscomcast at 80.91.229.13:
"Farrel Buchinsky" <fjbuch at gmail.com> wrote in news:bd93cdad0712141216s23071d27n17d87a487ad06950 at mail.gmail.com:
On Dec 13, 2007 11:35 PM, Robert Gentleman <rgentlem at fhcrc.org> wrote:
or just try looking in the annotate package from Bioconductor
Yip. annotate seems to be the most streamlined way to do this. 1) How does one turn the list that is created into a dataframe whose column names are along the lines of date, title, journal, authors etc
Gabor's example already did that task.
Actually the object returned by Gabor's method was a list of lists. Here
is one way (probably very inefficient) of getting "doc" into a
data.frame:
colvals <-sapply(c("//title", "//author", "//category"), xpathApply,
doc = doc, fun = xmlValue)
titles=as.vector(unlist(colvals[1])[3:17])
# needed to drop extraneous titles for search name and an NCBI header
#>str(colvals)
#List of 3
# $ //title :List of 17
# ..$ : chr "PubMed: (\"Laryngeal Neoplasm..."
# ..$ : chr "NCBI PubMed"
authors=colvals[[2]]
jrnls=colvals[[3]]
# not sure why, but trying to do it in one step failed:
# cites<-data.frame(titles=as.vector(unlist(colvals[1])[3:17]),
# authors=colvals[[2]],jnrls=colvals[[3]])
# Error in data.frame(titles = as.vector(unlist(colvals[1])[3:17]),
# authors = colvals[[2]], :
# arguments imply differing number of rows: 15, 1
# but the following worked
cites<-data.frame(titles=as.vector(titles))
cites$author<-authors
cites$jrnls<-jrnls
cites
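[The [3:17] slicing above is needed because "//title" matches the channel title and the NCBI header as well as the per-item titles. Scoping the query to channel/item avoids that. A Python sketch of the same idea on a hypothetical, made-up feed:]

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the channel itself has a <title>, and so does an
# <image> block, which is why "//title" returns more values than items.
feed = """<rss><channel>
  <title>PubMed: search name</title>
  <image><title>NCBI PubMed</title></image>
  <item><title>Paper one.</title><author>A B</author><category>J One</category></item>
  <item><title>Paper two.</title><author>C D</author><category>J Two</category></item>
</channel></rss>"""

root = ET.fromstring(feed)
# Searching only inside channel/item skips the two extraneous titles,
# so no positional slicing is needed afterwards.
items = root.findall("channel/item")
rows = [(i.findtext("title"), i.findtext("author"), i.findtext("category"))
        for i in items]
print(rows)
```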
I am still wondering how to extract material that does not have an XML
tag. Each item looks like:
<item>
<title>Gastroesophageal reflux in patients with recurrent laryngeal
papillomatosis.</title>
<link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?
tmpl=NoSidebarfile&db=PubMed&cmd=Retrieve&list_uids=17589729
&dopt=Abstract</link>
<description>
<![CDATA[
<table border="0" width="100%"><tr><td align="left"><a
href="http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0034-
72992007000200011&lng=en&nrm=iso&tlng=en"><img
src="http://www.ncbi.nlm.nih.gov/entrez/query/egifs/http:--www.scielo.br-
img-scielo_en.gif" border="0"/></a> </td><td align="right"><a
href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?
db=PubMed&cmd=Display&dopt=PubMed_PubMed&from_uid=17589729">
Related Articles</a></td></tr></table>
<p><b>Gastroesophageal reflux in patients with recurrent
laryngeal papillomatosis.</b></p>
<p>Rev Bras Otorrinolaringol (Engl Ed). 2007 Mar-Apr;73(2):210-4
</p>
<p>Authors: Pignatari SS, Liriano RY, Avelino MA, Testa JR,
Fujita R, De Marco EK</p>
<p>Evidence of a relation between gastroesophaeal reflux and
pediatric respiratory disorders increases every year. Many respiratory
symptoms and clinical conditions such as stridor, chronic cough, and
recurrent pneumonia and bronchitis appear to be related to
gastroesophageal reflux. Some studies have also suggested that
gastroesophageal reflux may be associated with recurrent laryngeal
papillomatosis, contributing to its recurrence and severity. AIM: the aim
of this study was to verify the frequency and intensity of
gastroesophageal reflux in children with recurrent laryngeal
papillomatosis. MATERIAL AND METHODS: ten children of both genders, aged
between 3 and 12 years, presenting laryngeal papillomatosis, were
included in this study. The children underwent 24-hour double-probe pH-
metry. RESULTS: fifty percent of the patients had evidence of
gastroesophageal reflux at the distal sphincter; 90% presented reflux at
the proximal sphincter. CONCLUSION: the frequency of proximal
gastroesophageal reflux is significantly increased in patients with
recurrent laryngeal papillomatosis.</p>
<p>PMID: 17589729 [PubMed - in process]</p> ]]>
</description>
<author>Pignatari SS, Liriano RY, Avelino MA, Testa JR, Fujita R, De
Marco EK</author>
<category>Rev Bras Otorrinolaringol (Engl Ed)</category>
<guid isPermaLink="false">PubMed:17589729</guid>
</item>
I would like to access, for instance, the PMID or the abstract within the <description> element, but I do not think that they have names in the same way that <author> or <category> have xml named nodes. I suspect that getting the output in a different format, say as MEDLINE, might produce output that was tagged more completely.
David Winsemius
If we can assume that the abstract is always the 4th paragraph then we
can try something like this:
library(XML)
doc <- xmlTreeParse("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/erss.cgi?rss_guid=0_JYbpsax0ZAAPnOd7nFAX-29fXDpTk5t8M4hx9ytT-",
isURL = TRUE, useInternalNodes = TRUE, trim = TRUE)
out <- cbind(
Author = unlist(xpathApply(doc, "//author", xmlValue)),
PMID = gsub(".*:", "", unlist(xpathApply(doc, "//guid", xmlValue))),
Abstract = unlist(xpathApply(doc, "//description",
function(x) {
on.exit(free(doc2))
doc2 <- htmlTreeParse(xmlValue(x)[[1]], asText = TRUE,
useInternalNodes = TRUE, trim = TRUE)
xpathApply(doc2, "//p[4]", xmlValue)
}
)))
free(doc)
substring(out, 1, 25) # display first 25 chars of each field
The last line produces (it may look messed up in this email):
substring(out, 1, 25) # display it
      Author                      PMID       Abstract
 [1,] " Goon P, Sonnex C, Jani P" "18046565" "Human papillomaviruses (H"
 [2,] " Rad MH, Alizadeh E, Ilkh" "17978930" "Recurrent laryngeal papil"
 [3,] " Lee LA, Cheng AJ, Fang T" "17975511" "OBJECTIVES:: Papillomas o"
 [4,] " Gerein V, Schmandt S, Ba" "17935912" "BACKGROUND: Human papillo"
 [5,] " Hopp R, Natarajan N, Lew" "17908862" ""
 [6,] " Preuss SF, Klussmann JP," "17851940" "CONCLUSIONS: The presente"
 [7,] " Mouadeb DA, Belafsky PC"  "17765779" "OBJECTIVES: The 585nm pul"
 [8,] " Thompson L"               "17702311" ""
 [9,] " Schaffer A, Brotherton J" "17688640" ""
[10,] " Stephen JK, Vaught LE, C" "17638782" "OBJECTIVE: To investigate"
[11,] " Shah KV, Westra WH"       "17627059" ""
[12,] " Koufman JA, Rees CJ, Fra" "17599582" "BACKGROUND: Unsedated off"
[13,] " Akst LM, Broadhurst MS, " "17592395" ""
[14,] " Pignatari SS, Liriano RY" "17589729" "Evidence of a relation be"
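[The "abstract is the 4th paragraph" heuristic can also be sketched outside R. The description string below is a trimmed, hypothetical stand-in for the CDATA payload quoted in this thread; the PMID is pulled out with a regex, which is sturdier than relying on paragraph position:]

```python
import re

# Hypothetical, trimmed description payload in the PubMed RSS shape:
# title, citation, authors, abstract, PMID -- each in its own <p>.
desc = ("<p><b>Some title.</b></p>"
        "<p>Rev Bras Otorrinolaringol. 2007 Mar-Apr;73(2):210-4</p>"
        "<p>Authors: Pignatari SS</p>"
        "<p>Evidence of a relation between reflux and papillomatosis.</p>"
        "<p>PMID: 17589729 [PubMed - in process]</p>")

paragraphs = re.findall(r"<p>(.*?)</p>", desc, re.S)
abstract = re.sub(r"<[^>]+>", "", paragraphs[3])   # 4th <p>, any tags stripped
pmid = re.search(r"PMID:\s*(\d+)", desc).group(1)  # regex beats position here
print(pmid, abstract)
```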
It looked beautifully regular in my newsreader. It is helpful to see an
example showing the indexed access to nodes. It was also helpful to see the
example of substring for column display. Thank you (for this and all of
your other contributions.)
I find upon further browsing that the pmfetch access point is obsolete. Experimentation with the PubMed eFetch server access point returns fully xml-tagged results:
e.fetch.doc<- function (){
fetch.stem <-
"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"
src.mode <- "db=pubmed&retmode=xml&"
request <- "id=11045395"
doc<-xmlTreeParse(paste(fetch.stem,src.mode,request,sep=""),
isURL = TRUE, useInternalNodes = TRUE)
}
# in the debugging phase I needed to set useInternalNodes = TRUE to see the
# tags. Never did find a way to "print" them when internal.
doc<-e.fetch.doc()
get.info<- function(doc){
df<-cbind(
Abstract = unlist(xpathApply(doc, "//AbstractText", xmlValue)),
Journal = unlist(xpathApply(doc, "//Title", xmlValue)),
Pmid = unlist(xpathApply(doc, "//PMID", xmlValue))
)
return(df)
}
# this works
substring(get.info(doc), 1, 25)
     Abstract                    Journal                     Pmid
[1,] "We studied the prevalence" "Pediatric nephrology (Ber" "11045395"
David Winsemius
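[For comparison, the same three fields can be pulled from an efetch-style document in Python. The fragment below is a hypothetical, heavily trimmed record (the element names follow the PubmedArticleSet schema; the content is made up), not a live efetch response:]

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed efetch-style record.
doc = """<PubmedArticleSet><PubmedArticle><MedlineCitation>
  <PMID>11045395</PMID>
  <Article>
    <Journal><Title>Pediatric nephrology</Title></Journal>
    <Abstract><AbstractText>We studied the prevalence of X.</AbstractText></Abstract>
  </Article>
</MedlineCitation></PubmedArticle></PubmedArticleSet>"""

root = ET.fromstring(doc)
record = {
    "Pmid": root.findtext(".//PMID"),
    # ".//Journal/Title" dodges the bare "//Title" ambiguity noted in the thread.
    "Journal": root.findtext(".//Journal/Title"),
    "Abstract": root.findtext(".//AbstractText"),
}
print(record)
```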
On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
# in the debugging phase I needed to set useInternalNodes = TRUE to see the tags. Never did find a way to "print" them when internal.
I assume you mean FALSE. See: ?saveXML
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in news:971536df0712161226j2cddb7c6qa99992ae7366ed63 at mail.gmail.com:
On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
# in the debugging phase I needed to set useInternalNodes = TRUE to see the tags. Never did find a way to "print" them when internal.
I assume you mean FALSE. See: ?saveXML
You're correct, yet again; I did a copy/paste/forget-to-edit. And thanks for the further tip.
David
"Gabor Grothendieck" <ggrothendieck at gmail.com> wrote in news:971536df0712161226j2cddb7c6qa99992ae7366ed63 at mail.gmail.com:
On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
# Never did find a way to "print" them when internal.
?saveXML
And now I understand where that odd "\n <text>" originated before I changed the searched-for node name from \\Abstract to \\AbstractText. It's a remnant from the pretty-printing of the XML tree after excising the intervening node name.
David Winsemius
David Winsemius wrote:
# in the debugging phase I needed to set useInternalNodes = TRUE to see the
tags. Never did find a way to "print" them when internal.
saveXML(node) will return a string giving the XML content of that node as a tree.
On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
After quite a bit of hacking (in the sense of ineffective chopping with
a dull ax), I finally came up with:
pm.srch<- function (){
srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
query<-readLines(con=file.choose())
query<-gsub("\\\"","",x=query)
doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
useInternalNodes = TRUE)
return(sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue) )
}
pm.srch() #choosing the search-file
//Id
[1,] "18046565"
[2,] "17978930"
[3,] "17975511"
[4,] "17935912"
[5,] "17851940"
[6,] "17765779"
[7,] "17688640"
[8,] "17638782"
[9,] "17627059"
[10,] "17599582"
[11,] "17589729"
[12,] "17585283"
[13,] "17568846"
[14,] "17560665"
[15,] "17547971"
[16,] "17428551"
[17,] "17419899"
[18,] "17419519"
[19,] "17385606"
[20,] "17366752"
I tried the example above, but only the first 20 PMIDs are returned. How can I circumvent this (I guess it's a restriction imposed by PubMed)?
Armin Goralczyk, M.D. -- Universitätsmedizin Göttingen Abteilung Allgemein- und Viszeralchirurgie Rudolf-Koch-Str. 40 39099 Göttingen -- Dept. of General Surgery University of Göttingen Göttingen, Germany -- http://www.chirurgie-goettingen.de
Hi Armin --

See the help page for esearch
http://www.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html
especially the 'retmax' key.

A couple of other thoughts on this thread...

1) Using the full path, e.g.,

   ids <- xpathApply(doc, "/eSearchResult/IdList/Id", xmlValue)

is likely to lead to less grief in the long run, as you'll only select
elements of the node you're interested in, rather than any element,
anywhere in the document, labeled 'Id'.

2) From a different post in the thread, things like
On Dec 16, 2007 2:53 PM, David Winsemius <dwinsemius at comcast.net> wrote:
[snip]
get.info<- function(doc){
df<-cbind(
Abstract = unlist(xpathApply(doc, "//AbstractText", xmlValue)),
Journal = unlist(xpathApply(doc, "//Title", xmlValue)),
Pmid = unlist(xpathApply(doc, "//PMID", xmlValue))
)
return(df)
}
will lead to more trouble, because they assume that AbstractText, etc
occur exactly once in each record. It would seem better to extract the
relevant node, and query that, probably defining appropriate
defaults. I started with
xpath_or_na <- function(doc, q) {
res <- xpathApply(doc, q, xmlValue)
if (length(res)==1) res[[1]]
else NA_character_
}
citn <- function(citation){
Abstract <- xpath_or_na(citation,
"/MedlineCitation/Article/Abstract/AbstractText")
Journal <- xpath_or_na(citation,
"/MedlineCitation/Article/Journal/Title")
Pmid <- xpath_or_na(citation,
"/MedlineCitation/PMID")
c(Abstract=Abstract, Journal=Journal, Pmid=Pmid)
}
medline_q <- "/PubmedArticleSet/PubmedArticle/MedlineCitation"
res <- xpathApply(doc, medline_q, citn)
One would still have to coerce res into a data.frame. Also worth
thinking about each of the lines in citn -- e.g., the Title query clearly
only applies to Journals. Eventually one wants to consult the DTD
(basically, the contract spelling out the content of the document),
confirm that the xpath queries will perform correctly, and verify that
the document actually conforms to its DTD.
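The coercion step might look like this (a sketch, not from the thread; it assumes res is the list produced by the xpathApply(doc, medline_q, citn) call above, each element being the named character vector that citn returns):

```r
# rbind the per-citation vectors into a matrix, then coerce to a data.frame
df <- as.data.frame(do.call(rbind, res), stringsAsFactors = FALSE)
```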
Following my own advice, I quickly found that doing things 'more
right' becomes quite complicated, and suddenly became satisfied with
the information I can get out of the 'annotate' package.
Martin
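A sketch of the 'retmax' suggestion (the function name pm.srch2 and the default of 100 are illustrative, not from the thread):

```r
pm.srch2 <- function (term, retmax = 100) {
  stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed"
  url  <- paste(stem, "&term=", term, "&retmax=", retmax, sep = "")
  doc  <- xmlTreeParse(url, isURL = TRUE, useInternalNodes = TRUE)
  # full path rather than "//Id", per the advice above
  unlist(xpathApply(doc, "/eSearchResult/IdList/Id", xmlValue))
}
```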
"Armin Goralczyk" <agoralczyk at gmail.com> writes:
On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
After quite a bit of hacking (in the sense of ineffective chopping with
a dull ax), I finally came up with:
pm.srch<- function (){
srch.stem<-"http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
query<-readLines(con=file.choose())
query<-gsub("\\\"","",x=query)
doc<-xmlTreeParse(paste(srch.stem,query,sep=""),isURL = TRUE,
useInternalNodes = TRUE)
return(sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue) )
}
pm.srch() #choosing the search-file
//Id
[1,] "18046565"
[2,] "17978930"
[3,] "17975511"
[4,] "17935912"
[5,] "17851940"
[6,] "17765779"
[7,] "17688640"
[8,] "17638782"
[9,] "17627059"
[10,] "17599582"
[11,] "17589729"
[12,] "17585283"
[13,] "17568846"
[14,] "17560665"
[15,] "17547971"
[16,] "17428551"
[17,] "17419899"
[18,] "17419519"
[19,] "17385606"
[20,] "17366752"
I tried the example above, but only the first 20 PMIDs will be returned. How can I circumvent this (I guesss its a restraint from pubmed)? -- Armin Goralczyk, M.D. -- Universit?tsmedizin G?ttingen Abteilung Allgemein- und Viszeralchirurgie Rudolf-Koch-Str. 40 39099 G?ttingen -- Dept. of General Surgery University of G?ttingen G?ttingen, Germany -- http://www.chirurgie-goettingen.de
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Martin Morgan Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M2 B169 Phone: (206) 667-2793
On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
pm.srch <- function () {
  srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
  query <- as.character(scan(file = "", what = "character"))
  doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
                      useInternalNodes = TRUE)
  sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
}
pm.srch()
1: "laryngeal neoplasms[mh]"
2:
Read 1 item
//Id
[1,] "18042931"
[2,] "18038886"
[3,] "17978930"
[4,] "17974987"
[5,] "17972507"
[6,] "17970149"
[7,] "17967299"
[8,] "17962724"
[9,] "17954109"
[10,] "17942038"
[11,] "17940076"
[12,] "17848290"
[13,] "17848288"
[14,] "17848287"
[15,] "17848278"
[16,] "17938330"
[17,] "17938329"
[18,] "17918311"
[19,] "17910347"
[20,] "17908862"
I tried the above function with simple search terms and it worked fine for me (also more output, thanks to Martin's post), but when I use search terms qualified by certain field tags, i.e. with [au] or [ta], I get the following error message:
pm.srch()
1: "laryngeal neoplasms[mh]"
2:
Read 1 item
Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
  as.logical(ignoreBlanks), :
  error in creating parser for
  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal neoplasms[mh]
I/O warning : failed to load external entity
"http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&term=laryngeal%20neoplasms%5Bmh%5D"
What's wrong? Thanks for any help
Armin Goralczyk, M.D. -- Universitätsmedizin Göttingen Abteilung Allgemein- und Viszeralchirurgie Rudolf-Koch-Str. 40 39099 Göttingen -- Dept. of General Surgery University of Göttingen Göttingen, Germany -- http://www.chirurgie-goettingen.de
"Armin Goralczyk" <agoralczyk at gmail.com> wrote in news:a695fbee0712171238g4995040x579e58f52f83376e at mail.gmail.com:
On Dec 15, 2007 6:31 PM, David Winsemius <dwinsemius at comcast.net> wrote:
pm.srch <- function () {
  srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
  query <- as.character(scan(file = "", what = "character"))
  doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
                      useInternalNodes = TRUE)
  sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
}
pm.srch()
1: "laryngeal neoplasms[mh]"
2:
Read 1 item
//Id
[1,] "18042931"
snipped list of IDs
I tried the above function with simple search terms and it worked fine for me (also more output thanks to Martin's post) but when I use search terms attributed to certain fields, i.e. with [au] or [ta], I get the following error message:
pm.srch()
1: "laryngeal neoplasms[mh]"
2:
Read 1 item
Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
  as.logical(ignoreBlanks), :
  error in creating parser for
  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal neoplasms[mh]
I/O warning : failed to load external entity
"http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&term=laryngeal%20neoplasms%5Bmh%5D"
What's wrong?
I'm not sure. You included my simple example rather than the search string that provoked the error. This is an example search that one can find on the how-to page for literature searches with esearch:

http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=PNAS[ta]+AND+97[vi]&retstart=6&retmax=6&tool=biomed3

I am wondering if you used spaces rather than "+"'s? If so, you may want your function to do more gsub-processing of the input string. When I use the search terms in NCBI's example I get:
> pm.srch <- function () {
+   srch.stem <- "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term="
+   query <- as.character(scan(file = "", what = "character"))
+   doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
+                       useInternalNodes = TRUE)
+   sapply(c("//Id"), xpathApply, doc = doc, fun = xmlValue)
+ }
doc.xml<-pm.srch()
1: "PNAS[ta]+AND+97[vi]" 2: Read 1 item
doc.xml
      //Id
 [1,] "16578858"
 [2,] "11186225"
 [3,] "11121081"
 [4,] "11121080"
 [5,] "11121079"
 [6,] "11121078"
 [7,] "11121077"
 [8,] "11121076"
 [9,] "11121075"
[10,] "11121074"
[11,] "11121073"
[12,] "11121072"
[13,] "11121071"
[14,] "11121070"
[15,] "11121069"
[16,] "11121068"
[17,] "11121067"
[18,] "11121066"
[19,] "11121065"
[20,] "11121064"
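The extra gsub-processing suggested here might be as simple as replacing spaces with "+" before pasting the query into the URL (a sketch, not from the thread):

```r
# turn a space-separated search term into the "+"-joined form eutils expects
query <- gsub(" ", "+", "PNAS[ta] AND 97[vi]")
# query is now "PNAS[ta]+AND+97[vi]"
```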
David Winsemius, MD
David Winsemius <dwinsemius at comcast.net> wrote in news:Xns9A09CA51DB1E4dNOTwinscomcast at 80.91.229.13:
"Armin Goralczyk" <agoralczyk at gmail.com> wrote in news:a695fbee0712171238g4995040x579e58f52f83376e at mail.gmail.com:
I tried the above function with simple search terms and it worked fine for me (also more output thanks to Martin's post) but when I use search terms attributed to certain fields, i.e. with [au] or [ta], I get the following error message:
pm.srch()
1: "laryngeal neoplasms[mh]" 2:
I am wondering if you used spaces, rather than "+"'s? If so then you may want your function to do more gsub-processing of the input string.
I tried my theory that one would need "+"'s instead of spaces, but disproved it. Spaces in the input string seems to produce acceptable results on my WinXP/R.2.6.1/RGui system even with more complex search strings.
David Winsemius
On 12/18/07, David Winsemius <dwinsemius at comcast.net> wrote:
[snip]
It's not the spaces; the problem is the field tag (sorry that I didn't
specify this), or maybe the bracket characters []. I am working on Mac
OS X 10.4 with R version 2.6. Is it maybe a string conversion problem?
In the following warning the strings in the URL seem to be different:
Error in .Call("RS_XML_ParseTree", as.character(file), handlers,
  as.logical(ignoreBlanks), :
  error in creating parser for
  http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=laryngeal neoplasms[mh]
I/O warning : failed to load external entity
"http%3A//eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi%3Fdb=pubmed&term=laryngeal%20neoplasms%5Bmh%5D"
Armin Goralczyk, M.D. -- Universitätsmedizin Göttingen Abteilung Allgemein- und Viszeralchirurgie Rudolf-Koch-Str. 40 39099 Göttingen -- Dept. of General Surgery University of Göttingen Göttingen, Germany -- http://www.chirurgie-goettingen.de
"Armin Goralczyk" <agoralczyk at gmail.com> wrote in news:a695fbee0712180702k1a351b5cxca54d45b81096166 at mail.gmail.com:
On 12/18/07, David Winsemius <dwinsemius at comcast.net> wrote:
[snip]
I do not have an up-to-date version of R on my Mac, since I have not yet upgraded to OSX10.4. I can try with my older version of R, but failure (or even success) with versions OSX-10.2/R-2.0 is not likely to be very informative.

If you will post an example of the input that is resulting in the error, I can try it on my WinXP machine. If we cannot reproduce it there, then it may be more appropriate to take further questions to the Mac-R mailing list. The error message suggests to me that the fault lies in the connection phase of the task.
David Winsemius
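One way to rule out escaping problems with the brackets in [mh] (a sketch, not from the thread; srch.stem is the URL stem defined in pm.srch above) is to percent-encode the term before pasting it into the URL:

```r
# percent-encode spaces and brackets before building the request URL
query <- URLencode("laryngeal neoplasms[mh]", reserved = TRUE)
# query is now "laryngeal%20neoplasms%5Bmh%5D"
doc <- xmlTreeParse(paste(srch.stem, query, sep = ""), isURL = TRUE,
                    useInternalNodes = TRUE)
```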