Failure to understand namespaces in XML::getNodeSet
I think you want
x <- read_xml('<?xml version="1.0" ?>
<WorkSet xmlns="http://labkey.org/etl/xml">
<Description>MFIA 9-Plex (CharlesRiver)</Description>
</WorkSet>')
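To then query x, every element in the XPath needs a namespace prefix. A minimal sketch, assuming xml2 assigns the prefix d1 to the unprefixed default namespace (check xml_ns(x) on your own files to confirm); the names ns, desc_ns and desc_stripped are only for illustration:

```r
library(xml2)

x <- read_xml('<?xml version="1.0" ?>
<WorkSet xmlns="http://labkey.org/etl/xml">
<Description>MFIA 9-Plex (CharlesRiver)</Description>
</WorkSet>')

# List the namespaces in the document; the default one gets a generated prefix
ns <- xml_ns(x)

# Option 1: use that prefix on every element in the XPath
desc_ns <- xml_find_all(x, "/d1:WorkSet//d1:Description", ns)
xml_text(desc_ns)

# Option 2: strip the default namespace, then query without prefixes
# (note that xml_ns_strip() modifies x in place)
xml_ns_strip(x)
desc_stripped <- xml_find_all(x, "/WorkSet//Description")
xml_text(desc_stripped)
```

The same idea should carry over to XML::getNodeSet(): bind the URI to a prefix of your choosing, e.g. namespaces = c(d = "http://labkey.org/etl/xml"), and write the query as "/d:WorkSet//d:Description".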
The collapse argument doesn't do what you think it does.
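collapse is the string placed between the elements when they are joined, not an on/off switch. A minimal sketch using base paste() (str_c() takes collapse the same way); the names xml_string and doc are only for illustration, and the d1 prefix is the one xml2 is assumed to generate for the default namespace:

```r
library(xml2)

with_ns_xml <- c('<?xml version="1.0" ?>',
                 '<WorkSet xmlns="http://labkey.org/etl/xml">',
                 '<Description>MFIA 9-Plex (CharlesRiver)</Description>',
                 '</WorkSet>')

# Join the four lines into one string, separated by newlines
xml_string <- paste(with_ns_xml, collapse = "\n")

# Parse first; xml_find_all() works on the parsed document,
# not on the character vector
doc <- read_xml(xml_string)
xml_find_all(doc, "//d1:Description", xml_ns(doc))
```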
Hadley
On Tue, Jan 31, 2017 at 5:36 PM, Mark Sharp <msharp at txbiomed.org> wrote:
Hadley,
Thank you. I am able to get the xml_ns_strip() function to work with my file directly so I will likely be able to reach my immediate goal.
However, I still have had no success with understanding the namespace problem. I am not able to use read_xml() with the object I generated for the reproducible example, which is simply a character vector of length 4 holding the contents of the XML file as produced by readLines(). I then used dput() to define the structure. The resulting structure apparently is not to the liking of read_xml(). I have reproduced the necessary code here for your convenience. The error is below.
##
library(xml2)
library(stringr)
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
"<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
"<Description>MFIA 9-Plex (CharlesRiver)</Description>",
"</WorkSet>")
## Without the str_c() collapse, it also complains about a vector of length > 1.
read_xml(str_c(with_ns_xml, collapse = TRUE))
## produces the following error message.
Error in doc_parse_raw(x, encoding = encoding, base_url = base_url, as_html = as_html, :
Start tag expected, '<' not found [4]
I have similar issues with xml2::xml_find_all
xml_find_all(str_c(with_ns_xml, collapse = TRUE), "/WorkSet//Description")
## Produces the following error message.
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character"
R. Mark Sharp, Ph.D.
msharp at TxBiomed.org
On Jan 31, 2017, at 4:27 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
See the last example in ?xml2::xml_find_all or use xml2::xml_ns_strip()
Hadley
On Tue, Jan 31, 2017 at 9:43 AM, Mark Sharp <msharp at txbiomed.org> wrote:
I am trying to read a series of XML files that use a namespace and I have failed, thus far, to discover the proper syntax. I have a reproducible example below. I have two XML character strings defined: one without a namespace and one with. I show that I can successfully extract the node using the XML string without the namespace and fail when using the XML string with the namespace.
Mark
PS I am having the same problem with the xml2 package and am hoping that understanding one will help with the other.
##
library(XML)
## The first XML text (no_ns_xml) does not have a namespace defined
no_ns_xml <- c("<?xml version=\"1.0\" ?>", "<WorkSet>",
"<Description>MFIA 9-Plex (CharlesRiver)</Description>",
"</WorkSet>")
l_no_ns_xml <- xmlTreeParse(no_ns_xml, asText = TRUE, getDTD = FALSE,
                            useInternalNodes = TRUE)
## The node is found
getNodeSet(l_no_ns_xml, "/WorkSet//Description")
## The second XML text (with_ns_xml) has a namespace defined
with_ns_xml <- c("<?xml version=\"1.0\" ?>",
"<WorkSet xmlns=\"http://labkey.org/etl/xml\">",
"<Description>MFIA 9-Plex (CharlesRiver)</Description>",
"</WorkSet>")
l_with_ns_xml <- xmlTreeParse(with_ns_xml, asText = TRUE, getDTD = FALSE,
                              useInternalNodes = TRUE)
## The node is not found
getNodeSet(l_with_ns_xml, "/WorkSet//Description")
## I attempt to provide the namespace, but fail.
ns <- "http://labkey.org/etl/xml"
names(ns)[1] <- "xmlns"
getNodeSet(l_with_ns_xml, "/WorkSet//Description", namespaces = ns)
R. Mark Sharp, Ph.D.
Director of Data Science Core
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org