another XML package question

5 messages · Duncan Temple Lang, Antje

Antje

Mon, Sep 8, 2008 2:41 AM #

Hi there,

does anybody know how to return the xmlPath from a node?
For example, at several locations in the XML file, I have nodes with the same 
name and I'd like to process only the nodes from a certain path.

Any idea?

Antje

Duncan Temple Lang

Mon, Sep 8, 2008 3:09 AM #

Antje wrote:

As with your previous question, there are ways to do this
with either XPath queries or R functions that operate on
the nodes from the earlier queries.

By "xmlPath", let's assume you mean the ordered collection of
nodes from the node to the root node of the document,
i.e. the collection of ancestor nodes.
So using XPath, you could use

   a = getNodeSet( node, "ancestor::*")

where node is the R variable containing the node within the tree
whose ancestors you want, e.g.
    getNodeSet(doc, "//val")[[1]]

The nodes in a are in "reverse" order, i.e. the root comes first and the
immediate parent last.
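As a minimal sketch of the ancestor query, the following parses the sample document from this thread and builds a path string for the first <val> node. The variable names txt, anc_names and path are illustrative only, and the path construction assumes the ancestors come back root first, as described above.

```r
library(XML)

# Sample document from this thread, parsed into an internal (C-level) tree.
txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# First <val> node in the document.
node <- getNodeSet(doc, "//val")[[1]]

# All element ancestors of that node.
a <- getNodeSet(node, "ancestor::*")
anc_names <- sapply(a, xmlName)

# Join the ancestor names and the node's own name into a path-like string
# (assumes root-first ordering of the ancestors).
path <- paste0("/", paste(c(anc_names, xmlName(node)), collapse = "/"))
print(path)
```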


You can do the same thing with the R function
xmlParent().  To get the ancestors,

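  tmp = xmlParent(node)
  ans = list()
  # walk up the tree; xmlParent() returns NULL above the root element
  while( !is.null(tmp)) {
      # append as a single list element (c() can splice list-based nodes)
      ans[[length(ans) + 1]] = tmp
      tmp = xmlParent(tmp)
  }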

and of course in your case you could terminate the loop
at any point.
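A sketch of terminating the loop early might look like this. The helper ancestors_until() and its stop_at argument are hypothetical names introduced here, not part of the XML package; the sample document is the one from this thread.

```r
library(XML)

txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: collect ancestors nearest-first, optionally stopping
# once an ancestor with the element name `stop_at` has been added.
ancestors_until <- function(node, stop_at = NULL) {
  ans <- list()
  tmp <- xmlParent(node)
  while (!is.null(tmp)) {
    ans[[length(ans) + 1]] <- tmp
    if (!is.null(stop_at) && xmlName(tmp) == stop_at) break
    tmp <- xmlParent(tmp)
  }
  ans
}

node <- getNodeSet(doc, "//val")[[1]]
full  <- unname(sapply(ancestors_until(node), xmlName))          # up to the root
short <- unname(sapply(ancestors_until(node, "data"), xmlName))  # stops at <data>
print(full)
print(short)
```

Unlike the XPath version, this collects the ancestors starting from the immediate parent.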


But a different approach to the problem is to use a more specific
XPath query in the first place to get only the nodes of interest.
For example, to get the <val> nodes in the second <data> node of
your example, you could use

  getNodeSet(doc, "//data[2]/val")

or to find all <val> nodes which have the attribute  i = "t2",

   getNodeSet(doc, "//val[@i='t2']")

Or to find all <val> nodes that have an ancestor whose attribute
"loc" has the value "1",

     getNodeSet(doc, "//*[@loc='1']//val")



(
The  sample XML document was

<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>

)
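To try the three targeted queries against this sample document, a short self-contained sketch could be the following. The variable names are illustrative; xmlValue() returns the text content, which is converted to numbers here.

```r
library(XML)

txt <- '<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# <val> children of the second <data> element.
vals_second <- getNodeSet(doc, "//data[2]/val")

# <val> elements whose attribute i equals "t2".
vals_t2 <- getNodeSet(doc, "//val[@i='t2']")

# <val> elements below any element whose attribute loc equals "1".
vals_loc1 <- getNodeSet(doc, "//*[@loc='1']//val")

# Convert the text content of each match to a number.
num <- function(nodes) as.numeric(sapply(nodes, xmlValue))
print(num(vals_second))  # 44 11
print(num(vals_t2))      # 45 11
print(num(vals_loc1))    # 22 45
```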


 D.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Antje

Mon, Sep 8, 2008 6:35 AM #

Hi Duncan,

thanks a lot for your explanations.

I tried the following now to understand a bit more:

data <- getNodeSet(doc, "//Data")
xmlName(data[[1]])
xmlName(xmlRoot(data[[1]]))
xpathApply(data[[1]], "./*", xmlName)

Is it right that using "data" in xpathApply() somehow sets the current node 
but does not change the root?
So looking for a subnode at all levels below my current node is not possible 
with the XPath syntax? (Searching at all levels starting from the root is 
possible with "//nodename".)

Antje





Duncan Temple Lang

Mon, Sep 8, 2008 8:25 AM #

Antje wrote:

The answer to your first question is "it depends", specifically on what
version of the XML package you have.
In version 1.96-0 (the latest release), yes: the node you pass becomes
the current node and the root stays the same.
There is also code in the package (currently overridden)
that creates a new temporary tree with the given node as the
root of the new tree (but without copying the nodes).
But the former is most likely what is desired.

Searching at all levels below the current node is possible;

  getNodeSet( data[[1]], ".//*")

does that. The // means "any level". BTW, it doesn't match text
nodes, so you might want
          ".//*|.//text()|.//processing-instruction()"
for completeness (or maybe not!)
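A small sketch of the difference between the element-only query and the union query, using the sample document from earlier in the thread (with <data> elements, as in that sample). The exact number of text nodes depends on how whitespace was handled at parse time, so only the element count is stated.

```r
library(XML)

txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
data <- getNodeSet(doc, "//data")

# Element descendants of the first <data> node only.
elems <- getNodeSet(data[[1]], ".//*")

# Elements, text nodes and processing instructions below the same node.
all_nodes <- getNodeSet(data[[1]],
                        ".//*|.//text()|.//processing-instruction()")

print(length(elems))      # 2 (the two <val> elements)
print(length(all_nodes))  # larger, since the text inside each <val> is included
```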

The key thing is that when you supply a node (and not the document)
as the first argument of getNodeSet() or xpathApply(), the XPath
query should be a relative query, e.g. .//* rather than //*.
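The relative versus absolute distinction can be checked with a short sketch like this one. It assumes a version of the XML package in which the supplied node becomes the current node while the document root is kept, as described above; the variable names below and everywhere are illustrative.

```r
library(XML)

txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
data <- getNodeSet(doc, "//data")

# Relative query: only <val> nodes below the first <data> node.
below <- getNodeSet(data[[1]], ".//val")

# Absolute query: starts at the document root even though a node was
# supplied, so it matches every <val> in the document.
everywhere <- getNodeSet(data[[1]], "//val")

print(length(below))       # 2
print(length(everywhere))  # 4
```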

And the reason for keeping the root the same is so that we can do

  getNodeSet(data[[1]], "ancestor::*")
or
  getNodeSet(data[[1]], "../foo")

i.e. have an XPath expression that refers to nodes "higher" up the tree.
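As an illustration of queries that reach "higher" up the tree, the sketch below starts from the first <data> node of the sample document. Since that sample has no <foo> element, "../data" is used here in place of "../foo" to select the <data> children of the parent.

```r
library(XML)

txt <- '<root>
  <data loc="1"><val i="t1"> 22 </val><val i="t2"> 45 </val></data>
  <data loc="2"><val i="t1"> 44 </val><val i="t2"> 11 </val></data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
data <- getNodeSet(doc, "//data")

# Ancestors of the first <data> node: just the <root> element.
up <- sapply(getNodeSet(data[[1]], "ancestor::*"), xmlName)

# Go to the parent, then select its <data> children, which includes
# the starting node and its sibling.
siblings <- getNodeSet(data[[1]], "../data")
locs <- sapply(siblings, xmlGetAttr, "loc")

print(up)    # "root"
print(locs)  # "1" "2"
```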

 D.


Antje

Mon, Sep 8, 2008 8:33 AM #

Duncan Temple Lang wrote:

All right, I hadn't tried this (I assumed that // means "everything below 
the root"...).
Now I can do what I was looking for.

Thanks a lot for everything!
