
another XML package question

5 messages · Duncan Temple Lang, Antje

#
Hi there,

does anybody know how to return the xmlPath from a node?
For example, at several locations in the XML file, I have nodes with the same 
name and I'd like to process only the nodes from a certain path.

Any idea?

Antje
#
Antje wrote:
As with your previous question, there are ways to do this
with either XPath queries or R functions that operate on
the nodes from the earlier queries.

By "xmlPath", let's assume you mean the ordered collection of
nodes from the node to the root node of the document,
i.e. the collection of ancestor nodes.
So using XPath, you could use

   a = getNodeSet( node, "ancestor::*")

where node is the R variable containing the node within the tree
whose ancestors you want, e.g.
    getNodeSet(doc, "//val")[[1]]

The nodes are in "reverse" order.
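
A minimal sketch of the ancestor query, run against the sample document quoted at the end of this message. It assumes the document is parsed from a string with xmlParse() from the XML package (the parsing step is not shown in this thread); the variable names doc, node and a follow the text, while txt and ancestorNames are only illustrative.

```r
library(XML)

# Sample document from this thread, parsed into an internal (C-level) tree.
txt <- '<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# First <val> node anywhere in the document.
node <- getNodeSet(doc, "//val")[[1]]

# All element ancestors of that node.
a <- getNodeSet(node, "ancestor::*")

# Names of the ancestors, in the order the query returned them.
ancestorNames <- sapply(a, xmlName)
print(ancestorNames)
```

Printing the names is a quick way to check which order the ancestors come back in before relying on it.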


You can do the same thing with the R function
xmlParent().  To get the ancestors,

  tmp = xmlParent(node)
  ans = list()
  while( !is.null(tmp)) {
      ans = c(ans, tmp)
      tmp = xmlParent(tmp)
  }

and of course in your case you could terminate the loop
at any point.
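
The same loop can be turned into a path string, which is probably the closest thing to an "xmlPath" for a node. This is only a sketch: nodePath() is a hypothetical helper, not a function of the XML package, and it assumes xmlParent() returns NULL once the top-level element is reached, as the loop above does.

```r
library(XML)

# Cut-down version of the sample document from this thread.
doc <- xmlParse('<root><data loc="1"><val i="t1"> 22 </val></data></root>',
                asText = TRUE)
node <- getNodeSet(doc, "//val")[[1]]

# Hypothetical helper: walk up with xmlParent(), collect the element
# names, and join them root-first into a "/a/b/c" style string.
nodePath <- function(node) {
  nms <- character()
  tmp <- node
  while (!is.null(tmp)) {
    nms <- c(xmlName(tmp), nms)   # prepend so the root ends up first
    tmp <- xmlParent(tmp)
  }
  paste0("/", paste(nms, collapse = "/"))
}

p <- nodePath(node)
print(p)
```

With such a helper, nodes sharing a name could be filtered by comparing sapply(nodes, nodePath) with the path of interest.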


But a different approach to the problem is to use a more specific
XPath query in the first place to get only the nodes of interest.
For example, to get the <val> nodes in the second <data> node of
your example, you could use

  getNodeSet(doc, "//data[2]/val")

or to find all <val> nodes which have the attribute  i = "t2",

   getNodeSet(doc, "//val[@i='t2']")

Or to find all <val> nodes which have an ancestor with
the attribute loc = "1",

     getNodeSet(doc, "//*[@loc='1']//val")



(
The  sample XML document was

<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>

)
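
The three more specific queries can be tried directly against this sample document. A small sketch, assuming the document is parsed from a string with xmlParse(); the names txt, second, t2 and loc1 are only illustrative.

```r
library(XML)

txt <- '<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>'
doc <- xmlParse(txt, asText = TRUE)

# <val> nodes inside the second <data> node.
second <- getNodeSet(doc, "//data[2]/val")

# <val> nodes anywhere whose attribute i is "t2".
t2 <- getNodeSet(doc, "//val[@i='t2']")

# <val> nodes below any node whose attribute loc is "1".
loc1 <- getNodeSet(doc, "//*[@loc='1']//val")

# Turn the text content of each node set into numbers.
secondVals <- as.numeric(sapply(second, xmlValue))  # 44 11
t2Vals     <- as.numeric(sapply(t2, xmlValue))      # 45 11
loc1Vals   <- as.numeric(sapply(loc1, xmlValue))    # 22 45
```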


 D.
#
Hi Duncan,

thanks a lot for your explanations.

I tried the following now to understand a bit more:

data <- getNodeSet(doc, "//Data")
xmlName(data[[1]])
xmlName(xmlRoot(data[[1]]))
xpathApply(data[[1]], "./*", xmlName)

Is it right that using "data" in the xpathApply() somehow sets the current node 
but does not change the root?
So looking for a subnode at all levels below my current node is not possible 
with the xPath syntax? (search on all levels starting from root is possible 
with "//nodename")

Antje




Duncan Temple Lang wrote:
#
Antje wrote:
The answer to your first question is "it depends", specifically on what
version of the XML package you have.
In version 1.96-0 (the latest release), yes: the node you supply becomes
the current node and the root stays the same.
There is also code in the package (but overridden)
that creates a new temporary tree with the given node as the
root of the new tree (but without copying the nodes).
But the former is most likely what is desired.

As for your second question, searching at all levels below the
current node is possible:

  getNodeSet( data[[1]], ".//*")

does that. The // means "any level". BTW, it doesn't match text
nodes, so you might want
          ".//*|.//text()|.//processing-instruction()"
for completeness (or maybe not!)
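
To see the difference between the two queries, here is a small sketch on the sample document from earlier in the thread. It assumes parsing with xmlParse(); the names elems and everything are illustrative. No counts are promised for the second query, since how many text nodes appear depends on how blank text is handled when parsing.

```r
library(XML)

txt <- '<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
data <- getNodeSet(doc, "//data")

# Element descendants only, at any level below the first <data> node.
elems <- getNodeSet(data[[1]], ".//*")

# Elements plus text nodes and processing instructions.
everything <- getNodeSet(data[[1]],
                         ".//*|.//text()|.//processing-instruction()")

print(sapply(elems, xmlName))
print(length(everything))
```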

The key thing is that when you supply a node (and not the document)
as the first argument of getNodeSet() or xpathApply(), the XPath
query should be a relative query, e.g. .//* rather than //*.

And the reason for keeping the root the same is so that we can do

  getNodeSet(data[[1]], "ancestor::*")
or
  getNodeSet(data[[1]], "../foo")

i.e. have an XPath expression that refers to nodes "higher" up the tree.
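
A short sketch contrasting queries that look down from a node with ones that look up, again on the sample document from earlier in the thread (parsed with xmlParse(), which is an assumption; the names below, up and siblings are illustrative). Because "foo" does not occur in the sample, "../data" is used in its place.

```r
library(XML)

txt <- '<root>
   <data loc="1">
     <val i="t1"> 22 </val>
     <val i="t2"> 45 </val>
   </data>
   <data loc="2">
     <val i="t1"> 44 </val>
     <val i="t2"> 11 </val>
   </data>
</root>'
doc <- xmlParse(txt, asText = TRUE)
data <- getNodeSet(doc, "//data")

# Relative query: only <val> nodes below the first <data> node.
below <- getNodeSet(data[[1]], ".//val")

# Upward queries work because the node stays attached to its document.
up       <- getNodeSet(data[[1]], "ancestor::*")
siblings <- getNodeSet(data[[1]], "../data")

print(length(below))
print(sapply(up, xmlName))
print(length(siblings))
```

An absolute query such as "//val" with a node as first argument is deliberately left out of the sketch, since, as said above, what it returns depends on the version of the package.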

 D.

        
#
Duncan Temple Lang wrote:
All right, I didn't try this (I assumed that // means "everything below 
the root"...)
Now, I can do what I was looking for.

Thanks a lot for everything!