XML and str

4 messages · Ashley Ford, Duncan Temple Lang, Martin Maechler

#
If I read in an .xml file, e.g. with

> xeg <- xmlTreeParse(system.file("exampleData", "test.xml",
                                  package="XML"))

It appears to be OK; however, examining it with str() gives an apparent
error:

> str(xeg, 2)
List of 2
 $ doc:List of 3
  ..$ file    : list()
  .. ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
  ..$ version :List of 4
  .. ..- attr(*, "class")= chr "XMLNode"
  ..$ children:Error in obj$children[[...]] : subscript out of bounds

I am unsure whether this is a feature or a bug and, if the latter, whether
it is in XML or in str(). It is not causing a problem, but I would like to
understand what is happening. Any ideas?

Examining components, e.g.

> str(xeg$doc$children,2)
List of 2
 $ comment: list()
  ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
etc 

is OK.

XML Version 1.4-1, 
same behaviour on Windows and Linux, R version 2.4.1 (2006-12-18)




The information contained in this E-Mail and any subsequent
correspondence is private and is intended solely for the intended
recipient(s).  The information in this communication may be confidential
and/or legally privileged.  Nothing in this e-mail is intended to
conclude a contract on behalf of QinetiQ or make QinetiQ subject to any
other legally binding commitments, unless the e-mail contains an express
statement to the contrary or incorporates a formal Purchase Order.

For those other than the recipient any disclosure, copying,
distribution, or any action taken or omitted to be taken in reliance on
such information is prohibited and may be unlawful.

Emails and other electronic communication with QinetiQ may be monitored
and recorded for business purposes including security, audit and
archival purposes.  Any response to this email indicates consent to
this.

Telephone calls to QinetiQ may be monitored or recorded for quality
control, security and other business purposes.

QinetiQ Group plc,

Company Registration No: 4586941,  

Registered office: 85 Buckingham Gate, London SW1E 6PD
#
Ashley> If I read in an .xml file eg with 

    >> xeg <- xmlTreeParse(system.file("exampleData", "test.xml",
                                       package="XML"))

    Ashley> It appears to be OK however examining it with str() gives an apparent
    Ashley> error

    >> str(xeg, 2)
    Ashley> List of 2
    Ashley> $ doc:List of 3
    Ashley> ..$ file    : list()
    Ashley> .. ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> ..$ version :List of 4
    Ashley> .. ..- attr(*, "class")= chr "XMLNode"
    Ashley> ..$ children:Error in obj$children[[...]] : subscript out of bounds

    Ashley> I am unsure if this is a feature or a bug and if the latter whether it
    Ashley> is in XML or str, it is not causing a problem but I would like to
    Ashley> understand what is happening, any ideas ?

Yes -  thank you for providing a well-reproducible example.
After setting  
      options(error = recover)

I do

   > obj <- xeg$doc
   > mode(obj)     # "list"
   [1] "list"
   > is.list(obj)  # TRUE
   [1] TRUE
   > length(obj)   # 3
   [1] 3
   > obj[[3]]      # ---> the error you see above.
   Error in obj$children[[...]] : subscript out of bounds

   Enter a frame number, or 0 to exit   

   1: obj[[3]]
   2: `[[.XMLDocumentContent`(obj, 3)

   Selection: 0

   > obj$children  # works, should be identical to obj[[3]]
   $comment
   <!--A comment-->

   $foo
   <foo x="1">
    <element attrib1="my value"/>
   ......

This shows that the XML package implements the "[[" method
wrongly IMHO and also inconsistently with the "$" method.

From a strict OOP view, the XML author could argue that
this is not a bug in XML but rather in str(), which assumes that
x[[length(x)]] works for objects of mode "list" even when they
are not of *class* "list", but I hope he would still rather
consider changing [[.XMLDocumentContent ...

Martin
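
The mismatch described above can be reproduced without the XML package. The following is only a minimal sketch: "toyDoc" is a hypothetical class name standing in for XMLDocumentContent, and the component values are made up for illustration.

```r
# Toy stand-in for an XMLDocumentContent object: a list with three named
# components plus an S3 class attribute ("toyDoc" is a hypothetical name).
obj <- structure(
  list(file     = "test.xml",
       version  = "1.0",
       children = list(comment = "A comment", foo = "root node")),
  class = "toyDoc"
)

# Same shape as the method shown in the traceback: "[[" is delegated
# to the 'children' component instead of indexing the list itself.
`[[.toyDoc` <- function(obj, ...) obj$children[[...]]

n <- length(obj)        # no length() method, so this counts the 3 components
kids <- obj$children    # "$" still uses the default list behaviour

# x[[length(x)]] now asks for a third child, which does not exist.
res <- tryCatch(obj[[n]], error = function(e) e)
failed <- inherits(res, "error")
```

Here length() and "$" see the three-component list, while "[[" sees only the children, so any code that relies on x[[length(x)]] fails for this class.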

    Ashley> examining components eg 
    >> str(xeg$doc$children,2)

    Ashley> List of 2
    Ashley> $ comment: list()
    Ashley> ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> etc 

    Ashley> is OK.

    Ashley> XML Version 1.4-1, 
    Ashley> same behaviour on Windows and Linux, R version 2.4.1 (2006-12-18)




    Ashley> ______________________________________________
    Ashley> R-help at stat.math.ethz.ch mailing list
    Ashley> https://stat.ethz.ch/mailman/listinfo/r-help
    Ashley> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    Ashley> and provide commented, minimal, self-contained, reproducible code.
2 days later
#
Martin Maechler wrote:
More likely, the appropriate fix is to have
length() return the relevant value.
I even recall considering this at the time of writing
the package initially.  But that was back in 1999/2000
and S4 and R/S-Plus compatibility were not what they
are now.  It could be changed.  Not certain when I will
get a chance.

 D.
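
A rough sketch of what the length() suggestion might look like for an S3 class follows. It uses a hypothetical toy class "toyDoc" rather than the real XMLDocumentContent, and it is not code from the XML package.

```r
# Hypothetical toy class mimicking the structure discussed in this thread.
obj <- structure(
  list(file     = "test.xml",
       version  = "1.0",
       children = list(comment = "A comment", foo = "root node")),
  class = "toyDoc"
)

# "[[" delegates to the children, as in the method shown earlier.
`[[.toyDoc` <- function(obj, ...) obj$children[[...]]

# Assumed fix: make length() report the number of children, so that it
# agrees with what "[[" can actually index.
length.toyDoc <- function(x) length(x$children)

n <- length(obj)     # number of children rather than number of components
last <- obj[[n]]     # x[[length(x)]] is now a valid index
```

With this method in place, length() and "[[" agree with each other, although "$" and names() still operate on the underlying three-component list.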
#

        
DTL> Martin Maechler wrote:
>>>>>>> "Ashley" == Ashley Ford <ford at signal.QinetiQ.com>
    >>>>>>> on Wed, 07 Feb 2007 17:18:56 +0000 writes:
    >> 
    Ashley> If I read in an .xml file eg with 
    >> 
    >> >> xeg <- xmlTreeParse(system.file("exampleData", "test.xml",
    >> package="XML"))
    >> 
    Ashley> It appears to be OK however examining it with str() gives an apparent
    Ashley> error
    >> 
    >> >> str(xeg, 2)
    Ashley> List of 2
    Ashley> $ doc:List of 3
    Ashley> ..$ file    : list()
    Ashley> .. ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> ..$ version :List of 4
    Ashley> .. ..- attr(*, "class")= chr "XMLNode"
    Ashley> ..$ children:Error in obj$children[[...]] : subscript out of bounds
    >> 
    Ashley> I am unsure if this is a feature or a bug and if the latter whether it
    Ashley> is in XML or str, it is not causing a problem but I would like to
    Ashley> understand what is happening, any ideas ?
    >> 
    >> Yes -  thank you for providing a well-reproducible example.
    >> After setting  
    >> options(error = recover)
    >> 
    >> I do
    >> 
    >> > obj <- xeg$doc
    >> > mode(obj)     # "list"
    >> [1] "list"
    >> > is.list(obj)  # TRUE
    >> [1] TRUE
    >> > length(obj)   # 3
    >> [1] 3
    >> > obj[[3]]      # ---> the error you see above.
    >> Error in obj$children[[...]] : subscript out of bounds
    >> 
    >> Enter a frame number, or 0 to exit   
    >> 
    >> 1: obj[[3]]
    >> 2: `[[.XMLDocumentContent`(obj, 3)
    >> 
    >> Selection: 0
    >> 
    >> > obj$children  # works, should be identical to obj[[3]]
    >> $comment
    >> <!--A comment-->
    >> 
    >> $foo
    >> <foo x="1">
    >> <element attrib1="my value"/>
    >> ......
    >> 
    >> This shows that the XML package implements the "[[" method
    >> wrongly IMHO and also inconsistently with the "$" method.
    >> 
    >> From a strict OOP view, the XML author could argue that
    >> this is not a bug in XML but rather str() which assumes that
    >> x[[length(x)]] works for objects of mode "list" even when they
    >> are not of *class* "list", but I hope he would still rather
    >> consider changing [[.XMLDocumentContent ...
    >> 


    DTL> More likely, the appropriate fix is to have
    DTL> length() return the relevant value.

Hmm. 

  > library(XML)
  > xeg <- xmlTreeParse(system.file("exampleData", "test.xml", package= "XML"))
  > obj <- xeg$doc
  > mode(obj)     # "list"
  [1] "list"
  > is.list(obj)  # TRUE
  [1] TRUE
  > length(obj)   # 3
  [1] 3
  > obj[[3]]      # ---> the error you see above.
  Error in obj$children[[...]] : subscript out of bounds
  > names(obj)
  [1] "file"     "version"  "children"
  > class(obj)
  [1] "XMLDocumentContent"
  > methods(class=class(obj))
  [1] xmlApply.XMLDocumentContent*  [[.XMLDocumentContent*       
  [3] xmlRoot.XMLDocumentContent*   xmlSApply.XMLDocumentContent*

  > XML:::`[[.XMLDocumentContent`
  function (obj, ...) 
  {
      obj$children[[...]]
  }
  <environment: namespace:XML>

So length(obj) is 3, and obj is a simple S3 object
which is just a list with 3 named components.
Do you really want to define length(.) to return the
length of obj$children instead of the length() of the list
itself?
With that you'd have your XMLDocumentContent objects ``look''
like lists with three named components on one hand
(and help(xmlTreeParse) does mention these components)
but behave in other contexts as if it was just its own component
'obj$children'.   Of course you then should also define 
  print.XMLDocumentContent() and
  str.XMLDocumentContent()   accordingly, 
so users would barely know about the "file" and "version"
component of 'obj'.
But is this really desirable ?
With the above "[[.XMLDoc..." you break the basic S-language
premise that "[[" and "$" behave consistently with each other.
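
To make the trade-off concrete, here is a small sketch of an S3 class whose length(), print() and str() methods all delegate to the children. The class name "toyDoc" and its contents are hypothetical and only illustrate the idea; this is not the XML package's code.

```r
# Hypothetical toy class with the same three components as discussed above.
obj <- structure(
  list(file     = "test.xml",
       version  = "1.0",
       children = list(comment = "A comment", foo = "root node")),
  class = "toyDoc"
)

# Every "view" of the object is redirected to its children.
length.toyDoc <- function(x) length(x$children)
print.toyDoc  <- function(x, ...) { print(x$children, ...); invisible(x) }
str.toyDoc    <- function(object, ...) str(object$children, ...)

shown  <- capture.output(str(obj))  # what a user sees from str()
hidden <- names(obj)                # the components are still there underneath
```

The str() output mentions only the children, while names() still reports "file", "version" and "children", which is the kind of double identity questioned above.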

You could solve "everything" elegantly if you used S4 instead of S3
classes, since there's no defined correspondence between slot
access and "[[". And yes, then (with S4), I'd agree that

setMethod("length", "XMLDocumentContent",
          function(x) length(x@children))

would be needed too -- and fine.
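
As a self-contained illustration of the S4 idea, the sketch below defines a hypothetical class "ToyDoc" (not the real XMLDocumentContent; slot names and values are assumptions) whose slots hold the metadata, with length() and "[[" defined over the children only.

```r
library(methods)

# Hypothetical S4 class: metadata lives in slots, reached with "@".
setClass("ToyDoc",
         representation(file = "character",
                        version = "character",
                        children = "list"))

# length() and "[[" are both defined in terms of the children, so they
# agree with each other and do not interfere with slot access.
setMethod("length", "ToyDoc", function(x) length(x@children))
setMethod("[[", "ToyDoc", function(x, i, j, ...) x@children[[i]])

doc <- new("ToyDoc",
           file = "test.xml",
           version = "1.0",
           children = list(comment = "A comment", foo = "root node"))

n    <- length(doc)   # number of children
last <- doc[[n]]      # last child; x[[length(x)]] is well defined
src  <- doc@file      # slot access is independent of "[["
```

Because slots are reached with "@" rather than "$" or "[[", redefining "[[" and length() here does not conflict with any list semantics.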

Martin

    DTL> I even recall considering this at the time of writing
    DTL> the package initially.  But that was back in 1999/2000
    DTL> and S4 and R/S-Plus compatibility were not what they
    DTL> are now.  It could be changed.  Not certain when I will
    DTL> get a chance.


    Ashley> examining components eg 
    >> >> str(xeg$doc$children,2)
    >> 
    Ashley> List of 2
    Ashley> $ comment: list()
    Ashley> ..- attr(*, "class")= chr [1:2] "XMLComment" "XMLNode"
    Ashley> etc 
    >> 
    Ashley> is OK.
    >> 
    Ashley> XML Version 1.4-1, 
    Ashley> same behaviour on Windows and Linux, R version 2.4.1 (2006-12-18)
    >>