Custom XML Readers

6 messages · Ben Tupper, Duncan Temple Lang, pl.rudy at gmail.com

#
I need to construct a custom XML reader. The files I'm working with are in
a funky XML format:

<str name="author">Paul H</str>
<str name="country">USA</str>
<date name="created_date">2010-02-16</date>
 
I want to read the file so it looks like:

author = Paul H
country = USA
created_date = 2010-02-16

Does anyone know how to go about this problem, or know of good references I
could access?

Thanks,
Andy


--
View this message in context: http://r.789695.n4.nabble.com/Custom-XML-Readers-tp4229614p4229614.html
Sent from the R help mailing list archive at Nabble.com.
#
Hi Andy,
On Dec 23, 2011, at 2:51 PM, pl.rudy at gmail.com wrote:

            
Have you tried Duncan Temple Lang's XML package for R?  It works very well for parsing and building XML formatted data.

http://www.omegahat.org/RSXML/
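
A minimal sketch of how the XML package could turn the fragment from the question into name = value pairs. It is an illustration under stated assumptions, not code from the package page: the enclosing <doc> root element is hypothetical (added only so the fragment is well-formed), and the variable names are made up for the example.

```r
library(XML)

# The fragment from the question, wrapped in a hypothetical <doc> root
# element so that it forms a well-formed XML document.
txt <- '<doc>
  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>
</doc>'

doc <- xmlParse(txt, asText = TRUE)

# Select every element directly under the root, whatever its tag name.
nodes <- getNodeSet(doc, "/doc/*")

# The text content becomes the value, the "name" attribute the label.
values <- sapply(nodes, xmlValue)
names(values) <- sapply(nodes, xmlGetAttr, "name")

cat(paste(names(values), values, sep = " = "), sep = "\n")
```

Selecting with "/doc/*" rather than by tag name means <str>, <date>, and any other field type are handled the same way.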

Cheers,
Ben
Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org
1 day later
#
In addition to the general tools of the XML package,
I also have code that reads documents with a structure similar
to the ones Andy illustrated. I put the functions, along with simple
examples of using them, at the bottom of the

   http://www.omegahat.org/RSXML/

page.

  D.
On 12/23/11 5:50 PM, Ben Tupper wrote:
3 days later
#
Thanks, all, for the helpful advice. However, I'm still running into an error
while trying to run "readSolrDoc", provided by Duncan Temple Lang. The
documents I'm trying to parse come from Solr and look very much like the
example provided on http://www.omegahat.org/RSXML/

I'm not that familiar with the XML package yet, and I'm having difficulty
figuring out what is wrong. Has anyone encountered a similar error, or does
anyone know how to handle this?

This is the error I'm getting:
attributes construct error
Couldn't find end of Start Tag lst line 2
Opening and ending tag mismatch: response line 1 and lst
Extra content at the end of the document
Error in readSolrDoc(xmlParse(doc), ...) : 
  error in evaluating the argument 'doc' in selecting a method for function
'readSolrDoc': Error: 1: error parsing attribute name
2: attributes construct error
3: Couldn't find end of Start Tag lst line 2
4: Opening and ending tag mismatch: response line 1 and lst
5: Extra content at the end of the document
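
The phrase "error in evaluating the argument 'doc'" in the output above indicates that the failure happens inside xmlParse(), before readSolrDoc() gets to do any work. One way to confirm that is to parse the file on its own and inspect the parser's message. The sketch below assumes a hypothetical helper name, parse_or_report, and uses a deliberately malformed snippet that is not the file from this thread, so the exact wording of the message may differ.

```r
library(XML)

# Hypothetical helper: return the parsed document, or NULL after
# reporting the parser's message.
parse_or_report <- function(x, asText = FALSE) {
  tryCatch(
    xmlParse(x, asText = asText),
    error = function(e) {
      message("XML parse failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Deliberately malformed: the attribute value is not quoted.
bad  <- '<response>\n<lst name=responseHeader>\n</lst>\n</response>'
good <- '<response>\n<lst name="responseHeader">\n</lst>\n</response>'

bad_doc  <- parse_or_report(bad,  asText = TRUE)   # NULL, message printed
good_doc <- parse_or_report(good, asText = TRUE)   # parsed document
```

For a file on disk, the same helper would be called with the file name and asText left at FALSE. The line numbers in the parser's message point at the first place where the markup stops being well-formed.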





1 day later
#
I found the source of the error: my XML document contains some custom
tags, such as
<response>
<doc>
If I change those tags to <lst>, the code works. Another source of error is
text that does not fit on one line, such as:

<str name="fulltext">
MORGANZA, La. (AP) -- Federal officials say they are going to open a
Mississippi River floodgate for the first time in nearly four decades at 3
p.m. CDT. The Army Corps of Engineers made the</str>

Does anyone know how I can add support for different tags to the solrDocs code?
http://www.omegahat.org/RSXML/solrDocs.R

And how can I deal with text that spans more than one line?

Just to be clear, I'm using the XML package, with the solrDocs code found at the
bottom of this page:
http://www.omegahat.org/RSXML/
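
One possible approach, sketched here with the general XML package tools rather than by modifying solrDocs.R (whose internals are not shown in this thread), is to select the <doc> elements with XPath and read each named child directly. The helper name read_fields and the sample document are assumptions made for illustration. xmlValue() returns the full text of an element, line breaks included, so multi-line fields only need their whitespace collapsed.

```r
library(XML)

# Sample document using the <response>/<doc> tags and the multi-line
# field quoted above; the "author" field is added for illustration.
txt <- '<response>
<doc>
<str name="author">Paul H</str>
<str name="fulltext">
MORGANZA, La. (AP) -- Federal officials say they are going to open a
Mississippi River floodgate for the first time in nearly four decades at 3
p.m. CDT. The Army Corps of Engineers made the</str>
</doc>
</response>'

doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: turn one <doc> node into a named character vector.
read_fields <- function(node) {
  kids <- getNodeSet(node, "./*[@name]")   # child elements with a name attribute
  vals <- sapply(kids, function(k) {
    # Trim the ends and collapse internal whitespace, including newlines.
    gsub("\\s+", " ", trimws(xmlValue(k)))
  })
  names(vals) <- sapply(kids, xmlGetAttr, "name")
  vals
}

# One named vector per <doc> element, wherever it sits in the tree.
records <- lapply(getNodeSet(doc, "//doc"), read_fields)
print(records[[1]])
```

Because the XPath expressions match on structure ("//doc", then any child carrying a name attribute), no tag has to be renamed to <lst> before parsing.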

Thanks,
Andy

