I need to construct a custom XML reader. The files I'm working with are in a funky XML format:

<str name="author">Paul H</str>
<str name="country">USA</str>
<date name="created_date">2010-02-16</date>

I want to read the file so it looks like:

author = Paul H
country = USA
created_date = 2010-02-16

Does anyone know how to go about this problem, or know of good references I could access?

Thanks,
Andy
--
View this message in context: http://r.789695.n4.nabble.com/Custom-XML-Readers-tp4229614p4229614.html
Sent from the R help mailing list archive at Nabble.com.
Custom XML Readers
6 messages · Ben Tupper, Duncan Temple Lang, pl.rudy at gmail.com
Hi Andy,
On Dec 23, 2011, at 2:51 PM, pl.rudy at gmail.com wrote:
I need to construct a custom XML reader, the files I'm working with are in funky XML format:

<str name="author">Paul H</str>
<str name="country">USA</str>
<date name="created_date">2010-02-16</date>

I want to read the file so it looks like:

author = Paul H
country = USA
created_date=2010-02-16

Does any one know how to go about this problem, or know of good references i could access?
Have you tried Duncan Temple Lang's XML package for R? It works very well for parsing and building XML formatted data. http://www.omegahat.org/RSXML/ Cheers, Ben
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Ben Tupper Bigelow Laboratory for Ocean Sciences 180 McKown Point Rd. P.O. Box 475 West Boothbay Harbor, Maine 04575-0475 http://www.bigelow.org
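As a rough illustration of the suggestion above, here is a minimal sketch of how the XML package might be used to turn the elements from the original post into name = value pairs. The <doc> wrapper element and the variable names are assumptions added for the example (a fragment needs a single root element to parse); this is not code from the thread.

```r
library(XML)

# Hypothetical sample mirroring the structure in the original post.
# The <doc> wrapper is an assumption so the fragment has a single root.
txt <- '<doc>
  <str name="author">Paul H</str>
  <str name="country">USA</str>
  <date name="created_date">2010-02-16</date>
</doc>'

# Parse the string as XML content rather than as a file name.
doc <- xmlParse(txt, asText = TRUE)

# Select every element that carries a name attribute, whatever its tag.
nodes <- getNodeSet(doc, "//*[@name]")

# The element text becomes the value; the name attribute becomes the label.
fields <- sapply(nodes, xmlValue)
names(fields) <- sapply(nodes, xmlGetAttr, "name")

# Print one "name = value" line per element.
cat(paste(names(fields), fields, sep = " = "), sep = "\n")
```

Selecting on the name attribute rather than on the tag means <str>, <date>, and any other typed element are handled the same way.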
1 day later
In addition to the general tools of the XML package, I also had code that reads documents with a structure similar to the ones Andy illustrated. I put that code, along with simple examples of using it, at the bottom of the http://www.omegahat.org/RSXML/ page. D.
On 12/23/11 5:50 PM, Ben Tupper wrote:
Have you tried Duncan Temple Lang's XML package for R? It works very well for parsing and building XML formatted data. http://www.omegahat.org/RSXML/ Cheers, Ben
3 days later
Thanks all for the helpful advice. However, I'm still running into an error while trying to run "readSolrDoc", provided by Duncan Temple Lang. The documents I'm trying to parse come from Solr and look very much like the example provided on http://www.omegahat.org/RSXML/ I'm not that familiar with the XML package yet and I'm having difficulty figuring out what is wrong. Has anyone encountered a similar error, or does anyone know how to handle this? This is the error I'm getting:
error parsing attribute name
attributes construct error
Couldn't find end of Start Tag lst line 2
Opening and ending tag mismatch: response line 1 and lst
Extra content at the end of the document
Error in readSolrDoc(xmlParse(doc), ...) :
  error in evaluating the argument 'doc' in selecting a method for function 'readSolrDoc': Error:
1: error parsing attribute name
2: attributes construct error
3: Couldn't find end of Start Tag lst line 2
4: Opening and ending tag mismatch: response line 1 and lst
5: Extra content at the end of the document
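The message "error in evaluating the argument 'doc'" indicates that the xmlParse(doc) call failed before readSolrDoc itself ran, so the input was rejected by the XML parser as not well-formed. A minimal sketch for checking that step in isolation follows; the helper name is_well_formed and both sample strings are hypothetical, made up for illustration.

```r
library(XML)

# Hypothetical helper: TRUE if the text parses as well-formed XML,
# otherwise report the parser's message and return FALSE.
is_well_formed <- function(txt) {
  tryCatch({
    xmlParse(txt, asText = TRUE)
    TRUE
  }, error = function(e) {
    message("Parse failed: ", conditionMessage(e))
    FALSE
  })
}

# Made-up inputs: the second one never closes its <lst> element.
good <- '<response><lst name="responseHeader"><int name="status">0</int></lst></response>'
bad  <- '<response><lst name="responseHeader"></response>'

is_well_formed(good)
is_well_formed(bad)
```

If the check fails, the line number in the parser message ("line 2" in the output above) is a reasonable place to start looking in the source file.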
1 day later
I found the source of the error: my XML document contains some custom tags, such as <response> and <doc>. If I change those tags to <lst>, the code works. Another source of errors is text that does not fit on one line, such as:

<str name="fulltext"> MORGANZA, La. (AP) -- Federal officials say they are going to open a Mississippi River floodgate for the first time in nearly four decades at 3 p.m. CDT. The Army Corps of Engineers made the</str>

Does anyone know how I can add different tags to the solrDocs.R code (http://www.omegahat.org/RSXML/solrDocs.R), and how to deal with text that spans more than one line? Just to be clear, I'm using the XML package with the solrDocs.R code found at the bottom of this page: http://www.omegahat.org/RSXML/

Thanks,
Andy
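One possible way around both issues, sketched here with the general XML package functions rather than by modifying solrDocs.R, is to select fields by their name attribute inside each <doc> element and to collapse line breaks in the extracted text. The sample document, the squash helper, and the records variable are assumptions for illustration; the fulltext value is a shortened version of the text quoted above.

```r
library(XML)

# Hypothetical Solr-like response; fulltext is split over two lines.
txt <- '<response>
  <doc>
    <str name="author">Paul H</str>
    <str name="fulltext">MORGANZA, La. (AP) -- Federal officials say
      they are going to open a Mississippi River floodgate.</str>
  </doc>
</response>'

doc <- xmlParse(txt, asText = TRUE)

# Hypothetical helper: trim the ends and collapse runs of whitespace
# (including newlines) into single spaces.
squash <- function(x) gsub("\\s+", " ", trimws(x))

# One named character vector per <doc> element, regardless of the
# wrapper tags around it or the tag used for each field.
records <- lapply(getNodeSet(doc, "//doc"), function(d) {
  kids <- getNodeSet(d, "./*[@name]")
  vals <- squash(sapply(kids, xmlValue))
  names(vals) <- sapply(kids, xmlGetAttr, "name")
  vals
})

records[[1]][["fulltext"]]
```

Because the XPath expressions do not mention <lst>, <response>, or <str> by name, wrapper tags do not need to be renamed before parsing.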
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111229/d5e3d2c4/attachment.pl>