XML to CSV
Hello all,
Thank you for the extremely helpful information. As a follow-up, some of
the nested elements are of the form below:
<DischargeMedication>
  <Medication MedAdmin="0" MedID="10"/>
  <Medication MedAdmin="0" MedID="11"/>
I've been having trouble extracting this information and was wondering if
anyone had any suggestions.
Thank you,
Andrew
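
For elements like these, where the values sit in attributes rather than in text, one possible starting point with xml2 is xml_attr(), which reads a named attribute from each node that xml_find_all() returns. This is only a sketch: the inline XML string, its closing </DischargeMedication> tag, and the output file name are assumptions added so the snippet runs on its own, and the real file presumably has more structure around these nodes.

```r
library(xml2)

# Inline stand-in for the real file; the closing tag is assumed.
doc <- read_xml('<DischargeMedication>
  <Medication MedAdmin="0" MedID="10"/>
  <Medication MedAdmin="0" MedID="11"/>
</DischargeMedication>')

# Select every Medication element, wherever it sits in the document.
meds <- xml_find_all(doc, "//Medication")

# These elements have no text content, so read the attributes instead.
# xml_attr() returns one character value per node.
med_df <- data.frame(
  MedAdmin = xml_attr(meds, "MedAdmin"),
  MedID    = xml_attr(meds, "MedID"),
  stringsAsFactors = FALSE
)

# Hypothetical output file name.
write.csv(med_df, file = "discharge_medication.csv", row.names = FALSE)
```

The attribute values come back as character strings, so they may need as.integer() or similar before loading into a database.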
On Thu, Jan 5, 2017 at 7:39 AM, Franzini, Gabriele [Nervianoms] <
Gabriele.Franzini at nervianoms.com> wrote:
Hello Andrew,

As you are starting from a "clean slate" anyway in handling XML files, you could take a look at XSLT processing (also an off-topic area). There are free tools available, and many examples of "XML to CSV XSLT" on StackOverflow.

HTH,
Gabriele

-----Original Message-----
On January 4, 2017 12:45:08 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
Hi,

You should keep replies on the list - you never know when someone will swoop in with the right answer to make your life easier.

Below is a simple example that uses xpath syntax to identify (and in this case retrieve) children that match your xpath expression. xpath expressions are sort of like /a/directory/structure/description so you can visualize elements of XML like nested folders or subdirectories. Hopefully this will get you started. A lot more on xpath here: http://www.w3schools.com/xml/xml_xpath.asp

There are other extraction tools in xml2 - just type ?xml2 at the command prompt to see more. Since you have more deeply nested elements you'll need to play with this a bit first.

    library(xml2)

    # read the example XML document
    uri = 'http://www.w3schools.com/xml/simple.xml'
    x = read_xml(uri)

    # find every <name>, <price> and <calories> node and pull out its content
    name_nodes = xml_find_all(x, "//name")
    name = xml_text(name_nodes)
    price_nodes = xml_find_all(x, "//price")
    price = xml_text(price_nodes)
    calories_nodes = xml_find_all(x, "//calories")
    calories = xml_double(calories_nodes)

    # one column per element, then write the table out as CSV
    X = data.frame(name, price, calories, stringsAsFactors = FALSE)
    write.csv(X, file = 'foo.csv')

Cheers,
Ben
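
One caveat with the column-by-column approach above: if some records lack an element, the vectors come back with different lengths and data.frame() will fail. A possible workaround, sketched here under assumptions, is to find the record nodes first and then call xml_find_first() relative to each record, which yields a missing node (and so NA) where the child is absent. The inline XML only mimics the layout of the w3schools file, and the missing <calories> element is made up for illustration.

```r
library(xml2)

# Two records; the second one has no <calories> child (invented example).
x <- read_xml('<breakfast_menu>
  <food><name>Belgian Waffles</name><price>$5.95</price><calories>650</calories></food>
  <food><name>French Toast</name><price>$4.50</price></food>
</breakfast_menu>')

# One node per record.
food <- xml_find_all(x, "//food")

# xml_find_first() returns exactly one result per record, so every
# column has the same length; absent children become NA.
X <- data.frame(
  name     = xml_text(xml_find_first(food, "./name")),
  price    = xml_text(xml_find_first(food, "./price")),
  calories = as.numeric(xml_text(xml_find_first(food, "./calories"))),
  stringsAsFactors = FALSE
)
```

Each row of X then corresponds to exactly one <food> record, which keeps values from drifting into the wrong row when an element is missing.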
On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu>
wrote:
Hello Ben,

Thank you for the advice. I am extremely new to any sort of coding, so I have learned a lot already. Essentially, I was given an XML file and was told to convert all of it to a CSV so that it could be uploaded into a database. Unfortunately, the information I am working with is medical information, so I can't really share it. I initially tried to convert it using online programs, but that produced a large number of blank spaces, which made the result unusable for uploading into the database.

So essentially, my goal is to parse all the data in the XML into a coherent, succinct CSV that can be uploaded. The document contains 361 patient files with 13 subcategories for each patient, which branch further into around 150 categories in total. Since I am so new, I have been having a hard time seeing the bigger picture or knowing whether there are intermediate steps that would prevent all the blank spaces that the online conversion programs created.

I will look through the information on the xml2 package. Any advice or recommendations would be greatly appreciated, as I have felt fairly stuck. Once again, thank you very much for your help.
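
One way the blank spaces might be avoided, offered only as a sketch, is to write one CSV per repeating subcategory (one row per nested item, keyed by patient) instead of forcing everything into a single wide row per patient. The real file layout is not shown in this thread, so the <Patients> and <Patient> elements and the PatientID attribute below are invented for illustration; only DischargeMedication and Medication come from the thread.

```r
library(xml2)

# Hypothetical layout: Patients, Patient and PatientID are assumptions.
doc <- read_xml('<Patients>
  <Patient PatientID="1">
    <DischargeMedication>
      <Medication MedAdmin="0" MedID="10"/>
      <Medication MedAdmin="0" MedID="11"/>
    </DischargeMedication>
  </Patient>
  <Patient PatientID="2">
    <DischargeMedication>
      <Medication MedAdmin="1" MedID="12"/>
    </DischargeMedication>
  </Patient>
</Patients>')

patients <- xml_find_all(doc, "//Patient")

# Build one small table per patient: one row per Medication,
# with the patient key repeated on every row.
rows <- lapply(patients, function(p) {
  meds <- xml_find_all(p, "./DischargeMedication/Medication")
  data.frame(
    PatientID = rep(xml_attr(p, "PatientID"), length(meds)),
    MedAdmin  = xml_attr(meds, "MedAdmin"),
    MedID     = xml_attr(meds, "MedID"),
    stringsAsFactors = FALSE
  )
})

# Stack the per-patient tables into one long table.
medications <- do.call(rbind, rows)
```

A patient with no medications simply contributes zero rows, so no padding cells are needed. Whether the target database accepts several related tables rather than one wide file is something to check before going this way.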

Best,
Andrew

On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
Hi,

It's hard to know what to advise - much depends upon the XML data you have and what you want to extract from it. Without knowing about those two things there is little anyone can do to help. Can you post some example data online and provide the link here? Then state explicitly what you want to have in hand at the end.

If you are just starting out, I suggest that you try the xml2 package ( https://cran.r-project.org/web/packages/xml2/ ) rather than the XML package ( https://cran.r-project.org/web/packages/XML/ ). I have been using it much more since the authors added the ability to create xml nodes (rather than just extracting data from existing xml nodes).

Cheers,
Ben

P.S. Hello to my niece Olivia S on the Bates EMS team.

On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
I am completely new to R and have tried to use several functions within the xml packages to convert an XML file to a CSV, with little success. Since I am so new, I am not sure what the necessary steps are to complete this conversion without a lot of NAs.

--
Andrew D. Lachance
Chief of Service, Bates Emergency Medical Service
Residence Coordinator, Hopkins House
Bates College Class of 2017
alachanc at bates.edu
(207) 620-4854
______________________________________________
R-help at r-project.org mailing list
To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Ben Tupper
Bigelow Laboratory for Ocean Sciences
60 Bigelow Drive, P.O. Box 380
East Boothbay, Maine 04544
http://www.bigelow.org