Hello Andrew,
as you are starting from a clean slate anyway in handling XML files, you
could take a look at XSLT processing -- also an off-topic area.
There are free tools available, and many examples of "XML to CSV
XSLT" on StackOverflow.
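For what it's worth, here is a minimal sketch of the XSLT route from within R,
assuming the xslt package (a companion to xml2) is installed. The small menu
document and the stylesheet are made up for illustration. Values are written
unquoted, so fields that contain commas would need extra handling.

```r
library(xml2)
library(xslt)

# Made-up input document (hypothetical element names)
doc = read_xml('<menu>
  <food><name>Waffles</name><price>5.95</price></food>
  <food><name>Toast</name><price>4.50</price></food>
</menu>')

# Stylesheet that emits plain text: a header line, then one line per <food>
style = read_xml('<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:text>name,price&#10;</xsl:text>
    <xsl:for-each select="//food">
      <xsl:value-of select="name"/>
      <xsl:text>,</xsl:text>
      <xsl:value-of select="price"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>')

# With output method "text" the result comes back as a character string
csv = xml_xslt(doc, style)
cat(csv)
```

The same stylesheet could be run with any standalone XSLT processor; R is only
used here to keep everything in one place.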
HTH,
Gabriele
-----Original Message-----
On January 4, 2017 12:45:08 PM PST, Ben Tupper <btupper at bigelow.org>
wrote:
Hi,
You should keep replies on the list - you never know when someone will
swoop in with the right answer to make your life easier.
Below is a simple example that uses xpath syntax to identify (and in
this case retrieve) children that match your xpath expression. xpath
expressions are sort of like /a/directory/structure/description so you
can visualize elements of XML like nested folders or subdirectories.
Hopefully this will get you started; there is a lot more on xpath here:
http://www.w3schools.com/xml/xml_xpath.asp
There are other extraction tools in xml2 - just type ?xml2 at the command
prompt to see more.
Since you have more deeply nested elements you'll need to play with
this a bit first.
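library(xml2)
# read the example XML straight from the web
uri = 'http://www.w3schools.com/xml/simple.xml'
x = read_xml(uri)
# "//name" matches every <name> element, however deeply nested
name_nodes = xml_find_all(x, "//name")
name = xml_text(name_nodes)
# prices carry a currency symbol, so keep them as text
price_nodes = xml_find_all(x, "//price")
price = xml_text(price_nodes)
# calories are plain numbers, so convert them directly
calories_nodes = xml_find_all(x, "//calories")
calories = xml_double(calories_nodes)
# one row per food item
X = data.frame(name, price, calories, stringsAsFactors = FALSE)
write.csv(X, file = 'foo.csv')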
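For more deeply nested records, one possible pattern is to first select one
node per record and then look up each field relative to that node. The sketch
below uses a small made-up document; the element names (patient, vitals, pulse,
temp) are assumptions for illustration, not the layout of your file. Because
xml_find_first() returns one result per record, a missing field becomes a
single NA in that row and the columns stay lined up.

```r
library(xml2)

# Made-up stand-in for a nested record file (hypothetical element names)
txt = '<patients>
  <patient id="1">
    <name>A</name>
    <vitals><pulse>72</pulse><temp>98.6</temp></vitals>
  </patient>
  <patient id="2">
    <name>B</name>
    <vitals><pulse>88</pulse></vitals>
  </patient>
</patients>'
x = read_xml(txt)

# one node per record
patients = xml_find_all(x, "//patient")

# look up each field relative to its record; a record without a match
# yields a missing node, which converts to NA
id    = xml_attr(patients, "id")
name  = xml_text(xml_find_first(patients, "./name"))
pulse = xml_double(xml_find_first(patients, "./vitals/pulse"))
temp  = xml_double(xml_find_first(patients, "./vitals/temp"))

# one row per patient, one column per field
X = data.frame(id, name, pulse, temp, stringsAsFactors = FALSE)
write.csv(X, file = 'patients.csv', row.names = FALSE)
```

Using xml_find_all(x, "//temp") here instead would return only one value for
two patients, and the columns would no longer match up.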
Cheers,
Ben
On Jan 4, 2017, at 2:13 PM, Andrew Lachance <alachanc at bates.edu> wrote:
Hello Ben,
Thank you for the advice. I am extremely new to any sort of coding so
I have learned a lot already. Essentially, I was given an XML file and
was told to convert all of it to a csv so that it could be uploaded
into a database. Unfortunately the information I am working with is
medical information, so I can't really share it. I initially tried to
convert it using online programs; however, that produced a large
number of blank spaces that weren't useful for uploading into the
database.
So essentially, my goal is to parse all the data in the XML to a
coherent, succinct CSV that could be uploaded. In the document, there
are 361 patient files with 13 subcategories for each patient, which
further branch off to around 150 categories total. Since I am so new,
I have been having a hard time seeing the bigger picture or knowing if
there are any intermediary steps that will prevent all the blank spaces
that the online conversion programs created.
I will look through the information on the xml2 package. Any advice
or recommendations would be greatly appreciated as I have felt fairly
stuck. Once again, thank you very much for your help.
Best,
Andrew
On Tue, Jan 3, 2017 at 2:29 PM, Ben Tupper <btupper at bigelow.org> wrote:
Hi,
It's hard to know what to advise - much depends upon the XML data you
have and what you want to extract from it. Without knowing about those
two things there is little anyone could do to help. Can you post some
example data to the internet and provide the link here? Then state
explicitly what you want to have in hand at the end.
If you are just starting out I suggest that you try the xml2 package (
Cheers,
Ben
P.S. Hello to my niece Olivia S on the Bates EMS team.
On Jan 3, 2017, at 11:27 AM, Andrew Lachance <alachanc at bates.edu> wrote:
convert-a-large-xml-file-to-a-csv-file-using-r?noredirect=1#
I am completely new to R and have tried to use several functions in
xml packages to convert an XML to a csv and have had little success.
I am so new, I am not sure what the necessary steps are to complete the
conversion without a lot of NA.
--
Andrew D. Lachance
Chief of Service, Bates Emergency Medical Service
Residence Coordinator, Hopkins House
Bates College Class of 2017
alachanc at bates.edu
(207) 620-4854