Message-ID: <20220728094922.5f2ab19a@trisector>
Date: 2022-07-28T06:49:22Z
From: Ivan Krylov
Subject: Parsing XML?
In-Reply-To: <7743581c-0959-1df6-bb31-f140161f83e1@effectivedefense.org>

On Wed, 27 Jul 2022 15:50:55 -0500
Spencer Graves <spencer.graves at effectivedefense.org> wrote:

> What would you suggest I do to parse the following XML file into a
> list that I can understand:
> 
> XMLfile <-
> "https://chroniclingamerica.loc.gov/data/bib/worldcat_titles/bulk5/ndnp_Alabama_all-yrs_e_0001_0050.xml" 

> XMLdat <- XML::xmlParse(XMLdata)
> str(XMLdat)

Isn't XMLdat already a tree-like list? For example,
XMLdat[[1]][[1]][[3]][[1]] is the first <record> tag in the file, which
you can further pick apart.
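
As a rough illustration of picking a parsed document apart by position,
here is a minimal sketch using the XML package. The one-record document
below is made up for the example and only loosely imitates the
<record>/<datafield> layout mentioned in this thread; the real file is
likely nested more deeply, so the index positions will differ.

```r
library(XML)

# Hypothetical one-record document (not taken from the real file),
# written without whitespace between tags so there are no blank text nodes.
txt <- paste0(
  '<collection><record>',
  '<datafield tag="042"><subfield code="a">pcc</subfield></datafield>',
  '<datafield tag="245"><subfield code="a">Some title</subfield></datafield>',
  '</record></collection>'
)

doc  <- xmlParse(txt, asText = TRUE)
root <- xmlRoot(doc)          # the <collection> element

rec <- root[[1]]              # first child of the root: the <record>
xmlName(rec)                  # element name of that node
xmlSize(rec)                  # number of child nodes under it

field <- rec[[2]]             # second <datafield> inside the record
xmlGetAttr(field, "tag")      # value of its tag attribute
xmlValue(field)               # concatenated text content below it
```

Positional indexing like this is fine for exploring a file, but it
breaks as soon as the order or number of children changes.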

What information do you need from this file and how would you like to
access it? Parsing XML files is typically achieved with XPath
expressions (e.g. 'under every <record> tag, extract the <datafield>
tags with the attribute tag="042"' would look like
'//record/datafield[@tag="042"]') and/or handlers on specific tags, not by
extracting all text nodes and performing string operations on them.
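
A minimal sketch of the XPath approach with the XML package follows.
The small inline document is hypothetical and stands in for the real
file. In particular, if the real file declares a default XML namespace,
the expressions would also need a namespace prefix (the `namespaces`
argument of getNodeSet() and xpathSApply()); that case is not shown here.

```r
library(XML)

# Hypothetical miniature document, not taken from the real file.
txt <- '<collection>
  <record>
    <datafield tag="042"><subfield code="a">pcc</subfield></datafield>
    <datafield tag="245"><subfield code="a">First title</subfield></datafield>
  </record>
  <record>
    <datafield tag="245"><subfield code="a">Second title</subfield></datafield>
  </record>
</collection>'

doc <- xmlParse(txt, asText = TRUE)

# Attribute tests are written with "@": select every <datafield>
# with tag="042" that sits directly under a <record>.
nodes042 <- getNodeSet(doc, '//record/datafield[@tag="042"]')
length(nodes042)

# Apply xmlValue() to each matching node to get a character vector:
# here, subfield "a" of every datafield with tag="245".
titles <- xpathSApply(
  doc,
  '//record/datafield[@tag="245"]/subfield[@code="a"]',
  xmlValue
)
titles
```

Selecting by attribute rather than by position means a record that
lacks a given field simply contributes no match instead of shifting
the indices.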

-- 
Best regards,
Ivan