Creating a Data Frame from an XML

On Jan 22, 2013, at 3:11 PM, Adam Gabbert wrote:

            
Hi,

You are so close!

You have a number of nodes named 'row'. The "[[" function selects just one item from a list, so when several items share that name it returns only the first. So you really want to use the "[" function instead, and then select individual rows by position using "[[".
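For what it's worth, the "[[" behavior is easy to see with a plain R list whose elements share a name. The sketch below uses a made-up list (`kids` is a placeholder standing in for the children of <data>, not an XML object). One caveat: on a plain list, "[" with a name also stops at the first match, so the sketch keeps every "row" with a logical test on names() and then picks one out by position with "[[".

```r
# Placeholder list with repeated names, standing in for the <row> children
kids <- list(row = c(BRAND = "GMC",  YEAR = "1999"),
             row = c(BRAND = "FORD", YEAR = "2000"),
             row = c(BRAND = "GMC",  YEAR = "2001"))

first_only  <- kids[["row"]]               # "[[" by name: only the first match
one_by_name <- kids["row"]                 # plain-list "[" by name: also only the first
all_rows    <- kids[names(kids) == "row"]  # logical test keeps every match
second      <- all_rows[[2]]               # then "[[" picks one by position

print(second)
```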

library(XML)
# the XML as a character vector ('txt' is a placeholder name)
txt <- c("<data>",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1999\" VALUE=\"10000\" />",
" <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2000\" VALUE=\"12000\" />",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2001\" VALUE=\"12500\" />",
" <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2002\" VALUE=\"13000\" />",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2003\" VALUE=\"14000\" />",
" <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2004\" VALUE=\"17000\" />",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2005\" VALUE=\"15000\" />",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"1967\" VALUE=\"PRICLESS\" />",
" <row BRAND=\"FORD\" NUM=\"1\" YEAR=\"2007\" VALUE=\"17500\" />",
" <row BRAND=\"GMC\" NUM=\"1\" YEAR=\"2008\" VALUE=\"22000\" />",
" </data>")
<row BRAND="GMC" NUM="1" YEAR="1999" VALUE="10000"/>
<row BRAND="FORD" NUM="1" YEAR="2000" VALUE="12000"/> 

Your rows are set up so that the attributes hold the values you want; use xmlAttrs to retrieve them.
BRAND     NUM    YEAR   VALUE 
 "FORD"     "1"  "2000" "12000" 

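Since what comes back is an ordinary named character vector, individual attributes can be pulled out by name with plain indexing. A small sketch, with a hand-typed vector (`attrs`, a placeholder) standing in for the xmlAttrs result above; note that everything arrives as text, so numbers need explicit conversion.

```r
# Placeholder for the named character vector xmlAttrs() returned above
attrs <- c(BRAND = "FORD", NUM = "1", YEAR = "2000", VALUE = "12000")

brand <- attrs[["BRAND"]]              # one value, name dropped
yv    <- attrs[c("YEAR", "VALUE")]     # still a named vector
year  <- as.integer(attrs[["YEAR"]])   # text to integer, done explicitly
```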

You can use lapply to iterate through each row and apply the xmlAttrs function. You'll end up with a list of named character vectors.
List of 10
 $ row: Named chr [1:4] "GMC" "1" "1999" "10000"
  ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE"
 $ row: Named chr [1:4] "FORD" "1" "2000" "12000"
  ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE"
 $ row: Named chr [1:4] "GMC" "1" "2001" "12500"
  ..- attr(*, "names")= chr [1:4] "BRAND" "NUM" "YEAR" "VALUE"
	.
	.
	.
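The lapply pattern itself doesn't depend on XML. Here is a self-contained sketch in which each "node" is just a list holding its attributes and `get_attrs()` is a hypothetical stand-in for xmlAttrs; it shows that lapply keeps the "row" names and returns one named vector per element.

```r
# Hypothetical stand-ins for parsed <row> nodes: each just carries its attributes
nodes <- list(
  row = list(attrs = c(BRAND = "GMC",  NUM = "1", YEAR = "1999", VALUE = "10000")),
  row = list(attrs = c(BRAND = "FORD", NUM = "1", YEAR = "2000", VALUE = "12000")),
  row = list(attrs = c(BRAND = "GMC",  NUM = "1", YEAR = "2001", VALUE = "12500"))
)

# get_attrs() is a hypothetical substitute for XML::xmlAttrs()
get_attrs <- function(node) node$attrs

attr_list <- lapply(nodes, get_attrs)  # names ("row") are carried over
str(attr_list)
```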

Next make a character matrix using do.call and rbind ...
chr [1:10, 1:4] "GMC" "FORD" "GMC" "FORD" "GMC" "FORD" "GMC" "GMC" "FORD" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:10] "row" "row" "row" "row" ...
  ..$ : chr [1:4] "BRAND" "NUM" "YEAR" "VALUE"
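One thing to watch: rbind lines up vectors by position, not by name, so if the attributes ever came back in a different order the columns would be scrambled. A sketch of a defensive version (the `attr_list` data and `cols` vector are made up for illustration) reorders each vector by name before binding:

```r
# Named character vectors like those lapply(..., xmlAttrs) produces
attr_list <- list(
  row = c(BRAND = "GMC",  NUM = "1", YEAR = "1999", VALUE = "10000"),
  row = c(BRAND = "FORD", NUM = "1", YEAR = "2000", VALUE = "12000"),
  row = c(YEAR = "2001", BRAND = "GMC", NUM = "1", VALUE = "12500")  # same attributes, other order
)
cols <- c("BRAND", "NUM", "YEAR", "VALUE")

# Reorder each vector by name, then stack; row names come from the list names
m <- do.call(rbind, lapply(attr_list, function(v) v[cols]))
str(m)
```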

And then on to a data.frame...
'data.frame':	10 obs. of  4 variables:
 $ BRAND: chr  "GMC" "FORD" "GMC" "FORD" ...
 $ NUM  : chr  "1" "1" "1" "1" ...
 $ YEAR : chr  "1999" "2000" "2001" "2002" ...
 $ VALUE: chr  "10000" "12000" "12500" "13000" ...
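Everything is still character at this point, and the VALUE column includes "PRICLESS" (the 1967 GMC), so it won't convert cleanly. If numbers are wanted, a sketch along these lines (using a few hand-typed rows, not the full data) converts the columns explicitly and lets the unparseable value become NA; passing stringsAsFactors = FALSE keeps the text columns as character, matching the str output above.

```r
# A few hand-typed rows in the same shape as the character matrix above
m <- rbind(
  c(BRAND = "GMC",  NUM = "1", YEAR = "1999", VALUE = "10000"),
  c(BRAND = "GMC",  NUM = "1", YEAR = "1967", VALUE = "PRICLESS"),
  c(BRAND = "FORD", NUM = "1", YEAR = "2007", VALUE = "17500")
)

df <- as.data.frame(m, stringsAsFactors = FALSE)
df$NUM  <- as.integer(df$NUM)
df$YEAR <- as.integer(df$YEAR)
# "PRICLESS" cannot be parsed as a number, so it becomes NA (warning suppressed)
df$VALUE <- suppressWarnings(as.numeric(df$VALUE))
str(df)
```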

Cheers,
Ben
Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org