Merging data frame for SpPolDF
Hi

It might be that we are talking about different things. What I understand is that you have an original shp-file with unique IDs associated with each polygon. In the shapefile these are named "letras". In addition to this you have a data.frame with a variable "letters" which matches the ones in "letras". However, in the letters column there are fewer IDs than in letras. This is what is generated by:

#First we extract the CNTY_ID (let this be letras) to get some similar IDs for the dummy data.frame.
#At this stage it should contain one column named ID with the IDs, and one named Ndata with some random values
extra <- data.frame(ID=slot(nc, 'data')$CNTY_ID, Ndata=runif(length(slot(nc, 'data')$CNTY_ID)))

first:
str(slot(nc, 'data'))
'data.frame': 100 obs. of 14 variables:
 ...
 $ CNTY_ID : num 1825 1827 1828 1831 1832 ...
 ...
 $ NWBIR79 : num 19 12 260 145 1197 ...
 ...

#The next step is to remove some of these IDs (rows 1-3 and 68-100). At this stage we have removed quite a number of IDs
extra <- extra[4:67, 1:2]
#In addition we want to sort them in a different way, to make things more realistic
extra <- extra[order(extra$ID, decreasing=TRUE),]
#And finally we change one of the IDs to a value that is not present in the original CNTY_ID, again to make it more realistic
extra[1,1] <- 342
str(extra)
'data.frame': 64 obs. of 2 variables:
 $ ID   : num 342 2039 2034 2032 2030 ...
 $ Ndata: num 0.8272 0.0255 0.5633 0.1834 0.8208 ...
str(slot(nc, 'data')$CNTY_ID)
 num [1:100] 1825 1827 1828 1831 1832 ...

So we see they look different, just like a1 and a2 in the example provided. So far we have only been concerned with making the dummy data. Skip the merge() function and move directly to the match() function. This is what you want to do (only the next steps):

extra <- extra[match(slot(nc, 'data')$CNTY_ID, extra$ID), 1:2]
str(extra)
'data.frame': 100 obs. of 2 variables:
 $ ID   : num NA NA NA 1831 1832 ...
 $ Ndata: num NA NA NA 0.252 0.842 ...

We now have the data frame you wanted to add, with the same number of rows and ordered the same way as the data in the SpatialPolygonsDataFrame. We add it to the data slot of the SpatialPolygonsDataFrame.

slot(nc, 'data')$Ndata <- extra$Ndata
str(slot(nc, 'data'))
'data.frame': 100 obs. of 15 variables:
 ...
 $ CNTY_ID : num 1825 1827 1828 1831 1832 ...
 ...
 $ NWBIR79 : num 19 12 260 145 1197 ...
 $ Ndata   : num NA NA NA 0.252 0.842 ...

The point is that you sort the data by the IDs in the original shapefile, and hence you can simply add the data back to the data slot and they are then located in the right place. Finally export the data, and the new shapefile has one more variable.

writeOGR(nc,dsn="/home/lunde/MMAMBmuni2",layer="MMAMBmuni2", driver="ESRI Shapefile")

Am I still wrong? In that case, could someone else assist me?

Best wishes
Torleif
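
A minimal sketch of the match() step described above, applied to the small a1/a2 data frames from the quoted message below. It uses base R only. The intermediate name idx is an assumption introduced for illustration, and the new column is copied into a1 rather than subsetting a2 on its own.

```r
# Toy data from the thread: a2 has no row for the key "B"
a1 <- data.frame(letras = c("A", "B", "C", "D"), nums = c("1", "2", "3", "4"))
a2 <- data.frame(letters = c("A", "C", "D"), cods = c("10", "30", "40"))

# For each row of a1, the position of the same key in a2 (NA if absent)
idx <- match(a1$letras, a2$letters)

# Copy the new column across; a1 keeps its own row order and row names,
# and the row for "B" simply gets NA
a1$cods <- a2$cods[idx]
print(a1)
```

Because the result is indexed by the rows of a1, the missing key produces an NA in place instead of being moved to the end, which is the difference from merge() that matters here.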
On Thursday 19 March 2009 08:21:25 pm Agustin Lobo wrote:
Thanks. I might be wrong, but I think that your example is different. The problem comes up when the second dataframe does not have values for all cases that are present in the first one. For example
> a1 <-data.frame(letras=c("A","B","C","D"),nums=c("1","2","3","4"))
> a2 <-data.frame(letras=c("A","C","D"),nums=c("10","30","40"))
> a1
  letras nums
1      A    1
2      B    2
3      C    3
4      D    4
> a2
  letras nums
1      A   10
2      C   30
3      D   40
> a2 <-data.frame(letters=c("A","C","D"),cods=c("10","30","40"))
> merge(a1,a2,by.x="letras",by.y="letters",all.x=T,sort=F)
  letras nums cods
1      A    1   10
2      C    3   30
3      D    4   40
4      B    2 <NA>

which disrupts the ordering in a1 and thus creates a risk when putting the merged dataframe in the SpPolDF.

And what you say would be:
> a2[match(a1$letras, a2$letters), ]
   letters cods
1        A   10
NA    <NA> <NA>
2        C   30
3        D   40

which would not solve the problem. Perhaps I did not correctly interpret your solution?

Agus

Torleif Markussen Lunde wrote:
Hi
Maybe this can help? Please correct me if this is not what you wanted.
require(maptools)
nc <- readShapePoly(system.file("shapes/sids.shp",
package="maptools")[1], proj4string=CRS("+proj=longlat +datum=NAD27"))
#Create dummy data. Do some changes to make it look different (subset and order)
extra <- data.frame(ID=slot(nc, 'data')$CNTY_ID,
Ndata=runif(length(slot(nc, 'data')$CNTY_ID)))
extra <- extra[4:67, 1:2]
extra <- extra[order(extra$ID, decreasing=TRUE),]
extra[1,1] <- 342
#add the dummy data(.frame) (this part is what you want to do)
extra <- extra[match(slot(nc, 'data')$CNTY_ID, extra$ID), 1:2]
slot(nc, 'data')$Ndata <- extra$Ndata
#or for the data frame
slot(nc, 'data') <- cbind(slot(nc, 'data'), extra[-1])
Best wishes
Torleif
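
Before writing the matched column back, it can help to see which keys failed to match on either side. The following is a sketch only: it replaces the CNTY_ID column with a short made-up vector ids so that it runs without maptools, and the names idx, unmatched_polygons and unused_extra are assumptions introduced here.

```r
set.seed(1)  # only so the dummy values are reproducible

# Stand-in for slot(nc, 'data')$CNTY_ID (assumed values, not the real column)
ids <- c(1825, 1827, 1828, 1831, 1832, 1833, 1834)

# Same kind of dummy table as above: subset, reorder, and one foreign ID
extra <- data.frame(ID = ids, Ndata = runif(length(ids)))
extra <- extra[3:6, ]
extra <- extra[order(extra$ID, decreasing = TRUE), ]
extra[1, "ID"] <- 342

idx <- match(ids, extra$ID)

# IDs in the polygon table that will receive NA
unmatched_polygons <- ids[is.na(idx)]
# IDs in extra that are never used
unused_extra <- extra$ID[!extra$ID %in% ids]

# Wherever a match exists, the keys on both sides must agree
ok <- !is.na(idx)
stopifnot(all(extra$ID[idx[ok]] == ids[ok]))
```

Note that match() returns only the first matching position, so this approach assumes the IDs in extra are unique.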
On Thursday 19 March 2009 01:08:15 pm Agustin Lobo wrote:
Hi!
I often have to add more information to the data slot of
a SpPolDF imported from a shp file. I do it in the following way, but I
don't like it much and would like feedback on a better way:
#Import shp
MMAMBmuni <- readOGR("C:/Pruebas/DUNS/MMAMBmuni", layer="MMAMBmuni")
#Extract the DF
MMAMBmuniDFori <- MMAMBmuni@data
#Make a new dataframe by merging with another DF
MMAMBmuniDFnew <-
merge(MMAMBmuniDFori,MMAMBempleados,by.x="MUNICIPI",by.y="CODMUN",
all.x=T,sort=F)
The problem here is that there are a couple of towns in the by.x field
for which we do not have any match in by.y.
As we have set all.x=T, we get a line for which the values from the
second dataframe are NA. But, despite stating sort=F, those cases are
not in the same row as they are in the first data.frame but appended at
the end of the new dataframe. This is bad news for us, as it breaks
the order required for including the new dataframe as the data slot
of a new SpPolDF. Therefore, I have to reorder the new dataframe, thanks
to another field, ID_GRAFIC:
MMAMBmuniDFnew<- MMAMBmuniDFnew[order(MMAMBmuniDFnew$ID_GRAFIC),]
and then copy the original row.names, required because the row.names are
the ones
making the link to the polygons in the future SpPolDF:
row.names(MMAMBmuniDFnew) <- row.names(MMAMBmuniDFori)
#Now we put the new DF in lieu of the older one:
MMAMBmuni2@data <- MMAMBmuniDFnew
#and finally save as shp
writeOGR(MMAMBmuni2,dsn="C:/Pruebas/DUNS/MMAMBmuni2",layer="MMAMBmuni2",
driver="ESRI Shapefile")
Any suggestions on a better procedure? The problem is that sometimes I
forget reordering and get a wrong shp. Until now, I have always realized
the error, but I'm terrified by the idea of not realizing the error
sometime and using true garbage after that point...
Thanks
Agus
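
One way to guard against forgetting the reordering step is to wrap the join in a small function that refuses to return a misaligned table. The sketch below is an illustration under stated assumptions: add_by_key is a hypothetical helper (not part of sp or rgdal), the municipality codes and values are made up, and it assumes the two tables share no column names other than the key.

```r
# Hypothetical helper: add the columns of `new` to `old` by key,
# leaving every row of `old` exactly where it was.
add_by_key <- function(old, new, by.old, by.new) {
  # match() only uses the first hit, so duplicated keys would be silently ignored
  if (anyDuplicated(new[[by.new]]) > 0)
    stop("duplicated keys in the table to be joined")
  idx <- match(old[[by.old]], new[[by.new]])
  extra_cols <- new[idx, setdiff(names(new), by.new), drop = FALSE]
  out <- cbind(old, extra_cols)
  # Keep the original row names, which link the rows to the polygons
  row.names(out) <- row.names(old)
  # Guard: same number of rows and same key order as the original table
  stopifnot(nrow(out) == nrow(old),
            identical(out[[by.old]], old[[by.old]]))
  out
}

# Made-up stand-ins for the attribute table and the table to be joined
muni <- data.frame(MUNICIPI = c("08001", "08002", "08003"),
                   ID_GRAFIC = 1:3, row.names = c("0", "1", "2"))
empleados <- data.frame(CODMUN = c("08003", "08001"), EMPLEADOS = c(120, 45))

muni_new <- add_by_key(muni, empleados, "MUNICIPI", "CODMUN")
print(muni_new)
```

With real data, the returned table could then be assigned to the data slot in place of the merge(), order() and row.names() sequence, since the guard fails loudly if the rows no longer line up.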
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo