Hello,
I am having a spatial object by reading a shp-file by the
readShapePoly command. I now want to add data into the
SpatialPolygonsDataFrame from an external source (its a dataframe
which has a common key SHN_N). I am using the merge command, which I
thought works fine, unitl I realized that "map.kreise" now became just
a data.frame.
My greatest fear in working with spatial objects is that, if I don't
have them in one object, I end up plotting a value for the wrong
region. So is there a way to use merge or what is the most appropriate
way to combine data?
map.kreise<-readShapePoly("vg250krs",IDvar="KRS_ID",verbos=TRUE) #
reads the shape file
map.kreise<-merge(map.kreise,INC,sort = FALSE,by.map.kreise="SHN_N",
by.INC="SHN_N",all.map.kreise=T,all.INC=T)
Thanks a lot in advance,
Stefan
merging data with SpatialPolygonsDataFrame
8 messages · stefan lhachimi, Dylan Beaudette, Roger Bivand
On Wed, 23 Jan 2008, stefan lhachimi wrote:
Hello, I am having a spatial object by reading a shp-file by the readShapePoly command. I now want to add data into the SpatialPolygonsDataFrame from an external source (its a dataframe which has a common key SHN_N). I am using the merge command, which I thought works fine, unitl I realized that "map.kreise" now became just a data.frame.
merge() is a "false friend", because a SpatialPolygonsDataFrame can be coerced to a data.frame, which is what merge() does without asking. It may be best to say corece first, merge() the two data.frames, and rebuild the SpatialPolygonsDataFrame with SpatialPolygonsDataFrame(). map.kreise.df <- as(map.kreise, "data.frame") map.kreise.df1 <- merge(map.kreise.df, INC, sort=FALSE, by.x="SHN_N", by.y="SHN_N", all.x=TRUE, all.y=TRUE) map.kreise1 <- SpatialPolygonsDataFrame(as(map.kreise, "SpatialPolygons"), data=map.kreise.df1) which may fail if INC and map.kreise.df have different row.names. The argument match.ID= (default TRUE) to SpatialPolygonsDataFrame matches the polygon IDs to the data= row.names - they must agree exactly though the data frame rows will be re-ordered to match the polygons if only the orders differ. Your concern here is real, which is why the argument is there. If you don't need merge to manipulate the data frames, you can go straight for: all.equal(row.names(INC), row.names(as(map.kreise, "data.frame")) map.kreise1 <- spCbind(map.kreise, INC) if the row.names of INC match the polygon IDs, and thus the row.names of as(map.kreise, "data.frame"), see ?spCbind in maptools, and the example included there. Hope this helps, Roger
My greatest fear in working with spatial objects is that, if I don't
have them in one object, I end up plotting a value for the wrong
region. So is there a way to use merge or what is the most appropriate
way to combine data?
map.kreise<-readShapePoly("vg250krs",IDvar="KRS_ID",verbos=TRUE) #
reads the shape file
map.kreise<-merge(map.kreise,INC,sort = FALSE,by.map.kreise="SHN_N",
by.INC="SHN_N",all.map.kreise=T,all.INC=T)
Thanks a lot in advance,
Stefan
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
On Thursday 24 January 2008, Roger Bivand wrote:
On Wed, 23 Jan 2008, stefan lhachimi wrote:
Hello, I am having a spatial object by reading a shp-file by the readShapePoly command. I now want to add data into the SpatialPolygonsDataFrame from an external source (its a dataframe which has a common key SHN_N). I am using the merge command, which I thought works fine, unitl I realized that "map.kreise" now became just a data.frame.
merge() is a "false friend", because a SpatialPolygonsDataFrame can be coerced to a data.frame, which is what merge() does without asking. It may be best to say corece first, merge() the two data.frames, and rebuild the SpatialPolygonsDataFrame with SpatialPolygonsDataFrame(). map.kreise.df <- as(map.kreise, "data.frame") map.kreise.df1 <- merge(map.kreise.df, INC, sort=FALSE, by.x="SHN_N", by.y="SHN_N", all.x=TRUE, all.y=TRUE) map.kreise1 <- SpatialPolygonsDataFrame(as(map.kreise, "SpatialPolygons"), data=map.kreise.df1) which may fail if INC and map.kreise.df have different row.names. The argument match.ID= (default TRUE) to SpatialPolygonsDataFrame matches the polygon IDs to the data= row.names - they must agree exactly though the data frame rows will be re-ordered to match the polygons if only the orders differ. Your concern here is real, which is why the argument is there. If you don't need merge to manipulate the data frames, you can go straight for: all.equal(row.names(INC), row.names(as(map.kreise, "data.frame")) map.kreise1 <- spCbind(map.kreise, INC) if the row.names of INC match the polygon IDs, and thus the row.names of as(map.kreise, "data.frame"), see ?spCbind in maptools, and the example included there. Hope this helps, Roger
My greatest fear in working with spatial objects is that, if I don't
have them in one object, I end up plotting a value for the wrong
region. So is there a way to use merge or what is the most appropriate
way to combine data?
map.kreise<-readShapePoly("vg250krs",IDvar="KRS_ID",verbos=TRUE) #
reads the shape file
map.kreise<-merge(map.kreise,INC,sort = FALSE,by.map.kreise="SHN_N",
by.INC="SHN_N",all.map.kreise=T,all.INC=T)
Thanks a lot in advance,
Stefan
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Hi Roger. Thanks for contributing some answers to this.
I was recently working with a colleague on developing some sample exercises
for new students. Since joining new attribute data to a GIS layer's table is
a very common operation we included some samples on how to do this within R.
You have hinted at some possible ways to do it above, but do you have a 'best
practices' approach to doing this using 'sp' methods and objects?
For example:
# contains an attribute col named 'veg_code'
veg <- readOGR(something.shp)
# code meanings: indexed by 'veg_code'
codes <- read.dbf(table.dbf)
# what is the best way to join up the attributes in 'veg' with the rows
in 'codes' ?
As of now we are using merge to replace the dataframe slot of the original
file. We first re-order the results from merge to match the original row
ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code 3','code
4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to restore
# the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
Hi Roger. Thanks for contributing some answers to this. I was recently working with a colleague on developing some sample exercises for new students. Since joining new attribute data to a GIS layer's table is a very common operation we included some samples on how to do this within R. You have hinted at some possible ways to do it above, but do you have a 'best practices' approach to doing this using 'sp' methods and objects? For example: # contains an attribute col named 'veg_code' veg <- readOGR(something.shp) # code meanings: indexed by 'veg_code' codes <- read.dbf(table.dbf) # what is the best way to join up the attributes in 'veg' with the rows in 'codes' ?
Hi Dylan,
This is a different question, but I won't break it out of this thread yet.
You are doing look-up on codes to give labels to veg$veg_code, right?
veg$veg_code are integer indices to codes$V1 (say V1, I don't know what it
is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of the
factor as your friend:
veg$veg_code_factor <- factor(veg$veg_code, labels=as.character(codes$V1))
If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().
Something like:
set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)
No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength of
the S language.
Have I misunderstood you?
Roger
As of now we are using merge to replace the dataframe slot of the original
file. We first re-order the results from merge to match the original row
ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code 3','code
4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to restore
# the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
On Thursday 24 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
Hi Roger. Thanks for contributing some answers to this. I was recently working with a colleague on developing some sample exercises for new students. Since joining new attribute data to a GIS layer's table is a very common operation we included some samples on how to do this within R. You have hinted at some possible ways to do it above, but do you have a 'best practices' approach to doing this using 'sp' methods and objects? For example: # contains an attribute col named 'veg_code' veg <- readOGR(something.shp) # code meanings: indexed by 'veg_code' codes <- read.dbf(table.dbf) # what is the best way to join up the attributes in 'veg' with the rows in 'codes' ?
Hi Dylan,
This is a different question, but I won't break it out of this thread yet.
You are doing look-up on codes to give labels to veg$veg_code, right?
veg$veg_code are integer indices to codes$V1 (say V1, I don't know what it
is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of the
factor as your friend:
veg$veg_code_factor <- factor(veg$veg_code, labels=as.character(codes$V1))
If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().
Something like:
set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)
No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength of
the S language.
Have I misunderstood you?
Roger
I think so. I was (trying to) describe the process of joining, either 1:1 or many:1, the att table associated with an sp object and some other data frame. merge() seems to work fine, but the order of the rows are different from the original data frame attached to the sp object. My question was on the best way to 'update' the data frame attached to an sp object, based on the results from a merge() with some other data. does that help? cheers, Dylan
As of now we are using merge to replace the dataframe slot of the
original file. We first re-order the results from merge to match the
original row ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code
3','code 4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to restore
# the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
On Thursday 24 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
Hi Roger. Thanks for contributing some answers to this. I was recently working with a colleague on developing some sample exercises for new students. Since joining new attribute data to a GIS layer's table is a very common operation we included some samples on how to do this within R. You have hinted at some possible ways to do it above, but do you have a 'best practices' approach to doing this using 'sp' methods and objects? For example: # contains an attribute col named 'veg_code' veg <- readOGR(something.shp) # code meanings: indexed by 'veg_code' codes <- read.dbf(table.dbf) # what is the best way to join up the attributes in 'veg' with the rows in 'codes' ?
Hi Dylan,
This is a different question, but I won't break it out of this thread yet.
You are doing look-up on codes to give labels to veg$veg_code, right?
veg$veg_code are integer indices to codes$V1 (say V1, I don't know what it
is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of the
factor as your friend:
veg$veg_code_factor <- factor(veg$veg_code, labels=as.character(codes$V1))
If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().
Something like:
set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)
No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength of
the S language.
Have I misunderstood you?
Roger
I think so. I was (trying to) describe the process of joining, either 1:1 or many:1, the att table associated with an sp object and some other data frame. merge() seems to work fine, but the order of the rows are different from the original data frame attached to the sp object.
My experience is that merge() is often a false friend, because of object coercion and sorting, as you suggest. The rules for SpatialLinesDataFrame and SpatialPolygonsDataFrame are simple, and SpatialPointsDataFrame (and by extension SpatialPixelsDataFrame) can be made as simple. The first two constructor functions take a match.ID= argument, default TRUE, which matches the geometry ID - the ID slot in the component Lines or Polygons objects - to the data.frame row.name. For SpatialPointsDataFrame(), the same argument exists, but is only used if the coordinate matrix has row names to be matched to the data.frame row names. If it has been used, the data slot row names are the geometry IDs.
From there, it gets harder. merge() takes lots of arguments, and a lot of
trial and error is needed to reach a satisfactory result, that is a single data frame with row names matching the geometry IDs (not necessarily in the right order, but the same set of IDs).
My question was on the best way to 'update' the data frame attached to an sp object, based on the results from a merge() with some other data.
For one to one, there isn't really an alternative to extracting the data slot, do the merge, check the geometry IDs against the output data frame row names, and re-construct the object. That is, essentially where this thread began. If there is less data than geometries, merge() should fill out with NAs. If there is more data than geometries, the output data frame will need subsetting. Both match(), order(), and %in% are very handy here. For one geometry list to many - see reshape() first, for example to flatten a "tall" set of space/time observations so that the time values become new attribute columns. This could be met stations stacked by station and date, which need widening to stations by date*attributes If need be, the IDs of the geometries can be changed too, see ?"spChFIDs-methods" in maptools. Any closer? Roger
does that help? cheers, Dylan
As of now we are using merge to replace the dataframe slot of the
original file. We first re-order the results from merge to match the
original row ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code
3','code 4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to restore
# the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
On Friday 25 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
On Thursday 24 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
Hi Roger. Thanks for contributing some answers to this. I was recently working with a colleague on developing some sample exercises for new students. Since joining new attribute data to a GIS layer's table is a very common operation we included some samples on how to do this within R. You have hinted at some possible ways to do it above, but do you have a 'best practices' approach to doing this using 'sp' methods and objects? For example: # contains an attribute col named 'veg_code' veg <- readOGR(something.shp) # code meanings: indexed by 'veg_code' codes <- read.dbf(table.dbf) # what is the best way to join up the attributes in 'veg' with the rows in 'codes' ?
Hi Dylan,
This is a different question, but I won't break it out of this thread
yet.
You are doing look-up on codes to give labels to veg$veg_code, right?
veg$veg_code are integer indices to codes$V1 (say V1, I don't know what
it is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of
the factor as your friend:
veg$veg_code_factor <- factor(veg$veg_code,
labels=as.character(codes$V1))
If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().
Something like:
set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)
No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength
of the S language.
Have I misunderstood you?
Roger
I think so. I was (trying to) describe the process of joining, either 1:1 or many:1, the att table associated with an sp object and some other data frame. merge() seems to work fine, but the order of the rows are different from the original data frame attached to the sp object.
Thanks for the detailed response Roger.
My experience is that merge() is often a false friend, because of object coercion and sorting, as you suggest. The rules for SpatialLinesDataFrame and SpatialPolygonsDataFrame are simple, and SpatialPointsDataFrame (and by extension SpatialPixelsDataFrame) can be made as simple. The first two constructor functions take a match.ID= argument, default TRUE, which matches the geometry ID - the ID slot in the component Lines or Polygons objects - to the data.frame row.name. For SpatialPointsDataFrame(), the same argument exists, but is only used if the coordinate matrix has row names to be matched to the data.frame row names. If it has been used, the data slot row names are the geometry IDs. From there, it gets harder. merge() takes lots of arguments, and a lot of trial and error is needed to reach a satisfactory result, that is a single data frame with row names matching the geometry IDs (not necessarily in the right order, but the same set of IDs).
My question was on the best way to 'update' the data frame attached to an sp object, based on the results from a merge() with some other data.
For one to one, there isn't really an alternative to extracting the data slot, do the merge, check the geometry IDs against the output data frame row names, and re-construct the object. That is, essentially where this thread began. If there is less data than geometries, merge() should fill out with NAs. If there is more data than geometries, the output data frame will need subsetting. Both match(), order(), and %in% are very handy here. For one geometry list to many - see reshape() first, for example to flatten a "tall" set of space/time observations so that the time values become new attribute columns. This could be met stations stacked by station and date, which need widening to stations by date*attributes If need be, the IDs of the geometries can be changed too, see ?"spChFIDs-methods" in maptools. Any closer?
I think that you have answered my questions and reaffirmed my thoughts on how to proceed. I am mostly interested in 1:1 and 1:many joins (geom:attr), and I think that the following approach should work in the general case: # read vector data: v <- readOGR(dsn='...', layer='...') # safer approach to 1:1 or 1:many (geom:atts) joins # add a sorting id for later use v at data$sorting_id <- 1:nrow(v at data) # make a copy of the original table orig.table <- v at data # make the table with 'joined' data new.table <- merge(x=orig.table, y=veg_codes, by.x='CODE', by.y='code') # re-order this table based on the original row order new.table.ordered <- new.table[order(new.table$sorting_id), ] # restore the origina row names row.names(new.table.ordered) <- row.names(orig.table) # replace the data table v at data <- new.table.ordered This should ensure that the merged rows are in the original order, and have the original row.names. Does that sound correct? Thanks, Dylan
Roger
does that help? cheers, Dylan
As of now we are using merge to replace the dataframe slot of the
original file. We first re-order the results from merge to match the
original row ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code
3','code 4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to
restore # the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
On Fri, 25 Jan 2008, Dylan Beaudette wrote:
On Friday 25 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
On Thursday 24 January 2008, Roger Bivand wrote:
On Thu, 24 Jan 2008, Dylan Beaudette wrote:
Hi Roger. Thanks for contributing some answers to this. I was recently working with a colleague on developing some sample exercises for new students. Since joining new attribute data to a GIS layer's table is a very common operation we included some samples on how to do this within R. You have hinted at some possible ways to do it above, but do you have a 'best practices' approach to doing this using 'sp' methods and objects? For example: # contains an attribute col named 'veg_code' veg <- readOGR(something.shp) # code meanings: indexed by 'veg_code' codes <- read.dbf(table.dbf) # what is the best way to join up the attributes in 'veg' with the rows in 'codes' ?
Hi Dylan,
This is a different question, but I won't break it out of this thread
yet.
You are doing look-up on codes to give labels to veg$veg_code, right?
veg$veg_code are integer indices to codes$V1 (say V1, I don't know what
it is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of
the factor as your friend:
veg$veg_code_factor <- factor(veg$veg_code,
labels=as.character(codes$V1))
If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().
Something like:
set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)
No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength
of the S language.
Have I misunderstood you?
Roger
I think so. I was (trying to) describe the process of joining, either 1:1 or many:1, the att table associated with an sp object and some other data frame. merge() seems to work fine, but the order of the rows are different from the original data frame attached to the sp object.
Thanks for the detailed response Roger.
My experience is that merge() is often a false friend, because of object coercion and sorting, as you suggest. The rules for SpatialLinesDataFrame and SpatialPolygonsDataFrame are simple, and SpatialPointsDataFrame (and by extension SpatialPixelsDataFrame) can be made as simple. The first two constructor functions take a match.ID= argument, default TRUE, which matches the geometry ID - the ID slot in the component Lines or Polygons objects - to the data.frame row.name. For SpatialPointsDataFrame(), the same argument exists, but is only used if the coordinate matrix has row names to be matched to the data.frame row names. If it has been used, the data slot row names are the geometry IDs. From there, it gets harder. merge() takes lots of arguments, and a lot of trial and error is needed to reach a satisfactory result, that is a single data frame with row names matching the geometry IDs (not necessarily in the right order, but the same set of IDs).
My question was on the best way to 'update' the data frame attached to an sp object, based on the results from a merge() with some other data.
For one to one, there isn't really an alternative to extracting the data slot, do the merge, check the geometry IDs against the output data frame row names, and re-construct the object. That is, essentially where this thread began. If there is less data than geometries, merge() should fill out with NAs. If there is more data than geometries, the output data frame will need subsetting. Both match(), order(), and %in% are very handy here. For one geometry list to many - see reshape() first, for example to flatten a "tall" set of space/time observations so that the time values become new attribute columns. This could be met stations stacked by station and date, which need widening to stations by date*attributes If need be, the IDs of the geometries can be changed too, see ?"spChFIDs-methods" in maptools. Any closer?
I think that you have answered my questions and reaffirmed my thoughts on how to proceed. I am mostly interested in 1:1 and 1:many joins (geom:attr), and I think that the following approach should work in the general case: # read vector data: v <- readOGR(dsn='...', layer='...') # safer approach to 1:1 or 1:many (geom:atts) joins # add a sorting id for later use v at data$sorting_id <- 1:nrow(v at data)
v$sorting_id <- 1:nrow(as(v, "data.frame")) is cleaner, there is absolutely no need to access the slot directly with the @ operator. Almost all R functions do lazy evaluation, so do not create new objects unless needed. If you are using SpatialLines* or SpatialPolygons*, this is better, because it uses the geometry IDs (for v SpatialPolygons*): v$sorting_id <- sapply(slot(v, "polygons"), function(x) slot(x, "ID"))
# make a copy of the original table orig.table <- v at data
orig.table <- as(v, "data.frame")
# make the table with 'joined' data new.table <- merge(x=orig.table, y=veg_codes, by.x='CODE', by.y='code')
maybe sort=FALSE, you can also try by.x=0, which are the geometry IDs
# re-order this table based on the original row order new.table.ordered <- new.table[order(new.table$sorting_id), ] # restore the origina row names row.names(new.table.ordered) <- row.names(orig.table)
First check the identity of the two, recalling that character strings may be cast to factors when constructing data frames.
# replace the data table v at data <- new.table.ordered
Rather not, make a v1 and leave the input v in peace. Reading history files a couple of days later where objects get overwritten is ten times harder than creating new objects and if need be deleting old ones. I find that copying and pasting out-of-order history files can lead to lots of confusion if object names are overwritten.
This should ensure that the merged rows are in the original order, and have the original row.names.
Use of all.equal() and identical() on the IDs, row names, etc., is still very comforting. Think locales, and you can see what might creep in when swapping data with others - just checking takes little time compared to recovering from an overconfident assumption that the data and the geometries are correctly attached. It can be done through databases too, the aRT R/Terralib interface offers stronger control, and similar effects can be had with OGR/PostGIS. Best wishes, Roger
Does that sound correct? Thanks, Dylan
Roger
does that help? cheers, Dylan
As of now we are using merge to replace the dataframe slot of the
original file. We first re-order the results from merge to match the
original row ordering:
# an example file:
veg <- readOGR(dsn='ArcGISLabData/BrownsPond/', layer='vegevector')
# some example codes
veg_codes <- data.frame(code=1:4, meaning=c('code 1','code 2','code
3','code 4'))
# join the original data table with the veg codes table
combined <- merge(x=veg at data, y=veg_codes, by.x='CODE', by.y='code')
# overwrite the original data frame with the combined version
# note that the original order needs to be restored
# since the original data was sorted on 'ID', we can use that to
restore # the correct order in the 'combined' dataframe:
v at data <- combined[order(combined$ID),]
In summary, is there a safer or preferred way to do this?
thanks,
Dylan
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no