merging data with SpatialPolygonsDataFrame

Fri, Jan 25, 2008 8:57 AM

On Thu, 24 Jan 2008, Dylan Beaudette wrote:

On Thursday 24 January 2008, Roger Bivand wrote:

On Thu, 24 Jan 2008, Dylan Beaudette wrote:

Hi Roger. Thanks for contributing some answers to this.

I was recently working with a colleague on developing some sample
exercises for new students. Since joining new attribute data to a GIS
layer's table is a very common operation we included some samples on how
to do this within R. You have hinted at some possible ways to do it
above, but do you have a 'best practices' approach to doing this using
'sp' methods and objects?

For example:

# contains an attribute col named 'veg_code'
veg <- readOGR(something.shp)

# code meanings: indexed by 'veg_code'
codes <- read.dbf(table.dbf)

# what is the best way to join up the attributes in 'veg' with the rows
in 'codes' ?

Hi Dylan,

This is a different question, but I won't break it out of this thread yet.

You are doing look-up on codes to give labels to veg$veg_code, right?

veg$veg_code are integer indices to codes$V1 (say V1, I don't know what it
is). If length(unique(veg$veg_code)) == length(codes$V1), and
sort(unique(veg$veg_code)) is 1:length(codes$V1), you should think of the
factor as your friend:

veg$veg_code_factor <- factor(veg$veg_code, labels=as.character(codes$V1))

If not, you need another layer using perhaps order() or match() on the
matching substring of codes$V1 to find out which value of veg$veg_code
should have which label in as.character(codes$V1). Alternatively use the
levels= argument to factor().

Something like:

set.seed(1)
veg_code <- rpois(100, 4)
table(veg_code)
V1 <- paste("code", 0:10)
V1
levs <- 0:10
veg_code_factor <- factor(veg_code, levels=levs, labels=V1)
table(veg_code_factor, veg_code)

No merging or messing with veg itself is needed, apart from adding a
single extra factor column. The factor abstraction is a great strength of
the S language.

Have I misunderstood you?

Roger

I think so.

I was (trying to) describe the process of joining, either 1:1 or many:1, the
att table associated with an sp object and some other data frame.

merge() seems to work fine, but the order of the rows are different from the
original data frame attached to the sp object.

My experience is that merge() is often a false friend, because of object 
coercion and sorting, as you suggest. The rules for SpatialLinesDataFrame 
and SpatialPolygonsDataFrame are simple, and SpatialPointsDataFrame (and 
by extension SpatialPixelsDataFrame) can be made as simple. The first two
constructor functions take a match.ID= argument, default TRUE, which 
matches the geometry ID - the ID slot in the component Lines or Polygons 
objects - to the data.frame row.name. For SpatialPointsDataFrame(), the 
same argument exists, but is only used if the coordinate matrix has row 
names to be matched to the data.frame row names. If it has been used, the 
data slot row names are the geometry IDs.

trial and error is needed to reach a satisfactory result, that is a single 
data frame with row names matching the geometry IDs (not necessarily in 
the right order, but the same set of IDs).

For one to one, there isn't really an alternative to extracting the data 
slot, do the merge, check the geometry IDs against the output data frame 
row names, and re-construct the object. That is, essentially where this 
thread began.

If there is less data than geometries, merge() should fill out with NAs. 
If there is more data than geometries, the output data frame will need 
subsetting. Both match(), order(), and %in% are very handy here.

For one geometry list to many - see reshape() first, for example to 
flatten a "tall" set of space/time observations so that the time values 
become new attribute columns. This could be met stations stacked by 
station and date, which need widening to stations by date*attributes

If need be, the IDs of the geometries can be changed too, see 
?"spChFIDs-methods" in maptools.

Any closer?

Roger

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

merging data with SpatialPolygonsDataFrame

Thread (8 messages)