"Merge" shapefiles

4 messages · Steven Ranney, Tyler Frazier, Roger Bivand +1 more

#
All -

I am slowly learning more about spatial data in R.  However, I am still a
relative neophyte.

What I want to do:

I have two shapefiles, shpA has ~401,000 individual polygons with
attributes.  shpB is a subset of those polygons with different attribute
data.  Even though shpB is a subset of those data, there may be multiple
rows for a given polygon, thus giving shpB more total rows (~780,000).

Effectively, I want to merge these two shapefiles.  With two dataFrame
objects in R, I would merge them like

merge(shpA, shpB, by = "APN_LABEL", all = TRUE)

but apparently, this doesn't work with shapefiles.  I have tried

merge(shpA@data, shpB@data, by = "APN_LABEL", all = TRUE)

which creates a dataFrame of the two files but drops all of the spatial
geometries.

I've looked into gUnion() as it seems like that may be what I'm looking
for, but I get the following error:

tmp <- gUnion(shpA, shpB)
Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td,
"rgeos_union") :
  std::bad_alloc

Ultimately, I want a shapeFile of all ~401,000 geometries in shpA that
includes ALL of the attribute data from shpB that may exist in multiple
rows for a given polygon.

Is this possible?  Is this simple?

Steven H. Ranney
#
Not to belittle the spatial capabilities of R, but this sounds like a task that would be better handled with PostgreSQL/PostGIS.  Integrating R and PostgreSQL can be a good combination.

Sent from my iPhone
#
On Fri, 14 Nov 2014, Tyler Frazier wrote:

            
Maybe, but clarity of thinking is perhaps what is needed; it always helps 
more than guesswork. If you already know PostGIS, you'd also need clarity 
of thinking, and the steps would be very similar, although with the 
possibility to link identical objects.

You must decide what you want to do in detail, for instance whether these 
representations make any sense. You do not provide a motivation or an 
affiliation, which makes it hard to guess your application domain (ecology, 
real estate, whatever).

You have ~401,000 individual polygons with IDs and some data, are they 
unique? Do they overlap? Are they home ranges (which may overlap), census 
blocks (which shouldn't)?

Then you have extra data that happens to be in a messy shapefile with 
repeated geometries, all of which match some of those in the first 
data set (it never needed to be a shapefile, and probably never should 
have been). Can you match them by ID (match() is much stronger than 
merge(), because it shows you what is matching)?
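
As a minimal sketch of that check, assuming two small stand-in attribute tables (a, b and their APN_LABEL values are hypothetical, not the poster's data), match() gives for each row of b the position of its ID in a, or NA when there is none:

```r
# Hypothetical stand-ins for the attribute tables of shpA and shpB
a <- data.frame(APN_LABEL = c("p1", "p2", "p3", "p4"),
                area = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)
b <- data.frame(APN_LABEL = c("p2", "p2", "p4", "p9"),
                value = c(1.5, 2.5, 3.5, 4.5),
                stringsAsFactors = FALSE)

# For each row of b, the row of a holding the same ID (NA if absent)
idx <- match(b$APN_LABEL, a$APN_LABEL)

# IDs in b that have no counterpart in a
unmatched <- b$APN_LABEL[is.na(idx)]

# IDs in a that never occur in b
unused <- setdiff(a$APN_LABEL, b$APN_LABEL)
```

Inspecting idx, unmatched and unused before any merge shows which records link up and which do not, which merge() alone would hide.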

Note that you expect to get >=0 matches on each geometry from the first 
object, you need to control what is going on, because the maximum number 
of matches will determine the number of columns in the output (with lots 
of missing values where there are fewer than this). Are the repeat 
geometries there because the repeats are at different times? Should you be 
trying to construct an appropriate space-time object if this is the case?

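One way to see how many matches each geometry gets, sketched here with hypothetical stand-in tables (a, b and the IDs are made up for illustration), is to tabulate the IDs of the second table against the full set of IDs in the first:

```r
# Hypothetical stand-ins: a has one row per polygon, b may repeat IDs
a <- data.frame(APN_LABEL = c("p1", "p2", "p3", "p4"),
                stringsAsFactors = FALSE)
b <- data.frame(APN_LABEL = c("p2", "p2", "p2", "p4", "p4", "p3"),
                stringsAsFactors = FALSE)

# Count rows of b per ID of a; using a's IDs as the factor levels
# keeps the polygons with zero matches in the table
n_per_id <- table(factor(b$APN_LABEL, levels = a$APN_LABEL))

# Largest number of repeats for any single polygon
max_matches <- max(n_per_id)

# Polygons with no extra data at all
no_match <- names(n_per_id)[n_per_id == 0]
```

Here max_matches is the number of value columns a one-row-per-polygon layout would need, and no_match lists the polygons whose new columns would be entirely missing.
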
Yes, of course, what did you expect? The only references available say 
that there is no merge method for Spatial* objects, and you are anyway 
taking their data slots, which are data frames. If the output object has 
the same number of rows as shpA, and its row.names() matches that of shpA, 
you may have what you want (create a new SPDF object with the 
SpatialPolygons from shpA, and the output from merge as its data slot), 
but beware of merge() re-ordering rows. This is, however, dependent on 
prior checking for consistency in the IDs.

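The re-ordering can be illustrated with plain data frames. This is only a sketch, assuming the extra table has at most one row per ID (a, extra and their values are hypothetical): merge() sorts its output on the key by default, and match() can put the rows back in the order of the first table.

```r
# Hypothetical stand-in for shpA's data slot, deliberately not sorted by ID
a <- data.frame(APN_LABEL = c("p3", "p1", "p2"),
                area = c(30, 10, 20),
                stringsAsFactors = FALSE)
# Hypothetical extra attributes, at most one row per ID
extra <- data.frame(APN_LABEL = c("p1", "p3"),
                    value = c(1.5, 3.5),
                    stringsAsFactors = FALSE)

# Keep every row of a; merge() returns the rows sorted on APN_LABEL
m <- merge(a, extra, by = "APN_LABEL", all.x = TRUE)

# Restore the row order of a, then carry over its row names
m <- m[match(a$APN_LABEL, m$APN_LABEL), ]
row.names(m) <- row.names(a)
```

After this step m has the same number of rows, the same order and the same row names as a, which is the condition described above for using it as a replacement data slot.
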
Just fishing without understanding is always pretty hopeless. Why would 
you expect that a function that is declared to only handle geometries 
could sort out your data cleaning problem?

Yes, but you need to think first; I'm not even sure why these polygons 
might be meaningful anyway - you didn't say. Guessing by function name 
really doesn't help. Did reading the "combine_maptools" vignette help?

http://cran.r-project.org/web/packages/maptools/vignettes/combine_maptools.pdf

Hope this clarifies,

Roger

  
    
#
Steven,

Please do provide a self-contained example when you ask a question
(see my example below) and try to use correct terms. You are dealing
with SpatialPolygonsDataFrame objects (shpA, shpB); these objects are
perhaps derived from shapefiles, but that is not what they are.  Also,
providing a motivation is very important, particularly if you are a
self-proclaimed newbie. I think that merge should be able to deal with
this case, if you really want it, through an argument to the function
that gives you an error at its default value, because in most
practical cases this truly reveals an error.

Here is an example of a route you might take. Please carefully check
if it does what you need (I did not do that).

Robert


library(raster)

# Get a SpatialPolygonsDataFrame
p <- shapefile(system.file("external/lux.shp", package="raster"))

# make a subset
s <- p[1:3, ]

# merge (combines a Spatial* object with a data.frame). ID_2 is a unique ID
m <- merge(p, data.frame(s), all=TRUE, by='ID_2')

# makes a subset with multiple instances of the same polygon
ss <- bind(s, s[1:2,])
# add something unique to each record
ss$newvar <- 1:nrow(ss)

# merge fails now
# you should have shown us something like this. There is very little
# value in talking about shpA and shpB as we do not have (or
# want) access to the data

m <- merge(p, data.frame(ss), all=TRUE, by='ID_2')

# this is the error.
# Error in .local(x, y, ...) :
#  'y' has multiple records for one or more 'by.y' key(s)


# so let's merge the data.frames (using two Spatial Objects)
d <- merge(p, ss, all=TRUE, by='ID_2')

# link table d to SpatialPolygons object with all records, p
i <- match(d$ID_2, p$ID_2)

# get the polygons we need
x <- p[i, ]

# link the polygons to the merged table
y <- SpatialPolygonsDataFrame(as(x, 'SpatialPolygons'), d, match.ID=FALSE)

# inspect
p
y
data.frame(p)
data.frame(ss)
data.frame(y)
On Fri, Nov 14, 2014 at 2:23 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote: