All - I am slowly learning more about spatial data in R. However, I am still a relative neophyte.

What I want to do: I have two shapefiles. shpA has ~401,000 individual polygons with attributes. shpB is a subset of those polygons with different attribute data. Even though shpB is a subset of those data, there may be multiple rows for a given polygon, thus giving shpB more total rows (~780,000).

Effectively, I want to merge these two shapefiles. With two data.frame objects in R, I would merge them like merge(shpA, shpB, by = "APN_LABEL", all = TRUE) but apparently, this doesn't work with shapefiles. I have tried merge(shpA@data, shpB@data, by = "APN_LABEL", all = TRUE) which creates a data.frame of the two files but drops all of the spatial geometries.

I've looked into gUnion() as it seems like that may be what I'm looking for, but I get the following error:

tmp <- gUnion(shpA, shpB)
Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, "rgeos_union") : std::bad_alloc

Ultimately, I want a shapefile of all ~401,000 geometries in shpA that includes ALL of the attribute data from shpB that may exist in multiple rows for a given polygon. Is this possible? Is this simple?

Steven H. Ranney
"Merge" shapefiles
4 messages · Steven Ranney, Tyler Frazier, Roger Bivand +1 more
Not to belittle the spatial capabilities of R, but this sounds like a task that would be better addressed with PostgreSQL/PostGIS. Integrating R and pgsql can be a good combination.
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
On Fri, 14 Nov 2014, Tyler Frazier wrote:
Not to belittle the spatial capabilities of R, but this sounds like a function that would be better addressed with PostgreSQL/postgis. Integrating r & pgsql can be a good combination.
Maybe, but clarity of thinking is perhaps what is needed, it always helps more than guesswork. If you already know PostGIS, you'd also need clarity of thinking, and the steps would be very similar, although with the possibility to link identical objects.
On Nov 14, 2014, at 1:11 AM, Steven Ranney <steven.ranney at gmail.com> wrote: All - I am slowly learning more about spatial data in R. However, I am still a relative neophyte. What I want to do: I have two shapefiles. shpA has ~401,000 individual polygons with attributes. shpB is a subset of those polygons with different attribute data. Even though shpB is a subset of those data, there may be multiple rows for a given polygon, thus giving shpB more total rows (~780,000).
You must decide what you want to do in detail, for instance whether these representations make any sense. You do not provide a motivation or an affiliation, which makes it hard to guess your application domain (ecology, real estate, whatever). You have ~401,000 individual polygons with IDs and some data; are they unique? Do they overlap? Are they home ranges (which may overlap) or census blocks (which shouldn't)? Then you have extra data that happens to be in a messy shapefile with repeated geometries, all of which match some of those in the first data set (it never needed to be a shapefile, and probably never should have been). Can you match them by ID (match() is much stronger than merge(), because it shows you what is matching)? Note that you expect to get >=0 matches on each geometry from the first object. You need to control what is going on, because the maximum number of matches will determine the number of columns in the output (with lots of missing values where there are fewer than this). Are the repeat geometries there because the repeats are at different times? Should you be trying to construct an appropriate space-time object if this is the case?
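The ID checks suggested here can be sketched in base R on plain data frames. This is only an illustration: the tables `a` and `b` and their columns `area` and `owner` are hypothetical stand-ins for the attribute tables of shpA and shpB, and only the key name APN_LABEL comes from the original question.

```r
# Hypothetical stand-ins for the attribute tables of shpA and shpB
a <- data.frame(APN_LABEL = c("p1", "p2", "p3", "p4"),
                area = c(10, 20, 30, 40),
                stringsAsFactors = FALSE)
b <- data.frame(APN_LABEL = c("p2", "p2", "p4", "p2"),
                owner = c("x", "y", "z", "w"),
                stringsAsFactors = FALSE)

# Are the IDs in the first table unique?
ids_unique <- !any(duplicated(a$APN_LABEL))

# For each row of b, the row of a it belongs to (NA would mean no such polygon)
idx <- match(b$APN_LABEL, a$APN_LABEL)

# Number of rows of b that match each row of a (zero or more)
n_matches <- tabulate(idx, nbins = nrow(a))
names(n_matches) <- a$APN_LABEL

# The largest count is how many repeated column sets a wide table would need
max_matches <- max(n_matches)
```

Inspecting `idx` and `n_matches` before any merge shows which polygons have no extra records, which have several, and whether any record in the second table points at a polygon that does not exist.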
Effectively, I want to merge these two shapefiles. With two data.frame objects in R, I would merge them like merge(shpA, shpB, by = "APN_LABEL", all = TRUE) but apparently, this doesn't work with shapefiles. I have tried merge(shpA@data, shpB@data, by = "APN_LABEL", all = TRUE) which creates a data.frame of the two files but drops all of the spatial geometries.
Yes, of course, what did you expect? The only references available say that there is no merge method for Spatial* objects, and you are anyway taking their data slots, which are data frames. If the output object has the same number of rows as shpA, and its row.names() match those of shpA, you may have what you want (create a new SPDF object with the SpatialPolygons from shpA, and the output from merge as its data slot), but beware of merge() re-ordering rows. This is, however, dependent on prior checking for consistency in the IDs.
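The warning about merge() re-ordering rows can be illustrated with a small base R sketch. It assumes the simple case in which the second table has at most one row per key; the tables `a` and `b` and their columns are hypothetical, and only the key name APN_LABEL comes from the original question.

```r
# Hypothetical attribute tables; a is deliberately not sorted by key
a <- data.frame(APN_LABEL = c("p3", "p1", "p2"),
                area = c(30, 10, 20),
                stringsAsFactors = FALSE)
b <- data.frame(APN_LABEL = c("p1", "p3"),
                owner = c("x", "z"),
                stringsAsFactors = FALSE)

# merge() sorts the result on the key by default, so rows come back as p1, p2, p3
m <- merge(a, b, by = "APN_LABEL", all.x = TRUE)

# Put the rows back in the order of a and restore its row names
m <- m[match(a$APN_LABEL, m$APN_LABEL), ]
row.names(m) <- row.names(a)

# Checks to run before attaching m to any geometries
stopifnot(nrow(m) == nrow(a), identical(m$APN_LABEL, a$APN_LABEL))
```

Only once these checks pass does it make sense to use a table like `m` as the data slot of a new SpatialPolygonsDataFrame built on the polygons of shpA. When the second table has several rows per key, the merged table has more rows than the first one, and this simple re-ordering no longer applies.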
I've looked into gUnion() as it seems like that may be what I'm looking for, but I get the following error:
Just fishing without understanding is always pretty hopeless. Why would you expect that a function that is declared to only handle geometries could sort out your data cleaning problem?
tmp <- gUnion(shpA, shpB)
Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td, "rgeos_union") : std::bad_alloc

Ultimately, I want a shapefile of all ~401,000 geometries in shpA that includes ALL of the attribute data from shpB that may exist in multiple rows for a given polygon.
Yes, but you need to think first; I'm not even sure why these polygons might be meaningful anyway - you didn't say. Guessing by function name really doesn't help. Did reading the "combine_maptools" vignette help? http://cran.r-project.org/web/packages/maptools/vignettes/combine_maptools.pdf Hope this clarifies, Roger
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 91 00 e-mail: Roger.Bivand at nhh.no
Steven,
Please do provide a self-contained example when you ask a question
(see my example below) and try to use correct terms. You are dealing
with SpatialPolygonsDataFrame objects (shpA, shpB). These objects are
perhaps derived from shapefiles, but that is not what they are. Also,
providing a motivation is very important, particularly if you are a
self-proclaimed newbie. I think that merge should be able to deal with
this case, if you really want it, through an argument in the function
that will give you an error for the default value, because in most
practical cases this truly reveals an error.
Here is an example of a route you might take. Please carefully check
if it does what you need (I did not do that).
Robert
library(raster)
# Get a SpatialPolygonsDataFrame
p <- shapefile(system.file("external/lux.shp", package="raster"))
# make a subset
s <- p[1:3, ]
# merge (combines a Spatial* object with a data.frame; ID_2 is a unique ID)
m <- merge(p, data.frame(s), all=TRUE, by='ID_2')
# makes a subset with multiple instances of the same polygon
ss <- bind(s, s[1:2,])
# add something unique to each record
ss$newvar <- 1:nrow(ss)
# merge fails now
# you should have shown us something like this. There is very little
# value in talking about shpA and shpB as we do not have (or
# want) access to the data
# (wrapped in try() so that the rest of the script still runs)
try(m <- merge(p, data.frame(ss), all=TRUE, by='ID_2'))
# this is the error.
# Error in .local(x, y, ...) :
# 'y' has multiple records for one or more 'by.y' key(s)
# so let's merge the data.frames (using two Spatial Objects)
d <- merge(p, ss, all=TRUE, by='ID_2')
# link table d to the SpatialPolygons object with all records, p
i <- match(d$ID_2, p$ID_2)
# get the polygons we need
x <- p[i, ]
# link the polygons to the merged table
y <- SpatialPolygonsDataFrame(as(x, 'SpatialPolygons'), d, match.ID=FALSE)
# inspect
p
y
data.frame(p)
data.frame(ss)
data.frame(y)