All,
I'm working with the 2010 TIGER/Line shapefile road data, and I'm
dealing with alternative road names for the same road segment. There are
two relevant files in the TIGER collection. The first is the road edge
shapefile, which has a unique record (and geometry) for each road
segment. The other is a featnames .dbf file, which can have multiple
records for the same road segment (each carrying an ID that identifies
the road segment). One of the reasons this second file was created was
to deal with situations where a portion of a road is known by two or
more different names (for example, Hwy 50 and Main Street). My goal is
to create a SpatialLinesDataFrame object that contains the unique road
segment / road name combinations, which will result in a set of line
geometries that are not unique. I've looked at the spCbind methods, but
my reading of the documentation suggests they will not handle this case
directly, because the feature IDs would not be unique.
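The attribute side of this one-to-many relationship can be illustrated with base R's merge(). The two tiny tables are hypothetical stand-ins for the road edge attributes and the featnames table (the MTFCC and FULLNAME columns and all values are assumptions for illustration; only TLID comes from the original). A segment carrying two names comes out of the merge twice, which is why the matching geometries cannot keep unique feature IDs.

```r
# Hypothetical stand-ins for the TIGER/Line tables (values are made up):
# `edges` has one row per road segment, `featnames` one row per name.
edges <- data.frame(TLID = c(101, 102),
                    MTFCC = c("S1200", "S1400"))
featnames <- data.frame(TLID = c(101, 101, 102),
                        FULLNAME = c("Hwy 50", "Main St", "Oak Ave"))

# One-to-many merge on the segment ID: one output row per
# segment / name pair, so TLID 101 appears twice.
roadsf <- merge(edges, featnames, by = "TLID")
print(roadsf)
```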
I can create a new SpatialLinesDataFrame that has a row for each
possible unique road segment and road name combination, and I can then
use spCbind to attach the needed attribute information to this object.
Unfortunately, the way I can think of to create the SpatialLinesDataFrame
object is a great example of what *not* to do in S language programming:
using a for-loop to "grow" a data-frame-like object with rbind. Below is
a snippet of my present code:
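library(sp)
library(rgdal)     # readOGR()
library(maptools)  # spChFIDs(), spCbind()
geom1 <- readOGR(dsn = roads_dsn, layer = roads_layer)
geom1 <- geom1[, "TLID"]  # keep only the segment ID attribute
# Let the nastiness begin. Time to build up the needed geometries:
# start with the geometry for the first road segment / name pair ...
geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[1], ], "1")
# ... then append one copy per remaining pair, each with a new feature ID
for (i in 2:nrow(roadsf)) {
  new_geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[i], ],
                       as.character(i))
  geom <- rbind(geom, new_geom)  # copies all of geom on every pass
}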
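One possible way to avoid the loop, sketched under assumptions and not tested against real TIGER/Line data: look up every roadsf row's geometry at once with match(), copy the corresponding Lines objects out of geom1@lines (a plain list happily holds duplicates), give each copy a unique ID with Map(), and then build the SpatialLines and SpatialLinesDataFrame objects in a single call each. The tiny geom1 and roadsf objects, the helper mk_line(), and the FULLNAME column are hypothetical stand-ins so the sketch runs on its own; it relies only on the sp constructors Line(), Lines(), SpatialLines() and SpatialLinesDataFrame().

```r
library(sp)

# Hypothetical toy stand-ins for geom1 (one geometry per TLID) and
# roadsf (one row per TLID / road name pair).
mk_line <- function(x, id) Lines(list(Line(cbind(x, c(0, 1)))), ID = id)
geom1 <- SpatialLinesDataFrame(
  SpatialLines(list(mk_line(c(0, 1), "a"), mk_line(c(2, 3), "b"))),
  data = data.frame(TLID = c(101, 102), row.names = c("a", "b")))
roadsf <- data.frame(TLID = c(101, 101, 102),
                     FULLNAME = c("Hwy 50", "Main St", "Oak Ave"))

# 1. For every row of roadsf, find the index of its geometry in geom1.
idx <- match(roadsf$TLID, geom1$TLID)
stopifnot(!anyNA(idx))

# 2. Copy the Lines objects (duplicates are fine in a plain list) and
#    give each copy a new, unique ID in one pass instead of a loop.
ids <- as.character(seq_len(nrow(roadsf)))
lns <- Map(function(l, id) { l@ID <- id; l }, geom1@lines[idx], ids)

# 3. Build SpatialLines once, then attach the attributes; the data rows
#    are matched to the Lines IDs through their row names.
row.names(roadsf) <- ids
geom <- SpatialLinesDataFrame(
  SpatialLines(lns, proj4string = geom1@proj4string),
  data = roadsf)
```

With the real data, dropping the toy geom1 and roadsf definitions and keeping steps 1 to 3 should give one feature per road segment / name pair. Tying rows to geometries through row names relies on SpatialLinesDataFrame() matching them to the Lines IDs by default (match.ID = TRUE).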
In the code, the data frame "roadsf" contains the attribute data for
each unique road segment / road name pair, and the variable TLID is the
unique ID for the road segment geometries. This works, but it is really
slow (it is CPU-bound, not I/O-bound). My question is: have I missed
an easier solution? If I haven't, how could I go about doing this more
cleverly? Given the need to alter the feature IDs, I don't see a nice
way to use one of the apply family of functions. Alternatively, is this
just something that is better suited to a spatial database tool like
PostGIS or SpatiaLite? I would guess the query looks something like:
SELECT a.*, b.the_geom FROM table1 AS a JOIN table2 AS b ON a.id = b.id;
Dan
A one to many merge involving a spatial data frame object
On 08/09/2011 08:23 PM, Dan Putler wrote:
> [...]
> for(i in 2:nrow(roadsf)) {
> new_geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[i],],
> as.character(i))
> geom <- rbind(geom, new_geom)
In general, concatenating objects this way in R is quite slow (assuming
that nrow(roadsf) is quite big). The object geom keeps growing, so on
every iteration R has to allocate a new, larger block of memory and copy
the existing contents into it. This is very slow and should be avoided.
Pre-allocating the space needed for geom could speed this code up quite
a bit. Alternatively, I often use functions from the plyr package to
circumvent this problem. However, without a working example from your
side I cannot provide sample code that uses the plyr package.

regards,
Paul
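The pre-allocation idea can be sketched in plain base R. The functions grow() and prealloc() and the toy columns i and sq are hypothetical; the point is only the contrast between appending with rbind() inside the loop and filling a pre-allocated list that is combined once at the end. Timings depend on the machine.

```r
# Growing: every rbind() call copies all rows accumulated so far,
# so the total work grows roughly quadratically with n.
grow <- function(n) {
  out <- data.frame(i = 1L, sq = 1L)
  for (i in 2:n) out <- rbind(out, data.frame(i = i, sq = i * i))
  out
}

# Pre-allocating: reserve one list slot per piece, fill the slots in
# the loop, and combine everything with a single rbind() at the end.
prealloc <- function(n) {
  pieces <- vector("list", n)
  for (i in seq_len(n)) pieces[[i]] <- data.frame(i = i, sq = i * i)
  do.call(rbind, pieces)
}

n <- 1000
print(system.time(a <- grow(n)))
print(system.time(b <- prealloc(n)))
```

The same pattern carries over to the spatial case: collect the pieces (for example Lines objects) in a pre-allocated list and build the Spatial* object once, rather than calling rbind() on it n times.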
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770