On Sat, 30 Apr 2011, Dylan Beaudette wrote:
On Friday, April 29, 2011, Ned Horning wrote:
Hi - I am trying to select 1000 random samples from each set of ESRI Shapefile polygons with the same attribute value (attName). This has worked well in the past with other shapefiles. The file I'm using now has 69 relatively simple polygons with 9 unique values for the attribute I'm using to group the polygons. When I run spsample it takes several minutes to collect the samples for some of the unique attribute values and it is using well over 12GB of memory. By the time it start collecting samples from the last set of polygons my swap space is nearly exhausted and the spsample process goes on for hours until I kill it. Do these symptoms sound familiar to anyone? Any ideas about what the problem might be? I pasted my code below. Thanks in advance for any help. Ned
Hi, Not sure if this is the problem, but it could be that dynamically appending to the object 'xy' is causing some of the memory consumption / slowness. I would suggest iterating over collections of polygons, and saving the results to a pre-allocated list object. Then, rbind the elements of the list back together at the end.
Hi,
The polygons are very small, spread out over a very large area, so
spsample tries to generate many points for the bounding box of the object,
rejecting most of them. As Dylan suggests, you can get a long way by
stepping only to the polygons you need:
for (x in 1:length(uniqueAtt)) {
class_data<- vec[vec[[attName]]==uniqueAtt[x],]
areas <- sapply(slot(class_data, "polygons"), slot, "area")
nsamps <- ceiling(numsamps*(areas/sum(areas)))
for (i in 1:dim(class_data)[1]) {
XY <- spsample(class_data[i,], type="random", n=nsamps[i])
# step Polygons object by Polygons object, dividing the numsamps by area
if (i == 1) cpts <- XY
else cpts <- rbind(cpts, XY)
}
# maybe need to modify the number of points to match numsamps exactly.
classpts <- cpts
if (x == 1) {
xy<- classpts
} else {
xy<- rbind(xy, classpts)
}
}
The spsample() method for SpatialPolygons does really expect them to be
close to each other, and to fill the object bounding box at least in some
large part.
Hope this helps,
Roger
Cheers, Dylan
--
shapefile <- '/home/nedhorning/AMNH/Galapagos/quickbird_train_4.shp'
numsamps <- 1000
attName <- 'type_id'
vec <- readShapePoly(shapefile)
#
# Create vector of unique land cover attribute values
allAtt <- slot(vec, "data")
tabAtt <-table(allAtt[[attName]])
uniqueAtt <-as.numeric(names(tabAtt))
# Create input data from a Shapefile with training data and the image
for (x in 1:length(uniqueAtt)) {
# Create a vector of all date for attName=x
class_data <- vec[vec[[attName]]==uniqueAtt[x],]
# Get random points for the class
classpts <- spsample(class_data, type="random", n=numsamps)
#Append data for all but the first run
if (x == 1) {
xy <- classpts
} else {
xy <- rbind(xy, classpts)
}
}
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no