Hi Robert,
My apologies for an even longer delay this time around. I have gotten
back to this now and followed your suggestions. Responses are
interspersed in (shortened) text of previous messages.
That is *horrible*. I am not sure what is going on here. This is my
rather empirical test to see if you have enough RAM for a given
computation (and to work with chunks if not). Although it is somewhat
inefficient, I have never seen anything crazy like this. I wonder if
you have insufficient RAM and that virtual RAM is created on-disk as
the object grows. I may lower the default maxmemory setting because of
this. You can also set it with setOptions. You can also do
setOptions(todisk=TRUE) ?before using this function (and set it back
afterwards, to force the function to use chunks (canProcessInMemory
will return FALSE before doing the memory consuming test).
I used the setOptions function as you have suggested by embedding it
in my function (full code pasted at the foot of this email):
# Force to write to disk
if(raster:::.toDisk() != TRUE) {
setOptions(todisk = TRUE)
cat("We don't want memory problems--forcing write to disk.\n")
}
With this option, my code completed in 33 minutes (with the largest
raster being ~1 GB). I also reran my code with toDisk = FALSE, and
watched R memory usage via top. As you suspected, memory was
constantly increasing, both actual and virtual. I believe I had enough
RAM available at the time (~2 GB), so there seems to be some problem
there, but perhaps this is an issue for R-Sig-Mac?
all raster functions that start with an "." are hidden but are
accessible via raster:::
e.g.
raster:::.maxmemory
Thanks, I guess I should have already known that...
In some functions that is true, but not in this one. Even if you
supply a filename, the whole thing will be done in memory if possible,
and the only at the end will the resulting raster data be written to
disk. Perhaps it would be safer if it would always work as you
expected.
It's easy enough for me to write the toDIsk = TRUE option into
functions, so thanks again for this tip.
Lastly, there is one small thing that I noticed while working on this
issue, based on the function that follows. I noticed that in a case
such as this:
cat("Substituting...")
subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
out.nm, datatype = type, overwrite = TRUE, progress = "text")
That the progress bar and cat don't play together, in that cat is not
displayed. I am not sure why, but perhaps it is simply being
overwritten in the console by the progress bar? I opted to turn off
the progress bar in favor of the messages produced by cat in this
case. This isn't something I am terribly concerned with, but I thought
I would mention it in case it is of interest.
Thanks again for all your help.
Cheers, Lyndon
spatCSM <- function(spatialmaster, resamplegrid, res.factor, CSMtable,
LUfield, outfields, outnames, type) {
# Creates spatial output grids from results produced by runCSM
function. Path needs to be set in advance
# Args:
# spatialmaster: A grid defining the location of each spatial units
# resamplegrid: An optional grid defining the resolution to which
results should be resampled
# resamplefactor: Factor by which to aggregate (e.g. 10 times
current pixel size, the default if not
# specified)
# CSMtable: The output table of CSM statistics generated by runCSM
# LUfield: The field in CSMtable containing the spatial unit codes
that match values in spatialmaster grid
# outfields: A vector of column names in CSMtable for which gridded
outputs are wanted, e.g. yield, CV yield
# outnames: A vector of names for writing output grids, one for each
specifed grid
# type: Output datatype for grid, e.g. INT2S
# Returns: R format raster grids, saved to disk, of mean yield and
other optional grids
library(raster)
if(missing(spatialmaster) | missing(CSMtable) | missing(LUfield) |
missing(outfields) | missing(type)) {
stop("Missing parameter, check function list.")
}
if(length(outfields) != length(outnames)) {
stop("Number of names for output grids does not match number of
specified output grids")
}
cat("Running...")
# Force to write to disk
if(raster:::.toDisk() != TRUE) {
setOptions(todisk = TRUE)
cat("We don't want memory problems--forcing write to disk.\n")
}
for(i in 1:length(outfields)) {
out.nm <- paste(outnames[i], ".grd", sep = "")
cat("Substituting...")
subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
out.nm, datatype = type, overwrite = TRUE)#,
progress = "text")
if(!missing(resamplegrid)) {
cat("Aggregating...")
if(missing(res.factor)) {
res.factor <- 10
}
subs.g <- raster(out.nm)
out.nm2 <- paste("agg.", out.nm, sep = "")
aggregate(subs.g, res.factor, fun = mean, na.rm = TRUE,
filename = out.nm2, datatype = type, overwrite =
TRUE)#, progress = "text")
subs.g.agg <- raster(out.nm2)
out.nm3 <- paste(outnames[i], ".",
as.integer(res(resamplegrid)[1]), ".grd", sep = "")
cat("Resampling...")
resample(subs.g.agg, resamplegrid, method = "bilinear", filename
= out.nm3, datatype = type,
overwrite = TRUE)#, progress = "text")
}
}
cat("Done.")
setOptions(todisk = FALSE) # Return to default option allowing
raster processing in memory
}