subs (raster) on large dataset
With this option, my code completed in 33 minutes (with the largest raster being ~1 GB). ?I also reran my code with toDisk = FALSE, and watched R memory usage via top. ?As you suspected, memory was constantly increasing, both actual and virtual. I believe I had enough RAM available at the time (~2 GB), so there seems to be some problem there, but perhaps this is an issue for R-Sig-Mac?
Hi Lyndon, Any suggestion about how to determine the amount of RAM (true RAM, without going to disk) on a given operating system would be very useful. My approach is too empirical, and clearly not watertight. I have lowered the default expectations (options), to avoid this type of behavior. For better speed those with a lot of RAM can use setOptions(maxmemory= ) to use larger chunks of data.
Lastly, there is one small thing that I noticed while working on this
issue, based on the function that follows. ?I noticed that in a case
such as this:
? ?cat("Substituting...")
? ?subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
? ? ? ? ? ? ? ? ? out.nm, datatype = type, overwrite = TRUE, progress = "text")
That the progress bar and cat don't play together, in that cat is not
displayed. I am not sure why, but perhaps it is simply being
overwritten in the console by the progress bar? ?I opted to turn off
the progress bar in favor of the messages produced by cat in this
case. This isn't something I am terribly concerned with, but I thought
I would mention it in case it is of interest.
In this case you can use progress = "window" Best, Robert
On Fri, May 20, 2011 at 10:48 AM, Lyndon Estes <lestes at princeton.edu> wrote:
Hi Robert, My apologies for an even longer delay this time around. ?I have gotten back to this now and followed your suggestions. ?Responses are interspersed in (shortened) text of previous messages.
That is *horrible*. I am not sure what is going on here. This is my rather empirical test to see if you have enough RAM for a given computation (and to work with chunks if not). Although it is somewhat inefficient, I have never seen anything crazy like this. I wonder if you have insufficient RAM and that virtual RAM is created on-disk as the object grows. I may lower the default maxmemory setting because of this. You can also set it with setOptions. You can also do setOptions(todisk=TRUE) ?before using this function (and set it back afterwards, to force the function to use chunks (canProcessInMemory will return FALSE before doing the memory consuming test).
I used the setOptions function as you have suggested by embedding it
in my function (full code pasted at the foot of this email):
# Force to write to disk
if(raster:::.toDisk() != TRUE) {
? setOptions(todisk = TRUE)
? cat("We don't want memory problems--forcing write to disk.\n")
}
With this option, my code completed in 33 minutes (with the largest
raster being ~1 GB). ?I also reran my code with toDisk = FALSE, and
watched R memory usage via top. ?As you suspected, memory was
constantly increasing, both actual and virtual. I believe I had enough
RAM available at the time (~2 GB), so there seems to be some problem
there, but perhaps this is an issue for R-Sig-Mac?
all raster functions that start with an "." are hidden but are accessible via raster::: e.g. raster:::.maxmemory
Thanks, I guess I should have already known that...
In some functions that is true, but not in this one. Even if you supply a filename, the whole thing will be done in memory if possible, and the only at the end will the resulting raster data be written to disk. Perhaps it would be safer if it would always work as you expected.
It's easy enough for me to write the toDIsk = TRUE option into
functions, so thanks again for this tip.
Lastly, there is one small thing that I noticed while working on this
issue, based on the function that follows. ?I noticed that in a case
such as this:
? ?cat("Substituting...")
? ?subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
? ? ? ? ? ? ? ? ? out.nm, datatype = type, overwrite = TRUE, progress = "text")
That the progress bar and cat don't play together, in that cat is not
displayed. I am not sure why, but perhaps it is simply being
overwritten in the console by the progress bar? ?I opted to turn off
the progress bar in favor of the messages produced by cat in this
case. This isn't something I am terribly concerned with, but I thought
I would mention it in case it is of interest.
Thanks again for all your help.
Cheers, Lyndon
spatCSM <- function(spatialmaster, resamplegrid, res.factor, CSMtable,
LUfield, outfields, outnames, type) {
# Creates spatial output grids from results produced by runCSM
function. Path needs to be set in advance
# Args:
# ? spatialmaster: A grid defining the location of each spatial units
# ? resamplegrid: An optional grid defining the resolution to which
results should be resampled
# ? resamplefactor: Factor by which to aggregate (e.g. 10 times
current pixel size, the default if not
# ? ? specified)
# ? CSMtable: The output table of CSM statistics generated by runCSM
# ? LUfield: The field in CSMtable containing the spatial unit codes
that match values in spatialmaster grid
# ? outfields: A vector of column names in CSMtable for which gridded
outputs are wanted, e.g. yield, CV yield
# ? outnames: A vector of names for writing output grids, one for each
specifed grid
# ? type: Output datatype for grid, e.g. INT2S
# Returns: R format raster grids, saved to disk, of mean yield and
other optional grids
?library(raster)
?if(missing(spatialmaster) | missing(CSMtable) | missing(LUfield) |
? ? missing(outfields) | missing(type)) {
? ? stop("Missing parameter, check function list.")
?}
?if(length(outfields) != length(outnames)) {
? ? stop("Number of names for output grids does not match number of
specified output grids")
?}
?cat("Running...")
?# Force to write to disk
?if(raster:::.toDisk() != TRUE) {
? ? setOptions(todisk = TRUE)
? ? cat("We don't want memory problems--forcing write to disk.\n")
?}
?for(i in 1:length(outfields)) {
? ?out.nm <- paste(outnames[i], ".grd", sep = "")
? ?cat("Substituting...")
? ?subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
subsWithNA = TRUE, filename =
? ? ? ? ? ? ? ? ? out.nm, datatype = type, overwrite = TRUE)#,
progress = "text")
? ?if(!missing(resamplegrid)) {
? ? ?cat("Aggregating...")
? ? ?if(missing(res.factor)) {
? ? ? ?res.factor <- 10
? ? ?}
? ? ?subs.g <- raster(out.nm)
? ? ?out.nm2 <- paste("agg.", out.nm, sep = "")
? ? ?aggregate(subs.g, res.factor, fun = mean, na.rm = TRUE,
? ? ? ? ? ? ? ?filename = out.nm2, datatype = type, overwrite =
TRUE)#, progress = "text")
? ? ?subs.g.agg <- raster(out.nm2)
? ? ?out.nm3 <- paste(outnames[i], ".",
as.integer(res(resamplegrid)[1]), ".grd", sep = "")
? ? ?cat("Resampling...")
? ? ?resample(subs.g.agg, resamplegrid, method = "bilinear", filename
= out.nm3, datatype = type,
? ? ? ? ? ? ? overwrite = TRUE)#, progress = "text")
? ?}
?}
?cat("Done.")
?setOptions(todisk = FALSE) ?# Return to default option allowing
raster processing in memory
}