Message-ID: <BANLkTin5ZYRynaL9aqhg33SU2ANWq+dMQw@mail.gmail.com>
Date: 2011-05-24T18:19:05Z
From: Robert J. Hijmans
Subject: subs (raster) on large dataset
In-Reply-To: <BANLkTinXP5AJDW9yHWp7Co_pFGizw=+YqQ@mail.gmail.com>

> With this option, my code completed in 33 minutes (with the largest
> raster being ~1 GB). I also reran my code with toDisk = FALSE, and
> watched R memory usage via top. As you suspected, memory was
> constantly increasing, both actual and virtual. I believe I had enough
> RAM available at the time (~2 GB), so there seems to be some problem
> there, but perhaps this is an issue for R-Sig-Mac?

Hi Lyndon,

Any suggestion about how to determine the amount of RAM (true RAM,
without going to disk) on a given operating system would be very
useful. My approach is too empirical, and clearly not watertight.
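
One possible starting point, as a rough sketch only: the helper below
(total_ram_bytes is a hypothetical name, not part of raster) asks the
operating system for the installed physical RAM. It assumes
/proc/meminfo is available on Linux and the sysctl command on Mac OS X,
and it returns NA everywhere else. Note that it reports total RAM, not
the amount currently free, so it would only give an upper bound.

```r
# Hypothetical helper (not part of raster): total physical RAM in bytes,
# or NA if it cannot be determined on this platform.
total_ram_bytes <- function() {
  os <- Sys.info()[["sysname"]]
  out <- tryCatch({
    if (os == "Linux") {
      # /proc/meminfo has a line such as "MemTotal:  16314864 kB"
      line <- grep("^MemTotal:", readLines("/proc/meminfo"), value = TRUE)[1]
      as.numeric(gsub("[^0-9]", "", line)) * 1024
    } else if (os == "Darwin") {
      # sysctl reports hw.memsize in bytes
      as.numeric(system("sysctl -n hw.memsize", intern = TRUE))
    } else {
      NA_real_  # other platforms are not handled in this sketch
    }
  }, error = function(e) NA_real_, warning = function(w) NA_real_)
  if (length(out) != 1) NA_real_ else out
}

ram <- total_ram_bytes()
if (!is.na(ram)) cat("Total RAM:", round(ram / 1024^3, 1), "GB\n")
```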

I have lowered the default memory expectations (options) to avoid this
type of behavior. For better speed, those with a lot of RAM can use
setOptions(maxmemory= ) to process larger chunks of data.


> Lastly, there is one small thing that I noticed while working on this
> issue, based on the function that follows. I noticed that in a case
> such as this:
>
>    cat("Substituting...")
>    subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
>         subsWithNA = TRUE, filename = out.nm, datatype = type,
>         overwrite = TRUE, progress = "text")
>
> That the progress bar and cat don't play together, in that cat is not
> displayed. I am not sure why, but perhaps it is simply being
> overwritten in the console by the progress bar? I opted to turn off
> the progress bar in favor of the messages produced by cat in this
> case. This isn't something I am terribly concerned with, but I thought
> I would mention it in case it is of interest.


In this case you can use progress = "window"
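
If you prefer to keep the text bar, here is a minimal sketch of a
workaround. It assumes the text progress bar behaves like
utils::txtProgressBar(style = 3), which redraws its line with a
carriage return and so overwrites anything printed on the same line
without a trailing newline. run_with_message is a hypothetical
stand-in for the real subs() call.

```r
# Hypothetical stand-in for a long-running call that shows a text
# progress bar; the loop body is where the real work would happen.
run_with_message <- function(msg, nsteps = 5) {
  cat(msg, "\n", sep = "")  # trailing newline: the bar starts on a fresh line
  flush.console()
  pb <- txtProgressBar(min = 0, max = nsteps, style = 3)
  for (i in seq_len(nsteps)) {
    setTxtProgressBar(pb, i)  # redraws the bar in place
  }
  close(pb)
  invisible(nsteps)
}

run_with_message("Substituting...")
```

With the newline in place, the message stays on its own line above the
bar instead of being overwritten by it.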

Best, Robert
On Fri, May 20, 2011 at 10:48 AM, Lyndon Estes <lestes at princeton.edu> wrote:
> Hi Robert,
>
> My apologies for an even longer delay this time around. I have gotten
> back to this now and followed your suggestions. Responses are
> interspersed in (shortened) text of previous messages.
>
>> That is *horrible*. I am not sure what is going on here. This is my
>> rather empirical test to see if you have enough RAM for a given
>> computation (and to work with chunks if not). Although it is somewhat
>> inefficient, I have never seen anything crazy like this. I wonder if
>> you have insufficient RAM and that virtual RAM is created on-disk as
>> the object grows. I may lower the default maxmemory setting because of
>> this. You can also set it with setOptions. You can also do
>> setOptions(todisk=TRUE) before using this function (and set it back
>> afterwards), to force the function to use chunks (canProcessInMemory
>> will return FALSE before doing the memory consuming test).
>
> I used the setOptions function as you have suggested by embedding it
> in my function (full code pasted at the foot of this email):
>
> # Force to write to disk
> if(raster:::.toDisk() != TRUE) {
>   setOptions(todisk = TRUE)
>   cat("We don't want memory problems--forcing write to disk.\n")
> }
>
> With this option, my code completed in 33 minutes (with the largest
> raster being ~1 GB). I also reran my code with toDisk = FALSE, and
> watched R memory usage via top. As you suspected, memory was
> constantly increasing, both actual and virtual. I believe I had enough
> RAM available at the time (~2 GB), so there seems to be some problem
> there, but perhaps this is an issue for R-Sig-Mac?
>
>> All raster functions that start with a "." are hidden but are
>> accessible via raster:::
>> e.g.
>> raster:::.maxmemory
>
> Thanks, I guess I should have already known that...
>
>> In some functions that is true, but not in this one. Even if you
>> supply a filename, the whole thing will be done in memory if possible,
>> and only at the end will the resulting raster data be written to
>> disk. Perhaps it would be safer if it would always work as you
>> expected.
>
> It's easy enough for me to write the toDisk = TRUE option into
> functions, so thanks again for this tip.
>
> Lastly, there is one small thing that I noticed while working on this
> issue, based on the function that follows. I noticed that in a case
> such as this:
>
>    cat("Substituting...")
>    subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
>         subsWithNA = TRUE, filename = out.nm, datatype = type,
>         overwrite = TRUE, progress = "text")
>
> That the progress bar and cat don't play together, in that cat is not
> displayed. I am not sure why, but perhaps it is simply being
> overwritten in the console by the progress bar? I opted to turn off
> the progress bar in favor of the messages produced by cat in this
> case. This isn't something I am terribly concerned with, but I thought
> I would mention it in case it is of interest.
>
> Thanks again for all your help.
>
> Cheers, Lyndon
>
>
>
> spatCSM <- function(spatialmaster, resamplegrid, res.factor, CSMtable,
>                     LUfield, outfields, outnames, type) {
>   # Creates spatial output grids from results produced by the runCSM
>   # function. Path needs to be set in advance.
>   # Args:
>   #   spatialmaster: A grid defining the location of each spatial unit
>   #   resamplegrid: An optional grid defining the resolution to which
>   #     results should be resampled
>   #   res.factor: Factor by which to aggregate (e.g. 10 times current
>   #     pixel size, the default if not specified)
>   #   CSMtable: The output table of CSM statistics generated by runCSM
>   #   LUfield: The field in CSMtable containing the spatial unit codes
>   #     that match values in the spatialmaster grid
>   #   outfields: A vector of column names in CSMtable for which gridded
>   #     outputs are wanted, e.g. yield, CV yield
>   #   outnames: A vector of names for writing output grids, one for each
>   #     specified grid
>   #   type: Output datatype for grid, e.g. INT2S
>   # Returns: R format raster grids, saved to disk, of mean yield and
>   #   other optional grids
>
>   library(raster)
>   if(missing(spatialmaster) || missing(CSMtable) || missing(LUfield) ||
>      missing(outfields) || missing(type)) {
>     stop("Missing parameter, check function list.")
>   }
>   if(length(outfields) != length(outnames)) {
>     stop("Number of names for output grids does not match number of ",
>          "specified output grids")
>   }
>
>   cat("Running...")
>
>   # Force to write to disk
>   if(raster:::.toDisk() != TRUE) {
>     setOptions(todisk = TRUE)
>     cat("We don't want memory problems--forcing write to disk.\n")
>   }
>
>   for(i in seq_along(outfields)) {
>
>     out.nm <- paste(outnames[i], ".grd", sep = "")
>     cat("Substituting...")
>     subs(spatialmaster, CSMtable, by = LUfield, which = outfields[i],
>          subsWithNA = TRUE, filename = out.nm, datatype = type,
>          overwrite = TRUE)  #, progress = "text")
>
>     if(!missing(resamplegrid)) {
>       cat("Aggregating...")
>       if(missing(res.factor)) {
>         res.factor <- 10
>       }
>       subs.g <- raster(out.nm)
>       out.nm2 <- paste("agg.", out.nm, sep = "")
>       aggregate(subs.g, res.factor, fun = mean, na.rm = TRUE,
>                 filename = out.nm2, datatype = type,
>                 overwrite = TRUE)  #, progress = "text")
>       subs.g.agg <- raster(out.nm2)
>       out.nm3 <- paste(outnames[i], ".",
>                        as.integer(res(resamplegrid)[1]), ".grd", sep = "")
>       cat("Resampling...")
>       resample(subs.g.agg, resamplegrid, method = "bilinear",
>                filename = out.nm3, datatype = type,
>                overwrite = TRUE)  #, progress = "text")
>     }
>   }
>   cat("Done.")
>   # Return to default option allowing raster processing in memory
>   setOptions(todisk = FALSE)
> }
>