Skip to content

Subsetting RasterBrick very slow

6 messages · Christian Kamenik, Rick Reeves, steven mosher +2 more

#
Dear all,

The package 'raster' provides excellent tools to work on geographic 
data. Subsetting a RasterBrick object, however, takes ages.

I have the following RasterBrick object:

     Temp.CRU.cropped

     class:                  RasterBrick
     filename:
     dimensions:        43, 99, 1200  (nrow, ncol, nlayers)
     ncell:                  4257
     projection:         +proj=longlat +ellps=WGS84
     min values:        -14.8 -22.1 -15.0  -8.6  -3.8   4.7   8.7   
6.0   1.6  -2.4 ...
     max values:       2.1 -3.9 -2.1  1.8  6.6 15.7 18.0 14.8 10.7  7.7 ...
     extent:               15, 31.5, 64, 71.16667  (xmin, xmax, ymin, ymax)
     resolution:         0.1666667, 0.1666667  (x, y)

These are temperature readings from northern Fennoscandia, and each 
layer represents a month, thus each grid cell has monthly readings over 
the last 100 years.

Now I want to extract the readings for only July:

     Months <- rep(month.abb,nlayers(Temp.CRU.cropped)/12)
     Temp.CRU.J <- subset(Temp.CRU.cropped,which(Months=='Jul'))

But as I said, this takes ages. It is much faster to do the subsetting 
on an array:

     CRU.values <- getValues(Temp.CRU.cropped)
     Temp.CRU.J <- CRU.values[,which(Months=='Jul')]
     Temp.CRU.J <- 
array(Temp.CRU.J,dim=c(nrow(Temp.CRU.cropped),ncol(Temp.CRU.cropped),nlayers(Temp.CRU.cropped)/12))
     Temp.CRU.J <- brick(Temp.CRU.J)
     Temp.CRU.J <- setExtent(Temp.CRU.J,extent(Temp.CRU.cropped))
     projection(Temp.CRU.J) <- projection(Temp.CRU.cropped)

It is strange that several lines of code run much faster than the 
'subset' command, and I was wondering why 'subset' takes at least a 1000 
times longer.

I would be interested in your opinion.

Many thanks, Christian
#
Hello Christian:
Yes indeed, my experience is that Raster package operations on very 
large datasets take a long time,
on either Linux or Windows desktops.

As an alternative to the Raster package, I have had good experience 
using the GDAL raster image utilities,
embedded within shell or Python scripts to perform first-order raster 
subsampling and subimage generation.
Check out the gdalinfo, gdal_translate, and gdalwarp utilities. I have 
been using these to work with 100 gigabyte-class
raster DEM images, with great success and relatively low execution times 
(considering the size of the data sets).

HTH,
Rick Reeves
NCEAS
On 7/1/2011 7:02 AM, Christian Kamenik wrote:
2 days later
#
Hi,

Navigating through the code, I learn that subset first makes a
conversion from a RasterBrick to a RasterStack:
Method Definition:

function (x, ...) 
{
    .local <- function (x, subset, drop = TRUE, ...) 
    {
        x <- stack(x)
        subset(x, subset = subset, drop = drop, ...)
    }
    .local(x, ...)
}
<environment: namespace:raster>

Signatures:
        x       
target  "Raster"
defined "Raster"

With a toy example I find that the slow part is due to this conversion:

mat <- array(runif(12e4), dim=c(10, 10, 1200))
b <- brick(mat)
idx <- seq(from=as.Date('1910-01-01'), to=as.Date('2009-12-31'),
by='month')

month <- function(x)as.numeric(format(x, '%m'))
idxJuly <- which(month(idx)==7)

##conversion from RasterBrick to RasterStack
system.time(s <- stack(b))
   user  system elapsed 
 35.570   0.000  36.603 

##subset with the index
system.time(sJuly <- subset(s, idxJuly))
   user  system elapsed 
  0.080   0.000   0.098 


Another approach is to build a list of RasterLayers with *only* the
layers to keep:

system.time({
  l <- lapply(idxJuly, function(n)raster(b, n))
  sJuly2 <- do.call(stack, l)
  }
)           
   user  system elapsed 
  1.684   0.000   1.755 
 
Anyway, if your final objective is to calculate over subsets of the
RasterBrick, then you may find useful the new zApply function:

##First a time index is added to the RasterBrick
b <- setZ(b, idx)

system.time(meanMonthly <- zApply(b, by=month, FUN=mean))
   user  system elapsed 
  0.696   0.000   0.718 

Cheers,

Oscar.


--------------
Oscar Perpi??n Lamigueiro
Dpto. de Ingenier?a El?ctrica
EUITI-UPM

http://procomun.wordpress.com
#
Christian, 

Thanks for reporting this. I have improved the subset routine in raster
version 1.8-39 (on CRAN now for linux, but not built yet for win and mac).
In some cases nearly 1000x faster, as you also reported, but other cases
need further optimization. 

Best, 
Robert


--
View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Subsetting-RasterBrick-very-slow-tp6537890p6549140.html
Sent from the R-sig-geo mailing list archive at Nabble.com.
1 day later