Odd behavior of dismo's extract function
On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:
Just realized I pasted in the results backwards. It should have been:
system.time(extract.test(env, 250))
user system elapsed
124.562 0.516 125.061
system.time(extract.test(env, 251))
user system elapsed
2.807 0.084 2.891
I don't see the effect.
Perhaps it was fixed in a recent version of raster?
Please post reproducible details. I downloaded your data files to
"test/testdata/" to try this:
Cheers, Mike.
library(raster)
library(dismo)
extract.test <- function(env, N){
extract(env, dismo::randomPoints(env, N))
}
env.files <- list.files(path = "test/testdata/", pattern = "pc", full.names = TRUE)
env <- raster::stack(env.files)
library(rbenchmark)
benchmark(n250 = extract.test(env, 250),
n251 = extract.test(env, 251), replications = 4)
#   test replications elapsed relative user.self sys.self user.child sys.child
# 1 n250            4    6.31    1.008      5.13     1.14         NA        NA
# 2 n251            4    6.26    1.000      5.02     1.22         NA        NA
devtools::session_info()
# Session info ------------------------------------------------------------
# setting value
# version R version 3.3.1 Patched (2016-07-09 r70874)
# system x86_64, mingw32
# ui RStudio (0.99.1261)
# language (EN)
# collate English_Australia.1252
# tz Australia/Hobart
# date 2016-07-25
#
# Packages ----------------------------------------------------------------
# package * version date source
# devtools * 1.12.0 2016-06-24 CRAN (R 3.3.1)
# digest 0.6.9 2016-01-08 CRAN (R 3.3.1)
# dismo * 1.1-1 2016-06-16 CRAN (R 3.3.1)
# evaluate 0.9 2016-04-29 CRAN (R 3.3.1)
# htmltools 0.3.5 2016-03-21 CRAN (R 3.3.1)
# knitr 1.13 2016-05-09 CRAN (R 3.3.1)
# lattice 0.20-33 2015-07-14 CRAN (R 3.3.1)
# magrittr 1.5 2014-11-22 CRAN (R 3.3.1)
# memoise 1.0.0 2016-01-29 CRAN (R 3.3.1)
# raster * 2.5-8 2016-06-02 CRAN (R 3.3.1)
# rbenchmark * 1.0.0 2012-08-30 CRAN (R 3.3.0)
# Rcpp 0.12.5 2016-05-14 CRAN (R 3.3.1)
# rgdal 1.1-10 2016-05-12 CRAN (R 3.3.1)
# rmarkdown 1.0.2 2016-07-19 Github (rstudio/rmarkdown at b65e177)
# sp * 1.2-3 2016-04-14 CRAN (R 3.3.1)
# stringi 1.1.1 2016-05-27 CRAN (R 3.3.0)
# stringr 1.0.0 2015-04-30 CRAN (R 3.3.1)
# withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
Dan Warren, Ph.D.
Department of Biology
Macquarie University
Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
Phone (US): 530-848-3809
Phone (Australia): 0468 696 897
Phone (Work): 02 9850 8587
Skype: dan.l.warren
Google Scholar <https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>
Orcid <http://orcid.org/0000-0002-8747-2451>
ResearcherID <http://www.researcherid.com/rid/B-3821-2010>
Scopus <http://www.scopus.com/authid/detail.url?authorId=7202133982>
On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com> wrote:
This is not an error per se so much as just something very weird that I have noticed with a project I've been working on recently. I'm wondering if anyone here has any insight as to what may be causing this behavior. I haven't yet been able to duplicate it with simulated rasters (more info on that below), but it appears very reliably with real environmental data including the PC rasters for Cuba I have hosted here:
https://github.com/danlwarren/ENMTools/tree/master/test/testdata
What's happening is this: if I go to extract data from those rasters using occurrence points, the amount of time it takes increases very rapidly up to exactly 250 points, and falls dramatically after that. So dramatically that it takes over two minutes to extract data for 250 points but just under three seconds for 251. I've established that it's not a question of the points themselves being wonky, because it happens with random points as well.
extract.test <- function(env, N){
extract(env, randomPoints(env, N))
}
env.files <- list.files(path = "testdata/", pattern = "pc", full.names = TRUE)
env <- stack(env.files)
system.time(extract.test(env, 250))
user system elapsed
2.807 0.084 2.891
system.time(extract.test(env, 251))
user system elapsed
124.562 0.516 125.061
numpoints,time
1,1.54
5,3.93
10,6.764
50,29.939
100,61.431
150,79.295
200,110.283
250,120.118
251,2.748
252,2.756
254,2.767
500,2.876
1000,3.153
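A timing table like the one above could be collected with a small loop. The following is only a sketch in base R: `time_by_n` and `dummy` are hypothetical names that do not appear in the thread, and the stand-in workload is there only so the example runs without any raster data.

```r
# Hypothetical helper (not from the original thread): time f(n) once for
# each n and return the results in the same numpoints,time layout.
time_by_n <- function(f, ns) {
  elapsed <- vapply(ns, function(n) {
    system.time(f(n))[["elapsed"]]
  }, numeric(1))
  data.frame(numpoints = ns, time = elapsed)
}

# Stand-in workload so the sketch runs without the Cuba rasters.
dummy <- function(n) sum(sqrt(seq_len(n)))

ns <- c(1, 5, 10, 50, 100, 150, 200, 250, 251, 252, 254, 500, 1000)
timings <- time_by_n(dummy, ns)
print(timings)
```

With the raster stack loaded, the stand-in can be swapped for the real call, for example `time_by_n(function(n) extract.test(env, n), ns)`.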
The data being extracted looks perfectly reasonable in all cases. It's not just these layers, either. Although (as I mentioned above) I have yet to come up with simulated rasters that show this behavior, I see this behavior for both of the sets of rasters for real environmental data that I've tried. The results above are from a PCA on Worldclim data for Cuba, but I just tried them on some Climond data I've got for Australia and I get the same behavior. Those rasters are much larger, though, and as a result the times are longer; 251 points took about 43 seconds, whereas I just had to give up and stop the 250 point extraction after about 30 minutes.
As for those simulated rasters, I've tried the following:
- Plain grids of sequential numbers
- As above, but with a bunch of NAs added
- Filling the Cuban rasters with sequential numbers
- Filling the Cuban rasters with random numbers from a uniform (0,1) distribution
None of those show this issue. Anyone have any thoughts about what might be going on here?
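For anyone wanting to try the simulated cases, here is a rough sketch of how such rasters might be built with the raster package. The grid size, extent, NA fraction, and all object names are assumptions made for illustration; the thread does not say how the original simulated rasters were constructed.

```r
library(raster)
library(dismo)

# Assumed dimensions and extent; the thread does not give these.
template <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 1, ymn = 0, ymx = 1)

# Hypothetical helpers: fill any layer with sequential or uniform values.
fill_seq  <- function(r) setValues(r, seq_len(ncell(r)))
fill_unif <- function(r) setValues(r, runif(ncell(r)))

# Plain grid of sequential numbers
seq.r <- fill_seq(template)

# As above, but with NAs added (20% of cells, chosen arbitrarily)
set.seed(1)
v <- values(seq.r)
v[sample(length(v), 0.2 * length(v))] <- NA
na.r <- setValues(template, v)

# Random numbers from a uniform (0,1) distribution
unif.r <- fill_unif(template)

# Note: layers built this way are held in memory, unlike the file-backed
# layers created by stack(env.files).
sim.env <- stack(seq.r, na.r, unif.r)
names(sim.env) <- c("seq", "seq_na", "unif")

# Same extraction pattern as extract.test()
pts  <- randomPoints(sim.env, 250)
vals <- extract(sim.env, pts)
dim(vals)
```

The same `fill_seq` and `fill_unif` helpers could be applied to a layer of `env` to reproduce the "filled Cuban rasters" cases, which need the hosted data files.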
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Dr. Michael Sumner
Software and Database Engineer
Australian Antarctic Division
203 Channel Highway
Kingston Tasmania 7050 Australia