Skip to content

Odd behavior of dismo's extract function

6 messages · Dan Warren, Michael Sumner, Nick Matzke

#
This is not an error per se so much as just something very weird that I
have noticed with a project I've been working on recently.  I'm wondering
if anyone here has any insight as to what may be causing this behavior.  I
haven't yet been able to duplicate it with simulated rasters (more info on
that below), but it appears very reliably with real environmental data
including the PC rasters for Cuba I have hosted here:

https://github.com/danlwarren/ENMTools/tree/master/test/testdata

What's happening is this: if I go to extract data from those rasters using
occurrence points, the amount of time it takes increases very rapidly up to
exactly 250 points, and falls dramatically after that.  So dramatically
that it takes over two minutes to extract data for 250 points but just
under three seconds for 251.  I've established that it's not a question of
the points themselves being wonky, because it happens with random points as
well.


extract.test <- function(env, N){
      extract(env, randomPoints(env, N))
}

env.files <- list.files(path = "testdata/", pattern = "pc", full.names =
TRUE)
env <- stack(env.files)

system.time(extract.test(env, 250))

   user  system elapsed
  2.807   0.084   2.891

system.time(extract.test(env, 251))

   user  system elapsed
124.562   0.516 125.061

numpoints,time
1,1.54
5,3.93
10,6.764
50,29.939
100,61.431
150,79.295
200,110.283
250,120.118
251,2.748
252,2.756
254,2.767
500,2.876
1000,3.153

The data being extracted looks perfectly reasonable in all cases.  It's not
just these layers, either.  Although (as I mentioned above) I have yet to
come up with simulated rasters that show this behavior, I see this behavior
for both of the sets of rasters for real environmental data that I've
tried.  The results above are from a PCA on Worldclim data for Cuba, but I
just tried them on some Climond data I've got for Australia and I get the
same behavior.  Those rasters are much larger, though, and a result the
times are longer; 251 points took about 43 seconds, whereas I just had to
give up and stop the 250 point extraction after about 30 minutes.

As for those simulated rasters, I've tried the following:

Plain grids of sequential numbers
As above, but with a bunch of NAs added
Filling the Cuban rasters with sequential numbers
Filling the Cuban rasters with random numbers from a uniform (0,1)
distribution

None of those show this issue.  Anyone have any thoughts about what might
be going on here?
#
Just realized I pasted in the results backwards.  It should have been

system.time(extract.test(env, 250))

   user  system elapsed
124.562   0.516 125.061

system.time(extract.test(env, 251))

   user  system elapsed
  2.807   0.084   2.891



Dan Warren, Ph.D.
Department of Biology
Macquarie University
Email: dan.warren at mq.edu.au <dan.warren at anu.edu.au>
Phone (US): 530-848-3809
Phone (Australia): 0468 696 897
Phone (Work): 02 9850 8587
Skype: dan.l.warren
Google Scholar
<https://scholar.google.com/citations?user=NTzu9c8AAAAJ&hl=en>  Orcid
<http://orcid.org/0000-0002-8747-2451>  ResearcherID
<http://www.researcherid.com/rid/B-3821-2010>  Scopus
<http://www.scopus.com/authid/detail.url?authorId=7202133982>
On Mon, Jul 25, 2016 at 10:34 AM, Dan Warren <dan.l.warren at gmail.com> wrote:

            

  
  
#
On Mon, 25 Jul 2016 at 11:35 Dan Warren <dan.l.warren at gmail.com> wrote:

            
I don't see the effect.

Perhaps it was fixed in recent version of raster?

Please post reproducible details, I downloaded your data files to
"test/testdata/" to try this.

Cheers, Mike.


library(raster)
library(dismo)
extract.test <- function(env, N){
  extract(env, dismo::randomPoints(env, N))
}

env.files <- list.files(path = "test/testdata/", pattern = "pc", full.names
=
                          TRUE)
env <- raster::stack(env.files)

library(rbenchmark)
benchmark(n250 = extract.test(env, 250),
          n251 = extract.test(env, 251), replications = 4)
# test replications elapsed relative user.self sys.self user.child sys.child
# 1 n250            4    6.31    1.008      5.13     1.14         NA
 NA
# 2 n251            4    6.26    1.000      5.02     1.22         NA
 NA
devtools::session_info()
# Session info
-------------------------------------------------------------------------------------------------------------------------------
#   setting  value
# version  R version 3.3.1 Patched (2016-07-09 r70874)
# system   x86_64, mingw32
# ui       RStudio (0.99.1261)
# language (EN)
# collate  English_Australia.1252
# tz       Australia/Hobart
# date     2016-07-25
#
# Packages
-----------------------------------------------------------------------------------------------------------------------------------
#   package    * version date       source
# devtools   * 1.12.0  2016-06-24 CRAN (R 3.3.1)
# digest       0.6.9   2016-01-08 CRAN (R 3.3.1)
# dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.1)
# evaluate     0.9     2016-04-29 CRAN (R 3.3.1)
# htmltools    0.3.5   2016-03-21 CRAN (R 3.3.1)
# knitr        1.13    2016-05-09 CRAN (R 3.3.1)
# lattice      0.20-33 2015-07-14 CRAN (R 3.3.1)
# magrittr     1.5     2014-11-22 CRAN (R 3.3.1)
# memoise      1.0.0   2016-01-29 CRAN (R 3.3.1)
# raster     * 2.5-8   2016-06-02 CRAN (R 3.3.1)
# rbenchmark * 1.0.0   2012-08-30 CRAN (R 3.3.0)
# Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.1)
# rgdal        1.1-10  2016-05-12 CRAN (R 3.3.1)
# rmarkdown    1.0.2   2016-07-19 Github (rstudio/rmarkdown at b65e177)
# sp         * 1.2-3   2016-04-14 CRAN (R 3.3.1)
# stringi      1.1.1   2016-05-27 CRAN (R 3.3.0)
# stringr      1.0.0   2015-04-30 CRAN (R 3.3.1)
# withr        1.0.2   2016-06-20 CRAN (R 3.3.1)
#
How very odd.  I'm using R 3.3.0, but as far as I can tell I'm using the
same package versions as you.  I've tried this on two machines (12 core Mac
Pro and an older Macbook Pro) and I'm getting the same phenomenon on both.
Could it be a weird OSX thing?  I'll try updating R and then if it still
persists I'll bootcamp over into Windows and see if it's happening for me
there.


My session info (sorry for not including that the first time):

Session info
----------------------------------------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 (2016-05-03)
 system   x86_64, darwin13.4.0
 ui       RStudio (0.99.491)
 language (EN)
 collate  en_AU.UTF-8
 tz       Australia/Sydney
 date     2016-07-25

Packages
--------------------------------------------------------------------------------------------------------------------------------------------
 package    * version date       source
 colorspace   1.2-6   2015-03-11 CRAN (R 3.3.0)
 devtools     1.12.0  2016-06-24 CRAN (R 3.3.0)
 digest       0.6.9   2016-01-08 CRAN (R 3.3.0)
 dismo      * 1.1-1   2016-06-16 CRAN (R 3.3.0)
 ENMTools   * 0.1     2016-07-25 local
 ggplot2    * 2.1.0   2016-03-01 CRAN (R 3.3.0)
 gridExtra  * 2.2.1   2016-02-29 CRAN (R 3.3.0)
 gtable       0.2.0   2016-02-26 CRAN (R 3.3.0)
 highr        0.6     2016-05-09 CRAN (R 3.3.0)
 knitr      * 1.13    2016-05-09 CRAN (R 3.3.0)
 lattice      0.20-33 2015-07-14 CRAN (R 3.3.0)
 memoise      1.0.0   2016-01-29 CRAN (R 3.3.0)
 munsell      0.4.3   2016-02-13 CRAN (R 3.3.0)
 plyr       * 1.8.4   2016-06-08 CRAN (R 3.3.0)
 raster     * 2.5-8   2016-06-02 CRAN (R 3.3.0)
 Rcpp         0.12.5  2016-05-14 CRAN (R 3.3.0)
 rgeos      * 0.3-19  2016-04-04 CRAN (R 3.3.0)
 rJava        0.9-8   2016-01-07 CRAN (R 3.3.0)
 scales       0.4.0   2016-02-26 CRAN (R 3.3.0)
 sp         * 1.2-3   2016-04-14 CRAN (R 3.3.0)
 viridis    * 0.3.4   2016-03-12 CRAN (R 3.3.0)
 withr        1.0.2   2016-06-20 CRAN (R 3.3.0)
On Mon, Jul 25, 2016 at 12:15 PM, Michael Sumner <mdsumner at gmail.com> wrote:

            

  
  
#
Updating to R 3.3.1 fixed it.  Thanks!  Still baffled as to why the sudden
dropoff between 250 and 251, but as long as it's working all is well.

Cheers!
On Mon, Jul 25, 2016 at 12:24 PM, Dan Warren <dan.l.warren at gmail.com> wrote:

            

  
  
#
I am on R 3.3.1 and don't get the huge time difference.  There is a tiny
decrease though from 250 to 251:

=============
wd = "~/Downloads/extract_weirdness/"
setwd(wd)

library(raster)
library(dismo)


extract.test <- function(env, N){
      extract(env, randomPoints(env, N))
}

env.files <- list.files(path = ".", pattern = "pc", full.names =
TRUE)
env <- stack(env.files)


system.time(extract.test(env, 250))

   user  system elapsed
  1.455   0.043   1.492
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude


system.time(extract.test(env, 251))   user  system elapsed
  1.137   0.033   1.158
Warning message:
In couldBeLonLat(mask) : CRS is NA. Assuming it is longitude/latitude
=============

...but I won't worry about it myself.  Thanks for the solution though, that
was weird behavior!
Nick
On Mon, Jul 25, 2016 at 12:29 PM, Dan Warren <dan.l.warren at gmail.com> wrote: