Dear list
I am searching alternatives to ESRI shapefiles for the storage of GPS data, i.e. tagged point features, and came across SpatialLite or Geopackage. Unfortunately writing to both formats is very slow compared to shapefiles making practical use impossible.
library(sf)
library(rgdal)
library(RSQLite)
n<- 1000
d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver = 'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer = 'data1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed
t1 0.03 0.03 0.09
t2 0.53 5.04 29.33
t3 0.48 4.29 32.19
As n increases, processing time explodes for SpatialLite and Geopackage, and I usually have a couple of 10000 points to store. Any experiences of others would be highly appreciated.
Many thanks
Manuel
------
R version 3.4.1 (2017-06-30)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252
[3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C
[5] LC_TIME=German_Switzerland.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5
loaded via a namespace (and not attached):
[1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1 DBI_0.7
[6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0 tools_3.4.1
[11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1 memoise_1.1.0
[16] tibble_1.3.4
Slow writing of point features to SpatialLite-DB or Geopackage
4 messages · Manuel Schneider, Roger Bivand, Loïc Dutrieux +1 more
On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote:
Dear list
I am searching alternatives to ESRI shapefiles for the storage of GPS data, i.e. tagged point features, and came across SpatialLite or Geopackage. Unfortunately writing to both formats is very slow compared to shapefiles making practical use impossible.
library(sf)
library(rgdal)
library(RSQLite)
n<- 1000
d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver = 'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer = 'data1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed
t1 0.03 0.03 0.09
t2 0.53 5.04 29.33
t3 0.48 4.29 32.19
As n increases, processing time explodes for SpatialLite and Geopackage,
and I usually have a couple of 10000 points to store. Any experiences of
others would be highly appreciated.
Fedora 26 64-bit: n 1000
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.007 0.001 0.010 t2 0.067 0.035 0.103 t3 0.029 0.042 0.073 n 25000
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.120 0.032 0.153 t2 0.412 0.829 1.247 t3 0.645 0.834 1.487 R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Fedora 26 (Workstation Edition) other attached packages: [1] sf_0.5-3 loaded via a namespace (and not attached): [1] compiler_3.4.1 magrittr_1.5 tools_3.4.1 DBI_0.7 units_0.4-5 [6] Rcpp_0.12.12 udunits2_0.13 grid_3.4.1 There is no need to load rgdal or RSQLite, neither are needed or used. For portability use tempdir(): t1 <- system.time(st_write(mp1, dsn = paste0(td, 'data1.shp'))) t2 <- system.time(st_write(mp1, dsn = paste0(td, 'test.sqlite'), layer = 'data1', driver = 'SQLite')) t3 <- system.time(st_write(mp1, paste0(td, 'data1.gpkg'))) Maybe an order of magnitude difference because the databases need initialising, but nothing like your scale; does 32/64 bit make a difference? I'm assuming that you installed sf as a Windows binary from CRAN? Consider using a github issue when others have tried tis out on other platforms. Roger
Many thanks Manuel ------ R version 3.4.1 (2017-06-30) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252 [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C [5] LC_TIME=German_Switzerland.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5 loaded via a namespace (and not attached): [1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1 DBI_0.7 [6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0 tools_3.4.1 [11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1 memoise_1.1.0 [16] tibble_1.3.4
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no Editor-in-Chief of The R Journal, https://journal.r-project.org/index.html http://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
On 24/08/17 10:23, Roger Bivand wrote:
On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote:
Dear list
I am searching alternatives to ESRI shapefiles for the storage of GPS
data, i.e. tagged point features, and came across SpatialLite or
Geopackage. Unfortunately writing to both formats is very slow
compared to shapefiles making practical use impossible.
library(sf)
library(rgdal)
library(RSQLite)
n<- 1000
d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver =
'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer =
'data1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed
t1 0.03 0.03 0.09
t2 0.53 5.04 29.33
t3 0.48 4.29 32.19
As n increases, processing time explodes for SpatialLite and
Geopackage, and I usually have a couple of 10000 points to store. Any
experiences of others would be highly appreciated.
Fedora 26 64-bit: n 1000
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.007 0.001 0.010 t2 0.067 0.035 0.103 t3 0.029 0.042 0.073 n 25000
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.120 0.032 0.153 t2 0.412 0.829 1.247 t3 0.645 0.834 1.487 R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Fedora 26 (Workstation Edition) other attached packages: [1] sf_0.5-3 loaded via a namespace (and not attached): [1] compiler_3.4.1 magrittr_1.5 tools_3.4.1 DBI_0.7 units_0.4-5 [6] Rcpp_0.12.12 udunits2_0.13 grid_3.4.1
I also get large differences on ubuntu 16.04 64-bits with ssd;
particularly when writing a second layer to an existing geopackage
library(sf)
n <- 1000
d <- data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
td <- tempdir()
file.remove(list.files(td, full.names = TRUE))
t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp'), driver
= 'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
layer = 'layer1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'),
layer = 'layer2', driver = 'SQLite'))
t4 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer
= 'layer1'))
t5 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer
= 'layer2'))
rbind(t1,t2,t3,t4,t5)[,1:3]
user.self sys.self elapsed
t1 0.012 0.000 0.010
t2 0.180 0.456 8.993
t3 0.220 0.460 10.637
t4 0.016 0.064 0.082
t5 0.200 0.472 9.199
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
other attached packages:
[1] sf_0.5-3 raster_2.5-8 sp_1.2-4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 DBI_0.6-1 tools_3.4.0
units_0.4-5 yaml_2.1.14 Rcpp_0.12.10 udunits2_0.13
grid_3.4.0 lattice_0.20-35
Cheers,
Lo?c
There is no need to load rgdal or RSQLite, neither are needed or used. For portability use tempdir(): t1 <- system.time(st_write(mp1, dsn = paste0(td, 'data1.shp'))) t2 <- system.time(st_write(mp1, dsn = paste0(td, 'test.sqlite'), layer = 'data1', driver = 'SQLite')) t3 <- system.time(st_write(mp1, paste0(td, 'data1.gpkg'))) Maybe an order of magnitude difference because the databases need initialising, but nothing like your scale; does 32/64 bit make a difference? I'm assuming that you installed sf as a Windows binary from CRAN? Consider using a github issue when others have tried tis out on other platforms. Roger
Many thanks Manuel ------ R version 3.4.1 (2017-06-30) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252 [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C [5] LC_TIME=German_Switzerland.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5 loaded via a namespace (and not attached): [1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1 DBI_0.7 [6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0 tools_3.4.1 [11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1 memoise_1.1.0 [16] tibble_1.3.4
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
According to http://gdal.org/drv_sqlite.html you can unset OGR_SQLITE_SYNCHRONOUS and it should improve performance. So, use Sys.setenv(OGR_SQLITE_SYNCHRONOUS="OFF") in your script/session before you wish to call writes.? The SQLite and GPKG writes are still 11x and 13x respectively slower than shapefile writes but not terribly bad. Below is what resulted in these numbers. Output from my .Rout file follows:
library(sf)
Linking to GEOS 3.5.1, GDAL 2.2.0, proj.4 4.9.2
library(rbenchmark)
n <- 1000
d <- data.frame(a=seq_len(n),x=rnorm(n,1,1),y=rnorm(n,1,1))
mp1 <- st_as_sf(d,coords=c('x','y'))
td <- tempdir()
file.remove(list.files(td,full.names=TRUE))
logical(0)
stwrite <- function(dat, dsn, ...) {
+ if (file.exists(dsn)) unlink(dsn) + st_write(dat,dsn=dsn, ...) + }
Sys.setenv(OGR_SQLITE_SYNCHRONOUS="OFF") benchmark(
+ stwrite(mp1, dsn=file.path(td,'data1.shp'), driver='ESRI Shapefile'), + stwrite(mp1, dsn=file.path(td,'data2.sqlite'), layer="layer1", driver='SQLite'), + stwrite(mp1, dsn=file.path(td,'data2.sqlite'), layer="layer2", driver='SQLite'), + stwrite(mp1, dsn=file.path(td,'data3.gpkg'), layer="layer1"), + stwrite(mp1, dsn=file.path(td,'data3.gpkg'), layer="layer2"), + replications=5 + ) test 1 stwrite(mp1, dsn = file.path(td, "data1.shp"), driver = "ESRI Shapefile") 2 stwrite(mp1, dsn = file.path(td, "data2.sqlite"), layer = "layer1", driver = "SQLite") 3 stwrite(mp1, dsn = file.path(td, "data2.sqlite"), layer = "layer2", driver = "SQLite") 4 stwrite(mp1, dsn = file.path(td, "data3.gpkg"), layer = "layer1") 5 stwrite(mp1, dsn = file.path(td, "data3.gpkg"), layer = "layer2") replications elapsed relative user.self sys.self user.child sys.child 1 5 0.034 1.000 0.028 0.008 0 0 2 5 0.374 11.000 0.060 0.312 0 0 3 5 0.359 10.559 0.084 0.272 0 0 4 5 0.454 13.353 0.192 0.260 0 0 5 5 0.451 13.265 0.172 0.280 0 0
sessionInfo()
R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Ubuntu 16.04.3 LTS Matrix products: default BLAS: /usr/lib/libblas/libblas.so.3.6.0 LAPACK: /usr/lib/lapack/liblapack.so.3.6.0 locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] tools stats graphics grDevices utils datasets methods [8] base other attached packages: [1] rbenchmark_1.0.0 sf_0.5-4 loaded via a namespace (and not attached): [1] compiler_3.4.1 magrittr_1.5 DBI_0.7 units_0.4-5 Rcpp_0.12.12 [6] udunits2_0.13 grid_3.4.1
proc.time()
user system elapsed 1.436 1.340 2.791 On Thu, Aug 24, 2017 at 5:44 PM, Lo?c Dutrieux <loic.dutrieux at conabio.gob.mx
wrote:
On 24/08/17 10:23, Roger Bivand wrote:
On Thu, 24 Aug 2017, manuel.schneider at agroscope.admin.ch wrote: Dear list
I am searching alternatives to ESRI shapefiles for the storage of GPS
data, i.e. tagged point features, and came across SpatialLite or
Geopackage. Unfortunately writing to both formats is very slow compared to
shapefiles making practical use impossible.
library(sf)
library(rgdal)
library(RSQLite)
n<- 1000
d <-data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
t1 <- system.time(st_write(mp1, dsn = 'C:/Temp/data1.shp', driver =
'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = 'C:/Temp/test.sqlite', layer =
'data1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, "C:/Temp/data1.gpkg"))
rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed
t1 0.03 0.03 0.09
t2 0.53 5.04 29.33
t3 0.48 4.29 32.19
As n increases, processing time explodes for SpatialLite and Geopackage,
and I usually have a couple of 10000 points to store. Any experiences of
others would be highly appreciated.
Fedora 26 64-bit: n 1000 rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.007 0.001 0.010 t2 0.067 0.035 0.103 t3 0.029 0.042 0.073 n 25000 rbind(t1,t2,t3)[,1:3]
user.self sys.self elapsed t1 0.120 0.032 0.153 t2 0.412 0.829 1.247 t3 0.645 0.834 1.487 R version 3.4.1 (2017-06-30) Platform: x86_64-pc-linux-gnu (64-bit) Running under: Fedora 26 (Workstation Edition) other attached packages: [1] sf_0.5-3 loaded via a namespace (and not attached): [1] compiler_3.4.1 magrittr_1.5 tools_3.4.1 DBI_0.7 units_0.4-5 [6] Rcpp_0.12.12 udunits2_0.13 grid_3.4.1
I also get large differences on ubuntu 16.04 64-bits with ssd;
particularly when writing a second layer to an existing geopackage
library(sf)
n <- 1000
d <- data.frame(a=1:n, X=rnorm(n,1,1), Y=rnorm(n,1,1))
mp1 <- st_as_sf(d, coords=c("X","Y"))
td <- tempdir()
file.remove(list.files(td, full.names = TRUE))
t1 <- system.time(st_write(mp1, dsn = file.path(td, 'data1.shp'), driver =
'ESRI Shapefile'))
t2 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'), layer
= 'layer1', driver = 'SQLite'))
t3 <- system.time(st_write(mp1, dsn = file.path(td, 'data2.sqlite'), layer
= 'layer2', driver = 'SQLite'))
t4 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer =
'layer1'))
t5 <- system.time(st_write(mp1, dsn = file.path(td, 'data3.gpkg'), layer =
'layer2'))
rbind(t1,t2,t3,t4,t5)[,1:3]
user.self sys.self elapsed
t1 0.012 0.000 0.010
t2 0.180 0.456 8.993
t3 0.220 0.460 10.637
t4 0.016 0.064 0.082
t5 0.200 0.472 9.199
R version 3.4.0 (2017-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS
other attached packages:
[1] sf_0.5-3 raster_2.5-8 sp_1.2-4
loaded via a namespace (and not attached):
[1] compiler_3.4.0 magrittr_1.5 DBI_0.6-1 tools_3.4.0
units_0.4-5 yaml_2.1.14 Rcpp_0.12.10 udunits2_0.13 grid_3.4.0
lattice_0.20-35
Cheers,
Lo?c
There is no need to load rgdal or RSQLite, neither are needed or used. For
portability use tempdir(): t1 <- system.time(st_write(mp1, dsn = paste0(td, 'data1.shp'))) t2 <- system.time(st_write(mp1, dsn = paste0(td, 'test.sqlite'), layer = 'data1', driver = 'SQLite')) t3 <- system.time(st_write(mp1, paste0(td, 'data1.gpkg'))) Maybe an order of magnitude difference because the databases need initialising, but nothing like your scale; does 32/64 bit make a difference? I'm assuming that you installed sf as a Windows binary from CRAN? Consider using a github issue when others have tried tis out on other platforms. Roger Many thanks
Manuel ------ R version 3.4.1 (2017-06-30) Platform: i386-w64-mingw32/i386 (32-bit) Running under: Windows 7 (build 7601) Service Pack 1 Matrix products: default locale: [1] LC_COLLATE=German_Switzerland.1252 LC_CTYPE=German_Switzerland.1252 [3] LC_MONETARY=German_Switzerland.1252 LC_NUMERIC=C [5] LC_TIME=German_Switzerland.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] sf_0.5-3 RSQLite_2.0 rgdal_1.2-8 sp_1.2-5 loaded via a namespace (and not attached): [1] Rcpp_0.12.12 lattice_0.20-35 digest_0.6.12 grid_3.4.1 DBI_0.7 [6] magrittr_1.5 units_0.4-5 rlang_0.1.2 blob_1.1.0 tools_3.4.1 [11] udunits2_0.13 bit64_0.9-7 bit_1.1-12 compiler_3.4.1 memoise_1.1.0 [16] tibble_1.3.4
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo