
Problem with size of dataset

Dear Emmanuel,

I have the same problem. Either I cannot run the processing on a large data set in R, or I cannot even
load such data into R. And when I do want to do any geostatistics, it takes forever. R (gstat/geoR) is
simply not as efficient with large spatial data as, e.g., GIS software.
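To put rough numbers on "takes forever": an empirical variogram compares every pair of observations, so the number of pairs grows quadratically with the number of points. A quick sketch of that arithmetic (the helper name `n_pairs` is just for illustration):

```r
# Number of distinct point pairs an empirical variogram has to consider
n_pairs <- function(n) n * (n - 1) / 2

n_pairs(157000)  # all 157k points: about 1.2e10 pairs
n_pairs(7850)    # a 5% random subset: about 3.1e7 pairs
```

A 5% subset therefore means roughly 400 times fewer pairs, not just 20 times fewer.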

What you can definitely try is to subset your point data randomly.
This will allow you to fit variograms etc.
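A minimal base-R sketch of such a random subset, assuming the points sit in a data frame; the simulated data and the column name `Pb` are hypothetical stand-ins for your own observations:

```r
# Hypothetical stand-in for the real 157k observations (x, y and a
# measured variable); replace with your own data
set.seed(42)
n <- 157000
pts <- data.frame(x  = runif(n, 0, 50000),
                  y  = runif(n, 0, 50000),
                  Pb = rlnorm(n, meanlog = 3))

# Draw a 5% simple random sample of rows, without replacement
sel <- sample(nrow(pts), size = round(0.05 * nrow(pts)))
pts_sub <- pts[sel, ]

nrow(pts_sub)
```

The same row indexing (`pts[sel, ]`) also works on an sp `SpatialPointsDataFrame`, so the subset can go straight into the variogram fitting.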

Then, if you really want to interpolate all of your 157k points, you might consider using SAGA. For
example, you can also randomly subset points from a shapefile:

# too many points; subset to 5% ("Split Shapes Layer Randomly" in SAGA):
A="part_A.shp", B="part_B.shp", PERCENT=5))
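Judging from its `A`/`B` outputs and `PERCENT` argument, the SAGA module appears to write a random PERCENT share of the points to one file and the rest to the other (an assumption; check the SAGA module documentation). If you would rather do that step in R, a base-R sketch of such a two-way split on a hypothetical data frame could look like this:

```r
# Hypothetical point table; replace with your own data
set.seed(1)
pts <- data.frame(x = runif(1000), y = runif(1000), z = rnorm(1000))

percent <- 5
# Randomly flag exactly `percent`% of the rows for part A
idx_A <- sample(nrow(pts), round(nrow(pts) * percent / 100))
in_A  <- seq_len(nrow(pts)) %in% idx_A

part_A <- pts[in_A, ]   # the random 5% share
part_B <- pts[!in_A, ]  # the remaining 95%
```

Because `part_B` is the exact complement of `part_A`, the two parts can also serve as a fitting set and a validation set.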
# Learn more about geostatistics in SAGA:
$geostatistics_kriging
   code                           name
1     0          Ordinary Kriging (VF)
2     1  Ordinary Kriging (VF, Global)
3     2         Universal Kriging (VF)
4     3 Universal Kriging (VF, Global)
5     4        Semivariogram (Dialog))
6     5               Ordinary Kriging
7     6      Ordinary Kriging (Global)
8     7              Universal Kriging
9     8     Universal Kriging (Global)
10   NA                           <NA>
11   NA                           <NA>

# Read the mask map:
# Ordinary kriging in SAGA:
SHAPES="var.shp", BVARIANCE=F, BLOCK=F, FIELD=1, BLOG=F, MODEL=1, TARGET=0, NPOINTS_MIN=10,
NPOINTS_MAX=60, NUGGET=rvgm.Pb$psill[1], SILL=1.65, RANGE=1238, MAXRADIUS=50000,
USER_CELL_SIZE=cell.size, USER_X_EXTENT_MIN=gridmaps@bbox[1,1]+cell.size/2,
USER_X_EXTENT_MAX=gridmaps@bbox[1,2]-cell.size/2, USER_Y_EXTENT_MIN=gridmaps@bbox[2,1]+cell.size/2,
USER_Y_EXTENT_MAX=gridmaps@bbox[2,2]-cell.size/2))
# in the same way, you can run regression-kriging/universal kriging
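The USER_*_EXTENT_* arguments in the kriging call place the outermost cell centres half a cell inside the bounding box of the mask grid. A small base-R sketch of that arithmetic, using a hypothetical bounding box laid out like sp's `bbox()` (row 1 = x, row 2 = y; column 1 = min, column 2 = max) and a hypothetical cell size:

```r
# Hypothetical bounding box in the same layout as sp::bbox()
bb <- matrix(c(0, 0, 10000, 5000), nrow = 2,
             dimnames = list(c("x", "y"), c("min", "max")))
cell.size <- 100

# Cell centres sit half a cell inside the outer edges of the grid
x_min <- bb[1, 1] + cell.size / 2
x_max <- bb[1, 2] - cell.size / 2
y_min <- bb[2, 1] + cell.size / 2
y_max <- bb[2, 2] - cell.size / 2

# Number of columns and rows implied by that extent
ncols <- (x_max - x_min) / cell.size + 1
nrows <- (y_max - y_min) / cell.size + 1

c(x_min = x_min, x_max = x_max, y_min = y_min, y_max = y_max,
  ncols = ncols, nrows = nrows)
```

Passing the centre coordinates this way should make the SAGA output grid line up cell for cell with the R mask grid.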

You will soon find that there is a big difference in efficiency between SAGA and R: SAGA
will interpolate your 157k points within a few minutes or less. On the other hand, SAGA has very
limited geostatistical functionality (for example, it cannot fit variograms), so what you
really need is a combination of SAGA and R!
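As an illustration of the R side of that combination, here is a minimal method-of-moments (Matheron) empirical semivariogram in base R, computed on a random subset. In practice you would use gstat's `variogram()` and `fit.variogram()` and pass the fitted nugget, sill and range on to SAGA; the helper `emp_vgm` and the simulated data are hypothetical.

```r
# Method-of-moments (Matheron) empirical semivariogram:
# gamma(h) = sum over pairs in distance class h of (z_i - z_j)^2 / (2 * N(h))
emp_vgm <- function(x, y, z, width, cutoff) {
  d    <- as.matrix(dist(cbind(x, y)))          # pairwise distances
  sq   <- outer(z, z, function(a, b) (a - b)^2)  # squared differences
  keep <- upper.tri(d) & d <= cutoff             # each pair once, within cutoff
  bin  <- pmax(ceiling(d[keep] / width), 1)      # distance class (coincident points -> class 1)
  nb   <- ceiling(cutoff / width)
  np   <- tabulate(bin, nbins = nb)              # N(h): pairs per class
  ssq  <- tapply(sq[keep], factor(bin, levels = seq_len(nb)), sum)
  # empty classes give NA
  data.frame(dist_mid = (seq_len(nb) - 0.5) * width,
             np       = np,
             gamma    = as.numeric(ssq) / (2 * np))
}

# Hypothetical data: 157k simulated points, but only a random 1000 enter
# the quadratic pair computation
set.seed(7)
n <- 157000
x <- runif(n, 0, 10000)
y <- runif(n, 0, 10000)
z <- sin(x / 2000) + cos(y / 2000) + rnorm(n, sd = 0.3)
sel <- sample(n, 1000)

v <- emp_vgm(x[sel], y[sel], z[sel], width = 500, cutoff = 5000)
print(v)
```

Running the full 157k points through the same function would need a 157k by 157k distance matrix, which is exactly why the subset (or SAGA) is needed.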

Here are more examples:
http://geomorphometry.org/view_scripts.asp?id=24 

HTH,

T. Hengl
http://spatial-analyst.net