How does variogramST handle NAs?

3 messages · Nick Hamm, Edzer Pebesma

#
Dear list

We have been working with an STFDF object.  The dataset contains
several NAs, that is to say that there are some locations where the
time-series record is incomplete.  When we computed space-time
variograms for this dataset we found some unexpected results.

To explore this further, we took a second dataset of measurements from
the same location.  This dataset is complete (i.e., there are no NAs).
We computed the space-time variogram.  We then inserted NAs at the
positions where they occur in the first dataset and got quite different
variograms!

My question is:  How does variogramST handle NAs?

Demo code is below.  You can download the data from here:

https://dl.dropbox.com/u/15122401/demoData.zip

Nick


sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] gstat_1.0-15    spacetime_1.0-3 sp_1.0-5

loaded via a namespace (and not attached):
[1] grid_2.15.2      intervals_0.13.3 lattice_0.20-10  tools_2.15.2
[5] xts_0.9-3        zoo_1.7-9
# Load required libraries
require(gstat)
require(spacetime)

# Load the demo data
rm(list=ls())
load("demoData.RData")

# jan.na has some NAs in time
summary(jan.na$value)
# LE is at the same location but has no NAs
summary(LE$value)

# ST variogram for LE - full dataset
v.all <- variogramST(value~1, LE, width=20000, cutoff=500000,tlags=0:6)
plot(v.all, ylab="time lag (days)", xlab="distance (km)")
plot(v.all, map=FALSE, ylab="time lag (days)", xlab="distance (km)")


# Now insert NAs into the full dataset.  The NAs correspond to those
# in the "jan.na" dataset
is.missing <- is.na(jan.na$value)
LE.na  <- LE
LE.na$value[is.missing] <- NA

# ST variogram - dataset with NA's inserted
v.na <- variogramST(value~1, LE.na, width=20000, cutoff=500000,tlags=0:6)
plot(v.na, ylab="time lag (days)", xlab="distance (km)")
plot(v.na, map=FALSE, ylab="time lag (days)", xlab="distance (km)")



# Keep only the locations whose jan.na time series has no NAs at all
complete.loc <- which(apply(as(jan.na, "xts"), 2, function(x) all(!is.na(x))))
LE.no <- LE[complete.loc,]

# ST variogram - remove all locations where there is an NA
v.any <- variogramST(value~1, LE.no, width=20000, cutoff=500000,tlags=0:6)
plot(v.any, ylab="time lag (days)", xlab="distance (km)")
plot(v.any, map=FALSE, ylab="time lag (days)", xlab="distance (km)")
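
As a point of comparison for the three variograms above, here is a minimal base-R sketch of what a pairwise-complete estimator does with NAs: a pair contributes to a lag only if both of its values are present. It is an illustration on synthetic data, not gstat's implementation, and it says nothing about what variogramST does internally. The helper name semivar_lag and the series z and z.na are hypothetical.

```r
# Empirical temporal semivariance at lag h, using only pairs in which
# both observations are present. Hypothetical helper, not part of gstat.
semivar_lag <- function(z, h) {
  n <- length(z)
  if (h < 1 || h >= n) return(c(np = 0, gamma = NA_real_))
  d  <- z[(1 + h):n] - z[1:(n - h)]   # differences between points h steps apart
  ok <- !is.na(d)                     # NA if either member of the pair is NA
  c(np = sum(ok),
    gamma = if (any(ok)) mean(d[ok]^2) / 2 else NA_real_)
}

set.seed(1)
z    <- cumsum(rnorm(200))            # synthetic, temporally correlated series
z.na <- z
z.na[sample(200, 40)] <- NA           # remove 20% of the record at random

# Rows "np" and "gamma", one column per time lag 1..6
full <- sapply(1:6, function(h) semivar_lag(z, h))
gaps <- sapply(1:6, function(h) semivar_lag(z.na, h))

# Compare the two estimates and the number of pairs behind each
rbind(gamma.full = full["gamma", ], gamma.gaps = gaps["gamma", ],
      np.full    = full["np", ],    np.gaps    = gaps["np", ])
```

With randomly placed gaps, an estimator of this kind loses pairs but would not be expected to reverse the trend of the variogram with lag, which is why a change in the sign of the trend looks suspicious.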
#
Dear Nick, did you use the r-forge binaries? This was suggested in

https://stat.ethz.ch/pipermail/r-sig-geo/2013-January/017234.html

Please let me know if your unexpected results persist, and also tell us 
what was unexpected.
On 02/03/2013 01:40 PM, Nick Hamm wrote:
#
Dear Edzer

I was using the CRAN binaries (gstat version 1.0-15). The unexpected
result was that the variograms were totally different depending on
whether the NAs were included or not (the variogram decreased with
increasing temporal lag when NAs were excluded but increased when they
were included).

I installed gstat from R-forge (version 1.0-16).  I now get consistent
variograms.  I will look further at this next week.
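
For anyone re-running the demo, a small sketch of how the installed gstat version could be checked against the one that gave consistent results here (1.0-16). The variable name min.version is only illustrative, and the check assumes that later versions keep the corrected behaviour.

```r
# Version of gstat reported in this thread to give consistent variograms
min.version <- "1.0-16"

if (requireNamespace("gstat", quietly = TRUE)) {
  installed <- packageVersion("gstat")
  if (installed < min.version) {
    message("gstat ", format(installed), " is older than ", min.version,
            "; consider updating before re-running variogramST()")
  } else {
    message("gstat ", format(installed), " is at least ", min.version)
  }
} else {
  message("gstat is not installed")
}
```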


Nick
On 3 February 2013 17:17, Edzer Pebesma <edzer.pebesma at uni-muenster.de> wrote: