Skip to content

GSTAT - measurement vs microscale variation

7 messages · Zev Ross, Edzer Pebesma, Sam Field

#
Hi All,

I folded this question into a previous post, but I think it may have gotten
missed. Just wondering if someone could tell me how, in GSTAT, one would specify
the nugget as measurement error vs microscale variation in kriging. I have
multiple measurements at the same location and I'd like to use these to
determine the measurement error. I've figured out how to do this in geoR, but as
most of my scripts are written in R using GSTAT, I'd rather use that.

Thank you! Zev
#
Zev,

you can use the "Err" variogram model to denote micro variation as 
opposed to nugget. The only effect it has is that for a new prediction 
on an observation location the measurement error-free process is 
predicted, and not the observation process itself. Semivariance of an 
observation with itself remains zero, so it doesn't allow for duplicate 
observations. In terms of predictions, it is "as if" you predict right 
next to a prediction location in case the prediction location coincides 
with an observation location (implying that the predicted surface is 
continuous); in terms of prediction variance, it is "as if" you predict 
for a very small block, meaning the nugget is removed from the 
prediction variance.

Below is an example for the meuse data set.
--
Edzer

 > library(gstat)
Loading required package: sp
 > data(meuse)
 > meuse0 = meuse
 > coordinates(meuse) = ~x+y
 > # prediction at observation location:
 > krige(log(zinc)~1,meuse,meuse[1,],vgm(.5, "Exp",300,.5))
[using ordinary kriging]
       coordinates var1.pred var1.var
1 (181072, 333611)  6.929517        0
 > krige(log(zinc)~1,meuse,meuse[1,],vgm(.5, 
"Exp",300,add.to=vgm(.5,"Err",0)))
[using ordinary kriging]
       coordinates var1.pred  var1.var
1 (181072, 333611)   6.57884 0.1801634
 > cc = coordinates(meuse)
 > cc[1,] = cc[1,]+0.01 # 1 cm shift on a 5 km region
 > coordinates(meuse0)=cc
 > krige(log(zinc)~1,meuse,meuse[1,],vgm(.5, "Exp",300,.5))
[using ordinary kriging]
       coordinates var1.pred var1.var
1 (181072, 333611)  6.929517        0
 > krige(log(zinc)~1,meuse,meuse0[1,],vgm(.5, "Exp",300,.5))
[using ordinary kriging]
       coordinates var1.pred var1.var
1 (181072, 333611)  6.578803 0.680188
 > krige(log(zinc)~1,meuse,meuse0[1,],vgm(.5, 
"Exp",300,add.to=vgm(.5,"Err",0)))
[using ordinary kriging]
       coordinates var1.pred  var1.var
1 (181072, 333611)  6.578803 0.1801880 # same prediction, variance 0.5 less
 > krige(log(zinc)~1,meuse,meuse[1,],vgm(.5, 
"Exp",300,.5),block=c(0.01,0.01))
[using ordinary kriging]
       coordinates var1.pred  var1.var
1 (181072, 333611)  6.578836 0.1801594
Zev Ross wrote:
#
Hi Edzer,

Very useful, thank you. You might be able to tell from my posts that I'm 
running these in parallel in GSTAT and geoR and comparing. It seems from 
your note and example that it might simply be easier to add a centimeter 
(provided a centimeter doesn't matter in the real world) to all the 
coordinates. This way one would not need to keep track of the different 
variances at coincident locations. Do you think this would be an 
acceptable approach?

In terms of your coding, I'm a little uncertain about the meaning of the 
add.to argument and how one might code, for example, half micro-scale 
and half-measurement error. For the example you give, if there was a 
nugget of 0.1 (half micro and half measurement) does my coding below 
look correct? Why is fit.variogram not fitting on the model with error 
-- it's not fitting any of the models I try with measurement error?

Zev

myvgmA<-vgm(.5, "Exp",300, nugget=0.1)
myvgmB<-vgm(.5, "Exp",300,nugget=0.05, add.to=vgm(.05,"Err",0))

fit.variogram(variogram(log(zinc)~1,meuse),model=myvgmA)
  model     psill    range
1   Nug 0.0000000   0.0000
2   Exp 0.7186526 449.7581


fit.variogram(variogram(log(zinc)~1,meuse),model=myvgmB)
  model psill range
1   Err  0.05     0
2   Nug  0.05     0
3   Exp  0.50   300
Edzer J. Pebesma wrote:

  
    
#
Zev Ross wrote:
Yes
Because both effects are collinear in the fit.

I will need to look a bit closer (meaning, in the code), as it surprised 
me that fitting a "Err" effect with no nugget did not work; I think it 
should.

You can find out fit success (to some degree) by looking at the 
"singular" attribute:

 > myvgmB<-vgm(.5, "Exp",300, add.to=vgm(.05,"Err",0))
 > fit.variogram(variogram(log(zinc)~1,meuse),model=myvgmB)
  model psill range
1   Err  0.05     0
2   Exp  0.50   300
 > x = fit.variogram(variogram(log(zinc)~1,meuse),model=myvgmB)
 > attributes(x)
$names
[1] "model" "psill" "range" "kappa" "ang1"  "ang2"  "ang3"  "anis1" "anis2"

$row.names
[1] 1 2

$class
[1] "variogramModel" "data.frame"   

$singular
[1] TRUE

$SSErr
[1] 0.0001155682

 > attr(x, "singular")
[1] TRUE
--
Edzer
#
Edzer,

Thanks again. If both effects (microscale variation and measurement 
error) are collinear in the fit should one not try to include both? 
Perhaps it's simply better to jitter by a centimeter. And by the way, 
I'm doing universal kriging with the krige function, is there a way to 
get out the beta coefficients? As in:

krige(formula=no2.ppb~traffic, mydata, newdata=mynewdata,model=vgmRaw)

I'd like to get the WLS or GLS estimate of traffic.

This back and forth is asking a lot of your time, I really appreciate 
your quick replies. Thanks!

Zev
Edzer J. Pebesma wrote:

  
    
#
You can.

It is a bit of a hack, but predict.gstat() has a BLUE=TRUE optional 
argument, to return the trend component only rather than trend + 
predicted residual as kriging does. Then, if you specify the predictor 
values for the new location as c(1,0,0,... etc), you get out the trend 
coefficient estimate for that location (location matters if you use 
local search neighbourhoods).
--
Edzer
Zev Ross wrote:
I can't follow you here. Collinear means: unable to fit both. Like 
fitting two intercepts instead of one.
#
List,

When the influence of explanatory variables "spills over" into adjacent or
proximate spatial units, one way to model this would be to include a spatially
lagged explanatory variable (WX). If there exists a significant spatially lagged
association, then (it would seem to me) the influence of X would be biased if it
is correlated with WX (which it would be if X was non_randomly distributed in
space). In other words, the effect of X is confounded with WX if the two are
correlated AND both have independent impacts on the outcome.  It would seem that
a properly specified model would include both the effects of X and WX.  One
potential problem is that X and WX maybe highly correlated leading to
instability in the estimation of their independent effects.  It seems a
solution, analogous to what is often done in multi-level models, is to center X
on its spatial average, WX.  Thus,

yhat = b0 + b1(X - WX) + b2(WX).

where the influence of WX is now a function of two parameters: (b2-b1)WX and the
null H0:b2-b1 = 0

Is there a reason not to do this with spatially lagged explanatory variables? 
Is there any literature on this?  I have an empirical example in which the
results from centering versus non centering differ dramatically, so I want to
make sure that the situation is analogous to the multi-level case before
proceeding.  I could do some simulation, but I thought I would ask the list first.



thanks!



Sam