Skip to content

question about regression kriging

12 messages · David Maxwell (Cefas), ONKELINX, Thierry, Tomislav Hengl +1 more

#
Hi,

Tom and Thierry, Thank you for your advice, the lecture notes are very useful. We will try geoRglm but for now regression kriging using the working residuals gives sensible answers even though there are some issues with using working residuals, i.e. not Normally distributed, occasional very large values and inv.logit(prediction type="link" + working residual) doesn't quite give the observed values.

Our final question about this is how to estimate standard errors for the regression kriging predictions of the binary variable?

On the logit scale we are using
 rk prediction (s0) = glm prediction (s0) + kriged residual prediction (s0) 
for location s0

Is assuming independence of the two components adequate?
 var rk(s0) ~= var glm prediction (s0) + var kriged residual prediction (s0) 

Thanks again,
David Maxwell & Vanessa Stelzenm?ller

david.maxwell at cefas.co.uk

-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of Tomislav Hengl
Sent: 03 April 2008 09:04
To: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging


I completely agree with Thierry.

Take a look at this also:
https://stat.ethz.ch/pipermail/r-sig-geo/2008-February/003176.html 

The instructions on how to run RK with binary variables in R you can find in sec 4.3.3 (Fig. 4.15)
of my lecture notes.

Hengl, T., 2007. A Practical Guide to Geostatistical Mapping of Environmental Variables. EUR 22904
EN Scientific and Technical Research series, Office for Official Publications of the European
Communities, Luxemburg, 143 pp.
http://bookshop.europa.eu/uri?target=EUB:NOTICE:LBNA22904:EN:HTML 


Tom Hengl
http://spatial-analyst.net 


-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of
ONKELINX, Thierry
Sent: woensdag 2 april 2008 11:56
To: Vanessa Stelzenmuller (Cefas); r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging

Dear Vanessa,

What residuals did you use? The ones in the original scale or in the logit scale? Interpolate the
residuals in the logit scale and add these to the model predictions in the logit scale. And the
transform those values back to the original scale. This will prevent values outside the 0-1 range.

Maybe you should have a loot at the geoRglm package.

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality
assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more than asking him to perform a
post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable
answer can be extracted from a given body of data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] Namens Vanessa
Stelzenmuller (Cefas)
Verzonden: woensdag 2 april 2008 11:32
Aan: r-sig-geo at stat.math.ethz.ch
Onderwerp: [R-sig-Geo] question about regression kriging

Hello,



We work on the application of regression kriging to presence / absence data in the context of
species distribution modelling. In R in a first step we fit the trend surfaces with logistic
regression models. Then we fit a variogram to the regression residuals and interpolate the residuals
with OK. Now we face the situation that when combining trend surfaces with residual surfaces for
some locations our occurrence probability is <0 or >1. Thus taking into account the spatial
structure of the data (residuals)  has the potential to convert a predicted high occurrence
probability into a low occurrence probability or vice versa. Are there some restriction for
presence/ absence data for this approach? How to deal with these estimations (<0 and >1)?



Many thanks

Vanessa  





________________________________

Dr. Vanessa Stelzenm?ller

Marine Scientist (GIS), CEFAS

Pakefield Road, Lowestoft, NR33 0HT, UK



Tel.: +44 (0)1502 527779



www.cefas.co.uk





***********************************************************************************
This email and any attachments are intended for the =\...{{dropped:20}}
#
David Maxwell (Cefas) wrote:
In principle, no. The extreme case is prediction at observation 
locations, where the correlation is -1 so that the final prediction 
variance becomes zero. I never got to looking how large the correlation 
is otherwise, but that shouldn't be hard to do in the linear case, as 
you can get the first and second separately, and also the combined using 
universal kriging.

Another question: how do you transform this variance back to the 
observation scale?
--
Edzer
#
The two components of the regression-kriging model are not independent, hence you are doing a wrong
thing if you are just summing them. You should use instead the universal kriging variance that is
derived in gstat. The complete derivation of the Universal kriging variance is available in Cressie
(1993; p.154), or even better Papritz and Stein (1999; p.94). See also pages 7-8 of our technical
note:

Hengl T., Heuvelink G.B.M. and Stein A., 2003. Comparison of kriging with external drift and
regression-kriging. Technical report, International Institute for Geo-information Science and Earth
Observation (ITC), Enschede, pp. 18.
http://www.itc.nl/library/Papers_2003/misca/hengl_comparison.pdf

Edzer is right, you can not back-transform prediction variance of the transformed variable (logits).
However, you can standardize/normalize the UK variance by diving it with global variance (see e.g.
http://dx.doi.org/10.1016/j.geoderma.2003.08.018), so that you can evaluate the success of
prediction in relative terms (see also http://spatial-analyst.net/visualization.php).


Tom Hengl
http://spatial-analyst.net 


-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of
Edzer Pebesma
Sent: dinsdag 8 april 2008 20:50
To: David Maxwell (Cefas)
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging
David Maxwell (Cefas) wrote:
but for now regression kriging using the working residuals gives sensible answers even though there
are some issues with using working residuals, i.e. not Normally distributed, occasional very large
values and inv.logit(prediction type="link" + working residual) doesn't quite give the observed
values.
predictions of the binary variable?
In principle, no. The extreme case is prediction at observation 
locations, where the correlation is -1 so that the final prediction 
variance becomes zero. I never got to looking how large the correlation 
is otherwise, but that shouldn't be hard to do in the linear case, as 
you can get the first and second separately, and also the combined using 
universal kriging.

Another question: how do you transform this variance back to the 
observation scale?
--
Edzer

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
#
Dear David,

An other option would be to use sequential gaussian simulation. That
will allow to calculate confidence intervals in the logit scale. These
can be back-transformed into the original scale because the logit
transformation is monotone. 

HTH,

Thierry

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: r-sig-geo-bounces at stat.math.ethz.ch
[mailto:r-sig-geo-bounces at stat.math.ethz.ch] Namens Edzer Pebesma
Verzonden: dinsdag 8 april 2008 20:50
Aan: David Maxwell (Cefas)
CC: r-sig-geo at stat.math.ethz.ch
Onderwerp: Re: [R-sig-Geo] question about regression kriging
David Maxwell (Cefas) wrote:
useful. We will try geoRglm but for now regression kriging using the
working residuals gives sensible answers even though there are some
issues with using working residuals, i.e. not Normally distributed,
occasional very large values and inv.logit(prediction type="link" +
working residual) doesn't quite give the observed values.
the regression kriging predictions of the binary variable?
(s0)
prediction (s0)
In principle, no. The extreme case is prediction at observation 
locations, where the correlation is -1 so that the final prediction 
variance becomes zero. I never got to looking how large the correlation 
is otherwise, but that shouldn't be hard to do in the linear case, as 
you can get the first and second separately, and also the combined using

universal kriging.

Another question: how do you transform this variance back to the 
observation scale?
--
Edzer

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
#
Tom,

I'm afraid things are harder than you sketch. In glm's, the parameter 
estimation is done using iteratively reweighted least squares, where the 
weights depend on a variance function that links the variance of 
observations to the mean. So, observations (residuals) are assumed to be 
unstationary, in principle, and because of the mean-dependency this 
changes over the iterations. The equations and references you mention 
afaik all assume a known, and fixed variogram, and one-step solutions, 
no iteration.

Also, you falsly accuse me of claiming one cannot back-transform 
prediction variances. I did not claim this (I have seen suggestions on 
how to do this), I just asked how David would do this.
--
Edzer
Tomislav Hengl wrote:
#
Thierry,

how would you setup a Gaussian simulation such that the end result is 
different from the case where these confidence intervals were directly 
computed from the kriged mean and variance and a Gaussian assumption on 
the errors?
--
Edzer
ONKELINX, Thierry wrote:
#
Edzer,

One assumes a normal distribution of the predictions in every point when
calculating a confidence interval based on the kriged mean and variance.
AFAIK sequential Gaussian simulation doesn't have to yield normally
distributed predictions per point. Therefore can the confidence
intervals based on SGS differ from those based on the kriged mean and
variance. Or am I missing something?

Thierry


------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium 
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be 
www.inbo.be 

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Oorspronkelijk bericht-----
Van: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] 
Verzonden: woensdag 9 april 2008 10:57
Aan: ONKELINX, Thierry
CC: David Maxwell (Cefas); r-sig-geo at stat.math.ethz.ch
Onderwerp: Re: [R-sig-Geo] question about regression kriging

Thierry,

how would you setup a Gaussian simulation such that the end result is 
different from the case where these confidence intervals were directly 
computed from the kriged mean and variance and a Gaussian assumption on 
the errors?
--
Edzer
ONKELINX, Thierry wrote:
------------------------------------------------------------------------
more
to
not
very
prediction
correlation
using
#
Thierry, this is one of the frequent misconceptions in geostatistics.

Sequential Gaussian simulation yields for each point a normal 
distribution, with mean and standard deviation equal (in the limit of 
very many simulations) to the kriged mean and standard deviation.
--
Edzer
ONKELINX, Thierry wrote:
#
Hi,
Both Edzer's example (extreme case when prediction is at observation locations) and Tom's tech report (pg8 "note that the Eq.(19) looks very much like (17), except it will give slightly lower values") suggest that assuming independence of the two variances will give values that are too large, if so this is useful to know.

For the follow-up question: how to present the prediction uncertainty? I would follow the usual approach for a binary glm, calculate a confidence interval on the logit scale, then back-transform the limits to the 0,1 scale. If space to present mapped outputs is limited I plan to calculate the width of the confidence interval on the 0,1 scale and map this. 

Thanks again, this list is an excellent catalyst for learning
David
david.maxwell at cefas.co.uk

-----Original Message-----
From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de]
Sent: 09 April 2008 09:49
To: Tomislav Hengl
Cc: David Maxwell (Cefas); r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging


Tom,

I'm afraid things are harder than you sketch. In glm's, the parameter 
estimation is done using iteratively reweighted least squares, where the 
weights depend on a variance function that links the variance of 
observations to the mean. So, observations (residuals) are assumed to be 
unstationary, in principle, and because of the mean-dependency this 
changes over the iterations. The equations and references you mention 
afaik all assume a known, and fixed variogram, and one-step solutions, 
no iteration.

Also, you falsly accuse me of claiming one cannot back-transform 
prediction variances. I did not claim this (I have seen suggestions on 
how to do this), I just asked how David would do this.
--
Edzer
Tomislav Hengl wrote:
***********************************************************************************
This email and any attachments are intended for the name...{{dropped:10}}
#
Edzer,

I do not want to get into too much discussion (this is definitively not a place to discuss basic
mathematical/statistical theories). My experience is - things are complicated when trying to run
regression-kriging with 0/1 variables. 

In practice, I guess you can fit GLM model, then estimate the variogram for residuals. Imagine that
we have a point map showing binary observations (0/1) and a set of raster maps PC1-PC5 (pc.comps).
Then, we can make predictions using glms and your package gstat:

# regression (GLM) modelling:
# variogram modelling (original scale):
# final predictions (A) regression-kriging:
model=vt.rbvgm)
Now you have the predictions that can exceed 0-1 range, and no estimate of the UK variance (this is
the problem David has, I think). What we can do instead is to run UK in gstat on logits first, then
back-transform to original 0-1 scale: 

# Estimate the binary variable at logit-scale (as an average of the output of GLM and original 0/1
value). This is not possible if values are equal to 0/1, so we have to 'smooth' them:
# Now the values can be back-transformed to logit scale
# now fit variogram for residuals (logits):
# final predictions (B) UK in gstat:
model=vt.logitvgm)
# Back-transform (-20 to 20 values) to original scale:
Obviously, predictions with (B) are more attractive than predictions with (A), because the values
are within the 0-1 range and we also have the UK variance (but this is hard to interpret). Then, one
can standardize the prediction variance using the global variance so that you can do some
interpretation:
Of course, (B) is only a short-cut solution. One really needs to develop a sound methodology to
solve such problems (we at least offer a detailed discussion in
http://dx.doi.org/10.1016/j.geoderma.2007.04.022).


PS: How do you back-transform the GLM prediction variance to original scale (I was not aware of
this, apologies)? A reference would do.

Tom


-----Original Message-----
From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] 
Sent: woensdag 9 april 2008 10:49
To: Tomislav Hengl
Cc: 'David Maxwell (Cefas)'; r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging

Tom,

I'm afraid things are harder than you sketch. In glm's, the parameter 
estimation is done using iteratively reweighted least squares, where the 
weights depend on a variance function that links the variance of 
observations to the mean. So, observations (residuals) are assumed to be 
unstationary, in principle, and because of the mean-dependency this 
changes over the iterations. The equations and references you mention 
afaik all assume a known, and fixed variogram, and one-step solutions, 
no iteration.

Also, you falsly accuse me of claiming one cannot back-transform 
prediction variances. I did not claim this (I have seen suggestions on 
how to do this), I just asked how David would do this.
--
Edzer
Tomislav Hengl wrote:
wrong
Cressie
Earth
(logits).
Of
geoRglm
there
#
Tomislav Hengl wrote:
1. The white book, "Statistical Models in S"
2. The source of predict.glm (not for the faint of heart)
3. McCullogh & Nelder (I guess it's in there, but my copy is still in a 
box!)
--
Edzer
#
Sorry, I meant "UK variance of the interpolated logits".

I guess the solution is to either:

(A)
1. Run 100 SG simulations using logits;
2. Back-transform the values to 0-1 scale;
3. Determine the variance per grid node (now in 0-1 scale).

Running simulations could also be rather time-consuming (I often give up running SG simulations in
gstat with multiple raster maps as predictors -- memory limit problems, too time-consuming).

(B)
1. Determine the upper and lower 1-s confidence intervals -- 2 maps (in logit scale);
2. Back-transform the maps and derive the difference in 0-1 scale (estimation error);

But I guess that a method to directly back-transform the UK variance from logit scale to 0-1 scale
does not exist.

Tom Hengl

-----Original Message-----
From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] 
Sent: woensdag 9 april 2008 14:48
To: Tomislav Hengl
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] question about regression kriging
Tomislav Hengl wrote:
1. The white book, "Statistical Models in S"
2. The source of predict.glm (not for the faint of heart)
3. McCullogh & Nelder (I guess it's in there, but my copy is still in a 
box!)
--
Edzer