question about regression kriging
Edzer, I do not want to get into too much discussion (this is definitively not a place to discuss basic mathematical/statistical theories). My experience is - things are complicated when trying to run regression-kriging with 0/1 variables. In practice, I guess you can fit GLM model, then estimate the variogram for residuals. Imagine that we have a point map showing binary observations (0/1) and a set of raster maps PC1-PC5 (pc.comps). Then, we can make predictions using glms and your package gstat: # regression (GLM) modelling:
occurrences.absences$pr = ifelse(is.na(occurrences.absences$value), 0, 1) vtbin.lm = glm(pr~PC1+PC2+PC3+PC4+PC5, binomial(link=logit), occurrences.absences) vtbin.lm$residual = vtbin.lm$model$occurrence - vtbin.lm$fitted.values
# variogram modelling (original scale):
vt.rebin = variogram(vtbin.lm$residual~1, occurrences.absences) vt.rbvgm = fit.variogram(vt.rebin, vgm(nugget=0.01, model="Exp", range=40000, psill=0.03))
# final predictions (A) regression-kriging:
vt.bin.trend = predict(vtbin.lm, newdata=pc.comps, type="response")
vt.gres = gstat(id=c("occur"), formula=vtbin.lm$residual~1, data=occurrences.absences,
model=vt.rbvgm)
vt.rkbin = predict.gstat(vt.gres, pc.comps, nmax=100, beta=1, BLUE=FALSE) vt.rkbin$pred.bin = vt.bin.trend + vt.rkbin$occur.pred
Now you have the predictions that can exceed 0-1 range, and no estimate of the UK variance (this is the problem David has, I think). What we can do instead is to run UK in gstat on logits first, then back-transform to original 0-1 scale: # Estimate the binary variable at logit-scale (as an average of the output of GLM and original 0/1 value). This is not possible if values are equal to 0/1, so we have to 'smooth' them:
vtbin.lm$prs = (10*occurrences.absences$pr+vtbin.lm$fitted.values)/11
# Now the values can be back-transformed to logit scale
occurrences.absences$logits = log((vtbin.lm$prs)/(1-(vtbin.lm$prs)))
# now fit variogram for residuals (logits):
vt.relogit = variogram(vtbin.lm$residuals~1, occurrences.absences) vt.logitvgm = fit.variogram(vt.relogit, vgm(nugget=0, model="Exp", range=40000, psill=4))
# final predictions (B) UK in gstat:
vt.prs = gstat(id=c("occur"), formula=logits~PC1+PC2+PC3+PC4+PC5, data=occurrences.absences,
model=vt.logitvgm)
vt.rkprs = predict.gstat(vt.prs, pc.comps, nmax=100, beta=1, BLUE=FALSE)
# Back-transform (-20 to 20 values) to original scale:
vt.rkprs$pr = exp(vt.rkprs$occur.pred)/(1+exp(vt.rkprs$occur.pred))
Obviously, predictions with (B) are more attractive than predictions with (A), because the values are within the 0-1 range and we also have the UK variance (but this is hard to interpret). Then, one can standardize the prediction variance using the global variance so that you can do some interpretation:
vt.rkprs$nvar = vt.rkprs$occur.var/var(occurrences.absences$logits)
Of course, (B) is only a short-cut solution. One really needs to develop a sound methodology to solve such problems (we at least offer a detailed discussion in http://dx.doi.org/10.1016/j.geoderma.2007.04.022). PS: How do you back-transform the GLM prediction variance to original scale (I was not aware of this, apologies)? A reference would do. Tom -----Original Message----- From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] Sent: woensdag 9 april 2008 10:49 To: Tomislav Hengl Cc: 'David Maxwell (Cefas)'; r-sig-geo at stat.math.ethz.ch Subject: Re: [R-sig-Geo] question about regression kriging Tom, I'm afraid things are harder than you sketch. In glm's, the parameter estimation is done using iteratively reweighted least squares, where the weights depend on a variance function that links the variance of observations to the mean. So, observations (residuals) are assumed to be unstationary, in principle, and because of the mean-dependency this changes over the iterations. The equations and references you mention afaik all assume a known, and fixed variogram, and one-step solutions, no iteration. Also, you falsly accuse me of claiming one cannot back-transform prediction variances. I did not claim this (I have seen suggestions on how to do this), I just asked how David would do this. -- Edzer
Tomislav Hengl wrote:
The two components of the regression-kriging model are not independent, hence you are doing a
wrong
thing if you are just summing them. You should use instead the universal kriging variance that is derived in gstat. The complete derivation of the Universal kriging variance is available in
Cressie
(1993; p.154), or even better Papritz and Stein (1999; p.94). See also pages 7-8 of our technical note: Hengl T., Heuvelink G.B.M. and Stein A., 2003. Comparison of kriging with external drift and regression-kriging. Technical report, International Institute for Geo-information Science and
Earth
Observation (ITC), Enschede, pp. 18. http://www.itc.nl/library/Papers_2003/misca/hengl_comparison.pdf Edzer is right, you can not back-transform prediction variance of the transformed variable
(logits).
However, you can standardize/normalize the UK variance by diving it with global variance (see e.g. http://dx.doi.org/10.1016/j.geoderma.2003.08.018), so that you can evaluate the success of prediction in relative terms (see also http://spatial-analyst.net/visualization.php). Tom Hengl http://spatial-analyst.net -----Original Message----- From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf
Of
Edzer Pebesma Sent: dinsdag 8 april 2008 20:50 To: David Maxwell (Cefas) Cc: r-sig-geo at stat.math.ethz.ch Subject: Re: [R-sig-Geo] question about regression kriging David Maxwell (Cefas) wrote:
Hi, Tom and Thierry, Thank you for your advice, the lecture notes are very useful. We will try
geoRglm
but for now regression kriging using the working residuals gives sensible answers even though
there
are some issues with using working residuals, i.e. not Normally distributed, occasional very large values and inv.logit(prediction type="link" + working residual) doesn't quite give the observed values.
Our final question about this is how to estimate standard errors for the regression kriging
predictions of the binary variable?
On the logit scale we are using
rk prediction (s0) = glm prediction (s0) + kriged residual prediction (s0)
for location s0
Is assuming independence of the two components adequate?
var rk(s0) ~= var glm prediction (s0) + var kriged residual prediction (s0)
In principle, no. The extreme case is prediction at observation locations, where the correlation is -1 so that the final prediction variance becomes zero. I never got to looking how large the correlation is otherwise, but that shouldn't be hard to do in the linear case, as you can get the first and second separately, and also the combined using universal kriging. Another question: how do you transform this variance back to the observation scale? -- Edzer
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo