I'm trying to perform regression kriging with the gstat package on my soil sample data. The response variable (ECe measurements) and the covariates appear positively skewed. In his "framework for RK" [1], Tomislav Hengl proposes a logit (logistic) transformation as a generic way to reduce skewness by using the physical limits of the data. Is it really a transformation that can be applied to skewed data in general? In my case I have non-normal residuals (from the regression on the original data), and I'm trying to transform the residuals (not the original values) in order to do simple kriging (SK) on them. Is this approach correct?

A related question is how to do a normal score transformation (of my residuals) in R and gstat. I know gstat doesn't handle transformations and back-transformations, so they have to be done beforehand in R... but I can't find any package that does this in a straightforward way. I've found something with qqnorm(ppoints(data)) and the approx() function. Is that all?

Giovanni

[1] "A generic framework for spatial prediction of soil variables based on regression-kriging", Geoderma 122 (1-2), 75-93.
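The qqnorm()/ppoints()/approx() route mentioned in the question can be written as a pair of small functions. This is a minimal sketch, not a gstat feature: nscore() and backtr() are hypothetical helper names (borrowed from the GSLIB program names), and the back-transformation here is plain linear interpolation between the sampled values, with anything outside the data range clamped to the minimum or maximum (GSLIB's backtr program offers tail extrapolation models instead). The same functions apply unchanged to a vector of regression residuals. For reference, qqnorm(x, plot.it = FALSE)$x returns the same scores as nscore(x)$score.

```r
# Normal score transform: replace each value by the standard normal quantile
# of its plotting position (the same scores qqnorm() puts on its x axis).
nscore <- function(x) {
  p <- ppoints(length(x))[rank(x, ties.method = "first")]
  score <- qnorm(p)
  # Lookup table pairing sorted data with sorted scores, kept for back-transformation
  list(score = score, table = data.frame(x = sort(x), score = sort(score)))
}

# Back-transform normal scores (e.g. kriged scores) by linear interpolation in
# the lookup table; rule = 2 clamps values outside the table to the extreme data.
backtr <- function(score, table) {
  approx(table$score, table$x, xout = score, rule = 2)$y
}

# Usage with hypothetical positively skewed values (standing in for ECe or residuals)
set.seed(1)
ece <- rlnorm(50, meanlog = 1, sdlog = 0.8)
ns <- nscore(ece)
summary(ns$score)                 # roughly symmetric around 0
back <- backtr(ns$score, ns$table)
all.equal(back, ece)              # TRUE: the sample values are recovered
```

In a kriging workflow the scores would be kriged with gstat and the predictions passed through backtr(); interpolated values between two samples fall on straight-line segments of the lookup table.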
regression kriging in gstat with skewed distributions
3 messages · G. Allegri, Tomislav Hengl
Dear Giovanni,

A logit transformation can be applied automatically to any variable that has lower and upper physical limits (e.g. 0-100%). In R, you can transform a variable to logits with, e.g.:
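library(foreign)  # provides read.dbf()
points = read.dbf("points.dbf")
# logit of the sand fraction: log(p/(1-p)) with p = SAND/100
points$SANDt = log((points$SAND/100)/(1-(points$SAND/100)))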
After you interpolate your variable, you can back-transform the values by using:
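library(gstat)  # krige(); fsand, sel, SPC and sand.rvgm are objects from the author's own workspace
SAND.rk = krige(fsand$call$formula, points[sel,], SPC, sand.rvgm)
# inverse logit of the predictions, rescaled to 0-100 %
SAND.rk$pred=exp(SAND.rk$var1.pred)/(1+exp(SAND.rk$var1.pred))*100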
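Since the logit only needs the physical limits of the variable, the two steps above can be generalized to any bounded interval. A minimal sketch, assuming a variable bounded by [lower, upper]: logit_tr() and logit_bt() are hypothetical helper names, and the small offset eps is an assumption added because values lying exactly on a limit would otherwise map to -Inf or +Inf. It uses base R's qlogis() and plogis(), which compute log(p/(1-p)) and exp(y)/(1+exp(y)).

```r
# Logit transform for a variable bounded by physical limits [lower, upper].
# Values exactly at a limit would give +/-Inf, so they are pulled inside by eps.
logit_tr <- function(z, lower = 0, upper = 100, eps = 1e-6) {
  p <- (z - lower) / (upper - lower)
  p <- pmin(pmax(p, eps), 1 - eps)
  qlogis(p)  # log(p / (1 - p))
}

# Inverse: plogis(y) = exp(y) / (1 + exp(y)), rescaled back to [lower, upper]
logit_bt <- function(y, lower = 0, upper = 100) {
  lower + (upper - lower) * plogis(y)
}

# Usage with hypothetical SAND percentages; 100 is clamped, so it returns 99.9999
sand <- c(5, 20, 45, 80, 100)
sandt <- logit_tr(sand)
logit_bt(sandt)
```

With the defaults (0 and 100) this reproduces the SANDt transform and the back-transform above, apart from the eps clamping at the limits.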
The prediction variance cannot be back-transformed, but you can use the normalized prediction variance, obtained by dividing it by the sample variance. See also section 4.2.1 of my lecture notes (http://geostat.pedometrics.org/).

There are many transformations that can be applied to bring your target variable closer to normality (see e.g. http://en.wikipedia.org/wiki/Data_transformation_(statistics) ). The most generic transformation is to work with the probability density function values (see e.g. http://dx.doi.org/10.1016/j.jneumeth.2006.11.004 ); this way you do not have to think about what the histogram looks like at all, but the interpretation of the regression plots becomes rather difficult.

In any case, you should apply the transformation to the target variable itself, because linear regression also requires the residuals to be normally distributed around the regression line.

See also: FITTING DISTRIBUTIONS WITH R (by Vito Ricci) http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf

Tom Hengl
http://spatial-analyst.net

-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of G. Allegri
Sent: Tuesday 15 January 2008 15:28
To: r-sig-geo at stat.math.ethz.ch
Subject: [R-sig-Geo] regression kriging in gstat with skewed distributions
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
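Tom's advice is to transform the target variable before fitting the regression part of RK, and to report a normalized prediction variance. A minimal base-R sketch under stated assumptions: the data are simulated, dem is a hypothetical covariate, norm_var() is a hypothetical helper, "sample variance" is taken to mean the variance of the transformed target (an assumption), and the kriging of the residuals (e.g. with gstat's variogram() and krige()) is left out so the example runs without gstat.

```r
# Hypothetical data: a bounded target (SAND, %) driven by one covariate
set.seed(7)
n <- 100
dem <- runif(n, 0, 500)                                   # hypothetical covariate
sand <- 100 * plogis(-1 + 0.006 * dem + rnorm(n, sd = 0.5))

# 1. Transform the target variable itself (not the residuals)
sandt <- qlogis(sand / 100)

# 2. Fit the regression (trend) part of RK on the transformed scale
fit <- lm(sandt ~ dem)

# 3. These residuals are what the variogram / kriging step would use
res <- residuals(fit)
shapiro.test(res)          # rough check of residual normality

# 4. The RK prediction on the logit scale is trend + kriged residual; here only
#    the trend at the sample points is back-transformed to percent
sand_trend <- 100 * plogis(fitted(fit))

# 5. Normalized prediction variance: kriging variance on the transformed scale
#    divided by the sample variance of the transformed target (assumed meaning)
norm_var <- function(pred_var, zt) pred_var / var(zt)
```

A normalized variance near 1 would mean the model explains little more than the overall spread of the data; values near 0 indicate precise predictions.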
Thank you Tomislav. I will try the logit transformation, but it would be interesting to compare the results with those of a normal score transformation. Wouldn't that be better suited as a generic transformation method? I know that GSLIB already handles it, but I don't know how to do it in R. qqnorm(ppoints(my_data)) seems to do the transformation, but the back-transformation is not documented.

Giovanni

On 2008/1/16, Tomislav Hengl <hengl at science.uva.nl> wrote: