I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms?are given below.
Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes.?Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).
It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here.
The variograms are
tmp.vgm
[[1]]
? np?????? dist????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.14174537?????? 0?????? 0 PC1
2? 2? 44.742603 6.70989788?????? 0?????? 0 PC1
3? 2? 57.707880 1.76351594?????? 0?????? 0 PC1
4? 4? 59.987678 1.52197310?????? 0?????? 0 PC1
5? 3? 71.512518 1.21348268?????? 0?????? 0 PC1
6? 1? 84.852877 0.05381849?????? 0?????? 0 PC1
7? 1? 97.266495 1.21827622?????? 0?????? 0 PC1
8? 3 112.237133 5.07947925?????? 0?????? 0 PC1
9 18 121.478856 1.93707676?????? 0?????? 0 PC1
[[2]]
? np?????? dist????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.09725079?????? 0?????? 0 PC2
2? 2? 44.742603 0.33598072?????? 0?????? 0 PC2
3? 2? 57.707880 0.39088727?????? 0?????? 0 PC2
4? 4? 59.987678 0.87315735?????? 0?????? 0 PC2
5? 3? 71.512518 0.14944845?????? 0?????? 0 PC2
6? 1? 84.852877 0.19809863?????? 0?????? 0 PC2
7? 1? 97.266495 0.63557814?????? 0?????? 0 PC2
8? 3 112.237133 1.92063948?????? 0?????? 0 PC2
9 18 121.478856 0.65468693?????? 0 ??????0 PC2
[[3]]
? np?????? dist?????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.035250817?????? 0?????? 0 PC3
2? 2? 44.742603 0.105299796?????? 0?????? 0 PC3
3? 2? 57.707880 0.020245674?????? 0?????? 0 PC3
4? 4? 59.987678 0.124159836?????? 0?????? 0 PC3
5? 3? 71.512518 0.008112554?????? 0?????? 0 PC3
6? 1? 84.852877 0.034337591?????? 0?????? 0 PC3
7? 1? 97.266495 0.053879459?????? 0?????? 0 PC3
8? 3 112.237133 0.021922987?????? 0?????? 0 PC3
9 18 121.478856 0.085270969?????? 0?????? 0 PC3
But the fitted models are:
tmp.vgm.fit
[[1]]
? model???? psill range
1?? Nug 0.1483120???? 0
[[2]]
? model????? psill range
1?? Nug 0.09849419???? 0
[[3]]
? model????? psill range
1?? Nug 0.03535234???? 0
John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
?
****Internet Email Confidentiality Footer****
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message (or
responsible for delivery of the message to such person), you may
not copy or deliver this message to anyone. In such case, you
should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to
Internet email for messages of this kind. Opinions, conclusions and
other information in this message that do not relate to the
official business of The Shaw Group Inc. or its subsidiaries shall
be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc.
http://www.shawgrp.com
Problem with gstat variogram estimation
6 messages · Carson, John, ONKELINX, Thierry, Edzer Pebesma +1 more
Dear John,
Isn't your problem rather a result from the unstable empirical variograms? Mainly due to the very low number of pairs per bin. Furthermore have a look at the weights that each bin gets in the fit.variogram() function. The default is N/h^2. In your case the first bin gets a weight of 2 whereas the other bins have weights ranging from 1e-3 tot 1e-4! Hence no surprise that the nugget is nearly identical to the semivariance of the first bin.
In this case I would not trust the results for fit.variogram(). Not because of bugs in gstat, but because I don't trust empirical variogram with that low number of pairs per bin.
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
-----Oorspronkelijk bericht-----
Van: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] Namens Carson, John
Verzonden: maandag 12 oktober 2009 15:35
Aan: r-sig-geo at stat.math.ethz.ch
Onderwerp: [R-sig-Geo] Problem with gstat variogram estimation
I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms?are given below.
Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes.?Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).
It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here.
The variograms are
tmp.vgm
[[1]]
? np?????? dist????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.14174537?????? 0?????? 0 PC1
2? 2? 44.742603 6.70989788?????? 0?????? 0 PC1
3? 2? 57.707880 1.76351594?????? 0?????? 0 PC1
4? 4? 59.987678 1.52197310?????? 0?????? 0 PC1
5? 3? 71.512518 1.21348268?????? 0?????? 0 PC1
6? 1? 84.852877 0.05381849?????? 0?????? 0 PC1
7? 1? 97.266495 1.21827622?????? 0?????? 0 PC1
8? 3 112.237133 5.07947925?????? 0?????? 0 PC1
9 18 121.478856 1.93707676?????? 0?????? 0 PC1
[[2]]
? np?????? dist????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.09725079?????? 0?????? 0 PC2
2? 2? 44.742603 0.33598072?????? 0?????? 0 PC2
3? 2? 57.707880 0.39088727?????? 0?????? 0 PC2
4? 4? 59.987678 0.87315735?????? 0?????? 0 PC2
5? 3? 71.512518 0.14944845?????? 0?????? 0 PC2
6? 1? 84.852877 0.19809863?????? 0?????? 0 PC2
7? 1? 97.266495 0.63557814?????? 0?????? 0 PC2
8? 3 112.237133 1.92063948?????? 0?????? 0 PC2
9 18 121.478856 0.65468693?????? 0 ??????0 PC2
[[3]]
? np?????? dist?????? gamma dir.hor dir.ver? id
1? 4?? 1.414214 0.035250817?????? 0?????? 0 PC3
2? 2? 44.742603 0.105299796?????? 0?????? 0 PC3
3? 2? 57.707880 0.020245674?????? 0?????? 0 PC3
4? 4? 59.987678 0.124159836?????? 0?????? 0 PC3
5? 3? 71.512518 0.008112554?????? 0?????? 0 PC3
6? 1? 84.852877 0.034337591?????? 0?????? 0 PC3
7? 1? 97.266495 0.053879459?????? 0?????? 0 PC3
8? 3 112.237133 0.021922987?????? 0?????? 0 PC3
9 18 121.478856 0.085270969?????? 0?????? 0 PC3
But the fitted models are:
tmp.vgm.fit
[[1]]
? model???? psill range
1?? Nug 0.1483120???? 0
[[2]]
? model????? psill range
1?? Nug 0.09849419???? 0
[[3]]
? model????? psill range
1?? Nug 0.03535234???? 0
John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
?
****Internet Email Confidentiality Footer**** Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of The Shaw Group Inc. or its subsidiaries shall be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc.
http://www.shawgrp.com
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Druk dit bericht a.u.b. niet onnodig af.
Please do not print this message unnecessarily.
Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer
en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is
door een geldig ondertekend document. The views expressed in this message
and any annex are purely those of the writer and may not be regarded as stating
an official position of INBO, as long as the message is not confirmed by a duly
signed document.
John, thanks for sharing this with r-sig-geo. As Thierry mentioned, the default model fitting procedure (fit.variogram in package gstat) uses weighted least squares, with weights proportional to N_h/(h^2). This explains why the first lag gets so much weight. For pure nugget models, this of course makes little sense; for other models it often does. Argument fit.method gives you somewhat more control. Give it value 1 to have N_h weights; give it value 6 to do unweighted averaging (I agree that this information should be in the fit.variogram documentation). The SSErr values will be uncomparable accross different weighting schemes, as you might expect. -- Edzer
Carson, John wrote:
I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.
Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).
It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here.
The variograms are
tmp.vgm
[[1]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.14174537 0 0 PC1
2 2 44.742603 6.70989788 0 0 PC1
3 2 57.707880 1.76351594 0 0 PC1
4 4 59.987678 1.52197310 0 0 PC1
5 3 71.512518 1.21348268 0 0 PC1
6 1 84.852877 0.05381849 0 0 PC1
7 1 97.266495 1.21827622 0 0 PC1
8 3 112.237133 5.07947925 0 0 PC1
9 18 121.478856 1.93707676 0 0 PC1
[[2]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.09725079 0 0 PC2
2 2 44.742603 0.33598072 0 0 PC2
3 2 57.707880 0.39088727 0 0 PC2
4 4 59.987678 0.87315735 0 0 PC2
5 3 71.512518 0.14944845 0 0 PC2
6 1 84.852877 0.19809863 0 0 PC2
7 1 97.266495 0.63557814 0 0 PC2
8 3 112.237133 1.92063948 0 0 PC2
9 18 121.478856 0.65468693 0 0 PC2
[[3]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.035250817 0 0 PC3
2 2 44.742603 0.105299796 0 0 PC3
3 2 57.707880 0.020245674 0 0 PC3
4 4 59.987678 0.124159836 0 0 PC3
5 3 71.512518 0.008112554 0 0 PC3
6 1 84.852877 0.034337591 0 0 PC3
7 1 97.266495 0.053879459 0 0 PC3
8 3 112.237133 0.021922987 0 0 PC3
9 18 121.478856 0.085270969 0 0 PC3
But the fitted models are:
tmp.vgm.fit
[[1]]
model psill range
1 Nug 0.1483120 0
[[2]]
model psill range
1 Nug 0.09849419 0
[[3]]
model psill range
1 Nug 0.03535234 0
John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
****Internet Email Confidentiality Footer****
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message (or
responsible for delivery of the message to such person), you may
not copy or deliver this message to anyone. In such case, you
should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to
Internet email for messages of this kind. Opinions, conclusions and
other information in this message that do not relate to the
official business of The Shaw Group Inc. or its subsidiaries shall
be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc. http://www.shawgrp.com _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Edzer Pebesma Institute for Geoinformatics (ifgi), University of M?nster Weseler Stra?e 253, 48151 M?nster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/ http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de
Edzer Pebesma wrote:
John, thanks for sharing this with r-sig-geo. As Thierry mentioned, the default model fitting procedure (fit.variogram in package gstat) uses weighted least squares, with weights proportional to N_h/(h^2). This explains why the first lag gets so much weight. For pure nugget models, this of course makes little sense; for other models it often does. Argument fit.method gives you somewhat more control. Give it value 1 to have N_h weights; give it value 6 to do unweighted averaging (I agree that this information should be in the
It is listed in the gstat documentation: http://www.gstat.org/gstat.pdf on page 42, in the middle. cheers, Paul
fit.variogram documentation). The SSErr values will be uncomparable accross different weighting schemes, as you might expect. -- Edzer Carson, John wrote:
I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.
Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).
It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here.
The variograms are
tmp.vgm
[[1]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.14174537 0 0 PC1
2 2 44.742603 6.70989788 0 0 PC1
3 2 57.707880 1.76351594 0 0 PC1
4 4 59.987678 1.52197310 0 0 PC1
5 3 71.512518 1.21348268 0 0 PC1
6 1 84.852877 0.05381849 0 0 PC1
7 1 97.266495 1.21827622 0 0 PC1
8 3 112.237133 5.07947925 0 0 PC1
9 18 121.478856 1.93707676 0 0 PC1
[[2]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.09725079 0 0 PC2
2 2 44.742603 0.33598072 0 0 PC2
3 2 57.707880 0.39088727 0 0 PC2
4 4 59.987678 0.87315735 0 0 PC2
5 3 71.512518 0.14944845 0 0 PC2
6 1 84.852877 0.19809863 0 0 PC2
7 1 97.266495 0.63557814 0 0 PC2
8 3 112.237133 1.92063948 0 0 PC2
9 18 121.478856 0.65468693 0 0 PC2
[[3]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.035250817 0 0 PC3
2 2 44.742603 0.105299796 0 0 PC3
3 2 57.707880 0.020245674 0 0 PC3
4 4 59.987678 0.124159836 0 0 PC3
5 3 71.512518 0.008112554 0 0 PC3
6 1 84.852877 0.034337591 0 0 PC3
7 1 97.266495 0.053879459 0 0 PC3
8 3 112.237133 0.021922987 0 0 PC3
9 18 121.478856 0.085270969 0 0 PC3
But the fitted models are:
tmp.vgm.fit
[[1]]
model psill range
1 Nug 0.1483120 0
[[2]]
model psill range
1 Nug 0.09849419 0
[[3]]
model psill range
1 Nug 0.03535234 0
John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
****Internet Email Confidentiality Footer****
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message (or
responsible for delivery of the message to such person), you may
not copy or deliver this message to anyone. In such case, you
should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to
Internet email for messages of this kind. Opinions, conclusions and
other information in this message that do not relate to the
official business of The Shaw Group Inc. or its subsidiaries shall
be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc. http://www.shawgrp.com _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Drs. Paul Hiemstra Department of Physical Geography Faculty of Geosciences University of Utrecht Heidelberglaan 2 P.O. Box 80.115 3508 TC Utrecht Phone: +3130 274 3113 Mon-Tue Phone: +3130 253 5773 Wed-Fri http://intamap.geo.uu.nl/~paul
Dear Thierry, Thank you. You're right. I have small data sets, which are problematic. I need to use the duplicate pair data, but the default weighting in gstat for variograms doesn't work well with closely spaced data. I am looking now at REML instead. Best wishes, John John H. Carson Jr., PhD Senior Statistician Applied Sciences & Engineering Shaw Environmental & Infrastructure 16406 US Rte 224 East Findlay, OH 45840 Phone 419-425-6156 Fax 419-425-6085 john.carson at shawgrp.com http://www.shawgrp.com/ Shaw(tm) a world of Solutions(tm) -----Original Message----- From: ONKELINX, Thierry [mailto:Thierry.ONKELINX at inbo.be] Sent: Monday, October 12, 2009 10:23 AM To: Carson, John; r-sig-geo at stat.math.ethz.ch Subject: RE: [R-sig-Geo] Problem with gstat variogram estimation Dear John, Isn't your problem rather a result from the unstable empirical variograms? Mainly due to the very low number of pairs per bin. Furthermore have a look at the weights that each bin gets in the fit.variogram() function. The default is N/h^2. In your case the first bin gets a weight of 2 whereas the other bins have weights ranging from 1e-3 tot 1e-4! Hence no surprise that the nugget is nearly identical to the semivariance of the first bin. In this case I would not trust the results for fit.variogram(). Not because of bugs in gstat, but because I don't trust empirical variogram with that low number of pairs per bin. HTH, Thierry ---------------------------------------------------------------------------- ir. Thierry Onkelinx Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey -----Oorspronkelijk bericht----- Van: r-sig-geo-bounces at stat.math.ethz.ch [mailto:r-sig-geo-bounces at stat.math.ethz.ch] Namens Carson, John Verzonden: maandag 12 oktober 2009 15:35 Aan: r-sig-geo at stat.math.ethz.ch Onderwerp: [R-sig-Geo] Problem with gstat variogram estimation I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms?are given below. Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes.?Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4). It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here. The variograms are tmp.vgm [[1]] ? np?????? dist????? gamma dir.hor dir.ver? id 1? 4?? 1.414214 0.14174537?????? 0?????? 0 PC1 2? 2? 44.742603 6.70989788?????? 0?????? 0 PC1 3? 2? 57.707880 1.76351594?????? 0?????? 0 PC1 4? 4? 59.987678 1.52197310?????? 0?????? 0 PC1 5? 3? 71.512518 1.21348268?????? 0?????? 0 PC1 6? 1? 84.852877 0.05381849?????? 0?????? 0 PC1 7? 1? 97.266495 1.21827622?????? 0?????? 0 PC1 8? 3 112.237133 5.07947925?????? 0?????? 0 PC1 9 18 121.478856 1.93707676?????? 0?????? 0 PC1 [[2]] ? np?????? dist????? gamma dir.hor dir.ver? id 1? 4?? 1.414214 0.09725079?????? 0?????? 0 PC2 2? 2? 44.742603 0.33598072?????? 0?????? 0 PC2 3? 2? 57.707880 0.39088727?????? 0?????? 0 PC2 4? 4? 59.987678 0.87315735?????? 0?????? 0 PC2 5? 3? 71.512518 0.14944845?????? 0?????? 0 PC2 6? 1? 84.852877 0.19809863?????? 0?????? 0 PC2 7? 1? 97.266495 0.63557814?????? 0?????? 0 PC2 8? 3 112.237133 1.92063948?????? 0?????? 0 PC2 9 18 121.478856 0.65468693?????? 0 ??????0 PC2 [[3]] ? np?????? dist?????? gamma dir.hor dir.ver? id 1? 4?? 1.414214 0.035250817?????? 0?????? 0 PC3 2? 2? 44.742603 0.105299796?????? 0?????? 0 PC3 3? 2? 57.707880 0.020245674?????? 0?????? 0 PC3 4? 4? 59.987678 0.124159836?????? 0?????? 0 PC3 5? 3? 71.512518 0.008112554?????? 0?????? 0 PC3 6? 1? 84.852877 0.034337591?????? 0?????? 0 PC3 7? 1? 97.266495 0.053879459?????? 0?????? 0 PC3 8? 3 112.237133 0.021922987?????? 0?????? 0 PC3 9 18 121.478856 0.085270969?????? 0?????? 0 PC3 But the fitted models are: tmp.vgm.fit [[1]] ? model???? psill range 1?? Nug 0.1483120???? 0 [[2]] ? model????? psill range 1?? Nug 0.09849419???? 0 [[3]] ? model????? psill range 1?? Nug 0.03535234???? 0 John H. Carson Jr., PhD Senior Statistician Applied Sciences & Engineering Shaw Environmental & Infrastructure 16406 US Rte 224 East Findlay, OH 45840 Phone 419-425-6156 Fax 419-425-6085 john.carson at shawgrp.com http://www.shawgrp.com/ Shaw(tm) a world of Solutions(tm) ? ****Internet Email Confidentiality Footer**** Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and notify the sender by reply email. Please advise immediately if you or your employer do not consent to Internet email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of The Shaw Group Inc. or its subsidiaries shall be understood as neither given nor endorsed by it. ______________________________________ The Shaw Group Inc. http://www.shawgrp.com _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo Druk dit bericht a.u.b. niet onnodig af. Please do not print this message unnecessarily. Dit bericht en eventuele bijlagen geven enkel de visie van de schrijver weer en binden het INBO onder geen enkel beding, zolang dit bericht niet bevestigd is door een geldig ondertekend document. The views expressed in this message and any annex are purely those of the writer and may not be regarded as stating an official position of INBO, as long as the message is not confirmed by a duly signed document.
Edzer, Thank you. I am now looking at REML since I have small data sets and I really need to use the duplicate data. In the follow on I will need to accommodate adaptive sampling weights. The first stage selection is systematic (with some minor modifications). The second stage is sampling areas adjacent to any primary location that exceeds a threshold for one of the variables of interest. Another reason, for example, that one might weight would be to reflect the number of (physical) component samples in a composite soil sample. This may vary from laboratory result to laboratory result. Is there any way to accommodate this in gstat? Thanks, John John H. Carson Jr., PhD Senior Statistician Applied Sciences & Engineering Shaw Environmental & Infrastructure 16406 US Rte 224 East Findlay, OH 45840 Phone 419-425-6156 Fax 419-425-6085 john.carson at shawgrp.com http://www.shawgrp.com/ Shaw(tm) a world of Solutions(tm) -----Original Message----- From: Edzer Pebesma [mailto:edzer.pebesma at uni-muenster.de] Sent: Monday, October 12, 2009 10:38 AM To: Carson, John Cc: r-sig-geo at stat.math.ethz.ch Subject: Re: [R-sig-Geo] Problem with gstat variogram estimation John, thanks for sharing this with r-sig-geo. As Thierry mentioned, the default model fitting procedure (fit.variogram in package gstat) uses weighted least squares, with weights proportional to N_h/(h^2). This explains why the first lag gets so much weight. For pure nugget models, this of course makes little sense; for other models it often does. Argument fit.method gives you somewhat more control. Give it value 1 to have N_h weights; give it value 6 to do unweighted averaging (I agree that this information should be in the fit.variogram documentation). The SSErr values will be uncomparable accross different weighting schemes, as you might expect. -- Edzer
Carson, John wrote:
I have found anomalous behavior in gstat's variogram estimation. I have listed 3 example variograms below for small data sets. In order to better estimate the nugget effect, I slightly perturbed the locations (by 1 foot increments) of duplicate results. The empirical variograms are given below.
Before I did this (I averaged duplicate values initially), a Gaussian model with 0 nugget was selected for the second variogram and pure nugget models for the first and third. I am using the candidate model list ('Nug', 'Exp', 'Sph', 'Gau', 'Mat', 'Cir', 'Lin', 'Bes') and selecting the model based on SSErr for preliminary testing purposes. Afterward, the pure nugget models had the lowest SSErr and were selected. Note that the variogram fits are completely controlled by the short range variance, because even the original pure nugget models are substantially different in the estimate of the nugget. The fitted models are listed below. Just by inspection, based on the numbers of pairs in these examples, a pure nugget model should be about halfway between the empirical semivariance of the last lag and the average of the other lags. However, the fitted nuggets are almost identical to the semivariance of the first last (dist = 1.4).
It seems to me that this must be due to a bug in the GSTAT code. I pointed this out to Edzer Pebesma, and he asked me to post it here.
The variograms are
tmp.vgm
[[1]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.14174537 0 0 PC1
2 2 44.742603 6.70989788 0 0 PC1
3 2 57.707880 1.76351594 0 0 PC1
4 4 59.987678 1.52197310 0 0 PC1
5 3 71.512518 1.21348268 0 0 PC1
6 1 84.852877 0.05381849 0 0 PC1
7 1 97.266495 1.21827622 0 0 PC1
8 3 112.237133 5.07947925 0 0 PC1
9 18 121.478856 1.93707676 0 0 PC1
[[2]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.09725079 0 0 PC2
2 2 44.742603 0.33598072 0 0 PC2
3 2 57.707880 0.39088727 0 0 PC2
4 4 59.987678 0.87315735 0 0 PC2
5 3 71.512518 0.14944845 0 0 PC2
6 1 84.852877 0.19809863 0 0 PC2
7 1 97.266495 0.63557814 0 0 PC2
8 3 112.237133 1.92063948 0 0 PC2
9 18 121.478856 0.65468693 0 0 PC2
[[3]]
np dist gamma dir.hor dir.ver id
1 4 1.414214 0.035250817 0 0 PC3
2 2 44.742603 0.105299796 0 0 PC3
3 2 57.707880 0.020245674 0 0 PC3
4 4 59.987678 0.124159836 0 0 PC3
5 3 71.512518 0.008112554 0 0 PC3
6 1 84.852877 0.034337591 0 0 PC3
7 1 97.266495 0.053879459 0 0 PC3
8 3 112.237133 0.021922987 0 0 PC3
9 18 121.478856 0.085270969 0 0 PC3
But the fitted models are:
tmp.vgm.fit
[[1]]
model psill range
1 Nug 0.1483120 0
[[2]]
model psill range
1 Nug 0.09849419 0
[[3]]
model psill range
1 Nug 0.03535234 0
John H. Carson Jr., PhD
Senior Statistician
Applied Sciences & Engineering
Shaw Environmental & Infrastructure
16406 US Rte 224 East
Findlay, OH 45840
Phone 419-425-6156
Fax 419-425-6085
john.carson at shawgrp.com
http://www.shawgrp.com/
Shaw(tm) a world of Solutions(tm)
****Internet Email Confidentiality Footer****
Privileged/Confidential Information may be contained in this
message. If you are not the addressee indicated in this message (or
responsible for delivery of the message to such person), you may
not copy or deliver this message to anyone. In such case, you
should destroy this message and notify the sender by reply email.
Please advise immediately if you or your employer do not consent to
Internet email for messages of this kind. Opinions, conclusions and
other information in this message that do not relate to the
official business of The Shaw Group Inc. or its subsidiaries shall
be understood as neither given nor endorsed by it.
______________________________________ The Shaw Group Inc. http://www.shawgrp.com _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Edzer Pebesma Institute for Geoinformatics (ifgi), University of M?nster Weseler Stra?e 253, 48151 M?nster, Germany. Phone: +49 251 8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de/ http://www.springer.com/978-0-387-78170-9 e.pebesma at wwu.de