Dear esteemed list, I'm using spdep::EBest with family = 'binomial' for counts of events within polygons that have an 'at risk' population. The resultant "estmm" is 'shrunk' compared to the raw rate (both given by EBest and calculated "by hand" rate. All good there. Using GeoDa version 1.14.0 24 August 2019 produces identical results for its Empirical Bayesian rate. This was confirmed by plotting the EBest output against GeoDa's rate and finding a perfect correlation along the 1 to 1 line. All good there. Two questions: 1. How can credible intervals around these smoothed rate estimates be calculated? 2. The spdep documentation calls this a binomial family, but the identical results are obtained from GeoDa calls this "Poisson-Gamma" model here: https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so what is actually being calculated? This question may help me answer the first question.. Possibly the answers are addressed in the literature cited which I cannot access right now at home without institutional library access. Thanks for your consideration, Dexter http://dexterlocke.com/
credible interval for empirical Bayesian estimates of rates
5 messages · Dexter Locke, Roger Bivand
On Fri, 24 Apr 2020, Dexter Locke wrote:
Dear esteemed list, I'm using spdep::EBest with family = 'binomial' for counts of events within polygons that have an 'at risk' population. The resultant "estmm" is 'shrunk' compared to the raw rate (both given by EBest and calculated "by hand" rate. All good there. Using GeoDa version 1.14.0 24 August 2019 produces identical results for its Empirical Bayesian rate. This was confirmed by plotting the EBest output against GeoDa's rate and finding a perfect correlation along the 1 to 1 line. All good there.
Please provide a reproducible example, as this may help with answers.
Two questions: 1. How can credible intervals around these smoothed rate estimates be calculated? 2. The spdep documentation calls this a binomial family, but the identical results are obtained from GeoDa calls this "Poisson-Gamma" model here: https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so what is actually being calculated? This question may help me answer the first question..
No, the default family is "poisson", with "binomial" available for non-rare conditions following Martuzzi, implemented by Olaf Berke, ?spdep::EBest. The code in spdep is easily accessible, so can be read directly. Please also compare with code for the EB Moran test, and with analogous code in the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs. For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.
Possibly the answers are addressed in the literature cited which I cannot access right now at home without institutional library access.
Most institutions do have proxy or VPN access, but the code will be as useful. In PySAL, the code would also guide you, but even though GeoDa is open source, the C++ is fairly dense. Hope this helps, Roger
Thanks for your consideration, Dexter http://dexterlocke.com/ [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
Thanks Roger and list. I didn't think a repex was needed because a question was: why does spdep::EBest(counts, population, family = 'binomial') give the same results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think this is a question about terminology. The same results were achieved with the packages while naming the model differently - why? Yes ?spdep::EBest directed me to the literature I'm struggling to access. And Yes, I've been looking at the raw code and understand how the estmm is generated. I've been using the epitools::pois.exact() and spdep::EBest. I can compare the point estimates from pois.exact to those provided by EBest, but I'd like to graph side by side their credible / confidence intervals. Its this last part on the credible intervals I'm interested in. How to get credible intervals around estmm? This is my main question. ASDAR is a reference I'm using all the time. Thanks for that gem. DCluster::empbaysmooth also does not provide a credible interval, either. -Dexter http://dexterlocke.com/
On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Fri, 24 Apr 2020, Dexter Locke wrote:
Dear esteemed list, I'm using spdep::EBest with family = 'binomial' for counts of events
within
polygons that have an 'at risk' population. The resultant "estmm" is 'shrunk' compared to the raw rate (both given by EBest and calculated "by hand" rate. All good there. Using GeoDa version 1.14.0 24 August 2019 produces identical results for its Empirical Bayesian rate. This was confirmed by plotting the EBest output against GeoDa's rate and finding a perfect correlation along the 1 to 1 line. All good there.
Please provide a reproducible example, as this may help with answers.
Two questions: 1. How can credible intervals around these smoothed rate estimates be calculated? 2. The spdep documentation calls this a binomial family, but the
identical
results are obtained from GeoDa calls this "Poisson-Gamma" model here: https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so what is actually being calculated? This question may help me answer the first question..
No, the default family is "poisson", with "binomial" available for non-rare conditions following Martuzzi, implemented by Olaf Berke, ?spdep::EBest. The code in spdep is easily accessible, so can be read directly. Please also compare with code for the EB Moran test, and with analogous code in the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs. For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.
Possibly the answers are addressed in the literature cited which I cannot access right now at home without institutional library access.
Most institutions do have proxy or VPN access, but the code will be as useful. In PySAL, the code would also guide you, but even though GeoDa is open source, the C++ is fairly dense. Hope this helps, Roger
Thanks for your consideration, Dexter http://dexterlocke.com/ [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
On Fri, 24 Apr 2020, Dexter Locke wrote:
Thanks Roger and list. I didn't think a repex was needed because a question was: why does spdep::EBest(counts, population, family = 'binomial') give the same results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think this is a question about terminology. The same results were achieved with the packages while naming the model differently - why?
Reproducible example:
auckland <- st_read(system.file("shapes/auckland.shp",
package="spData")[1], quiet=TRUE)
res <- EBest(auckland$M77_85, 9*auckland$Und5_81)
res0 <- EBest(auckland$M77_85, 9*auckland$Und5_81, family="binomial")
summary(res$estmm)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
summary(res0$estmm)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001484 0.002235 0.002549 0.002648 0.002968 0.004536 After calculating Und5_81 * 9 as a new column, running GeoDa -> Map -> Rates-Caluculated Map -> Empirical Bayes and exporting a shapefile, in R I see:
summary(auck$R_EBS)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531 which is the same as the Poisson, and:
all.equal(res$estmm, auck$R_EBS)
[1] TRUE GeoDa is providing the same EB Poisson as EBest, isn't it? If yours differ, are both implementations seeing the same data?
Yes ?spdep::EBest directed me to the literature I'm struggling to access. And Yes, I've been looking at the raw code and understand how the estmm is generated. I've been using the epitools::pois.exact() and spdep::EBest. I can compare the point estimates from pois.exact to those provided by EBest, but I'd like to graph side by side their credible / confidence intervals. Its this last part on the credible intervals I'm interested in. How to get credible intervals around estmm? This is my main question.
EBest() was written to implement Bailey & Gatrell's textbook, which did not provide CI, and just used the Marshall Auckland dataset. If you'd like to implement them, I'd welcome a contribution.
ASDAR is a reference I'm using all the time. Thanks for that gem. DCluster::empbaysmooth also does not provide a credible interval, either.
As you see from ch. 10, CI are described for the epitools implementation. My feeling is that the literature moved away from simple EB rates towards IID RE and spatially structured RE, with relevant covariates, say like the classic Scottish Lip Cancer dataset, and currently CARBayes is a solid package among many others. Simply using base population becomes too unsatisfactory. PHE uses funnel plots which do have CI of a kind, to draw attention from small base populations: https://nhsrcommunity.com/blog/introduction-to-funnel-plots/ which can be used to adjust class intervals for mapping. Roger
-Dexter http://dexterlocke.com/ On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Fri, 24 Apr 2020, Dexter Locke wrote:
Dear esteemed list, I'm using spdep::EBest with family = 'binomial' for counts of events
within
polygons that have an 'at risk' population. The resultant "estmm" is 'shrunk' compared to the raw rate (both given by EBest and calculated "by hand" rate. All good there. Using GeoDa version 1.14.0 24 August 2019 produces identical results for its Empirical Bayesian rate. This was confirmed by plotting the EBest output against GeoDa's rate and finding a perfect correlation along the 1 to 1 line. All good there.
Please provide a reproducible example, as this may help with answers.
Two questions: 1. How can credible intervals around these smoothed rate estimates be calculated? 2. The spdep documentation calls this a binomial family, but the
identical
results are obtained from GeoDa calls this "Poisson-Gamma" model here: https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 , so what is actually being calculated? This question may help me answer the first question..
No, the default family is "poisson", with "binomial" available for non-rare conditions following Martuzzi, implemented by Olaf Berke, ?spdep::EBest. The code in spdep is easily accessible, so can be read directly. Please also compare with code for the EB Moran test, and with analogous code in the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs. For code and data see https://asdar-book.org/bundles2ed/dismap_bundle.zip.
Possibly the answers are addressed in the literature cited which I cannot access right now at home without institutional library access.
Most institutions do have proxy or VPN access, but the code will be as useful. In PySAL, the code would also guide you, but even though GeoDa is open source, the C++ is fairly dense. Hope this helps, Roger
Thanks for your consideration, Dexter http://dexterlocke.com/ [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
Wonderful thank you. GeoDA and the spdep::EBest are providing the same estimates for my datasets. Thanks for the reprex example with the GUI via ">". That is a helpful convention I'll use in the future.. Ok great, there is not yet a CI implementation. If I can manage one I'd be delighted to share and contribute it. Julie Silge: https://juliasilge.com/blog/bayesian-blues/ and David Robinson http://varianceexplained.org/r/credible_intervals_baseball/ have some suggestions that are similar to each other's (no surprise there, they work together a lot). Thanks for the suggestions about alternative model specifications and including covariates. This has been very helpful, I appreciate this list immensely, Dexter
On Fri, Apr 24, 2020 at 3:05 PM Roger Bivand <Roger.Bivand at nhh.no> wrote:
On Fri, 24 Apr 2020, Dexter Locke wrote:
Thanks Roger and list. I didn't think a repex was needed because a question was: why does spdep::EBest(counts, population, family = 'binomial') give the same results at GeoDa's, while EBest(.. binomial) is "binomial" while GeoDa calls that "Poisson-Gamma". GeoDa can't give use a repex (GUI) and think this is a question about terminology. The same results were achieved with the packages while naming the model differently - why?
Reproducible example:
auckland <- st_read(system.file("shapes/auckland.shp",
package="spData")[1], quiet=TRUE)
res <- EBest(auckland$M77_85, 9*auckland$Und5_81)
res0 <- EBest(auckland$M77_85, 9*auckland$Und5_81, family="binomial")
summary(res$estmm)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531
summary(res0$estmm)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001484 0.002235 0.002549 0.002648 0.002968 0.004536 After calculating Und5_81 * 9 as a new column, running GeoDa -> Map -> Rates-Caluculated Map -> Empirical Bayes and exporting a shapefile, in R I see:
summary(auck$R_EBS)
Min. 1st Qu. Median Mean 3rd Qu. Max. 0.001487 0.002237 0.002549 0.002648 0.002968 0.004531 which is the same as the Poisson, and:
all.equal(res$estmm, auck$R_EBS)
[1] TRUE GeoDa is providing the same EB Poisson as EBest, isn't it? If yours differ, are both implementations seeing the same data?
Yes ?spdep::EBest directed me to the literature I'm struggling to access. And Yes, I've been looking at the raw code and understand how the estmm
is
generated. I've been using the epitools::pois.exact() and spdep::EBest. I can
compare
the point estimates from pois.exact to those provided by EBest, but I'd like to graph side by side their credible / confidence intervals. Its this last part on the credible intervals I'm interested in. How to
get
credible intervals around estmm? This is my main question.
EBest() was written to implement Bailey & Gatrell's textbook, which did not provide CI, and just used the Marshall Auckland dataset. If you'd like to implement them, I'd welcome a contribution.
ASDAR is a reference I'm using all the time. Thanks for that gem. DCluster::empbaysmooth also does not provide a credible interval, either.
As you see from ch. 10, CI are described for the epitools implementation. My feeling is that the literature moved away from simple EB rates towards IID RE and spatially structured RE, with relevant covariates, say like the classic Scottish Lip Cancer dataset, and currently CARBayes is a solid package among many others. Simply using base population becomes too unsatisfactory. PHE uses funnel plots which do have CI of a kind, to draw attention from small base populations: https://nhsrcommunity.com/blog/introduction-to-funnel-plots/ which can be used to adjust class intervals for mapping. Roger
-Dexter http://dexterlocke.com/ On Fri, Apr 24, 2020 at 10:23 AM Roger Bivand <Roger.Bivand at nhh.no>
wrote:
On Fri, 24 Apr 2020, Dexter Locke wrote:
Dear esteemed list, I'm using spdep::EBest with family = 'binomial' for counts of events
within
polygons that have an 'at risk' population. The resultant "estmm" is 'shrunk' compared to the raw rate (both given by EBest and calculated
"by
hand" rate. All good there. Using GeoDa version 1.14.0 24 August 2019 produces identical results
for
its Empirical Bayesian rate. This was confirmed by plotting the EBest output against GeoDa's rate and finding a perfect correlation along
the 1
to 1 line. All good there.
Please provide a reproducible example, as this may help with answers.
Two questions: 1. How can credible intervals around these smoothed rate estimates be calculated? 2. The spdep documentation calls this a binomial family, but the
identical
results are obtained from GeoDa calls this "Poisson-Gamma" model here: https://geodacenter.github.io/workbook/3b_rates/lab3b.html#fnref11 ,
so
what is actually being calculated? This question may help me answer the first question..
No, the default family is "poisson", with "binomial" available for non-rare conditions following Martuzzi, implemented by Olaf Berke, ?spdep::EBest. The code in spdep is easily accessible, so can be read directly. Please also compare with code for the EB Moran test, and with analogous code in the DCluster package, empbaysmooth(). Cf. ASDAR 2nd ed., ch. 10, section 10.2, pp. 322-328. The epitools::pois.exact() function is used for CIs. For code and data see
Possibly the answers are addressed in the literature cited which I
cannot
access right now at home without institutional library access.
Most institutions do have proxy or VPN access, but the code will be as useful. In PySAL, the code would also guide you, but even though GeoDa
is
open source, the C++ is fairly dense. Hope this helps, Roger
Thanks for your consideration, Dexter http://dexterlocke.com/ [[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
-- Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en