Local Moran p-values in Geoda, Python, and R
On Wed, 11 Mar 2015, William May wrote:
I sent this to Roger Bivand earlier. Posting it here for posterity, and in case other people have more to add: --------------------------------------------------------------------------- Hi Roger, I'm taking a spatial econometrics class, and we've been making some LISA maps. We noticed that the p-values from the spdep package seem to be a lot different than the results from Geoda or from Python's PySAL package. I looked through the function definitions and noticed that Python, and I think Geoda, use simulations to estimate the p-value, while your localmoran function uses an equation. I attached the graphs from Geoda and from R, and my R code and the original dataset. [R code is added below.] Is this kind of difference normal?
If you would put the figures on a website, we could compare them. One possible reason for differences is the alternative= argument to localmoran(). It's default is "greater", but you could try "two.sided" to see whether this gives values closer to GeoDa. You don't say, by the way, which version of GeoDa you are using (since you mention Python, probably not legacy GeoDa), nor whether you can get the results out in tabular form. Are you using the same spatial weights - both can read and write GAL files. Are you aware that knearneigh() takes a longlat= argument, which must be used if the coordinates are not projected. The p-values in localmoran() are analytical ones from: Anselin, L. 1995. Local indicators of spatial association, Geographical Analysis, 27, 93--115. As you mention, in GeoDa they are by leave-one-out permutation bootstrapping. Roger
Is there a reason to prefer spdep's results over the Geoda results (or
vice versa)?
Thanks for any help,
Will
---------------------------------------------------------------------------
Here's his response:
---------------------------------------------------------------------------
Please *do* write to the R-sig-geo list rather than to me directly -
others can answer your question as well, perhaps better, and in a more
timely way than I can. In addition, threads in the list can be
searched in the archives, so others can avoid the same problem later.
The only reliable test is localmoran.exact(), followed by
localmoran.sad(), for reasons given in the references in their help
pages.? If you test more than one observation, remember to adjust the
p-value for multiple comparisons. Sampling _i (all but i)
overrepresents non-neighbours of i for a test on i. The formulae used
are from the original paper and most often are similar to the
permutation bootstrap values.
Crucially, almost all use of local Moran's I (and local Geary and
Getis) fall foul of the autocorrelation being related to a
mis-specified mean model (omitted explanatory variables and/or omitted
adjustment for global autocorrelation). Look at the references to
localmoran.sad and localmoran.exact for more details, and at the
relevant parts of Waller & Gotway (2004), Schabenberger & Gotway
(2005), and in Bivand et al. (2008, 2013, ASDAR-book, ch. 9).
LISA maps are extremely misleading when the mean model is
mis-specified, and when the p-values are not adjusted for multiple
comparisons. They look simple, but because adjustment for multiple
comparisons is a subjective choice, you can almost choose the map you
want.
Roger
---------------------------------------------------------------------------
And here's my R code:
---------------------------------------------------------------------------
# make a LISA cluster map
library(maptools)
library(spdep)
setwd('/home/will/classes/polisci_610/1/')
# get the data
nat <- readShapeSpatial("NAT.SHP", ID="FIPSNO")
nat.data<-data.frame(nat)
attach(nat.data)
# nearest neighbors (nnb)
coords<-coordinates(nat)
IDs<-row.names(as(nat, "data.frame"))
nat_10nnb<-knn2nb(knearneigh(coords, k=10), row.names=IDs)
# "W" identifies row standardized weights
nat_10nnb_w <- nb2listw(nat_10nnb, style="W")
# can preset list_w object; if do this, remember to change if needed
list_w <- nat_10nnb_w
# LISA values
lisa <- localmoran(HR60, list_w, zero.policy = T)
# centers the variable of interest around its mean
cDV <- HR60 - mean(HR60)
# centers the local Moran's around the mean
mI <- lisa[, 1]
C_mI <- mI - mean(mI) # but we don't want to center it! Only the sign
# matters.
quadrant <- vector(mode="numeric",length=nrow(lisa))
quadrant[cDV>0 & mI>0] <- 1
quadrant[cDV <0 & mI>0] <- 2?????
quadrant[cDV>0 & mI<0] <- 3
quadrant[cDV <0 & mI<0] <- 4
# set a statistical significance level for the local Moran's
signif <- 0.05
# places non-significant Moran's in the category "5"
quadrant[lisa[, 5]> signif] <- 5
# map
png(file="Lisa_map_R.png", width = 680, res = 100)
colors <- c("red", "blue", "lightpink", "skyblue2", rgb(.95, .95, .95))
par(mar=c(0,0,1,0)) # sets margin parameters for plot space
plot(nat, border="grey", col=colors[quadrant],
??? ? main = "LISA Cluster Map, 1960 Homicides")
legend("bottomright",legend=c("high-high","low-low","high-low","low-high"),
?????? fill=colors,bty="n",cex=0.7,y.intersp=1,x.intersp=1)
dev.off()
---------------------------------------------------------------------------
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 91 00 e-mail: Roger.Bivand at nhh.no