Different Moran's I correlograms using correlog{ncf} and sp.correlogram{spdep}
On Fri, 30 Jan 2009, Xingli Giam wrote:
Dear Roger,
So what you trying to say about correlog{ncf} is that after the bands
are computed, all the correlations between data points whose distance
are within the bin are computed. So I suppose it is a binary kind of
specification, if pairwise distances are in the bin, the correlations
will be computed and all these correlations are considered equally in
computing Moran's I.
So should the correlogram obtained by correlog{ncf} same as the one
obtained by sp.correlogram(spdep), when the neighbourhood upperlimit is
same as the increment level of the former.
No, because correlog{ncf} bins the distances, but sp.correlogram{spdep}
adds higher order lags - here the second lag of i (with first order
neighbours j) is the set of all points k that are neighbours of points j
and neither i nor in j. It depends on graph edge counts, not distance as
such. They aren't comparable, really. If you want to do it by hand, check
the distances reported by correlog() (they may not be what you think),
take the bin boundaries, plug them into multiple calls to dnearneigh(),
and go from there.
Roger
E.g.,
#Correlogram from correlog{ncf}, where slm1600 is the chosen spatial model
correlog.200b <- correlog(dat$x_long, dat$y_lat, residuals(mod1), na.rm=T,
increment=200, resamp=99, latlon=T)
# Plotting the first 50 bins
plot(correlog.200b$correlation[1:50], type="b", pch=16, cex=1.5, lwd=1.5,
col="red", xlab="distance (10^2 km)", ylab="Moran's I", cex.lab=2,
cex.axis=1.5, ylim=c(-1,1)); abline(h=0)
#should be the same as
nb200 <- dnearneigh(as.matrix(dat$x_long, dat$y_lat), 0, 200, longlat=T)
ME200.listw <- nb2listw(nb200, style="B", zero.policy=T)
correl<-sp.correlogram(nb200, residuals(mod1), order = 50, method = "I", style
= "W", randomisation = TRUE, zero.policy = TRUE, spChk=NULL)
plot(correl1600),
But the results of Moran's I are very different.
Here are the results:
For sp.correlogram(),
Spatial correlogram for residuals(mod1)
method: Moran's I
estimate expectation variance standard deviate Pr(I) two sided
1 0.06882699 -0.00123153 0.00021800 4.7449 2.086e-06 ***
2 0.03190221 -0.00127877 0.00027178 2.0127 0.0441461 *
3 0.02914748 -0.00133511 0.00028490 1.8059 0.0709262 .
4 0.04458328 -0.00137363 0.00029717 2.6659 0.0076772 **
5 0.05503366 -0.00139665 0.00031539 3.1775 0.0014854 **
6 0.02696152 -0.00143062 0.00032685 1.5704 0.1163121
7 0.06049141 -0.00144718 0.00036181 3.2563 0.0011289 **
8 0.02526273 -0.00146413 0.00037732 1.3759 0.1688468
9 0.02901564 -0.00147493 0.00039208 1.5399 0.1235966
10 0.06624244 -0.00148810 0.00039586 3.4042 0.0006637 ***
11 0.04189714 -0.00151057 0.00041899 2.1206 0.0339535 *
12 -0.00321915 -0.00154560 0.00045316 -0.0786 0.9373374
.........and so on
For correlog():
$correlation (just first 4 bins)
0.381911078 0.356233456 0.322664149 0.304275005
How should we reconcile the differences?
Many thanks,
Xingli
Quoting Roger Bivand <Roger.Bivand at nhh.no>:
On Thu, 29 Jan 2009, Xingli Giam wrote:
Hello List, I have been bothered by this problem for the entire day, hope someone can give me a hand here. Thanks in advance for reading! It could be really simple and basic so hope you guys bear with it. We often plot Moran's I spatial correlograms that contrast glm residuals and spatial model residuals (after incorporating spatial autocorrelation (SAC) via SEVM, autocov reg, etc) over lag distance classes.
A spatial correlogram is not a well-defined concept. One view is that the inter-pair distances should be binned, and the output computed for the pairs of observations in each bin - equivalent to defining widening rings of neighbours for each observation. There are issues that arise here when some observations have no neighbours in a given bin - should one adjust the values used to calculate E(I) and var(I)? This is in correlog() in ncf, and to a certain extent in pgirmess. Another view is that the first order set of neighbours should be raised to higher orders irrespective of how the set was defined: i and j are first order neighbours, i and k are second order neighbours if j and k are first order neighbours. This isn't quite matrix powering, because if you are already a first order neighbour, you can't be a second order as well (discussed in detail in Cliff & Ord 1981, pp. 118-122). See also ?nblag to see what is going on. correlog() in pgirmess uses nclass.Sturges() to choose the number of bins, but does not fully protect against bins with no-neighbour observations.
After deciding the most appropriate neighbourhood distance (1600 km -
mine's a global dataset), I created a list of weights via nb2listw that
is an inverse function of distance (1/distance). I then performed SEVM
using the list of weights, using the brute force ME() algorithm to
extract eigenvectors that reduce SAC.
I plan to report the Moran's I value given from moran.test() on both my
(non- spatial) GLM and Spatial GLM to prove that the spatial model has
reduced autocorrelation.
In addition to that, I plan to show the correlograms of the model
residuals both non-spatial and spatial GLMs.
I would like to ask which function is a better choice for showing the
correlograms. Should I use correlog{ncf} or sp.correlogram{spdep}. From
what I know, calculating Moran's I require definition of a neighbourhood
and a specific weighting scheme. Does anyone know what is the default
neighbourhood and weighting scheme of correlog()? Is it - if I specify
an increment of 100km, then, all points within 0-100, and then 100-200,
etc, are considered neighbours with a binary weighting scheme for each
distance class?
The ncf correlog() uses many nxn matrices, and computes the bands by: dkl <- ceiling(dmat/increment) where dmat is a lower triangle distance matrix, increment is the band width, and dkl has the same form as dmat (it says which distance band the moran pairs in the lower triangle crossproduct matrix used belong to). As you can see, the source code is the documentation here.
For sp.correlogram, can I specify the nb object defined by min:0km, max:100km, and use a lag order of, for example 50, to plot the correlogram of my spatial model (which has a neighbourhood distance of 1600 km)? For both functions, it seems that there is no way to incorporate my inverse- distance weighting scheme that I used in my spatial models. Is this of practical concern if I intend to plot my correlograms? Lastly I have a side question regarding correlograms. When we see distances of, e.g., 100, 200, 300, etc (the increment or lag distance is 100), on the X-axis of a spatial correlogram, does the point at 200 refer to correlations between neighbours of distance 100-200km apart? Or does it include the first 100km as well (i.e., correlations between neighbours of distance 0-200km apart)?
In general, the pairs of distances within that band (or lag order). But check the code to be sure. Roger
Sorry for the long post. Many thanks for reading, I really appreciate it. I know some are really basic questions but I can't find any answers to my questions on google or searching through the list. Many thanks, Xingli
-- Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no