Different Moran's I correlograms using correlog{ncf} and sp.correlogram{spdep}

Dear Roger,

So what you trying to say about correlog{ncf} is that after the bands 
are computed, all the correlations between data points whose distance 
are within the bin are computed. So I suppose it is a binary kind of 
specification, if pairwise distances are in the bin, the correlations 
will be computed and all these correlations are considered equally in 
computing Moran's I.

So should the correlogram obtained by correlog{ncf} same as the one 
obtained by sp.correlogram(spdep), when the neighbourhood upperlimit is 
same as the increment level of the former.
No, because correlog{ncf} bins the distances, but sp.correlogram{spdep} 
adds higher order lags - here the second lag of i (with first order 
neighbours j) is the set of all points k that are neighbours of points j 
and neither i nor in j. It depends on graph edge counts, not distance as 
such. They aren't comparable, really. If you want to do it by hand, check 
the distances reported by correlog() (they may not be what you think), 
take the bin boundaries, plug them into multiple calls to dnearneigh(), 
and go from there.

Roger
E.g.,

#Correlogram from correlog{ncf}, where slm1600 is the chosen spatial model

correlog.200b <- correlog(dat$x_long, dat$y_lat, residuals(mod1), na.rm=T,
increment=200, resamp=99, latlon=T)

# Plotting the first 50 bins
plot(correlog.200b$correlation[1:50], type="b", pch=16, cex=1.5, lwd=1.5,
col="red", xlab="distance (10^2 km)", ylab="Moran's I", cex.lab=2,
cex.axis=1.5, ylim=c(-1,1)); abline(h=0)

#should be the same as

nb200 <- dnearneigh(as.matrix(dat$x_long, dat$y_lat), 0, 200, longlat=T)
ME200.listw <- nb2listw(nb200, style="B", zero.policy=T)

correl<-sp.correlogram(nb200, residuals(mod1), order = 50, method = "I", style
= "W", randomisation = TRUE, zero.policy = TRUE, spChk=NULL)

plot(correl1600),

But the results of Moran's I are very different.

Here are the results:

For sp.correlogram(),
Spatial correlogram for residuals(mod1)
method: Moran's I
     estimate expectation    variance standard deviate Pr(I) two sided
1   0.06882699 -0.00123153  0.00021800           4.7449       2.086e-06 ***
2   0.03190221 -0.00127877  0.00027178           2.0127       0.0441461 *
3   0.02914748 -0.00133511  0.00028490           1.8059       0.0709262 .
4   0.04458328 -0.00137363  0.00029717           2.6659       0.0076772 **
5   0.05503366 -0.00139665  0.00031539           3.1775       0.0014854 **
6   0.02696152 -0.00143062  0.00032685           1.5704       0.1163121
7   0.06049141 -0.00144718  0.00036181           3.2563       0.0011289 **
8   0.02526273 -0.00146413  0.00037732           1.3759       0.1688468
9   0.02901564 -0.00147493  0.00039208           1.5399       0.1235966
10  0.06624244 -0.00148810  0.00039586           3.4042       0.0006637 ***
11  0.04189714 -0.00151057  0.00041899           2.1206       0.0339535 *
12 -0.00321915 -0.00154560  0.00045316          -0.0786       0.9373374
.........and so on

For correlog():
$correlation (just first 4 bins)
0.381911078  0.356233456  0.322664149  0.304275005

How should we reconcile the differences?

Many thanks,
Xingli

Quoting Roger Bivand <Roger.Bivand at nhh.no>:

On Thu, 29 Jan 2009, Xingli Giam wrote:

Hello List,

I have been bothered by this problem for the entire day, hope someone
can give me a hand here. Thanks in advance for reading! It could be
really simple and basic so hope you guys bear with it.

We often plot Moran's I spatial correlograms that contrast glm residuals
and spatial model residuals (after incorporating spatial autocorrelation
(SAC) via SEVM, autocov reg, etc) over lag distance classes.
A spatial correlogram is not a well-defined concept. One view is that the
inter-pair distances should be binned, and the output computed for the
pairs of observations in each bin - equivalent to defining widening rings
of neighbours for each observation. There are issues that arise here when
some observations have no neighbours in a given bin - should one adjust
the values used to calculate E(I) and var(I)? This is in correlog() in
ncf, and to a certain extent in pgirmess.

Another view is that the first order set of neighbours should be raised to
higher orders irrespective of how the set was defined: i and j are first
order neighbours, i and k are second order neighbours if j and k are first
order neighbours. This isn't quite matrix powering, because if you are
already a first order neighbour, you can't be a second order as well
(discussed in detail in Cliff & Ord 1981, pp. 118-122). See also ?nblag to
see what is going on. correlog() in pgirmess uses nclass.Sturges() to
choose the number of bins, but does not fully protect against bins with
no-neighbour observations.

After deciding the most appropriate neighbourhood distance (1600 km -
mine's a global dataset), I created a list of weights via nb2listw that
is an inverse function of distance (1/distance). I then performed SEVM
using the list of weights, using the brute force ME() algorithm to
extract eigenvectors that reduce SAC.

I plan to report the Moran's I value given from moran.test() on both my
(non- spatial) GLM and Spatial GLM to prove that the spatial model has
reduced autocorrelation.

In addition to that, I plan to show the correlograms of the model
residuals both non-spatial and spatial GLMs.

I would like to ask which function is a better choice for showing the
correlograms. Should I use correlog{ncf} or sp.correlogram{spdep}. From
what I know, calculating Moran's I require definition of a neighbourhood
and a specific weighting scheme. Does anyone know what is the default
neighbourhood and weighting scheme of correlog()? Is it - if I specify
an increment of 100km, then, all points within 0-100, and then 100-200,
etc, are considered neighbours with a binary weighting scheme for each
distance class?
The ncf correlog() uses many nxn matrices, and computes the bands by:

dkl <- ceiling(dmat/increment)

where dmat is a lower triangle distance matrix, increment is the band
width, and dkl has the same form as dmat (it says which distance band the
moran pairs in the lower triangle crossproduct matrix used belong to).

As you can see, the source code is the documentation here.

For sp.correlogram, can I specify the nb object defined by min:0km,
max:100km, and use a lag order of, for example 50, to plot the
correlogram of my spatial model (which has a neighbourhood distance of
1600 km)?

For both functions, it seems that there is no way to incorporate my
inverse- distance weighting scheme that I used in my spatial models. Is
this of practical concern if I intend to plot my correlograms?

Lastly I have a side question regarding correlograms. When we see
distances of, e.g., 100, 200, 300, etc (the increment or lag distance is
100), on the X-axis of a spatial correlogram, does the point at 200
refer to correlations between neighbours of distance 100-200km apart? Or
does it include the first 100km as well (i.e., correlations between
neighbours of distance 0-200km apart)?
In general, the pairs of distances within that band (or lag order). But
check the code to be sure.

Roger

Sorry for the long post. Many thanks for reading, I really appreciate
it. I know some are really basic questions but I can't find any answers
to my questions on google or searching through the list.

Many thanks,
Xingli

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

Different Moran's I correlograms using correlog{ncf} and sp.correlogram{spdep}

Thread (4 messages)