eigenvectors increase spatial autocorrelation in ols regression

Dear Roger,

Why ncf::correlog() should not be used for regression residuals in this
case?

Could you clarify this to me, please?
I do not have my copy of Cliff & Ord (1973) to hand. The value of Moran's 
I is the same, but the inferences will be affected by changes in the 
expectation and variance (under Normality):

library(spdep)
columbus <- st_read(system.file("shapes/columbus.shp",
   package="spData")[1])
col.gal.nb <- read.gal(system.file("weights/columbus.gal",
   package="spData")[1])
OLS <- lm(CRIME ~ INC + HOVAL, data=columbus)
lm.morantest(OLS, listw=nb2listw(col.gal.nb))
moran.test(residuals(OLS), listw=nb2listw(col.gal.nb),
   randomisation=FALSE)

This is because standard regression assumptions affect how we view the 
regression residuals, details in Cliff & Ord. See for example the code in 
moran.test() for E(I) (var(I) is similar):

     EI <- (-1)/wc$n1

where wc$n1 is n - 1 from spweights.constants(). In the lm.morantest() 
case:

     XtXinv <- chol2inv(model$qr$qr[p1, p1, drop = FALSE])
     X <- model.matrix(terms(model), model.frame(model))
     if (length(nacoefs) > 0L)
         X <- X[, -nacoefs]
     if (!is.null(wts <- weights(model))) {
         X <- drop(t(sapply(1:length(wts), function(i) sqrt(wts[i]) *
             X[i, ])))
     }
     Z <- lag.listw(listw.U, X, zero.policy = zero.policy)
     C1 <- t(X) %*% Z
     trA <- (sum(diag(XtXinv %*% C1)))
     EI <- -((N * trA)/((N - p) * S0))

and S_0 is the sum of weights (N if row standardised), p is the number 
columns of X, and A = (X'X)^{-1} X'WX. For the distance band case, you 
also need to adjust N for the count of pairs of neighbours within that 
band; in moran.test() you see n <- as.double(length(which(cards > 0))) by 
default, where cards is a vector of neighbour counts; in lm.morantest() 
the equivalent by default is
  N <- as.double(length(which(card(listw$neighbours) > 0L)))

My presumption is that ncf::correlog() is for input variables, not 
regression residuals, as there is no argument to pass though the fitted 
model object from which to extract the model matrix or (X'X)^{-1}. 
Further, resampling regression residuals may be unsafe for similar 
reasons; the draws should not be from the residuals, I think.

Recall that the code is shown by typing the function name, and this best 
documents what is going on.

Again, refer to Cliff & Ord 1973 for the full development.

Roger
Thank you

Best,

Vinicius

Em qui., 23 de jul. de 2020 ?s 13:52, Roger Bivand <Roger.Bivand at nhh.no>
escreveu:

On Wed, 22 Jul 2020, Peter B. Pearman wrote:

Dear Roger and list members,

I have a ols regression and want to remove spatial autocorrelation (SAC)
from the residuals, in order to avoid its potential effects of SAC on
the hypothesis tests (and the reviewers/editor).  I have generated
spatial eigenvectors with SpatialFiltering(), and added the generated
vectors to the regression.  Surprisingly, SAC appears to become more
pronounced.  I also tried ME(), but many more vectors are produced and
SAC is also not removed. Isn't including the vectors from
SpatialFiltering() supposed to reduce SAC?

Can you please enlighten me as to what's going on, what I am doing
wrong, or what I should try?
You are using point support (your forests are at points not areas, so that
the observations are not contiguous). The advice to consider this as a
mixed model problem may be relevant. It is certainly the case that
ncf::correlog() should not be used for regression residuals.

Further, your data are in a band 1.5-4.3E, 43.0-43.4N, so using 0.7
degrees in dnearneigh() was a risky choice, and squashes the data. If you
coerce to an sf object and apply an appropriate CRS, you get many fewer
neighbours on average.

library(sf)
data <- st_as_sf(data, coords=c("LONG", "LAT"))
(nbnear4 <- dnearneigh(data, 0, 0.7))
st_crs(data) <- 4326
(nbnear4a <- dnearneigh(data, 0, 27))
# or project to a relevant planar UTM 31N spec.
data_utm31 <- st_transform(data, st_crs(32631))
(nbd <- dnearneigh(data_utm31, 0, 27))

and so on. SpatialFiltering() and ME() test against global residual
autocorrelation in finding candidate eigenvectors, and running
lm.morantest() on the fitted model shows that there is no global
autocorrelation as intended (because positive and negative local
autocorrelation cancels out). I think that the underlying problems are
mixing approaches that are not asking the same questions, and in too
little care in choosing the weights.

Hope this clarifies,

Roger

Thanks in advance for you time.

Peter

The data are here:

https://drive.google.com/file/d/1Z3FIGIAbYvXqGETn0EWLjTOY8MYpZi_S/view?usp=
sharing

The analysis goes like this:

library(spatialreg)
library(spdep)
library(tidyverse)
library(car)
library(ncf)

data <- read_csv("for_RAR.csv")
set.seed(12345)
x <- data$ses.mntd
y <- log(data$RAR)
ols_for_RAR <- lm(y ~ x)
## qqplot() and shapiro.test() show residuals are nicely distributed
# x is significant and R-square about 0.2, demonstrated here
car::Anova(ols_for_RAR,type="III")
summary(ols_for_RAR)

# the following appears to make an acceptable neighbor network
c1<-c(data$LONG)
c2<-c(data$LAT)
cbindForests<-cbind(c1,c2)
# a value of 0.7, below, is sufficient to join all the points.
# qualitatively the results aren't affected, as far as I see by setting
this
higher
# However, the number of eigenvectors generated by SpatialFiltering()
varies
a lot
nbnear4 <- dnearneigh(cbindForests, 0, 0.7)
plot(nbnear4, cbindForests, col = "red", pch = 20)

# SAC appears significant at short distances (<10km), which is what I
want
to remove
cor_for <- correlog(c1, c2, residuals(ols_for_RAR), increment = 1,
resamp =
1000, latlon=TRUE, na.rm = TRUE)
plot(cor_for$correlation[1:20],type="s")
# p-values
print(cor_for$p[which(cor_for$p < 0.05)])
# Moran's I values
cor_for$correlation[which(cor_for$p < 0.05)]

# Generate optimized spatial eigenvectors using SpatialFiltering() and
use
them
# Several vectors are generated depending on values in dnearneigh()
spfilt_mntd_RAR<- spatialreg::SpatialFiltering(y ~ x, nb=nbnear4,style =
"W", tol=0.0001, ExactEV = TRUE)
new_mod <- lm(y ~ x + fitted(spfilt_mntd_RAR))
car::Anova(new_mod, type="III")
summary(new_mod)

# Plot Moran's I at distances under 20km
cor_for_1c <- correlog(c1, c2, residuals(new_mod), increment = 1, resamp
=
1000, latlon=TRUE, na.rm = TRUE)
plot(cor_for_1c$correlation[1:20],type="s")

# Extract significant values of Moran's I
cor_for_1c$p[which(cor_for_1c$p < 0.05)]
cor_for_1c$correlation[which(cor_for_1c$p < 0.05)]

# The result is that Moran's I is significant at additional short
distances
--

_+_+_+_+_+_+_+_+_

Peter B. Pearman
Ikerbasque Research Professor

Laboratory for Computational Ecology and Evolution
Departamento de Biolog?a Vegetal y Ecolog?a
Facultad de Ciencias y Tecnolog?a
Ap. 644
Universidad del Pa?s Vasco/ Euskal Herriko Unibertsitatea

Barrio Sarriena s/n
48940 Leioa, Bizkaia

SPAIN

Tel.  +34 94 601 8030

Fax  +34 94 601 3500

www.ehu.eus/es/web/bgppermp

When you believe in things that you don?t understand

Then you suffer

                        -- Stevie Wonder

Download my public encryption key here: https://pgp.mit.edu/

and please use it when you send me e-mail.

--
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no
https://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en

eigenvectors increase spatial autocorrelation in ols regression

Thread (7 messages)