Dear All,
I am using the "spatialreg" package to estimate a spatial autoregressive model for a large number of observations (~75.000) and would thus prefer to use stsls() over lagsarlm() to gain estimation speed. In a second step however, I want to use the model for out of sample prediction with predict.sarlm().
As predict.sarlm() is not designed for stsls() outputs I tried to modify the stsls() function outputs in a way that the prediction function would be able to handle them.
Now I have two question:
- Is it, from a theoretical point of view, legitimate to just use the IV estimates instead of the ML estimates for the predictors described in
Goulard et al. (2017)?
- Is the following modification of the original stsls() function in combination with the predict.sarlm() an acceptable approach to implement the prediction methods suggested by Goulard et al. (2017) with an IV estimator? (Unfortunately I could not come up with a way to test this myself)
Modifications + test predictions in R:
[modifications are marked with "## additional outputs start ##" and "## additional outputs end ##" ]
library("spdep")
library("spatialreg")
# modify stsls function outputs
stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
if (!inherits(listw, "listw"))
stop("No neighbourhood list")
if (is.null(zero.policy))
zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
stopifnot(is.logical(zero.policy))
if (!inherits(formula, "formula")) formula <- as.formula(formula)
mt <- terms(formula, data = data)
mf <- lm(formula, data, na.action=na.action, method="model.frame")
na.act <- attr(mf, "na.action")
if (!is.null(na.act)) {
subset <- !(1:length(listw$neighbours) %in% na.act)
listw <- subset(listw, subset, zero.policy=zero.policy)
}
y <- model.extract(mf, "response")
if (any(is.na(y))) stop("NAs in dependent variable")
X <- model.matrix(mt, mf)
if (any(is.na(X))) stop("NAs in independent variable")
if (robust) {
if (is.null(HC)) HC <- "HC0"
if (!any(HC %in% c("HC0", "HC1")))
stop("HC must be one of HC0, HC1")
}
# modified to pass zero.policy Juan Tomas Sayago 100913
Wy <- lag.listw(listw, y, zero.policy=zero.policy)
dim(Wy) <- c(nrow(X),1)
colnames(Wy) <- c("Rho")
# WX <- lag.listw(W,X[,2:ncol(X)])
n <- NROW(X)
m <- NCOL(X)
xcolnames <- colnames(X)
K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
if (m > 1) {
WX <- matrix(nrow=n, ncol=(m-(K-1)))
if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
for (k in K:m) {
wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
if (any(is.na(wx)))
stop("NAs in lagged independent variable")
WX[,(k-(K-1))] <- wx
if(W2X) WWX[, (k - (K - 1))] <- wwx
}
if(W2X) inst <- cbind(WX, WWX)
else inst <- WX
}
if (K == 2 && listw$style != "W") {
# modified to meet other styles, email from Rein Halbersma
wx1 <- as.double(rep(1, n))
wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
if (m > 1) {
inst <- cbind(wx, inst)
if(W2X) inst <- cbind(wwx, inst)
} else {
inst <- matrix(wx, nrow=n, ncol=1)
if(W2X) inst <- cbind(inst, wwx)
}
# colnames(inst) <- xcolnames
}
# if (listw$style == "W") colnames(WX) <- xcolnames[-1]
result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
legacy=legacy)
result$zero.policy <- zero.policy
result$robust <- robust
if (robust) result$HC <- HC
result$legacy <- legacy
result$listw_style <- listw$style
result$call <- match.call()
## additional outputs ##
result$rho <- result$coefficients[1]
result$coefficients <- result$coefficients[2:length(result$coefficients)]
result$type <- "lag"
result$X <- X
result$y <- y
result$s2<-result$sse / result$df
## additional outputs end ##
class(result) <- "stsls"
result
}
# update original function in spatialreg
tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
environment(stsls2) <- environment(tmpfun)
attributes(stsls2) <- attributes(tmpfun)
assignInNamespace("stsls", stsls2, ns="spatialreg")
#-# test prediction with updated function
# data
data(oldcol)
# set seed
set.seed(123)
# draw observations to use as test data
outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)
# create train and test dataframes
S.sample <- COL.OLD[-outsamp, ] # train
O.sample <- COL.OLD[outsamp, ] # test
S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train
# run regression
testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
nb2listw(S.sample.nb, style="W"))
# in and outsample prediction
S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
pred.type = "TC", power = FALSE)
O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)
O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)
I would be very glad to get some feedback on both the general idea and the implementation!
Best regards,
Selim Banabak
Using stsls() with predict.sarlm()
2 messages · Banabak, Selim, Roger Bivand
On Mon, 13 Jul 2020, Banabak, Selim wrote:
Dear All, I am using the "spatialreg" package to estimate a spatial autoregressive model for a large number of observations (~75.000) and would thus prefer to use stsls() over lagsarlm() to gain estimation speed. In a second step however, I want to use the model for out of sample prediction with predict.sarlm(). As predict.sarlm() is not designed for stsls() outputs I tried to modify the stsls() function outputs in a way that the prediction function would be able to handle them. Now I have two question: - Is it, from a theoretical point of view, legitimate to just use the IV estimates instead of the ML estimates for the predictors described in Goulard et al. (2017)?
You would need to go through the underlying equations. Just using the point estimates of the coefficients and their standard errors is one approach, but there are few studies beyond Goulard et al. Maybe look at Suesse and co-author https://doi.org/10.1016/j.csda.2017.11.004, https://doi.org/10.1080/00949655.2017.1286495 (also ML - much of the STSLS/GMM literature avoids looking at important problems). While estimating and fitting larger data sets is more time-consuming with ML, using LU or Cholesky decomposition is practical when the spatial weights are sparse (this applies to STSLS too). Predicting is also constrained when n is large, as inverting the nxn matrix may be needed. Check whether Stata knows how to predict from STSLS, and check usage of sphet::spreg rather than spatialreg::stsls. Consider contacting the maintainer of sphet if there is no response on this list, as it would make more sense to explore predict methods for more modern sphet implementations than legacy ones in spatialreg.
- Is the following modification of the original stsls() function in combination with the predict.sarlm() an acceptable approach to implement the prediction methods suggested by Goulard et al. (2017) with an IV estimator? (Unfortunately I could not come up with a way to test this myself)
You would need to do the matrix math separately - maybe contacting the author of the spatialreg implementations, Martin Gubri, would also make sense. Interesting topic! Roger
Modifications + test predictions in R:
[modifications are marked with "## additional outputs start ##" and "## additional outputs end ##" ]
library("spdep")
library("spatialreg")
# modify stsls function outputs
stsls2 <- function(formula, data = list(), listw, zero.policy=NULL,
na.action=na.fail, robust=FALSE, HC=NULL, legacy=FALSE, W2X=TRUE) {
if (!inherits(listw, "listw"))
stop("No neighbourhood list")
if (is.null(zero.policy))
zero.policy <- get("zeroPolicy", envir = .spatialregOptions)
stopifnot(is.logical(zero.policy))
if (!inherits(formula, "formula")) formula <- as.formula(formula)
mt <- terms(formula, data = data)
mf <- lm(formula, data, na.action=na.action, method="model.frame")
na.act <- attr(mf, "na.action")
if (!is.null(na.act)) {
subset <- !(1:length(listw$neighbours) %in% na.act)
listw <- subset(listw, subset, zero.policy=zero.policy)
}
y <- model.extract(mf, "response")
if (any(is.na(y))) stop("NAs in dependent variable")
X <- model.matrix(mt, mf)
if (any(is.na(X))) stop("NAs in independent variable")
if (robust) {
if (is.null(HC)) HC <- "HC0"
if (!any(HC %in% c("HC0", "HC1")))
stop("HC must be one of HC0, HC1")
}
# modified to pass zero.policy Juan Tomas Sayago 100913
Wy <- lag.listw(listw, y, zero.policy=zero.policy)
dim(Wy) <- c(nrow(X),1)
colnames(Wy) <- c("Rho")
# WX <- lag.listw(W,X[,2:ncol(X)])
n <- NROW(X)
m <- NCOL(X)
xcolnames <- colnames(X)
K <- ifelse(xcolnames[1] == "(Intercept)", 2, 1)
if (m > 1) {
WX <- matrix(nrow=n, ncol=(m-(K-1)))
if(W2X) WWX <- matrix(nrow = n, ncol = ncol(WX) )
for (k in K:m) {
wx <- lag.listw(listw, X[,k], zero.policy=zero.policy)
if(W2X) wwx <- lag.listw(listw, wx, zero.policy = zero.policy)
if (any(is.na(wx)))
stop("NAs in lagged independent variable")
WX[,(k-(K-1))] <- wx
if(W2X) WWX[, (k - (K - 1))] <- wwx
}
if(W2X) inst <- cbind(WX, WWX)
else inst <- WX
}
if (K == 2 && listw$style != "W") {
# modified to meet other styles, email from Rein Halbersma
wx1 <- as.double(rep(1, n))
wx <- lag.listw(listw, wx1, zero.policy=zero.policy)
if(W2X) wwx <- lag.listw(listw, wx, zero.policy=zero.policy)
if (m > 1) {
inst <- cbind(wx, inst)
if(W2X) inst <- cbind(wwx, inst)
} else {
inst <- matrix(wx, nrow=n, ncol=1)
if(W2X) inst <- cbind(inst, wwx)
}
# colnames(inst) <- xcolnames
}
# if (listw$style == "W") colnames(WX) <- xcolnames[-1]
result <- tsls(y=y, yend=Wy, X=X, Zinst=inst, robust=robust, HC=HC,
legacy=legacy)
result$zero.policy <- zero.policy
result$robust <- robust
if (robust) result$HC <- HC
result$legacy <- legacy
result$listw_style <- listw$style
result$call <- match.call()
## additional outputs ##
result$rho <- result$coefficients[1]
result$coefficients <- result$coefficients[2:length(result$coefficients)]
result$type <- "lag"
result$X <- X
result$y <- y
result$s2<-result$sse / result$df
## additional outputs end ##
class(result) <- "stsls"
result
}
# update original function in spatialreg
tmpfun <- get("stsls", envir = asNamespace("spatialreg"))
environment(stsls2) <- environment(tmpfun)
attributes(stsls2) <- attributes(tmpfun)
assignInNamespace("stsls", stsls2, ns="spatialreg")
#-# test prediction with updated function
# data
data(oldcol)
# set seed
set.seed(123)
# draw observations to use as test data
outsamp <- sample(1:nrow(COL.OLD), nrow(COL.OLD)*.3, replace = FALSE)
# create train and test dataframes
S.sample <- COL.OLD[-outsamp, ] # train
O.sample <- COL.OLD[outsamp, ] # test
S.sample.nb <- subset.nb(COL.nb, !(1:length(COL.nb) %in% outsamp)) # train
# run regression
testreg <- spatialreg::stsls(CRIME ~ INC + HOVAL, data=S.sample,
nb2listw(S.sample.nb, style="W"))
# in and outsample prediction
S.PRED.TC <- spatialreg::predict.sarlm(object = testreg, listw = nb2listw(S.sample.nb, style="W"),
pred.type = "TC", power = FALSE)
O.PRED.TC <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
listw = nb2listw(COL.nb, style="W"), pred.type = "TC", power = FALSE)
O.PRED.BP <- spatialreg::predict.sarlm(object = testreg , newdata = O.sample,
listw = nb2listw(COL.nb, style="W"), pred.type = "BP", power = FALSE)
I would be very glad to get some feedback on both the general idea and the implementation!
Best regards,
Selim Banabak
[[alternative HTML version deleted]]
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Department of Economics, Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; e-mail: Roger.Bivand at nhh.no https://orcid.org/0000-0003-2392-6140 https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en