Sampling from a SAR error model
On Thu, 21 Sep 2006, Sam Field wrote:
Roger, Thanks for the code. I have only just now had time to get back to the problem; sorry it took so long. You wrote:
to just take the null models in both cases, where lambda == rho by definition, but the intercept term differs (I'm not sure how to insert autocorrelation into the intercept in the lag model, which is probably a source of error in the simulation).
I am a little confused by this statement. I thought that the only difference between the lag and error model is that in the error model, spatial "spillovers" are restricted to the error term. I was thinking that lambda would always equal rho, not just in the null model. I was also thinking that the spatial lag model would not be properly identified without at least one covariate, since absent any variation in the systematic component of the model (Xb), what would the spatial lag model mean? If you premultiply a vector of constants by (I-pW)^-1, you just get back another vector of constants.
For row-standardised, yes, but not necessarily in other styles of weighting, so it is a bit picky. The code in the functions tries to accommodate that, following reports from users not using row standardisation.
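Sam's observation can be checked directly in base R. This is a toy sketch with a hypothetical dense row-standardised W (not a spdep object): because each row of W sums to 1, W maps a constant vector to itself, so (I - rho W)^{-1} maps the constant c to the constant c / (1 - rho).

```r
# With row-standardised W, W %*% ones == ones, so premultiplying a
# constant vector by (I - rho W)^{-1} returns another constant vector,
# scaled to c / (1 - rho).
n <- 5
W <- matrix(1, n, n); diag(W) <- 0   # toy fully-connected neighbours
W <- W / rowSums(W)                  # row-standardise
rho <- 0.4
z <- solve(diag(n) - rho * W, rep(3, n))
all(abs(z - 3 / (1 - rho)) < 1e-10)  # TRUE: constants map to constants
```

With a weighting style that is not row-standardised, W %*% ones is no longer constant, and this identity fails, which is the picky case Roger mentions.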
A couple of other observations. When the sample size in your example is increased, the lambda and rho estimates are quite a bit closer to their simulated values. I added a single covariate in your example, and the recovery of lambda and rho performed about as well as in the null models. Your example didn't recover the simulated values exactly, but they were close enough to make me think that there is something funny about my own example. So this is where I now turn.
OK, that seems sensible. Please keep the list informed of where this takes you. Best wishes, Roger
Thanks for your input! Sam
Quoting Roger Bivand <Roger.Bivand at nhh.no>:
Sam,
Indeed perplexing. I've tried to slim things down to:
library(spdep)
nsamp <- 100
rho <- 0.4
nb7rt <- cell2nb(7, 7, torus=TRUE)
set.seed(20060918)
e <- matrix(rnorm(nsamp*length(nb7rt)), nrow=length(nb7rt))
listw <- nb2listw(nb7rt)
invmat <- invIrW(listw, rho=rho)
y <- invmat %*% e
lag_res0 <- lapply(1:nsamp, function(i)
lagsarlm(y[,i] ~ 1, listw=listw))
err_res0 <- lapply(1:nsamp, function(i)
errorsarlm(y[,i] ~ 1, listw=listw))
summary(do.call("rbind", lapply(lag_res0, coefficients)))
summary(do.call("rbind", lapply(err_res0, coefficients)))
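The key step above, y <- invmat %*% e, relies on invIrW computing (I - rho W)^{-1}. What that inverse does can be sketched in base R via the Neumann series (I - rho W)^{-1} = I + rho W + rho^2 W^2 + ..., valid when rho times the spectral radius of W is below 1. This is a toy cycle-graph W, not the torus grid above, and not spdep's own implementation:

```r
# Truncated power-series view of (I - rho W)^{-1} on a toy
# row-standardised cycle graph (each cell has one neighbour).
n <- 6
W <- matrix(0, n, n)
W[cbind(1:n, c(2:n, 1))] <- 1   # neighbour structure of a cycle
W <- W / rowSums(W)             # row-standardise (rows already sum to 1)
rho <- 0.4
invmat <- solve(diag(n) - rho * W)
series <- diag(n)
term <- diag(n)
for (k in 1:60) {               # rho^60 is negligibly small
  term <- rho * term %*% W
  series <- series + term
}
max(abs(invmat - series))       # effectively zero
```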
to just take the null models in both cases, where lambda == rho by definition, but the intercept term differs (I'm not sure how to insert autocorrelation into the intercept in the lag model, which is probably a source of error in the simulation).
I also had a look at Haining 1990 and the use of the Cholesky lower triangle of the SAR (simultaneous autoregressive) covariance matrix, and tried:
W <- listw2mat(listw)
Id <- diag(length(nb7rt))
sarcov <- solve(t(Id - rho*W) %*% (Id - rho*W))
sarcovL <- chol((sarcov + t(sarcov))/2)
y <- t(sarcovL) %*% e # Kaluzny et al. 1996, p. 160
lag_res1 <- lapply(1:nsamp, function(i)
lagsarlm(y[,i] ~ 1, listw=listw))
err_res1 <- lapply(1:nsamp, function(i)
errorsarlm(y[,i] ~ 1, listw=listw))
summary(do.call("rbind", lapply(lag_res1, coefficients)))
summary(do.call("rbind", lapply(err_res1, coefficients)))
which is similar to, but not the same as, just premultiplying by (I - rho W)^{-1}, and where I'm very unsure how to simulate for the lag model correctly. The error model simulation is, however, perhaps best done this way, isn't it?
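The sense in which the two generators are "similar but not the same" can be made concrete: t(sarcovL) %*% e and solve(I - rho*W) %*% e are different linear maps of e, so they give different realisations, but both have the same implied covariance Sigma = (I - rho W)^{-1} (I - rho W)^{-T}. A base-R check on a toy row-standardised W (my notation, not the code above):

```r
# Both simulation routes share the SAR error covariance (sigma^2 = 1):
#   Sigma = solve(t(A) %*% A) = solve(A) %*% t(solve(A)),  A = I - rho W.
n <- 6
set.seed(1)
W <- matrix(runif(n * n), n, n); diag(W) <- 0
W <- W / rowSums(W)                # row-standardise (not symmetric)
rho <- 0.4
A <- diag(n) - rho * W
Sigma <- solve(t(A) %*% A)         # SAR error covariance
SigmaS <- (Sigma + t(Sigma)) / 2   # force numerical symmetry for chol()
L <- chol(SigmaS)                  # upper triangular: t(L) %*% L = Sigma
max(abs(t(L) %*% L - Sigma))                # ~ 0: Cholesky route
max(abs(solve(A) %*% t(solve(A)) - Sigma))  # ~ 0: premultiplication route
```

So for the error model the two routes agree in distribution; the open question for the lag model is where the trend enters, not how the disturbance is generated.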
Running this shows that none of them retrieve the input coefficient value exactly. The weights here are about as well-behaved as possible: all observations have four neighbours (grid on a torus), and things perhaps get worse with less regular weights.
Simulating for the error model is discussed in the literature, but the lag model is something of an oddity, certainly worth sorting out.
Best wishes,
Roger
On Sat, 16 Sep 2006, Sam Field wrote:
Roger and Terry, Thanks to both of you for your reply. Yes, sigma^2 enters into the generation of e. The systematic component of the model, X_mat%*%parms, has a variance of around 10^2; e has a variance of around 4^2, i.e. e <- rnorm(n, mean=0, sd=4). I went ahead and copied the code below (it isn't too long) in case others are interested in this exercise. The car package is only necessary for the n.bins function in the last lines of the code, so it is optional. I also tried the invIrM function suggested by Terry; it seems to produce the same results that I get. It also appears in the code below. Any help/suggestions would be greatly appreciated. I am stuck.
#creating data
library(spdep)
library(car)
#sample size
n <- 100
#coordinates
x_coord <- runif(n,0,10)
y_coord <- runif(n,0,10)
## w matrix and row normalized
# pairwise distances between all pairs of points
w_raw <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{w_raw[i,j] <- sqrt((x_coord[i]-x_coord[j])^2 +
(y_coord[i]-y_coord[j])^2)}}
# binary neighbour indicator
w_dummy <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
#defines neighborhood distance
neighbor_dist <- 3
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{if(w_raw[i,j] > neighbor_dist) w_dummy[i,j] <- 0}}
# no self-neighbours
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{if( i == j ) w_dummy[i,j] <- 0}}
# row sums for standardisation
row_sum <- rep(1,length(x_coord))
for(i in 1:length(x_coord))
{row_sum[i] <- sum(w_dummy[i,]) }
w <- matrix(nrow=length(x_coord),ncol=length(x_coord),1)
# divide by row sums
for(i in 1:length(x_coord)){for(j in 1:length(x_coord))
{w[i,j] <- w_dummy[i,j]/row_sum[i]}}
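As an aside, the nested loops above can be replaced by a vectorised base-R equivalent using dist(). This sketch uses toy grid coordinates rather than the random ones above; note it assumes every point has at least one neighbour within the cutoff, otherwise rowSums() is zero and the division produces NaN:

```r
# Vectorised construction of a row-standardised distance-band weight
# matrix on a toy 5 x 5 unit grid (hypothetical coordinates).
coords <- as.matrix(expand.grid(x = 1:5, y = 1:5))
neighbor_dist <- 1.5
d <- as.matrix(dist(coords))        # all pairwise distances
w_dummy <- (d <= neighbor_dist) * 1 # 1 if within the cutoff
diag(w_dummy) <- 0                  # no self-neighbours
w <- w_dummy / rowSums(w_dummy)     # row-standardise
range(rowSums(w))                   # each row sums to 1
```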
neighbors <- dnearneigh(cbind(x_coord,y_coord),0,neighbor_dist)
nb_list <- nb2listw(neighbors)
#create some independent variables
X1 <- rnorm(n,4,2)
X2 <- rnorm(n,2,2)
X3 <- rnorm(n,0,1)
X4 <- rnorm(n,13,3)
X5 <- rnorm(n,21,.5)
X_mat <- cbind(rep(1,n),X1,X2,X3,X4,X5)
# population parameters
parms <- c(12.4,2,-4,5,1,-.5)
rho <- .3
#begin iterations
n_iter <- 100 # number of simulated samples of size n
estimates <- matrix(nrow=n_iter,ncol=7)
estimates2 <- matrix(nrow=n_iter,ncol=7)
estimates3 <- matrix(nrow=n_iter,ncol=7)
for(iter in 1:n_iter)
{
e <- rnorm(n,mean=0,sd=4)
# drawing samples
y_lag <- (solve(diag(100)- rho*w))%*%X_mat%*%parms +
solve(diag(100)-rho*w)%*%e
y_error <- X_mat%*%parms + (solve(diag(100)-rho*w))%*%e
y_error2 <- X_mat%*%parms + invIrM(neighbors,.3)%*%e
lag_model <- lagsarlm(y_lag ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list,type='lag')
lag_model2 <- errorsarlm(y_error ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list)
lag_model3 <- errorsarlm(y_error2 ~ X1 + X2 + X3 + X4 + X5,
listw=nb_list)
# retrieve parameter estimates
estimates[iter,] <- coefficients(lag_model)
estimates2[iter,] <- coefficients(lag_model2)
estimates3[iter,] <- coefficients(lag_model3)
}
mean(estimates[,7])
hist(estimates[,7],nclass=n.bins(estimates[,7]),probability=TRUE)
lines(density(estimates[,7]),lwd=2)
mean(estimates2[,7])
hist(estimates2[,7],nclass=n.bins(estimates2[,7]),probability=TRUE)
lines(density(estimates2[,7]),lwd=2)
mean(estimates3[,7])
hist(estimates3[,7],nclass=n.bins(estimates3[,7]),probability=TRUE)
lines(density(estimates3[,7]),lwd=2)
#data <- data.frame(cbind(y_lag, y_error, X1,X2,X3,X4,X5,x_coord,y_coord))
#names(data) <- c("y_lag","y_error",names(data)[3:9])
#write.table(data,file="C:/test.txt")
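The lag draw in the loop above can be sanity-checked: y_lag is the reduced form of y = rho W y + X b + e, so (I - rho W) y_lag should reproduce X b + e exactly. A base-R sketch with a toy dense W standing in for Sam's distance-based one (Xb is a stand-in for X_mat %*% parms):

```r
# Check that the reduced-form lag draw satisfies the structural equation.
set.seed(2)
n <- 10
W <- matrix(runif(n * n), n, n); diag(W) <- 0
W <- W / rowSums(W)               # row-standardise
rho <- 0.3
Xb <- rnorm(n, 5, 2)              # stand-in for X_mat %*% parms
e <- rnorm(n, 0, 4)
A <- diag(n) - rho * W
y_lag <- solve(A, Xb + e)         # one solve, same as the two-term form
max(abs(A %*% y_lag - (Xb + e)))  # ~ 0
```

Using solve(A, b) also avoids forming the inverse twice, as the two solve(diag(100) - rho*w) calls in the loop do.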
Quoting Roger Bivand <Roger.Bivand at nhh.no>:
On Fri, 15 Sep 2006, Sam Field wrote:
List, I am having trouble writing a script that samples from an SAR error process. I've done it successfully for a spatial lag model, but not for the spatial error model. For the spatial lag model, I have the following:

y_lag <- (solve(diag(100)- p*w))%*%X_mat%*%parms + solve(diag(100)-p*w)%*%e

where parms is a parameter vector, X_mat is an n by p matrix of independent variables (+ constant), e is a vector of independent normal deviates (mean = 0), p is the autoregressive parameter, and w is a square, n by n contiguity matrix (row normalized). This works beautifully. lagsarlm recovers parms and p without a problem. Over repeated sampling, the estimated values are centered on the value for p in the simulation. Is there something wrong with the following for the spatial error model?

y_error <- X_mat%*%parms + (solve(diag(100)-p*w))%*%e
Interesting question. Where is the sigma^2 going in your case - is it in the generation of e? Could you try to use the columbus dataset and set.seed to generate a reproducible example - it may be that what you have written does not communicate exactly what you have done? Roger
The distribution of values for p obtained from errorsarlm over repeated sampling is not centered around the value for the simulation, but is typically much lower and all over the place. I have only looked at values for p ranging from .3 to .7. Any help would be greatly appreciated!
-- Roger Bivand, Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43; e-mail: Roger.Bivand at nhh.no