
Forming Spatial Weights Matrix with Large Datasets (>150,000 observations)

On Tue, 16 Nov 2010, John GIBSON wrote:

If you ran the steps separately, you would see where the bottleneck is. My 
guess is that method="eigen" in the first spatial regression is the 
underlying problem, and that replacing this with sparse matrix techniques 
will resolve the problem: method="Matrix". See also comments inline in 
code below, as you seem to create dense matrices when sparse 
representations are all you need.

Roger
What does:

print(W_34km_nb)

report as the average number of neighbours? Is it too large for your 
hypothesised processes?
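A minimal sketch of checking the average neighbour count on toy data (the coordinates and the 34 km distance band below are illustrative stand-ins for the poster's W_34km_nb, not taken from the thread):

```r
library(spdep)

## Toy coordinates standing in for the poster's point locations
set.seed(1)
coords <- cbind(runif(100, 0, 100), runif(100, 0, 100))

## Distance-band neighbours, analogous to W_34km_nb (34 km band assumed)
nb <- dnearneigh(coords, d1 = 0, d2 = 34)
print(nb)  # reports the average number of links per observation
```

If the average number of links is very large, the distance band may be wider than the processes you hypothesise, and each listw operation does more work than it needs to.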
No, creating the full weights matrix is unnecessary: what is needed is created on the fly, and the explicit matrix is dense.
You may save repeated calls to nb2listw() by doing it once and storing the 
listw object for use:

lw <- nb2listw(W_34km_nb, style = "W")
fa00.moran <- moran.test(rawd$fa00, lw)

and so on ...
No, use lm.morantest() passing the lm object:

uhat.moran <- lm.morantest(rawd.lm, lw)
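A self-contained sketch of lm.morantest() on synthetic data (the variables here are made up; in the thread, rawd.lm would be the poster's fitted lm object and lw the stored listw):

```r
library(spdep)

set.seed(1)
n <- 100
coords <- cbind(runif(n), runif(n))
lw <- nb2listw(knn2nb(knearneigh(coords, k = 5)), style = "W")

x <- rnorm(n)
y <- 2 * x + rnorm(n)
fit <- lm(y ~ x)

## lm.morantest() tests the regression residuals, with moments adjusted
## for the fact that they come from a fitted model; moran.test() is for
## a raw variable and would mis-state the inference here
lm.morantest(fit, lw)
```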
Why create the dense matrix?

W_34km_dsls <- nb2listw(W_34km_nb, glist=W_34km_ivds, style="W", ...
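One way to build inverse-distance general weights without ever forming a dense matrix is nbdists(), which returns distances only for pairs that are actually neighbours; the resulting glist is a list parallel to the nb object. A hedged sketch on toy data (names and the 34 km band are illustrative):

```r
library(spdep)

set.seed(1)
coords <- cbind(runif(50, 0, 100), runif(50, 0, 100))
nb <- dnearneigh(coords, d1 = 0, d2 = 34)

dists <- nbdists(nb, coords)             # distances for neighbour pairs only
ivds  <- lapply(dists, function(d) 1/d)  # inverse distances, still sparse
lw_ivd <- nb2listw(nb, glist = ivds, style = "W", zero.policy = TRUE)
```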
As stated above, don't use method="eigen" for larger N; use 
method="Matrix", another exact sparse method, or an approximation such 
as Monte Carlo or Chebyshev. Running N=25000 on a 1GB laptop isn't a 
problem: with sparse matrix techniques or approximations, everything 
becomes easier.
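A sketch of the suggested change on toy data. Note that in current package releases lagsarlm() lives in spatialreg rather than spdep, and make.sym.nb() is used here because the sparse Cholesky method wants symmetric (or similar-to-symmetric) weights; everything else is illustrative:

```r
library(spdep)
library(spatialreg)

set.seed(1)
n <- 400
coords <- cbind(runif(n), runif(n))
## symmetrise the k-nearest-neighbour graph for the sparse methods
nb <- make.sym.nb(knn2nb(knearneigh(coords, k = 5)))
lw <- nb2listw(nb, style = "W")

x <- rnorm(n)
y <- 0.5 * x + rnorm(n)

## sparse Cholesky log-determinant instead of dense eigenvalues
fit <- lagsarlm(y ~ x, listw = lw, method = "Matrix")
## method = "MC" (Monte Carlo) or "Chebyshev" approximate the
## log-determinant and scale to still larger N
```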

Note also that you need to use impacts() on the output of lagsarlm() 
because the coefficients should not be interpreted like OLS coefficients - 
see the references in ?impacts.
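A sketch of the impacts() step on the same kind of toy fit; trW() precomputes traces of powers of the sparse weights matrix so the impact measures are cheap even for large N. All data and names here are illustrative, and lagsarlm()/impacts() are in spatialreg in current releases:

```r
library(spdep)
library(spatialreg)

set.seed(1)
n <- 400
coords <- cbind(runif(n), runif(n))
nb <- make.sym.nb(knn2nb(knearneigh(coords, k = 5)))
lw <- nb2listw(nb, style = "W")
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
fit <- lagsarlm(y ~ x, listw = lw, method = "Matrix")

## report direct, indirect (spillover) and total impacts, not raw betas
W  <- as(lw, "CsparseMatrix")
tr <- trW(W, type = "mult")
impacts(fit, tr = tr)
```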