Help with creating a weight matrix for point dataset of 78,000 observations

2 messages · Susan Gorelick, Roger Bivand

On Sun, 16 Jun 2013, Susan Gorelick wrote:

            
Your data are point data. Did you look at a summary() of the object after 
reading it in? You would look at summary in Stata, so do the same in R.
Did you check what the CRS of the imported object was?

You did read ?poly2nb, didn't you? You have data with point support, so if 
you really want to treat them as polygons, you would go through a 
triangulation first - then tri2nb?
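A minimal sketch of the triangulation route, using simulated points rather than the poster's data. The object names are illustrative only, and depending on the spdep version tri2nb may need an additional triangulation package (such as deldir) to be installed:

```r
library(spdep)

set.seed(1)
pts <- matrix(runif(100 * 2), ncol = 2)  # 100 simulated points, for illustration

# Neighbours are points joined by an edge of the Delaunay triangulation
nb_tri <- tri2nb(pts)

summary(nb_tri)                           # number of links, link distribution
tri_sym <- is.symmetric.nb(nb_tri, verbose = FALSE)
tri_sym
```

Triangulation neighbours are symmetric by construction, unlike k-nearest neighbours.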

Please do understand how R packages work. Do visit the spdep page on CRAN 
- you'll see that RANN (a clever implementation of fast approximate 
nearest neighbours) is "Suggests:". If spdep had "Depends:" on RANN, it 
would have been installed automatically when you installed spdep; when 
packages are "Suggests:", they are optional extras. In your case with 
point support, RANN will be faster than using the fallback, so install it. 
Just as with additional, user-contributed code in Stata, the user chooses 
what to install.
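As a rough illustration of the point about dependency fields, the following sketch inspects the DESCRIPTION of whichever spdep version is installed and checks for an optional package. The fields listed for a current spdep may well differ from those on CRAN in 2013, so RANN is used here only because it is the package named above:

```r
# Dependency fields recorded in the installed spdep's DESCRIPTION file
packageDescription("spdep", fields = c("Depends", "Imports", "Suggests"))

# Optional ("Suggests:") packages are not installed automatically,
# so check for them explicitly before relying on them
has_rann <- requireNamespace("RANN", quietly = TRUE)
if (!has_rann) {
  message('RANN is not installed; install.packages("RANN") would add it')
}
```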

library(fortunes)
fortune("Yoda")

No, but you should read up on and understand how R handles contributed 
packages. It may be that RStudio is getting in the way, adding an extra 
layer of stuff between you and the console. It will work fine when you 
know what you are doing, but will not help you to learn; it is intended 
to support programming in R when the programmer knows what to do.

Generating neighbour objects for 78k points is not a problem:

library(spdep)
set.seed(1)
xy <- matrix(runif(78000*2), ncol=2)
system.time(nb6 <- knn2nb(knearneigh(xy, k=6)))
#   user  system elapsed
#  7.093   0.047   7.142

You may of course check this by creating k6 neighbours in GeoDa from the 
projected point shapefile, and reading the resulting GAL file into R.
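A sketch of how such a comparison might look. No GeoDa file is available here, so the GAL file is written from R as a stand-in; with real data, gal_file (a hypothetical name) would be the path to the file saved by GeoDa, the observations would need to be in the same order in both objects, and read.gal might need its region.id or override.id arguments if the file uses its own IDs:

```r
library(spdep)

set.seed(1)
xy_small <- matrix(runif(500 * 2), ncol = 2)   # 500 simulated points
nb_r <- knn2nb(knearneigh(xy_small, k = 6))

# Stand-in for a GeoDa export: write the R neighbours to a GAL file
gal_file <- tempfile(fileext = ".gal")
write.nb.gal(nb_r, gal_file)

# Read the GAL file back in as an nb object
nb_file <- read.gal(gal_file)

# TRUE if every observation has the same neighbour set in both objects
same <- all(mapply(setequal, nb_r, nb_file))
same
```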

However, this is an asymmetric neighbour list, so fitting ML models will 
be done accurately with method="LU", which will be slower than 
method="Matrix" using updating sparse Cholesky log determinants for 
symmetric neighbours. You may also use method="MC" for Monte Carlo log 
determinant approximations. The default method="eigen" will have memory 
problems with your dense 78k by 78k matrix.
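To see whether a neighbour list is symmetric before choosing a method, something like the following sketch may help. It uses simulated points and illustrative object names. Note that make.sym.nb adds the missing reverse links, so the result no longer has exactly six neighbours per point; whether that change of neighbour definition is acceptable is a modelling decision, not something the code can settle:

```r
library(spdep)

set.seed(1)
xy_demo <- matrix(runif(1000 * 2), ncol = 2)   # 1000 simulated points
nb_k6 <- knn2nb(knearneigh(xy_demo, k = 6))

# k-nearest-neighbour lists are usually asymmetric:
# i may be among j's six nearest points without the reverse holding
is.symmetric.nb(nb_k6, verbose = FALSE)

# Symmetrise by adding the reverse of every one-way link
nb_sym <- make.sym.nb(nb_k6)
sym_ok <- is.symmetric.nb(nb_sym, verbose = FALSE)
sym_ok

# Neighbour counts now vary, with six as the minimum
range(card(nb_sym))
```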

You should also be able to use the full range of GM estimators (as you 
would in Stata). Check by exporting the neighbours generated in R to Stata 
with:

# nb6 is the k=6 neighbour object created above
lw <- nb2listw(nb6, style="W")
# for row standardised weights
sn <- listw2sn(lw)
write.sn2gwt(sn, "nb_w.gwt")

and then in Stata:

spmat import W using nb_w.gwt, geoda replace
spreg gs2sls Y <your Xs>, id(ID) dlmat(W) elmat(W)

See for details:

http://rri.wvu.edu/wp-content/uploads/2012/11/Piras_BivandWP2013.pdf

Hope this clarifies,

Roger