
Creating very large spatial weight matrix

6 messages · Aleksandr Andreev, Michael Sumner, Roger Bivand

#
Hello list,

I have 120,000 geocoded observations, for which I'm trying to create a
distance-based spatial weighting matrix so that I can perform a Moran
test.

Each observation has Lat and Lon.

Unfortunately, when I run
dists <- as.matrix(dist(cbind(Lon, Lat)))
I get the message:
Error in vector("double", length) : vector size specified is too large

Now I realize that 120,000^2 / 2 is on the order of 7 x 10^9 entries, well over 50 GB as doubles. However, I
seem to be running into software limitations on the vector size before
I hit RAM limitations. Also, in principle, it should be possible
(though slow) to use hard disk space to store this matrix. Does anyone
have any ideas on how to do this in R?
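
For scale, a quick back-of-the-envelope check of the storage a dense distance
matrix would actually take (a sketch, assuming 8-byte doubles and dist()'s
lower-triangle storage):

n <- 120000
n^2 * 8 / 2^30              # full symmetric matrix from as.matrix(dist(...)): ~107 GiB
n * (n - 1) / 2 * 8 / 2^30  # lower triangle stored by dist() itself: ~54 GiB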

Thanks,

------------------------
Aleksandr Andreev
Graduate Student - Department of Economics
University of North Carolina at Chapel Hill
#
In general you need at least twice the required memory, and it has to be
contiguous. Try a fresh instance of R and attempt to create a single vector of
that size; that will show whether you *could* do it.
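
A minimal sketch of that check (assuming the lower-triangle length that dist()
needs; on R 2.12 this fails outright, whatever the amount of RAM, because the
length exceeds the 32-bit vector limit):

n <- 120000
len <- n * (n - 1) / 2   # ~7.2e9 elements needed by dist()
x <- numeric(len)        # expected to fail on R 2.12 with a vector-size error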

Otherwise, check out the ff package, and see other options in the High
Performance Computing Task View on CRAN.
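
As a minimal sketch of the ff idea (assumed usage; note that a single ff
object on R 2.12 is still bound by the 32-bit index limit, so a full
120,000 x 120,000 matrix would have to be split into chunks):

library(ff)
m <- ff(vmode = "double", dim = c(10000, 10000))  # file-backed matrix, kept on disk
m[1, 2] <- 3.5                                    # reads and writes go through the file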

There may be other techniques you can use to solve the problem, but
those two things are my direct answers to your questions.

Cheers, Mike.

On Fri, Nov 19, 2010 at 10:28 AM, Aleksandr Andreev
<aleksandr.andreev at gmail.com> wrote:

#
And, please report your OS and version of R (64-bit presumably?).
On Fri, Nov 19, 2010 at 10:39 AM, Michael Sumner <mdsumner at gmail.com> wrote:

#
Yes, sorry, I'm running R 2.12.0 on Ubuntu 64-bit (kernel 2.6.32-25-generic)

Thanks for pointing out ff.


------------------------
Aleksandr Andreev
Graduate Student - Department of Economics
University of North Carolina at Chapel Hill
Mobile: +1 303 507 93 88
Skype: typiconman



2010/11/18 Michael Sumner <mdsumner at gmail.com>:
#
Sorry, I also realize that your vector is simply too long: R is limited to
32-bit vector indexing, so it cannot be done without special tricks such as
splitting the data across separate vectors.
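
To see the limit concretely (a sketch):

n <- 120000
.Machine$integer.max  # 2147483647, the longest vector R 2.12 can index
n * (n - 1) / 2       # ~7.2e9: already past the limit for dist()'s output
n^2                   # 1.44e10: the full matrix is further beyond it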

The ff package documentation mentions that 64-bit indexing may be included in
a future release, and there have been discussions about adding it to base R
(perhaps for R 3.0).

Cheers, Mike
On Fri, Nov 19, 2010 at 10:40 AM, Michael Sumner <mdsumner at gmail.com> wrote:

#
On Thu, 18 Nov 2010, Aleksandr Andreev wrote:

The actual answer is to use the function needed for this operation:

library(spdep)
coords <- cbind(Lon, Lat)                           # longitude/latitude, one row per observation
dnb <- dnearneigh(coords, 0, dmax, longlat = TRUE)  # neighbours within dmax km (great-circle)

where dmax is a small distance in km. Of course, if you really need all the
distances, all bets are off, but that would be an unusually specified picture
of the underlying spatial process. I suggest not worrying about ensuring that
all observations have at least one neighbour - for a global measure such as
Moran's I with N = 120,000, dropping a few cannot matter much. Go with a tight
dmax, and it should just work. If dmax is loose and the average number of
neighbours creeps up, the nb object (and the listw object built from it) will
get denser, possibly with some observations having thousands of neighbours,
thereby oversmoothing the process.
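
Putting the pieces together, a sketch of the whole test along these lines
(Lon, Lat and the variable x being tested are placeholders for your own
columns; dmax is the threshold in km you choose):

library(spdep)

coords <- cbind(Lon, Lat)
dnb <- dnearneigh(coords, 0, dmax, longlat = TRUE)    # neighbours within dmax km
lw <- nb2listw(dnb, style = "W", zero.policy = TRUE)  # row-standardised weights;
                                                      # zero.policy tolerates empty sets
moran.test(x, lw, zero.policy = TRUE)                 # Moran's I on the sparse listw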

If this is continental rather than whole-world, consider projecting to the 
plane and using graph-based neighbours (?graph2nb).
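
A sketch of that alternative (assuming coords_proj holds the projected, planar
coordinates):

library(spdep)
gnb <- graph2nb(gabrielneigh(coords_proj), sym = TRUE)  # Gabriel graph neighbours
lw <- nb2listw(gnb, zero.policy = TRUE)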

Hope this helps,

Roger