An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110419/a3438017/attachment.pl>
SPDEP Package, neighbours list for Moran's I on large grid dataset
5 messages · Roger Bivand, Ben Haller, Laurent Jégou
1 day later
There is a recent thread on the R-sig-geo list that addresses your need, using a focal function in the raster package. The thread is at: https://stat.ethz.ch/pipermail/r-sig-geo/2011-April/011444.html That list may be a better place to post on this topic. Hope this helps, Roger
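To illustrate the approach mentioned in that thread, here is a rough sketch of a focal (moving-window) computation on a grid. It assumes the raster package is installed; `smooth_local_mean()` is a hypothetical helper name, not part of any package.

```r
# Sketch of the focal-window idea: wrap the grid as a raster layer and compute
# a local (moving-window) statistic with raster::focal(). Guarded so it simply
# returns NULL if the raster package is not available.
smooth_local_mean <- function(m) {
  if (!requireNamespace("raster", quietly = TRUE))
    return(NULL)                                           # raster not installed
  r <- raster::raster(m)                                   # matrix -> RasterLayer
  f <- raster::focal(r, w = matrix(1, 3, 3), fun = mean)   # 3x3 moving-window mean
  raster::as.matrix(f)
}

res <- smooth_local_mean(matrix(runif(100), 10, 10))
```

The same focal machinery can apply other window statistics (variance, a local Moran-style measure) by changing `fun`, which is the gist of the R-sig-geo thread linked above.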
Laurent Jégou wrote:
Hello list members, I'd like to calculate Moran's I on a large dataset, an 8640x3432 grid of values. When I try to create the neighbours list with the cell2nb function at such a scale, R works for several hours and seems to crash. I'm using the latest version (2.13), 64-bit, on a Mac Pro with 4 GB of memory. I think that I'm doing it wrong :-) I'd like to use a "moving window" weight matrix instead of a full-scale one, but I was not lucky enough to find a solution in the docs. Thanks for any help, Laurent -- Laurent Jégou Cartographe et enseignant UTM - Dépt. Géographie 31058 TOULOUSE Cedex 9 - 05.61.50.43.89 http://www.univ-tlse2.fr/geoprdc
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
----- Roger Bivand, Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway
Laurent Jégou wrote:
Hello list members, I'd like to calculate Moran's I on a large dataset, an 8640x3432 grid of values. When I try to create the neighbours list with the cell2nb function at such a scale, R works for several hours and seems to crash. I'm using the latest version (2.13), 64-bit, on a Mac Pro with 4 GB of memory. I think that I'm doing it wrong :-) I'd like to use a "moving window" weight matrix instead of a full-scale one, but I was not lucky enough to find a solution in the docs.
I confronted this problem recently, and decided to roll my own. For a large dataset, an exhaustive computation of Moran's I is very time-consuming, as you say; but an estimate of it, based on a subset of pairs chosen randomly, seemed (for my data, at least) to converge quite nicely.

Here's my code. It assumes a square grid, so you'll need to adapt it for your non-square grid, but that should be trivial. To determine the right number of pairs for a good estimate in my application, I called this function with various numbers of pairs, plotted the estimate produced as a function of the number of pairs used, and observed good convergence by 200,000 pairs. You will probably want to do this yourself, for your dataset. 200,000 pairs took just a second or two to run on my machine, so this is much, much faster than an exhaustive calculation.

As a bonus, this code also gives you Geary's C. Oh, and the code below assumes a wraparound lattice (i.e. a torus, i.e. periodic boundaries), which may not be what you want; just get rid of the d_wraparound stuff and the following pmax() call if you want non-wraparound boundaries, and that should work fine, I think. Not sure what you mean by the moving window thing, so I leave that as an exercise for the reader. :->
I've never made an R package, and I'm not sure there's much demand for this code (is there?), so I'm presently unlikely to package it up. However, if someone who owns a package related to this problem would like to adopt this code, generalize it to non-square lattices, add a flag to choose periodic or non-periodic boundaries, etc., feel free to do so; I hereby place it in the public domain. I'd just like to hear about it if someone does this, and receive credit somewhere, that's all. I'm not super-good in R, either, so if there's a better way to do some things, like the somewhat hacked random index generation (trying to avoid floating point issues when generating a random integer), please let me know. I'm always interested in learning how to do things better.
And of course this code is provided without warranty, may have bugs, etc.
Enjoy!
Ben Haller
McGill University
correlation_stats <- function(p, n_pairs = 200000)
{
    # Estimate Moran's I and Geary's C for the landscape p. The exact calculation
    # would be far too time-consuming, as it involves comparing every point to
    # every other point, so this estimates from a random subset of point pairs.
    p_size <- NROW(p)        # e.g. 512
    p_length <- length(p)    # e.g. 262144
    mean_p <- mean(p)
    pv <- as.vector(p)

    # Select random points by their vector index, for speed. sample.int() draws
    # uniform integers directly, avoiding the floating-point edge cases of a
    # floor(runif(...)) approach.
    p1ix <- sample.int(p_length, n_pairs, replace = TRUE)
    p1x <- ((p1ix - 1) %/% p_size) + 1
    p1y <- ((p1ix - 1) %% p_size) + 1
    p2ix <- sample.int(p_length, n_pairs, replace = TRUE)
    p2x <- ((p2ix - 1) %/% p_size) + 1
    p2y <- ((p2ix - 1) %% p_size) + 1
    v1 <- pv[p1ix] - mean_p
    v2 <- pv[p2ix] - mean_p

    # Calculate inverse distances between point pairs, both directly and wrapping
    # around vertically. The average direct distance is much smaller than the
    # average wraparound distance, because points near the centre vertically have
    # a maximum direct distance near 256 but a minimum wraparound distance near
    # 256; the Moran's I estimate from wraparound distances is different as a
    # result. Rupert recommends taking the shorter of the two distances, whichever
    # it is, because that keeps us within the meaningful scale of our space,
    # before it just starts repeating due to periodicity. (Note that only the
    # y dimension wraps here; a full torus would treat x analogously.)
    d_direct <- 1 / sqrt((p1x - p2x)^2 + (p1y - p2y)^2)
    d_direct[is.infinite(d_direct)] <- 0    # coincident points get zero weight
    d_wraparound <- 1 / sqrt((p1x - p2x)^2 + (p_size - abs(p1y - p2y))^2)
    d_wraparound[is.infinite(d_wraparound)] <- 0
    d <- pmax(d_direct, d_wraparound)  # max because these are 1/distance, and we want the min distance

    # Precalculate some shared terms
    sum_d <- sum(d)
    sum_v1_sq <- sum(v1^2)

    # Moran's I: -1 is perfect dispersion, +1 is perfect correlation,
    # 0 is a random spatial pattern
    M_I <- (n_pairs / sum_d) * (sum(d * v1 * v2) / ((sum_v1_sq + sum(v2^2)) / 2))

    # Geary's C: 0 is positive spatial autocorrelation, 2 is negative spatial
    # autocorrelation, 1 is a random spatial pattern
    G_C <- ((n_pairs - 1) * sum(d * (v1 - v2)^2)) / (2 * sum_d * sum_v1_sq)

    list(Moran_I = M_I, Geary_C = G_C)
}
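As a sanity check for an estimator like the one above, an exhaustive inverse-distance Moran's I can be computed directly on a tiny grid (this sketch is an editorial addition, not Ben's code; `exhaustive_moran()` is a made-up name). It builds the full pairwise distance matrix, so it is only feasible for small grids, and it uses no wraparound.

```r
# Exhaustive inverse-distance Moran's I for a small matrix p (no wraparound).
# Hypothetical helper for cross-checking a sampled estimate on small inputs.
exhaustive_moran <- function(p) {
  n <- length(p)
  xs <- as.vector(p) - mean(p)                  # deviations, column-major order
  coords <- expand.grid(row = seq_len(nrow(p)), col = seq_len(ncol(p)))
  d <- as.matrix(dist(coords))                  # all pairwise Euclidean distances
  w <- ifelse(d > 0, 1 / d, 0)                  # inverse-distance weights, 0 on the diagonal
  (n / sum(w)) * sum(w * outer(xs, xs)) / sum(xs^2)
}

grad <- matrix(rep(1:8, times = 8), 8, 8)                 # smooth gradient: positive autocorrelation
checker <- outer(1:8, 1:8, function(i, j) (i + j) %% 2)   # checkerboard: much lower I
```

On patterns like these, the exhaustive value and a random-pair estimate (with the wraparound removed, as Ben describes) should roughly agree, which is a cheap way to validate a chosen number of pairs.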
1 day later
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110422/6ca5d58c/attachment.pl>
On Apr 22, 2011, at 2:07 AM, Laurent Jégou wrote:
Hello, thank you very much for this solution; I'll try to adapt it to my problem, and I think I'll be able to.
Great!
By the way, I wrote about a "sliding-window" weight matrix; it's an expression used in satellite image processing, especially convolution filters, meaning that the main matrix is not compared to another one of the same dimensions but to a much smaller one. The calculation is executed value by value (pixel by pixel on an image), centering the small matrix on each one, hence the name "sliding window".
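For readers unfamiliar with the idea, here is a minimal base-R sketch of such a sliding window, as a simple mean filter (`focal_mean()` is a hypothetical name for illustration; real convolution code would be vectorised or use a package):

```r
# Minimal sliding-window ("focal") mean filter: a small window is centred on
# each cell in turn, clipped at the grid edges. Illustrative only.
focal_mean <- function(p, half = 1) {
  nr <- nrow(p); nc <- ncol(p)
  out <- matrix(NA_real_, nr, nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      rows <- max(1, i - half):min(nr, i + half)
      cols <- max(1, j - half):min(nc, j + half)
      out[i, j] <- mean(p[rows, cols])    # aggregate the window around (i, j)
    }
  }
  out
}

m <- matrix(1:9, 3, 3)
focal_mean(m)[2, 2]   # centre cell: mean of all nine values, i.e. 5
```

Replacing `mean()` with another window statistic gives the convolution-filter behaviour described above.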
Yeah, I'm familiar with the concept in general, but I don't see how you can really apply it to Moran's I calculations; Moran's I is, by definition, based on comparing every point to every other point, and looking only at points within a smaller window risks producing a value that is not representative of the larger context. But if it makes sense to you in your application, then more power to you. :-> Ben Haller McGill University