
Memory problems with dnearneigh in spdep -- related data questions

Thanks again, Roger. I will work with the rasters and learn the focal method for this.
For the sake of learning beyond this particular analysis, I'm a bit confused about what I'm seeing in the data summaries and how to interpret them. I realize understanding one's data is fundamental, sorry.

The data I was using came from a points file generated in ArcGIS. I have another, older Arc-generated points file of the same data that is not projected and has proj4string NA, but is otherwise the same.
When I took an older raster of the data and converted it to points in R (package raster),
I got the following summary:

Object of class SpatialPointsDataFrame
Coordinates:
        min      max
x -17.52090 51.40407
y -34.82918 37.53746
Is projected: NA
proj4string : [NA]
Number of points: 36820887
Data attributes:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
     0.00      0.28      2.57     26.43     10.51 103100.00

So it is the same, except that it isn't projected and it doesn't have the NA's entry, whereas the files that come from ArcGIS have a very large number of NA's (all but about 1 million of the points, i.e. roughly 35 million out of 36 million).
Again, it all comes from a 1 km raster.

1) Any ideas why the NA's differ, and whether this might be part of my problem? And in general, what should one check in the data to avoid this problem (if it is a problem)?
2) In the projected data set, the proj4string clearly says +units=m, so if I set a distance threshold, can I assume it is calculated in meters? If the data aren't projected, is the default then lat/long and kilometers? In general, when I look at a summary of the data, how do I know which units a distance threshold (or a similar argument to any other function) will be interpreted in?
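A minimal Python sketch (not spdep code; the function names are hypothetical) of why the units of a distance threshold follow the coordinates. A planar (Euclidean) distance comes out in whatever units the coordinates are stored in: metres for the UTM file, but degrees for an unprojected lon/lat file that is treated as planar. Getting kilometres from lon/lat needs a great-circle formula; the haversine formula and the 6371 km Earth radius used here are assumptions of the sketch. As I understand the spdep help page, the longlat argument to dnearneigh makes the equivalent switch to great-circle distances in km.

```python
import math

def planar_distance(p, q):
    """Euclidean distance, in whatever units the coordinates are stored in."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def great_circle_km(p, q, radius_km=6371.0):
    """Haversine distance in km between two (lon, lat) points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Two projected (UTM, metres) points one 1 km cell apart: the answer is in metres.
utm_a, utm_b = (500000.0, 1000000.0), (501000.0, 1000000.0)
print(planar_distance(utm_a, utm_b))   # 1000.0 (metres)

# Two lon/lat points roughly one 1 km cell apart at the equator.
ll_a, ll_b = (38.0, 0.0), (38.009, 0.0)
print(planar_distance(ll_a, ll_b))     # about 0.009 (degrees, not a length on the ground)
print(great_circle_km(ll_a, ll_b))     # about 1.0 (km)
```

So a threshold of 30 means 30 m with the UTM coordinates, 30 degrees if lon/lat coordinates are treated as planar, and 30 km only if great-circle distances are requested.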

Some weeks ago, I successfully ran G* with dnearneigh on a subset of the data, just Ethiopia (the points file was generated from ArcGIS). The distance threshold I used was 30 (honestly, I was just trying to see whether it would run; there was no theoretical reason for 30). The output made sense and was similar to what my colleague got in ArcGIS for Ethiopia.

Summary of data: 
Object of class SpatialPointsDataFrame
Coordinates:
                min     max
coords.x1 -162966.9 1492490
coords.x2  377203.2 1642482
Is projected: TRUE
proj4string :
[+proj=utm +zone=37 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0]
Number of points: 1335221
Data attributes:
    OBJECTID          POINTID    GRID_CODE
 Min.   :      1   Min.   :0   Min.   :    0.00
 1st Qu.: 333806   1st Qu.:0   1st Qu.:    5.32
 Median : 667611   Median :0   Median :   15.25
 Mean   : 667611   Mean   :0   Mean   :   59.38
 3rd Qu.:1001416   3rd Qu.:0   3rd Qu.:   69.10
 Max.   :1335221   Max.   :0   Max.   :72526.11

3) This data set doesn't have the NA's, and it worked. Hmm. But the units are meters and I used a distance threshold of 30. For 1 km data, shouldn't there have been no neighbors within 30 m? Why did this work, or was it calculating in kilometers instead?
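A brute-force Python sketch (hypothetical helpers, not dnearneigh itself) of what a distance threshold should find on an exact 1 km grid whose coordinates are in metres. It assumes cell centres exactly 1000 m apart and counts neighbours j with distance <= d_max; dnearneigh's own lower/upper bound conventions may differ slightly, so check its help page.

```python
import math

def grid_points(nrow, ncol, cell=1000.0):
    """Hypothetical helper: cell-centre coordinates (metres) of a regular grid."""
    return [(c * cell, r * cell) for r in range(nrow) for c in range(ncol)]

def count_neighbours(points, i, d_max):
    """Number of points j != i whose distance from point i is <= d_max."""
    xi, yi = points[i]
    return sum(
        1
        for j, (xj, yj) in enumerate(points)
        if j != i and math.hypot(xj - xi, yj - yi) <= d_max
    )

pts = grid_points(5, 5)   # 5 x 5 grid of 1 km cells
centre = 12               # index of the middle cell (row 2, column 2)
for d in (30, 1000, 1500):
    print(d, count_neighbours(pts, centre, d))
# 30 0      no neighbours: the nearest cell centre is 1000 m away
# 1000 4    the four edge-sharing cells (rook)
# 1500 8    diagonals (about 1414 m away) are now included too (queen)
```

On such a grid, a threshold of 30 in metres should find no neighbours at all; the threshold has to reach the cell size (1000 m) before any appear.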

Thanks for any insight,

Juliann



----- Original Message -----
From: Roger Bivand <Roger.Bivand at nhh.no>
To: Empty Empty <phytophthorasb at yahoo.com>
Cc: "r-sig-geo at r-project.org" <r-sig-geo at r-project.org>
Sent: Wednesday, April 10, 2013 12:10 AM
Subject: Re: [R-sig-Geo] Memory problems with dnearneigh in spdep
On Tue, 9 Apr 2013, Empty Empty wrote:

            
Given your object summary, you need to check where your data came from. The bounding box is for geographical coordinates, but the declared coordinate reference system is projected in units of metres. So your 9.333 is 9.333m, and no neighbours will be found. The inclusion of NAs in gridded data represented as points is unnecessary, and all points not on land should be dropped before analysis begins.
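The two checks described above can be sketched in Python as follows (hypothetical function names; this is not sp or spdep code). The bounding-box test is only a heuristic: a box that fits inside [-180, 180] x [-90, 90] suggests lon/lat degrees, which contradicts a CRS declared as projected in metres. The records below are toy values, and the `declared_projected` flag stands in for what the object's CRS claims. In R the NA step would be a subset along the lines of `!is.na()` on the value column.

```python
import math

def bbox(points):
    """(xmin, xmax, ymin, ymax) of a list of (x, y, value) records."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)

def looks_geographic(box):
    """Heuristic only: a box inside [-180, 180] x [-90, 90] suggests lon/lat degrees."""
    xmin, xmax, ymin, ymax = box
    return -180 <= xmin and xmax <= 180 and -90 <= ymin and ymax <= 90

def drop_missing(points):
    """Keep only records whose value is present (neither None nor NaN)."""
    return [p for p in points
            if p[2] is not None
            and not (isinstance(p[2], float) and math.isnan(p[2]))]

# Toy records: coordinates in degrees, two of them with missing values.
records = [(-17.5, -34.8, 0.0), (10.0, 5.0, float("nan")),
           (51.4, 37.5, 2.57), (20.0, 0.0, None)]
declared_projected = True   # assumption: what a mismatched CRS declaration would claim

if declared_projected and looks_geographic(bbox(records)):
    print("warning: bounding box looks like lon/lat but the CRS says projected")
clean = drop_missing(records)
print(len(records), "->", len(clean))   # 4 -> 2
```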

I do suggest moving to a raster representation, and using focal methods in the raster package to generate the separate components of equation 14.3, pp. 263-264 in Getis & Ord (1996). Focal methods define the moving window as a matrix (here 10x10) that is moved over the raster, circumventing the creation of a weights object: create a new raster layer of \sum_j w_{ij}(d) x_j values. W_i^* will be a constant, as will \bar{x}^*, and possibly the other starred terms too.
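A pure-Python sketch of the focal approach (hypothetical function names; not the raster package's focal()). The moving-window sum gives \sum_j w_{ij}(d) x_j directly, and the z-value is assembled with the usual standardised form G_i^* = (\sum_j w_{ij}(d) x_j - W_i^* \bar{x}^*) / (s^* \sqrt{(n S_{1i}^* - W_i^{*2}) / (n - 1)}), which I am assuming corresponds to equation 14.3; check it against the chapter. Further assumptions: binary weights, a square odd-sized (2k+1) x (2k+1) window that includes the centre cell, no missing cells, and windows truncated at the grid edge, so W_i^* is constant only in the interior.

```python
import math

def focal_sum(grid, k):
    """Window sums of x_j and window cell counts around every cell.

    Binary weights: w_ij(d) = 1 for cells inside the (2k+1) x (2k+1) window,
    centre included. Windows are truncated at the grid edge.
    """
    nrow, ncol = len(grid), len(grid[0])
    sums = [[0.0] * ncol for _ in range(nrow)]
    counts = [[0] * ncol for _ in range(nrow)]
    for r in range(nrow):
        for c in range(ncol):
            for rr in range(max(0, r - k), min(nrow, r + k + 1)):
                for cc in range(max(0, c - k), min(ncol, c + k + 1)):
                    sums[r][c] += grid[rr][cc]
                    counts[r][c] += 1
    return sums, counts

def local_g_star(grid, k):
    """Standardised G_i^* per cell, using the focal window as w_ij(d)."""
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n                                   # \bar{x}^* over all n cells
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)  # s^*
    sums, counts = focal_sum(grid, k)
    out = []
    for r, row in enumerate(sums):
        out_row = []
        for c, wx in enumerate(row):
            w = counts[r][c]  # W_i^*; with binary weights S_1i^* = W_i^* as well
            # Divides by zero if s == 0 or if a window covers every cell (n * w == w * w).
            denom = s * math.sqrt((n * w - w * w) / (n - 1))
            out_row.append((wx - w * mean) / denom)
        out.append(out_row)
    return out

grid = [[1.0] * 5 for _ in range(5)]
grid[2][2] = 10.0                     # toy grid with one high value
g = local_g_star(grid, k=1)
print(round(g[2][2], 3), round(g[0][0], 3))   # 1.333 -0.436
```

In R, the window sums would presumably come from focal() with a matrix of ones as the weights, and the remaining terms are scalars or simple per-cell counts.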

Hope this helps,

Roger
-- Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no