Hi,

I'm puzzled about how to deal with NAs and lakes in SpatialPolygonsDataFrame objects. My problems (maybe I should post them separately?):
- spplot does not seem to handle NA values specially
- the neighbors graph seems to make links even to lakes/holes
- moran.test seems to work with NA values, but what about the neighbors/weights used as input? Are the results biased?
- localmoran does not seem to work with NA values

I illustrate each of them below. The example is based on the Syracuse data from ASDAR (data from the web site http://www.asdar-book.org/bundles/lat1_bundle.zip):

#NA PROBLEM
setwd("H:/Documents/Stats/Book Bivand/Chap 9")
library(sp)
library(rgdal)
NY8 <- readOGR(".", "NY8_utm18")
library(spdep)
Syracuse <- NY8[NY8$AREANAME == "Syracuse city",]
Syracuse2<-Syracuse
Syracuse2$POP8[43]<-NA
spplot(Syracuse2, zcol="POP8")
#it appears as white, similar to other colors which are nevertheless true values and not NA!

#LAKE PROBLEM
#check if lakes:
sapply(sapply(slot(Syracuse, "polygons"),function(x) slot(x, "Polygons")), function(x) slot(x, "hole"))
#btw, this is pretty complicated; are there more user-friendly wrappers for that, something like isHole or getHole?
slot(slot(slot(Syracuse2, "polygons")[[43]],"Polygons")[[1]], "hole")<-TRUE
plot(poly2nb(Syracuse), coordinates(Syracuse2))

Here it seems that it did not take the hole into account and still computes neighbors... right? And if so, does it affect the results of the Moran tests?

Thanks a lot!

Matthieu Stigler
Dealing with lakes and NA in Moran/SpatialPolygonsDataFrame
5 messages · Matthieu Stigler, Roger Bivand
Hi,

I have a second problem with NAs in SpatialPolygonsDataFrame objects: residuals.spautolm does not seem to work with NA values and returns NULL. This adds to the questions I asked yesterday about dealing with NAs, and I would be very thankful if anyone could give me any advice!
SPAUTOLM residuals problem:
library(sp)
library(rgdal)  # provides readOGR()
library(spdep)  # provides read.gal(), nb2listw() and spautolm()
NY8 <- readOGR(".", "NY8_utm18")
Syracuse <- NY8[NY8$AREANAME == "Syracuse city",]
Syracuse2 <- Syracuse
Syracuse2$POP8[43] <- NA
spplot(Syracuse2, zcol="POP8")
# set the response and every covariate to NA for one observation
NY8$Z[43] <- NA
NY8$PEXPOSURE[43] <- NA
NY8$PCTAGE65P[43] <- NA
NY8$PCTOWNHOME[43] <- NA
NY_nb <- read.gal("NY_nb.gal", region.id=row.names(as(NY8, "data.frame")))
NYlistw <- nb2listw(NY_nb, style = "B")
nysar <- spautolm(Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME, data=NY8, listw=NYlistw)
summary(nysar)
residuals(nysar)  # returns NULL
Thanks a lot! Matthieu Stigler
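One way to avoid handing missing values to a spatial model at all is to find the complete cases for the model variables first and use the same logical vector on both the data and the neighbour list. The following is a minimal base R sketch, not the poster's code: the toy data frame and its values are made up, and the commented lines at the end assume the NY8 and NY_nb objects from the message above.

```r
# Toy attribute table standing in for NY8 (values are invented for illustration)
dat <- data.frame(
  Z          = c(0.5, NA, -0.2, 1.1),
  PEXPOSURE  = c(1.2, NA, 0.8, 1.5),
  PCTAGE65P  = c(0.10, NA, 0.20, 0.15),
  PCTOWNHOME = c(0.6, NA, 0.4, 0.7)
)

model <- Z ~ PEXPOSURE + PCTAGE65P + PCTOWNHOME

# TRUE for rows where every variable named in the formula is observed
ok <- complete.cases(dat[, all.vars(model)])

# Rows that would have to be dropped before fitting
which(!ok)

# With the real objects, the same vector could subset both sides, e.g.
#   NY8_ok   <- NY8[ok, ]
#   NY_nb_ok <- subset(NY_nb, ok)   # subset method for nb objects in spdep
```

Dropping observations can leave some regions without neighbours, so the reduced neighbour list should be inspected before weights are built from it.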
1 day later
On Thu, 3 Sep 2009, Matthieu Stigler wrote:
- spplot does not seem to handle NA values specially

#NA PROBLEM
setwd("H:/Documents/Stats/Book Bivand/Chap 9")
library(sp)
library(rgdal)
NY8 <- readOGR(".", "NY8_utm18")
library(spdep)
Syracuse <- NY8[NY8$AREANAME == "Syracuse city",]
Syracuse2<-Syracuse
Syracuse2$POP8[43]<-NA
spplot(Syracuse2, zcol="POP8")
#it appears as white, similar to other colors which are nevertheless true values and not NA!
NAs are coded "transparent", which looks the same as white when the background is white, and the default divergent palette includes white. Try either:

library(lattice)
trellis.device(bg="grey")
spplot(Syracuse2, zcol="POP8")

(the wrong way to change the background, but OK to show the point), or:

trellis.par.set(sp.theme())
spplot(Syracuse2, zcol="POP8")

where white is not in the palette.
LAKE problem:
#check if lakes:
sapply(sapply(slot(Syracuse, "polygons"),function(x) slot(x, "Polygons")), function(x) slot(x, "hole"))
#btw, this is pretty complicated; are there more user-friendly wrappers for that, something like isHole or getHole?
Holes are easy to fall into, so no wrappers.
slot(slot(slot(Syracuse2, "polygons")[[43]],"Polygons")[[1]], "hole")<-TRUE
plot(poly2nb(Syracuse), coordinates(Syracuse2))
poly2nb() uses all the geometries present. The preferred choice is to subset the geometries to keep only the ones the user requires, so:

not_holes <- !sapply(sapply(slot(Syracuse2, "polygons"), function(x) slot(x, "Polygons")), function(x) slot(x, "hole"))
nb <- poly2nb(Syracuse2[not_holes,])
plot(nb, coordinates(Syracuse2)[not_holes,])

I guess this clarifies things. In principle, a single Polygon object in a Polygons object will not be a hole anyway (it is an external ring), so your example is rather artificial. Subsetting the geometries with "[" is to be preferred.

I guess the same applies to your followup, but spautolm() ought not to permit computation on missing data; I'll check this.

Hope this helps,

Roger
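The nested sapply() in this exchange returns one flag per ring, so the result only lines up with the rows of the object when every Polygons entity holds exactly one Polygon. A hedged sketch of a per-entity inspection follows, built on a toy object rather than the Syracuse data; the square() helper, the toy geometries and the IDs are assumptions made for illustration. The flags are read for inspection only, since they are not topologically checked.

```r
library(sp)

# Hypothetical helper: closed square ring with lower-left corner (x0, y0) and side s
square <- function(x0, y0, s) {
  cbind(x = c(x0, x0, x0 + s, x0 + s, x0),
        y = c(y0, y0 + s, y0 + s, y0, y0))
}

# Entity "a": an outer ring plus one interior ring flagged as a hole
land_ring <- Polygon(square(0, 0, 4), hole = FALSE)
lake_ring <- Polygon(square(1, 1, 1), hole = TRUE)
island    <- Polygons(list(land_ring, lake_ring), ID = "a")

# Entity "b": a single outer ring
plain <- Polygons(list(Polygon(square(4, 0, 4), hole = FALSE)), ID = "b")

toy <- SpatialPolygons(list(island, plain))

# One logical vector per entity, one element per ring
hole_flags <- lapply(slot(toy, "polygons"), function(p) {
  vapply(slot(p, "Polygons"), function(r) slot(r, "hole"), logical(1))
})
names(hole_flags) <- sapply(slot(toy, "polygons"), slot, "ID")

# One flag per entity: does it contain any ring marked as a hole?
has_hole <- vapply(hole_flags, any, logical(1))
```

Because has_hole has one element per entity, it can be compared against the rows of the object without the lengths drifting apart when some entities have several rings.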
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
1 day later
Thanks a lot for your help! For plotting NAs, changing the background is a little bit radical, as everything then looks different. The problem actually arose because I used heat.col(), which looks really pretty but uses white. Isn't there a function in trellis to change only the NAs and leave all the rest alone? Thanks!
Oh, so for the lake I should rather use ringDir? And when my dataset contains lakes and NAs, should I do the same as above for the NAs as well and then subset? Will the Moran values be affected by that? Thanks a lot!
On Mon, 7 Sep 2009, Matthieu Stigler wrote:
Thanks a lot for your help! For plotting NAs, changing the background is a little bit radical, as everything then looks different. The problem actually arose because I used heat.col(), which looks really pretty but uses white. Isn't there a function in trellis to change only the NAs and leave all the rest alone? Thanks!
Not that I am aware of. If you want this level of control, use the at= and col.regions= arguments and assign an improbable value to the NAs, or use base graphics with hatching for the NAs. I'm travelling and do not have Deepayan Sarkar's excellent lattice book with me; perhaps you could check what he says?
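The at= and col.regions= suggestion could be sketched as follows. This is an illustration under stated assumptions, not a recipe from the thread: the values in z are invented, the 1% padding and the choice of "grey50" are arbitrary, and it is assumed that spplot() maps values to colours in the same way as lattice's level.colors(), which is what the sketch calls directly so that it runs without any spatial data.

```r
library(lattice)

# Invented attribute values with one missing observation
z <- c(120, NA, 340, 560, 80)

rng <- range(z, na.rm = TRUE)
pad <- 0.01 * diff(rng)

# "Improbable" stand-in value placed well below the observed range
sentinel <- rng[1] - 10 * pad
z_plot <- ifelse(is.na(z), sentinel, z)

# Breaks: a narrow bin around the sentinel, an unused gap bin,
# then five bins covering the observed data
data_brks <- seq(rng[1] - pad, rng[2] + pad, length.out = 6)
brks <- c(sentinel - pad, sentinel + pad, data_brks)

# One colour per bin: grey for the sentinel and gap bins, heat colours for the data
cols <- c("grey50", "grey50", rev(heat.colors(5)))

# Colour assigned to each observation
poly_cols <- level.colors(z_plot, at = brks, col.regions = cols)

# With a SpatialPolygonsDataFrame holding z_plot as a column, the call might be:
#   spplot(obj, zcol = "z_plot", at = brks, col.regions = cols)
```

Supplying exactly one colour per interval keeps the grey confined to the sentinel bin; with a longer colour vector, lattice picks colours at regular positions along it instead.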
Oh, so for the lake I should rather use ringDir? And when my dataset contains lakes and NAs, should I do the same as above for the NAs as well and then subset? Will the Moran values be affected by that?
Please do not treat the hole= or ringDir= slots as topologically checked. See checkPolygonsHoles() in maptools for details. In many representations, the entity (here a Polygons object) has to have one or more external rings that are not holes, so your lake may be turned from hole to non-hole by software.

If there is "nothing" there, subset it out on an attribute, for example a factor describing land cover:

x1 <- x0[x0$landCover != "lake",]

then subset on the NAs. Using the geometry characteristics of entities whose topologies have not been built or checked is not robust (it may work with some data sets, but not with others). A lake isn't a hole: it is an entity that is a lake, or it is not an entity at all (subsetted out). In this case you need to check your visualization representations. Confusion with Moran's I will result if your entities are confused, so sort them out first.

Hope this helps,

Roger
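The effect of subsetting before computing Moran's I can be seen on a toy case. The sketch below computes global Moran's I by hand in base R instead of calling moran.test(), so it runs without spdep; the five-regions-in-a-row layout, the values and the morans_i() helper are assumptions made for illustration. The point is that the observation and its row and column in the weights matrix are dropped together.

```r
# Binary contiguity matrix for five regions arranged in a row (toy layout)
n <- 5
W <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

# Invented values; region 3 is missing
x <- c(1, 2, NA, 4, 5)

# Hypothetical helper: global Moran's I for complete data and a weights matrix
morans_i <- function(x, W) {
  d <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(d, d)) / sum(d^2)
}

# Drop the missing region from the data and from the weights at the same time
keep  <- !is.na(x)
W_sub <- W[keep, keep, drop = FALSE]
I_sub <- morans_i(x[keep], W_sub)

# Neighbour counts change too: regions 2 and 4 each lose their link to region 3
rowSums(W)[keep]
rowSums(W_sub)
```

The statistic then describes the reduced set of entities with its reduced neighbour graph, which is why the entities should be sorted out before any test is run.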