regression and cluster analysis with shapefiles: data import problems
On Tue, 25 Apr 2006, Danlin Yu wrote:
Anja: If you have a shapefile imported using read.shape, you can view the header (the attribute names) of your shapefile object using "colnames(shapeobject$att.data)" Similarly, you can assign individual attributes as R objects using command like: attribute1 <- shapeobject$att.data$attribute1 For polygons, you can get the X, Y, coordinates by using get.Pcent (shapeobject) in the package maptools (which will be loaded when you load the package spdep). I am quite sure you can do similar work with point file as well (Roger mentioned that to me once, but since I have access to ArcGIS, and as a habit I always creat the X, Y fields for my data, I forgot the procedure).
It always helps to paste copies of the commands you are giving, usually together with sessionInfo() to give the package versions. If you are using the maptools package, you can use read.shape() to return an object of class "Map", as the help page for the function explains.
library(maptools)
Loading required package: foreign Loading required package: sp
sessionInfo()
Version 2.3.0 (2006-04-24) i686-pc-linux-gnu attached base packages: [1] "methods" "stats" "graphics" "grDevices" "utils" "datasets" [7] "base" other attached packages: maptools sp foreign "0.5-11" "0.8-15" "0.8-12"
list.files(pattern="shp")
[1] "baltim.shp" "columbus.shp" "fylk-val.shp" "sids.shp" These are the shapefiles distributed with the maptools package, by the way.
col1 <- read.shape("columbus.shp")
Shapefile type: Polygon, (5), # of Shapes: 49
class(col1)
[1] "Map"
names(col1$att.data)
[1] "AREA" "PERIMETER" "COLUMBUS_" "COLUMBUS_I" "POLYID" [6] "NEIG" "HOVAL" "INC" "CRIME" "OPEN" [11] "PLUMB" "DISCBD" "X" "Y" "NSA" [16] "NSB" "EW" "CP" "THOUS" "NEIGNO"
col1data <- col1$att.data lm(CRIME ~ HOVAL + INC, data=col1data)
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col1data)
Coefficients:
(Intercept) HOVAL INC
68.6190 -0.2739 -1.5973
If you use the more user-friendly sp classes with functions from the
maptools package, you do:
getinfo.shape("columbus.shp")
Shapefile type: Polygon, (5), # of Shapes: 49 to find out what kind it is, and use one of readShapePoints(), readShapeLines() or readShapePoly():
col2 <- readShapePoly("columbus.shp")
class(col2)
[1] "SpatialPolygonsDataFrame" attr(,"package") [1] "sp" for which lm() just works:
lm(CRIME ~ HOVAL + INC, data=col2)
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col2)
Coefficients:
(Intercept) HOVAL INC
68.6190 -0.2739 -1.5973
and should you need the centroids of polygons (or point locations), the
coordinate method returns them:
coords <- coordinates(col2) str(coords)
num [1:49, 1:2] 8.83 8.33 9.01 8.46 9.01 ... where str() is a helper function showing the structure of an object. There are other ways, but this is the simplest, because sp class SpatialPolygons component Polygons objects store their centroids. By the way, the plot methods for these classes of objects are also pretty good, and new variables can just be added:
mylm <- lm(CRIME ~ HOVAL + INC, data=col2) col2$mylm_res <- residuals(mylm) spplot(col2, "mylm_res")
(with lots of options for modification if needed, but now nobody can say that they couldn't map the residuals!) In fact currently the easiest shapefile reader (readOGR()) is in the rgdal package, because it also reads the *.prj file, and so creates the sp class object with the appropriate metadata, and finds out automatically if the spatial entities are points, lines, or polygons. If your planners, Anja, need to keep the coordinate reference system tidy, this may be worth looking at, combined with the still slightly messy writing of shapefiles: writePolyShape(), writeLinesShape(), or writePointsShape() in the maptools package and showWKT() in the rgdal package to write the appropriate *.prj. If you are using read.shapefile() in the shapefiles package, you'll need to get at an internal element to find the data.frame:
library(shapefiles)
col3 <- read.shapefile("columbus")
names(col3$dbf$dbf)
[1] "AREA" "PERIMETER" "COLUMBUS_" "COLUMBUS_I" "POLYID" [6] "NEIG" "HOVAL" "INC" "CRIME" "OPEN" [11] "PLUMB" "DISCBD" "X" "Y" "NSA" [16] "NSB" "EW" "CP" "THOUS" "NEIGNO"
col3data <- col3$dbf$dbf lm(CRIME ~ HOVAL + INC, data=col3data)
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col3data)
Coefficients:
(Intercept) HOVAL INC
68.6190 -0.2739 -1.5973
will do it.
Roger
Hope this helps. Cheers,
___________________________________________ Danlin Yu, Ph.D. Assistant Professor Department of Earth & Environmental Studies Montclair State University Montclair, NJ, 07043 Tel: 973-655-4313 Fax: 973-655-4072 email: yud at mail.montclair.edu ----- Original Message ----- From: Anja Matatko <Anja.Matatko at alta4.com> Date: Tuesday, April 25, 2006 11:37 am Subject: [R-sig-Geo] regression and cluster analysis with shapefiles: data import problems Dear list, I have a shapefile with several numerical attributes and want to do cluster and regression analysis with this data (sociodemographic data). I use the read.shapefile-command to import my data (it is well imported, it appears as an object), and then my first problem occurs: I want to do the analysis as in Anselin's "An Introduction to Spatial Regression Analysis in R" or the FAO publication by Petrucci / Salvati / Seghieri "The application of a spatial regression model to the analysis and mapping of poverty", but they all use other raw data than shapefiles and I couldn't find a hint how to convert my shapefile to a format that supports the lm() or similar functions. I think there are two problems to solve: - I need to have access to the headers of my shapefile attribute data, in order to execute commands as mydata$variable1 (actually, I can't call mydata$variable1, I just get the answer NULL) - I need something to get the spatial information contained in the shapefile (my attribute table has no x y coordinates included). Or is it really necessary to insert x and y columns in the attribute table with the help of ArcGIS? Thanks for help, Anja (student of applied geography writing masters thesis about the use of geographical information systems and several "spatial statistics software" - including R - in planning support, in domains not much affiliated to spatial analysis but working with spatial data) -------------------------------------------- Anja Matatko alta4 Geoinformatik AG Frauenstra?e 8-9 54290 Trier/Germany voice: +49.651.96626-0 fax: +49.651.96626-26 email: anja.matatko at alta4.com internet: http://www.alta4.com Die n?chsten GIS-Schulungen: - ArcGIS 9 Umsteiger: 15. - 17.05.06 in M?nchen - ArcGIS 9 Geoprocessing & Model Builder: 18. - 19.05.06 in M?nchen - What's New in ArcGIS 9.1?: 22.-23.05.06 in Trier - ArcGIS Digitalisieren: 02.06.06 in Trier Weitere Infos: http://www.alta4.com/de/schulung/index.php Treffen Sie uns auf folgenden Veranstaltungen: - IT-Messe: 04.-05.05.06 in Trier - ESRI Anwenderkonferenz: 09.-11.05.06 in Salzburg Weitere Infos: http://www.alta4.com/de/alta4/events.php _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo _______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no