Skip to content
Prev 973 / 29559 Next

regression and cluster analysis with shapefiles: data import problems

On Tue, 25 Apr 2006, Danlin Yu wrote:

            
It always helps to paste copies of the commands you are giving, usually 
together with sessionInfo() to give the package versions.

If you are using the maptools package, you can use read.shape() to return 
an object of class "Map", as the help page for the function explains.
Loading required package: foreign
Loading required package: sp
Version 2.3.0 (2006-04-24) 
i686-pc-linux-gnu 

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets" 
[7] "base"     

other attached packages:
maptools       sp  foreign 
"0.5-11" "0.8-15" "0.8-12"
[1] "baltim.shp"   "columbus.shp" "fylk-val.shp" "sids.shp"    

These are the shapefiles distributed with the maptools package, by the 
way.
Shapefile type: Polygon, (5), # of Shapes: 49
[1] "Map"
[1] "AREA"       "PERIMETER"  "COLUMBUS_"  "COLUMBUS_I" "POLYID"    
 [6] "NEIG"       "HOVAL"      "INC"        "CRIME"      "OPEN"      
[11] "PLUMB"      "DISCBD"     "X"          "Y"          "NSA"       
[16] "NSB"        "EW"         "CP"         "THOUS"      "NEIGNO"
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col1data)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  


If you use the more user-friendly sp classes with functions from the 
maptools package, you do:
Shapefile type: Polygon, (5), # of Shapes: 49

to find out what kind it is, and use one of readShapePoints(), 
readShapeLines() or readShapePoly():
[1] "SpatialPolygonsDataFrame"
attr(,"package")
[1] "sp"

for which lm() just works:
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col2)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  

and should you need the centroids of polygons (or point locations), the 
coordinate method returns them:
num [1:49, 1:2] 8.83 8.33 9.01 8.46 9.01 ...

where str() is a helper function showing the structure of an object. There 
are other ways, but this is the simplest, because sp class SpatialPolygons 
component Polygons objects store their centroids. By the way, the plot 
methods for these classes of objects are also pretty good, and new 
variables can just be added:
(with lots of options for modification if needed, but now nobody can say 
that they couldn't map the residuals!)

In fact currently the easiest shapefile reader (readOGR()) is in the rgdal
package, because it also reads the *.prj file, and so creates the sp class
object with the appropriate metadata, and finds out automatically if the 
spatial entities are points, lines, or polygons. If your planners, Anja, 
need to keep the coordinate reference system tidy, this may be worth 
looking at, combined with the still slightly messy writing of shapefiles:

writePolyShape(), writeLinesShape(), or writePointsShape() in the maptools 
package and showWKT() in the rgdal package to write the appropriate *.prj.

If you are using read.shapefile() in the shapefiles package, you'll need 
to get at an internal element to find the data.frame:
[1] "AREA"       "PERIMETER"  "COLUMBUS_"  "COLUMBUS_I" "POLYID"    
 [6] "NEIG"       "HOVAL"      "INC"        "CRIME"      "OPEN"      
[11] "PLUMB"      "DISCBD"     "X"          "Y"          "NSA"       
[16] "NSB"        "EW"         "CP"         "THOUS"      "NEIGNO"
Call:
lm(formula = CRIME ~ HOVAL + INC, data = col3data)

Coefficients:
(Intercept)        HOVAL          INC  
    68.6190      -0.2739      -1.5973  


will do it.

Roger