analysis on .dbf file instead of .shp
Hello,
On Mon, Jun 11, 2012 at 8:46 PM, aniruddha ghosh <aniru123 at gmail.com> wrote:
Hello list, I am trying to perform a regression analysis on vector data (a shapefile). Some of the attributes of the shapefile are the potential explanatory variables (let's say X1 and X2) and the response variable (Y). Now, instead of reading the shapefile, I'm using the associated .dbf file and performing the analysis. This looks like:
----------------------------------------
library(foreign)
data <- read.dbf("test.dbf")
names(data)
"FID" "X1" "X2" "Y" "POINT_X" "POINT_Y"
X <- cbind(data$X1, data$X2)
Y <- data$Y
summary(lm(Y ~ X))
----------------------------------------
Question: Is it good practice to use the .dbf file instead of the .shp file?
It should not matter, and you can obtain the same data (via the same
foreign::read.dbf function) by using the maptools functions
readShapePoints/Lines/Poly. You can always get the original data with
as.data.frame:
fname.shp <- system.file("shapes/baltim.shp", package="maptools")[1]
fname.dbf <- system.file("shapes/baltim.dbf", package="maptools")[1]
library(foreign)
dd <- read.dbf(fname.dbf)
names(dd)
library(maptools)
xx <- readShapePoints(fname.shp)
names(as.data.frame(xx))
 [1] "STATION" "PRICE" "NROOM" "DWELL" "NBATH" "PATIO" "FIREPL" "AC" "BMENT" "NSTOR" "GAR" "AGE" "CITCOU" "LOTSZ" "SQFT"
[16] "X" "Y" "coords.x1" "coords.x2"
Note that for the SpatialPointsDataFrame you also get the spatial
coordinates as extra columns (in this case it is a simple one-to-one
of point coordinates to attributes, which won't always be true for
MULTIPOINT or line/polygon geometries).
Apart from the spatial coordinate values, there are some attribute
differences, but the dimensions, names and column classes of the two
data.frames are the same:
all.equal(dd, as.data.frame(xx)[,-c(18, 19)])
[1] "Attributes: < Names: 1 string mismatch >"
[2] "Attributes: < Length mismatch: comparison on first 2 components >"
[3] "Attributes: < Component 2: Lengths (17, 211) differ (string compare on first 17) >"
[4] "Attributes: < Component 2: 17 string mismatches >"
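As an aside, the messages above all concern attributes rather than values. A minimal sketch of how one might compare the values only, using the check.attributes argument of all.equal(); the two small data frames and the "data_types" attribute are made up here for illustration and are not the baltim data:

```r
# Two data frames with identical values (made-up numbers)
a <- data.frame(PRICE = c(47, 113, 165), NROOM = c(4, 7, 7))
b <- a

# Attach an extra attribute to one copy, standing in for the kind of
# metadata a reader function may add (assumption for illustration)
attr(b, "data_types") <- c("N", "N")

# The default comparison also checks attributes, so it reports differences
with_attrs <- all.equal(a, b)
print(with_attrs)

# Ignoring attributes compares the column values only
values_only <- all.equal(a, b, check.attributes = FALSE)
print(values_only)
```

Applied to the two objects above, the same call would be all.equal(dd, as.data.frame(xx)[,-c(18, 19)], check.attributes = FALSE), though I have not run that here.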
There is another route to read shapefile/dbf with readOGR() in the
rgdal package, and there might be slight differences with reading the
DBF that way since it is a completely different set of code under the
hood, though they would be subtle if at all and may just depend on the
vagaries of the file. The return value is a Spatial*DataFrame as it
is for the maptools functions.
Cheers, Mike.
Can I use the model developed here to predict some unknown Y with known X (obtained from another .dbf file), and add the predicted Y as an attribute to this .dbf file? I'm using the .dbf file because it allows me to apply different methods from different packages for prediction, which I couldn't apply to the .shp files due to my limited knowledge of R! Thanks, Aniruddha Ghosh
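On the prediction question, here is a minimal sketch (not part of the original reply) of one way this could work with the foreign package. The temporary files, the simulated values and the column name Y_pred are assumptions for illustration only; the real inputs would be the two .dbf files. Fitting with a formula and a data argument, rather than a cbind() matrix, lets predict() match the columns of the new table by name.

```r
library(foreign)  # read.dbf() and write.dbf()

# Hypothetical stand-ins for the two .dbf files (simulated data)
train_file <- tempfile(fileext = ".dbf")
new_file   <- tempfile(fileext = ".dbf")
set.seed(1)
train <- data.frame(X1 = runif(50), X2 = runif(50))
train$Y <- 1 + 2 * train$X1 - 3 * train$X2 + rnorm(50, sd = 0.1)
write.dbf(train, train_file)
write.dbf(data.frame(X1 = runif(10), X2 = runif(10)), new_file)

# Fit the model on the first table, referring to columns by name
dat <- read.dbf(train_file)
fit <- lm(Y ~ X1 + X2, data = dat)

# Predict for the second table; newdata must contain columns X1 and X2
newdat <- read.dbf(new_file)
newdat$Y_pred <- predict(fit, newdata = newdat)

# Write the table with the extra column to a separate file
out_file <- tempfile(fileext = ".dbf")
write.dbf(newdat, out_file)
names(read.dbf(out_file))
```

If the result is written back over the .dbf that belongs to a shapefile, the rows have to stay in the same order and number as the geometries in the .shp, otherwise the attributes no longer line up with the features. Writing to a copy first is the safer choice.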
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Michael Sumner Hobart, Australia e-mail: mdsumner at gmail.com