Hi,
I have noticed that saving data to file formats that include a DBF results in
bogus data where there were NAs. Using the write.dbf() function from
the foreign package seems to work a little better, but I still get odd
results in numeric columns. Writing to GRASS with the methods in the
spgrass6 package results in something that looks like this:
### code snippet:
writeVECT6(SDF=spatial.data, vname='pedons_grouped')
### errors:
Projection of input dataset and current location appear to match
Layer: pedons_g
WARNING: Column name changed: 'describer.' -> 'describer_'
WARNING: Column name changed: 'cat' -> 'cat_'
Importing map 103 features...
DBMI-DBF driver error:
SQL parser error: syntax error, unexpected NAME processing 'nan'
in statement:
insert into pedons_grouped values ( 1, 'd2g1', 'alex',
32.311427999999999, 252.434875000000005, 7227.804688000000169,
-0.000162000000000, 3, nan, 'NA',
-2147483648, 'NA', 'NA', -2147483648, -2147483648, 'NA',
nan, '1', 'NA' )
Error in db_execute_immediate()
ERROR: Cannot insert new row: insert into pedons_grouped values ( 1,
'd2g1', 'alex', 32.311427999999999, 252.434875000000005,
7227.804688000000169, -0.000162000000000, 3, nan, 'NA',
-2147483648,
'NA', 'NA', -2147483648, -2147483648, 'NA', nan, '1', 'NA' )
### another self-contained example:
# load libs
library(sp)
library(rgdal)
library(foreign)
# read in xy data and promote to sp object
e <- read.csv(url('http://casoilresource.lawr.ucdavis.edu/drupal/files/elev.csv_.txt'))
coordinates(e) <- ~ x+y
# add a factor column
e@data$f <- factor(rep(letters[1:10], each=30))
# add some NA
e@data$elev[288:300] <- NA
e@data$f[288:300] <- NA
# save sp object to shapefile
writeOGR(e, driver='ESRI Shapefile', dsn='.', layer='pts')
# the results from dumping the DBF:
[...]
285,1543,j
286,1518,j
287,1656,j
288,-2147483648,NA
289,-2147483648,NA
[...]
# one more try with the foreign package's write.dbf()
write.dbf(e@data, file='second_try.dbf')
# results: look better, although the '******' isn't a legal int!
[...]
285,1543,j
286,1518,j
287,1656,j
288,*******,
289,*******,
[...]
Any ideas on how to work with missing data in numeric columns, when
the dreaded DBF file is involved??? This is a real show-stopper when
sending vector data back to GRASS, as it seems to rely on intermediate
files. Maybe it would be a good idea to send the geometry first, and
then the attribute data. There would still be a problem if the DBF
back-end is in use...
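A note on the value -2147483648 in the first dump: it is the smallest 32-bit signed integer, which is also the value R uses internally to represent NA_integer_. It is therefore plausible (an assumption, not confirmed in this thread) that the NA is being written out unconverted. A minimal sketch for turning that flag back into NA after reading such a dump; restore_na() is a hypothetical helper, not part of any package:

```r
# -2147483648 computed as a double (integer arithmetic would overflow)
sentinel <- -.Machine$integer.max - 1

# Hypothetical helper: map the flag value back to NA in a numeric vector
restore_na <- function(x, flag = sentinel) {
  x[!is.na(x) & x == flag] <- NA
  x
}

elev <- c(1543, 1518, 1656, -2147483648, -2147483648)
restore_na(elev)
```

This only helps when the flag cannot occur as a legitimate value in the column.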
Cheers,
Dylan
writing shapefiles / DBF files when input data contains NA
6 messages · Roger Bivand, Dylan Beaudette
On Mon, 6 Oct 2008, Dylan Beaudette wrote:
Hi, I have noticed that saving data to files that include a DBF, result in bogus data where there were NA. Using the write.dbf() function from the foreign package seems to work a little better, but I still get odd results in numeric columns. Writing to GRASS with the methods in the spgrass6 package results in some thing that looks like this:
Dylan,
I'm afraid that there is no good solution for this at all. DBF does not
seem to have a clear and uniform NA treatment (or even !is.finite()
treatment). The only work-around is to preprocess the data.frame in the
output object to insert known NODATA values, and to replace those flags
manually on the GRASS side. This could possibly be written as a wrapper
around writeVECT6(). The help page does say:
"Please note that the OGR drivers used may not handle missing data
gracefully, and be prepared to have to correct for this manually.
For example use of the 'readOGR' PostGIS driver directly may
perform better than moving the data through the DBF driver used in
this function - or a PostgreSQL driver used directly or through
ODBC may be a solution. Do not rely on missing values of vector
data moving smoothly across the interface."
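The preprocessing work-around described above might be sketched as follows. This is only an illustration in base R: recode_na() and the flag values -9999 and "NODATA" are hypothetical choices, and the flags still have to be replaced by hand on the GRASS side.

```r
# Hypothetical helper: replace missing values with explicit NODATA flags
# before the data.frame is written out.
recode_na <- function(df, num_flag = -9999, chr_flag = "NODATA") {
  for (nm in names(df)) {
    col <- df[[nm]]
    # flags are not factor levels, so work on the character representation
    if (is.factor(col)) col <- as.character(col)
    if (is.numeric(col)) {
      col[!is.finite(col)] <- num_flag   # catches NA, NaN, Inf and -Inf
    } else if (is.character(col)) {
      col[is.na(col)] <- chr_flag
    }
    df[[nm]] <- col
  }
  df
}

d <- data.frame(elev = c(1543, NA, 1656), f = factor(c("j", NA, "j")))
recode_na(d)

# Possible use before export (object names taken from the original post):
# spatial.data@data <- recode_na(spatial.data@data)
```

Note that factor columns come back as character columns, and that the flag values must not collide with real data.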
I did try to look at the SQLite driver on the GRASS side, which might be
more robust, but did not see how to proceed.
One possibility is not to recode, but to build an NA mask on the R side,
and then loop over fields on the GRASS side for the chosen driver
inserting NAs in the correct rows (whatever the syntax for that might be).
Would this be db.execute with an insertion of SQL NULL?
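The NA-mask idea could look roughly like this on the R side. It is a sketch under assumptions: na_to_sql_null() is a hypothetical helper, the table is assumed to have a unique key column (here called cat), and whether a given GRASS database driver accepts "SET ... = NULL" is not established in this thread.

```r
# Hypothetical helper: build one UPDATE statement per NA cell,
# identifying the row through a key column.
na_to_sql_null <- function(df, table, key = "cat") {
  stmts <- character(0)
  for (nm in setdiff(names(df), key)) {
    rows <- which(is.na(df[[nm]]))   # the NA mask for this field
    if (length(rows) > 0) {
      stmts <- c(stmts,
                 sprintf("UPDATE %s SET %s = NULL WHERE %s = %s",
                         table, nm, key, df[[key]][rows]))
    }
  }
  stmts
}

d <- data.frame(cat = 1:3, elev = c(1543, NA, 1656), f = c("j", NA, "j"))
na_to_sql_null(d, "pts")
```

The mask has to be built before any recoding of the NAs, and the resulting statements would then be handed to the database on the GRASS side after the import.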
Can we redirect this discussion to the statgrass list, because GRASS
developers follow that list?
Best wishes,
Roger
_______________________________________________ R-sig-Geo mailing list R-sig-Geo at stat.math.ethz.ch https://stat.ethz.ch/mailman/listinfo/r-sig-geo
Roger Bivand Economic Geography Section, Department of Economics, Norwegian School of Economics and Business Administration, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no
On Tuesday 07 October 2008, Roger Bivand wrote:
Can we redirect this discussion to the statgrass list, because GRASS
developers follow that list?
Best wishes,
Roger
Sorry for the cross-posting. Wanted to clarify where this thread is going/went.
Hi Roger--
It looks like the limiting factor in this equation is the code used in v.out.ogr.
From the GRASS-dev + Frank W's help:
Sounds good :) Does anyone know how to fix vector/v.out.ogr/main.c to support NULLs? I see db_set_value_null() in lib/db/dbmi_base/value.c which might be relevant.
Markus, Once you establish which GRASS attributes are NULL, you can ensure they are pushed out to OGR as null by just skipping the step that sets them. Perhaps that will help a bit.
So, once v.out.ogr is fixed, this should clear up several issues:
1. import of vector data into R via spgrass6 methods
2. better compatibility of vector data exported from GRASS
I still do not know why writeOGR() does not create correct DBF files... it may be related to the code in v.out.ogr....
Cheers,
Dylan
Some follow-up: the incorrect handling of NULL values appears to be related to the current implementation of v.out.ogr AND readOGR() / writeOGR().
Dylan
Dylan Beaudette Soil Resource Laboratory http://casoilresource.lawr.ucdavis.edu/ University of California at Davis 530.754.7341
On Wed, 8 Oct 2008, Dylan Beaudette wrote:
Some follow-up: the incorrect handling of NULL values appears to be related to the current implementation of v.out.ogr AND readOGR() / writeOGR().
OK, this makes sense, because parts of readOGR() / writeOGR() were written based on the logic of v.in.ogr and v.out.ogr, and more attention was given to the geometries than the attribute fields. If the GRASS code was taking liberties with handling NAs, then that behaviour is very probably present in readOGR() / writeOGR() too.
The rgdal package has a public sourceforge CVS repository, so everybody please feel free to browse for bugs. It would be helpful to have a set of vector files with valid NAs (not just shapefiles), and a set of sp objects with NAs, and to be able to move them in and out of both R and GRASS (and other software) with the NAs intact.
As a first bite, OGRFeature::IsFieldSet() seems to test whether the field is set or not. It isn't used in ogrReadColumn() in src/ogrsource.cpp in rgdal, nor the equivalent in OGR_write() in src/OGR_write.cpp. Assuming that we can correct these to use OGR NULL data representations (would that be unset the field for the feature?), we then depend on the drivers using the same logic. In addition, non-OGR written files need to use the same understanding of NULL as the OGR drivers.
GRASS v.in.ogr() does use OGR_F_IsFieldSet(), and if not set writes a NULL to numeric fields and an empty string to the others. Fixing writeOGR() ought to get NAs from R to GRASS. v.out.ogr does not seem to use OGR_F_UnsetField() on the fields being output, and readOGR() does not test for the fields being unset either - so getting NAs from GRASS to R needs more work.
This is described in extenso here because things don't happen by themselves, and this particular overlap of R/OGR/GRASS code probably matters to regular users of rgdal. Collaboration in fixing the handling of NAs in vector data files invited!
Roger
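For the test set suggested above, one way to check whether NAs survive a round trip is to compare the NA positions of the attribute table before export and after re-import. The sketch below uses base R only; na_mask_diff() is a hypothetical helper and the two data frames are made-up stand-ins for the attribute tables of an sp object before and after it has passed through a DBF.

```r
# Hypothetical check: count, per column, the cells whose NA status changed.
na_mask_diff <- function(before, after) {
  stopifnot(identical(dim(before), dim(after)))
  colSums(is.na(before) != is.na(after))
}

before <- data.frame(elev = c(1543, NA), f = c("j", NA))
# what came back in the dump shown earlier in the thread:
# a flag value and the string "NA"
after  <- data.frame(elev = c(1543, -2147483648), f = c("j", "NA"))

na_mask_diff(before, after)
```

A result of all zeros would indicate that the NA positions are intact. The check says nothing about whether the non-missing values themselves are unchanged.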
On Wednesday 08 October 2008, Roger Bivand wrote:
OK, this makes sense, because parts of readOGR() / writeOGR() were written based on the logic of v.in.ogr and v.out.ogr, and more attention was given to the geometries than the attribute fields. If the GRASS code was taking liberties with handling NAs, then that behaviour is very probably present in readOGR() / writeOGR() too.
If that is the case, then a fix for one should easily be 'ported' to the other. A place to start looking for an answer would probably be the source for v.in.ogr -- as this correctly preserves NULL data when importing from shapefiles... haven't tried anything else.
The rgdal package has a public sourceforge CVS repository, so everybody please feel free to browse for bugs. It would be helpful to have a set of vector files with valid NAs (not just shapefiles), and a set of sp objects with NAs, and to be able to move them in and out of both R and GRASS (and other software) with the NAs intact.
Attached to this message is one such shapefile-- sorry I do not have another vector format.
Maybe this bit from M. Neteler / Frank W. will help:
Markus:
Does anyone know how to fix vector/v.out.ogr/main.c to support NULLs? I see db_set_value_null() in lib/db/dbmi_base/value.c which might be relevant.
Frank:
Once you establish which GRASS attributes are NULL, you can ensure they are pushed out to OGR as null by just skipping the step that sets them. Perhaps that will help a bit.
Cheers,
Dylan
Text attachment:
PROJCS["NAD_1927_UTM_Zone_13N",GEOGCS["GCS_North_American_1927",DATUM["D_North_American_1927",SPHEROID["Clarke_1866",6378206.4,294.978698213898]],PRIMEM["Greenwich",0],UNIT["Degree",0.017453292519943295]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-105],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["Meter",1]]
Non-text attachments (scrubbed by the list archive):
a_temp2.shx (application/octet-stream, 300 bytes): <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment.obj>
a_temp2.dbf (application/x-dbase, 1518 bytes): <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment.bin>
a_temp2.shp (application/octet-stream, 800 bytes): <https://stat.ethz.ch/pipermail/r-sig-geo/attachments/20081008/290af92e/attachment-0001.obj>