Dear R developers,

tl;dr: I've been trying to read FoxPro dbf files with foreign::read.dbf(), they weren't being read properly, and I patched the foreign package to make it work. Now what?

Long version: I recently encountered unexpected behavior when attempting to read dbf files using foreign::read.dbf(). The file comes from https://forms.ferc.gov/f1allyears/f1_2020.zip; once unzipped, it is UPLOADERS/FORM1/working/F1_15.DBF. Note that this is a FoxPro database.

I would expect the first row of the first column to be 40; instead I am getting "(" (the character "(" has a decimal ASCII value of 40). The xbase docs indicate that this is a field of type "I", a 4-byte integer unique to FoxPro, and it doesn't look like this case is contemplated by read.dbf().

I made some modifications to Rdbfread.c and dbfopen.c in the foreign package (version 0.8-82) to add specific handling for field type "I". I'm not currently set up to contribute directly; I don't have SVN access.

1. Is this patch of general interest? I'm weighing the development guidelines:
   - DO NOT fix exotic bugs that haven't bugged anyone
   - DO make small enhancements if they are badly needed
   I feel like this is maybe a bit of an exotic lack-of-feature (I wouldn't call it a bug), and I have no idea whether it is badly needed by anybody other than myself.

2. If it is of general interest, how can I get set up with SVN credentials for R-packages?

Thanks!
-Ezra
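The symptom described in this message can be reproduced in a few lines. This is a minimal Python sketch, assuming the FoxPro "I" field holds a 4-byte little-endian signed integer (as the xbase reference describes) and assuming that a reader unaware of the type treats the raw bytes as a NUL-terminated character string. The helper names are hypothetical, and this is not the code from the foreign package or from the patch.

```python
import struct


def decode_foxpro_int(raw: bytes) -> int:
    """Interpret 4 raw bytes as a little-endian signed 32-bit integer."""
    if len(raw) != 4:
        raise ValueError("an 'I' field is expected to be exactly 4 bytes")
    return struct.unpack("<i", raw)[0]


def decode_as_c_string(raw: bytes) -> str:
    """Mimic a reader that treats the field as text and stops at the first NUL.

    This is an assumption about the failure mode, for illustration only.
    """
    return raw.split(b"\x00", 1)[0].decode("latin-1")


raw_field = struct.pack("<i", 40)            # the bytes 0x28 0x00 0x00 0x00
print(decode_foxpro_int(raw_field))          # 40
print(repr(decode_as_c_string(raw_field)))   # '('
```

Under these assumptions the integer 40 is stored as the byte 0x28 followed by three zero bytes, which is why a text-oriented reader would show "(" instead of 40.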
foreign::read.dbf fails to parse dbf properly
3 messages · Duncan Murdoch, Ezra Tucker
On 29/07/2022 4:52 p.m., Ezra Tucker wrote:
[...] 2. if of general interest, how can I get set up with SVN credentials for R-packages?
Roger addressed your first question. I'll give some information about the second one.

You should automatically have read permission on the R-packages repository, as with most R svn repositories. I think to get write permission, you'd need to be invited to join R Core, or to be a maintainer of the package.

The more common way to have changes accepted is to post them to the bugs.r-project.org web site. See https://www.r-project.org/bugs.html for more details on how to get an account set up there so you can post things.

Duncan Murdoch
Thank you all for your thoughts; I'll definitely submit my bug report and patch there.

With respect to the data themselves and the data formats, in my research I came across this document, which is a helpful reference: http://www.manmrk.net/tutorials/database/xbase/data_types.html

I forgot to mention that I attempted all this on:

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora Linux 36 (Workstation Edition)

and also on a Windows 10 machine running R version 4.2.1 with UCRT.

To Andre's and Roger's points: great minds! I did exactly the same things. Opening the dbf files in LibreOffice works perfectly (so does the soffice command line tool). I also tried converting to csv using ogr2ogr and got EXACTLY the same problem as I'm seeing in R. dbfopen.c says it's derived from Shapelib, whose documentation (http://shapelib.maptools.org/dbf_api.html) states that DBFGetNativeFieldType() supports the C, D, F, N, L and M data types. If ogr2ogr uses the same Shapelib source code, it makes sense to me that it would interpret these values the same way.

I'll note quickly that if this were a simple matter of converting the ASCII characters back to numbers, this wouldn't be an issue, but certain characters (mostly the control characters, backspace, delete, and a few others) prevented me from accurately reconstructing all the original data once they were loaded into R.

-Ezra
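One way to check in advance whether a file contains field types outside the C, D, F, N, L and M set mentioned above is to walk the field descriptors in the DBF header. The following is a rough Python sketch, assuming the usual dBase layout: a 32-byte header followed by 32-byte field descriptors (name in bytes 0-10, type in byte 11, length in byte 16), terminated by 0x0D. The header is built in memory with made-up field names (ID, NAME) rather than read from F1_15.DBF, and all function names are hypothetical.

```python
import struct

# Types the Shapelib documentation lists, per the message above.
SHAPELIB_TYPES = set("CDFNLM")


def build_header(fields):
    """Build a minimal synthetic DBF header for (name, type, length) tuples."""
    header_len = 32 + 32 * len(fields) + 1      # +1 for the 0x0D terminator
    record_len = 1 + sum(length for _, _, length in fields)  # +1 deletion flag
    out = bytearray(32)
    out[0] = 0x30                               # version byte, illustrative value
    # Bytes 4-7: record count; 8-9: header length; 10-11: record length.
    struct.pack_into("<IHH", out, 4, 0, header_len, record_len)
    for name, ftype, length in fields:
        desc = bytearray(32)
        desc[0:len(name)] = name.encode("ascii")
        desc[11] = ord(ftype)
        desc[16] = length
        out += desc
    out.append(0x0D)
    return bytes(out)


def field_types(header: bytes):
    """Return (name, type, length) for each field descriptor in the header."""
    fields = []
    pos = 32
    while pos < len(header) and header[pos] != 0x0D:
        desc = header[pos:pos + 32]
        name = desc[:11].split(b"\x00", 1)[0].decode("ascii")
        fields.append((name, chr(desc[11]), desc[16]))
        pos += 32
    return fields


def unsupported(fields):
    """List fields whose type is not in the Shapelib-documented set."""
    return [(name, ftype) for name, ftype, _ in fields
            if ftype not in SHAPELIB_TYPES]


header = build_header([("ID", "I", 4), ("NAME", "C", 20)])
print(field_types(header))               # [('ID', 'I', 4), ('NAME', 'C', 20)]
print(unsupported(field_types(header)))  # [('ID', 'I')]
```

For a real file, the same scan could be applied to the leading bytes of the file opened in binary mode. Real Visual FoxPro headers carry additional bytes after the terminator, which this sketch does not model.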
On Sat, 2022-07-30 at 08:17 -0400, Duncan Murdoch wrote:
Roger addressed your first question. I'll give some information about the second one.

You should automatically have read permission on the R-packages repository, as with most R svn repositories. I think to get write permission, you'd need to be invited to join R Core, or to be a maintainer of the package.

The more common way to have changes accepted is to post them to the bugs.r-project.org web site. See https://www.r-project.org/bugs.html for more details on how to get an account set up there so you can post things.

Duncan Murdoch