
foreign::read.dbf fails to parse dbf properly

3 messages · Duncan Murdoch, Ezra Tucker

#
Dear R developers,

tl;dr I've been trying to read FoxPro dbf files with
foreign::read.dbf(). They weren't being read properly, so I patched the
foreign package to make it work. Now what?

Long version:
I recently encountered unexpected behavior attempting to read dbf files
using foreign::read.dbf() from here:

https://forms.ferc.gov/f1allyears/f1_2020.zip

Once unzipped, the file in question is UPLOADERS/FORM1/working/F1_15.DBF
(note that this is a FoxPro database). I would expect the first row of the
first column to be 40; instead I am getting "(" (and "(" has a decimal
ASCII value of 40). The xBase docs indicate that this is a field of type
"I", a 4-byte integer unique to FoxPro, and it doesn't look like this case
is handled by read.dbf().
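To make the "(" versus 40 mismatch concrete, here is a minimal C sketch that decodes a field the way the xBase docs describe type "I": 4 bytes, little-endian, signed. The function name foxpro_int_field is hypothetical, and the byte order and signedness are assumptions based on the commonly documented Visual FoxPro layout, not something verified against the FERC files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a FoxPro "I" (integer) field, assumed to be
   4 bytes, little-endian, two's-complement signed. `raw` points at the
   start of the field's 4 bytes inside a record buffer. */
int32_t foxpro_int_field(const unsigned char *raw)
{
    assert(raw != NULL);
    uint32_t u = (uint32_t)raw[0]
               | ((uint32_t)raw[1] << 8)
               | ((uint32_t)raw[2] << 16)
               | ((uint32_t)raw[3] << 24);
    /* Map the unsigned bit pattern to a signed value portably,
       without relying on an implementation-defined cast. */
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    return -(int32_t)(UINT32_MAX - u) - 1;
}
```

For the illustrative bytes 28 00 00 00, the first byte read as a character is "(" (ASCII 40), while decoding all four bytes as an integer yields 40.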

I made some modifications to Rdbfread.c and dbfopen.c in the foreign
package (version 0.8-82) to add specific handling for field type "I".
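A hypothetical sketch of what "specific handling for field type I" can look like in a C reader. The function read_int_field and its status enum are invented for illustration and are not the code in Rdbfread.c or dbfopen.c. The sketch contrasts an "N" field, whose value is stored as ASCII digits, with an "I" field, whose value is stored as raw binary (assumed 4-byte little-endian signed).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch, not the foreign package's actual code:
   read one field as an integer, dispatching on its xBase type letter. */
typedef enum { FIELD_OK, FIELD_UNSUPPORTED } field_status;

field_status read_int_field(char type, const unsigned char *raw,
                            size_t len, long *out)
{
    assert(raw != NULL && out != NULL);
    switch (type) {
    case 'N': {
        /* 'N' stores the number as right-justified ASCII digits,
           e.g. "    40" in a width-6 field. */
        char buf[32];
        if (len >= sizeof buf)
            return FIELD_UNSUPPORTED;
        memcpy(buf, raw, len);
        buf[len] = '\0';
        *out = strtol(buf, NULL, 10);  /* skips the leading spaces */
        return FIELD_OK;
    }
    case 'I': {
        /* Assumed FoxPro 'I' layout: 4-byte little-endian two's-complement
           integer stored in binary, so it must not be parsed as text. */
        if (len != 4)
            return FIELD_UNSUPPORTED;
        uint32_t u = (uint32_t)raw[0] | ((uint32_t)raw[1] << 8)
                   | ((uint32_t)raw[2] << 16) | ((uint32_t)raw[3] << 24);
        *out = (u <= (uint32_t)INT32_MAX) ? (long)u
                                          : -(long)(UINT32_MAX - u) - 1;
        return FIELD_OK;
    }
    default:
        /* Other type letters (C, D, F, L, M, ...) are out of scope here. */
        return FIELD_UNSUPPORTED;
    }
}
```

With this dispatch, the 6-byte "N" field "    40" and the 4-byte "I" field 28 00 00 00 both come back as 40, while any other type letter is reported as unsupported rather than silently read as text.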

I'm not currently set up to contribute directly; I don't have SVN access.

1. Is this patch of general interest? I'm weighing it against the
development guidelines:
  - DO NOT fix exotic bugs that haven't bugged anyone
  - DO make small enhancements if they are badly needed
and I feel like this is maybe a bit of an exotic lack-of-feature
(I wouldn't call it a bug), and I have no idea whether it is badly needed
(by anybody other than myself).

2. If it is of general interest, how can I get set up with SVN credentials
for R-packages?

Thanks!
-Ezra
#
On 29/07/2022 4:52 p.m., Ezra Tucker wrote:
Roger addressed your first question.  I'll give some information about 
the second one.

You should automatically have read permission on the R-packages 
repository, as with most R svn repositories.

I think to get write permission, you'd need to be invited to join R 
Core, or to be a maintainer of the package.

The more common way to have changes accepted is to post them to the 
bugs.r-project.org web site.  See https://www.r-project.org/bugs.html 
for more details on how to get an account set up there so you can post 
things.

Duncan Murdoch
#
Thank you all for your thoughts; I'll definitely submit my bug report
and patch there.

With respect to the data themselves and the data formats, in my
research I came across this document:
http://www.manmrk.net/tutorials/database/xbase/data_types.html
which I found to be a helpful reference.

Forgot to mention: I attempted all this on
R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Fedora Linux 36 (Workstation Edition)

and also on a Windows 10 machine running R version 4.2.1 with UCRT.

To Andre's and Roger's points, great minds think alike: I did exactly the
same things. I opened the dbf files in LibreOffice (which works perfectly,
as does the soffice command-line tool), and I also tried converting them to
CSV using ogr2ogr, which gave EXACTLY the same problem I'm seeing in R.

dbfopen.c says it's derived from Shapelib, whose documentation here:
http://shapelib.maptools.org/dbf_api.html
states that DBFGetNativeFieldType() supports the C, D, F, N, L and M data
types. If ogr2ogr uses the same Shapelib source code, it makes sense to me
that it would interpret these values the same way.

I'll quickly note that if this were simply a matter of converting the
characters back to their ASCII codes, it wouldn't be an issue, but certain
characters (mostly the control characters, such as backspace and delete,
and a few others) prevented me from accurately reconstructing all the
original data once they were loaded into R.
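One plausible contributing factor, offered as an assumption rather than a diagnosis of read.dbf(): if the raw bytes of an "I" field pass through NUL-terminated C string handling, everything from the first zero byte onward is lost, so different integers can map to the same text. The helper int_field_as_text below is hypothetical and only simulates that path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simulation: write the assumed little-endian bytes of a
   FoxPro 'I' value, then copy them out as a C string. `text` must hold
   at least 5 bytes. */
void int_field_as_text(int32_t value, char text[5])
{
    assert(text != NULL);
    uint32_t u = (uint32_t)value;  /* well-defined modulo-2^32 conversion */
    unsigned char raw[5];
    raw[0] = (unsigned char)(u & 0xFF);
    raw[1] = (unsigned char)((u >> 8) & 0xFF);
    raw[2] = (unsigned char)((u >> 16) & 0xFF);
    raw[3] = (unsigned char)((u >> 24) & 0xFF);
    raw[4] = '\0';
    /* strcpy stops at the first zero byte, as C string handling does. */
    strcpy(text, (const char *)raw);
}
```

Under that assumption, 40 (bytes 28 00 00 00) and 65576 (bytes 28 00 01 00) both come out as "(", and 256 (bytes 00 01 00 00) comes out as an empty string, so the original values cannot be recovered from the text alone.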

-Ezra
On Sat, 2022-07-30 at 08:17 -0400, Duncan Murdoch wrote: