Summary: Unexpected result of read.dbf
It really isn't clear that this patch is correct. The diagnosis is correct:
read.dbf treats numeric fields with no decimals as integers, and that _is_
as stated on the help page. So it is definitely not a `bug', and reading
the help page would have shown the reason behind the original question.
[I in general do not reply to questions that can be answered from the help
page.]
I believe this field has been incorrectly coded as numeric, as it seems to
be a factor ('keycode'). In particular, 19 is not a valid field size for
a numeric field.
If one wants to allow this, I think we have to use double for a field in
which any value is not representable as an integer, and not just if the
field size exceeds 9. I have been working on implementing that.
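
A minimal sketch in C of the per-value check described above, not the actual implementation: scan the values of a zero-decimal numeric field and choose double as soon as one of them cannot be held in an integer. The names col_type, fits_r_integer and choose_col_type are hypothetical and do not come from dbfopen.c. The sketch assumes each value arrives as a trimmed, non-empty decimal string, and it excludes INT_MIN because R uses that bit pattern for NA_integer_.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not taken from dbfopen.c. */
typedef enum { COL_INTEGER, COL_DOUBLE } col_type;

/* Returns 1 if the text parses completely as a decimal integer and fits
 * in an R integer. R reserves INT_MIN for NA_integer_, so the usable
 * range is -INT_MAX .. INT_MAX.
 * Assumption: text is trimmed and non-empty. */
static int fits_r_integer(const char *text)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;
    return v >= -(long long)INT_MAX && v <= (long long)INT_MAX;
}

/* Scan every value of a zero-decimal numeric field and fall back to
 * double as soon as one value is not representable as an integer. */
static col_type choose_col_type(const char *const values[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!fits_r_integer(values[i]))
            return COL_DOUBLE;
    return COL_INTEGER;
}
```

With the KEYCODE values from the report, 422010010 passes the check while 42201002101 does not, so the field as a whole would be read as double under this rule.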
On Fri, 19 Aug 2005, Susumu Tanimura wrote:
Hi there,

This is a summary of, and a patch for, a bug in read.dbf, as demonstrated in
Message-Id: <20050818150446.697835cb.stanimura-ngs at umin.ac.jp>.

After consulting Rjpwiki, an online community of R users in Japan, the
cause was found and a patch was proposed.

Overflow occurs when read.dbf reads a dbf file that has a field of long
signed integers. For example,

$ dbf2txt test.dbf
#KEYCODE
422010010
42201002101
42201002102
42201002103
42201002104
422010060
422010071
422010072
42201008001
42201008002

The KEYCODE field is of numeric type, 19 digits wide, with no decimals.
You can create this file with OpenOffice.org Calc, txt2dbf, and so on.
Also prepare the same data as a file in CSV format.
library(foreign)
cbind(read.csv("test.csv"),read.dbf("test.dbf"))
KEYCODE KEYCODE
1 422010010 422010010
2 42201002101 NA
3 42201002102 NA
4 42201002103 NA
5 42201002104 NA
6 422010060 422010060
7 422010071 422010071
8 422010072 422010072
9 42201008001 NA
10 42201008002 NA
The problem does not occur when the field has decimals, for example a
numeric field of 19 digits with 5 decimals.
The patch, written by Mr. Eiji Nakama, follows.
--- foreign.orig/src/dbfopen.c 2005-08-19 18:54:06.000000000 +0900
+++ foreign/src/dbfopen.c 2005-08-19 18:58:06.000000000 +0900
@@ -970,7 +970,8 @@
|| psDBF->pachFieldType[iField] == 'F' )
/* || psDBF->pachFieldType[iField] == 'D' ) D is Date */
{
- if( psDBF->panFieldDecimals[iField] > 0 )
+ if( psDBF->panFieldDecimals[iField] > 0 ||
+ psDBF->panFieldSize[iField] > 9 )
return( FTDouble );
else
return( FTInteger );
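
The rule introduced by the patch can be sketched as a standalone C function. This is an illustration only: field_type, max_with_digits and classify_by_width are hypothetical names, and the enum is a stand-in for the FTInteger and FTDouble codes in the diff. The helper shows where the threshold of 9 plausibly comes from, assuming a 32-bit int: every 9-digit value fits, but a 10-digit value may not.

```c
#include <limits.h>

/* Hypothetical stand-ins for the FTInteger / FTDouble codes in the patch. */
typedef enum { FIELD_INTEGER, FIELD_DOUBLE } field_type;

/* Largest value that can be written with `digits` decimal digits,
 * e.g. 3 -> 999. Valid for 1 <= digits <= 18 with a 64-bit long long. */
static long long max_with_digits(int digits)
{
    long long v = 0;
    int i;

    for (i = 0; i < digits; i++)
        v = v * 10 + 9;
    return v;
}

/* Same decision as the patched condition: any decimals, or a field
 * width over 9 characters, means double; otherwise integer. */
static field_type classify_by_width(int size, int decimals)
{
    if (decimals > 0 || size > 9)
        return FIELD_DOUBLE;
    return FIELD_INTEGER;
}
```

Under this rule a 19-wide field with no decimals is classified as double regardless of the values it holds. A double stores integers exactly only up to 2^53, roughly 16 digits, so a field that really used all 19 digits would still lose precision.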
After applying the patch, read.dbf works correctly.
cbind(read.csv("test.csv"),read.dbf("test.dbf"))
KEYCODE KEYCODE
1 422010010 422010010
2 42201002101 42201002101
3 42201002102 42201002102
4 42201002103 42201002103
5 42201002104 42201002104
6 422010060 422010060
7 422010071 422010071
8 422010072 422010072
9 42201008001 42201008001
10 42201008002 42201008002

--
Susumu Tanimura
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595