On Mon, Sep 30, 2013 at 9:45 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
On Monday, September 30, 2013 at 08:38 -0500, Joshua Ulrich wrote:
On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
Hi!
It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
quoted integers as an acceptable value for columns for which
colClasses="integer". But when colClasses is omitted, these columns are
read as integer anyway.
For example, let's consider a file named file.dat, containing:
"1"
"2"
read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
scan() expected 'an integer' and got '"1"'
But:
str(read.table("file.dat"))
'data.frame': 2 obs. of 1 variable:
$ V1: int 1 2
The latter result is indeed documented in ?read.table:
Unless 'colClasses' is specified, all columns are read as
character columns and then converted using 'type.convert' to
logical, integer, numeric, complex or (depending on 'as.is')
factor as appropriate. Quotes are (by default) interpreted in all
fields, so a column of values like '"42"' will result in an
integer column.
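The documented default path can be sketched in two steps. This is only an illustration: it assumes the contents of file.dat are supplied inline through the text argument, and the variable names (txt, df, fields, converted) are made up for the example.

```r
# Hypothetical inline copy of file.dat (two quoted values, one per line).
txt <- '"1"\n"2"\n'

# Default path: no colClasses, so fields are read as character (quotes are
# interpreted and stripped) and then passed through type.convert().
df <- read.table(text = txt)
class(df$V1)   # "integer"

# The same conversion applied by hand to already-unquoted strings.
fields <- c("1", "2")
converted <- type.convert(fields, as.is = TRUE)
class(converted)   # "integer"
```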
Should the former behavior be considered a bug?
No. If you tell read.table the column is integer and it's actually
character on disk, it should be an error.
All values in a CSV file are stored as characters on disk, regardless
of whether they are surrounded by quotes. 1 is saved as
00110001 (ASCII character #49), not 00000001, nor 00000000 00000000
00000000 00000001 (as a 32-bit storage of integers would imply, for
example).
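The byte-level claim can be checked from R itself. The following is a small sketch using base functions only; the variable names are illustrative, and big-endian byte order is assumed for the binary form so that it matches the layout written above.

```r
# Text representation: the character "1" is a single byte with code 49.
ascii_code <- utf8ToInt("1")
ascii_code   # 49

# intToBits() returns the least significant bit first, so reverse the low 8 bits.
ascii_bits <- rev(as.integer(intToBits(ascii_code))[1:8])
ascii_bits   # 0 0 1 1 0 0 0 1

# Binary representation: the integer 1 written as 4 bytes, big-endian (assumed).
binary_bytes <- writeBin(1L, raw(), size = 4L, endian = "big")
binary_bytes   # 00 00 00 01
```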
Yes, I'm aware that write.table creates a character representation of
the data on disk. That's its purpose. writeBin is for writing actual
binary representations. I thought you would understand that by
"actually character on disk" I meant "actually a quoted value". I
assumed you would understand my intent.
read.table uses scan to read the file. ?scan says:
The allowed input for a numeric field is optional whitespace
followed either 'NA' or an optional sign followed by a decimal or
hexadecimal constant (see NumericConstants), or 'NaN', 'Inf' or
'infinity' (ignoring case). Out-of-range values are recorded as
'Inf', '-Inf' or '0'.
For an integer field the allowed input is optional whitespace,
followed by either 'NA' or an optional sign and one or more digits
('0-9'): all out-of-range values are converted to 'NA_integer_'.
There's no mention of quotes being allowed.
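One possible workaround, given that quotes are only stripped for character fields, is to read the column as character and convert it afterwards. This is a sketch rather than a recommended fix; it assumes the file contents are supplied inline via the text argument, and txt and df are illustrative names.

```r
# Hypothetical inline copy of file.dat (two quoted values, one per line).
txt <- '"1"\n"2"\n'

# Read the column as character so that the quotes are interpreted and removed...
df <- read.table(text = txt, colClasses = "character")

# ...then convert explicitly; values that are not valid integers would become NA.
df$V1 <- as.integer(df$V1)
class(df$V1)   # "integer"
```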
So, with all due respect, please refrain from formulating such blatantly
erroneous statements.
So, with all due respect, please refrain from formulating such
blatantly pedantic responses to someone trying to help you.