read.table() with quoted integers
On 13-10-04 7:31 AM, Joshua Ulrich wrote:
On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
Hi!
It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider quoted integers as an acceptable value for columns for which colClasses="integer". But when colClasses is omitted, these columns are read as integer anyway. For example, let's consider a file named file.dat, containing:
"1"
"2"
read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer' and got '"1"'
But:
str(read.table("file.dat"))
'data.frame': 2 obs. of 1 variable:
$ V1: int 1 2
The latter result is indeed documented in ?read.table:
Unless 'colClasses' is specified, all columns are read as
character columns and then converted using 'type.convert' to
logical, integer, numeric, complex or (depending on 'as.is')
factor as appropriate. Quotes are (by default) interpreted in all
fields, so a column of values like '"42"' will result in an
integer column.
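For anyone who wants to try this without creating file.dat by hand, here is a minimal self-contained sketch of the two calls above. The temporary file and the variable names (path, default_read, strict_result) are only for illustration, and the tryCatch() wrapper is there so the script keeps running whichever way the second call turns out on your R version.

```r
# Write the two-line file described above: each value is a quoted integer.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Default: columns are read as character (quotes interpreted),
# then converted with type.convert().
default_read <- read.table(path)
str(default_read)

# With colClasses = "integer" the field is parsed directly as an integer.
# tryCatch() returns the error message as a string instead of stopping.
strict_result <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)
print(strict_result)

unlink(path)
```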
Should the former behavior be considered a bug?
No. If you tell read.table the column is integer and it's actually character on disk, it should be an error.
My reading of the `read.table` help page is that one should expect that when there is an 'integer'-class and an `as.integer` function and "integer" is the argument to colClasses, that `as.integer` will be applied to the values in the column. Should I be reading elsewhere?
I assume you're referring to the paragraph below.

Possible values are 'NA' (the default, when 'type.convert' is
used), '"NULL"' (when the column is skipped), one of the atomic
vector classes (logical, integer, numeric, complex, character,
raw), or '"factor"', '"Date"' or '"POSIXct"'. Otherwise there
needs to be an 'as' method (from package 'methods') for
conversion from '"character"' to the specified formal class.

I read that as meaning that an "as" method is required for classes not already listed in the prior sentence. It doesn't say an "as" method will be applied if colClasses is one of the atomic, factor, Date, or POSIXct classes; but I can see how you might assume that, since all the atomic, factor, Date, and POSIXct classes already have "as" methods...
And this does suggest a workaround for ffdf: instead of declaring the class to be "integer", declare a class "ffdf_integer", and write a conversion method. Or simply read everything as character and call as.integer() explicitly.

Duncan Murdoch
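
A rough sketch of both workarounds, for illustration only. The class name "quoted_integer" is made up here (the message above suggests "ffdf_integer"), and the temporary file stands in for file.dat. It relies on the documented behaviour that a colClasses entry outside the built-in list is read as character and then converted with an "as" method from package methods; nothing here has been checked against ff itself.

```r
library(methods)

# Stand-in for file.dat: two quoted integers, one per line.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Workaround 1: declare a (hypothetical) class and a conversion method
# from "character". The column is read as character, so the quotes are
# interpreted before the method sees the values.
setClass("quoted_integer")
setAs("character", "quoted_integer", function(from) as.integer(from))
via_class <- read.table(path, colClasses = "quoted_integer")

# Workaround 2: read everything as character and convert explicitly.
via_character <- read.table(path, colClasses = "character")
via_character$V1 <- as.integer(via_character$V1)

str(via_class)
str(via_character)

unlink(path)
```

The first form keeps the conversion inside the read.table() call, which may matter when another package makes that call for you; the second is simpler when you control the call yourself.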