Message-ID: <524EACBF.20107@gmail.com>
Date: 2013-10-04T11:55:43Z
From: Duncan Murdoch
Subject: read.table() with quoted integers
In-Reply-To: <CAPPM_gS1xxSXSnCA0W-+zh3EwoJ89BcxFXyxvffXzujqV7Qi5w@mail.gmail.com>

On 13-10-04 7:31 AM, Joshua Ulrich wrote:
> On Tue, Oct 1, 2013 at 11:29 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>>
>> On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
>>
>>> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
>>>> Hi!
>>>>
>>>>
>>>> It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
>>>> quoted integers as an acceptable value for columns for which
>>>> colClasses="integer". But when colClasses is omitted, these columns are
>>>> read as integer anyway.
>>>>
>>>> For example, let's consider a file named file.dat, containing:
>>>> "1"
>>>> "2"
>>>>
>>>>> read.table("file.dat", colClasses="integer")
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
>>>>   scan() expected 'an integer' and got '"1"'
>>>>
>>>> But:
>>>>> str(read.table("file.dat"))
>>>> 'data.frame':   2 obs. of  1 variable:
>>>> $ V1: int  1 2
>>>>
>>>> The latter result is indeed documented in ?read.table:
>>>>      Unless 'colClasses' is specified, all columns are read as
>>>>      character columns and then converted using 'type.convert' to
>>>>      logical, integer, numeric, complex or (depending on 'as.is')
>>>>      factor as appropriate.  Quotes are (by default) interpreted in all
>>>>      fields, so a column of values like '"42"' will result in an
>>>>      integer column.
>>>>
>>>>
>>>> Should the former behavior be considered a bug?
>>>>
>>> No. If you tell read.table the column is integer and it's actually
>>> character on disk, it should be an error.
>>
>> My reading of the `read.table` help page is that one should expect that when
>> there is an 'integer'-class and an  `as.integer` function and  "integer" is the
>> argument to colClasses, that `as.integer` will be applied to the values in the
>> column. Should I be reading elsewhere?
>>
> I assume you're referring to the paragraph below.
>
>    Possible values are 'NA' (the default, when 'type.convert' is
>    used), '"NULL"' (when the column is skipped), one of the
>    atomic vector classes (logical, integer, numeric, complex,
>    character, raw), or '"factor"', '"Date"' or '"POSIXct"'.
>    Otherwise there needs to be an 'as' method (from package
>    'methods') for conversion from '"character"' to the specified
>    formal class.
>
> I read that as meaning that an "as" method is required for classes not
> already listed in the prior sentence.  It doesn't say an "as" method
> will be applied if colClasses is one of the atomic, factor, Date, or
> POSIXct classes; but I can see how you might assume that, since all
> the atomic, factor, Date, and POSIXct classes already have "as"
> methods...

And this does suggest a workaround for ffdf:  instead of declaring the 
class to be "integer", declare a class "ffdf_integer", and write a 
conversion method.  Or simply read everything as character and call 
as.integer() explicitly.
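
A minimal sketch of both workarounds, using the two-line file.dat from the
original report (written to a temporary file here). The class name
"quotedInteger" is made up for illustration, not anything ffdf defines; the
sketch relies on read.table() reading columns of classes it does not know as
character (with quotes interpreted) and then converting them with as().

```r
library(methods)

# Hypothetical class name: anything that is not one of the classes
# read.table() handles itself (atomic classes, "factor", "Date", "POSIXct").
setClass("quotedInteger")

# Conversion method from "character" to the new class, as ?read.table requires.
setAs("character", "quotedInteger", function(from) as.integer(from))

# Recreate the example file from the original report.
tmp <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), tmp)

# Workaround 1: declare the custom class instead of "integer".
via_class <- read.table(tmp, colClasses = "quotedInteger")

# Workaround 2: read everything as character, then convert explicitly.
via_char <- read.table(tmp, colClasses = "character")
via_char$V1 <- as.integer(via_char$V1)

str(via_class)
str(via_char)

unlink(tmp)
```

The second form needs no class definition, but it holds the column as
character until it is converted.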

Duncan Murdoch