read.table() with quoted integers
On Sep 30, 2013, at 6:38 AM, Joshua Ulrich wrote:
On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimilan at club.fr> wrote:
Hi!

It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider quoted integers as an acceptable value for columns for which colClasses="integer". But when colClasses is omitted, these columns are read as integer anyway.

For example, let's consider a file named file.dat, containing:
"1"
"2"

read.table("file.dat", colClasses="integer")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer' and got '"1"'

But:
str(read.table("file.dat"))
'data.frame': 2 obs. of 1 variable:
$ V1: int 1 2
The latter result is indeed documented in ?read.table:
Unless 'colClasses' is specified, all columns are read as
character columns and then converted using 'type.convert' to
logical, integer, numeric, complex or (depending on 'as.is')
factor as appropriate. Quotes are (by default) interpreted in all
fields, so a column of values like '"42"' will result in an
integer column.
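
The two-step behavior described in that help text can be reproduced by hand. This is only a sketch of the documented default path, not the internal code of read.table(); the temporary file and the names `path`, `raw` and `converted` are made up for the illustration.

```r
# Recreate the two-line example file in a temporary location
# (path is a hypothetical stand-in for file.dat).
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Step 1: read every field as character. With the default 'quote'
# setting the surrounding double quotes are stripped.
raw <- read.table(path, colClasses = "character")

# Step 2: let type.convert() pick the class from the character values,
# which is what the help text says happens when colClasses is omitted.
converted <- type.convert(raw$V1, as.is = TRUE)

str(raw$V1)
str(converted)
```

The quotes are consumed in the first step, so by the time type.convert() runs it only sees the digits.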
Should the former behavior be considered a bug?
No. If you tell read.table the column is integer and it's actually character on disk, it should be an error.
My reading of the `read.table` help page is that one should expect that when there is an 'integer'-class and an `as.integer` function and "integer" is the argument to colClasses, that `as.integer` will be applied to the values in the column. Should I be reading elsewhere?
David.

>> This creates problems when combined with read.table.ffdf from package
>> ff, since this function tries to guess the column classes by reading the
>> first rows of the file, and then passes colClasses to read.table to read
>> the remaining rows by chunks. A column of quoted integers is correctly
>> detected as integer in the first read, but read.table() fails in
>> subsequent reads.
>
> This sounds like an issue with read.table.ffdf. The column of quoted
> integers is *incorrectly* detected as integer because they're actually
> character on disk. read.table.ffdf should rely on how the data are
> actually stored on disk (via as.is=TRUE), not how read.table might
> convert them once they're read into R.
>
>> Regards
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Joshua Ulrich | about.me/joshuaulrich
> FOSS Trading | www.fosstrading.com

David Winsemius
Alameda, CA, USA
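
For anyone hitting this with chunked reads, one possible workaround is to guess the classes from the first rows as before, but read the remaining rows as character and apply as.integer() explicitly. The sketch below uses base R only and is not the code of read.table.ffdf; the four-line file and the names `path`, `first`, `guessed`, `rest` and `is_int` are assumptions made up for the illustration.

```r
# A small file of quoted integers (hypothetical stand-in for the real data).
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"', '"3"', '"4"'), path)

# First pass: guess the column classes from the first rows,
# the way a chunked reader might.
first <- read.table(path, nrows = 2)
guessed <- vapply(first, function(x) class(x)[1], character(1))

# Later passes: read the fields as character so the quotes are
# interpreted, then convert the columns that were guessed as integer.
rest <- read.table(path, colClasses = "character", skip = 2)
is_int <- unname(guessed == "integer")
rest[is_int] <- lapply(rest[is_int], as.integer)

str(rest)
```

This keeps the guessed classes but avoids passing colClasses="integer" to read.table() for fields that are quoted on disk.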