Stricter read.table?
Stavros Macrakis <macrakis <at> alum.mit.edu> writes:
read.table gives idiosyncratic results when the input is formatted strangely, for example:

    read.table(textConnection("a'b\nc'd\n"),
               header=FALSE, fill=TRUE, sep="", quote="'")
    => "c'd" "a'b" "c'd"

    read.table(textConnection("a'b\nc'd\nf'\n'\n"),
               header=FALSE, fill=TRUE, sep="", quote="'")
    => "f'" "\na" "b" "c'd" "f'" "\n"

Though read.table doesn't specify the syntax of its input precisely, these results don't seem particularly useful or consistent. Is there a stricter version of read.table (perhaps in a package) that gives errors or warnings if it finds quotation marks in the middle of fields or encounters other such peculiar situations?
I dissected this behavior a bit more here <https://stat.ethz.ch/pipermail/r-devel/2010-November/059016.html> (it is due to an inconsistency between the way that scan() and readLines() handle lines with unterminated quotes, IIRC) and Martin Maechler said <https://stat.ethz.ch/pipermail/r-devel/2010-November/059107.html> "I think it can be defended to file as a bug, but it is tricky to pinpoint exactly what the issue is."

I don't know of a stricter version of read.table(), but if you had the time and inclination to pick through the code and (i) provide a careful definition of desired behavior and (ii) supply patches, you could do your little bit to make R better. (If I posted a bug report would you annotate it with a discussion of desired behavior?)
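
As a rough illustration of what a "careful definition of desired behavior" might look like in code, here is a minimal sketch of a pre-check that runs before read.table(). The name strict_read_table is hypothetical (it is not an existing function in base R or in any package), and the two rules it enforces are assumptions made for the sake of the example, not an agreed specification: a line is rejected if it contains an odd number of quote characters, or if a quote character has non-whitespace on both sides. It also assumes whitespace-separated fields (sep="") and a single quote character that is not special in regular expressions.

```r
# Hypothetical wrapper, not part of base R: screens the raw lines and stops
# before read.table() can silently misparse them.
# Assumes sep = "" (whitespace-separated fields) and a one-character `quote`
# that has no special meaning in a regular expression.
strict_read_table <- function(lines, quote = "'", ...) {
  # Count occurrences of the quote character on each line.
  n_quotes <- lengths(regmatches(lines, gregexpr(quote, lines, fixed = TRUE)))

  # Rule 1 (assumed): an odd count means an unterminated quote.
  unbalanced <- n_quotes %% 2L == 1L

  # Rule 2 (assumed): a quote with non-whitespace on both sides sits in the
  # middle of a field.
  embedded <- grepl(paste0("[^[:space:]]", quote, "[^[:space:]]"), lines)

  bad <- which(unbalanced | embedded)
  if (length(bad) > 0L) {
    stop("suspicious quoting on line(s): ", paste(bad, collapse = ", "))
  }

  # Only input that passed both checks is handed to read.table().
  read.table(text = lines, quote = quote, ...)
}
```

With these rules, strict_read_table(c("a'b", "c'd")) stops with an error naming lines 1 and 2, while well-formed input such as c("'a b' 1", "'c d' 2") is passed through to read.table() unchanged. Whether these are the right rules (for example, how doubled quotes inside a quoted field should be treated) is exactly the part that would need a careful definition.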