The behaviour of read.csv().
On 6/12/2010, at 3:00 AM, Duncan Murdoch wrote:
I was going to suggest using DIF rather than CSV. It contains more internal information about the file (including the type of each entry), but has the disadvantage of being less readable, even though it is ASCII.
I don't think DIF is really the answer. My colleagues are familiar with the *.csv concept; they have never heard of ``DIF''. As I have said, we have had but few problems using *.csv. Better the devil you know ... Furthermore I have to deal with data provided by various sources ``external'' to the research project that I work for. I have to use the data that these sources provide, in the format in which they provide it. If they give me *.csv files I count myself lucky. Finally, there seems to be no ``write.DIF'' function, i.e. there is no way to produce *.DIF output, as far as I can tell. Hence it would not seem practical to use *.DIF as a data exchange standard.
However, in putting together a little demo, I found a couple of bugs in the R implementation of read.DIF, and it looks as though it ignores the internal type information. Sigh.
As of r53778, the bugs I noticed should be fixed. read.DIF now respects the internal type information, so it will keep character strings like "001" as type character (unless you ask it to change the type).
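For comparison, here is a minimal sketch of the same leading-zero issue on the CSV side. The two-column data and the column names id and val are made up for illustration and do not come from this thread. With default settings read.csv() guesses column types, so "001" is read as a number; passing colClasses is one way to keep it as character.

```r
# Hypothetical CSV content; "id" and "val" are illustrative names only.
csv_text <- "id,val\n001,5\n002,7"

# Default: read.csv() guesses types, so the id column becomes integer
# and the leading zeros are lost.
guessed <- read.csv(text = csv_text)

# Naming the column in colClasses forces id to stay character;
# val is still guessed as usual.
kept <- read.csv(text = csv_text, colClasses = c(id = "character"))

str(guessed$id)
str(kept$id)
```

Whether forcing character columns is appropriate depends on the data; for identifiers such as "001" it avoids silently dropping the zeros.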
What does ``r53778'' mean?

cheers,

Rolf