Unexpected behaviour of write.csv - read.csv
On Thu, Jan 13, 2011 at 1:06 PM, Prof Brian Ripley
<ripley at stats.ox.ac.uk> wrote:
On Thu, 13 Jan 2011, Duncan Murdoch wrote:
On 11-01-13 6:26 AM, Rainer M Krug wrote:
Hi,

assuming the following:
x <- data.frame(a=1:10, b=runif(10))
str(x)
'data.frame':   10 obs. of  2 variables:
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: num  0.692 0.325 0.634 0.16 0.873 ...
write.csv(x, "x.csv")
x2<- read.csv("x.csv")
str(x2)
'data.frame':   10 obs. of  3 variables:
 $ X: int  1 2 3 4 5 6 7 8 9 10
 $ a: int  1 2 3 4 5 6 7 8 9 10
 $ b: num  0.692 0.325 0.634 0.16 0.873 ...
Using the two functions write.csv and read.csv, I would assume that the resulting data.frame x2 would be identical to x, but it has an additional column X, which contains the row names of x. I know that read.table and write.table work as expected, but I would like to use CSV for data exchange reasons. I know that I can use write.csv(x, "x.csv", row.names=FALSE) and it would work, but shouldn't that be the default behaviour?
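A minimal sketch of the row.names=FALSE workaround, writing to a temporary file instead of "x.csv" (the tempfile() path and the set.seed() call are additions for reproducibility, not part of the original post). Because CSV stores the numbers as text, the comparison uses all.equal() rather than identical(); the doubles in b are written with a limited number of significant digits, so tiny differences can appear.

```r
set.seed(1)  # reproducible runif() values
x <- data.frame(a = 1:10, b = runif(10))

f <- tempfile(fileext = ".csv")
# row.names = FALSE: write only the data columns, no unnamed row-name column
write.csv(x, f, row.names = FALSE)
x2 <- read.csv(f)

str(x2)  # 2 variables again: a (int) and b (num)

# Equal up to numeric tolerance, not necessarily bit-for-bit identical
ok <- isTRUE(all.equal(x, x2))
print(ok)
```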
I don't think so. The CSV format is an export format which holds less information than a dataframe. By exporting the dataframe to CSV and importing the result, you are discarding information and you should expect to get something different.
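To illustrate the information loss with a column type rather than row names, here is a small sketch using a Date column (the data frame y is made up for illustration). The CSV file only holds the text "2011-01-13" and so on; the fact that the column was a Date is not recorded, so it has to be supplied again on import, here via the colClasses argument of read.csv.

```r
# A column type that plain CSV text cannot describe
y <- data.frame(d = as.Date("2011-01-13") + 0:2, v = c(1.5, 2.5, 3.5))

f <- tempfile(fileext = ".csv")
write.csv(y, f, row.names = FALSE)

# Default read: the dates come back as text (character, or factor in
# R versions before 4.0.0, where stringsAsFactors defaulted to TRUE)
y2 <- read.csv(f)
print(class(y2$d))

# The lost type information must be given back explicitly when reading
y3 <- read.csv(f, colClasses = c(d = "Date"))
print(class(y3$d))
```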
You need to read it with read.csv("x.csv", row.names=1)
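A short sketch contrasting the default read with row.names=1, again using a temporary file in place of "x.csv" as an assumption for reproducibility. The unnamed first column is turned into a variable called X by the default read (make.names() converts the empty header to "X"), whereas row.names=1 moves it into the row names.

```r
set.seed(1)
x <- data.frame(a = 1:10, b = runif(10))

f <- tempfile(fileext = ".csv")
write.csv(x, f)  # default: row names written as an unnamed first column

x2 <- read.csv(f)                 # first column becomes a variable named X
x3 <- read.csv(f, row.names = 1)  # first column used as the row names

print(names(x2))  # "X" "a" "b"
print(names(x3))  # "a" "b"
```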
Nothing in the csv format lets R know that the first column is the row names (in the format used by read.table, having a header that is one column short does). Now R could guess that a .csv file with an empty string for the first column name is meant to be the row names, but that would be merely a guess based on one (barely documented for spreadsheets) convention.
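The two header conventions can be compared directly. In this sketch (temporary file names are assumptions), write.table produces a header with one field fewer than the data rows, which read.table with header=TRUE interprets as "column 1 holds the row names". write.csv instead writes a full-width header whose first name is the empty string, which is the convention R would have to guess about.

```r
set.seed(1)
x <- data.frame(a = 1:10, b = runif(10))

# read.table convention: header one field short of the data rows
f_tab <- tempfile(fileext = ".txt")
write.table(x, f_tab)
print(readLines(f_tab, n = 2))  # header "a" "b", then a row starting with "1"
xt <- read.table(f_tab, header = TRUE)
print(names(xt))  # "a" "b": the short header marks column 1 as row names

# write.csv convention: full-width header with an empty first name
f_csv <- tempfile(fileext = ".csv")
write.csv(x, f_csv)
print(readLines(f_csv, n = 1))
```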
read.csv / read.table already use heuristics to determine the column types, so adding this to the heuristics seems not to be a departure from the established philosophy.
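What such a heuristic might look like from the user side: a sketch of a hypothetical wrapper, read_csv_guess_rownames(), which is not part of base R or of any proposal in this thread. It peeks at the header line with scan() and, if the first field is the empty string, calls read.csv with row.names=1. It also shows the weakness raised above: a file whose real first column simply has an empty name would be misread.

```r
# Hypothetical helper, not part of base R: treat an empty first header
# field as the marker for a row-name column, then call read.csv().
read_csv_guess_rownames <- function(file, ...) {
  # Read only the header line, split on commas, with quotes removed
  header <- scan(file, what = "", sep = ",", nlines = 1, quiet = TRUE)
  if (length(header) > 0 && header[1] == "") {
    read.csv(file, row.names = 1, ...)
  } else {
    read.csv(file, ...)
  }
}

# Usage: files written with and without row names both give columns a, b
set.seed(1)
x <- data.frame(a = 1:10, b = runif(10))
f1 <- tempfile(fileext = ".csv")
write.csv(x, f1)
f2 <- tempfile(fileext = ".csv")
write.csv(x, f2, row.names = FALSE)

print(names(read_csv_guess_rownames(f1)))  # "a" "b"
print(names(read_csv_guess_rownames(f2)))  # "a" "b"
```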
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com