What else do tab-delimited and csv do differently?

3 messages · Jason Turner, Patrick Connolly

#
On Wed, Nov 06, 2002 at 02:51:20PM +1300, Patrick Connolly wrote:
...
Weird.

I've had some trouble with Excel before: it adds a bunch of delimiters to the
end of rows, for reasons only Excel understands.

On the Linux box, try
#for the tab separated file
awk -F"\t" '{print NF}' myfile.txt | sort -u

#for the csv file
awk -F"," '{print NF}' myfile.csv | sort -u
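To illustrate what those pipelines report: each awk command prints the field count (NF) for every line, and sort -u reduces that to the distinct counts, so a clean file should print exactly one number. A minimal sketch with a made-up two-line file (the file name and contents are invented for illustration):

```shell
# Build a small tab-separated sample; the second line has a stray
# trailing tab, the kind of thing Excel sometimes appends.
printf 'a\tb\tc\na\tb\tc\t\n' > /tmp/sample.txt

# Count fields per line and reduce to the distinct counts.
# A consistent file prints one number; this one prints two (3 and 4),
# because the trailing tab creates an empty fourth field.
awk -F"\t" '{print NF}' /tmp/sample.txt | sort -u
```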

Cheers

Jason
#
Until recently, I naively believed that a csv was the same as a
tab-delimited file, except with commas replacing tab characters
(provided, of course, that the values being separated contained neither).

Evidently, that is not the case.  I tried reading a tab-delimited file
(created in Excel [2000, W2K] and ftp'd as ASCII to linux) into R 

platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    6.1              
year     2002             
month    11               
day      01               
language R   

I've done this sort of thing for years, but probably not with 50
thousand rows, though I've used much bigger text files before.  R just
hung and I had to kill the process.

Saving the file as a csv file instead was much more successful.  In 16
or 17 seconds, my dataframe had been produced with no apparent effort.

Is there a simple explanation?
#
On Wed, 06-Nov-2002 at 05:56AM +1300, Jason Turner wrote:

        
|> On Wed, Nov 06, 2002 at 02:51:20PM +1300, Patrick Connolly wrote:
|> > Until recently, I naively believed that a csv was the same as a
|> > tab-delimited file, except with commas replacing tab characters
|> > (provided, of course, that the values being separated contained neither).
|> > 
|> > Evidently, that is not the case.  I tried reading a tab-delimited file
|> > (created in Excel [2000, W2K] and ftp'd as ASCII to linux) into R 
|> > 
|> ...
|> > R just
|> > hung and I had to kill the process.
|> > 
|> > Saving the file as a csv file instead was much more successful.  In 16
|> > or 17 seconds, my dataframe had been produced with no apparent effort.
|> 
|> Weird.
|> 
|> I've had some trouble with Excel before: it adds a bunch of delimiters to the
|> end of rows, for reasons only Excel understands.

Are you sure that's not because of some stray cells with the odd
invisible (e.g. space) character outside the region you think you're
using?  It's very easy for that to happen, so I always paste the
region I want into another sheet and use that one.
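One shell-side way to check for that sort of stray invisible content (a hypothetical check, not something from the thread; the file name is made up) is to grep for whitespace at the end of lines:

```shell
# Flag lines ending in a tab or space -- the kind of stray invisible
# characters an Excel export can pick up from cells outside the
# region you think you're using.
printf 'clean line\ntrailing space \ntrailing tab\t\n' > /tmp/check.txt

# Prints the offending lines with their line numbers (here, 2 and 3).
grep -n '[[:space:]]$' /tmp/check.txt
```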


|> 
|> On the Linux box, try
|> #for the tab separated file
|> awk -F"\t" '{print NF}' myfile.txt | sort -u
|> 
|> #for the csv file
|> awk -F"," '{print NF}' myfile.csv | sort -u

That doesn't report anything obviously wrong.  Using
count.fields() gives the same result.
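For what it's worth, a variant of the same awk check (again hypothetical, with an invented sample file) tallies how many lines have each field count, which can help locate a lone bad row:

```shell
# Sample file: two 3-field lines and one 2-field line.
printf 'a\tb\tc\na\tb\na\tb\tc\n' > /tmp/data.txt

# Tally lines per field count; a count that appears only once or twice
# points at a handful of bad rows, which grep -n can then find.
awk -F"\t" '{print NF}' /tmp/data.txt | sort | uniq -c
```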

It's most likely an undocumented feature of Excel that nobody will
ever get to the bottom of.  I'm not very surprised.

best