What else do tab-delimited and csv do differently?

3 messages · Jason Turner, Patrick Connolly

#
On Wed, Nov 06, 2002 at 02:51:20PM +1300, Patrick Connolly wrote:
...
Weird.

I've had some trouble with Excel before: it adds a bunch of delimiters to the
end of rows, for reasons only Excel understands.

On the Linux box, try
#for the tab separated file
awk -F"\t" '{print NF}' myfile.txt | sort -u

#for the csv file
awk -F"," '{print NF}' myfile.csv | sort -u
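To illustrate what those pipelines report: each awk command prints the field count (NF) for every line, and sort -u reduces that to the distinct counts, so a clean file should print exactly one number. A minimal sketch with a made-up two-line file (the file name and contents are invented for illustration):

```shell
# Build a small tab-separated sample; the second line has a stray
# trailing tab, the kind of thing Excel sometimes appends.
printf 'a\tb\tc\na\tb\tc\t\n' > /tmp/sample.txt

# Count fields per line and reduce to the distinct counts.
# A consistent file prints one number; this one prints two (3 and 4),
# because the trailing tab creates an empty fourth field.
awk -F"\t" '{print NF}' /tmp/sample.txt | sort -u
```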

Cheers

Jason
#
Until recently, I naively believed that a csv was the same as a
tab-delimited file, except with commas replacing tab characters
(provided, of course, that the values being separated contained neither).

Evidently, that is not the case.  I tried reading a tab-delimited file
(created in Excel [2000, W2K] and ftp'd as ASCII to linux) into R 

platform i686-pc-linux-gnu
arch     i686             
os       linux-gnu        
system   i686, linux-gnu  
status                    
major    1                
minor    6.1              
year     2002             
month    11               
day      01               
language R   

I've done this sort of thing for years, but probably not with 50
thousand rows, though I've used much bigger text files before.  R just
hung and I had to kill the process.

Saving the file as a csv file instead was much more successful.  In 16
or 17 seconds, my dataframe had been produced with no apparent effort.

Is there a simple explanation?
#
On Wed, 06-Nov-2002 at 05:56AM +1300, Jason Turner wrote:

        
|> On Wed, Nov 06, 2002 at 02:51:20PM +1300, Patrick Connolly wrote:
|> > Until recently, I naively believed that a csv was the same as a
|> > tab-delimited file, except with commas replacing tab characters
|> > (provided, of course, that the values being separated contained neither).
|> > 
|> > Evidently, that is not the case.  I tried reading a tab-delimited file
|> > (created in Excel [2000, W2K] and ftp'd as ASCII to linux) into R 
|> > 
|> ...
|> > R just
|> > hung and I had to kill the process.
|> > 
|> > Saving the file as a csv file instead was much more successful.  In 16
|> > or 17 seconds, my dataframe had been produced with no apparent effort.
|> 
|> Weird.
|> 
|> I've had some trouble with Excel before: it adds a bunch of delimiters to the
|> end of rows, for reasons only Excel understands.

Are you sure that's not because of some stray cells with the odd
invisible (e.g. space) character outside the region you think you're
using?  It's very easy for that to happen, so I always paste the
region I want into another sheet and use that one.
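One shell-side way to check for that sort of stray invisible content (a hypothetical check, not something from the thread; the file name is made up) is to grep for whitespace at the end of lines:

```shell
# Flag lines ending in a tab or space -- the kind of stray invisible
# characters an Excel export can pick up from cells outside the
# region you think you're using.
printf 'clean line\ntrailing space \ntrailing tab\t\n' > /tmp/check.txt

# Prints the offending lines with their line numbers (here, 2 and 3).
grep -n '[[:space:]]$' /tmp/check.txt
```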


|> 
|> On the Linux box, try
|> #for the tab separated file
|> awk -F"\t" '{print NF}' myfile.txt | sort -u
|> 
|> #for the csv file
|> awk -F"," '{print NF}' myfile.csv | sort -u

That doesn't report anything obviously wrong.  Using
count.fields() gives the same result.
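For what it's worth, a variant of the same awk check (again hypothetical, with an invented sample file) tallies how many lines have each field count, which can help locate a lone bad row:

```shell
# Sample file: two 3-field lines and one 2-field line.
printf 'a\tb\tc\na\tb\na\tb\tc\n' > /tmp/data.txt

# Tally lines per field count; a count that appears only once or twice
# points at a handful of bad rows, which grep -n can then find.
awk -F"\t" '{print NF}' /tmp/data.txt | sort | uniq -c
```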

It's most likely an undocumented feature of Excel that nobody will
ever get to the bottom of.  I'm not very surprised.

best