Importing Large Dataset from Excel

On Wed, 12-Dec-2007 at 11:35AM +0100, Peter Dalgaard wrote:

        
|> Philippe Grosjean wrote:
|> > The problem is often a misspecification of the comment.char argument. 
|> > For read.table(), it defaults to '#'. This means that everywhere you 
|> > have a '#' char in your Excel sheet, the rest of the line is ignored. 
|> > This results in a different number of items per line.
|> >
|> > You would do better to use read.csv(), which provides better default arguments 
|> > for your particular problem.
|> > Best,
|> >
|> >   
|> Or read.delim/read.delim2, which should be even better at TAB-separated
|> files.
|> 
|> In general, be very suspicious of read.table() with such files, not only
|> because of the '#' but also because it expects columns separated by
|> _arbitrary_ amounts of whitespace. I.e., n TABs count as one, so empty
|> fields are skipped over.

I don't recall that happening with TABs, but a problem can arise when
the last (rightmost) column has more than a few empty cells.
Occasionally, I've had to resort to adding a dummy column on the
right, but as Peter suggests, read.delim is usually less involved.
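
For anyone wanting to check these points on their own data, here is a minimal sketch. It assumes a small made-up tab-separated file (the column names and values are hypothetical, not from the original poster's sheet) and uses count.fields() to show how many fields R sees per line under the read.table() defaults versus the read.delim() settings.

```r
# Hypothetical tab-separated text standing in for a sheet exported from Excel.
lines <- c("id\tlabel\tvalue",
           "1\tA#1\t10",   # '#' inside a field
           "2\t\t20",      # empty middle cell (two adjacent TABs)
           "3\tC")         # trailing empty cell missing from the line

tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Fields per line as seen with the read.table() defaults:
# sep = "" (any run of whitespace) and comment.char = "#".
n_default <- count.fields(tmp, sep = "", comment.char = "#")

# Fields per line as seen with the read.delim() settings:
# sep = "\t" and comment.char = "".
n_delim <- count.fields(tmp, sep = "\t", comment.char = "")

print(n_default)
print(n_delim)

# With the defaults the data lines no longer have three fields each, so
# read.table() may stop with an error or misassign columns; try() keeps
# the script running either way.
res_table <- try(read.table(tmp, header = TRUE), silent = TRUE)

# read.delim() uses sep = "\t", comment.char = "" and fill = TRUE, so the
# '#' is kept, the empty middle cell is kept, and the short last line is padded.
dat <- read.delim(tmp)
print(dat)

unlink(tmp)
```

If n_default and n_delim differ for a line, that line contains a '#', an empty cell, or a missing trailing cell, and is worth inspecting before importing.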