Importing Large Dataset from Excel
On Wed, 12-Dec-2007 at 11:35AM +0100, Peter Dalgaard wrote:
|> Philippe Grosjean wrote:
|> > The problem is often a misspecification of the comment.char argument.
|> > For read.table(), it defaults to '#'. This means that everywhere you
|> > have a '#' char in your Excel sheet, the rest of the line is ignored.
|> > This results in a different number of items per line.
|> >
|> > You should better use read.csv() which provides better default arguments
|> > for your particular problem.
|> > Best,
|> >
|> Or read.delim/read.delim2, which should be even better at TAB-separated
|> files.
|>
|> In general, be very suspicious of read.table() with such files, not only
|> because of the '#' but also because it expects columns separated by
|> _arbitrary_ amounts of whitespace. I.e., n TABs counts as one, so empty
|> fields are skipped over.

I don't recall that happening with TABs, but a problem can arise when the
last (rightmost) column has more than a few empty cells. Occasionally,
I've had to resort to adding a dummy column on the right, but as Peter
suggests, read.delim is usually less involved.
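
For the archives, here is a minimal sketch of the two effects discussed above. The three-line file and the variable names are made up for illustration and are not the original poster's data. count.fields() has the same sep and comment.char defaults as read.table(), so it shows how each line would be split.

```r
# Hypothetical tab-separated export: one field contains '#',
# and one row has an empty middle cell.
lines <- c("id\tlabel\tvalue",
           "1\tsample #1\t10",
           "2\t\t20")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Defaults shared with read.table(): sep = "" (any run of whitespace)
# and comment.char = "#". The data lines come up short of three fields.
default_counts <- count.fields(path)

# The settings read.delim() uses: TAB separator, no comment character.
# Every line now has the same number of fields.
tab_counts <- count.fields(path, sep = "\t", quote = "\"", comment.char = "")

print(default_counts)
print(tab_counts)

# read.delim() applies those settings directly.
dat <- read.delim(path)
str(dat)
```

If the field counts from the first call differ between lines while the second call gives a constant count, the '#' or whitespace handling is the likely culprit rather than the data itself.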
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
___ Patrick Connolly
{~._.~} Great minds discuss ideas
_( Y )_ Middle minds discuss events
(:_~*~_:) Small minds discuss people
(_)-(_) ..... Anon
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.