
Novice question about getting data into R

7 messages · Duncan Murdoch, Peter Dalgaard, Silvia Lomascolo +3 more

Ted
#
I found it easy to use R when typing data manually into it.  Now I need to
read data from a file, and I get the following errors:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 42 elements
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 2 did not have 42 elements
(I'd tried the first version above because the first record has column
names.)

First, I don't know why R expects 42 elements in a record.
There is one column for a time variable (weeks since a given week's samples
were taken) and one column for each week of sampling in the data file (Week 18
through Week 37 inclusive).  And there are only 19 rows.
The samples represented by the columns are independent, and the numbers in
the columns are the fraction of events sampled that result in an event of
another kind in the week since the sample was taken.

The samples are not the same size, and starting with week 20, the number of
values progressively gets smaller since there have been fewer than 37  weeks
since the samples were taken.

I can show you the contents of the data file if you wish.  It is an
unremarkable CSV file, with the strings used for column names enclosed in
double quotes.

I don't have to manually separate the samples into their own files, do I?  I
was hoping to write a function that estimates the density function that best
fits each sample individually, and then iterate over the columns, applying
that function to each in turn.
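Roughly, what I have in mind is something like this (just a sketch; the file
name is a placeholder, and I'm assuming the shorter samples come in padded
with NA):

refunds <- read.csv("refund_distribution.csv", header = TRUE)

fit.density <- function(x) {
  x <- x[!is.na(x)]   # drop the NA padding in the shorter samples
  density(x)          # kernel density estimate for one sample
}

fits <- lapply(refunds[-1], fit.density)   # skip the time column, fit each week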

What is the best way to handle this?

Thanks

Ted
#
On 9/19/2008 1:01 PM, Ted Byers wrote:
If your file is really a comma-separated file, use read.csv, not
read.table (which defaults to whitespace separators).
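For example (just a sketch; substitute your own file name and path):

x <- read.csv("refund_distribution.csv")
## roughly equivalent to
x <- read.table("refund_distribution.csv", header = TRUE, sep = ",")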

Duncan Murdoch
#
Ted Byers wrote:
Hard to tell. One guess is that you have 42 header names. Spaces inside
any of them? Is this really a CSV file (as in Comma-Separated Values)?
If so, you at least need to set the sep= argument, but how about
read.csv()? Or, if TAB-separated, read.delim().
You might well have to. One man's "unremarkable" can be remarkably
different from others'...
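For instance (sketches only; file names made up):

x <- read.table("mydata.csv", header = TRUE, sep = ",")   # set the separator explicitly
x <- read.delim("mydata.txt")                             # if it is TAB separated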

#
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 42 elements
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 2 did not have 42 elements
R infers from the variable names that you have 42 columns. Do you? See
if removing spaces between column names helps (e.g., "week.1" instead of
"week 1").  Also, because yours is a csv file, fields are separated by
commas.  You can either use the "read.csv" command instead of
"read.table" (see ?read.table for details), or add the argument sep="," to
tell R that fields are separated by commas.  You might also need to specify,
if you have empty cells, what to do with them (e.g., na.strings="").
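Something like this (just a sketch; the file name is made up):

x <- read.csv("refund_distribution.csv", header = TRUE, na.strings = "")
names(x)   # check what R made of the column names

(read.csv() keeps check.names = TRUE by default, so a header like "week 1"
comes through as a syntactically valid name such as "week.1".)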
#
Try read.csv("K:\\MerchantData\\RiskModel\\refund_distribution.csv", header = TRUE)
--- On Fri, 9/19/08, Ted Byers <r.ted.byers at gmail.com> wrote:

#
Silvia Lomascolo wrote:
You are of course right about the NA's (missing values, empty cells) as
well as the possible blanks in the column names.  It might nevertheless
be a good idea for him to at least submit a few of the lines at the top
of the file.  A .csv file as generated by Excel on Windows is not
necessarily comma-separated.  That depends on the "list separator"
setting under "Regional Language Settings" found in the Control Panel.
On my machine, the list separator is a semicolon for a .csv file.  The
reason is simple: in Norway, the standard decimal separator is a comma,
and you do not want to confuse the system too much.  So, that particular
point depends on the settings for his locale (language, country).
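For a file like that, read.csv2() is the shortcut; it assumes sep = ";" and
dec = "," (file name made up):

x <- read.csv2("mydata.csv")   # semicolon-separated, comma as decimal mark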

Tom

Ted
#
Thanks one and all.

Actually, I used OpenOffice's spreadsheet to create the csv file, but I have
been using it long enough to know to specify how I want it, and sometimes,
when that proves annoying, I'll use Perl to finesse it into the form I want.

It seems my principal error was to assume that it would ignore the character
strings within the double quotes and determine fields based on the commas.
Silvia's remarks about empty cells and blanks in the middle of column names
were right on the mark.
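(With read.csv() both parts are handled: fields are split on the commas, and
the quote argument, which defaults to the double quote, keeps the quoted
strings together.)

x <- read.csv("refund_distribution.csv")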

Tom, I appreciate the caveats you mention.  I am aware of the complications
of i18n, but they don't affect me much, as my stuff is run exclusively in
Canada (pretty much the same norms as the US).  They don't affect me, in a
sense, because I have manipulated data around such issues using Perl in order
to satisfy the peculiarities of the software used on one project or another
- I deal with it almost as a matter of course, as long as I already know the
peculiarities of the software I am working with - and I have plenty of
experience moving data between spreadsheets, RDBMS such as MS SQL,
PostgreSQL, and MySQL, and XML files, and have had to resort to unusual
delimiters in the past because of peculiarities in the data feed.  While I
have tonnes of experience developing software (C++, Java, FORTRAN, Perl), I
only started playing with R a few months ago, and this is the first time I
have had to import real data into it.  While the tutorials I found were
useful, it seems the key tidbits of information I need are scattered through
the documentation, and I am finding it challenging to pin down the
peculiarities of R.

Thanks again one and all.

Ted