
help with read.table() function

4 messages · Gabor Grothendieck, Duncan Murdoch

#
On 1/29/2006 1:24 PM, Gabor Grothendieck wrote:
> Normally one expects stdin to be the default on command line
> programs and something like file.choose to be the default on GUI
> programs and this would break that expectation.

We don't currently meet that expectation, so I don't think it would make 
things any worse.  As I mentioned to Brian, I wouldn't change the 
default for scan() (which is stdin everywhere).  I haven't done a 
complete survey yet, but after looking at a few, I think the rules I 
would use are these:

  - the function should use the filename argument to find an existing file
  - it should not already have a default
  - it should be something that would commonly be used interactively

Ones I would change which currently give an error with no filename:

read.table() and friends
dget()
read.dcf()
source()
read.ftable()
tkpager()
md5sum()
Rd_parse()
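Each function in the list above would keep working exactly as now when a file is given; only the missing-argument case changes. A minimal sketch of that behaviour, written as a hypothetical wrapper (`read_table_choose` is an illustrative name, not a function in R) instead of an edit to `read.table()` itself, relies on R evaluating default arguments lazily, so `file.choose()` runs only when `file` is omitted:

```r
# Hypothetical wrapper illustrating the proposal: if no file is given,
# ask for one with file.choose() instead of failing with
# 'argument "file" is missing, with no default'.
read_table_choose <- function(file = file.choose(), ...) {
  # Default arguments are evaluated lazily, so file.choose() is only
  # called when the caller leaves `file` out.
  read.table(file, ...)
}

# With an explicit file name no chooser is shown.
path <- tempfile(fileext = ".txt")
writeLines(c("x y", "1 2", "3 4"), path)
d <- read_table_choose(path, header = TRUE)
print(d)
```

Because the default is lazy, scripts that always pass a file name are unaffected; only a bare `read_table_choose(header = TRUE)` call opens the chooser.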

Ones I probably wouldn't touch:

unz()
file.create(), etc.
file() gives a temporary file for writing
dput(), write.dcf() write to the console
dev2bitmap(), bitmap()
file.show() - which might be called with an empty file list, which we 
should treat as a no-op

Ones I'm not sure about right now, because they're relatively obscure:

sys.source()
shell.exec()

Duncan Murdoch
>
> If there were a GUI version of read.table then that would reasonably
> have file.choose as the default.
>
> On 1/29/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>> On 1/29/2006 11:28 AM, oliver wee wrote:
>>> hi,
>>>
>>> Sorry again to bother you, but I got the file.choose()
>>> to work. Thanks for the help there.
>>>
>>> Unfortunately I encountered a new problem. After I
>>> selected the data, I got this error message:
>>>
>>> Error in scan(file = file, what = what, sep = sep,
>>> quote = quote, dec = dec,  :
>>>         line 1 did not have 11 elements
>>> In addition: Warning message:
>>> incomplete final line found by readTableHeader on
>>> 'D:\Oliver\Professional\Studies\Time Series Analysis\spdc2693.data.txt'
>>>
>>> my time series data looks like this...
>>>
>>> ------------
>>> Standard and Poor's 500 Index closing values from 1926
>>> to 1993.
>>>
>>>   Date       Index
>>>   260101     12.76
>>>   260108     12.78
>>>   260115     12.52
>>>   260122     12.45
>>>   260129     12.74
>>>   260205     12.87
>>>   260212     12.87
>>>   260219     12.74
>>>   260226     12.18
>>>   260305     11.99
>>>   260312     12.15
>>>   260319     11.64
>>>   260326     11.46
>>> ...
>>> (and so on)
>>> ----------
>>>
>>> Should I insert additional attributes besides header = TRUE?
>> Yes, you need to tell it to skip over the lines of the comment at the
>> start of the file.  That looks like 3 lines (including the blank line),
>> so add skip=3 to your read.table call.
>>
>> Duncan Murdoch
>>
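A sketch of the `skip=3` advice, assuming the file begins exactly as quoted (two title lines and a blank line, with the dashed lines being only email decoration) and using just the first three data rows. `count.fields()` shows how many fields `read.table()` will find on each line, which is what the "did not have 11 elements" error is about. The date conversion is an extra step not discussed in the thread: it hard-codes the century as 19 because every date in this series falls between 1926 and 1993, whereas `%y` on its own would read 26 as 2026.

```r
# Recreate the start of the file as quoted in the thread (first rows only).
lines <- c(
  "Standard and Poor's 500 Index closing values from 1926",
  "to 1993.",
  "",
  "  Date       Index",
  "  260101     12.76",
  "  260108     12.78",
  "  260115     12.52"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Fields per line after skipping the 3 comment lines (blank line included).
print(count.fields(path, skip = 3))

# skip = 3 drops the title lines and the blank line;
# header = TRUE then uses "Date Index" as the column names.
sp <- read.table(path, header = TRUE, skip = 3)

# Dates are yymmdd integers; prefix "19" explicitly, since %y maps 00-68 to 20xx.
sp$Date <- as.Date(paste0("19", sprintf("%06d", sp$Date)), format = "%Y%m%d")
str(sp)
```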
>>> thanks.
>>>
>>>
>>> --- Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
>>>
>>>> On 1/29/2006 10:26 AM, oliver wee wrote:
>>>>> hello, I have just started using R for doing a project
>>>>> in time series...
>>>>>
>>>>> unfortunately, I am having trouble using the
>>>>> read.table function for use in reading my data set.
>>>>> This is what I'm getting:
>>>>> I inputted:
>>>>> data <- read.table("D:/Oliver/Professional/Studies/Time Series Analysis/spdc2693.data", header = TRUE)
>>>> Generally it's easier to use the dialogs to specify
>>>> the filename, e.g.
>>>>
>>>> read.table(file.choose(), header=TRUE)
>>>>
>>>> Then you shouldn't get the "no such file" message.  If you do, you
>>>> should check whether other programs (e.g. notepad) can open the file.
>>>> Maybe you don't have read permission?
>>>>
>>>> Duncan Murdoch
>>>>
>>>>> I got:
>>>>> Error in file(file, "r") : unable to open connection
>>>>> In addition: Warning message:
>>>>> cannot open file 'D:/Oliver/Professional/Studies/Time Series Analysis/spdc2693.data', reason 'No such file or directory'
>>>>>
>>>>> as I am just a novice programmer, I really would
>>>>> appreciate help from you guys. Is there a need to
>>>>> setpath in R, like in java or something like that...
>>>>> I am using the windows version btw.
>>>>>
>>>>> I have also tried to put the file in the work
>>>>> directory of R, so that I only typed
>>>>> data <- read.table("spdc2693.data", header = TRUE)
>>>>> Again, it won't work, with the same error message.
>>>>>
>>>>> I would appreciate any help. thanks again.
>>>>>
 >>>>> ______________________________________________
 >>>>> R-help at stat.math.ethz.ch mailing list
 >>>>> https://stat.ethz.ch/mailman/listinfo/r-help
 >>>>> PLEASE do read the posting guide!
 >>>> http://www.R-project.org/posting-guide.html
 >>>>
 >>>>
 >>>
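A guess at the cause of the original "No such file" error, offered as an inference and not something confirmed in the thread: the warning in the later message names the file `spdc2693.data.txt`, while the path typed into `read.table()` ended in `spdc2693.data`. Windows Explorer hides known file extensions by default, so the extra `.txt` may simply have been invisible. The sketch below recreates that situation in a temporary directory (the directory and file contents are made up) and shows how `file.exists()` and `list.files()` expose the mismatch:

```r
# Simulate the suspected situation: the file on disk is really named
# "spdc2693.data.txt", although Explorer may display it as "spdc2693.data".
dir <- tempfile("ts-analysis-")
dir.create(dir)
writeLines(c("Date Index", "260101 12.76"), file.path(dir, "spdc2693.data.txt"))

# The name that was typed does not exist...
print(file.exists(file.path(dir, "spdc2693.data")))

# ...but listing the directory shows the real name, extension included.
print(list.files(dir, pattern = "^spdc2693"))

# A bare name such as "spdc2693.data" is looked up relative to this directory.
print(getwd())
```

Calling `list.files()` with no arguments in the working directory shows exactly which names R can see there, which would also explain why copying the file into R's working directory did not help.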
#
On 1/29/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
I don't think you understood my point.  This is how most software works,
IN GENERAL, so R should be expected to work that way
too.  I don't think not having a default is so bad, but having the wrong
default that breaks the stereotype that one expects in all software
is bad.

What could be done is to add something about file.choose to the
error message that one gets when one does read.table("myfile")
and it can't find "myfile".
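Gabor's suggestion could be prototyped outside R as a wrapper. The sketch below is hypothetical (`read_table_hint` is an illustrative name and the message wording is invented); it checks for the file up front and, if it is missing, stops with a message that mentions `file.choose()`:

```r
# Hypothetical wrapper, not part of R: if a file name does not exist,
# stop with a message that also points at file.choose().
read_table_hint <- function(file, ...) {
  # Naive check: skip "stdin", "clipboard" and URLs, which read.table()
  # accepts but file.exists() would report as missing.
  is_path <- is.character(file) && length(file) == 1 &&
    !grepl("^(stdin|clipboard)$|://", file)
  if (is_path && !file.exists(file)) {
    stop(sprintf(paste0(
      "cannot find file '%s' (working directory is '%s'); ",
      "to pick the file interactively, try read.table(file.choose(), ...)"),
      file, getwd()), call. = FALSE)
  }
  read.table(file, ...)
}

missing_path <- file.path(tempdir(), "myfile")
msg <- tryCatch(read_table_hint(missing_path), error = conditionMessage)
cat(msg, "\n")
```

As Duncan notes in his reply, a missing file is only one of many ways `read.table()` can fail, which is why the sketch adds the hint only in that one, easily detected case.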
#
On 1/29/2006 5:20 PM, Gabor Grothendieck wrote:
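> I don't think you understood my point.  This is how most software works,
> IN GENERAL, so R should be expected to work that way too.

I think I understood that, but my point is that R doesn't act that way
now, and this change won't make the situation worse.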

> I don't think not having a default is so bad but having the wrong
> default that breaks the stereotype that one expects in all software
> is bad.

I don't follow your argument.  Why is it better to say

Error in read.table() : argument "file" is missing, with no default

than it would be to ask the user which file to read?  The first is 
unexpected in both of the situations you described, while the second is 
only unexpected in a command line program.

Consistency is a good thing, but there are a number of choices of what 
to be consistent with:

  - other similar functions in R (but they are inconsistent)
  - previous versions of R (which is why I wouldn't change scan())
  - other software a user would be familiar with (which is why 
file.choose() is a good idea in a GUI, but not in a command line program).

Currently our error messages explain what went wrong; they generally
don't try to suggest alternative approaches (though a few do, e.g. 
help("dfdsfs")).  There are a lot of reasons read.table() could fail, 
and I think it would be very hard to get a good automatic rule on when 
file.choose() was the appropriate alternative.

Duncan Murdoch
#
On 1/29/06, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
Because that does not mix conventions.
Because it's conventional that stdin is the default.  Even in R someone
must have realized that, since that is the default for scan.

I wasn't suggesting an automatic solution, but I think it could be helpful
if the error message pointed out the existence of file.choose.
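The convention both correspondents describe, a file dialog in a GUI session and standard input otherwise, can be sketched as a hypothetical default (`default_input` and `read_table_default` are illustrative names, not R functions). It assumes `interactive()` is an adequate proxy for "GUI session"; that is only an approximation, since an interactive R session in a terminal also returns `TRUE`, and there `file.choose()` prompts on the console instead of opening a dialog. The string `"stdin"` makes `read.table()` read the process's standard input, as in `Rscript script.R < data.txt`.

```r
# Hypothetical helper, not part of R: pick the default input source by
# session type. interactive() is only a rough proxy for "running in a GUI".
default_input <- function(gui = interactive()) {
  if (gui) file.choose() else "stdin"
}

# Hypothetical wrapper: the default is evaluated lazily, only when `file`
# is not supplied. read.table("stdin") reads the process's standard input.
read_table_default <- function(file = default_input(), ...) {
  read.table(file, ...)
}

# With an explicit file, default_input() is never called.
path <- tempfile(fileext = ".txt")
writeLines(c("a b", "1 2"), path)
d <- read_table_default(path, header = TRUE)
print(d)
```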