feature request: comment character in read.table?

6 messages · Brian Ripley, Peter Kleiweg, Ben Bolker

#
How difficult would it be (I could try myself if someone thought it
would be straightforward) to change read.table to allow a comment
character such as # or %?  My thought would be that anything on a line
following a comment character would be ignored (so that the combination of
blank.lines.skip=TRUE and a comment at the beginning of the line would
lead to a line being skipped completely).
 I'm always encouraging my students to comment their data sets, and it
feels lame to tell them they have to count the number of initial lines in
the data set in order to set the "skip" parameter appropriately.  (The
comment character would also allow comments about a particular data
point.)

  I know I could hack this (a) with sed in Unix [but my students using
Windows are likely to have trouble] (b) within R, by processing the file
and creating a temporary file with comments deleted.  (b) is probably what
I'll do as a temporary fix, but this seems to be a reasonable piece of
functionality for R to have ...

  thoughts?

  Ben Bolker
#
On Wed, 12 Sep 2001, Ben Bolker wrote:

            
It's hard in read.table, especially given the changes in R-devel to make it
more flexible.  The place to do this seems to me to be the internals of
scan.  They are far from transparent, though.
(b) seems easy to me. Use either a file() connection or an output text
connection.  (I don't know whether file() with no arguments works on the
Mac, for example, though.)
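
Editorial sketch of option (b), using a text connection instead of a temporary file. It is only an illustration under stated assumptions: the helper name read_table_nocomment is hypothetical (it is not part of R or of this thread), and it drops only lines whose first non-blank character is the comment character. Trailing comments after data are left untouched.

```r
## Hypothetical helper (not from the thread): drop whole-line comments,
## then hand the remaining lines to read.table via a text connection.
read_table_nocomment <- function(path, comment = "#", ...) {
  lines <- readLines(path)
  ## TRUE for lines whose first non-blank character is the comment character
  is_comment <- startsWith(trimws(lines, which = "left"), comment)
  con <- textConnection(lines[!is_comment])  # input text connection
  on.exit(close(con))
  read.table(con, ...)
}

## Usage with a throwaway file containing made-up data
tmp <- tempfile()
writeLines(c("# measured on 12 Sep", "x y", "1 2", "# outlier below", "3 40"), tmp)
d <- read_table_nocomment(tmp, header = TRUE)
unlink(tmp)
print(d)
```

Because the comment lines are removed before read.table sees the data, the header line is again the first line and no "skip" count is needed.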
#
It was pretty straightforward.
  I haven't tested it very extensively, and not on a Mac at all.  The only
subtlety is skipping commented lines at the beginning of the file because
read.table assumes that the first line (not the first non-blank line) is
the header line if header=TRUE.

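read.table.c <- function(file, comment = "#", debug = FALSE, ...) {
  infile <- file(file, "r")
  on.exit(close(infile))
  tmpfile <- file()   ## anonymous temporary file connection
  on.exit(close(tmpfile), add = TRUE)
  while (length(cline <- readLines(infile, 1)) > 0) {
    ## keep only the text before the first comment character
    ## (fixed = TRUE: match 'comment' literally, not as a regexp)
    pos <- regexpr(comment, cline, fixed = TRUE)
    s <- if (pos > 0) substr(cline, 1, pos - 1) else cline
    if (nchar(s) > 0) { ## skip blank lines to not screw up header
      if (debug) cat(s, "\n")
      writeLines(s, tmpfile)
    }
  }
  read.table(tmpfile, ...)
}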
#
Ben Bolker wrote...
That is not very robust. What about these:

    # a comment
    1 2 3  # a comment
           # a comment
    "1" "2" "3  # not a comment"
    "# not a comment"  # a comment

Comments don't have to start at the first column, and comments
can also exist after real data. A comment char within a string
should not be taken as the start of a comment, and you also have
to take into account that the tokens delimiting a string can
vary.
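
Editorial illustration of the point above, not part of the original message: applying a naive "cut at the first #" rule to the five sample lines. The variable names cases and naive are made up for this sketch.

```r
## The five test lines from the message, held in a character vector
cases <- c(
  '# a comment',
  '1 2 3  # a comment',
  '           # a comment',
  '"1" "2" "3  # not a comment"',
  '"# not a comment"  # a comment'
)
## Naive rule: drop everything from the first "#" onwards
naive <- sub("#.*$", "", cases)
print(naive)
```

The first three lines come out as intended, but the last two are cut inside the quoted string, leaving an unterminated quote for the parser.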
#
I do intend to do this at a lower level where it will be robust.
As Peter says, quotes are one of the issues I had been thinking about.
I would be happy in the interim to insist on starting with a comment
character.

B
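
Editorial note, not part of the original message: current versions of R provide a comment.char argument in read.table (and scan). A minimal sketch, using made-up data in the style of Peter's test lines:

```r
## Made-up sample data; the text= argument avoids needing a file
txt <- c(
  "# a comment before the header",
  "x y label",
  "1 2 \"a # not a comment\"",
  "           # an indented comment line",
  "3 4 \"b\"   # a trailing comment"
)
d <- read.table(text = txt, header = TRUE, comment.char = "#")
print(d)
```

Here the comment character inside the quoted field is kept as data, while whole-line and trailing comments are dropped.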
#
On Thu, 13 Sep 2001, Peter Kleiweg wrote:
[snip]
1>     # a comment
2>     1 2 3  # a comment
3>            # a comment
4>     "1" "2" "3  # not a comment"
5>     "# not a comment"  # a comment
As Brian Ripley has pointed out, he hopes to do this at a lower level,
more robustly, later.  In the meantime, in my defense: this code works for
lines 1, 2, and 3 (it's OK with comments that start after the first column
and that exist after real data -- that was part of my spec).  It doesn't
deal with comment characters embedded in quoted strings, but I don't have
any problem with telling people that they're not allowed to have comment
characters in quoted strings in their data -- it seems to be a perfectly
reasonable restriction.
  If I wanted to hack this further I would probably try to do a strsplit
on quotation characters, and look for comment characters only in the odd
parts of the split.  And if someone puts

  "\"\\"\\\" ## "  "#" "\\  \" \#"

in their data file, then they deserve what they get ... :-)

  Ben Bolker
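
Editorial sketch of the strsplit idea described above, not Ben's actual code. The helper name strip_comment is hypothetical. It assumes a single quote character and does not handle escaped quotes such as \" inside a string.

```r
## Hypothetical helper: strip a trailing comment, ignoring comment
## characters that sit inside quoted strings.
strip_comment <- function(line, comment = "#", quote = "\"") {
  ## Split on the quote character: odd-numbered pieces lie outside quotes,
  ## even-numbered pieces are the contents of quoted strings.
  parts <- strsplit(line, quote, fixed = TRUE)[[1]]
  for (i in seq_along(parts)) {
    if (i %% 2 == 1) {
      pos <- regexpr(comment, parts[i], fixed = TRUE)
      if (pos > 0) {
        ## Truncate this piece at the comment and discard all later pieces
        parts[i] <- substr(parts[i], 1, pos - 1)
        return(paste(parts[seq_len(i)], collapse = quote))
      }
    }
  }
  line  # no comment character outside quotes
}

print(strip_comment('"1" "2" "3  # not a comment"'))
print(strip_comment('"# not a comment"  # a comment'))
```

With this rule, lines 4 and 5 of Peter's examples keep their quoted strings intact, and only the unquoted comment on line 5 is removed.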


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._