
Suggestion for comments in data files (i.e. read.table)

Prof Brian D Ripley wrote:

            
I agree that the basic problem is that R has no linewise data import
utility. I guess the reason for this is that anything working linewise has
to be written entirely in C for performance reasons, which means losing the
flexibility of the R language.

However, there might be a way to solve this: processing batches of lines
instead of single lines. If we import m of n (m << n) lines at a time as a
string vector, we could use R vector functions to preprocess these strings
and then scan those. Thus scan() would need an extension to allow it to take
its input from string vectors. Or, perhaps better, we need separate access to
the two functionalities of scan, (1) physical reading and (2) parsing.
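
As a rough illustration of that split, here is a minimal sketch using
readLines() and the text argument of scan() as found in current R. The
file contents and variable names are made up for the example.

```r
# Sketch only: the sample file and all variable names are illustrative.
path <- tempfile(fileext = ".txt")
writeLines(c("# a comment line", "1 2 3", "4 5 6"), path)

con <- file(path, open = "r")
lines <- readLines(con, n = 2)       # (1) physical reading: one batch of 2 lines
close(con)

# Preprocess the batch with ordinary R vector functions.
lines <- lines[!grepl("^#", lines)]

# (2) parsing: scan() takes its input from the string vector.
values <- scan(text = lines, quiet = TRUE)
```

Here the batch holds the comment line and the first data line, so only
the first data line reaches scan().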

Then read.table could be rewritten to work on batches of lines, with a
parameter nbatch=1000, and an optional parameter preprocess.func=NULL
which - if used - would return a preprocessed vector of strings, e.g.

  # grepl() keeps every line when nothing matches; s[-grep(...)] would drop them all
  ThrowAwayCommentLines <- function(s) s[!grepl("^#", s)]

to realize Telford Tendys' suggestion, or

  RemoveTrailingComments <- function(s){
    s <- gsub("#.*", "", s)
    s[nchar(s) > 0]
  }

for Prof. Ripley's suggestion.
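
To make the proposal concrete, here is a hedged sketch of what such a
batched reader might look like. read_table_batched() and strip_comments()
are hypothetical names, not part of R; only nbatch and preprocess.func come
from the proposal above. The sketch relies on readLines() and the text
argument of read.table() in current R.

```r
# Hypothetical helper, not part of R: read a file in batches of lines,
# optionally preprocess each batch, then parse the lines that survive.
read_table_batched <- function(path, nbatch = 1000, preprocess.func = NULL, ...) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    batch <- readLines(con, n = nbatch)     # physical reading of one batch
    if (length(batch) == 0) break           # end of file reached
    if (!is.null(preprocess.func)) batch <- preprocess.func(batch)
    kept <- c(kept, batch)
  }
  # comment.char = "" so that only preprocess.func decides what a comment is.
  read.table(text = kept, comment.char = "", ...)
}

# Illustrative preprocessing function: strip "#" comments, drop empty lines.
strip_comments <- function(s) {
  s <- sub("#.*", "", s)
  s[nchar(s) > 0]
}

# Usage on a small made-up file.
path <- tempfile()
writeLines(c("# header comment", "x y", "1 2  # trailing", "3 4"), path)
d <- read_table_batched(path, nbatch = 2,
                        preprocess.func = strip_comments, header = TRUE)
```

For simplicity this sketch collects the preprocessed lines and parses them
in one go, so column types are decided once for the whole file. Parsing
each batch separately and combining the results would be closer to the
proposal, but then column types would have to be reconciled between batches.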

Final comment: any solution that makes # skip the rest of the line MUST be
optional, otherwise R loses its ability to import general ASCII data. You
never know whether some people use special characters in their strings.

Any comments welcome


Jens

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._