read.table: check.names arg - feature request

2 messages · Vadim Ogranovich, Marc Schwartz

#
I admit I should have been clearer in my original posting. Let me try again (and I do know that by default read.table discards everything after '#', which is why I use comment.char=""; my bad for not mentioning this).


Here is a typical example of my data file:

#key	value
foo	1.2
boo	1.3

As you can see, the header line begins with '#' and then lists the column names. However, make.names will convert the raw names c("#key", "value") to c(".key", "value"), while I need c("key", "value"), i.e. no dot before key. So I am asking for a hook to specify the function that will handle this situation.
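
The requested hook can be approximated today with a small wrapper. The following is only a sketch under assumptions: `read_table_named` and `strip_hash` are hypothetical names, not part of R, and the sample text mirrors the tab-separated file shown above. The wrapper reads with check.names = FALSE so the raw header survives, then applies a user-supplied function to the names.

```r
# Hypothetical helper (not part of R): read with check.names = FALSE,
# then apply a user-supplied function to the raw column names.
read_table_named <- function(..., name_fun = make.names) {
  df <- read.table(..., check.names = FALSE)
  names(df) <- name_fun(names(df))
  df
}

# Hypothetical name handler: drop a leading '#', then sanitize as usual.
strip_hash <- function(x) make.names(sub("^#", "", x))

# Sample data mirroring the file in this message (tab-separated).
sample_text <- "#key\tvalue\nfoo\t1.2\nboo\t1.3"

dat <- read_table_named(text = sample_text, header = TRUE, sep = "\t",
                        comment.char = "", name_fun = strip_hash)
names(dat)
```

Because the handler still passes its result through make.names, the returned names stay syntactically valid, which may address the concern about invalid data frames.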



I am not sure I understand how having this hook can result in an invalid data frame? It can return invalid names, but check.names=FALSE can too.

Thanks,
Vadim

-----Original Message-----
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
Sent: Thursday, September 04, 2003 1:28 AM
To: Vadim Ogranovich
Cc: R-Help (E-mail)
Subject: Re: [R] read.table: check.names arg - feature request
    Vadim> Hi, I thought it would be convenient if the
    Vadim> check.names argument to read.table, which currently
    Vadim> can only be TRUE/FALSE, could take a function value
    Vadim> as well. If the function is supplied it should be
    Vadim> used instead of the default make.names.

One could, but it's not necessary in your case (see below), and
it's a potential pitfall. We want read.table() to
return valid data frames.

    Vadim> Here is an example where it can come in handy. I tend
    Vadim> to keep my data in comma-separated files with a header
    Vadim> line. The header line is prefixed with a comment sign
    Vadim> '#' to simplify identification of these lines. Now
    Vadim> when I read.table the files the '#' is converted to
    Vadim> '.' while I want it to be discarded.

Hmm, are you using a very old version of R,
or haven't you seen the `comment.char = "#"' argument of
read.table()?

Reading "?read.table", also note the note about
`blank.lines.skip' , and then realize that the default for
blank.lines.skip is  ` !fill ' and that `fill = TRUE' for all
the read.csv* and read.delim* incantations of read.table().

In sum, it's very easy to use current read.table() for your
situation!
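
As a rough illustration of the comment.char route, assuming a file laid out like the one in this thread (a '#'-prefixed header line followed by tab-separated rows): with comment.char = "#" the header line is skipped as a comment, so the column names have to be supplied separately, here through col.names. The sample text and the names passed in are assumptions for the sketch.

```r
# Sample data: a '#'-prefixed header line, then tab-separated rows.
sample_text <- "#key\tvalue\nfoo\t1.2\nboo\t1.3"

# With comment.char = "#", the '#key<TAB>value' line is treated as a comment
# and skipped, so no header is read; names are given explicitly instead.
dat <- read.table(text = sample_text, sep = "\t", header = FALSE,
                  comment.char = "#", col.names = c("key", "value"))
names(dat)
nrow(dat)
```

The trade-off is that the names in the file are no longer used at all, so they must be kept in sync by hand.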

    Vadim> P.S. I don't know if r-help is the right place for
    Vadim> feature requests. If it's not please let me know
    Vadim> where the right one is.

Since your proposal can be interpreted as "How do I use
read.table() when my file has comment lines?",
r-help has been very appropriate.

Otherwise, and particularly if the proposal is more technical,
R-devel would be better suited.

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
On Thu, 2003-09-04 at 12:00, Vadim Ogranovich wrote:
SNIP 

What's wrong with changing the colnames after the import:

(This is under R 1.7.1 under RH 9, using defaults)

# Note the conversion of the '#' to 'X.'
make.names(c("#key", "value"))
[1] "X.key" "value"

# Presuming you have dataframe 'df' now imported:
colnames(df)
[1] "X.key" "value"

# Now change them
colnames(df) <- gsub("X\\.", "", colnames(df))

# Check names
colnames(df)
[1] "key"   "value"
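
The same rename can be tried end to end as a self-contained sketch. The sample text is an assumption mirroring the file from the thread, and the pattern here is anchored with "^" so that only a leading "X." is removed; "maX.value" is a made-up name used purely to show the difference from an unanchored pattern.

```r
# Sample data mirroring the file discussed in the thread (tab-separated).
sample_text <- "#key\tvalue\nfoo\t1.2\nboo\t1.3"

# comment.char = "" keeps the '#key' header line; the default
# check.names = TRUE then passes the names through make.names().
df <- read.table(text = sample_text, header = TRUE, sep = "\t",
                 comment.char = "")
mangled <- colnames(df)

# Strip only a leading "X." from each name.
colnames(df) <- sub("^X\\.", "", colnames(df))
colnames(df)

# An unanchored pattern also rewrites names that merely contain "X.".
unanchored <- gsub("X\\.", "", "maX.value")
anchored   <- sub("^X\\.", "", "maX.value")
```

Anchoring matters only when other column names happen to contain "X."; for the two-column example in this thread both forms give the same result.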



Is there a reason that you could not do this after, rather than before
or during the import? You are asking R Core to make a substantive change
to a function, when an alternative already exists to resolve a rather
unique situation.

HTH,

Marc Schwartz