read.table() and NULL for colClasses

Wed, Jul 28, 2004 3:13 PM

NULL is not a valid value for colClasses and I don't see why you thought
it was.  colClasses has to be character according to the documentation, so
"NULL" is allowed but not NULL.
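
To make the distinction concrete, here is a minimal sketch in plain base R (the variable names are illustrative only, not taken from read.table()). It shows that a character vector cannot carry a bare NULL at all, whereas a list can, and that the string "NULL" is ordinary character data:

```r
# A character vector cannot hold NULL: c() silently drops it,
# so the positions of the remaining entries shift.
cc_chr <- c("integer", NULL, "numeric")
n_chr <- length(cc_chr)

# A list can hold NULL as an element, so positions are preserved.
cc_lst <- list("integer", NULL, "numeric")
n_lst <- length(cc_lst)

# The string "NULL" is character data; NULL itself is not.
str_is_chr <- is.character("NULL")
null_is_chr <- is.character(NULL)
```

So a bare NULL could only ever reach read.table() through a list, which is not what the documentation describes for colClasses.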

Your diff appears to be backwards for a patch.  A patch against the 
current R-devel sources is what is needed, including some regression 
tests.

On Wed, 28 Jul 2004, Henrik Bengtsson wrote:

Is that a common enough case to make this worth the code complication,
given that scan() (or better, a DBMS) can be used?  The usual reason is
that R is maintained by a small and overworked team, and it is adding
complications that needs justification, not leaving them out.
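
For reference, the scan() route might look like the sketch below. It assumes a recent R in which scan() accepts a text= argument; the sample data and the field names id, unused and value are made up for illustration. A NULL component in the what= list marks a field to be skipped:

```r
# Three whitespace-separated fields per record; the second is unwanted.
txt <- "1 x 2.5\n2 y 3.5"

# A NULL component in 'what' tells scan() to skip that field.
res <- scan(text = txt,
            what = list(id = integer(), unused = NULL, value = numeric()),
            quiet = TRUE)

# Assemble only the fields that were actually read.
df <- data.frame(id = res$id, value = res$value)
```

The cost of this route is that header handling and type guessing, which read.table() does automatically, have to be done by hand.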

I've modified read.table() so that it calls scan(what=...) with NULLs for
the fields to be skipped. Here's the diff of readtable.R (from the
R-1.9.1.tgz; 9,591,217 bytes):

diff readtable.new.R readtable.R
117,123d116
<     # Skip NULL columns in scan()
<     void <- sapply(colClasses, FUN=identical, "NULL") |
<             sapply(colClasses, FUN=is.null)
<     # If all (data) columns are NULL, return empty data frame.
<     if (sum(!void) <= 1*rlabp)
<       return(data.frame())
<     what[void] <- list(NULL)
131c124
<     nlines <- length(data[[which(!void)[1]]])
---
>     nlines <- length(data[[1]])
161c154
<     for (i in (1:cols)[!known & !void]) {
---
>     for (i in 1:cols) {
171,178d163
<     # Skipped row names equals row.names=NULL.
<     if (rlabp) {
<       if (void[1]) {
<         row.names <- NULL
<         data <- data[-1]
<       }
<       void <- void[-1]
<     }
201,202d185
<     # Remove NULL columns
<     data[void] <- NULL

and a diff for read.table.Rd:

diff read.table.new.Rd read.table.Rd
102,104c102
<     \code{NA} when \code{\link{type.convert}} is used.  Columns for
<     which the value is \code{"NULL"} (or \code{NULL} in a list) are
<     skipped. NB: \code{as} is
---
>     \code{NA} when \code{\link{type.convert}} is used.  NB: \code{as} is
181,183c179
<   the five atomic vector classes. Skipping columns with \code{"NULL"}
<   (or \code{NULL}) will also require less memory.
<
---
>   the five atomic vector classes.
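
The column mask computed in the readtable.R diff can be tried in isolation. The sketch below is not part of the patch; it assumes colClasses is supplied as a list (so that a bare NULL can be an element) and uses a made-up four-column specification:

```r
# Hypothetical specification: columns 2 and 3 are to be skipped,
# one via a bare NULL and one via the string "NULL".
colClasses <- list("integer", NULL, "NULL", "numeric")

# Same expression as in the diff: TRUE for every column to skip.
void <- sapply(colClasses, FUN = identical, "NULL") |
        sapply(colClasses, FUN = is.null)

# Indices of the columns that would actually be read.
keep <- which(!void)
```

The first retained index, which(!void)[1], is the one the patch uses to count lines, since data[[1]] may itself be a skipped column.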

Note that setting a colClasses entry to "NULL" already has an effect, which
I assume is unintentional. During the data conversion, which happens *after*
scan() has read the data anyway, "NULL" will NULL a column via as(x,
"NULL"), but unfortunately the wrong column. If the above modifications are
not acceptable, maybe add a warning for the latter?

That's not usage as documented so the effect is definitely unintentional.
We can't catch all misuses!

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
