read.table: deciding automatically between two colClasses values

3 messages · Joshua Wiley, Oliver Kullmann

#
Hello,

I have a function for reading a data-frame from a file, which contains

  E = read.table(file = filename,
        header = TRUE,
        colClasses = c(rep("integer",6),"numeric","integer",rep("numeric",8)),
        ...)

Now a small variation has arisen, where

colClasses = c(rep("integer",4),"numeric","integer",rep("numeric",8))

needs to be used instead (only the first count changes, from 6 to 4).
I want this to be convenient for the user: no user intervention should
be needed, and the function should choose between the two values
"4" and "6" itself, according to the header line.

This turns out to be a problem: I found only count.fields, which
however cannot read just the first line. Reading the whole file
(just to inspect the first line) is awkward, and these files
typically have millions of lines. The only way to influence
count.fields seems to be via skip, but I could only use that to skip
to the last line, which still reads the whole file, and I also don't
know the number of lines in the file.

Perhaps one could catch the error when the first invocation of
read.table fails, and then try the second one. However, tryCatch doesn't
seem to make it simple to write something like

E = try(expr1 otherwise expr2)

(if expr1 fails, evaluate expr2 instead)?
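
For reference, the "expr1 otherwise expr2" pattern can be written with tryCatch and an error handler. The following is only a sketch: the names otherwise and read_with_fallback are hypothetical, not part of base R. It also assumes that the wrong colClasses layout really does make read.table raise an error, which happens when a non-integer value such as 0.5 lands in a column declared "integer", but not when every value in that column happens to look like an integer.

```r
# Generic "expr1 otherwise expr2". R evaluates function arguments lazily,
# so `fallback` is only evaluated if `primary` raises an error.
otherwise <- function(primary, fallback) {
  tryCatch(primary, error = function(e) fallback)
}

# Hypothetical reader: try the 6-integer layout first and fall back to
# the 4-integer layout if read.table signals an error.
read_with_fallback <- function(filename, ...) {
  tail_classes <- c("numeric", "integer", rep("numeric", 8))
  otherwise(
    read.table(file = filename, header = TRUE,
               colClasses = c(rep("integer", 6), tail_classes), ...),
    read.table(file = filename, header = TRUE,
               colClasses = c(rep("integer", 4), tail_classes), ...)
  )
}
```

Because the fallback depends on the first attempt failing loudly, and because a failed first attempt may already have read part of a large file, inspecting the header line directly is probably the more robust choice.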

Oliver
#
Hi Oliver,

Look at ?readLines

I imagine something like:

tmp <- readLines(filename, n = 1L)
# (do stuff with the first line to decide)
IntN <- 6  # or 4
NumN <- 8  # or whatever
E <- read.table(file = filename, header = TRUE, colClasses =
  c(rep("integer", IntN), "numeric", "integer", rep("numeric", NumN)), ...)
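
Filled in, the outline above might look like the sketch below. The function name read_by_header is hypothetical, and the field counts 16 and 14 are an assumption derived from the two colClasses vectors in the question (6+1+1+8 and 4+1+1+8 columns), with one header name per column.

```r
# Hypothetical wrapper: pick the number of leading integer columns from
# the number of whitespace-separated fields in the header line.
read_by_header <- function(filename, ...) {
  header <- readLines(filename, n = 1L)  # reads only the first line
  # scan() splits on runs of whitespace, like read.table's default sep = "".
  n_fields <- length(scan(text = header, what = "", quiet = TRUE))
  IntN <- switch(as.character(n_fields),
                 "16" = 6L,
                 "14" = 4L,
                 stop("unexpected number of header fields: ", n_fields))
  read.table(file = filename, header = TRUE,
             colClasses = c(rep("integer", IntN), "numeric", "integer",
                            rep("numeric", 8)),
             ...)
}
```

Stopping on an unexpected field count, rather than guessing, keeps a third file variant from being read silently with the wrong column types.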

Cheers,

Josh

On Sun, Aug 28, 2011 at 7:13 AM, Oliver Kullmann
<O.Kullmann at swansea.ac.uk> wrote:

  
    
#
Hi Josh,

Thanks, that worked!
For the record, here is a function to determine the
number of space-separated strings in the first line
of a file:

# Removes leading and trailing whitespace from string x:
trim = function(x) gsub("^\\s+|\\s+$", "", x)

# The number of strings in the first line of the file with name f
# (splitting on runs of whitespace, so that repeated spaces are not
# counted as empty strings):
lengthfirstline = function(f) {
  length(unlist(strsplit(trim(readLines(f, n = 1)), "\\s+")))
}
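
As a side note on the original question, count.fields also accepts a connection, so it can be applied to the first line alone by combining it with readLines. A minimal sketch, with count_first_line as a hypothetical name; it uses the count.fields defaults, so fields are split on whitespace and "#" starts a comment:

```r
# Hypothetical helper: number of fields in the first line of file f,
# counted by count.fields on a text connection holding only that line.
count_first_line <- function(f) {
  con <- textConnection(readLines(f, n = 1L))
  on.exit(close(con))  # text connections must be closed explicitly
  count.fields(con)
}
```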

Oliver
On Sun, Aug 28, 2011 at 07:23:07AM -0700, Joshua Wiley wrote: