Hello,
I have a function for reading a data frame from a file, which contains
  E = read.table(file = filename,
                 header = TRUE,
                 colClasses = c(rep("integer",6),"numeric","integer",rep("numeric",8)),
                 ...)
Now a small variation has arisen, in which
  colClasses = c(rep("integer",4),"numeric","integer",rep("numeric",8))
is needed instead (so just a small change).
I want this to be convenient for the user: no user intervention should
be needed, and the function should choose between the two values
4 and 6 according to the header line.
This seems to be a problem: I found only count.fields, which
apparently cannot be told to read just the first line. Reading the
whole file (just to inspect the first line) is awkward, especially since
these files typically have millions of lines. The only way to influence
count.fields seems to be via skip, but I could use that only to skip to the
last line, which still reads the whole file, and I also don't know
the number of lines in the file.
Perhaps one could catch the error when the first invocation of
read.table fails, and then try the second one. However, tryCatch doesn't
seem to make it simple to write something like
  E = try(expr1 otherwise expr2)
(if expr1 fails, evaluate expr2 instead)?
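For reference, the "expr1 otherwise expr2" pattern asked about here can be sketched with tryCatch and an error handler. The helper name otherwise is hypothetical, not part of base R; it relies on R's lazy evaluation of function arguments, so expr2 is only evaluated if expr1 signals an error.

```r
# Hypothetical helper: evaluate expr1; if it signals an error,
# evaluate expr2 instead and return its value.
otherwise <- function(expr1, expr2) {
  tryCatch(expr1, error = function(e) expr2)
}

# expr1 fails, so the fallback value is returned.
otherwise(stop("first attempt failed"), "fallback")

# expr1 succeeds, so expr2 is never evaluated.
otherwise(1 + 1, stop("never reached"))
```

Applied to the question, the two arguments would be the two read.table calls. This is only a useful probe if the first call reliably fails on files with the other layout, which depends on the data and is not guaranteed.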
Oliver
read.table: deciding automatically between two colClasses values
3 messages · Joshua Wiley, Oliver Kullmann
Hi Oliver,
Look at ?readLines
I imagine something like:
tmp <- readLines(filename, n = 1L)
# (do stuff with the first line to decide)
IntN <- 6  # (or 4)
NumN <- 8  # (or whatever)
E <- read.table(file = filename, header = TRUE,
  colClasses = c(rep("integer", IntN), "numeric", "integer", rep("numeric", NumN)), ...)
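One way the "do stuff with the first line" step might look, as a sketch under assumptions not stated in the thread: the header names every column (so 16 fields for the six-integer layout and 14 for the four-integer one), and fields are whitespace-separated. The function name read_experiment is hypothetical.

```r
# Hypothetical wrapper: pick the number of leading integer columns
# from the number of fields in the header line.
read_experiment <- function(filename, ...) {
  header <- readLines(filename, n = 1L)   # reads only the first line
  n_fields <- length(scan(text = header, what = "", quiet = TRUE))
  # Assumption: 16 header fields -> 6 integer columns, 14 -> 4.
  int_n <- switch(as.character(n_fields),
                  "16" = 6L,
                  "14" = 4L,
                  stop("unexpected number of header fields: ", n_fields))
  read.table(file = filename, header = TRUE,
             colClasses = c(rep("integer", int_n), "numeric", "integer",
                            rep("numeric", 8L)),
             ...)
}
```

Stopping on any other field count keeps an unexpected file format from being read silently with the wrong column classes.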
Cheers,
Josh
On Sun, Aug 28, 2011 at 7:13 AM, Oliver Kullmann
<O.Kullmann at swansea.ac.uk> wrote:
[...]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, ATS Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
Hi Josh,
thanks, that worked!
For the record, here is a function to determine the
number of strings, space-separated, in the first line
of a file:
# Removes leading and trailing whitespace from string x:
trim = function(x) gsub("^\\s+|\\s+$", "", x)
# The number of whitespace-separated strings in the first line of the file
# named f (splitting on runs of whitespace, so that repeated spaces or tabs
# do not produce empty strings):
lengthfirstline = function(f) {
  length(unlist(strsplit(trim(readLines(f, n = 1L)), "\\s+")))
}
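As a possible alternative, count.fields from the original question also accepts a connection, so it can be pointed at just the first line via a text connection. This is a sketch, and the function name first_line_fields is hypothetical. Note that count.fields applies its own defaults for quoting and comment characters, so it may count differently from a plain whitespace split on unusual headers.

```r
# Hypothetical helper: count the fields in the first line of file f
# by running count.fields on a text connection holding only that line.
first_line_fields <- function(f, sep = "") {
  con <- textConnection(readLines(f, n = 1L))
  on.exit(close(con))           # the text connection must be closed by us
  count.fields(con, sep = sep)  # sep = "" means any whitespace
}
```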
Oliver
On Sun, Aug 28, 2011 at 07:23:07AM -0700, Joshua Wiley wrote:
[...]