
reading in data with variable length

5 messages · Liaw, Andy, John McHenry, Gabor Grothendieck

#
Using a file() connection together with readLines() and strsplit() should
do it. I would first count the number of lines in the file, create a list
with that many components, and then fill it in. I believe the "array of
cells" in Matlab is roughly equivalent to a list in R, but that's
beyond my knowledge of Matlab...
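A minimal sketch of that approach, assuming a hypothetical file in which each line holds a different number of comma-separated numbers (the real file layout isn't shown in this thread):

```r
# Hypothetical sample file: each line has a different number of values
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,3", "4.5,6", "7,8,9,10,11"), tmp)

con <- file(tmp, open = "r")   # file() connection ...
txt <- readLines(con)          # ... read with readLines()
close(con)
unlink(tmp)

n <- length(txt)               # count the lines first,
out <- vector("list", n)       # create a list with that many components,
for (i in seq_len(n)) {        # then fill it in using strsplit()
  out[[i]] <- as.numeric(strsplit(txt[i], split = ",")[[1]])
}
str(out)
```

Each list component plays the role of one Matlab cell and can have its own length.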

Andy

From: John McHenry
#
Could you time these and see how each one does:

# 1
ta.split <- strsplit(ta, split = ",")
ta.num <- lapply(ta.split, function(x) as.numeric(x[-(1:2)]))

# 2
ta0 <- sub("^[^,]*,[^,]*,", "", ta)
ta.num <- lapply(ta0, function(x) scan(text = x, sep = ",", quiet = TRUE))

# 3 - loop version of #1
n <- length(ta)
ta.split <- strsplit(ta, split = ",")
ta.num <- vector("list", n)
for(i in seq_len(n)) ta.num[[i]] <- as.numeric(ta.split[[i]][-(1:2)])

# 4 - loop version of #2
n <- length(ta)
ta0 <- sub("^[^,]*,[^,]*,", "", ta)
ta.num <- vector("list", n)
for(i in seq_len(n)) ta.num[[i]] <- scan(text = ta0[i], sep = ",", quiet = TRUE)
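One way to run that timing, sketched on made-up data: the `ta` below (lines of the form `idN,label,` followed by 1 to 20 numbers) is an assumption, since the real file isn't shown, and `v1` to `v4` are hypothetical wrappers around the four variants. They strip the first two fields with `[^,]*` and pass each string to `scan()` via `text =`, because `scan()`'s first argument is a file name. Timings depend on the machine and data size, so treat a single run as indicative only.

```r
set.seed(1)
n <- 2000
# Hypothetical test data: "idN,label," followed by 1 to 20 random numbers
ta <- vapply(seq_len(n), function(i) {
  vals <- round(runif(sample(20, 1)), 3)
  paste(c(paste0("id", i), "label", vals), collapse = ",")
}, character(1))

# 1: strsplit() + lapply(), dropping the first two fields
v1 <- function(ta) lapply(strsplit(ta, split = ","),
                          function(x) as.numeric(x[-(1:2)]))
# 2: strip the first two fields with a regex, then scan() each string
v2 <- function(ta) lapply(sub("^[^,]*,[^,]*,", "", ta),
                          function(x) scan(text = x, sep = ",", quiet = TRUE))
# 3: loop version of 1, with the list preallocated
v3 <- function(ta) {
  ta.split <- strsplit(ta, split = ",")
  ta.num <- vector("list", length(ta))
  for (i in seq_along(ta)) ta.num[[i]] <- as.numeric(ta.split[[i]][-(1:2)])
  ta.num
}
# 4: loop version of 2
v4 <- function(ta) {
  ta0 <- sub("^[^,]*,[^,]*,", "", ta)
  ta.num <- vector("list", length(ta0))
  for (i in seq_along(ta0)) {
    ta.num[[i]] <- scan(text = ta0[i], sep = ",", quiet = TRUE)
  }
  ta.num
}

variants <- list(v1 = v1, v2 = v2, v3 = v3, v4 = v4)
# Sanity check: all four variants should return the same list
stopifnot(all(vapply(variants, function(f) isTRUE(all.equal(f(ta), v1(ta))),
                     logical(1))))
# Elapsed seconds per variant (single run, so expect some noise)
timings <- vapply(variants, function(f) system.time(f(ta))[["elapsed"]],
                  numeric(1))
print(timings)
```

For a fairer comparison, repeat each timing a few times, or use data the size of the real file.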

On 12/6/05, John McHenry <john_d_mchenry at yahoo.com> wrote:
#
Building on Andy's variation:

n <- length(ta)
ta.sub <- sub("^[^,]*,[^,]*,", "", ta)
ta.con <- textConnection(ta.sub)
out <- replicate(n, scan(ta.con, nlines = 1, sep = ",", quiet = TRUE),
                 simplify = FALSE)
close(ta.con)
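One detail worth knowing about `replicate()`: by default it simplifies its result, so if every line happens to have the same number of fields you get a matrix rather than a list. A small check on hypothetical input:

```r
# Hypothetical input in which every line has the same number of fields
ta.sub <- c("1,2,3", "4,5,6")
n <- length(ta.sub)

ta.con <- textConnection(ta.sub)
out.default <- replicate(n, scan(ta.con, nlines = 1, sep = ",", quiet = TRUE))
close(ta.con)
is.matrix(out.default)   # TRUE: one column per line, not a list

ta.con <- textConnection(ta.sub)
out.list <- replicate(n, scan(ta.con, nlines = 1, sep = ",", quiet = TRUE),
                      simplify = FALSE)
close(ta.con)
is.list(out.list)        # TRUE
```

When the lines really do differ in length nothing can be simplified and you get a list either way; `simplify = FALSE` just makes the result's type independent of the data.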

Also consider writing ta.sub back out to a file and defining ta.con as a
file connection to that file instead; you would need to time both
versions to determine which is faster.
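A sketch of that comparison on a small hypothetical `ta.sub`; `read_file()` and `read_text()` are illustrative helper names, not existing functions. The file version opens the connection once, so each `scan(nlines = 1)` call continues from where the previous one stopped:

```r
# Hypothetical lines, already stripped of their first two fields
ta.sub <- c("1,2,3", "4.5,6", "7,8,9,10")

# Variant A: write the lines out, then scan them back via a file connection
read_file <- function(x) {
  tmp <- tempfile(fileext = ".txt")
  writeLines(x, tmp)
  con <- file(tmp, open = "r")   # opened once, so scan() keeps its place
  out <- replicate(length(x), scan(con, nlines = 1, sep = ",", quiet = TRUE),
                   simplify = FALSE)
  close(con)
  unlink(tmp)
  out
}

# Variant B: scan the lines directly from a textConnection
read_text <- function(x) {
  con <- textConnection(x)
  out <- replicate(length(x), scan(con, nlines = 1, sep = ",", quiet = TRUE),
                   simplify = FALSE)
  close(con)
  out
}

identical(read_file(ta.sub), read_text(ta.sub))

system.time(read_file(ta.sub))
system.time(read_text(ta.sub))
```

On a vector this small the timings are meaningless; substitute the real data before drawing conclusions.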