
speeding read.table

Hello,

This version cuts the run time by a factor of 4. It is still slow, though: 
about 2 minutes for a 380 MB file with 3.6M lines. So maybe a system 
command (awk, perhaps?) can do the job better.

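fun <- function(infile, outfile, lines = 10000L){
     # drop the header lines (those containing TABLE or COL);
     # grepl() avoids x[-integer(0)], which would drop every line
     remove <- function(x) x[ !grepl("TABLE|COL", x) ]
     fin <- file(infile, open = "rt")
     on.exit(close(fin))
     while(TRUE){
         # read the file in chunks of 'lines' lines
         x <- try(readLines(fin, n = lines))
         if(inherits(x, "try-error")) return(NULL)
         if(length(x) == 0) return(NULL)    # end of file
         y <- remove(x[ x != "" ])
         if(length(y) == 0) next    # chunk held only blank or header lines
         # split on spaces, drop empty fields, convert to numeric
         lst <- lapply(strsplit(y, " "), function(.y)
             as.numeric(.y[ .y != "" ]))
         mat <- do.call(rbind, lst)
         write.table(mat, outfile, append = TRUE, row.names = FALSE,
             col.names = FALSE)
     }
}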

fun("test", "clean")
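
As for the awk idea, a minimal sketch might look like the following. It assumes a Unix-like system with awk on the PATH, and that every line left after filtering holds the same number of whitespace-separated numbers. The name read_clean is made up for illustration, and I make no claim about how its speed compares with the function above.

```r
# Hypothetical helper: awk does the filtering, read.table() does the parsing.
read_clean <- function(infile){
    # awk prints only the lines that have at least one field (NF > 0)
    # and contain neither TABLE nor COL
    cmd <- paste("awk 'NF > 0 && !/TABLE/ && !/COL/'", shQuote(infile))
    # read.table() opens the pipe itself and closes it when done;
    # colClasses and comment.char are set to spare it some guessing
    read.table(pipe(cmd), header = FALSE, colClasses = "numeric",
        comment.char = "")
}
```

It would be called as read_clean("test") and returns a data frame rather than writing a cleaned file.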

Hope this helps,

Rui Barradas
On 18-10-2012 18:14, Rui Barradas wrote: