Speeding reading of large file

Dennis,

I used your code to create the test file, and then used two different methods to read the file:

# method 1
system.time({
fisher <- read.table('c:/tmp/fisher.txt', header=TRUE,skip=1,fill=TRUE, as.is=TRUE)
fisher <- data.frame(apply(fisher,2,as.numeric))
fisher <- fisher[!is.na(fisher$PTID),]
})
   user  system elapsed 
   0.14    0.00    0.14 
There were 12 warnings (use warnings() to see them)

# method 2
system.time({
raw <- readLines(con='c:/tmp/fisher.txt')
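# drop lines containing any capital letter other than E (E presumably kept for values like 1.2E+01)
fisher2 <- read.table(text=raw[!grepl("[A-DF-Z]" ,raw)], header=FALSE, fill=TRUE)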
names <- read.table('c:/tmp/fisher.txt',header=TRUE,skip=1,nrows=1)
colnames(fisher2) <- colnames(names)
})
   user  system elapsed 
   1.31    0.00    1.31 

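Method 1 was substantially faster than method 2 (0.14 versus 1.31 seconds elapsed, roughly nine times faster).  One thing I don't like about method 1 is the warnings (about NAs being introduced by as.numeric).  However, they are essentially harmless, since the rows with a missing PTID are removed in the next step.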
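
If the warnings are a nuisance, one option is to wrap the conversion in suppressWarnings(). Below is a minimal sketch of that idea, not a drop-in replacement. The vector txt is a made-up miniature of the file (a title line, a header line, and the same pair repeated partway through the data); I am assuming that layout for illustration, and the TIME and DV columns are invented.

```r
# Hypothetical miniature of the file; the real layout may differ.
txt <- c("TABLE NO. 1",
         "PTID TIME DV",
         "1 0.5 1.2E+01",
         "1 1.0 3.4",
         "TABLE NO. 1",
         "PTID TIME DV",
         "2 0.5 5.6")

# Same approach as method 1, reading from text instead of a file.
fisher <- read.table(text = txt, header = TRUE, skip = 1,
                     fill = TRUE, as.is = TRUE)

# Convert each column to numeric; embedded header lines become NA.
# suppressWarnings() silences the "NAs introduced by coercion" messages.
fisher[] <- lapply(fisher, function(x) suppressWarnings(as.numeric(x)))

# Drop the rows that came from the repeated title/header lines.
fisher <- fisher[!is.na(fisher$PTID), ]
```

The catch is that suppressWarnings() hides every warning raised by the conversion, so a genuinely malformed data value would also turn into NA silently. It is probably worth checking the row count afterwards.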


Hope this is helpful,

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204