Speeding reading of large file

Jim,

My original file used Dennis' script, so it was 10000 lines.  I created a 100,000 line file and the relative results were the same.  I ran your code on the file and your second and third approaches did not produce correct results.  It may be because the original data example had 2 header lines interspersed throughout the file and some of the numbers were in scientific notation.

I modified the grep function to work with the data file, but not really being a proficient R programmer I make no claims about efficiency.  But here are my results.

cat(c("TABLE NO.  1", " PTID        TIME        AMT         FORM        PERIOD      IPRED       CWRES       EVID        CP          PRED        RES         WRES", 
"  2.0010E+03  3.9375E-01  5.0000E+03  2.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  1.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00  0.0000E+00", 
"  2.0010E+03  8.9583E-01  5.0000E+03  2.0000E+00  0.0000E+00  3.3389E+00  0.0000E+00  1.0000E+00  0.0000E+00  3.5321E+00  0.0000E+00  0.0000E+00", 
"  2.0010E+03  1.4583E+00  5.0000E+03  2.0000E+00  0.0000E+00  5.8164E+00  0.0000E+00  1.0000E+00  0.0000E+00  5.9300E+00  0.0000E+00  0.0000E+00", 
"  2.0010E+03  1.9167E+00  5.0000E+03  2.0000E+00  0.0000E+00  8.3633E+00  0.0000E+00  1.0000E+00  0.0000E+00  8.7011E+00  0.0000E+00  0.0000E+00", 
"  2.0010E+03  2.4167E+00  5.0000E+03  2.0000E+00  0.0000E+00  1.0092E+01  0.0000E+00  1.0000E+00  0.0000E+00  1.0324E+01  0.0000E+00  0.0000E+00", 
"  2.0010E+03  2.9375E+00  5.0000E+03  2.0000E+00  0.0000E+00  1.1490E+01  0.0000E+00  1.0000E+00  0.0000E+00  1.1688E+01  0.0000E+00  0.0000E+00", 
"  2.0010E+03  3.4167E+00  5.0000E+03  2.0000E+00  0.0000E+00  1.2940E+01  0.0000E+00  1.0000E+00  0.0000E+00  1.3236E+01  0.0000E+00  0.0000E+00", 
"  2.0010E+03  4.4583E+00  5.0000E+03  2.0000E+00  0.0000E+00  1.1267E+01  0.0000E+00  1.0000E+00  0.0000E+00  1.1324E+01  0.0000E+00  0.0000E+00"
)[rep(1:10, 10000)], file="c:/tmp/fisher.txt", sep="\n")

system.time({
+     # approach #1 - read in file and then delete rows with NAs
+     x <- read.table('c:/tmp/fisher.txt', as.is = TRUE, skip=1, fill=TRUE, header = TRUE)
+     # convert to numeric
+     x[] <- lapply(x, as.numeric)
+     x <- x[!is.na(x[,1]), ]
+ })
   user  system elapsed 
   1.32    0.04    1.37 
There were 12 warnings (use warnings() to see them)

Speeding reading of large file

Thread (8 messages)