Can I improve the efficiency of my scan() command?

From: Pierre Kleiber [mailto:pkleiber at honlab.nmfs.hawaii.edu]

Ko-Kang Kevin Wang wrote:
[snipped]
It worked all right, but I'm just wondering if there is a 
more efficient 
way (it takes about 10 minutes to run the above scripts, 
for my 300,000 x 
25 CSV file)?

For example, the CSV file has 25 columns but I don't need 3 
of them (6, 7, 
and 22).  What I have done is to scan them in anyway, 
convert the list 
into a data frame then remove the 3 columns.  Just wonder if it is 
possible to simply ignore them in scan() to make the process faster?

It might not make a lot of difference in your case where you are
reading many fields and want to ignore a few, but if you want to read
a few out of many, it would help to preprocess the input file using,
for example, awk as in the following, which would pick up fields 1, 2,
and 4:

 > con <- pipe("awk -F , '{print $1,$3 $4}' ../Data/Rating.csv")
 > rating <- scan(con, what = list(
+                  usage = "",
+                  mileage = 0,
+                  excess = "")
+            , quiet = TRUE, skip = 1)
 > close(con)
Or even pipe("cut -d, -f1,3-4 ...")

Andy
I do this sort of thing a lot using various utilities; so I've defined
the following function to take care of opening and closing the
connection:

scanpipe <- function(x,...) {
   con <- pipe(x)
   out <- scan(con,...)
   close(con)
   out
}

-- 
-----------------------------------------------------------------
Pierre Kleiber             Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist                     Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory         Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

------------------------------------------------------------------------------