Can I improve the efficiency of my scan() command?
From: Pierre Kleiber [mailto:pkleiber at honlab.nmfs.hawaii.edu]
Ko-Kang Kevin Wang wrote:
[snipped]
It worked all right, but I'm just wondering if there is a more efficient
way (it takes about 10 minutes to run the above scripts, for my 300,000 x
25 CSV file)? For example, the CSV file has 25 columns but I don't need 3
of them (6, 7, and 22). What I have done is to scan them in anyway,
convert the list into a data frame then remove the 3 columns. Just wonder
if it is possible to simply ignore them in scan() to make the process
faster?
It might not make much difference in your case, where you are reading many fields and want to ignore a few. But if you want to read a few out of many, it would help to preprocess the input file with, for example, awk, as in the following, which picks up fields 1, 3, and 4:
> con <- pipe("awk -F , '{print $1,$3,$4}' ../Data/Rating.csv")
> rating <- scan(con, what = list(
+     usage = "",
+     mileage = 0,
+     excess = ""),
+     quiet = TRUE, skip = 1)
> close(con)
Or even pipe("cut -d, -f1,3-4 ...")
Andy
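As an aside on the original question of ignoring columns inside scan() itself: as far as I know, a NULL entry in the what list tells scan() to read past that field without storing it. Below is a minimal sketch of that, using a made-up three-column sample (the column names and values are placeholders, not the real Rating.csv) fed through a text connection. Whether it saves much time on a 300,000 x 25 file is something you would have to measure.

```r
# Hypothetical sample standing in for the real CSV file; the second
# column plays the role of a field we do not want to keep.
csv_lines <- c("usage,junk,mileage",
               "private,x,12000",
               "business,y,30500")

tc <- textConnection(csv_lines)
# A NULL component in 'what' marks a field that is read but not stored.
rating <- scan(tc,
               what = list(usage = "", NULL, mileage = 0),
               sep = ",", skip = 1, quiet = TRUE)
close(tc)

# Drop any NULL placeholders before converting to a data frame.
keep <- !vapply(rating, is.null, logical(1))
df <- as.data.frame(rating[keep], stringsAsFactors = FALSE)
```

For the 25-column file this would mean a what list with NULL in positions 6, 7, and 22.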
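I do this sort of pipe preprocessing a lot, using various utilities, so
I've defined the following function to take care of opening and closing
the connection:
scanpipe <- function(x, ...) {
    # x is a shell command; its output is read through a pipe
    con <- pipe(x)
    # close the connection even if scan() stops with an error
    on.exit(close(con))
    # all other arguments are passed straight on to scan()
    out <- scan(con, ...)
    out
}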
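A usage sketch for a helper like this, under a few assumptions: the cut utility is on the PATH, and the file is a small temporary CSV I made up for illustration (not the real Rating.csv). Note that cut leaves the comma delimiter in place, so scan() needs sep = ",", whereas awk's print with commas separates fields with a space by default.

```r
# Helper restated so the example runs on its own.
scanpipe <- function(x, ...) {
    con <- pipe(x)
    on.exit(close(con))   # close even if scan() fails
    scan(con, ...)
}

# Hypothetical four-column sample written to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("usage,junk,mileage,excess",
             "private,x,12000,low",
             "business,y,30500,high"), tmp)

# Keep fields 1, 3 and 4; shQuote() protects the path from the shell.
cmd <- paste("cut -d, -f1,3-4", shQuote(tmp))
rating <- scanpipe(cmd,
                   what = list(usage = "", mileage = 0, excess = ""),
                   sep = ",", skip = 1, quiet = TRUE)
unlink(tmp)
```

The header line still passes through cut, which is why skip = 1 is kept.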
--
-----------------------------------------------------------------
Pierre Kleiber Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help