List of lists? Data frames? (Or other data structures?)
"R A F" <raf1729 at hotmail.com> writes:
Thanks for your comments. I'm not too familiar with these differences, but here's a simple experiment. In a data file with 139,000 rows and 5 columns (double string double double double),
system.time( aaa <- read.table( "file" ) )
20.67 0.41 21.10 0.00 0.00
system.time( aaa <- scan( "file", list( 0, "", 0, 0, 0 ) ) )
6.07 0.01 6.09 0.00 0.00
It seems like scan is much faster -- and as the data file grows, read.table seems to choke. (I actually tried this with a data file with over 2 million rows.)
You're not taking Brian's hint!:
Only if you don't specify colClasses: if you do (and you would need the information to use scan()) there should be no performance penalty. (Note that matrices can be scan()-ed into a vector and the dimensions added, and that will be faster.)
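For what it's worth, the parenthetical about matrices might look roughly like this. It is only a sketch: the temp file, its contents and the names tmp, ncols, v and m are made up for illustration, and it assumes an all-numeric file whose column count is known in advance.

```r
# Hypothetical all-numeric file: 3 rows, 4 columns, written to a temp file.
tmp <- tempfile()
writeLines(c("1 2 3 4", "5 6 7 8", "9 10 11 12"), tmp)

ncols <- 4                       # assumed to be known beforehand

# scan() returns one flat numeric vector, in file order (row by row).
v <- scan(tmp, quiet = TRUE)

# R fills matrices column by column, so give the vector ncols rows
# and transpose to get back the layout of the file.
dim(v) <- c(ncols, length(v) / ncols)
m <- t(v)

unlink(tmp)
print(m)
```

Something like matrix(v, ncol = ncols, byrow = TRUE) should give the same result in one step.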
Try this:
cls <- sapply(list(0,"",0,0,0),class)
# older versions may need cls <- c("numeric","character",rep("numeric",3))
aaa <- read.table( "file", colClasses=cls )
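If you want to check this without the original data, here is a rough, self-contained sketch. The synthetic file, its size and all variable names (df, tmp, x_plain, x_cls, x_scan) are assumptions made up for the illustration; only the column layout (double string double double double) follows the description above. Timings will depend on the machine and the R version, so none are shown.

```r
# Build a small synthetic file with the layout double/string/double/double/double.
set.seed(1)
n <- 1000
df <- data.frame(a = rnorm(n),
                 b = sample(letters, n, replace = TRUE),
                 c = rnorm(n), d = rnorm(n), e = rnorm(n),
                 stringsAsFactors = FALSE)
tmp <- tempfile()
write.table(df, tmp, quote = FALSE, row.names = FALSE, col.names = FALSE)

cls <- c("numeric", "character", rep("numeric", 3))

# Time the three ways of reading the same file.
t_plain <- system.time(x_plain <- read.table(tmp))
t_cls   <- system.time(x_cls   <- read.table(tmp, colClasses = cls))
t_scan  <- system.time(x_scan  <- scan(tmp, list(0, "", 0, 0, 0), quiet = TRUE))
unlink(tmp)

# Elapsed seconds for each approach.
print(c(plain      = t_plain[["elapsed"]],
        colClasses = t_cls[["elapsed"]],
        scan       = t_scan[["elapsed"]]))

# read.table() gives a data frame, scan() a plain list; the columns should agree.
print(all.equal(x_cls$V1, x_scan[[1]]))
```

With a file this small the differences may be lost in the noise; increase n to get closer to the 139,000-row case.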
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907