Why are big data.frames slow? What can I do to make them faster?
I wanted to know why non-vectorized operations are slow. Thank you for your suggestions. I did three things:
- Besides looking at the total computation time, I analyzed the garbage collection time (gc()).
- I told R to use more memory. I use version 1.6.0 and started it with the command "Rgui --min-vsize=600M --min-nsize=10M".
- I used test$Fieldname[i] instead of test[i, 6].

My results show that using enough memory and the field names saves a lot of time. So thanks a lot! Here are the details:

Without field names and without more memory:
GC time: 494 seconds, other calculations: 124 seconds, total: 619 seconds.

Without field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: 34 seconds, other calculations: 114 seconds, total: 148 seconds.

With field names, without more memory:
GC time: 0.5 seconds, other calculations: 2 seconds, total: 2.5 seconds (but a long time for loading the matrix).

With field names, with "Rgui --min-vsize=600M --min-nsize=10M":
GC time: < 1 second, other calculations: < 1 second, total: < 1 second.

Marcus Jellinghaus

Peter Dalgaard writes:
You'll likely have to invoke the garbage collector a couple of times, and there might also be issues of memory growth kicking in. Once you get beyond some threshold, the machine starts swapping bits and pieces of the workspace in and out of physical memory,
Andy Liaw writes:
If you are on Windows and using an R version prior to 1.6.0, make sure R can use all 1GB of the RAM, as the default is to use up to 256MB or physical RAM, whichever is smaller. In R-1.6.0, that limit is raised to the smaller of 1GB and physical RAM.
[..]
Extracting from data frame one element at a time the way you did is expensive. I.e., test[i, 6] is slower than test$whatever[i].
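The difference between the two access styles, and the garbage collection share of each, can be measured with a sketch like the one below. It is only an illustration under assumptions: the data frame is a small made-up stand-in (20000 rows, four columns) rather than the poster's 500000-row table, the helper names timed, by_position and by_name are hypothetical, and the timings will vary by machine and R version.

```r
# Hypothetical stand-in for the poster's data frame (fewer rows and columns).
n <- 20000
test <- data.frame(
  CCY1 = rep("EUR", n),
  CCY2 = rep("USD", n),
  Bid  = runif(n, 0.5, 180),
  stringsAsFactors = FALSE
)
test$Ask <- test$Bid + 0.001

# Hypothetical helper: run f() and report elapsed time plus the CPU time
# spent in the garbage collector while it ran.
timed <- function(f) {
  gc()                                  # start from a freshly collected heap
  gc_before <- gc.time()                # cumulative GC timings so far
  elapsed <- system.time(result <- f())[["elapsed"]]
  gc_spent <- sum((gc.time() - gc_before)[1:2])   # user + system CPU in GC
  list(result = result, elapsed = elapsed, gc = gc_spent)
}

# Element-by-element access through the data frame method: test[i, 3]
# (column 3 is Bid in this stand-in).
by_position <- function() {
  total <- 0
  for (i in seq_len(n)) total <- total + test[i, 3]
  total
}

# Element-by-element access on the extracted column: test$Bid[i]
by_name <- function() {
  total <- 0
  for (i in seq_len(n)) total <- total + test$Bid[i]
  total
}

slow <- timed(by_position)
fast <- timed(by_name)

cat("test[i, 3]  elapsed:", slow$elapsed, "s, GC:", slow$gc, "s\n")
cat("test$Bid[i] elapsed:", fast$elapsed, "s, GC:", fast$gc, "s\n")

# The fully vectorized form avoids the loop altogether.
vectorized <- sum(test$Bid)
```

Both loops add the same values in the same order, so they return the same total. Only the cost of each element access differs.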
Peter Dalgaard writes:
It's somewhat difficult to reproduce the behaviour, since you only give part of the code necessary (e.g. how many *columns* do you have in your data frame?)
summary(test)
    datetime                   CCY1               CCY2               Bid               Ask                CCYPair
 Min.   :2002-05-28 00:00:02   Length:500000      Length:500000      Min.   :  0.557   Min.   :  0.5574   Length:500000
 1st Qu.:2002-05-28 17:30:47   Mode  :character   Mode  :character   1st Qu.:  1.532   1st Qu.:  1.5319   Mode  :character
 Median :2002-05-29 14:43:02                                         Median :  4.047   Median :  4.0476
 Mean   :2002-05-29 14:42:36                                         Mean   : 38.664   Mean   : 38.6858
 3rd Qu.:2002-05-30 10:22:30                                         3rd Qu.: 32.888   3rd Qu.: 32.8891
 Max.   :2002-05-31 02:58:54                                         Max.   :182.150   Max.   :182.3000