Hello, I experienced a problem with large data sets in R: I cannot import the data via the "read.table" procedure (insufficient memory), and some other functions fail with the same exception. Could you tell me how to handle large data sets in R? Thank you. Pavel Vanecek
problems with large data
4 messages · Brian Ripley, PaTa PaTaS, Spencer Graves
On Fri, 9 Jan 2004, PaTa PaTaS wrote:
I experienced a problem with large data sets in R: I cannot import the data via the "read.table" procedure (insufficient memory), and some other functions fail with the same exception. Could you tell me how to handle large data sets in R?
We need more details. Have you followed all the hints in ?read.table and the Data Import/Export manual? If you have, then probably your data set is too large for the memory of your version of R, and the simplest solution is to get more memory. To be more helpful we would need full details of the dataset and of the commands you used and the environment you are using (OS, how much RAM and how much virtual memory at least).
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
Tel: +44 1865 272861 (self), +44 1865 272866 (PA)
Fax: +44 1865 272595
Thank you all for your help. The problem is not only with reading the data (5000 cases by 2000 integer variables, imported either from an SPSS or a TXT file) into my R 1.8.0, but also with the procedure I would like to use: "randomForest" from the "randomForest" library. It is not possible to run it on such a data set (because of the insufficient-memory exception). Moreover, my data has factors with more than 32 classes, which causes another error. Could you suggest any solution for my problem? Thank you a lot.
If you can't get more memory, you could read portions of the file
using "scan(..., skip = ..., nlines = ...)" and then compress the data
somehow to reduce the size of the object you pass to "randomForest".
You could run "scan" like this in a loop, processing, e.g., 10%
of the data file on each pass.
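A minimal sketch of that loop, assuming a hypothetical whitespace-delimited file "big.txt" with the poster's dimensions (5000 rows, 2000 integer columns, no header); adjust the file name and sizes to your data:

```r
## Read a large file in ~10% chunks with scan(), compressing each
## chunk before moving on so the full matrix is never held at once.
n_rows <- 5000
n_cols <- 2000
chunk  <- n_rows %/% 10            # rows per pass

for (i in seq(0, n_rows - 1, by = chunk)) {
  block <- scan("big.txt",
                what   = integer(), # integers use less memory than doubles
                skip   = i,         # rows already processed
                nlines = chunk,
                quiet  = TRUE)
  m <- matrix(block, ncol = n_cols, byrow = TRUE)
  ## ... summarise or otherwise compress `m` here ...
}
```

Reading with `what = integer()` also avoids the default double storage, which halves the memory per value.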
Alternatively, you could pass each portion to "randomForest" and
compare the results from several calls to "randomForest". This would
produce a type of cross validation, which might be a wise thing to do,
anyway.
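One way to realise this, sketched under the assumption that the predictors and response are already in objects named `x` and `y` (hypothetical names): fit a forest per chunk and merge them with `combine()` from the randomForest package, which concatenates the trees of several forests into one.

```r
## Sketch: fit randomForest on successive row chunks, then merge
## the per-chunk forests into a single forest with combine().
library(randomForest)

n      <- nrow(x)
idx    <- split(seq_len(n), cut(seq_len(n), 5))       # 5 row chunks
fits   <- lapply(idx, function(rows)
            randomForest(x = x[rows, ], y = y[rows], ntree = 100))
rf_all <- do.call(combine, fits)                      # one merged forest
```

Note that the out-of-bag error estimates of the merged forest are not meaningful, since each component forest saw only its own chunk; compare the per-chunk fits directly for the cross-validation-style check suggested above.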
hope this helps.
spencer graves
PaTa PaTaS wrote:
Thank you all for your help. The problem is not only with reading the data (5000 cases by 2000 integer variables, imported either from an SPSS or a TXT file) into my R 1.8.0, but also with the procedure I would like to use: "randomForest" from the "randomForest" library. It is not possible to run it on such a data set (because of the insufficient-memory exception). Moreover, my data has factors with more than 32 classes, which causes another error. Could you suggest any solution for my problem? Thank you a lot.
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html