
How to input large datasets into R

warmstrong at research:~$ R
> gc()
            used   (Mb) gc trigger   (Mb)  max used   (Mb)
Ncells    114397    6.2     350000   18.7    350000   18.7
Vcells 128222013  978.3  136994473 1045.2 128222477  978.3
1GB for the raw data doesn't seem so bad.
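You can sanity-check that figure yourself before loading anything: a numeric (double) value in R takes 8 bytes, so the Vcells count above maps almost exactly onto the (Mb) column. A minimal sketch (the object `x` is just an illustration, not your data):

```r
## Back-of-the-envelope memory arithmetic: doubles are 8 bytes each,
## so ~128 million values is roughly 1 GB before any copies are made.
n_values <- 128e6
gb <- n_values * 8 / 1024^3
round(gb, 2)   # about 0.95

## object.size() reports the footprint of an object already in memory:
x <- numeric(1e6)
print(object.size(x), units = "MB")
```

Keep in mind R often copies objects during subsetting and model fitting, so budget for a multiple of the raw size, not just the raw size itself.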

If you can't find a server somewhere with a decent amount of RAM,
then you have a few options.

1) aggregate the data to 10-min or 30-min bars to get started
2) use only half or a quarter of your data (which you have already tried)
3) work with only one stock in memory at a time (if you are pooling
data across stocks, this obviously won't work)
4) use less memory-hungry methods (look at RcppArmadillo, for instance:
http://dirk.eddelbuettel.com/code/rcpp.armadillo.html)
5) also check out the bigmemory
(http://cran.r-project.org/web/packages/bigmemory/index.html) and
biglm (http://cran.r-project.org/package=biglm) packages
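Option 1 is usually the quickest win. A minimal base-R sketch of collapsing ticks to 10-minute OHLC bars; the `ticks` data frame here is simulated, and the `time`/`price` column names are assumptions you would adjust to your own data:

```r
## Sketch of option 1: collapse tick data to 10-minute bars in base R.
## Simulate an hour of ticks (stand-in for your real data frame):
set.seed(1)
ticks <- data.frame(
  time  = as.POSIXct("2010-06-29 09:30:00", tz = "UTC") +
            sort(runif(1000, 0, 3600)),
  price = cumsum(rnorm(1000)) + 100
)

## Bucket each tick into its 10-minute interval...
bar <- cut(ticks$time, breaks = "10 min")

## ...and compute open/high/low/close per bar: ~6 rows instead of 1000.
ohlc <- do.call(rbind, lapply(split(ticks$price, bar), function(p)
  data.frame(open = p[1], high = max(p),
             low = min(p), close = p[length(p)])))
nrow(ohlc)
```

The same idea scales to your full dataset read in chunks: aggregate each chunk as it comes in and keep only the bars, never the raw ticks, in memory.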

This list is a great resource. Keep posting here as you progress.

-Whit


On Tue, Jun 29, 2010 at 1:51 AM, Aaditya Nanduri
<aaditya.nanduri at gmail.com> wrote: