Advice on large data structures
4 messages · Worik R, jim holtman, Joe Conway +1 more
I would suggest that if you want to use R, you start with a 64-bit version and 24 GB of memory. If your data is a numeric matrix, you will need 8 GB for a single copy (5 million rows × 200 columns × 8 bytes per value). Do you really need it all in memory at once, or can you partition the problem? Can you use a database to access the portion you need at any time? If you only need one or two columns at a time, then a database storing the columns might work. You probably need some more analysis of exactly how you want to solve your problem, given the limitations of the system.
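The 8 GB figure can be checked with a quick back-of-envelope calculation. The sketch below assumes a dense matrix of doubles (8 bytes each) and ignores R's small per-object overhead; the variable names are illustrative only.

```r
# Back-of-envelope memory estimate for a dense numeric (double) matrix.
n_rows <- 5e6           # "on the order of 5 million rows" (from the question)
n_cols <- 200           # 200 columns (from the question)
bytes_per_double <- 8   # R stores numeric values as 8-byte doubles

total_bytes <- n_rows * n_cols * bytes_per_double
total_gb  <- total_bytes / 1e9    # decimal gigabytes
total_gib <- total_bytes / 2^30   # binary gibibytes

cat(sprintf("One copy: %.1f GB (%.2f GiB)\n", total_gb, total_gib))

# A single column is far cheaper, which is why working on one or two
# columns at a time is attractive.
col_mb <- n_rows * bytes_per_double / 1e6
cat(sprintf("One column: %.0f MB\n", col_mb))
```

One full copy comes to 8 GB, while a single column is only 40 MB. Since many R operations make temporary copies of their arguments, room for a few full copies is a plausible reading of the 24 GB suggestion.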
On Sep 2, 2011, at 1:13, Worik R <worikr at gmail.com> wrote:
Friends, I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns. I am wondering if I can stay in R. I need to do rollapply-type operations on the columns, including some that will be functions of (windows of) two columns. I have been looking at the ff and bigmemory packages but am not sure that they will do. Before I get too deep, can someone offer some wisdom about what the best direction would be? Switching to C/C++ is definitely an option if it is all too hard. Cheers, Worik
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
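For readers unsure what "functions of (windows of) two columns" looks like in practice, here is a minimal base-R sketch. The helper `roll2` is hypothetical (it is not from zoo or any other package), and the small random matrix is only a stand-in for the real 5 million × 200 data.

```r
# Hypothetical helper: apply FUN to aligned trailing windows of two vectors.
roll2 <- function(x, y, width, FUN) {
  stopifnot(length(x) == length(y), width >= 1, width <= length(x))
  n <- length(x)
  out <- rep(NA_real_, n)          # the first width - 1 positions stay NA
  for (i in width:n) {
    idx <- (i - width + 1):i       # indices of the trailing window ending at i
    out[i] <- FUN(x[idx], y[idx])
  }
  out
}

# Small stand-in for the real matrix.
set.seed(1)
m <- matrix(rnorm(1000 * 3), ncol = 3)

# Rolling 50-row correlation between columns 1 and 2.
rc <- roll2(m[, 1], m[, 2], width = 50, FUN = cor)
```

Only the two columns involved need to be in memory at once. An R-level loop like this is likely to be slow over millions of rows, so treat it as a statement of the computation rather than a tuned implementation; zoo's `rollapply` with `by.column = FALSE` expresses the same idea, and this inner loop is the natural piece to move to C/C++ if it proves too slow.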
On 09/01/2011 10:13 PM, Worik R wrote:
I am starting on a (section of the) project where I need to build a matrix with on the order of 5 million rows and 200 columns. I am wondering if I can stay in R. I need to do rollapply-type operations on the columns, including some that will be functions of (windows of) two columns.
Perhaps useful to you -- I recently added WINDOW FUNCTION support to
PL/R*. Currently this new feature is only available in git master, but
within a few days I will push a new release. You can download the source
from git here if you want:
https://github.com/jconway/plr
The official docs have not been updated yet, but see the pre-release
docs here (specifically chapter 9):
http://www.joeconway.com/plr/doc/plr-git-US.pdf
HTH,
Joe
*PL/R allows you to execute R functions from within a PostgreSQL database
Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Along the lines of one of Jim's suggestions: if you have some basic MySQL knowledge, check out the RMySQL package. I use it to convert and partition a matrix similar to yours into R objects, and it works fine. Hope this helps, A.
On Fri, 2 Sep 2011 06:33:13 -0400, Jim Holtman <jholtman at gmail.com> wrote:
[...] Can you use a database to access the portion you need at any time? If you only need one or two columns at a time, then a database storing the columns might work. [...]
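The column-at-a-time idea from Jim's and A.'s replies can be sketched without a database server. The example below swaps in one `.rds` file per column as a stand-in for database tables; the directory, file naming, and `col_path` helper are all assumptions made for illustration, not part of anyone's actual setup.

```r
# Stand-in for the database: one .rds file per column in a temp directory.
store_dir <- file.path(tempdir(), "columns")   # hypothetical location
dir.create(store_dir, showWarnings = FALSE)

# Hypothetical naming scheme for the per-column files.
col_path <- function(j) file.path(store_dir, sprintf("col_%03d.rds", j))

# Write phase: a small demo matrix here; real data would be written
# column by column as it is produced, never held in memory as a whole.
set.seed(42)
demo <- matrix(rnorm(1000 * 5), ncol = 5)
for (j in seq_len(ncol(demo))) saveRDS(demo[, j], col_path(j))
rm(demo)

# Read phase: load only the two columns needed for one computation.
x <- readRDS(col_path(2))
y <- readRDS(col_path(4))
spread <- x - y
```

With RMySQL, the read phase would instead be a `SELECT` of just those columns issued through DBI's `dbGetQuery`; the memory pattern is the same, since only the requested columns are materialised as R vectors.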