
Ideal (possible) configuration for an exalted R system

2 messages · Harsh, Kingsford Jones

#
Hi All,
I am trying to assemble a system that will allow me to work with large
datasets (45-50 million rows, 300-400 columns) possibly amounting to
10GB + in size.
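For a sense of scale, here is a rough estimate of the raw in-memory size of a table with those dimensions. It assumes a dense table in which every column is stored as an 8-byte double (R's default numeric type) and ignores per-object overhead. The helper name `dense_size_gb` is hypothetical and used only for illustration.

```python
# Rough raw size of a dense table, assuming every cell is an 8-byte
# double (R's default numeric storage) and ignoring object overhead.
BYTES_PER_DOUBLE = 8


def dense_size_gb(rows: int, cols: int, bytes_per_cell: int = BYTES_PER_DOUBLE) -> float:
    """Return the raw size in gigabytes (10**9 bytes)."""
    return rows * cols * bytes_per_cell / 1e9


# The two extremes of the dimensions mentioned above.
for rows, cols in [(45_000_000, 300), (50_000_000, 400)]:
    print(f"{rows:,} rows x {cols} cols: {dense_size_gb(rows, cols):.0f} GB")
```

Under these assumptions the estimate ranges from about 108 GB to 160 GB. Integer or factor columns (4 bytes per cell in R) would roughly halve those figures, and compressed on-disk sizes can be smaller still.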

I am aware that 64-bit builds of R on Linux machines are suitable for
such work, but I am looking for configurations that R users out there
have used to build a high-end R system.
Because many SAS users have reservations about R's data-size
limitations, I want to demonstrate that R remains usable even with
datasets as large as those mentioned above.
I would be glad to hear from users who have desktops or servers on
which they use R to crunch massive datasets (please share your
configurations and system-specific details).

Any suggestions for extending R's capabilities to handle gigabyte-class
datasets would be appreciated.

Thanks
Harsh Singhal
Decision Systems,
Mu Sigma Inc.
Chicago, IL
#
Hi Harsh,

The useR! 2008 site has useful information, e.g. talks by

Graham Williams:

http://www.statistik.uni-dortmund.de/useR-2008/slides/Williams.pdf

Dirk Eddelbuettel:

http://www.statistik.uni-dortmund.de/useR-2008/tutorials/useR2008introhighperfR.pdf

and others:

http://www.statistik.uni-dortmund.de/useR-2008/abstracts/AbstractsByTopic.html#High%20Performance%20Computing

A few days ago I was googling to see what types of workstations are
available these days. Here are some with up to 64 GB of RAM:

http://www.colfax-intl.com/jlrid/SpotLight.asp?IT=0&RID=80

Perhaps it won't be long before we see such memory in laptops:

http://www.ubergizmo.com/15/archives/2009/01/samsung_opens_door_to_32gb_ram_stick.html

Like you, I'd also be interested in hearing about configurations folks
have used to work with large datasets.


hth,

Kingsford Jones
On Mon, Feb 16, 2009 at 5:10 AM, Harsh <singhalblr at gmail.com> wrote: