Message-ID: <6E8D8DFDE5FA5D4ABCB8508389D1BF88D4C9D5@SRVEXCHMBX.precheza.cz>
Date: 2012-10-05T16:09:35Z
From: PIKAL Petr
Subject: R: machine for moderately large data
In-Reply-To: <E47CE0F6CFDF334186ADBE73DEDBE199200A9B8B19@NUEW-EXMBCRB1.gfk.com>

Hi

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Skála, Zdeněk (INCOMA GfK)
> Sent: Friday, October 05, 2012 3:38 PM
> To: r-help at r-project.org
> Subject: [R] R: machine for moderately large data
> 
> Dear all,
> 
> I would like to ask your advice about a suitable computer for the
> following usage.
> I (am starting to) work with moderately big data in R:
> - ca. 2 - 20 million rows * 100 - 1000 columns (market basket data)
> - mainly clustering, classification trees, association analysis (e.g.
> libraries rpart, cba, proxy, party)

If I compute correctly, a matrix that large (20e6 * 1000), stored as 8-byte doubles, needs about 160 GB just to sit in memory. Are you prepared for this?
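
The arithmetic behind that figure can be sketched as follows. This is only a back-of-the-envelope estimate: it assumes a dense matrix in which every cell is an 8-byte double, and the function name dense_matrix_bytes is made up for illustration. Real usage during clustering or tree fitting would likely be higher, because of working copies.

```python
# Rough memory estimate for a dense numeric matrix.
# Assumption: every cell is stored as an 8-byte double.
BYTES_PER_DOUBLE = 8


def dense_matrix_bytes(rows: int, cols: int,
                       bytes_per_cell: int = BYTES_PER_DOUBLE) -> int:
    """Return the bytes needed to hold a rows x cols dense matrix."""
    return rows * cols * bytes_per_cell


# Upper and lower ends of the sizes mentioned in the question.
largest = dense_matrix_bytes(20_000_000, 1000)
smallest = dense_matrix_bytes(2_000_000, 100)

print(f"largest:  {largest / 1e9:.0f} GB")   # 20e6 * 1000 * 8 bytes
print(f"smallest: {smallest / 1e9:.1f} GB")  # 2e6 * 100 * 8 bytes
```

So the small end of the range (about 1.6 GB) fits on an ordinary machine, while the large end does not.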

A suitable database interface might be preferable.
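
To illustrate what I mean by a database interface, here is a minimal sketch of the general idea: keep the data in a database and pull only the needed columns, a chunk at a time, instead of loading everything at once. The choice of SQLite, the table name baskets, its columns, and the helper iter_chunks are all assumptions made up for this example, not a recommendation of a particular product.

```python
import sqlite3


def iter_chunks(conn, query, chunk_size):
    """Yield the query result in lists of at most chunk_size rows."""
    cur = conn.execute(query)
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        yield rows


# Hypothetical toy table standing in for the real market basket data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (basket_id INTEGER, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?, ?)",
    [(i, f"item{i % 3}", i % 5) for i in range(10)],
)
conn.commit()

# Select only the column that is needed and aggregate chunk by chunk,
# so the full table never has to be held in memory at once.
total_qty = 0
n_chunks = 0
for chunk in iter_chunks(conn, "SELECT qty FROM baskets", chunk_size=4):
    n_chunks += 1
    total_qty += sum(qty for (qty,) in chunk)

print(total_qty, n_chunks)
```

The same pattern applies from R through a database connection package: the database does the selection and the session only ever sees one manageable piece of the data.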

Regards
Petr

> 
> Can you recommend a sufficient computer for this volume?
> I am routinely working in Windows but feel that Mac or some linux
> machine might be needed.
> 
> Please, respond directly to my email.
> Many thanks!
> 
> Zdenek Skala
> zdenek.skala at gfk.com
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.