
Memory problem on a linux cluster using a large data set [Broadcast]

3 messages · Iris Kolder, Thomas Lumley, Martin Morgan

On Thu, 21 Dec 2006, Iris Kolder wrote:

Huh?  R 2.4.x runs perfectly happily accessing large memory under Linux on 
64bit processors (and Solaris, and probably others). I think it even works 
on Mac OS X now.

For example:
             used   (Mb) gc trigger   (Mb)   max used   (Mb)
Ncells     222881   12.0     467875   25.0     350000   18.7
Vcells 1000115046 7630.3 1000475743 7633.1 1000115558 7630.3
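(The command that produced this output isn't shown; as an illustration only, Vcells usage of this order would result from allocating a single vector of 10^9 doubles on a 64-bit machine with enough RAM:

```r
x <- numeric(1e9)  # one vector of 1e9 doubles = ~8e9 bytes (~7.6 GB)
gc()               # report current and maximum memory use
```

Whatever the actual object was, the point stands: a 64-bit R session can hold well over 7 GB.)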


         -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
Section 8 of the Installation and Administration guide says that on
64-bit architectures the size of a block of memory allocated is
limited to 2^34-1 bytes (16 GB).

The wording 'a block of memory' here is important, because this sets a
limit on a single allocation rather than on the memory consumed by an
R session as a whole. The original poster's allocation was something
like 300,000 SNPs x 1000 individuals x 8 bytes (depending on
representation, I guess) = about 2.3 GB, so there is still some room
for even larger data.
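(That arithmetic can be checked directly in R; a sketch, where the 8
bytes assumes the genotypes are stored as doubles:

```r
snps <- 300000
individuals <- 1000
bytes <- snps * individuals * 8  # doubles take 8 bytes each
bytes / 2^30                     # about 2.2 GiB -- "about 2.3 GB"
```

Comfortably under the single-allocation limit quoted above.)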

Obviously it's important to think carefully about how the statistical
analysis of such a large volume of data will proceed, and be
interpreted.

Martin

Thomas Lumley <tlumley at u.washington.edu> writes: