Enormous Datasets

Thomas W Volscho <THOMAS.VOLSCHO at huskymail.uconn.edu> writes:
With a big machine... If that is numeric, non-integer data, you are
looking at something like
[1] 2.8e+09

i.e. roughly 3 GB of data for one copy of the data set. You easily
find yourself with multiple copies, so I suppose a machine with 16 GB
of RAM would cut it. These days that basically suggests the x86_64
architecture running Linux (e.g. multiprocessor Opterons), but there
are also 64-bit Unix "big iron" solutions (Sun, IBM, HP, ...).
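The 2.8e+09 figure is just the usual bytes-per-value arithmetic. As a sketch (in Python rather than R, for illustration): the row and column counts below are assumptions back-derived from the numbers in this post, since 56 MB per column at 8 bytes per double implies about 7 million rows, and 2.8e9 bytes total then implies about 50 such columns.

```python
# Back-of-envelope memory estimate for a numeric (double) data set.
# rows and cols are ASSUMED values, reconstructed from the figures in
# the post; only bytes_per_double is a hard fact (IEEE 754 doubles).
rows = 7_000_000        # assumed number of observations
cols = 50               # assumed number of numeric columns
bytes_per_double = 8    # storage for one double-precision value

total_bytes = rows * cols * bytes_per_double
print(total_bytes)                   # 2800000000, i.e. 2.8e+09
print(rows * bytes_per_double)       # one column: 56000000 bytes
```

Multiply by however many working copies your analysis makes to get the real RAM requirement.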

If you can avoid dealing with the whole dataset at once, smaller
machines might get you there. Notice that a single column is "only"
56 MB, and you may be able to work with aggregated data from some
step onwards.
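The avoid-loading-everything idea can be sketched as a single streaming pass over a delimited file, accumulating an aggregate as rows go by so no more than one row is ever in memory. This is a Python illustration, not anything from the original thread; the file layout (numeric CSV with a header row) and the chosen aggregate (per-column sums) are assumptions.

```python
import csv

def column_sums(path):
    """Stream a numeric CSV and accumulate per-column sums,
    reading one row at a time instead of the whole file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)            # assumed header row
        sums = [0.0] * len(header)
        for row in reader:
            for i, value in enumerate(row):
                sums[i] += float(value)  # assumed all-numeric columns
    return header, sums
```

Once you have such aggregates (sums, counts, cross-tabulations), the downstream analysis often fits comfortably on a small machine.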