
data mining & R

3 messages · Mohd Zamri Murah, Ross Ihaka, Brian Ripley

#
I am new to R, and currently reading a few interesting articles about data
mining. Data mining, if I understand it correctly, is a method for analyzing
large data sets.
   R (currently) uses a _static_ memory model.  This means that when it
   starts up, it asks the operating system to reserve a fixed amount of
   memory for it.  The size of this chunk cannot be changed subsequently.
   Hence, it can happen that not enough memory was allocated, e.g., when
   trying to read large data sets into R.
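As an aside (my own illustration, not part of the original message): under
that static model the size of the reserved chunk was fixed by command-line
flags when R started, and gc() reports how much of it is in use.  The flag
names and values below are as I recall them for R 1.x; check R --help for
your version.

    ## start R with a larger fixed allocation (R 1.x command line)
    ##   R --vsize=30M --nsize=1000000
    gc()   # from inside R: how many cons cells and how much vector heap are in use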
   

Out of curiosity, what is the upper limit on the size of data that R can
process, in terms of number of rows/columns or in MBytes? And if such a limit
exists, is it hardware related? (e.g. a computer with 256MB of RAM can process
more data than one with 64MB)
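A rough back-of-the-envelope sketch (my own illustration, not from the
thread): numeric data are stored as 8-byte doubles, so the footprint in
MBytes follows directly from the number of rows and columns, and
object.size() reports it for an object that already exists.

    n <- 1e6; p <- 10                      # a million rows, ten columns
    8 * n * p / 2^20                       # ~76 MB of doubles, before any copies
    x <- matrix(rnorm(1e6), ncol = 10)     # about 7.6 MB worth of data
    as.numeric(object.size(x)) / 2^20      # actual size of x in MB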

zamri
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Fri, Sep 08, 2000 at 10:23:34AM +0800, Mohd Zamri Murah wrote:
This is about to change in 1.2.  Luke Tierney has rewritten the memory
management in R so that this restriction no longer applies.  On the other
hand, the computational model used within R is really only suitable for
data sets consisting of at most a few 10s of megabytes.  The problem is
that data sets are memory resident and some computations will copy the
entire data set.
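To make the copying point concrete, here is a small sketch of my own (it uses
tracemem(), which postdates this thread and needs an R built with memory
profiling, but the copy-on-modify behaviour it reports is the same):

    x <- matrix(rnorm(1e6), ncol = 10)   # roughly 8 MB of doubles
    y <- x                               # no copy yet: both names share one block
    tracemem(x)                          # ask R to report when x is duplicated
    x[1, 1] <- 0                         # the whole 8 MB matrix is copied before the write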

	Ross
#
On Fri, 8 Sep 2000, Ross Ihaka wrote:

Not really.  It's about searching for structure in largish datasets,
usually with many observations per subject.  Think of it as really
multi-multivariate analysis.
10Mb datasets are certainly challenging enough for the current state of
data mining methodology. We routinely use R 1.1.1 (with a few judicious
operations coded in C called from R) to analyse fMRI experiments in the
10-50Mb region, on machines with 128-512Mb of RAM.
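As an illustration of that "coded in C, called from R" pattern (the routine,
file names and arguments below are hypothetical, not the code referred to
above): the C source is compiled with R CMD SHLIB and then loaded and called
through the .C() interface.

    dyn.load("colsums.so")                 # shared library built by: R CMD SHLIB colsums.c
    x <- matrix(rnorm(1e6), ncol = 100)
    res <- .C("colsums_c",                 # hypothetical C routine doing the heavy loop
              as.double(x),
              as.integer(nrow(x)),
              as.integer(ncol(x)),
              out = double(ncol(x)))$out   # results come back in the "out" argument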

There is another part of data mining about managing the databases, and
sometimes that's needed to extract a suitable <10Mb subset from a data
warehouse.
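A sketch of that extraction step (the data source name, table and columns are
invented; this assumes the contributed RODBC package and an ODBC connection to
the warehouse):

    library(RODBC)
    ch  <- odbcConnect("warehouse")           # hypothetical ODBC data source name
    sub <- sqlQuery(ch, "SELECT subject, dose, response
                           FROM trials
                          WHERE year = 2000")  # pull only the rows and columns needed
    odbcClose(ch)
    dim(sub)                                   # a small data frame, ready for analysis in R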

Finally, `data mining' is a buzz phrase and not well-defined: the above
reflects what people who talk to me (e.g. as a consultant) mean by it!