
data mining & R

3 messages · Mohd Zamri Murah, Ross Ihaka, Brian Ripley

#
I am new to R, and currently reading a few interesting articles about data
mining. Data mining, if I understand it correctly, is a method for analyzing
large data sets.
   R (currently) uses a _static_ memory model.  This means that when it
   starts up, it asks the operating system to reserve a fixed amount of
   memory for it.  The size of this chunk cannot be changed subsequently.
   Hence, it can happen that not enough memory was allocated, e.g., when
   trying to read large data sets into R.
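As an aside (my own illustration, not part of the original message): under
that static model the size of the reserved chunk was fixed by command-line
flags when R started, and gc() reports how much of it is in use.  The flag
names and values below are as I recall them for R 1.x; check R --help for
your version.

    ## start R with a larger fixed allocation (R 1.x command line)
    ##   R --vsize=30M --nsize=1000000
    gc()   # from inside R: how many cons cells and how much vector heap are in use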
   

Out of curiosity, what is the upper limit on the size of data that R can
process, in terms of number of rows/columns or in MBytes? And if such a limit
exists, is it hardware related? (e.g. a computer with 256MB of RAM can process
more data than one with 64MB)
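A rough back-of-the-envelope sketch (my own illustration, not from the
thread): numeric data are stored as 8-byte doubles, so the footprint in
MBytes follows directly from the number of rows and columns, and
object.size() reports it for an object that already exists.

    n <- 1e6; p <- 10                      # a million rows, ten columns
    8 * n * p / 2^20                       # ~76 MB of doubles, before any copies
    x <- matrix(rnorm(1e6), ncol = 10)     # about 7.6 MB worth of data
    as.numeric(object.size(x)) / 2^20      # actual size of x in MB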

zamri
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Fri, Sep 08, 2000 at 10:23:34AM +0800, Mohd Zamri Murah wrote:
This is about to change in 1.2.  Luke Tierney has rewritten the memory
management in R so that this restriction no longer applies.  On the other
hand, the computational model used within R is really only suitable for
data sets consisting of at most a few 10s of megabytes.  The problem is
that data sets are memory resident and some computations will copy the
entire data set.
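To make the copying point concrete, here is a small sketch of my own (it uses
tracemem(), which postdates this thread and needs an R built with memory
profiling, but the copy-on-modify behaviour it reports is the same):

    x <- matrix(rnorm(1e6), ncol = 10)   # roughly 8 MB of doubles
    y <- x                               # no copy yet: both names share one block
    tracemem(x)                          # ask R to report when x is duplicated
    x[1, 1] <- 0                         # the whole 8 MB matrix is copied before the write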

	Ross
#
On Fri, 8 Sep 2000, Ross Ihaka wrote:

Not really.  It's about searching for structure in largish datasets,
usually with many observations per subject.  Think of it as really
multi-multivariate analysis.
10Mb datasets are certainly challenging enough for the current state of
data mining methodology. We routinely use R 1.1.1 (with a few judicious
operations coded in C called from R) to analyse fMRI experiments in the
10-50Mb region, on machines with 128-512Mb of RAM.
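As an illustration of that "coded in C, called from R" pattern (the routine,
file names and arguments below are hypothetical, not the code referred to
above): the C source is compiled with R CMD SHLIB and then loaded and called
through the .C() interface.

    dyn.load("colsums.so")                 # shared library built by: R CMD SHLIB colsums.c
    x <- matrix(rnorm(1e6), ncol = 100)
    res <- .C("colsums_c",                 # hypothetical C routine doing the heavy loop
              as.double(x),
              as.integer(nrow(x)),
              as.integer(ncol(x)),
              out = double(ncol(x)))$out   # results come back in the "out" argument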

There is another part of data mining about managing the databases, and
sometimes that's needed to extract a suitable <10Mb subset from a data
warehouse.
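A sketch of that extraction step (the data source name, table and columns are
invented; this assumes the contributed RODBC package and an ODBC connection to
the warehouse):

    library(RODBC)
    ch  <- odbcConnect("warehouse")           # hypothetical ODBC data source name
    sub <- sqlQuery(ch, "SELECT subject, dose, response
                           FROM trials
                          WHERE year = 2000")  # pull only the rows and columns needed
    odbcClose(ch)
    dim(sub)                                   # a small data frame, ready for analysis in R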

Finally, `data mining' is a buzz phrase and not well-defined: the above
reflects what people who talk to me (e.g. as a consultant) mean by it!