
tree model large dataset -error

2 messages · srinivasa raghavan, Brian Ripley

#
Hi R-users,

     I was trying to fit a CART-style classification tree
using the tree package on a dataset with 6 groups,
1000 predictor variables and 5000 observations. The
following error message is generated:

Error:cannot allocate vector of size 3910 kb
In addition: warning message:
Reached total allocation of 125Mb:

I am using R 1.3 on Windows 98 on a Pentium II with
128 MB RAM.

    If RAM should be increased, what is the ideal
system configuration for processing large datasets
(say, more than 500 MB) for multivariate and
exploratory data analysis? (I may not be able to
switch to mainframes or supercomputers.)

Any suggestions will be highly appreciated.

srinivas

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Mon, 17 Sep 2001, srinivasa raghavan wrote:

            
Please, CART is a trademark (so don't misuse it), and what tree does is not
CART anyway. I suggest you use rpart (a closer approximation to CART).

I don't think this is sensible statistically.  With 1000 predictors
to choose from and only 5000 observations, you will just suffer from
data-dredging.  You must know something about the 1000 predictors,
so use structured subgroups.

However, let's do some calculations.  R stores data in memory.  You have 5
million data items (5000 x 1000), and they will be stored as doubles
(8 bytes each), so your dataset is ca. 40 MB.  You will need at least a
couple more copies while fitting.  So you need more than 128 MB.
(There's a lot between Windows 98 and mainframes or supercomputers.
You can let R use virtual memory: see the rw-FAQ, but Windows 98
is not good at this.  Linux runs R well on machines with 1 GB RAM;
Windows 2000 also runs R well but may use memory less efficiently.)
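The back-of-envelope arithmetic in the reply can be sketched in a few lines of Python (used here only for illustration; the thread itself is about R). As in the reply, it assumes every value is an 8-byte double; the three-copies-in-total factor and the function names are illustrative assumptions, and the estimate ignores per-object overhead and the memory already used by the OS and by R itself.

```python
# Rough memory estimate for a fully in-memory numeric dataset.
# Assumption: every value is stored as an 8-byte double, as R does for numeric data.

BYTES_PER_DOUBLE = 8


def dataset_bytes(n_obs: int, n_vars: int, bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed for one n_obs x n_vars table of values (no object overhead)."""
    return n_obs * n_vars * bytes_per_value


def peak_bytes(n_obs: int, n_vars: int, copies: int = 3) -> int:
    """Rough peak usage: the data plus (copies - 1) working copies.

    copies=3 ("at least a couple more copies") is an illustrative assumption.
    """
    return copies * dataset_bytes(n_obs, n_vars)


if __name__ == "__main__":
    n_obs, n_vars = 5000, 1000  # figures from the original question
    one_copy = dataset_bytes(n_obs, n_vars)
    peak = peak_bytes(n_obs, n_vars)
    print(f"one copy: {one_copy / 1e6:.0f} MB")      # one copy: 40 MB
    print(f"three copies: {peak / 1e6:.0f} MB")      # three copies: 120 MB
```

A peak of roughly 120 MB, before counting Windows 98 and R's own footprint, leaves no headroom on a 128 MB machine, which is consistent with the allocation failure reported in the question.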