Hi all,

I am using R 1.5.0 under Unix and have a couple of questions.

1. My program is running out of memory. I am writing a program to grow a list of trees using rpart() on a subset of a large dataset (5807 x 693), with a different response for every tree. I saw that after each tree was constructed, 116 MB of data was added to the Vcells. I have no idea what this data is. My dataset is 30 MB and each tree is 1.6 MB. Could someone tell me how to monitor what data is being stored in the Vcells?

2. This relates to the same program as above. When growing a tree I used the expression:

    fit <- rpart(formula = x[[34]] ~ ., data = x)

This does not give an error but does give an obviously wrong answer. But when I rearranged the data.frame x so that the response variable comes in the first column and all the other variables in the remaining columns, and tried

    fit <- rpart(x)

it worked perfectly, i.e. it gave the correct tree. Could someone tell me what to do if I want the 34th column of the data.frame to be the response variable but don't want to use the column names in the formula for growing the tree?

Thanks in advance.
-Saket.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
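On question 2, one likely explanation (not confirmed anywhere in this thread) is that `.` expands to every column of `data`. When the response is written as `x[[34]]`, it carries no column name that could be matched and dropped, so column 34 probably ends up among the predictors as well. A minimal sketch of building the formula from the column position instead follows. The data frame is a made-up stand-in for the poster's data, and `resp_col`, `f`, `bad_terms` and `good_terms` are illustrative names only.

```r
library(rpart)

# Hypothetical stand-in data; the poster's data frame is 5807 x 693.
set.seed(1)
x <- data.frame(a = rnorm(200), b = rnorm(200), c = rnorm(200))
x$y <- 2 * x$a + rnorm(200, sd = 0.1)

resp_col <- 4  # position of the response column (34 in the post)

# Build `y ~ .` from the column position, so the name is looked up
# rather than typed into the formula by hand.
f <- reformulate(".", response = names(x)[resp_col])

# With x[[resp_col]] on the left, "." still expands to all columns,
# so the response column shows up among the predictors.
bad_terms  <- attr(terms(x[[resp_col]] ~ ., data = x), "term.labels")

# With the response given by name, "." leaves that column out.
good_terms <- attr(terms(f, data = x), "term.labels")

fit <- rpart(f, data = x)
```

Comparing `bad_terms` and `good_terms` shows which columns each formula would treat as predictors before any tree is grown.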
2 questions
2 messages · Saket Joshi
1 day later
Hi all,

My sincere apologies to all those who could not understand my previous question and so could not answer it. I am not a statistician, nor have I worked with R for long, so please excuse my naive language. I hope I can explain my question better this time.

I have a data.frame named 'temp'. The following is the series of commands I ran after obtaining this data.frame:
x <- rpart(temp)
attributes(x)
$names
 [1] "frame"     "where"     "call"      "terms"     "cptable"   "splits"
 [7] "method"    "parms"     "control"   "functions" "y"         "ordered"

$class
[1] "rpart"
x$functions
$summary
function (yval, dev, wt, ylevel, digits)
{
    paste(" mean=", formatg(yval, digits), ", MSE=", formatg(dev/wt,
        digits), sep = "")
}
<environment: 4494214>

$text
function (yval, dev, wt, ylevel, digits, n, use.n)
{
    if (use.n) {
        paste(formatg(yval, digits), "\nn=", n, sep = "")
    }
    else {
        paste(formatg(yval, digits))
    }
}
<environment: 4494214>
gc()
           used  (Mb) gc trigger  (Mb)
Ncells   330122   8.9    1162530  31.1
Vcells 46072722 351.6   64233246 490.1
x$functions <- NULL
gc()
           used  (Mb) gc trigger  (Mb)
Ncells   326469   8.8    1162530  31.1
Vcells 34321042 261.9   64233246 490.1

When the "functions" component of x was set to NULL, the storage in the Vcells dropped from 351.6 Mb to 261.9 Mb, as the two gc() calls above show. I imagine that the rpart object 'x' stores a pointer named 'functions' to a large amount of data in the Vcells, and that this data was garbage collected when the pointer was set to NULL. However, I am not sure that I am right about this.

My question is: is there a way, through the options to rpart or otherwise, to never create the pointer 'functions' when fitting the rpart model in the first place, instead of having to delete it afterwards to save memory?

Thanks in advance,
Saket.
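The `<environment: 4494214>` lines in the printout suggest that both functions are closures carrying an environment other than the global one. One plausible reading of the gc() numbers, not confirmed in this thread, is that this environment holds large objects that stay reachable for as long as the functions do. A minimal sketch of that mechanism in plain R follows. Nothing here is taken from rpart itself, and `vcells_mb` and `make_closure` are hypothetical helper names.

```r
# Report the Mb currently used by Vcells, taken from the gc() summary matrix.
vcells_mb <- function() {
  unname(gc()["Vcells", 2])
}

# Return a tiny function whose enclosing environment holds a large vector.
make_closure <- function() {
  big <- numeric(5e6)   # about 38 Mb of doubles, local to this call
  function() "small"    # never uses `big`, but keeps its environment alive
}

obj <- list(functions = list(summary = make_closure()))

# List what the closure's environment is holding on to.
held <- ls(environment(obj$functions$summary))

with_closure <- vcells_mb()
obj$functions <- NULL   # drop the only reference to the closure
after <- vcells_mb()

freed <- with_closure - after   # Mb released once the closure is gone
```

On a fitted tree, the same `ls(environment(...))` call applied to `x$functions$summary` would be one way to see which objects that environment contains.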