Large file size while persisting rpart model to disk
Dear Prof. Ripley,

Thanks for the quick reply. I do notice an <environment...> in the print output. I assume it is used to keep copies of the initial data used for the model.

- Is it safe to assume that it would not affect any other functionality, apart from the usage of those particular functions?
- Is there a better/recommended way of reducing the size?

Thanks,
Tan
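The first question can be probed empirically by comparing predictions from the full object and a stripped copy. The sketch below does this for the kyphosis example only, so it is a spot check and not a guarantee for every dataset or rpart method. `slim`, `p_full` and `p_slim` are just illustrative names.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Copy the fit and drop the stored helper functions (and, with them,
# the environments they carry)
slim <- fit
slim$functions <- NULL

# Compare class-probability predictions from the two objects
p_full <- predict(fit, newdata = kyphosis, type = "prob")
p_slim <- predict(slim, newdata = kyphosis, type = "prob")
identical(p_full, p_slim)

# Serialized sizes in bytes, before and after stripping
c(full = length(serialize(fit, NULL)), slim = length(serialize(slim, NULL)))
```

Methods that may use the stored helpers, such as print(), summary() or text() on the tree, could behave differently on the stripped copy. It is worth testing whichever of those you rely on.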
On Feb 3, 4:56 pm, Prof Brian Ripley <rip... at stats.ox.ac.uk> wrote:
On Tue, 3 Feb 2009, tan wrote:
I am using rpart to build a model for later predictions. To keep the model across restarts and share it across nodes, I have been using "save" to persist the result of rpart to a file and "load" to read it back later. But the saved file was becoming unusually large (even in binary, compressed mode), and its size grew in proportion to the amount of data used to create the model.
After tinkering a bit, I figured out that most of the size came from the $functions component of the rpart object. If I set it to NULL, the size drops dramatically. The following lines of R code show the difference. It is small for this example but much more pronounced with large datasets.
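library(rpart)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)
save(fit, file = "fit1.sav")                # full model object
fit$functions <- NULL                       # drop the stored helper functions
save(fit, file = "fit2.sav")                # stripped model object
file.info(c("fit1.sav", "fit2.sav"))$size   # compare the two file sizes in bytes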
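To see which part of the fit dominates without writing files, one can measure the serialized size of each top-level component. serialize() is the mechanism that save() builds on. The sketch below counts only list components (attributes such as xlevels are not included), and `sizes` is an illustrative name.

```r
library(rpart)

fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Serialized size in bytes of each top-level list component of the fit
sizes <- vapply(unclass(fit),
                function(part) length(serialize(part, NULL)),
                numeric(1))
sort(sizes, decreasing = TRUE)

# Size of the whole object, for comparison
length(serialize(fit, NULL))
```

Rerunning this on a larger training set should show whether the $functions entry grows with the data, as described above.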
What is the reason behind this? The functions themselves seem small, so where is all the bulk coming from?
Their environments.

-- 
Brian D. Ripley,                  rip... at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
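The answer "their environments" can be illustrated without rpart. In R a closure keeps a reference to the environment it was created in. serialize(), and therefore save(), writes that environment out along with the function unless it is the global environment or a package/namespace environment. The sketch below uses a hypothetical factory, make_printer(), whose frame holds a large vector standing in for the training data. It assumes the rpart helper functions capture their data in a similar way, which is consistent with the file size growing with the dataset but is not verified here.

```r
# Hypothetical factory: returns a tiny closure, but the closure's
# enclosing environment (the factory's frame) also holds `big`.
make_printer <- function(n) {
  big <- rnorm(n)                    # stands in for the training data
  function(x) cat("value:", x, "\n")
}

f_small <- make_printer(10)
f_big   <- make_printer(1e6)

# serialize() writes each closure's environment too, so two copies of
# the same function body cost very different amounts of space.
size_small <- length(serialize(f_small, NULL))
size_big   <- length(serialize(f_big, NULL))

# The enclosing environment really does contain the data
ls(environment(f_big))               # lists "big" and "n"

# Re-pointing a copy at the global environment drops that payload.
# Only safe if the body has no free variables from the original frame.
f_slim <- f_big
environment(f_slim) <- globalenv()
size_slim <- length(serialize(f_slim, NULL))

c(small = size_small, big = size_big, slim = size_slim)
```

Re-pointing is only safe when the function body does not rely on variables from its original frame. For the rpart helpers, that would need checking before trying it.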
______________________________________________
R-h... at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.