randomForest: proximity for new objects using an existing rf
2 messages · Kilian, Liaw, Andy
There's an alternative, but it may not be any more efficient in time or memory. You can run predict() on the training set once with nodes=TRUE. That gives you an n by ntree matrix recording which terminal node of each tree every data point falls into. For any new data, run predict() with nodes=TRUE as well, then compute the proximity "by hand" by counting how often a given pair (new object, training object) lands in the same terminal node across the trees.

Andy
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Kilian
Sent: Wednesday, February 01, 2012 5:39 AM
To: r-help at r-project.org
Subject: [R] randomForest: proximity for new objects using an existing rf

Dear all,

using an existing random forest, I would like to calculate the proximity for a new test object, i.e. the similarity between the new object and the old training objects which were used for building the random forest. I do not want to build a new random forest based on both old and new objects.

Currently, my workaround is to calculate the proximities of a combined data set consisting of training and new objects like this:

model <- randomForest(Xtrain, Ytrain)                 # build random forest
nnew <- nrow(Xnew)                                    # number of new objects
Xcombi <- rbind(Xnew, Xtrain)                         # combine new objects and training objects
predcombi <- predict(model, Xcombi, proximity=TRUE)   # calculate proximities
proxcombi <- predcombi$proximity                      # get proximities of combined dataset
proxnew <- proxcombi[(1:nnew),-(1:nnew)]              # get proximities of new objects only

But this approach wastes a lot of computation time, as I am not interested in the proximities among the training objects themselves but only in those between the training objects and the new objects. With 1000 training objects and 5 new objects, I have to calculate a 1005x1005 proximity matrix to get the essential 5x1000 matrix of the new objects only.

Am I doing something wrong? I read through the documentation but could not find another solution. Any advice would be highly appreciated.

Thanks in advance!
Kilian
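
Andy's suggestion at the top of the thread could be sketched roughly as follows. This is only a minimal illustration, not code from either poster: cross_proximity is a hypothetical helper name, and dividing by the number of trees is an assumption meant to mirror the normalisation of the proximity matrix in the workaround. The helper works on plain terminal-node matrices, so the small example uses made-up node ids and needs no fitted forest.

```r
# Proximity between new and training rows, computed from terminal-node ids.
# nodes_new:   n_new   x ntree matrix (one row per new object)
# nodes_train: n_train x ntree matrix (one row per training object)
# cross_proximity is a hypothetical name, not part of the randomForest package.
cross_proximity <- function(nodes_new, nodes_train) {
  stopifnot(ncol(nodes_new) == ncol(nodes_train))
  ntree <- ncol(nodes_train)
  prox <- matrix(0, nrow(nodes_new), nrow(nodes_train))
  for (t in seq_len(ntree)) {
    # outer() flags every (new, train) pair sharing a terminal node in tree t
    prox <- prox + outer(nodes_new[, t], nodes_train[, t], "==")
  }
  prox / ntree  # assumed normalisation: fraction of trees with a shared node
}

# Made-up node ids for 3 training objects and 1 new object over 2 trees
nodes_train <- matrix(c(1, 2, 3,
                        1, 2, 2), nrow = 3)
nodes_new <- matrix(c(2, 2), nrow = 1)
cross_proximity(nodes_new, nodes_train)  # 1 x 3 matrix: 0, 1, 0.5

# With the objects from the question (model, Xtrain, Xnew), the node
# matrices come from the "nodes" attribute of the predict() result:
# nodes_train <- attr(predict(model, Xtrain, nodes = TRUE), "nodes")
# nodes_new   <- attr(predict(model, Xnew,   nodes = TRUE), "nodes")
# proxnew     <- cross_proximity(nodes_new, nodes_train)  # n_new x n_train
```

The training node matrix only has to be computed once and can be reused for every batch of new objects, so each call builds just the n_new by n_train block instead of the full combined matrix. Whether this is actually faster than the combined predict() call depends on the sizes involved, as the reply cautions.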