Given this general example:

library(randomForest)
set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)
#varImpPlot(iris.rf)
#varUsed(iris.rf)
MDSplot(iris.rf, iris$Species)

I've been reading the documentation about random forest (to the best of my - poor - knowledge), but I'm having trouble with the correct interpretation of the MDS plot, and I hope someone can give me some clues.

What is meant by "the scaling coordinates of the proximity matrix"? I think I understand that the objective here is to present the distance among species in a parsimonious and visual way (of lower dimensionality).

Is this therefore a parallel to what the principal components are in a classical PCA?

Are the scaling coordinates DIM1 and DIM2 the eigenvectors of the proximity matrix? If that is correct, how would you find the eigenvalues for those eigenvectors? And what do the eigenvalues represent?

What do these two dimensions in the plot say about the different iris species? Their relative distance in terms of proximity within the space of DIM1 and DIM2?

How should the k parameter (number of dimensions for the scaling coordinates) be chosen?

And finally, how would you explain the plot in simple terms?

Thank you for any feedback.
Best regards
interpretation of MDS plot in random forest
5 messages · Massimo Bressan, Liaw, Andy
Yes, that's part of the intention anyway. One can also use them to do clustering.

Best,
Andy

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Massimo Bressan
Sent: Monday, December 02, 2013 6:34 AM
To: r-help at r-project.org
Subject: [R] interpretation of MDS plot in random forest

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Notice: This e-mail message, together with any attachments, contains information of Merck & Co., Inc.
(One Merck Drive, Whitehouse Station, New Jersey, USA 08889), and/or its affiliates Direct contact information for affiliates is available at http://www.merck.com/contact/contacts.html) that may be confidential, proprietary copyrighted and/or legally privileged. It is intended solely for the use of the individual or entity named on this message. If you are not the intended recipient, and have received this message in error, please notify us immediately by reply e-mail and then delete it from your system.
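For readers following the thread, here is a minimal sketch of what the plot is built from. It assumes, based on the MDSplot source, that the coordinates come from classical MDS (stats::cmdscale) applied to 1 - proximity; the object names d, mds and cl are only illustrative. In classical MDS the coordinates are eigenvectors scaled by the square roots of their eigenvalues, taken from the doubly centred matrix of squared dissimilarities rather than from the proximity matrix itself, which is where the parallel with PCA comes from.

```r
library(randomForest)

set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity = TRUE)

# Turn proximities (1 = always in the same terminal node) into dissimilarities.
d <- 1 - iris.rf$proximity

# Classical (metric) MDS; eig = TRUE also returns the eigenvalues.
mds <- cmdscale(d, eig = TRUE, k = 2)

head(mds$points)  # scaling coordinates (Dim 1, Dim 2), one row per flower
head(mds$eig)     # leading eigenvalues of the decomposition

# The same kind of picture as MDSplot(), drawn by hand.
plot(mds$points, col = as.integer(iris$Species), pch = 20,
     xlab = "Dim 1", ylab = "Dim 2")

# As Andy notes, the coordinates can also feed a clustering method
# (k-means with 3 centres is just one possible choice).
cl <- kmeans(mds$points, centers = 3, nstart = 20)
table(cluster = cl$cluster, species = iris$Species)
```

Points that often land in the same terminal nodes have high proximity, so they end up close together in the plot; well separated groups of points suggest classes the forest distinguishes easily.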
Thanks Andy, it's a real honour for me to get a reply from you; I'm still a bit far from a proper grasp of the purpose of the plot...

May I ask you a more technical (trivial) question? Is it possible to add a legend to the MDS plot? My problem is linking the colours of the points in the chart to the factor that was used as the response to train the rf; how can I do that?

Best,
Max
sorry, in fact it was a trivial question!
By just peeking into the function I've worked out this simple solution:
library(RColorBrewer)  # provides brewer.pal(), called directly below
MDSplot(iris.rf, iris$Species)
legend("topleft", legend=levels(iris$Species), fill=brewer.pal(3, "Set1"))
thank you
Here is an amended (more general) version:
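library(randomForest)
library(RColorBrewer)  # provides brewer.pal(), used for the legend colours

set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity=TRUE, keep.forest=TRUE)

# keep the object that MDSplot() returns invisibly
x <- MDSplot(iris.rf, iris$Species)

# add legend, one "Set1" colour per level of the response
legend("topleft", legend=levels(iris.rf$predicted),
       fill=brewer.pal(length(levels(iris.rf$predicted)), "Set1"))

#str(x)
# need to identify points? label each one with its row name
text(x$points, labels=rownames(x$points), cex=0.5)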
bye
m
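On the earlier question about choosing k, one possible way to look at it is sketched below. It assumes that MDSplot() returns the cmdscale() result invisibly (which is what str(x) above would show) and uses the eigenvalues and the GOF component of that result; the names x3, ev and share are hypothetical, and the "share of positive eigenvalues" is only a rough guide, not a rule from the package.

```r
library(randomForest)

set.seed(1)
data(iris)
iris.rf <- randomForest(Species ~ ., iris, proximity = TRUE)

# With k > 2, MDSplot() draws a pairs plot of the scaling coordinates.
x3 <- MDSplot(iris.rf, iris$Species, k = 3)

# Share of the positive eigenvalues carried by each dimension.
ev <- x3$eig[x3$eig > 0]
share <- ev / sum(ev)
round(cumsum(share)[1:5], 3)

# Goodness-of-fit figures reported by cmdscale() for the chosen k.
x3$GOF
```

If the cumulative share levels off after the first two or three dimensions, adding more dimensions to the plot is unlikely to show much extra structure.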
On 03/12/2013 12:15, mbressan at arpa.veneto.it wrote:
sorry, in fact it was a trivial question!
by just peeping into the function I've worked out this simple solution:
MDSplot(iris.rf, iris$Species)
legend("topleft", legend=levels(iris$Species), fill=brewer.pal(3, "Set1"))
thank you