Hi all,

I am attempting to do Regularized Discriminant Analysis (RDA) on a large dataset, and I want to extract the RDA discriminant score matrix. But the predict function in the "klaR" package, unlike the predict function for LDA in the "MASS" package, doesn't seem to give me an option to extract the scores. Any suggestions?

I have already tried (and failed; ran out of 16 GB of memory) to do this with the "rda" package: I don't know why, but the klaR package seems to be much more efficient with memory. I have included an example below:

library(klaR)
library(MASS)
data(iris)

x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
rda1 <- predict(x, iris[, 1:4])
str(rda1)
# This gets you an object with posterior probabilities and classes, but no discriminant scores!

# if you run lda
y <- lda(Species ~ ., data = iris)
lda1 <- predict(y, iris[, 1:4])
str(lda1)
head(lda1$x)
# gets you the discriminant scores for the LDA. But how to do this for RDA?
# Curiously, the QDA function in MASS has this same problem, although you can get around it using the rrcov package.

Regards, and thank you very much for any help,
Matt
Regularized Discriminant Analysis scores, anyone?
4 messages · Matthew Fagan, Uwe Ligges
On 02.06.2013 05:01, Matthew Fagan wrote:
Hi all, I am attempting to do Regularized Discriminant Analysis (RDA) on a large dataset, and I want to extract the RDA discriminant score matrix. But the predict function in the "klaR" package, unlike the predict function for LDA in the "MASS" package, doesn't seem to give me an option to extract the scores. Any suggestions?
There are no such scores. As with QDA, you no longer follow Fisher's idea of linear discriminant components: your space is now partitioned by ellipsoid-like structures based on the estimated inner-class covariance matrices. rda as implemented in klaR (see the reference given on the help page) is a regularization that helps to overcome problems when estimating non-singular covariance matrices for the separate classes.
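To make the regularization idea concrete, here is a minimal base-R sketch. It assumes a Friedman-style scheme in which each class covariance is first blended with the pooled covariance (lambda) and then shrunk towards a scaled identity (gamma). The helper `regularize_cov` is hypothetical and is not part of klaR; the exact weighting klaR uses may differ, so check the reference on its help page before relying on this.

```r
X <- as.matrix(iris[, 1:4])
g <- iris$Species

# Hypothetical helper (not a klaR function): blend a class covariance with the
# pooled covariance (lambda), then shrink towards a scaled identity (gamma).
regularize_cov <- function(S_k, S_pooled, lambda, gamma) {
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled
  # mean(diag(S)) is trace(S) / p, the average eigenvalue of S
  (1 - gamma) * S_lambda + gamma * mean(diag(S_lambda)) * diag(nrow(S_lambda))
}

# One covariance matrix per class, in the order of levels(g)
class_covs <- lapply(split(as.data.frame(X), g), cov)
n_k <- as.numeric(table(g))

# Pooled within-class covariance (weighted by class degrees of freedom)
S_pooled <- Reduce(`+`, Map(function(S, n) (n - 1) * S, class_covs, n_k)) /
  (nrow(X) - nlevels(g))

# Same parameter values as in the example from the original post
reg_covs <- lapply(class_covs, regularize_cov,
                   S_pooled = S_pooled, lambda = 0.2, gamma = 0.05)

# Limiting cases under this sketch:
#   lambda = 0, gamma = 0 keeps the separate class covariances (QDA-like)
#   lambda = 1, gamma = 0 gives every class the pooled covariance (LDA-like)
```

Each class still ends up with its own covariance matrix, which is why the decision regions stay ellipsoid-like rather than collapsing onto a shared set of linear discriminant directions.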
I have already tried (and failed; ran out of 16 GB of memory) to do this with the "rda" package: I don't know why, but the klaR package seems to be much more efficient with memory. I have included an example below:
The rda package provides a completely different regularization technique; see the reference given on its help page.

Best,
Uwe Ligges
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thank you Dr. Ligges, I very much appreciate the quick reply. I wondered if that was the case, based on the math as I (poorly) understood it. However, I remain confused. Page 107 of the "rrcov" package PDF makes me think I can derive LDA-style discriminant scores for a QDA:

library(rrcov)
data(iris)
qda1 <- QdaClassic(x = iris[, 1:4], grouping = iris[, 5])
pred_qda <- predict(qda1, iris[, 1:4])
head(pred_qda@x)
plotdat <- pred_qda@x
plot(plotdat[, 1], plotdat[, 2])
plot(plotdat[, 2], plotdat[, 3])

pred_qda@x looks like QDA discriminant scores. No doubt you are right, but if you have a moment, I'd love to know what these scores are and what they summarize. In addition, I have run into this nice set of lengthy R code to manually calculate discriminant scores for a QDA:

https://cs.uwaterloo.ca/~a2curtis/courses/2005/ML-classification.pdf

None of this means I can calculate discriminant scores for an RDA, of course, but QDA is my back-up choice. Bottom line: am I completely misinterpreting what I am seeing here, mathematically? Or is this just the result of different ways of implementing QDA in R?

Regards, and thanks again,
Matt
Matthew Fagan Columbia University Department of Ecology, Evolution, and Environmental Biology 512-569-1417 (cell/home) (212) 854-9987 (office) (212) 854-8188 (fax)
1 day later
On 02.06.2013 17:57, Matthew Fagan wrote:
Thank you Dr. Ligges, I very much appreciate the quick reply. I wondered if that was the case, based on the math as I (poorly) understood it. However, I remain confused. Page 107 of the "rrcov" package PDF makes me think I can derive LDA-style discriminant scores for a QDA:

library(rrcov)
data(iris)
qda1 <- QdaClassic(x = iris[, 1:4], grouping = iris[, 5])
pred_qda <- predict(qda1, iris[, 1:4])
head(pred_qda@x)
plotdat <- pred_qda@x
plot(plotdat[, 1], plotdat[, 2])
plot(plotdat[, 2], plotdat[, 3])

pred_qda@x looks like QDA discriminant scores. No doubt you are right, but if you have a moment, I'd love to know what these scores are and what they summarize. In addition, I have run into this nice set of lengthy R code to manually calculate discriminant scores for a QDA:

https://cs.uwaterloo.ca/~a2curtis/courses/2005/ML-classification.pdf

None of this means I can calculate discriminant scores for an RDA, of course, but QDA is my back-up choice. Bottom line: am I completely misinterpreting what I am seeing here, mathematically? Or is this just the result of different ways of implementing QDA in R?
What you see in your code above is the result of the formula on page 2 of the cited paper. You need one vector for each class, choosing the max value for deciding on the classification. This corresponds to the posterior probabilities.

You originally asked for the coefficients of the discriminant components (i.e. the directions in the space that separate the classes in the best way according to Fisher's criterion) given in the output of lda() (and here you will have min(dimension, number of classes - 1) of them). These are very different from the scores you are talking about now and do not exist for either QDA or RDA. Please carefully re-read about Fisher LDA and its discriminant components.

Best,
Uwe Ligges
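The distinction between the two kinds of "scores" can be illustrated with a short sketch using only base R and MASS. It assumes the standard Gaussian quadratic discriminant function (log prior, minus half the log determinant, minus half the squared Mahalanobis distance); the helper `qda_score` and all variable names are illustrative, and this is not claimed to reproduce what rrcov stores in its `x` slot.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
g <- iris$Species
classes <- levels(g)

# Fisher LDA: scores are projections onto a few shared directions.
lda_fit <- lda(X, grouping = g)
lda_scores <- predict(lda_fit, X)$x
n_components <- ncol(lda_scores)  # for iris this is min(4, 3 - 1)

# The same scores by hand: centre on the prior-weighted mean, then project.
centre <- colSums(lda_fit$prior * lda_fit$means)
lda_manual <- scale(X, center = centre, scale = FALSE) %*% lda_fit$scaling

# QDA: one quadratic discriminant function per class, no shared projection.
# Illustrative helper, assuming Gaussian classes with their own covariance.
qda_score <- function(X, mu, Sigma, prior) {
  log_det <- as.numeric(determinant(Sigma, logarithm = TRUE)$modulus)
  log(prior) - 0.5 * log_det - 0.5 * mahalanobis(X, center = mu, cov = Sigma)
}

priors <- as.numeric(table(g)) / length(g)
qda_scores <- sapply(seq_along(classes), function(k) {
  Xk <- X[g == classes[k], , drop = FALSE]
  qda_score(X, colMeans(Xk), cov(Xk), priors[k])
})
colnames(qda_scores) <- classes

# Classify by the largest score; normalising exp(score) gives posteriors.
pred_class <- classes[max.col(qda_scores, ties.method = "first")]
post_manual <- exp(qda_scores - apply(qda_scores, 1, max))
post_manual <- post_manual / rowSums(post_manual)

# Compare against the posteriors reported by MASS::qda.
qda_pred <- predict(qda(X, grouping = g), X)
max_diff <- max(abs(post_manual - qda_pred$posterior))
```

The LDA matrix has one column per discriminant direction, whereas the QDA matrix has one column per class. Swapping `cov(Xk)` for a regularized covariance would give the analogous per-class scores for an RDA-style classifier, but there is still no set of shared directions to project onto.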