Regularized Discriminant Analysis scores, anyone?

4 messages · Matthew Fagan, Uwe Ligges

#
Hi all,

I am attempting to do Regularized Discriminant Analysis (RDA) on a large 
dataset, and I want to extract the RDA  discriminant score matrix.  But 
the predict function in the "klaR" package, unlike the predict function 
for LDA in the "MASS" package, doesn't seem to give me an option to 
extract the scores.  Any suggestions?

I have already tried (and failed; ran out of 16 GB of memory) to do this 
with the "rda" package: I don't know why, but the klaR package seems to be 
much more memory-efficient.  I have included an example below:

library(klaR)
library(MASS)

data(iris)

x <- rda(Species ~ ., data = iris, gamma = 0.05, lambda = 0.2)
rda1<-predict(x, iris[, 1:4])
str(rda1)

#  This gets you an object with posterior probabilities and classes, but
#  no discriminant scores!

#  if you run lda

y <- lda(Species ~ ., data = iris)
lda1<-predict(y, iris[, 1:4])
str(lda1)

head(lda1$x)  #  gets you the discriminant scores for the LDA.
#  But how to do this for RDA?

#  curiously, the QDA function in MASS has this same problem, although
#  you can get around it using the rrcov package.

Regards, and thanks very much for any help,
Matt
#
On 02.06.2013 05:01, Matthew Fagan wrote:
There are no such scores:

As with qda, you no longer follow the Fisher idea of linear 
discriminant components: your space is now partitioned by 
ellipsoid-like structures based on the estimates of the within-class 
covariance matrices.

rda as implemented in klaR (see the reference given on the help page) is 
a regularization that helps to overcome problems in obtaining 
non-singular estimates of the covariance matrices for the separate classes.
The rda package provides a completely different regularization 
technique; see the reference given on its help page.
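
To make the regularization idea concrete, here is a minimal sketch in base R. It uses a simplified two-step shrinkage in the spirit of Friedman's RDA: first towards the pooled covariance (lambda), then towards a multiple of the identity (gamma). The helper regularize_cov() is hypothetical and not part of klaR, and klaR's exact weighting of the class and pooled estimates may differ from this simplification.

```r
# Simplified sketch of RDA-style covariance shrinkage (base R only).
# regularize_cov() is a hypothetical helper, not a klaR function.
regularize_cov <- function(S_k, S_pooled, gamma, lambda) {
  p <- ncol(S_k)
  # Step 1: shrink the class covariance towards the pooled covariance
  S_lambda <- (1 - lambda) * S_k + lambda * S_pooled
  # Step 2: shrink the result towards a multiple of the identity matrix
  (1 - gamma) * S_lambda + gamma * (sum(diag(S_lambda)) / p) * diag(p)
}

# Pooled within-class covariance estimated from the full iris data
classes <- split(iris[, 1:4], iris$Species)
S_pooled <- Reduce(`+`, lapply(classes, function(d) (nrow(d) - 1) * cov(d))) /
  (nrow(iris) - length(classes))

# Only 3 setosa observations for 4 variables: this estimate is singular
S_small <- cov(classes$setosa[1:3, ])
S_reg <- regularize_cov(S_small, S_pooled, gamma = 0.05, lambda = 0.2)

qr(S_small)$rank  # rank deficient (less than 4)
qr(S_reg)$rank    # full rank after shrinkage
```

The shrunken matrix can be inverted even though the raw class estimate cannot, which is the practical problem the regularization addresses.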

Best,
Uwe Ligges
#
Thank you Dr. Ligges, I very much appreciate the quick reply.  I 
wondered if that was the case, based on the math as I (poorly) 
understood it.  However, I remain confused.  Page 107 of the "rrcov" 
package PDF makes me think I can derive LDA-style discriminant scores 
for a QDA:

library(rrcov)
data(iris)
qda1<-QdaClassic(x=iris[,1:4], grouping=iris[,5])
pred_qda<-predict(qda1, iris[,1:4])
head(pred_qda@x)
plotdat<-pred_qda@x
plot(plotdat[,1], plotdat[,2])
plot(plotdat[,2], plotdat[,3])

pred_qda@x looks like QDA discriminant scores.  No doubt you are right, 
but if you have a moment, I'd love to know what these scores are and 
what they summarize.

In addition, I have run into this nice set of lengthy R code to manually 
calculate discriminant scores for a QDA:
https://cs.uwaterloo.ca/~a2curtis/courses/2005/ML-classification.pdf

None of this means I can calculate discriminant scores for an RDA, of 
course, but QDA is my back-up choice.

Bottom line: am I completely misinterpreting what I am seeing here, 
mathematically?  Or is this just the result of different ways of 
implementing QDA in R?

Regards, and thanks again,
Matt
#
On 02.06.2013 17:57, Matthew Fagan wrote:
What you see in your code above is the result of the formula on page 2 
of the cited paper. You need one vector for each class, and the 
maximum value decides the classification.
These values correspond to the posterior probabilities.
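
As an illustration of "one vector for each class", the following sketch computes the usual quadratic discriminant function by hand on iris, assuming class proportions as priors and the plug-in sample covariance of each class. The names delta, post and pred_class are only illustrative. It then checks the result against predict() for MASS::qda, which is expected to agree under these assumptions.

```r
library(MASS)

X <- as.matrix(iris[, 1:4])
g <- iris$Species
prior <- setNames(as.vector(table(g)) / length(g), levels(g))

# One quadratic score per observation and class:
# delta_k(x) = log(prior_k) - 0.5 * log|S_k| - 0.5 * (x - m_k)' S_k^{-1} (x - m_k)
delta <- sapply(levels(g), function(k) {
  Xk <- X[g == k, , drop = FALSE]
  S_k <- cov(Xk)
  log(prior[[k]]) -
    0.5 * as.numeric(determinant(S_k, logarithm = TRUE)$modulus) -
    0.5 * mahalanobis(X, colMeans(Xk), S_k)
})

# Classification picks the class with the largest score
pred_class <- factor(levels(g)[apply(delta, 1, which.max)], levels = levels(g))

# Normalising exp(delta) across classes gives the posterior probabilities
post <- exp(delta - apply(delta, 1, max))
post <- post / rowSums(post)

# Compare with the plug-in posteriors from MASS::qda
fit <- qda(Species ~ ., data = iris)
all.equal(post, predict(fit)$posterior, check.attributes = FALSE)
```

The matrix delta has one column per class (three for iris). These are per-class scores, not projections onto a shared set of discriminant directions.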

You originally asked for the coefficients of the discriminant components 
(i.e. the directions in the space that separate the classes in the best 
way according to Fisher's criterion) given in the output of lda() (and 
here you will have min(dimension, number of classes - 1) of them). These 
are very different from the scores you are talking about now and do not 
exist for either QDA or RDA.
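
For comparison, a short sketch of where the LDA scores come from, using only MASS::lda on iris. The discriminant components are the columns of fit$scaling, and the scores are the centred data projected onto them. The variable names centre and scores are only illustrative.

```r
library(MASS)

fit <- lda(Species ~ ., data = iris)

# 4 variables and 3 classes give min(4, 3 - 1) = 2 discriminant components
dim(fit$scaling)

# Scores: centre the data at the prior-weighted mean of the class means,
# then project onto the discriminant components
X <- as.matrix(iris[, 1:4])
centre <- colSums(fit$prior * fit$means)
scores <- scale(X, center = centre, scale = FALSE) %*% fit$scaling

# These match the x component returned by predict()
all.equal(scores, predict(fit)$x, check.attributes = FALSE)
```

The projection relies on a single set of directions shared by all classes, which rests on the common covariance matrix assumed by LDA. QDA and RDA use class-specific covariance matrices, so there is no analogous scaling matrix.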

Please carefully re-read about Fisher LDA and its discriminant components.

Best,
Uwe Ligges