linear discriminant analysis in MASS - R-help

Mon, Feb 20, 2006 3:21 PM

Hello R people

I now know how to run my discriminant analysis with the lda function in 
MASS:
lda.alain=lda(Groupes ~ Ht.D0 + Lc.Dc + Ram + IDF, gr, CV = FALSE)
and it works fine.

But I am missing a test and cannot find any help on how to get it, if it 
exists.

The "S" equivalent:
discrim(structure(.Data = Groupes ~ Ht.D0 + Lc.Dc + Ram + IDF, class = 
"formula"), data = gr, family = Canonical(cov.structure = 
"homoscedastic"), na.action = na.omit, prior = "proportional")
outputs a nice matrix of Mahalanobis distances between groups and even 
tests (Hotelling's T Squared) for significant distances.

Why don't I just take the "S" output, you say?  Because, like you, I'd 
rather put in my paper that I did it using R, of course!
Does anyone know of a way to get this test out of lda?  Or of another R 
package that does it?

Thanks
Alain
(You can of course reply to me in French as well!)

Alain Paquette
Laboratoire d'écologie végétale
Institut de recherche en biologie végétale
Université de Montréal
4101 rue Sherbrooke Est
Montréal (Québec) H1X 2B2
 
alain.paquette at umontreal.ca
labo (514) 872-8488
fax (514) 872-9406
http://www.irbv.umontreal.ca/francais/personnel/cogliastro-paquette.htm

Brian Ripley

Tue, Feb 21, 2006 12:01 AM

On Mon, 20 Feb 2006, Alain Paquette wrote:

CV=FALSE is the default and so not needed.

There is no such function in S, and I rather object as the S equivalent is 
lda() (and as the author of both I should know).  Credit where credit is 
due: discrim() is an S-PLUS function, indebted to lda().

Well, it seems not to.  That is part of the output of the summary() 
method, which itself calls the multicomp() method.

No `of course' applies. If you learnt of this output from S-PLUS, I urge 
you to credit it honestly and accurately.  (If you refer to lda, you 
should credit that, not just R.)

Mahalanobis distance between groups is easy, as this is just Euclidean 
distance between group centres in the scaled space.  The test statistics 
can be produced, but

- they are critically dependent on the unrealistic assumptions of 
multivariate normality and variance homogeneity and

- there needs to be an adjustment for multiple comparisons.

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
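
A minimal sketch of the computation Prof. Ripley describes above, offered as an illustration rather than as the output of S-PLUS discrim(). It assumes the built-in iris data as a stand-in for the poster's `gr` data frame, a pooled within-group covariance taken over all groups, and a Holm adjustment as one possible multiple-comparison correction. The F transformation of Hotelling's T Squared used here is the standard one for a pooled covariance on N - g degrees of freedom; whether S-PLUS uses exactly this form is not confirmed by the thread.

```r
library(MASS)  # provides lda()

# Stand-in data (assumption): iris replaces the poster's `gr` data frame.
fit <- lda(Species ~ ., data = iris)   # CV = FALSE is the default

X   <- as.matrix(iris[, 1:4])
grp <- iris$Species
n   <- table(grp)      # group sizes
g   <- nlevels(grp)    # number of groups
p   <- ncol(X)         # number of variables
N   <- nrow(X)         # total sample size

# Squared Mahalanobis distances between group centres: squared Euclidean
# distances between the centres projected onto the discriminant axes.
Z  <- fit$means %*% fit$scaling
D2 <- as.matrix(dist(Z))^2

# All pairs of groups, one row per comparison.
pairs <- t(combn(g, 2))
res <- data.frame(
  group1 = levels(grp)[pairs[, 1]],
  group2 = levels(grp)[pairs[, 2]],
  D2     = D2[pairs]
)

# Hotelling's T^2 for each pair, using the covariance pooled over all groups.
ni <- as.numeric(n[pairs[, 1]])
nj <- as.numeric(n[pairs[, 2]])
res$T2 <- ni * nj / (ni + nj) * res$D2

# F transformation with p and N - g - p + 1 degrees of freedom.
df2    <- N - g - p + 1
res$F  <- res$T2 * df2 / (p * (N - g))
res$p  <- pf(res$F, p, df2, lower.tail = FALSE)

# Adjust for multiple comparisons (Holm chosen here as an example).
res$p.holm <- p.adjust(res$p, method = "holm")

print(res)
```

The p-values rest on the same assumptions flagged in the message: multivariate normality and a common covariance matrix across groups.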

Alain Paquette

Wed, Feb 22, 2006 6:23 AM

Dear Prof. Ripley
I'm sorry about the confusion; this reply will simply avoid any humor 
attempts (good or bad).

About "S"
I'm sorry, as a "user" I was not aware of any "S" still existing outside 
of S-PLUS or R.  So you're right, the procedure I was referring to was 
run in S-PLUS.  I used the GUI to construct the analysis, so I 
really don't know if the discrim() call I copied from the "command" 
window is accurate.  But when I re-run the analysis with that as the 
command line, I get the same results.  And it does provide a matrix of 
Mahalanobis distances between groups and a test of their significance 
(Hotelling's T Squared for Differences in Means Between Each Group).

About the credits
My data set is in JMP (SAS).  It's great at manipulating and exploring 
data sets.  The software also supports many types of analysis, so my 
very first discriminant analysis was actually in JMP.  But like many GUI 
programs, it lacks options.  JMP approaches the distance problem by 
drawing 95% confidence spheres around group means.  That's very 
nice (although it doesn't account for multiple comparisons) for LDA 
problems with few groups, but I have 12, so it became messy 
(graphically).  Besides, I have the (I think very healthy) habit of 
never trusting just one piece of software, especially the black-box 
type, for my analyses.

I was also accumulating literature on the subject (ecophysiology of 
trees, not statistics!) and I came across this paper

Delagrange, S., Messier, C., Lechowicz, M.J. and Dizengremel, P. 2004. 
Physiological, morphological and allocational plasticity in understory 
deciduous trees: importance of plant size and light availability. Tree 
Physiol. 24(7): 775-784.

which presented a test on Mahalanobis distances from an LDA.  They 
used SAS (CANDISC with the ANOVA option) for their analysis.  I 
tried it in R (lda in MASS and discrimin in ade4), without success (I 
get the discriminant analysis, but not the test).  So I tried it in 
S-PLUS, and voilà!  You could say that my first encounter with 
the procedure was actually with SAS, then R, and only then S-PLUS.

I use the "vegan" package a lot for permutational statistics, as well as 
code developed at Pierre Legendre's lab, and I cite them accordingly, 
just as I believe I did with lda in MASS in the present e-mail.  
Thanks for your advice on multiple comparisons and normality.  By the 
way, the S-PLUS procedure also outputs normality and covariance tests.  
I do have multivariate normality, but for now (!) I have covariance 
heterogeneity.  I was of course planning on a Dunn-Sidak correction for 
multiple comparisons.

Thank you for the quick reply,
Alain
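
For reference, the Dunn-Sidak correction mentioned in this message can be sketched in a few lines of base R. The 12 groups come from the message; the familywise level of 0.05 and the raw p-values are assumptions made up for illustration, and `sidak_adjust` is a hypothetical helper name, not a function from any package.

```r
# Dunn-Sidak adjustment for all pairwise comparisons among g groups.
g     <- 12             # number of groups, as stated in the message
m     <- choose(g, 2)   # number of pairwise comparisons: 66
alpha <- 0.05           # familywise error rate (assumed)

# Per-comparison significance level that keeps the familywise rate at alpha,
# treating the m tests as independent.
alpha_per_test <- 1 - (1 - alpha)^(1 / m)

# Equivalent view: adjust each raw p-value and compare it with alpha.
# sidak_adjust is a hypothetical helper defined here for illustration.
sidak_adjust <- function(p, m = length(p)) 1 - (1 - p)^m

p_raw <- c(0.0001, 0.001, 0.02)   # made-up p-values, illustration only
p_adj <- sidak_adjust(p_raw, m = m)

print(alpha_per_test)
print(p_adj)
```

With 66 comparisons the per-test level is roughly 0.00078, slightly less strict than the Bonferroni level of 0.05 / 66.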

