Doubt about CCA and PCA

Dear Francisco, 

CCA and PCA are quite different methods. CCA regresses your 'response' data onto a set of explanatory variables. This requires inverting the covariance matrix of the predictors, which is only possible if n > p, where n is the number of observations and p the number of explanatory variables (and, in addition, no predictor is an exact linear combination of the others).
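
A minimal plain-Python sketch of the n > p condition (not R code; the helper names `covariance` and `numerical_rank` are hypothetical and exist only for this illustration). It simulates p = 4 uncorrelated predictors and reports the numerical rank of their sample covariance matrix S for several values of n. After centring, S has rank at most min(n - 1, p), so it can only be inverted when n > p.

```python
import random

def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of an n x p data set given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def numerical_rank(matrix, tol=1e-10):
    """Rank by Gaussian elimination with partial pivoting; pivots below tol count as zero."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            factor = m[i][col] / m[r][col]
            for j in range(col, n_cols):
                m[i][j] -= factor * m[r][j]
        r += 1
    return r

# Simulated predictors: p = 4 variables, increasing numbers of observations n.
random.seed(1)
p = 4
for n in (3, 4, 5, 10):
    x = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    k = numerical_rank(covariance(x))
    print(f"n = {n:2d}, p = {p}: rank(S) = {k}, invertible: {k == p}")
```

With this simulated (non-collinear) data, only the n = 5 and n = 10 runs give a full-rank, invertible S; n = p = 4 is not enough.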

PCA is defined in any case. The ratio between n and p matters only if you intend to infer the principal axes/components of the population (as opposed to using the principal axes/components as mere descriptors of the sample). I would recommend reading:
Jolliffe, I. T. Principal Component Analysis. Springer, 2004.
which tackles the latter point very clearly.
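
The distinction between describing the sample and inferring population axes can be sketched in plain Python (hypothetical helper names `covariance` and `jacobi_eigenvalues`; the cyclic Jacobi method is used only because it needs no external library). The simulated population has p = 10 independent standard normal variables, so every population eigenvalue equals 1 and there are no true principal axes. The sample PCA can be computed for any n, but its eigenvalues are only trustworthy estimates when n is large relative to p.

```python
import math
import random

def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of an n x p data set given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
            for i in range(p)]

def jacobi_eigenvalues(a, max_sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in a]
    size = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(size) for j in range(size) if i != j)
        if off < tol:
            break
        for i in range(size - 1):
            for j in range(i + 1, size):
                if a[i][j] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[i][j] is zero.
                theta = (a[j][j] - a[i][i]) / (2 * a[i][j])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(size):  # update columns i and j: A <- A J
                    aki, akj = a[k][i], a[k][j]
                    a[k][i], a[k][j] = c * aki - s * akj, s * aki + c * akj
                for k in range(size):  # update rows i and j: A <- J^T A
                    aik, ajk = a[i][k], a[j][k]
                    a[i][k], a[j][k] = c * aik - s * ajk, s * aik + c * ajk
    return sorted((a[i][i] for i in range(size)), reverse=True)

# Population: p = 10 independent N(0, 1) variables, all population eigenvalues = 1.
random.seed(42)
p = 10
for n in (5, 1000):
    x = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    eig = jacobi_eigenvalues(covariance(x))
    print(f"n = {n:4d}:", " ".join(f"{v:5.2f}" for v in eig))
```

With n = 5 at most n - 1 = 4 sample eigenvalues are non-zero, and together they carry the whole sample variance (whose expected total is p = 10), so the sample seems to have strong principal axes that the population does not have. With n = 1000 the ten values all come out close to 1.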

Best regards,

Thibaut.
--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - Faculty of Medicine
St Mary's Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://www1.imperial.ac.uk/medicine/people/t.jombart/
http://adegenet.r-forge.r-project.org/
Dear R community,

I'm working with PCA and CCA methods, and I have a theoretical question.

Why is it necessary to have more temporal values than variables when CCA
or PCA is used?

Could you recommend any papers about it?

Thanks in advance,


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Francisco,

First assumption: "temporal values" refers to the number of rows. With that
assumption, it is *not* necessary to have more rows than columns in PCA (more
about CCA below). It depends on the implementation: in R, prcomp() uses the
singular value decomposition of the data matrix, so this is not necessary,
whereas princomp() works from the covariance matrix and does need more rows
(observations) than columns (variables). With rank-deficient data (fewer rows
than columns), the number of non-zero eigenvalues will be smaller than the
number of variables: at most n - 1 after centring, where n is the number of
rows.
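
Why fewer rows than columns need not be a computational obstacle for PCA can be illustrated with a route that differs from the one prcomp() takes (prcomp() calls svd(); this sketch does not). Assuming a centred n x p data matrix X with n < p, the p x p covariance matrix X'X/(n - 1) and the small n x n matrix XX'/(n - 1) have the same non-zero eigenvalues, so the component variances can be read off the small matrix. The plain-Python sketch below (hypothetical helpers `centre`, `matvec` and `top_eigenvalue`; power iteration is used only to keep it short) checks this for the largest eigenvalue with n = 6 and p = 20.

```python
import math
import random

def centre(rows):
    """Subtract each column mean from an n x p data set given as a list of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenvalue(m, iters=3000, seed=0):
    """Largest eigenvalue of a symmetric positive semi-definite matrix by power iteration."""
    rng = random.Random(seed)
    # Random start: a vector of ones would lie in the null space of XX'.
    v = [rng.gauss(0, 1) for _ in m]
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient v'Mv, |v| = 1

random.seed(3)
n, p = 6, 20  # fewer rows (observations) than columns (variables)
xc = centre([[random.gauss(0, 1) for _ in range(p)] for _ in range(n)])

# p x p covariance matrix X'X / (n - 1) ...
cov = [[sum(xc[k][i] * xc[k][j] for k in range(n)) / (n - 1) for j in range(p)]
       for i in range(p)]
# ... and the n x n matrix XX' / (n - 1) built from the same centred rows.
gram = [[sum(a * b for a, b in zip(xc[i], xc[j])) / (n - 1) for j in range(n)]
        for i in range(n)]

print("largest eigenvalue, p x p:", top_eigenvalue(cov))
print("largest eigenvalue, n x n:", top_eigenvalue(gram))
# Centred columns sum to zero, so XX' maps the ones vector to (numerically) zero.
print("XX' times ones:", [round(x, 12) for x in matvec(gram, [1.0] * n)])
```

Because the ones vector is in the null space of the 6 x 6 matrix, at most n - 1 = 5 eigenvalues can be non-zero, which matches the bound stated above.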

Then about CCA. First, you should tell us what you mean by CCA. It is an
ambiguous acronym that usually refers either to constrained ("canonical")
correspondence analysis or to canonical correlation analysis. The first is
simpler and does not have the constraint you mentioned, whereas the latter is
computationally more complicated and may need a special implementation for
rank-deficient data. There are further complications, but I won't guess
anything about them before I get more details.

Cheers, Jari Oksanen