
Doubt about CCA and PCA

3 messages · Francisco Javier Santos Alamillos, Jombart, Thibaut, Jari Oksanen

#
Dear Francisco, 

CCA and PCA are quite different methods. CCA regresses your 'response' data onto a set of explanatory variables. This requires inverting the covariance matrix of the predictors, which is only possible if n > p, where n is the number of observations and p is the number of explanatory variables.
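
The n > p requirement can be checked on a toy example. The sketch below is only an illustration and not part of any CCA implementation: it is written in plain Python rather than R, the data values are made up, and the helper names `covariance` and `rank` are hypothetical. It computes the sample covariance matrix of the predictors in exact rational arithmetic and shows that its rank cannot exceed n - 1, so the matrix is singular (not invertible) once n <= p.

```python
from fractions import Fraction


def covariance(rows):
    """Sample covariance matrix (p x p) of an n x p data matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(Fraction(r[j]) for r in rows) / n for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def rank(matrix):
    """Matrix rank by Gaussian elimination in exact rational arithmetic."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    rk = 0
    for col in range(n_cols):
        # Find a row at or below position rk with a non-zero entry in this column.
        pivot = next((r for r in range(rk, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, n_rows):
            factor = m[r][col] / m[rk][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk


# Made-up data: n = 3 observations (rows) of p = 4 explanatory variables.
few_rows = [[1, 2, 0, 5], [2, 1, 3, 1], [4, 0, 1, 2]]
# The same variables with n = 6 observations.
many_rows = few_rows + [[0, 3, 2, 2], [3, 3, 1, 0], [1, 0, 4, 3]]

rank_few = rank(covariance(few_rows))    # at most n - 1 = 2, so singular
rank_many = rank(covariance(many_rows))  # can reach p = 4, so invertible

print("n = 3, p = 4: covariance rank", rank_few)
print("n = 6, p = 4: covariance rank", rank_many)
```

With these particular numbers the first covariance matrix has rank 2 out of 4 and cannot be inverted, while the second has full rank 4.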

PCA is defined in any case. The ratio between n and p then matters only if you intend to infer the principal axes / components of the population (as opposed to using the PA/PC as mere descriptors of the sample). I would recommend reading:
Jolliffe, I. T. Principal Component Analysis. Springer, 2004
which tackles the latter point very clearly.

Best regards,

Thibaut.
--
######################################
Dr Thibaut JOMBART
MRC Centre for Outbreak Analysis and Modelling
Department of Infectious Disease Epidemiology
Imperial College - Faculty of Medicine
St Mary's Campus
Norfolk Place
London W2 1PG
United Kingdom
Tel. : 0044 (0)20 7594 3658
t.jombart at imperial.ac.uk
http://www1.imperial.ac.uk/medicine/people/t.jombart/
http://adegenet.r-forge.r-project.org/
#
Jombart, Thibaut <t.jombart <at> imperial.ac.uk> writes:
Francisco,

First assumption: "temporal values" refers to the number of rows. With that
assumption, it is *not* necessary to have more rows than columns in PCA (more
about CCA below). It depends on the implementation: in R, prcomp() is
implemented so that this is not necessary, whereas princomp() is implemented
so that you do need more rows (observations) than columns (variables). If you
have rank-deficient data with fewer rows than columns, the number of
eigenvalues will be less than the number of variables.
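
The point about the number of eigenvalues can be written out as a short derivation. This is a sketch under assumed notation that does not appear in the thread: X_c is the n x p column-centred data matrix, S the sample covariance matrix, and U, D, V the factors of a singular value decomposition. As far as I know, the R help page for prcomp() describes the calculation as a singular value decomposition of the centred data matrix, which is why it does not need n > p.

```latex
% Assumed notation: X_c is the n x p column-centred data matrix.
X_c = U D V^{\top}, \qquad D = \operatorname{diag}(d_1, \dots, d_r), \qquad r = \operatorname{rank}(X_c)

% The sample covariance matrix has V as eigenvectors:
S = \frac{1}{n-1} X_c^{\top} X_c = V \, \frac{D^{2}}{n-1} \, V^{\top},
\qquad \lambda_i = \frac{d_i^{2}}{n-1}

% Centring removes one degree of freedom from the rows, hence
r \le \min(n-1,\; p)
```

So with fewer rows than columns there are at most n - 1 non-zero eigenvalues, fewer than the p variables, yet the decomposition itself is still perfectly well defined.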

Then about CCA. First of all, you should tell us which CCA you mean. This is an
ambiguous acronym which usually refers either to constrained ("canonical")
correspondence analysis or to canonical correlation analysis. The first is
simpler and does not have the constraint you mentioned, but the latter is
computationally more complicated and may need a special implementation for
rank-deficient data. There are further complications, but I won't guess
anything about them before I get more details.

Cheers, Jari Oksanen