Message-ID: <CAMVOpmPEoxKFn-Xmx01gWr0yqi3ubmbm75WstENM_7ot3ACSew@mail.gmail.com>
Date: 2017-01-18T18:35:31Z
From: Josh Mitteldorf
Subject: PCA in Q- and R-modes

I'm working with proteomic data, helping a student who knows biology and
has done analyses in R without understanding the underlying methods in depth.

We have levels of 3000 proteins measured at 6 ages.  I can treat this as 6
vectors in 3000-dimensional space, diagonalize a 6x6 covariance matrix, and
find 5 principal components plus one zero eigenvalue.  My student has been
working in R in "Q mode": he enters the transposed matrix as 3000 vectors in
6-dimensional space.  In just a few seconds, R diagonalizes a 3000x3000
matrix!  I can't picture what it means to diagonalize a 3000x3000 matrix.
But of course there are only 5 degrees of freedom in the data, so only 5 of
the eigenvalues are non-zero, and the other 2995 eigenvectors are junk.
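
A quick numerical check of the degrees-of-freedom argument, as a sketch in
Python/NumPy (not R, and not the student's code).  Random numbers stand in
for the real 6x3000 data (an assumption for illustration only).  After each
protein is centered across the 6 ages, the 6 age vectors sum to zero, so the
6x6 matrix of their inner products (the covariance matrix above, up to a
constant factor) has exactly one numerically zero eigenvalue.  The relative
tolerance `tol` is an arbitrary choice.

```python
import numpy as np

# Random stand-in data (hypothetical), NOT the proteomic measurements:
# rows are the 6 ages, columns are the 3000 proteins.
rng = np.random.default_rng(0)
n_ages, n_proteins = 6, 3000
X = rng.normal(size=(n_ages, n_proteins))

# Center each protein across the 6 ages, as PCA does.
Xc = X - X.mean(axis=0)

# 6x6 matrix of inner products between the centered age vectors
# (proportional to the 6x6 covariance matrix).
C = Xc @ Xc.T
eigvals = np.linalg.eigvalsh(C)  # symmetric eigensolver, ascending order

# The centered rows sum to zero, so rank <= 5: one eigenvalue is ~0.
tol = 1e-10 * eigvals.max()      # arbitrary relative tolerance
n_nonzero = int(np.sum(eigvals > tol))
print(n_nonzero)  # 5
```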

Questions:
a) Is there a relationship between the principal components of the 3000x6
   matrix and the principal components of the transposed 6x3000 matrix?
b) Is there a way to find the 5 meaningful eigenvectors without the baggage
   of diagonalizing the huge 3000x3000 matrix?
c) The big question: which version should we analyze and publish?  My
   student tells me the transposed matrix is the common procedure.  The two
   yield very different-looking plots.
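
Regarding a) and b), a minimal sketch, again in Python/NumPy with random
stand-in data (an assumption, not the real measurements).  For the centered
6x3000 matrix Xc, the 6x6 matrix Xc Xc^T and the 3000x3000 matrix Xc^T Xc
have the same nonzero eigenvalues, and each 3000-dimensional eigenvector can
be recovered from a 6-dimensional one as u = Xc^T v / sqrt(lambda), so the
big matrix never has to be built.  Both sets of eigenvectors are the two
halves of the singular value decomposition of Xc.  For what it's worth, R's
documentation for prcomp says it works from an SVD of the centered data
matrix rather than calling eigen on a covariance matrix.

```python
import numpy as np

# Random stand-in data (hypothetical), shaped like the question: 6 ages x 3000 proteins.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3000))
Xc = X - X.mean(axis=0)  # center each protein across the 6 ages

# Small problem: eigendecompose the 6x6 matrix Xc Xc^T.
lam, V = np.linalg.eigh(Xc @ Xc.T)  # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]       # reorder to descending
lam, V = lam[order], V[:, order]
keep = lam > 1e-10 * lam[0]         # drop the zero eigenvalue left by centering
lam, V = lam[keep], V[:, keep]

# Map each 6-dim eigenvector to a 3000-dim eigenvector of Xc^T Xc:
# u_k = Xc^T v_k / sqrt(lam_k).  The 3000x3000 matrix is never formed.
U = Xc.T @ V / np.sqrt(lam)

# Verify Xc^T Xc u_k = lam_k u_k, evaluated as Xc^T (Xc u_k).
residual = Xc.T @ (Xc @ U) - U * lam
max_residual = np.abs(residual).max()

# The thin SVD Xc = A diag(s) Bt gives the same answer directly:
# s**2 are the nonzero eigenvalues and the rows of Bt are the u_k (up to sign).
A, s, Bt = np.linalg.svd(Xc, full_matrices=False)
k = len(lam)
eigenvalues_match = np.allclose(s[:k] ** 2, lam)
vectors_match = np.allclose(np.abs(U), np.abs(Bt[:k].T), atol=1e-8)

print(k, max_residual < 1e-8 * lam[0], eigenvalues_match, vectors_match)
```

Eigenvector signs are arbitrary, so a column of U can come out as the
negative of the matching SVD vector; that is why the last check compares
absolute values.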

Thanks for your help.
- Josh Mitteldorf
