In need of help with correlations

On Sat, Apr 9, 2011 at 10:24 AM, Sean Farris <farrissp2 at vcu.edu> wrote:
Sean,

I'm the maintainer of the package WGCNA that does correlation network
analysis of gene expression data. I recommend you check out the
package and the tutorials at

http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA/Tutorials/index.html

The package contains a couple of useful functions for correlation
p-values. Unlike cor.test, which accepts only two vectors (not
matrices), the function corAndPvalue calculates Pearson correlations
and the corresponding p-values for whole matrices. If you already have
the correlation matrix pre-calculated AND you have no missing data
(i.e., the number of observations is the same for every pair), you can
also use corPvalueStudent to calculate the p-values.
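
For reference, the standard Student t-test for a Pearson correlation r
computed from n observations is written out below. This is the textbook
formula, given here as a sketch; the package documentation is the
authority on what corPvalueStudent computes. The symbols r, n, t and
T_{n-2} (a Student t variable with n - 2 degrees of freedom) are my own
notation.

```latex
t = r \sqrt{\frac{n - 2}{1 - r^{2}}}, \qquad
p = 2 \, P\left( T_{n-2} \ge |t| \right)
```

Because n enters the formula directly, a single n can only be applied to
a whole pre-calculated matrix when every pair of columns has the same
number of observations, hence the "no missing data" condition.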

We don't use Spearman correlations much (we prefer the biweight
midcorrelation, functions bicor and bicorAndPvalue, as a robust
alternative to Pearson correlation), but you can approximate the
Spearman p-values by the Student p-values that are used for Pearson
correlations, i.e., by treating the Spearman coefficient as if it were
a Pearson correlation. Statisticians who read this, please don't
execute me for this suggestion :)
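
To make the approximation concrete, here is a minimal sketch in plain
Python (standard library only, not WGCNA code). It relies on the fact
that the Spearman correlation is the Pearson correlation of the ranks,
and then plugs the result into the usual Student t statistic. The
function names average_ranks, pearson, spearman and student_t are
hypothetical helpers made up for this illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def student_t(r, n):
    """Student t statistic for a correlation r from n observations (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman(x, y)
t_stat = student_t(rho, len(x))
```

The approximate two-sided p-value is then the tail probability of
|t_stat| under a Student t distribution with n - 2 degrees of freedom.
For small n or heavily tied data the approximation can be poor.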

To use the function cor(), you need to transpose the data (for example
with t()) so that genes are in columns and samples in rows.
Just be aware that to correlate all probe sets at once you need a
40k+ by 40k+ matrix to hold the result. Only a large computer (at
least 32GB of memory, possibly 64GB) will be able to handle such a
matrix and the necessary manipulations. The WGCNA package contains
methods to construct co-expression networks on such large data sets
if necessary.
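
A rough back-of-the-envelope check of the memory figure, as a small
Python sketch. It assumes 8 bytes per value (double precision, which is
what R uses for numeric matrices) and exactly 40,000 probe sets. The
number of matrices held at once (3: correlations, p-values and one
working copy) is purely my assumption for illustration, and the function
name dense_matrix_gib is hypothetical.

```python
def dense_matrix_gib(n_rows, n_cols, bytes_per_value=8):
    """Memory needed for a dense n_rows x n_cols matrix, in GiB (2**30 bytes)."""
    return n_rows * n_cols * bytes_per_value / 2**30


n_probes = 40_000            # assumed number of probe sets
copies_in_memory = 3         # assumption: cor matrix, p-values, one working copy

one_matrix = dense_matrix_gib(n_probes, n_probes)
total = copies_in_memory * one_matrix

print(f"one matrix: {one_matrix:.1f} GiB, {copies_in_memory} copies: {total:.1f} GiB")
```

A single matrix of this size already takes roughly 12 GiB, so a few
simultaneous copies are enough to reach the 32GB range mentioned above.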

HTH,

Peter