Can someone help me calculate correlations for grouped values?
Heres what my first few line of data look like:
LLID GMID CEXPR MEXPR
1 1005 10831 2.057462 -0.08486
2 1005 10831 2.057515 -0.08486
3 1005 10831 2.057462 0.01209
4 1005 10831 2.057515 0.01209
5 1005 10836 2.050980 0.17237
6 1005 10836 2.018576 0.17237
LLID is gene id, GMID is cell line id, the EXPR columns are gene expression is two different microarray experiments.
I'd like to get correlations for each gene id (1005, 1006, etc)
Heres what I've tried so far:
sapply(by(cmexpr[3:4],cmexpr$LLID, function(x) cor(cmexpr$CEXPR,cmexpr$MEXPR, use = "pairwise.complete.obs")), function(x) x,simplify=T)
I think this is close to what I need, but its giving me the same corr for each gene, which is not right:
1005 10083 10146 10158 10174 10206
-0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543
10211 10212 10219 10363 10484 10492
-0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543 -0.04751543
Sorry if this sounds like a newbie question; it seems like it ought to be pretty straightforward, but I've wrestled with both the R documenation and mailing list archive and can't seem to get it right.