Re: In need of help with correlations
Hi

r-help-bounces at r-project.org wrote on 09.04.2011 19:24:38:
I am in need of someone's help in correlating gene expression. I'm
somewhat new to R, and can't seem to find anyone local to help me with
what I think is a simple problem.
I need to obtain Pearson and Spearman correlation coefficients, and
corresponding p-values, for all of the genes in my dataset that
correlate to one specific gene of interest. I'm working with Affymetrix
Mouse 430 2.0 arrays, so I've got about 45,000 probesets (rows; with
the 1st column containing identifiers) and 30 biological replicates
(columns; with the top row containing the header information).
I've looked through several intro manuals and the R help files.
I know that "cor(x,y, use ="everything", method = c("pearson"))" can
help obtain the coefficients. I also know that "cor.test()" is supposed
to test the significance of a single correlation coefficient.
I've also found the Bioconductor package "genefilter" / "genefinder"
that looks for correlations to a given gene (although I can't get it to
work).
So far I've been able to:
#Read in the csv file
data<-read.csv("my data.csv")
#Check the dimensions, names, class, fix(data) to ensure the file was
#loaded properly
dim(data)
names(data)
class(data)
fix(data)
#So far I've been able to successfully correlate the entire 'column'
#matrix through:
x <- data[,2:30]
y <- data[,2:30]
corr.data<-cor(x,y, use = "everything", method = c("pearson"))
write.csv(corr.data, file = "correlation of my data by columns.csv")
-----------------------------------
Now if I try to run the 'cor.test()' function on the same matrix, I get
an error message saying that 'x' must be a numeric vector. This I don't
understand.
The cor.test help page says

x, y: numeric vectors of data values. 'x' and 'y' must have the
same length.

However, your data[,2:30] is most probably a data frame, see

str(data[,2:30])
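The difference can be seen on a small made-up data frame. The name `toy` and all of its values are invented for illustration and are not taken from the original data. A range of columns extracted with `[` stays a data frame, while a single column is a plain numeric vector of the kind cor.test() expects.

```r
# Toy data standing in for the real file (all values are made up)
toy <- data.frame(id = c("g1", "g2", "g3", "g4", "g5"),
                  s1 = c(1.2, 2.4, 3.1, 4.8, 5.0),
                  s2 = c(0.9, 2.1, 3.5, 4.1, 5.6),
                  s3 = c(5.1, 3.9, 3.2, 2.0, 1.1))

sub <- toy[, 2:4]
class(sub)        # still a data frame
is.numeric(sub)   # FALSE, so cor.test() does not accept it as 'x'

col <- toy[, 2]
is.numeric(col)   # TRUE, a single column is a plain numeric vector

# cor.test() on two vectors returns a list-like object of class "htest"
ct <- cor.test(toy[, 2], toy[, 3])
ct$estimate       # the correlation coefficient
ct$p.value        # its p-value
```

The individual pieces of the result are picked out with `$`, which matters when the values are to be stored in a matrix.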
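To use cor.test you need to give it two vectors, like

cor.test(data[,2], data[,3])

or to do it in some cycle (untested)

result <- matrix(NA, 20, 20)
for (i in 2:19) {
  # note the parentheses: i+1:20 would be evaluated as i + (1:20)
  for (j in (i+1):20) {
    # cor.test returns a list, so keep only the p-value
    result[i,j] <- cor.test(data[,i], data[,j])$p.value
  }
}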
But most probably there are other ways.
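One such other way, for the stated goal of correlating every probeset with a single gene of interest, is to go over the rows rather than the columns. The following is only a sketch on simulated data: the matrix `expr`, the probeset names, the helper `one_method` and the choice of "probe3" as gene of interest are all made up for illustration. With the real file, the identifier column would first have to be moved into the row names so that the matrix is purely numeric.

```r
set.seed(1)
# Simulated stand-in: 6 probesets (rows) x 10 samples (columns)
expr <- matrix(rnorm(60), nrow = 6,
               dimnames = list(paste0("probe", 1:6), paste0("s", 1:10)))

# With real data one might build the matrix like this (assumption:
# the first column holds the identifiers, the rest are numeric):
#   expr <- as.matrix(data[, -1]); rownames(expr) <- data[, 1]

goi <- expr["probe3", ]   # hypothetical gene of interest

# Run cor.test() for every row against the gene of interest and
# collect the coefficient and the p-value for the given method
one_method <- function(method) {
  t(apply(expr, 1, function(row) {
    ct <- cor.test(row, goi, method = method)
    c(estimate = unname(ct$estimate), p.value = ct$p.value)
  }))
}

pearson  <- one_method("pearson")
spearman <- one_method("spearman")

res <- data.frame(pearson.r    = pearson[, "estimate"],
                  pearson.p    = pearson[, "p.value"],
                  spearman.rho = spearman[, "estimate"],
                  spearman.p   = spearman[, "p.value"],
                  row.names    = rownames(expr))
res
```

The result has one row per probeset and could be written out with write.csv(). With about 45,000 probesets this makes 90,000 calls to cor.test(), so it may take a while to run.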
Regards
Petr
And this is not my goal, but rather me trying to learn how to go about
doing correlation analysis in R. I've also tried transposing the
data.frame using
"as.data.frame(t(data))"
and doing so gives the same error message as above.

Can anyone help me with figuring out how to conduct a correlation
analysis for a specific gene/probeset, and help me understand why I get
the above error message? I know it is probably a simple analysis that is
just over my head right now since I'm still new to R. But I can't figure
it out and have been trying a bunch of different variations for the
past week.

Thank you in advance for your help.
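On the transposition attempt: the result of as.data.frame(t(data)) is still a data frame, and t() applied to a data frame that contains the identifier column typically yields a character matrix, so the values are no longer numeric either. A sketch of one way around this, again on invented data (`toy`, its `id` column and "g2" as the gene of interest are assumptions for illustration): put the identifiers into the row names, transpose only the numeric part, and let cor() compute all coefficients against one gene at once. The Pearson p-values are then derived from the usual t statistic with n - 2 degrees of freedom, which should agree with what cor.test() reports.

```r
# Made-up data: 4 genes (rows) x 5 samples (columns) plus an id column
toy <- data.frame(id = c("g1", "g2", "g3", "g4"),
                  s1 = c(1.0, 2.0, 0.5, 3.0),
                  s2 = c(2.1, 3.9, 0.7, 1.0),
                  s3 = c(2.9, 6.2, 0.4, 2.5),
                  s4 = c(4.2, 7.8, 0.9, 0.5),
                  s5 = c(5.0, 10.1, 0.2, 1.5))

is.character(t(toy))        # TRUE: the id column turns everything into text

m <- as.matrix(toy[, -1])   # numeric part only
rownames(m) <- toy$id
tm <- t(m)                  # samples in rows, genes in columns

# Pearson correlation of every gene with the gene of interest "g2"
r <- cor(tm, tm[, "g2"], method = "pearson")[, 1]

# Two-sided p-values from the t statistic with n - 2 degrees of freedom
n <- nrow(tm)
tstat <- r * sqrt((n - 2) / (1 - r^2))
p <- 2 * pt(-abs(tstat), df = n - 2)

data.frame(r = r, p = p)
```

The gene of interest correlates perfectly with itself, so its own row is not informative and can be dropped. For Spearman, method = "spearman" gives the coefficients in the same way, but the p-value formula above applies to Pearson only.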
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.