inconsistency with cor() - "x must be numeric"
On Dec 13, 2010, at 23:23, Justin Fincher wrote:
I apologize for the lack of an example; I was trying not to be too long-winded. Below is the first portion of my function that is causing the error. (I'm including both calls to cor(), though it quits after the first throws an error.) I do not believe he has redefined cor(), as he is a novice user and we tried this after starting a fresh session. And I will look into upgrading. I realize my version is a little out of date, since it is the one in the repository for my distribution and not the latest-and-greatest from R. I just didn't realize a change would be made that would (seemingly to me) reduce functionality. Thank you again for your help.
Well, let me put it this way: Once you realize what you are doing, you will appreciate that R is not letting you do that anymore...
- Fincher
# As they don't change, hard code gene density values
gene_densities =
  data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
                     "chr8","chr9","chr10","chr11","chr12","chr13",
                     "chr14","chr15","chr16","chr17","chr18","chr19",
                     "chr20","chr21","chr22","chrX","chrY"),
             avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
                           5.939,7.27,7.132,11.38,9.429,3.757,
                           7.607,8.455,11.81,17.84,4.649,26.52,
                           11.19,6.51,11.28,7.535,2.931))
acc_averages = c()
# subset out relevant data
accessibility_data = subset(accessibility_data,
                            accessibility_data$V9==";color=000000")
# calculate mean accessibility value for each chromosome
for(i in seq(1,22)){
  sub = paste("chr",i,sep="")
  temp = subset(accessibility_data,accessibility_data$V1==sub)
  acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
}
temp = subset(accessibility_data,accessibility_data$V1=="chrX")
acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))
This line and the similar one three lines earlier are the culprits. The c() construct creates a character vector because one of its arguments (the first) is character. Hence, acc_averages is a character matrix. Now, are you _sure_ you know what happens if you correlate something with the character vector acc_averages[,2]? It may have given you the right thing for Pearson correlations, but it certainly did not for rank correlations before R 2.11.0: the ranks were based on the _alphabetical_ ordering of the data! That led to a "non-bug report" and the subsequent check for numeric data.

I'm fairly confident that you'd really want to do the whole thing with a suitable aggregate() call, but for now, how about just keeping the labels and the values in two separate vectors?
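
To make the point above concrete, here is a minimal sketch of the coercion, the alphabetical ranking, and one possible aggregate() approach. It is only an illustration under assumptions: the data frame toy and its values are made up to stand in for accessibility_data, the column name avg_acc is invented, and only the V1/V6 column names and the three density values are taken from the post.

```r
# Hypothetical stand-in for accessibility_data (values are invented)
toy <- data.frame(V1 = c("chr1", "chr1", "chr2", "chr2", "chr10", "chr10"),
                  V6 = c(8, 10, 90, 110, 900, 1100),
                  stringsAsFactors = FALSE)

# 1. Mixing a label and a number in c() coerces both to character
one_row <- c("chr1", mean(c(8, 10)))
is.character(one_row)

# 2. Ranking character data follows string order, not numeric order
vals <- c(9, 100, 1000)
rank(vals)                  # numeric ranks
rank(as.character(vals))    # ranks of the strings "9", "100", "1000"

# 3. Let aggregate() compute the per-chromosome means; values stay numeric
acc <- aggregate(V6 ~ V1, data = toy, FUN = mean)
names(acc) <- c("chrom", "avg_acc")   # avg_acc is a made-up column name

# Three rows of the gene density table from the post
dens <- data.frame(chrom = c("chr1", "chr2", "chr10"),
                   avg_density = c(10.19, 6.457, 7.132),
                   stringsAsFactors = FALSE)

# merge() lines the two tables up by chromosome label
both <- merge(dens, acc, by = "chrom")
rho <- cor(both$avg_density, both$avg_acc, method = "spearman")
```

Because the labels live in their own column and the means in another, cor() only ever sees numeric input, and merge() takes care of matching chromosomes rather than relying on row order.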
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com