inconsistency with cor() - "x must be numeric"
6 messages · Justin Fincher, Erik Iverson, Joshua Wiley +1 more
Please provide a reproducible example! E.g., use ?dput to dump a minimal data.frame that exhibits this issue on the newest version of R.
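A minimal sketch of what such a dput() dump could look like. The data frame `df` and its values are made up for illustration; only the column names V1 and V6 come from the thread.

```r
# Hypothetical two-row stand-in for the real data (values are invented).
df <- data.frame(V1 = c("chr1", "chr2"),
                 V6 = c(0.25, 0.75),
                 stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; paste that output into the email.
dput(df)

# Round trip: capture the printed code and evaluate it to rebuild the object.
txt <- paste(capture.output(dput(df)), collapse = "\n")
df2 <- eval(parse(text = txt))
```

Anyone on the list can then paste the dput() output into a session and work on the same object.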
Justin Fincher wrote:
Howdy,
I have written a small function to generate a simple plot and my
colleague is having an error when attempting to run it. Essentially I loop
through categories in a data frame and take the average value for each
category. The categories are in $V1, subset first then mean taken and
concatenated to previous values using rbind(c("label",mean(data$V6))). The
result is a two-column matrix with labels in column one and values in column
two. Within the function I calculate the correlation of column two and
another set of values that are part of the function. On my computer (linux
box running R 2.8.1) the function runs correctly. On my colleague's
computer (Windows box running R 2.12) the function throws an error at the
cor() function call saying that "x must be numeric." We are running on the
exact same data set and source'ing the same function definition. Any help
would be appreciated.
- Fincher
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Fincher, cor() only works on numeric arguments now (as of R 2.11 or 2.10 if memory serves). So, I would update your function to ensure that you are only passing numeric data to cor() and the error should go away (it will probably be easier on you if you can update your version of R to the latest and greatest...quite a bit has changed since 2.8.1). If you post a reproducible example of your function, I'm sure we can help update it. Cheers, Josh
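A small sketch of the check suggested above, assuming a toy matrix `m` built the way the original post describes (labels and means combined with c()). The values and the names `m`, `other`, `res`, `x` and `r` are invented for illustration.

```r
# Toy two-column matrix: c() with a string turns the mean into a string too.
m <- rbind(c("chr1", mean(c(1, 2))),
           c("chr2", mean(c(3, 5))),
           c("chr3", mean(c(2, 2))))
other <- c(10.19, 6.457, 6.71)

# Check what is actually being passed to cor().
is.numeric(m[, 2])

# In recent versions of R, cor() refuses character input; capture the message.
res <- tryCatch(cor(m[, 2], other), error = function(e) conditionMessage(e))

# Convert explicitly before correlating.
x <- as.numeric(m[, 2])
r <- cor(x, other)
```

The explicit as.numeric() makes the conversion visible in the code rather than relying on cor() to do it.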
Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/
Hi,

I can certainly understand not wanting to be long winded, and no damage done. Here's a link to the R news file: http://cran.stat.ucla.edu/src/base/NEWS and if you search in your browser for "cor() and cov()" you should find what happened.

At any rate, I could not fully check your code because: object 'accessibility_data' not found, but my guess would be that you created a matrix (if inadvertently), and at least one of the columns had some character data in it, which would push *all* the data to character class (even though a particular column may hold numeric data, it is now stored as character). Previously I think cor() did not check this, and would silently convert using as.numeric(). I would look at:

str(acc_averages)

and I bet you will find that it is not numeric. If this is the case, one fix would be:

correlation = cor(as.numeric(acc_averages[,2]), gene_densities$avg_density[1:23])

Probably a better fix would be to initialize acc_averages as a data.frame rather than with c(); that way it can store different types of data without moving everything up the hierarchy of classes. To see what I mean, look at ?rbind, second paragraph under the heading "Value".

Cheers, Josh
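A rough sketch of the data.frame approach, under the assumption that the real data has a label column V1 and a numeric column V6 as in the thread. The toy `accessibility_data` below and its values are invented, and the column names `chrom` and `avg` are hypothetical.

```r
# Invented stand-in for accessibility_data (only V1 and V6 are used here).
accessibility_data <- data.frame(
  V1 = c("chr1", "chr1", "chr2", "chr2", "chrX"),
  V6 = c(0.2, 0.4, 0.5, 0.7, 0.9),
  stringsAsFactors = FALSE
)

# Start from an empty data frame with one character and one numeric column.
acc_averages <- data.frame(chrom = character(0), avg = numeric(0),
                           stringsAsFactors = FALSE)

for (ch in c("chr1", "chr2", "chrX")) {
  temp <- accessibility_data[accessibility_data$V1 == ch, ]
  # Each new row is itself a data frame, so the mean stays numeric.
  acc_averages <- rbind(acc_averages,
                        data.frame(chrom = ch, avg = mean(temp$V6),
                                   stringsAsFactors = FALSE))
}

str(acc_averages)
```

Because each column of a data frame keeps its own type, acc_averages$avg can be passed to cor() without conversion.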
On Mon, Dec 13, 2010 at 2:23 PM, Justin Fincher <fincher at cs.fsu.edu> wrote:
I apologize for the lack of example. I was trying not to be too long
winded. Below is the first portion of my function that is causing the
error. (I'm including both calls to cor(), though it quits after the first
throws an error). I do not believe he has redefined cor() as he is a novice
user and we tried this after starting a fresh session. And I will look into
upgrading. I realize it is a little out of date since it is the version in
the repository for my distribution and not the latest-and-greatest from R.
I just didn't realize a change like that would be made that would
(seemingly to me) reduce functionality. Thank you again for your help.
- Fincher
   # As they don't change, hard code gene density values
   gene_densities =
     data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
                        "chr8","chr9","chr10","chr11","chr12","chr13",
                        "chr14","chr15","chr16","chr17","chr18","chr19",
                        "chr20","chr21","chr22","chrX","chrY"),
                avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
                              5.939,7.27,7.132,11.38,9.429,3.757,
                              7.607,8.455,11.81,17.84,4.649,26.52,
                              11.19,6.51,11.28,7.535,2.931))
   acc_averages = c()
   # subset out relevant data
   accessibility_data = subset(accessibility_data,
                               accessibility_data$V9==";color=000000")
   # calculate mean accessibility value for each chromosome
   for(i in seq(1,22)){
      sub = paste("chr",i,sep="")
      temp = subset(accessibility_data,accessibility_data$V1==sub)
      acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
   }
   temp = subset(accessibility_data,accessibility_data$V1=="chrX")
   acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))
   # Output the correlation without including chromosome Y
   correlation = cor(acc_averages[,2],gene_densities$avg_density[1:23])
   cat("Correlation w/o chrY:",correlation,'\n')
   temp = subset(accessibility_data,accessibility_data$V1=="chrY")
   acc_averages = rbind(acc_averages,c("chrY",mean(temp$V6)))
   # Output overall correlation
   correlation = cor(acc_averages[,2],gene_densities$avg_density)
   cat("Correlation w/chrY:",correlation,'\n')
On Dec 13, 2010, at 23:23 , Justin Fincher wrote:
I apologize for the lack of example. I was trying not to be too long winded. Below is the first portion of my function that is causing the error. (I'm including both calls to cor(), though it quits after the first throws an error). I do not believe he has redefined cor() as he is a novice user and we tried this after starting a fresh session. And I will look into upgrading. I realize it is a little out of date since it is the version in the repository for my distribution and not the latest-and-greatest from R. I just didn't realize a change like that would be made that would (seemingly to me) reduce functionality. Thank you again for your help.
Well, let me put it this way: Once you realize what you are doing, you will appreciate that R is not letting you do that anymore...
- Fincher
   # As they don't change, hard code gene density values
   gene_densities =
     data.frame(chrom=c("chr1","chr2","chr3","chr4","chr5","chr6","chr7",
                        "chr8","chr9","chr10","chr11","chr12","chr13",
                        "chr14","chr15","chr16","chr17","chr18","chr19",
                        "chr20","chr21","chr22","chrX","chrY"),
                avg_density=c(10.19,6.457,6.71,4.917,6.083,7.491,7.453,
                              5.939,7.27,7.132,11.38,9.429,3.757,
                              7.607,8.455,11.81,17.84,4.649,26.52,
                              11.19,6.51,11.28,7.535,2.931))
   acc_averages = c()
   # subset out relevant data
   accessibility_data = subset(accessibility_data,
                               accessibility_data$V9==";color=000000")
   # calculate mean accessibility value for each chromosome
   for(i in seq(1,22)){
      sub = paste("chr",i,sep="")
      temp = subset(accessibility_data,accessibility_data$V1==sub)
      acc_averages = rbind(acc_averages,c(sub,as.double(mean(temp$V6))))
   }
   temp = subset(accessibility_data,accessibility_data$V1=="chrX")
   acc_averages = rbind(acc_averages,c("chrX",as.double(mean(temp$V6))))
This line and the similar one three lines earlier are the culprits. The c() construct creates a character vector because one of its arguments (here, the first) is character. Hence, acc_averages is a character matrix.

Now, are you _sure_ you know what happens if you correlate something with the character vector acc_averages[,2]? It may have given you the right thing for Pearson correlations, but it certainly did not for rank correlations pre 2.11.0, leading to a "non-bug report" and the subsequent check for numeric data. What happened then was that ranks were based on the _alphabetical_ ordering of data!

I'm fairly confident that you'd really want to do the whole thing with a suitable aggregate() call, but for now, how about just keeping the labels and the values in two separate vectors?
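To make the points above concrete, here is a hedged sketch on invented toy data: first the alphabetical-ranking pitfall, then one possible aggregate() call, then the two-vector interim approach. The toy `accessibility_data`, the three-row `gene_densities` subset, and the names `acc_means`, `merged`, `chrom_labels` and `values` are assumptions for illustration, not the original poster's code.

```r
# 1. Numbers stored as strings are ranked by string order, not by value.
vals <- c("10.19", "6.457", "9.429", "26.52")
rank_chr <- rank(vals)               # ranks from string ordering
rank_num <- rank(as.numeric(vals))   # ranks from numeric value
identical(rank_chr, rank_num)        # the two rankings differ

# 2. Per-chromosome means with aggregate(); V6 values are made up.
accessibility_data <- data.frame(
  V1 = c("chr1", "chr1", "chr2", "chr2", "chrX"),
  V6 = c(0.2, 0.4, 0.5, 0.7, 0.9)
)
acc_means <- aggregate(V6 ~ V1, data = accessibility_data, FUN = mean)

# Three of the gene densities from the thread, enough for the toy data.
gene_densities <- data.frame(
  chrom = c("chr1", "chr2", "chrX"),
  avg_density = c(10.19, 6.457, 7.535)
)

# aggregate() orders groups by label, so match on the label, not on row position.
merged <- merge(acc_means, gene_densities, by.x = "V1", by.y = "chrom")
correlation <- cor(merged$V6, merged$avg_density)

# 3. The interim suggestion: labels and values kept in separate vectors.
chrom_labels <- character(0)
values <- numeric(0)
for (ch in c("chr1", "chr2", "chrX")) {
  temp <- accessibility_data[accessibility_data$V1 == ch, ]
  chrom_labels <- c(chrom_labels, ch)
  values <- c(values, mean(temp$V6))
}
```

Merging on the chromosome label matters with the full data, since a label ordering would place chr10 before chr2 and no longer line up with the hard-coded density vector.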
Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com