Hello,

I would like to get a correlation coefficient (R-squared) for my model.
I don't know how to calculate it in R.
What I've done so far:

x<-8.5:32.5 #vector x
y<-c(NA ,5.88 , 6.95 , 7.2 , 7.66 , 8.02 , 8.44 , 9.06, 9.65, 10.22 , 10.63 ,11.06, 11.37, 11.91 ,12.28, 12.69 ,13.07 , 13.5 , 13.3 ,14.14 , NA , NA , NA , NA , NA) #vector y
plot(y~x,col="green",pch=16,ylim=c(0,20),xlim=c(0,50))
(mod1<-nls(y~a+b*log(x,base=exp(1)),start=list(a=1,b=1),trace=TRUE))
xx<-seq(min(x),max(x),length=100)
yy<-6.2456*log(xx)-7.7822
lines(xx,yy,col="blue1")
summary(mod1)

This way I don't get R-squared like I do using the command "lm" for linear models.

Would appreciate your help,
Benedikt N.
correlation coefficient
6 messages · Benedikt Niesterok, Martin Maechler, Bert Gunter +2 more
"BN" == Benedikt Niesterok <KleinerHaifisch at gmx.net>
on Tue, 28 Apr 2009 15:33:02 +0200 writes:
BN> Hello,
BN> I would like to get a correlation coefficient (R-squared) for my model.
{{ arrrgh... how many people think they "need" an R^2 when they
fit a model ?? }}
BN> I don't know how to calculate it in R.
BN> What I've done so far:
BN> x<-8.5:32.5 #vector x
BN> y<-c(NA ,5.88 , 6.95 , 7.2 , 7.66 , 8.02 , 8.44 , 9.06, 9.65, 10.22 ,
BN> 10.63 ,11.06, 11.37, 11.91 ,12.28, 12.69 ,13.07 , 13.5 , 13.3 ,14.14 , NA , NA , NA , NA , NA) #vector y
BN> plot(y~x,col="green",pch=16,ylim=c(0,20),xlim=c(0,50))
BN> (mod1<-nls(y~a+b*log(x,base=exp(1)),start=list(a=1,b=1),trace=TRUE))
This is a very *LINEAR* model.
Why don't you use lm()?
Then you'd even get your beloved R-squared ...
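A minimal sketch of the lm() route suggested here, assuming the x and y vectors from the original post. The name mod_lm is illustrative only and does not appear in the thread. Because y = a + b*log(x) is linear in the parameters a and b, ordinary least squares applies directly, and lm() drops the NA rows by default.

```r
# Data from the original post
x <- 8.5:32.5
y <- c(NA, 5.88, 6.95, 7.2, 7.66, 8.02, 8.44, 9.06, 9.65, 10.22,
       10.63, 11.06, 11.37, 11.91, 12.28, 12.69, 13.07, 13.5, 13.3, 14.14,
       NA, NA, NA, NA, NA)

# The model is linear in a and b, so lm() can fit it;
# rows where y is NA are omitted by the default na.action
mod_lm <- lm(y ~ log(x))

coef(mod_lm)                         # intercept (a) and slope on log(x) (b)
r2 <- summary(mod_lm)$r.squared      # R-squared as reported by summary()
r2
```

The same summary(mod_lm) output also lists the adjusted R-squared and the residual standard error, which may be more informative than R-squared alone.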
BN> xx<-seq(min(x),max(x),length=100)
BN> yy<-6.2456*log(xx)-7.7822
BN> lines(xx,yy,col="blue1")
BN> summary(mod1)
BN> This way I don't get R-squared like I do using the command "lm" for linear
BN> models.
In general, R^2 is *NOT* easily defined for non-linear models.
R^2 is only defined if you have a nested sub-model, aka "null-model".
For linear models (*WITH* an intercept (!)), the sub-model is
naturally y ~ 1.
For general nonlinear models, the only simple sub-model is
'y ~ 0' which is often ridiculous to take as null-model, and
hence not taken by default.
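To make the null-model idea concrete, here is a hedged sketch that computes R^2 by hand as 1 - RSS(model) / RSS(null model), with y ~ 1 as the null model. It assumes the data from the original post; the names d, mod_nls, rss, tss and r2_manual are illustrative. The comparison is only sensible here because the fitted model contains the intercept a, so y ~ 1 really is nested in it. For a general nonlinear model the same quantity need not lie between 0 and 1.

```r
# Data from the original post
x <- 8.5:32.5
y <- c(NA, 5.88, 6.95, 7.2, 7.66, 8.02, 8.44, 9.06, 9.65, 10.22,
       10.63, 11.06, 11.37, 11.91, 12.28, 12.69, 13.07, 13.5, 13.3, 14.14,
       NA, NA, NA, NA, NA)

# Keep complete cases so that every fit uses the same observations
ok <- !is.na(y)
d  <- data.frame(x = x[ok], y = y[ok])

# The fit from the original post (log() is the natural log by default)
mod_nls <- nls(y ~ a + b * log(x), data = d, start = list(a = 1, b = 1))

rss <- sum(residuals(mod_nls)^2)     # residual sum of squares of the fitted model
tss <- sum((d$y - mean(d$y))^2)      # residual sum of squares of the null model y ~ 1
r2_manual <- 1 - rss / tss

# This model is linear in a and b, so the value should agree with lm()
r2_lm <- summary(lm(y ~ log(x), data = d))$r.squared
all.equal(r2_manual, r2_lm, tolerance = 1e-6)
```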
More on this, e.g. almost 7 years ago on R-help:
https://stat.ethz.ch/pipermail/r-help/2002-July/023461.html
Martin
Dear Colleagues:

Martin's reply provides an appropriate response, so nothing to add. But my questions dig deeper: Why do so many (presumably nonstatisticians, but ?) belong to this R^2 religion? Is it because:

1) This is what they are taught in their Stat 101 courses by statisticians?
2) ... by "pseudo"statisticians in their own professions (no disrespect intended here -- just want to make a clear distinction)?
3) It's the prevailing culture of their discipline (journal requirements, part of their standard texts, etc.)?
4) What all "standard" statistical textbooks say?
5) ... ?

Also, if one believes this religious practice is counterproductive, how would one go about changing it?

FEEL FREE TO REPLY OFF-LIST, AS IT IS PROBABLY INAPPROPRIATE TO WASTE R-HELP BANDWIDTH ON THIS. ALSO FEEL FREE TO REFER ME INSTEAD TO ANOTHER DISCUSSION SITE (E.G. ON STATISTICAL TEACHING) WHERE THIS HORSE MAY HAVE ALREADY BEEN FLAYED.

Thanks.

Bert Gunter
Genentech Nonclinical Biostatistics
Bert Gunter <gunter.berton <at> gene.com> writes:
Martin's reply provides an appropriate response, so nothing to add. But my questions dig deeper: Why do so many (presumably nonstatisticians, but ?) belong to this R^2 religion? Is it because: 1) This is what they are taught in their Stat 101 courses by statisticians? 2) ... by "pseudo"statisticians in their own professions (no disrespect intended here -- just want to make a clear distinction)? 3) It's the prevailing culture of their discipline (journal requirements, part of their standard texts, etc.)?
Good point. Speaking from a clinical perspective: It is because many journals (British are the exception) ask medical reviewers to do the statistical reviewing within 5 minutes. They use the following formula to assess the quality of the paper (weights may vary):

q(paper) = 10*n(pvalues) + 5*n(R^2) + 3.5*n(Error Bars)

Values above 300 qualify for immediate acceptance, and Journals like Lancet, New English and British Journal of XXX provide professional advice.

The first two are well known, the last is my special combat area. Glucose values measured every 2 minutes look like lice-comb, and nobody cares about the meaning.

Dieter
A very good reply, and the quality formula is probably too close to the truth to be funny. Some of the answers given to the people who petition the list for help seem to loftily ignore the fact that the petitioners are more concerned with getting their paper accepted or their salary paid or their dinner cooked than with the opinions of those not so motivated about the existential significance of R^2. They may, like your humble correspondent, be well aware of the failings of R^2, but be unable to conduct a just and noble campaign against the editor, boss or chef who demands it.

I have just returned from a meeting in which the chief investigator was demanding more "user friendliness", despite the fact that this clever marketing ploy had turned much of her previous data to random numbers. I dunno, it beats me.

Jim
Dieter Menne <dieter.menne <at> menne-biomed.de> writes:
q(paper) = 10* n(pvalues) + 5*n(R^2) + 3.5*n(Error Bars) Values above 300 qualify for immediate acceptance, and Journals like Lancet, New English and British Journal of XXX provide professional advice.
I noted the "and" was misleading. Read:

Good journals like Lancet, New English and many British Journal of XXX really help you to do better.

Dieter