correlation coefficient

6 messages · Benedikt Niesterok, Martin Maechler, Bert Gunter +2 more

#
Hello,
I would like to get a correlation coefficient (R-squared) for my model.
I don't know how to calculate it in R.
What I've done so far:

x<-8.5:32.5   # vector x
y<-c(NA ,5.88 , 6.95  , 7.2 , 7.66 , 8.02 , 8.44 , 9.06,  9.65, 10.22 ,
10.63 ,11.06, 11.37, 11.91 ,12.28, 12.69 ,13.07 , 13.5 , 13.3 ,14.14  ,  NA  ,  NA ,   NA  ,  NA  ,  NA) # vector y
plot(y~x,col="green",pch=16,ylim=c(0,20),xlim=c(0,50))

(mod1<-nls(y~a+b*log(x,base=exp(1)),start=list(a=1,b=1),trace=TRUE))
xx<-seq(min(x),max(x),length=100)
yy<-6.2456*log(xx)-7.7822
lines(xx,yy,col="blue1")
summary(mod1)

This way I don't get R-squared like I do using the command "lm" for linear
models.
Would appreciate your help,

Benedikt N.
--
#
BN> Hello,
    BN> I would like to get a correlation coefficient (R-squared) for my model.

{{ arrrgh... how many people think they "need" an R^2 when they
   fit a model ?? }}

    BN> I don't know how to calculate it in R.
    BN> What I've done so far:

    BN> x<-8.5:32.5   # vector x
    BN> y<-c(NA ,5.88 , 6.95  , 7.2 , 7.66 , 8.02 , 8.44 , 9.06,  9.65, 10.22 ,
    BN> 10.63 ,11.06, 11.37, 11.91 ,12.28, 12.69 ,13.07 , 13.5 , 13.3 ,14.14  ,  NA  ,  NA ,   NA  ,  NA  ,  NA) # vector y
    BN> plot(y~x,col="green",pch=16,ylim=c(0,20),xlim=c(0,50))

    BN> (mod1<-nls(y~a+b*log(x,base=exp(1)),start=list(a=1,b=1),trace=TRUE))

This is a very *LINEAR* model.
Why don't you use  lm()?

Then you'd even get your beloved R-squared ...
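
A minimal sketch of that suggestion, assuming the data from the original
post. The model y = a + b*log(x) is linear in a and b, so lm() can fit it
directly. The object names fit_lm, fit_nls and r2_lm are illustrative only
and do not appear in the thread.

```r
# Data as posted; NA entries are dropped by the default na.action (na.omit).
x <- 8.5:32.5
y <- c(NA, 5.88, 6.95, 7.2, 7.66, 8.02, 8.44, 9.06, 9.65, 10.22,
       10.63, 11.06, 11.37, 11.91, 12.28, 12.69, 13.07, 13.5, 13.3, 14.14,
       NA, NA, NA, NA, NA)

# Same model fitted two ways: linear least squares and nls().
fit_lm  <- lm(y ~ log(x))
fit_nls <- nls(y ~ a + b * log(x), start = list(a = 1, b = 1))

# Both fits minimise the same sum of squares, so the estimates should agree.
print(coef(fit_lm))
print(coef(fit_nls))

# lm() reports R-squared through its summary.
r2_lm <- summary(fit_lm)$r.squared
print(r2_lm)
```

Since both calls minimise the same residual sum of squares, the coefficients
should match up to the convergence tolerance of nls().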

    BN> xx<-seq(min(x),max(x),length=100)
    BN> yy<-6.2456*log(xx)-7.7822
    BN> lines(xx,yy,col="blue1")
    BN> summary(mod1)

    BN> This way I don't get R-squared like I do using the command "lm" for linear
    BN> models.

In general,  R^2 is *NOT* easily defined for non-linear models.
R^2 is only defined if you have a nested sub-model, aka "null-model". 
For linear models (*WITH* an intercept (!)), the sub-model is
naturally  y ~ 1.
For general nonlinear models, the only simple sub-model is  
'y ~ 0' which is often ridiculous to take as null-model, and
hence not taken by default.
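
To illustrate the dependence on the null-model, here is a rough sketch that
computes 1 - RSS(model)/RSS(null) for the posted data against both
candidates, y ~ 1 and y ~ 0. The helper names (xo, yo, rss, r2_mean,
r2_zero) are hypothetical and chosen only for this example.

```r
x <- 8.5:32.5
y <- c(NA, 5.88, 6.95, 7.2, 7.66, 8.02, 8.44, 9.06, 9.65, 10.22,
       10.63, 11.06, 11.37, 11.91, 12.28, 12.69, 13.07, 13.5, 13.3, 14.14,
       NA, NA, NA, NA, NA)

# Keep only the complete observations so every sum uses the same points.
ok <- !is.na(y)
xo <- x[ok]
yo <- y[ok]

fit <- nls(yo ~ a + b * log(xo), start = list(a = 1, b = 1))
rss <- sum(residuals(fit)^2)        # residual sum of squares of the model

rss_mean <- sum((yo - mean(yo))^2)  # RSS of the null-model y ~ 1
rss_zero <- sum(yo^2)               # RSS of the null-model y ~ 0

r2_mean <- 1 - rss / rss_mean       # the usual R^2 (null-model with intercept)
r2_zero <- 1 - rss / rss_zero       # "R^2" relative to y ~ 0

print(c(null_mean = r2_mean, null_zero = r2_zero))
```

The value against y ~ 0 is necessarily at least as large as the one against
y ~ 1, because sum(yo^2) is never smaller than sum((yo - mean(yo))^2). The
number reported therefore depends on which null-model is taken.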

More on this, e.g. almost 7 years ago on R-help:

  https://stat.ethz.ch/pipermail/r-help/2002-July/023461.html

Martin
#
Dear Colleagues:

Martin's reply provides an appropriate response, so nothing to add. But my
questions dig deeper: Why do so many (presumably nonstatisticians, but ?)
belong to this R^2 religion? Is it because:

1) This is what they are taught in their Stat 101 courses by statisticians?
2) ... by "pseudo"statisticians in their own professions (no disrespect
intended here -- just want to make a clear distinction)?
3) It's the prevailing culture of their discipline (journal requirements,
part of their standard texts, etc.)?
4) What all "standard" statistical textbooks say?
5) ... ?

Also, if one believes this religious practice is counterproductive, how
would one go about changing it?

FEEL FREE TO REPLY OFF-LIST, AS IT IS PROBABLY INAPPROPRIATE TO WASTE R-HELP
BANDWIDTH ON THIS. ALSO FEEL FREE TO REFER ME INSTEAD TO ANOTHER DISCUSSION
SITE (E.G. ON STATISTICAL TEACHING) WHERE THIS HORSE MAY HAVE ALREADY BEEN
FLAYED.

Thanks.

Bert Gunter
Genentech Nonclinical Biostatistics
#
Bert Gunter <gunter.berton <at> gene.com> writes:
Good point. Speaking from a clinical perspective: It is because many 
journals (British are the exception) ask medical reviewers to do the
statistical reviewing within 5 minutes. They use the following formula 
to assess the quality of the paper (weights may vary):

q(paper) = 10* n(pvalues) + 5*n(R^2) + 3.5*n(Error Bars)

Values above 300 qualify for immediate acceptance, and Journals
like Lancet, New English and British Journal of XXX provide
professional advice.

The first two are well known; the last is my special combat area.
Glucose values measured every 2 minutes look like a lice comb, and nobody
cares about the meaning.

Dieter
#
Dieter Menne wrote:
A very good reply, and the quality formula is probably too close to the 
truth to be funny. Some of the answers given to the people who petition 
the list for help seem to loftily ignore the fact that the petitioners 
are more concerned with getting their paper accepted or their salary 
paid or their dinner cooked than with the opinions of those not so 
motivated about the existential significance of R^2. They may, like your 
humble correspondent, be well aware of the failings of R^2, but be 
unable to conduct a just and noble campaign against the editor, boss or 
chef who demands it. I have just returned from a meeting in which the 
chief investigator was demanding more "user friendliness", despite the 
fact that this clever marketing ploy had turned much of her previous 
data to random numbers. I dunno, it beats me.

Jim
#
Dieter Menne <dieter.menne <at> menne-biomed.de> writes:
I noted the "and" was misleading. Read: Good journals like Lancet, 
New English and many British Journal of XXX really help you to do 
better.

Dieter