
Correlation question

4 messages · Jonathan Thayn, Kehl Dániel, David L Carlson

#
I recently compared two different approaches to calculating the correlation of two variables, and I cannot explain the different results: 

data(cars)
model <- lm(dist~speed,data=cars)
coef(model)                         # OLS intercept and slope
fitted.right <- fitted(model)       # OLS fitted values
fitted.wrong <- -17+5*cars$speed    # fitted values from hand-picked parameters


When using the OLS fitted values, the lines below all return the same R2 value:

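# 1 - SSE/SST
1-sum((cars$dist-fitted.right)^2)/sum((cars$dist-mean(cars$dist))^2)
# squared Pearson correlation
cor(cars$dist,fitted.right)^2
# squared Pearson correlation written out; 49 = n - 1 for the 50 rows of cars
(sum((cars$dist-mean(cars$dist))*(fitted.right-mean(fitted.right)))/(49*sd(cars$dist)*sd(fitted.right)))^2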
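For reference, here is a minimal sketch in base R (using the built-in cars data) that wraps the three expressions above in one function and checks them against the R-squared that summary() reports for the lm fit. The helper name r2_three_ways is hypothetical, and the hard-coded 49 is replaced by n - 1, which is what it stands for with the 50 rows of cars.

```r
data(cars)
model <- lm(dist ~ speed, data = cars)

# Hypothetical helper: the three expressions from the post, for any fitted values
r2_three_ways <- function(y, yhat) {
  n <- length(y)
  c(
    # 1 - SSE/SST: coefficient of determination of these predictions
    one_minus_sse_sst = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2),
    # squared Pearson correlation via cor()
    cor_squared = cor(y, yhat)^2,
    # squared Pearson correlation written out; n - 1 replaces the hard-coded 49
    manual_cor_squared = (sum((y - mean(y)) * (yhat - mean(yhat))) /
                            ((n - 1) * sd(y) * sd(yhat)))^2
  )
}

ols <- r2_three_ways(cars$dist, fitted(model))
print(ols)

# For the OLS fit (with intercept) all three agree with summary()'s R-squared
all.equal(unname(ols), rep(summary(model)$r.squared, 3))
```

Calling r2_three_ways(cars$dist, -17 + 5 * cars$speed) reproduces the pattern described next: only the first entry changes.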


However, when I use my estimated parameters to find the fitted values, "fitted.wrong", the first equation returns a much lower R2 value, which I would expect since the fit is worse, but the other lines return the same R2 that I get when using the OLS fitted values.

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)
cor(x=cars$dist,y=fitted.wrong)^2
(sum((cars$dist-mean(cars$dist))*(fitted.wrong-mean(fitted.wrong)))/(49*sd(cars$dist)*sd(fitted.wrong)))^2


I'm sure I'm missing something simple, but can someone explain the difference between these two methods of finding R2? Thanks.

Jon
#
Hi,

try

cor(fitted.right,fitted.wrong)

should give 1, as both are linear functions of speed with positive slopes! Hence cor(cars$dist,fitted.right)^2 and cor(x=cars$dist,y=fitted.wrong)^2 must be the same.
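A short base-R sketch of this point: Pearson correlation is unchanged by adding a constant or multiplying by a positive number, and a negative multiplier only flips the sign, so every line a + b*speed has the same squared correlation with dist. The intercepts and slopes below are arbitrary illustrative values, not taken from the thread.

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)
fitted.wrong <- -17 + 5 * cars$speed

cor(fitted.right, fitted.wrong)   # should print 1: both are increasing lines in speed

# Arbitrary illustrative intercepts and slopes (one slope negative)
intercepts <- c(-17, 40, 100)
slopes     <- c(5, 0.1, -2)

# Squared correlation of dist with each line a + b * speed
r2 <- mapply(function(a, b) cor(cars$dist, a + b * cars$speed)^2,
             intercepts, slopes)
print(r2)

# All equal to the squared correlation of dist with speed itself
all.equal(r2, rep(cor(cars$dist, cars$speed)^2, 3))
```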

HTH
d
________________________________________
From: R-help [r-help-bounces at r-project.org] on behalf of Jonathan Thayn [jthayn at ilstu.edu]
Sent: February 21, 2015 22:42
To: r-help at r-project.org
Subject: [R] Correlation question


______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
Of course! Thank you, I knew I was missing something painfully obvious. It seems, then, that this line

1-sum((cars$dist-fitted.wrong)^2)/sum((cars$dist-mean(cars$dist))^2)

is finding something other than the traditional correlation. I found this in a lecture introducing correlation, but now I'm not sure what it is. It does do a better job of showing that the fitted.wrong variable is not a good prediction of the distance.
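What that line computes is the coefficient of determination of a particular set of predictions, 1 - SSE/SST. A sketch of why it only matches cor()^2 for the OLS line, assuming a fit with an intercept: the OLS residuals are orthogonal to the constant and to speed, so for any line yhat = a + b*speed the cross term vanishes and

SSE(yhat) = SSE(OLS) + sum((fitted.right - yhat)^2)

Dividing by SST gives 1 - SSE(yhat)/SST = R2(OLS) - sum((fitted.right - yhat)^2)/SST, so the value drops by how far the line sits from the least-squares line. The base-R check below verifies this numerically for fitted.wrong.

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
fitted.right <- fitted(model)
fitted.wrong <- -17 + 5 * cars$speed
y <- cars$dist

sst       <- sum((y - mean(y))^2)               # total sum of squares
sse_ols   <- sum((y - fitted.right)^2)          # error of the OLS line
sse_wrong <- sum((y - fitted.wrong)^2)          # error of the hand-picked line
gap       <- sum((fitted.right - fitted.wrong)^2)  # squared distance between the lines

# Pythagorean split: the cross term is zero because OLS residuals
# are orthogonal to every linear function of speed
all.equal(sse_wrong, sse_ols + gap)

# Hence 1 - SSE/SST for fitted.wrong = OLS R-squared - gap / SST
r2_wrong <- 1 - sse_wrong / sst
all.equal(r2_wrong, summary(model)$r.squared - gap / sst)
```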
On Feb 21, 2015, at 4:36 PM, Kehl Dániel wrote:

            
#
As Kehl pointed out, any linear function of the independent variable (speed) will have the same squared correlation with the dependent variable (dist), but only one linear function, the OLS fit, minimizes the sum of squared deviations between the fitted values and the observed values. The equation you are using, 1 - SSE/SST, equals the squared correlation only for that function, not for any of the others. In fact, for some linear functions it will produce negative values:
             fitted.new fitted.right fitted.wrong
fitted.new    1.0000000    1.0000000    1.0000000 0.8068949
fitted.right  1.0000000    1.0000000    1.0000000 0.8068949
fitted.wrong  1.0000000    1.0000000    1.0000000 0.8068949
              0.8068949    0.8068949    0.8068949 1.0000000
[1] -3.281849
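The definition of fitted.new was not included in the post, so the sketch below uses a hypothetical line, fitted.bad, chosen only to illustrate the effect: its squared correlation with dist is the same as for the OLS fit, but 1 - SSE/SST is negative because the line predicts dist worse than the constant mean(dist) would (SSE > SST).

```r
data(cars)
model <- lm(dist ~ speed, data = cars)
y <- cars$dist

# Hypothetical line for illustration only (not the thread's fitted.new):
# same slope as fitted.wrong but a much larger intercept
fitted.bad <- 100 + 5 * cars$speed

r2_formula <- 1 - sum((y - fitted.bad)^2) / sum((y - mean(y))^2)
r2_formula               # negative: SSE exceeds SST
cor(y, fitted.bad)^2     # same as the OLS R-squared
```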

David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Jonathan Thayn
Sent: Sunday, February 22, 2015 12:01 AM
To: Kehl Dániel
Cc: r-help at r-project.org
Subject: Re: [R] Correlation question
