Consider this code fragment:

---------------------------------------------------------------------------
set.seed(42)
x <- runif(20)
y <- 2 + 3*x + rnorm(20)
m1 <- lm(y ~ x)
m2 <- lm(y ~ -1 + x)
summary(m1)
summary(m2)
cor(y, fitted.values(m1))^2
cor(y, fitted.values(m2))^2
---------------------------------------------------------------------------

m1 is the true model and all is well. m2 is a false model: the intercept is truly 2, but it has been omitted.

summary() reports an R2 of 0.4953 for m1 and 0.8983 for m2. I am aware that the standard formulas for R2 run into difficulties when there is no intercept, so the much higher R2 of m2 (even though it is the wrong model) probably flows from that.

What surprised me is that the two squared correlations (between y and the fitted values of m1, and between y and the fitted values of m2) are identical. I am unable to understand how this could be, since the estimated coefficient of x is quite different between the two models. There must be an interesting theoretical angle to this.

I would greatly appreciate some help in understanding this and, more generally, in interpreting the R2 of regressions where the intercept is absent.
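For anyone who wants to poke at the numbers, here is a minimal sketch of the quantities involved. It assumes, going by the documentation of summary.lm, that R2 is computed against the total sum of squares around mean(y) when an intercept is present and around zero when it is not. The names rss1, rss2, r2_m1 and r2_m2 are my own and only illustrative.

```r
set.seed(42)
x <- runif(20)
y <- 2 + 3*x + rnorm(20)
m1 <- lm(y ~ x)
m2 <- lm(y ~ -1 + x)

# Residual sums of squares of the two fits
rss1 <- sum(residuals(m1)^2)
rss2 <- sum(residuals(m2)^2)

# With an intercept: total SS is measured around mean(y)
r2_m1 <- 1 - rss1 / sum((y - mean(y))^2)
# Without an intercept: total SS is measured around zero
r2_m2 <- 1 - rss2 / sum(y^2)

# Both should agree with what summary() reports
all.equal(r2_m1, summary(m1)$r.squared)
all.equal(r2_m2, summary(m2)$r.squared)

# The fitted values are a + b*x for m1 and b0*x for m2, i.e. both are
# linear functions of x, so their squared correlation with y is cor(y, x)^2
all.equal(cor(y, fitted(m1))^2, cor(y, x)^2)
all.equal(cor(y, fitted(m2))^2, cor(y, x)^2)
```

If these checks hold, the two R2 values are measured against different baselines and so are not directly comparable, while the squared correlation depends only on x and not on the estimated coefficients.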
Ajay Shah                                      http://www.mayin.org/ajayshah
ajayshah at mayin.org                          http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.