lm without intercept
Hi,

R actually uses a different formula for calculating R^2 depending on
whether the intercept is in the model or not. You may also find this
discussion helpful:
http://stats.stackexchange.com/questions/7948/when-is-it-ok-to-remove-the-intercept-in-lm/

If you conceptualize R^2 as the squared correlation between the observed
and fitted values, it is easy to get:

summary(m0 <- lm(mpg ~ 0 + disp, data = mtcars))
summary(m1 <- lm(mpg ~ disp, data = mtcars))
cor(mtcars$mpg, fitted(m0))^2
cor(mtcars$mpg, fitted(m1))^2

but that is not how R calculates R^2.

Cheers,

Josh
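The difference is easiest to see on a tiny made-up dataset (the numbers below are illustrative, not from this thread). With an intercept, R compares the residual sum of squares to the variation around mean(y); without one, it compares it to the raw (uncentered) sum of squares, i.e. variation around zero:

```r
# Tiny illustrative dataset: y = 1 + x exactly, so the model with an
# intercept fits perfectly while the through-origin model does not.
x <- c(1, 2, 3)
y <- c(2, 3, 4)

m1 <- lm(y ~ x)      # with intercept
m0 <- lm(y ~ 0 + x)  # through the origin

# With an intercept, R uses the centered total sum of squares:
r2_m1 <- 1 - sum(resid(m1)^2) / sum((y - mean(y))^2)  # 1: perfect fit

# Without an intercept, R uses the *uncentered* total sum of squares:
r2_m0 <- 1 - sum(resid(m0)^2) / sum(y^2)  # ~0.985 despite the worse fit
```

Because sum(y^2) is always at least as large as the centered sum of squares, the no-intercept R^2 can look excellent even when the fit is visibly worse.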
On Sat, Jul 28, 2012 at 10:40 AM, citynorman <citynorman at hotmail.com> wrote:
I've just picked up R (been using Matlab, Eviews etc) and I'm having the same
issue. Running reg=lm(ticker1~ticker2) gives R^2=50% while running
reg=lm(ticker1~0+ticker2) gives R^2=99%!! The charts suggest the fit is
worse not better and indeed Eviews/Excel/Matlab all say R^2=15% with
intercept=0. How come R calculates a totally different value?!
Call:
lm(formula = ticker1 ~ ticker2)
Residuals:
Min 1Q Median 3Q Max
-0.22441 -0.03380 0.01099 0.04891 0.16688
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.57062 0.08187 19.18 <2e-16 ***
ticker2 0.61722 0.02699 22.87 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.07754 on 530 degrees of freedom
Multiple R-squared: 0.4967, Adjusted R-squared: 0.4958
F-statistic: 523.1 on 1 and 530 DF, p-value: < 2.2e-16
Call:
lm(formula = ticker1 ~ 0 + ticker2)
Residuals:
Min 1Q Median 3Q Max
-0.270785 -0.069280 -0.007945 0.087340 0.268786
Coefficients:
Estimate Std. Error t value Pr(>|t|)
ticker2 1.134508 0.001441 787.2 <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1008 on 531 degrees of freedom
Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
F-statistic: 6.197e+05 on 1 and 531 DF, p-value: < 2.2e-16
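A plausible source of the cross-package difference (an assumption about what Eviews/Excel/Matlab do, not something verified against their documentation): those packages appear to keep the mean-centered total sum of squares even when the intercept is suppressed, whereas R switches to the uncentered one. The gap can be reproduced with the built-in mtcars data:

```r
# Through-the-origin fit on a built-in dataset.
m0  <- lm(mpg ~ 0 + disp, data = mtcars)
rss <- sum(resid(m0)^2)
y   <- mtcars$mpg

# R's definition for a no-intercept model (what summary(m0) reports):
r2_uncentered <- 1 - rss / sum(y^2)

# Centered version, as packages that keep the mean-based total sum of
# squares would report it (this one can even be negative):
r2_centered <- 1 - rss / sum((y - mean(y))^2)

r2_uncentered  # large, matches summary(m0)$r.squared
r2_centered    # much smaller
```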
Jan wrote:
Hi, thanks for your help. I'm beginning to understand things better.
> If you plotted your data, you would realize that whether you fit the
> 'best' least squares model or one with a zero intercept, the fit is not
> going to be very good. Do the data cluster tightly around the dashed line?
No, and that is why I asked the question. The plotted fit doesn't look
any better with or without the intercept, so I was surprised that the
R^2 value etc. indicated an excellent regression (which I now understand
is the wrong interpretation).
One of the references you googled suggests that intercepts should never
be omitted. Is this true even if I know that the physical reality behind
the numbers suggests an intercept of zero?
Thanks,
Jan
______________________________________________ R-help@ mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/