I've just picked up R (I've been using Matlab, Eviews, etc.) and I'm having
the same issue. Running reg=lm(ticker1~ticker2) gives R^2=50%, while running
reg=lm(ticker1~0+ticker2) gives R^2=99%! The charts suggest the fit is
worse, not better, and indeed Eviews/Excel/Matlab all say R^2=15% with
intercept=0. Why does R calculate a totally different value?
Call:
lm(formula = ticker1 ~ ticker2)

Residuals:
     Min       1Q   Median       3Q      Max
-0.22441 -0.03380  0.01099  0.04891  0.16688

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.57062    0.08187   19.18   <2e-16 ***
ticker2      0.61722    0.02699   22.87   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07754 on 530 degrees of freedom
Multiple R-squared: 0.4967, Adjusted R-squared: 0.4958
F-statistic: 523.1 on 1 and 530 DF, p-value: < 2.2e-16
Call:
lm(formula = ticker1 ~ 0 + ticker2)

Residuals:
      Min        1Q    Median        3Q       Max
-0.270785 -0.069280 -0.007945  0.087340  0.268786

Coefficients:
        Estimate Std. Error t value Pr(>|t|)
ticker2 1.134508   0.001441   787.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1008 on 531 degrees of freedom
Multiple R-squared: 0.9991, Adjusted R-squared: 0.9991
F-statistic: 6.197e+05 on 1 and 531 DF, p-value: < 2.2e-16
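
For what it's worth, the gap between 99% and 15% looks consistent with two different baselines for R^2. As documented for summary.lm, R compares the residuals against y = 0 when the model has no intercept, and against y = mean(y) otherwise. The sketch below first redoes the arithmetic from the rounded figures printed above (so it is only approximate), then checks the same identity on simulated data. The simulated x and y are made up for illustration and are not the ticker series; that the other packages use the centered definition is an assumption, not something confirmed here.

```r
# Part 1: back out a "centered" R^2 for the no-intercept fit from the
# rounded numbers in the two summaries above (approximate by construction).
rss_with_int <- 0.07754^2 * 530            # residual SE^2 * df, intercept model
rss_no_int   <- 0.1008^2 * 531             # residual SE^2 * df, no-intercept model
tss_centered <- rss_with_int / (1 - 0.4967) # from R^2 = 1 - RSS/TSS
r2_centered_from_output <- 1 - rss_no_int / tss_centered
print(r2_centered_from_output)             # roughly 0.15

# Part 2: the same effect on simulated data (hypothetical, not the tickers).
set.seed(1)
x <- rnorm(200, mean = 3, sd = 0.1)
y <- 1.5 + 0.6 * x + rnorm(200, sd = 0.08)

fit0 <- lm(y ~ 0 + x)                      # regression through the origin
rss  <- sum(residuals(fit0)^2)

r2_uncentered <- 1 - rss / sum(y^2)             # baseline: y = 0
r2_centered   <- 1 - rss / sum((y - mean(y))^2) # baseline: y = mean(y)
r2_reported   <- summary(fit0)$r.squared        # what R prints

print(all.equal(r2_reported, r2_uncentered))
print(c(uncentered = r2_uncentered, centered = r2_centered))
```

Because sum(y^2) is much larger than sum((y - mean(y))^2) when the data sit far from zero, the uncentered figure can be close to 1 even when the line through the origin fits poorly, so the two R^2 values are not comparable across models with and without an intercept.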
Jan private wrote:
Hi,
thanks for your help. I'm beginning to understand things better.
> If you plotted your data, you would realize that whether you fit the
> 'best' least squares model or one with a zero intercept, the fit is
> not going to be very good
> Do the data cluster tightly around the dashed line?
No, and that is why I asked the question. The plotted fit doesn't look
any better with or without intercept, so I was surprised that the
R-value etc. indicated an excellent regression (which I now understand
is the wrong interpretation).
One of the references you googled suggests that intercepts should never
be omitted. Is this true even if I know that the physical reality behind
the numbers suggests an intercept of zero?
Thanks,
Jan