two methods for regression, two different results
2 messages · John Sorkin, Jari Oksanen
On Tue, 2005-04-05 at 22:54 -0400, John Sorkin wrote:
Please forgive a straight stats question, and the informal notation. Let us say we wish to perform a linear regression: y = b0 + b1*x + b2*z. There are two ways this can be done. The usual way is a single regression, fit1 <- lm(y ~ x + z). The other is to do two regressions. In the first regression, y is the dependent variable and x the independent variable: fit2 <- lm(y ~ x). In the second regression, the residuals from the first regression are the dependent variable and z is the independent variable: fit3 <- lm(fit2$residuals ~ z). I would think the two methods would give the same p value and the same beta coefficient for z. They don't.
Can someone help me understand why the two methods do not give the same results? Additionally, could someone tell me when one method might be better than the other, i.e. what question does the first method answer, and what question does the second method answer? I have searched a number of textbooks and have not found this question addressed.
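The two procedures in the question can be put side by side in a small R sketch. It assumes simulated data built the same way as in the reply below; the seed and the names `one_step`, `step1`, `step2`, `b_full`, `b_naive` and `r2_zx` are made up for illustration. It also checks an algebraic identity that accounts for the discrepancy: the slope from the second-stage regression equals the multiple-regression slope for z multiplied by (1 - R^2), where R^2 comes from regressing z on x. The two agree only when that R^2 is zero, i.e. when z is uncorrelated with x.

```r
# Assumed toy data (illustration only): z is deliberately correlated with x.
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

# Method 1: a single multiple regression
one_step <- lm(y ~ x + z)

# Method 2: regress y on x, then regress those residuals on z alone
step1 <- lm(y ~ x)
step2 <- lm(residuals(step1) ~ z)

b_full  <- unname(coef(one_step)["z"])
b_naive <- unname(coef(step2)["z"])

# Share of the variation in z that x already accounts for
r2_zx <- summary(lm(z ~ x))$r.squared

# The two-step slope is the full slope shrunk by the factor (1 - r2_zx)
print(c(full = b_full, two_step = b_naive,
        full_times_1_minus_r2 = b_full * (1 - r2_zx)))
```

Because 1 - R^2 lies between 0 and 1, the two-step estimate is pulled toward zero whenever x and z are correlated, whatever the sign of that correlation.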
John, Bill Venables already told you that the two methods don't give the same result, because x and z are not orthogonal. Here is a simpler way of getting the same result as he suggested for the coefficient of z (but only for z):
x <- runif(100)
z <- x + rnorm(100, sd=0.4)
y <- 3 + x + z + rnorm(100, sd=0.3)
mod <- lm(y ~ x + z)
mod2 <- lm(residuals(lm(y ~ x)) ~ x + z)
summary(mod)
Call:
lm(formula = y ~ x + z)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.96436 0.06070 48.836 < 2e-16 ***
x 0.96272 0.11576 8.317 5.67e-13 ***
z 1.08922 0.06711 16.229 < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom
summary(mod2)
Call:
lm(formula = residuals(lm(y ~ x)) ~ x + z)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.15731 0.06070 -2.592 0.0110 *
x -0.84459 0.11576 -7.296 8.13e-11 ***
z 1.08922 0.06711 16.229 < 2e-16 ***
---
Residual standard error: 0.2978 on 97 degrees of freedom
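Another way to get the z coefficient without keeping x in the outer model is to partial x out of z as well as out of y, and then regress one set of residuals on the other. This is the Frisch-Waugh-Lovell result; Venables' message is not part of this thread, so whether he phrased his suggestion this way is an assumption. A minimal R sketch, with an arbitrary seed and the illustrative names `ry`, `rz`, `fwl`, `b_mod` and `b_fwl`:

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

mod <- lm(y ~ x + z)

# Remove the linear effect of x from BOTH y and z
ry <- residuals(lm(y ~ x))
rz <- residuals(lm(z ~ x))

# Regress the y-residuals on the z-residuals; x is no longer needed
fwl <- lm(ry ~ rz)

b_mod <- unname(coef(mod)["z"])
b_fwl <- unname(coef(fwl)["rz"])
print(c(mod = b_mod, fwl = b_fwl))   # same estimate up to rounding
```

The residuals of `fwl` also match those of `mod`, but the standard error reported for `rz` is slightly smaller than the one for z in `mod`, because `lm(ry ~ rz)` assumes n - 2 residual degrees of freedom rather than n - 3.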
You can omit x from the outer lm() only if x and z are orthogonal,
even though you have already removed the effect of x from y. In the
orthogonal case the coefficient for x in mod2 would be 0.
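The orthogonal case can be checked by construction. In this sketch (arbitrary seed; `z0` and `naive` are illustrative names), z is replaced by its residuals from a regression on x, which makes it exactly orthogonal to x and to the intercept. The x coefficient in `mod2` then vanishes, and the original two-step fit, which leaves x out entirely, recovers the z coefficient of the full model.

```r
set.seed(2)                                  # arbitrary seed
x  <- runif(100)
z0 <- x + rnorm(100, sd = 0.4)
z  <- residuals(lm(z0 ~ x))                  # orthogonal to x and the intercept by construction
y  <- 3 + x + z + rnorm(100, sd = 0.3)

mod   <- lm(y ~ x + z)                       # full model
mod2  <- lm(residuals(lm(y ~ x)) ~ x + z)    # outer lm() keeps x
naive <- lm(residuals(lm(y ~ x)) ~ z)        # outer lm() drops x

print(coef(mod2))                            # intercept and x are ~0 here
print(c(full = unname(coef(mod)["z"]),
        two_step = unname(coef(naive)["z"])))
```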
Residuals are equal in these two models:
range(residuals(mod) - residuals(mod2))
[1] -2.797242e-17  5.551115e-17
The differences are only floating-point rounding error. But, of course, the fitted values are not equal, since you fit mod2 to the residuals after removing the effect of x.
cheers, jari oksanen
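The closing remark about residuals and fitted values can be made exact. Least squares is linear in the response, so the coefficients of `mod2` are those of `mod` minus those of lm(y ~ x) (with 0 for z); the fitted values therefore differ by exactly the first-stage fitted values, and the residuals coincide. A short check, with an arbitrary seed and the illustrative name `fit_yx`:

```r
set.seed(1)                                  # arbitrary seed
x <- runif(100)
z <- x + rnorm(100, sd = 0.4)
y <- 3 + x + z + rnorm(100, sd = 0.3)

fit_yx <- lm(y ~ x)                          # first-stage fit
mod    <- lm(y ~ x + z)
mod2   <- lm(residuals(fit_yx) ~ x + z)

# Residuals coincide up to rounding
print(all.equal(unname(residuals(mod)), unname(residuals(mod2))))

# Fitted values differ by exactly the first-stage fitted values
print(all.equal(unname(fitted(mod) - fitted(mod2)), unname(fitted(fit_yx))))

# Coefficients differ by the first-stage coefficients (0 for z)
print(coef(mod) - coef(mod2))
```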
Jari Oksanen <jarioksa at sun3.oulu.fi>