If I do a scatterplot of data (x and y), there are two clouds of
points. One cloud is in the bottom-left corner of the plot and the
other cloud is in the upper right.
If I fit a regression line to these data (or, equivalently, calculate a
correlation), it will obviously seem that x and y are related, because
the line has to connect the two clouds. But some regression assumption
must be violated here, because if the regressions are done separately
on each cloud, there really isn't a relationship between x and y.
I was just wondering (1) which assumption of regression is being
violated in the first case, or (2) whether the regression is valid and
the results just have a different interpretation.
Thanks.
Mark
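A minimal R sketch of the situation described, assuming two equal-size clouds of independent normal points centred at (0, 0) and (5, 5) (the centres, sizes and seed are illustrative choices, not taken from the question). It compares the pooled correlation with the correlation inside each cloud, and checks that the pooled OLS slope equals r * sd(y) / sd(x), which is why fitting the line and calculating the correlation tell the same story.

```r
set.seed(1)
n <- 300
# Cloud membership: "low" = bottom-left cloud, "high" = upper-right cloud
cloud <- factor(rep(c("low", "high"), each = n), levels = c("low", "high"))
# Within each cloud, x and y are drawn independently (no relationship)
x <- c(rnorm(n, 0), rnorm(n, 5))
y <- c(rnorm(n, 0), rnorm(n, 5))

# Pooled correlation: driven by the gap between the two cloud centres
r_pooled <- cor(x, y)
# Correlation computed separately inside each cloud
r_within <- sapply(split(data.frame(x, y), cloud), function(d) cor(d$x, d$y))

# The pooled OLS slope is the pooled correlation rescaled by sd(y) / sd(x)
slope_pooled <- unname(coef(lm(y ~ x))["x"])
slope_from_r <- r_pooled * sd(y) / sd(x)

print(round(r_pooled, 2))
print(round(r_within, 2))
print(all.equal(slope_pooled, slope_from_r))
```

The pooled correlation comes out large and positive while both within-cloud correlations stay near zero.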
stat question - not R question so ignore if not interested
On Dec 5, 2006, at 3:42 PM, Leeds, Mark (IED) wrote the question above. Michael Kubovy replied:
One needs only to look at diagnostic plots:
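Suppose we simulate two such clouds, with no relationship between x and y inside either one:
set.seed(2)
# Two clouds of 300 points each, centred at (0, 0) and (5, 5)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
# Draw the four standard lm diagnostic plots in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
plot(lm(y ~ x, xy))
par(op)  # restore the previous graphics settings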
The model does not fit well: the residuals are not flat as a function
of the fitted values, and homoscedasticity is violated.
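The residuals-vs-fitted pattern can also be checked numerically. A minimal sketch, reusing the simulation above and adding a hypothetical `cloud` label that is known only because the data are simulated: the pooled fit leaves a systematic offset in each cloud, and within each cloud the residuals trend downward against the fitted values instead of scattering flatly around zero.

```r
set.seed(2)
# Same two-cloud simulation as above
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
# Hypothetical cloud label, known only because the data are simulated
xy$cloud <- factor(rep(c("low", "high"), each = 300), levels = c("low", "high"))

fit <- lm(y ~ x, data = xy)
res <- residuals(fit)
fv <- fitted(fit)

# Mean residual per cloud: a well-specified model would give roughly 0 in both
mean_res <- tapply(res, xy$cloud, mean)
# Correlation of residuals with fitted values inside each cloud:
# near 0 if the residuals were flat, clearly negative here
cor_res_fit <- sapply(levels(xy$cloud), function(k) {
  idx <- xy$cloud == k
  cor(res[idx], fv[idx])
})

print(round(mean_res, 2))
print(round(cor_res_fit, 2))
```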
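When this happens we might try a different approach, such as a
nonparametric regression from the sm package:
require(sm)  # sm is a contributed CRAN package: install.packages("sm")
xy.sm <- sm.regression(xy$x, xy$y)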
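A nonparametric smoother estimates the conditional mean E[y | x] without forcing it to be a straight line. For the simulation above (two equal-size clouds of independent unit-variance normals centred at (0, 0) and (5, 5)), that conditional mean can be worked out by Bayes' rule: E[y | x] = 5 * P(upper cloud | x), which is a smooth step rather than a line. The base-R sketch below (no sm needed) compares this exact curve, crude binned means of y, and the OLS line; the bin edges are arbitrary illustrative choices.

```r
set.seed(2)
# Same two-cloud simulation as above
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))

# Exact E[y | x] for this mixture: y has mean 0 in the lower cloud and 5 in
# the upper one, so E[y | x] = 5 * P(upper cloud | x) (equal prior weights)
cond_mean <- function(x) 5 * dnorm(x, 5) / (dnorm(x, 0) + dnorm(x, 5))

# Crude nonparametric estimate: mean of y within bins of x
bins <- cut(xy$x, breaks = c(-Inf, 1, 2, 3, 4, Inf))
binned_mean <- sapply(split(xy$y, bins), mean)
bin_x <- sapply(split(xy$x, bins), mean)

# Straight-line OLS fit evaluated at each bin's mean x
fit <- lm(y ~ x, data = xy)
ols_at_bins <- predict(fit, newdata = data.frame(x = bin_x))

# "exact" is the curve at each bin's mean x: a rough comparison only,
# since the curve is not linear within a bin
print(round(cbind(x = bin_x, binned = binned_mean,
                  exact = cond_mean(bin_x), ols = ols_at_bins), 2))
```

The binned means stay near 0 on the left, near 5 on the right and jump in between, while the OLS line climbs steadily across the whole range.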
Whenever there's a big discrepancy between an OLS fit and a robust
one, we should not pursue the OLS fit without reinterpretation, as
others have discussed in their replies.
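One possible reinterpretation, sketched under the assumption that the two clouds are distinct groups: recover the clouds from the data (here with k-means, a swapped-in technique that nobody in the thread proposed) and give each cloud its own intercept. The common within-cloud slope then comes out close to zero, matching the separate per-cloud regressions described in the question, while the pooled slope mostly reflects the distance between the cloud centres.

```r
set.seed(2)
# Same two-cloud simulation as above
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))

# Recover the two clouds; nstart > 1 guards against a poor random start
km <- kmeans(xy[, c("x", "y")], centers = 2, nstart = 10)
xy$cluster <- factor(km$cluster)

pooled  <- lm(y ~ x, data = xy)            # one line through both clouds
grouped <- lm(y ~ x + cluster, data = xy)  # separate intercept per cloud

slope_pooled <- unname(coef(pooled)["x"])
slope_within <- unname(coef(grouped)["x"])
print(round(c(pooled = slope_pooled, within = slope_within), 2))

# Separate regressions on each cloud, as described in the question
per_cloud <- sapply(split(xy, xy$cluster),
                    function(d) unname(coef(lm(y ~ x, data = d))["x"]))
print(round(per_cloud, 2))
```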
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS: P.O.Box 400400 Charlottesville, VA 22904-4400
Parcels: Room 102 Gilmer Hall
McCormick Road Charlottesville, VA 22903
Office: B011 +1-434-982-4729
Lab: B019 +1-434-982-4751
Fax: +1-434-982-4766
WWW: http://www.people.virginia.edu/~mk9y/