From: John Fox
Dear Andy,
At the risk of muddying the waters (and certainly without wanting to
advocate the use of normality tests for residuals), I believe that your
point #4 is subject to misinterpretation: That is, while it is true that
t- and F-tests for regression coefficients in large samples retain their
validity well when the errors are non-normal, the efficiency of the LS
estimates can (depending upon the nature of the non-normality) be
seriously compromised, not only absolutely but in relation to
alternatives (e.g., robust regression).
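
A minimal simulation sketch of this efficiency point, using only the
Python standard library. The choices here are illustrative assumptions
and not part of the original message: the errors are a contaminated
normal (90% N(0, 1), 10% N(0, 10^2)), and the Theil-Sen estimator
(median of pairwise slopes) stands in for "robust regression" as one
simple robust alternative. The sketch compares the sampling variance of
the two slope estimates over repeated data sets.

```python
import random
import statistics


def ls_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of all pairwise slopes (a simple robust slope estimator)."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n)]
    return statistics.median(slopes)


def contaminated_error(rng):
    """Assumed error model: N(0, 1) w.p. 0.9, N(0, 10^2) w.p. 0.1."""
    sd = 10.0 if rng.random() < 0.1 else 1.0
    return rng.gauss(0.0, sd)


def compare(n=30, reps=400, seed=1):
    """Return (variance of LS slope, variance of Theil-Sen slope)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    ls, ts = [], []
    for _ in range(reps):
        # True model (assumed for illustration): y = 1 + 0.5 * x + error
        y = [1.0 + 0.5 * xi + contaminated_error(rng) for xi in x]
        ls.append(ls_slope(x, y))
        ts.append(theil_sen_slope(x, y))
    return statistics.variance(ls), statistics.variance(ts)


var_ls, var_ts = compare()
print(f"variance of LS slope:        {var_ls:.5f}")
print(f"variance of Theil-Sen slope: {var_ts:.5f}")
```

Both estimators are centred on the true slope here; the comparison of
interest is only how much more variable the LS estimate is under this
particular heavy-tailed error model.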
Regards,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
Sent: Friday, October 15, 2004 11:55 AM
To: 'Federico Gherardini'; Berton Gunter
Cc: R-help mailing list
Subject: RE: [R] Testing for normality of residuals in a regression model
Let's see if I can get my stat 101 straight:
We learned that linear regression has a set of assumptions:
1. Linearity of the relationship between X and y.
2. Independence of errors.
3. Homoscedasticity (equal error variance).
4. Normality of errors.
Now, we should ask: Why are they needed? Can we get away
with less? What if some of them are not met?
It should be clear why we need #1.
Without #2, I believe the least squares estimator is still
unbiased, but the usual estimates of the SEs for the coefficients
are wrong, so the t-tests are wrong.
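
A small simulation sketch of the point about #2, using only the Python
standard library. The setup is an assumption made for illustration and
is not from the thread: a time-trend predictor, AR(1) errors with
rho = 0.7, a true slope of zero, and a hardcoded approximate critical
value for t with 98 df. It counts how often the usual t-test rejects at
the nominal 5% level with and without dependence.

```python
import math
import random

# Approximate two-sided 5% critical value of t with 98 df (hardcoded).
T_CRIT_98 = 1.984


def slope_t_stat(x, y):
    """t statistic for H0: slope = 0, using the usual OLS standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b / se


def ar1_errors(n, rho, rng):
    """Stationary AR(1) errors with unit innovation variance."""
    e = rng.gauss(0.0, 1.0 / math.sqrt(1.0 - rho ** 2))
    out = []
    for _ in range(n):
        out.append(e)
        e = rho * e + rng.gauss(0.0, 1.0)
    return out


def rejection_rate(rho, n=100, reps=2000, seed=2):
    """Share of simulated data sets where the nominal 5% t-test rejects."""
    rng = random.Random(seed)
    x = [float(i) for i in range(n)]
    rejections = 0
    for _ in range(reps):
        y = ar1_errors(n, rho, rng)  # true slope is 0
        if abs(slope_t_stat(x, y)) > T_CRIT_98:
            rejections += 1
    return rejections / reps


rate_indep = rejection_rate(rho=0.0)
rate_ar1 = rejection_rate(rho=0.7)
print(f"rejection rate, independent errors: {rate_indep:.3f}")
print(f"rejection rate, AR(1) errors:       {rate_ar1:.3f}")
```

With independent errors the rate should sit near the nominal 5%; with
positively autocorrelated errors the usual SE is too small and the test
rejects a true null far more often.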
Without #3, the coefficients are, again, still unbiased, but
not as efficient as can be. Interval estimates for the
prediction will surely be wrong.
Without #4, well, it depends. If the residual DF is
sufficiently large, the t-tests are still valid because of
the CLT. You do need normality if you have small residual DF.
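
One way to check this claim for a given situation is by simulation. The
sketch below uses only the Python standard library and rests on
illustrative assumptions that are not from the thread: centred
exponential (skewed) errors, an unevenly spaced design x = 1, 4, 9, ...,
a true slope of zero, and hardcoded approximate t critical values for
3 and 98 df. It estimates the actual rejection rate of the nominal 5%
t-test for the slope at a large and a very small sample size.

```python
import math
import random

# Approximate two-sided 5% critical values of t (hardcoded), keyed by df.
T_CRIT = {3: 3.182, 98: 1.984}


def slope_t_stat(x, y):
    """t statistic for H0: slope = 0, using the usual OLS standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return b / se


def rejection_rate(n, reps=4000, seed=3):
    """Share of data sets where the nominal 5% t-test rejects a true null."""
    rng = random.Random(seed)
    crit = T_CRIT[n - 2]
    x = [float((i + 1) ** 2) for i in range(n)]  # unevenly spaced design
    rejections = 0
    for _ in range(reps):
        # Skewed, mean-zero errors; the true slope is 0.
        y = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        if abs(slope_t_stat(x, y)) > crit:
            rejections += 1
    return rejections / reps


rate_large = rejection_rate(n=100)  # 98 residual df
rate_small = rejection_rate(n=5)    # 3 residual df
print(f"rejection rate, n = 100: {rate_large:.3f}")
print(f"rejection rate, n = 5:   {rate_small:.3f}")
```

Comparing each printed rate with the nominal 0.05 shows how well the
t-test holds its level for this particular error distribution and
design; other error shapes or designs may behave differently.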
The problem with normality tests, I believe, is that they
usually have fairly low power at small sample sizes, so that
doesn't quite help. There's no free lunch: A normality test
with good power will usually have good power against a fairly
narrow class of alternatives, and almost no power against
others (directional test). How do you decide what to use?
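
A rough sketch of how power depends on sample size, using only the
Python standard library. Several simplifications are assumptions of
this sketch and not from the thread: the Jarque-Bera statistic is used
as the normality test (with its asymptotic chi-square critical value,
which is known to be poorly calibrated in small samples), it is applied
to raw samples instead of regression residuals, and the non-normal
alternative is an exponential distribution.

```python
import math
import random

# 5% critical value of chi-square with 2 df: solves exp(-c / 2) = 0.05.
CRIT = -2.0 * math.log(0.05)


def jarque_bera(z):
    """Jarque-Bera statistic based on sample skewness and kurtosis."""
    n = len(z)
    m = sum(z) / n
    m2 = sum((v - m) ** 2 for v in z) / n
    m3 = sum((v - m) ** 3 for v in z) / n
    m4 = sum((v - m) ** 4 for v in z) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


def rejection_rate(draw, n, reps=2000, seed=4):
    """Share of samples of size n for which normality is rejected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        if jarque_bera(sample) > CRIT:
            hits += 1
    return hits / reps


def skewed(rng):
    """Assumed alternative: exponential with rate 1."""
    return rng.expovariate(1.0)


power_small = rejection_rate(skewed, n=10)
power_large = rejection_rate(skewed, n=200)
print(f"rejection rate against exponential data, n = 10:  {power_small:.3f}")
print(f"rejection rate against exponential data, n = 200: {power_large:.3f}")
```

The gap between the two rates illustrates the dilemma: the test detects
this clear departure from normality far more often at the large sample
size than at the small one, where the t-tests actually depend on the
assumption.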
Has anyone seen a data set where the normality test on the
residuals is crucial in coming up with an appropriate analysis?
Cheers,
Andy
From: Federico Gherardini
Berton Gunter wrote:
Exactly! My point is that normality tests are useless for
reasons that are beyond what I can take up here.
Thanks for your suggestions, I understand that! Could you
give me some (not too complicated!) links so that I can
study this matter further?
Cheers,
Federico
Hints: Balanced designs are
robust to non-normality; independence (especially
due to systematic effects), not normality, is usually the
statistical problem; hypothesis tests will always reject
when samples are large -- so what!; "trust" refers to prediction validity
[...] with study design and the validity/representativeness of
[...] future.
I know that all the stats 101 tests say to test for
normality [...] full of baloney!
Of course, this is "free" advice -- so caveat emptor!
Cheers,
Bert