normality test
On 29-Apr-05 roger bos wrote:
I looked carefully at ?shapiro.test and I did not see it state anywhere what the null hypothesis is or what a low p-value means. I understand that I can run the example "shapiro.test(rnorm(100, mean = 5, sd = 3))" and deduce from its p-value of 0.0988 that the null hypothesis must be normality, but why can't the help page explicitly state what the null hypothesis is?
Hi Roger,
Well, the opening line is
Description:
Performs the Shapiro-Wilk test for normality.
which does pretty strongly suggest that the hypothesis being
tested by shapiro.test(X) is normality of the distribution of X.
It might be just a shade more unambiguous if it were worded
Performs the Shapiro-Wilk test of normality
or
Performs the Shapiro-Wilk test for non-normality.
since testing "for" something, like testing "for" contamination,
tends to suggest testing for something exceptional, and testing
"for" contamination could equally be seen as a test "of" purity.
("Excuse me, sir. I just need to test your data for normality.
And you're in trouble if they are.")
But all that is on the very margin of semantic finesse!
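
For anyone who wants to see the direction of the test rather than infer it from the wording, here is a minimal sketch. The seed, the sample sizes and the choice of an exponential sample as the non-normal case are arbitrary assumptions for illustration, not anything taken from ?shapiro.test. The null hypothesis is that the data come from a normal distribution, so a small p-value counts as evidence against normality.

```r
# Illustrative sketch only: seed, sample sizes and distributions are
# arbitrary choices made for this example.
set.seed(1)

x_normal <- rnorm(100, mean = 5, sd = 3)  # sample drawn from a normal distribution
x_skewed <- rexp(100, rate = 1)           # sample drawn from a strongly skewed distribution

res_normal <- shapiro.test(x_normal)
res_skewed <- shapiro.test(x_skewed)

# H0: the sample comes from a normal distribution.
# A small p-value is evidence against H0, i.e. evidence of non-normality.
cat("normal sample: W =", round(res_normal$statistic, 3),
    " p-value =", signif(res_normal$p.value, 3), "\n")
cat("skewed sample: W =", round(res_skewed$statistic, 3),
    " p-value =", signif(res_skewed$p.value, 3), "\n")
```

The skewed sample should give a very small p-value. The normal sample will usually give a large one, though by construction it will still fall below 0.05 about one time in twenty, which is why a single p-value such as 0.0988 from the help-page example is only weak guidance on its own.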
I also understand that the help pages are not meant to "teach" statistics, but stating the null hypothesis doesn't seem very difficult given the already considerable amount of time that probably went into creating these otherwise very good help pages. Many people who use this software took stats classes 10 or more years ago and this stuff is easily forgotten. Students frequently have trouble keeping the null and alternative hypotheses straight. Just my $0.02.
I think the general approach in the help pages is to assume that users understand the basics of what the function is about; the page is there to specify what is necessary in order to get it to work correctly.

One can take your point that stating explicitly what the null hypothesis of a test is would be useful for people who are not sure about that sort of thing, and would advance their statistical understanding at the same time as their proficiency in R.

However, while this might be feasible for simple matters like the null hypothesis being tested by a simple function like shapiro.test or t.test (which, by the way, does not even hint at what the null hypothesis might be: you have to infer it from the options available for the alternative hypothesis), it could get out of hand for tests applicable to more complex situations like ANOVA, mixed models, and so on. There is a danger, if the hypothesis were to be spelled out, that the help page might become a small (or not so small) book on that aspect of statistics.

A better place for such things is in documents like "Introductory Statistics with R" and so on.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 29-Apr-05                                       Time: 17:54:19
------------------------------ XFMail ------------------------------