
Assumptions for ANOVA: the right way to check normality

Isn't there a single statistician anywhere in the University?  Does your committee have any experience with any of this?
A general run of ANOVA procedures will produce multiple p-values addressing multiple null hypotheses and many different questions (often many of them uninteresting).  Which terms are you really trying to test, and which are included because you already know they have an effect?

Are you including interactions because you find them actually interesting? Or just because that is what everyone else does?

[snip]
[imagine best Mom voice] and if everyone in your field jumped off a cliff . . .

Do you want to do what everyone else is doing, or something new and different?

What does your committee chair say about this?
Repeated measures are one type of random-effects analysis, but random and mixed effects are more general than just repeated measures.

Statisticians developed those methods because they worked for simple cases, made some sense for more complicated cases, and there was nothing available that was both better and practical.  Now, with modern computers, we can see when those methods do work (unfortunately not as often as had been hoped), and what was once impractical is now much simpler (though the inertia is to do it the old way, even though the people who developed the old way would have preferred to do it our way).  The article:

Why Permutation Tests Are Superior to t and F Tests in Biomedical Research
John Ludbrook and Hugh Dudley
The American Statistician
Vol. 52, No. 2 (May, 1998), pp. 127-132

may be enlightening here (and suggest possible alternatives).
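To make the permutation idea concrete, here is a minimal sketch of a two-sample permutation test in base R.  The data values are made up purely for illustration; substitute your own groups:

```r
# Two-sample permutation test for a difference in means.
# Under the null hypothesis the group labels are exchangeable,
# so we compare the observed statistic to its distribution
# over random relabellings of the pooled data.
set.seed(42)
x <- c(12.1, 9.8, 11.5, 10.2, 13.0, 11.9)   # group A (illustrative)
y <- c( 9.0,  8.4, 10.1,  7.9,  9.5)        # group B (illustrative)

obs <- mean(x) - mean(y)                     # observed test statistic
pooled <- c(x, y)
n.x <- length(x)

perm.stats <- replicate(10000, {
  idx <- sample(length(pooled), n.x)         # random relabelling
  mean(pooled[idx]) - mean(pooled[-idx])
})

# Two-sided p-value: proportion of permuted statistics at least
# as extreme as the observed one (counting the observed itself).
p.value <- (sum(abs(perm.stats) >= abs(obs)) + 1) / (length(perm.stats) + 1)
p.value
```

Note that nothing here assumes normality; the reference distribution comes from the data themselves.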

Also see: 
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q1/001819.html

for some simulations involving mixed models.  The first shows that the normal theory works fine for that particular case; the next shows a case where the normal theory does not work, and then shows how to use simulation (a parametric bootstrap) to get a more appropriate p-value.  You can adapt those examples to your own situation.
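The parametric-bootstrap idea in those posts can be illustrated with plain fixed-effects models in base R (the linked examples use mixed models, but the logic is the same): simulate new responses from the fitted null model, refit both models, and compare the observed statistic to its simulated distribution.  The data below are invented for illustration:

```r
# Parametric bootstrap p-value for dropping a term from a linear model.
set.seed(1)
d <- data.frame(g = gl(3, 10), y = rnorm(30))  # illustrative data

m0 <- lm(y ~ 1, data = d)        # null model: no group effect
m1 <- lm(y ~ g, data = d)        # alternative: group means differ
obs.F <- anova(m0, m1)$F[2]      # observed F statistic

boot.F <- replicate(2000, {
  d$y.sim <- simulate(m0)[[1]]   # draw a response vector under the null fit
  f0 <- lm(y.sim ~ 1, data = d)
  f1 <- lm(y.sim ~ g, data = d)
  anova(f0, f1)$F[2]             # F statistic on the simulated data
})

p.boot <- (sum(boot.F >= obs.F) + 1) / (length(boot.F) + 1)
p.boot   # compare with the normal-theory p-value from anova(m0, m1)
```

When the normal theory holds, `p.boot` will agree closely with the classical p-value; when it does not, the bootstrap distribution of the statistic gives you a more honest reference.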
There is a function in the TeachingDemos package that will produce p-values if that is all you want; these are independent of any normality assumptions (independent of any data, in fact).  However, they don't really help with understanding.

Graphing the data (I think you have done this already) is the best route to understanding.  If you need more than that, then consider the following article:

     Buja, A., Cook, D., Hofmann, H., Lawrence, M., Lee, E.-K., Swayne,
     D.F. and Wickham, H. (2009) Statistical inference for exploratory
     data analysis and model diagnostics. Phil. Trans. R. Soc. A,
     367, 4361-4383. doi: 10.1098/rsta.2009.0120

Some of the tests there are implemented in the vis.test function in the TeachingDemos package (you need to understand your null hypothesis and what you are testing).
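The "lineup" idea from that article can be hand-rolled in a few lines of base R: hide a diagnostic plot of the real residuals among plots simulated under the null, and see whether you can pick out the real one (this is the logic that vis.test automates; the example model on the built-in `cars` data is purely illustrative):

```r
# A hand-rolled lineup in the spirit of Buja et al. (2009).
set.seed(7)
fit <- lm(dist ~ speed, data = cars)   # illustrative model

pos <- sample(9, 1)                    # where the real plot hides
op <- par(mfrow = c(3, 3), mar = c(2, 2, 1, 1))
for (i in 1:9) {
  if (i == pos) {
    r <- resid(fit)                    # the real residuals
  } else {
    # residuals simulated under the null of normal errors
    r <- rnorm(nrow(cars), sd = summary(fit)$sigma)
  }
  qqnorm(r, main = "", xlab = "", ylab = "")
  qqline(r)
}
par(op)
pos   # reveal the real panel only after you have made your choice
```

If you cannot tell the real panel from the eight null panels, the data are visually consistent with the normality assumption; if the real panel jumps out, that is evidence against it (with an implicit significance level of about 1/9 for a single viewer).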
George Box is often quoted as saying: "Essentially, all models are wrong, but some are useful."

So the question is not whether they are wrong, but whether they are useful (some of the other techniques mentioned may be more useful, or what you have done may already be useful enough).